Commit Graph

2342 Commits

Author SHA1 Message Date
Haiwei Li fa2b8fcfbe doc: add module design for some defines in hwmgmt_page
GAI Tooling Notice: These contents may have been developed with support from one
or more generative artificial intelligence solutions.

ACRN hypervisor is decomposed into a series of components and modules. The
module design in hypervisor is to add inline doxygen style comments above
functions, macros, structures, etc.

This patch is to add comments for some elements in hwmgmt_page module.

Tracked-On: #8665

Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-08-01 14:50:27 +08:00
Jiaqing Zhao 2dc56a8f23 hv: add GUEST_FLAG_STATELESS flag
GUEST_FLAG_STATELESS indicates guest is running a stateless operating
system and need to be shutdown forcefully without data loss. This flag
is only appalicable to pre-launched VM. For TEE_VM, this flag will be
set implicitly.

Tracked-On: #8671
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-07-30 09:26:50 +08:00
Haiwei Li c4ea248bc9 hv: remove Service VM delayed loading
Now multiboot modules memory is already reserved from e820 in function
`alloc_mods_memory()` and Service VM will not corrupt pre-launched VM
modules.

So remove the code of Service VM delayed loading.

Tracked-On: #8652
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-07-18 11:26:49 +08:00
Haiwei Li 529ade37a4 config_tools: support vUART Timer pCPU configuration
This patch is to allow user to pin vUART timer to specific pCPU via ACRN
config tool. User can configure by setting "vUART timer pCPU ID" under
Hypervisor->Advanced Parameters.

Tracked-On: #8648
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-07-10 10:26:21 +08:00
Gao, Shiqing 0bcf469758 hv: vtd: fix use of uninitialized variable in dmar_free_irte
This patch fixes the following error:
  error: variable 'sid' is used uninitialized whenever 'if' condition is true
  [-Werror,-Wsometimes-uninitialized]

Tracked-On: #861

Signed-off-by: Gao, Shiqing <shiqing.gao@intel.com>
2024-07-03 14:55:43 +08:00
YuanXin-Intel e4b1584577 Change Service VM to supervisor role
1. Enable Service VM to power off or restart the whole platform even when RTVM is running.
2. Allow Service VM stop the RTVM using acrnctl tool with option "stop -f".
3. Add 'Service VM supervisor role enabled' option in ACRN configurator

Tracked-On: #8618

Signed-off-by: YuanXin-Intel <xin.yuan@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
2024-06-28 13:35:07 +08:00
Haiwei Li 3d6ca845e2 hv: s3: add timer support
When resume from s3, Service VM OS will hang because timer interrupt on
BSP is not triggered. Hypervisor won't update physical timer because
there are expired timers on pcpu timer list.

Add suspend and resume ops for modules that use timers.

This patch is just for Service VM OS. Support for User VM will be added
in the future.

Tracked-On: #8623
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-06-27 11:26:09 +08:00
Haiwei Li 81935737ff hv: s3: reset vm after resume
Now only BSP is reset. After Service VM OS resumes from s3, APs'
apic_base_msr are incorrect with x2apic bit en.

To avoid incorrect states, do `reset_vm` after resume.

Tracked-On: #8623
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-06-27 11:26:09 +08:00
Haiwei Li 9c139681f2 hv: s3: hwp: enable hwp after resume from s3
After Service OS resume from s3, an error occurs:

[3649827us][cpu=1][idle1][sev=2][seq=1749]:= Unhandled exception: 13 (General Protection)

[3658622us][cpu=1][idle1][sev=2][seq=1750]:
Host Registers:

[3664881us][cpu=1][idle1][sev=2][seq=1751]:=  Vector=0x000000000000000D  RIP=0x000000000040F9F0

[3674213us][cpu=1][idle1][sev=2][seq=1752]:=     RAX=0x0000000080003801  RBX=0x0000000001800800  RCX=0x0000000000000774

[3685787us][cpu=1][idle1][sev=2][seq=1753]:=     RDX=0x0000000000000000  RDI=0x0000000000000080  RSI=0x0000000000000000

[3697371us][cpu=1][idle1][sev=2][seq=1754]:=     RSP=0x0000000000616C18  RBP=0x0000000000616C38  RBX=0x0000000001800800

[3708947us][cpu=1][idle1][sev=2][seq=1755]:=      R8=0x0000000000000038   R9=0x0000000000000001  R10=0x00000000000003F8

[3720539us][cpu=1][idle1][sev=2][seq=1756]:=     R11=0x000000000000000D  R12=0x0000000000458245  R13=0x0000000000000000

[3732114us][cpu=1][idle1][sev=2][seq=1757]:=  RFLAGS=0x0000000000010202  R14=0x0000000000000000  R15=0x0000000000000000

[3743699us][cpu=1][idle1][sev=2][seq=1758]:= ERRCODE=0x0000000000000000   CS=0x0000000000000008   SS=0x0000000000000010

[3755305us][cpu=1][idle1][sev=2][seq=1759]:= CR2=0x0000000000000000

The error occurs in `msr_write(MSR_IA32_HWP_REQUEST, reg)`, when HWP is
not available.

This patch is to initialize HWP after resume.

Tracked-On: #8623
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-06-27 11:26:09 +08:00
Haiwei Li cdfd35ed3d hv: s3: enable lapic earlier
After Service VM OS resumes from s3, BSP starts APs asynchronously,
followed by IPIs to APs to resume tsc. This process takes place in
function `host_enter_s3`. While, APs' lapic are not ready to accept IPI
interrupt, so BSP fails to resume tsc.

So enable lapic earlier to make sure that APs are ready.

Tracked-On: #8623
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-06-27 11:26:09 +08:00
Jiaqing Zhao 53825c5cac e820: properly reserve memory for multiboot modules
In current implementation, if there are multiple continous 4k-aligned
modules, 0-sized e820 entries will be created between these regions.
And for non-4k-aligned modules, when two of them are located in one
page, the second memory range will not be reserved as it was not in
one e820 entry after the first is reserved, making it vulnerable.

This patch fixes it by marking the exact memory range of multiboot
modules as unusable first, then shrinking the e820 entries to page
boundary. If the module crosses multiple e820 entries, possibly due
to a buggy bootloader, hypervisor will panic immediately to prevent
modules getting corrupted.

Tracked-On: #8617
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-06-20 09:10:27 +08:00
Haiwei Li b31fcd3519 hv: cpuid: fix hybrid related cpuid error
Some cpuids will return invalid values on hybrid platform because of the
error in the pointer arithmetic. Add `(void *)` before
`cpu_cpuids.leaves`.

Leaf 0x14 is used to report Intel Processor Trace Enumeration and varies
between P-cores and E-cores on hybrid platform. So add it to
`hybrid_leaves`.

Tracked-On: #8608
Fixes: 59a8cc4c2 ("hv: cpuid: make leaf 0x4 per-cpu in hybrid architecture")
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-06-19 17:07:10 +08:00
andi6 46a860bf04 hv: fix using cpuid does not clear the upper 32-bit registers.
In HV, cpuid uses the lower 32 bits of rax\rbx\rcx\rdx registers to pass parameters,
But the software does not clear the upper 32-bit registers,  if the guest
uses 64-bit variables to pass parameters to cpuid,guest will use rax\rbx\rcx\rdx,
not eax\ebx\ecx\edx, the previous value of the high 32 registers will affect the guest.

Tracked-On: #8605
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: andi6 <andi6@xiaomi.com>
2024-06-19 15:35:26 +08:00
Haiwei Li b885d02396 hv: cpuid: add several leaf to per-cpu list in hybrid architecture
P-cores and E-cores accessing leaf 0x2U/0x14U/0x16U/0x18U/0x1A/0x1C/0x80000006U
will have different information in hybrid architecture.

So add them to per-cpu list in hybrid architecture and directly return
the physical value.

Note: 0x14U is hided and return 0.

Tracked-On: #8608
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-05-28 11:02:56 +08:00
Haiwei Li d6fe8b0892 hv: cpuid: make leaf 0x6 per-cpu in hybrid architecture
Leaf 0x6 returns thermal and power management information. In
hybrid architecture, P-cores and E-cores have different information.

Add leaf 0x6 to per-cpu list in hybrid architecture and handle specific
cpuid access.

Tracked-On: #8608
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-05-28 11:02:56 +08:00
Haiwei Li 59a8cc4c28 hv: cpuid: make leaf 0x4 per-cpu in hybrid architecture
Leaf 0x4 returns deterministic cache parameters for each level. In
hybrid architecture, P-cores and E-cores have different cache
information.

Add leaf 0x4 to per-cpu list in hybrid architecture and handle specific
cpuid access.

Tracked-On: #8608
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-05-28 11:02:56 +08:00
Haiwei Li f7506424e4 hv: cpuid: refactor per-cpu leaves definition
CPUID returns processor identification and feature information.
Different pcpus may return different infos. That is, the info is
per-cpu.

In hybrid architecture, per-cpu leaf is different from the previous. So
introduce a struct percpu_cpuids to indicate the per-cpu leaf. struct
percpu_cpuids will consist of two parts: generic percpu leaves and
hybrid related percpu leaves.

This patch is just to add generic percpu leaves.

Tracked-On: #8608
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-05-28 11:02:56 +08:00
Xin Zhang 7edf800f16 Expose CPUID leaf 0x1f to guest with patched x2APIC ID
CPUID leaf 1f is preferred superset of leaf 0b, currently ACRN exposes
leaf 0b but leaf 1f is empty so the 2 leaves mismatch, and so
application will follow the SDM to check 1f first.

Tracked-On: #8608
Signed-off-by: Xin Zhang <xin.x.zhang@intel.com>
2024-05-28 11:02:56 +08:00
Zhangwei6 ddfcb8c3fc hv: enable thermal lvt interrupt
This patch can fetch the thermal lvt irq and propagate
it to VM.

At this stage we support the case that there is only one VM
governing thermal. And we pass the hardware thermal irq to this VM.

First, we register the handler for thermal lvt interrupt, its irq
vector is THERMAL_VECTOR and the handler is thermal_irq_handler().

Then, when a thermal irq occurs, it flags the SOFTIRQ_THERMAL bit
of softirq_pending, This bit triggers the thermal_softirq() function.
And this function will inject the virtual thermal irq to VM.

Tracked-On: #8595

Signed-off-by: Zhangwei6 <wei6.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-05-16 09:40:32 +08:00
Zhangwei6 78243c3f49 hv: expose thermal MSRs to VM.
In this phase, we only use one VM to control thermal.
So we make thermal MSRs readable and writable by this VM.

This VM is flagged with GUEST_FLAG_VTM, and can
read/write thermal MSRs.
For the VMs not flagged with GUEST_FLAG_VTM,
can only read these thermal MSRs to get current status.

Tracked-On: #8595
Signed-off-by: Zhangwei6 <wei6.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-05-16 09:40:32 +08:00
Yonghua Huang 7bcd9d783e hv: refine set_fs_base() function
Leave canary of stack protector untouched on pCPU
 if it has been initialized, instead of generating a new one.

Tracked-On: #8577
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
2024-04-23 11:00:43 +08:00
Jiaqing Zhao f6bb15c85c hv: mmu: intiialize ppt_page_pool.bitmap in allocate_ppt_pages()
ppt_page_pool.bitmap should be zero-initialized. Also fixes the wrong
indention in allocate_ppt_pages().

Tracked-On: #8559
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
2024-03-25 09:57:08 +08:00
Wu Zhou 29b3d03ac7 hv: vm_event: send event on triple fault handler
In the triple fault handler, post-launched VMs are instantly turned
off. Now a vm event is generated simultaneously. So that
developers can capture the event and decide what to do with it. (e.g.,
logging and populating diagnostics, or poweroff VM)

Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-02-01 17:01:31 +08:00
Wu Zhou 581ec58fbb hv: vm_event: create vm_event support
This patch creates vm_event support in HV, including:
1. Create vm_event data type.
2. Add vm_event sbuf and its initializer. The sbuf will be allocated by
   DM in Service VM. Its page address will then be share to HV through
   hypercall.
3. Add an API to send the HV generated event.

Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-02-01 17:01:31 +08:00
Muhammad Qasim Abdul Majeed 3be3b394ad hypervisor: Fix spelling and grammar mistakes.
Tracked-On: #8533
Signed-off-by: Muhammad Qasim Abdul Majeed <qasim.majeed20@gmail.com>
2023-10-24 11:10:47 +08:00
Muhammad Qasim Abdul Majeed ce96ef6bae hypervisor: Fix spelling and grammar mistakes.
Tracked-On: #8533
Signed-off-by: Muhammad Qasim Abdul Majeed <qasim.majeed20@gmail.com>
2023-10-23 16:45:28 +08:00
Qiang Zhang 6a1d91c740 hv: sched: Add sched_params struct for thread parameters
Abstract out schedulers config data for vCPU threads and other hypervisor
threads to sched_params structure. And it's used to initialize per
thread scheduler private data. The sched_params for vCPU threads come
from vm_config generated by config tools while other hypervisor threads
need give them explicitly.

Tracked-On: #8500
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
2023-09-18 16:26:05 +08:00
Wu Zhou 9a6e940849 hv: signal_event after make_request
make_request sets the request bit, and signal_event wakes the vcpu
thread. If we signal_event comes first, the target vCPU has a chance to
sleep again before processing the request bit.

Tracked-On: #8507
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-09-15 11:52:40 +08:00
Wu Zhou 064be1e3e6 hv: support halt in hv idle
When all vCPU threads on one pCPU are put to sleep (e.g., when all
guests execute HLT), hv would schedule to idle thread. Currently the
idle thread executes PAUSE which does not enter any c-state and consumes
a lot of power. This patch is to support HLT in the idle thread.

When we switch to HLT, we have to make sure events that would wake a
vCPU must also be able to wake the pCPU. Those events are either
generated by local interrupt or issued by other pCPUs followed by an
ipi kick.

Each of them have an interrupt involved, so they are also able to wake
the halted pCPU. Except when the pCPU has just scheduled to idle thread
but not yet halted, interrupts could be missed.

sleep-------schedule to idle------IRQ ON---HLT--(kick missed)
                                         ^
                              wake---kick|

This areas should be protected. This is done by a safe halt
mechanism leveraging STI instruction’s delay effect (same as Linux).

vCPUs with lapic_pt or hv with CONFIG_KEEP_IRQ_DISABLED=y does not allow
interrupts in root mode, so they could never wake from HLT (INIT kick
does not wake HLT in root mode either). They should continue using PAUSE
in idle.

Tracked-On: #8507
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-09-15 11:52:40 +08:00
Yonghua Huang 791019edc5 hv: define a MACRO to indicate maximum memory size
~0UL is widely used to specify the maximum memory size
 when calling e820_alloc_memory(), this patch to define
 a MACRO for it to avoid using this magic number.

Tracked-On: #8502
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-09-12 13:52:48 +08:00
Qiang Zhang a4a73b5aac HV: emulate dummy multi-function dev in Service VM
For a pdev which allocated to prelaunched VM or owned by HV, we need to check
whether it is a multifuction dev at function 0. If yes we have to emulate a
dummy function dev in Service VM, otherwise the sub-function devices will be
lost in guest OS pci probe process.

Tracked-On: #8492
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Signed-off-by: Victor Sun <victor.sun@intel.com>
2023-09-11 16:13:16 +08:00
Qiang Zhang bf653d277b HV: init one dev config with service vm config param
When we do init_all_dev_config() in pci.c, the pdevs added to pci dev_config
will be exposed to Service VM or passthru to prelauched VM. The original code
would find service VM config in every pci pdev init loop, this is unnecessary
and definitely impact performance. Here we generate Service VM config pointer
with config tool so that init_one_dev_config() could refer service VM config
directly.

Tracked-On: #8491
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Signed-off-by: Victor Sun <victor.sun@intel.com>
2023-09-11 16:13:16 +08:00
Qiang Zhang deccb22ea8 hv: rename is_allocated_to_prelaunched_vm to allocate_to_prelaunched_vm
Rename is_allocated_to_prelaunched_vm to allocate_to_prelaunched_vm as
it not only checks whether the PCI device is allocated to a Pre-launched
VM but also associate it with Pre-launched VM's dev_config.

Tracked-On: #8491
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
2023-09-11 16:13:16 +08:00
Qiang Zhang aebc16e9e5 hv: fix Service VM EPT mapping upper bound
On some platforms, the last e820 entry may not be of type E820_TYPE_RAM,
such as E820_TYPE_ACPI_NVS which may also be used by Service VM.
So we need take all e820 entry types into account when finding the upper
bound of Service VM EPT mapping.

Tracked-On: #8495
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
2023-09-05 11:09:46 +08:00
Muhammad Qasim Abdul Majeed a457e65619 doc: Fix spelling and typo mistakes.
Tracked-On: #8488

Signed-off-by: Muhammad Qasim Abdul Majeed <qasim.majeed20@gmail.com>
2023-09-05 09:34:21 +08:00
Jiaqing Zhao 7bfbdf04b8 doc: remove '@return None' for void functions
doxygen will warn that documented return type is found for functions
that does not return anything in 1.9.4 or later versions. 'None' is
not a special keyword in doxyge, it will recognize it as description
to the return value that does not exist in void functions.

Tracked-On: #8425
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-08-03 14:56:29 -07:00
Wu Zhou 8af2c263db hv: disable HFI and ITD for guests
The Hardware Feedback Interface (HFI) and Intel® Thread Director (ITD)
features require OS to provide a physical page address to
IA32_HW_FEEDBACK_PTR. Then the hardware will update the processor
information to the page address. The issue is that guest VM will program
its GPA to that MSR, causing great risk of tempering memory.

So HFI and ITD should be made invisible to guests, until we provide
proper virtulization of those features.

Tracked-On: #8463
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-08-01 14:57:23 +08:00
Wu Zhou fa97e32917 hv: bugfix: skip invalid ffs64 return value
ffs64() returns INVALID_BIT_INDEX (0xffffU) when it tries to deal with
zero input value. This may happen In calculate_logical_dest_mask() when
the guest tries to write some illegal destination IDs to MSI config
registers of a pt-device. The ffs64() return value is used as per_cpu
array index, and it would cause a page fault.

This patch adds protection to the per_cpu array, making this function
return zero on illegal value. As in logical destination's definition, a
zero logical designation would point to no CPU.

Fixes: 89d11d91e
Tracked-On: #8454
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-07-14 17:38:16 +08:00
Wu Zhou 89d11d91e2 hv: bugfix: fix the ptdev irq destination issue
According to SDM Vol3 11.12.10, in x2APIC mode, Logical Destination has
two parts:
  - Cluster ID (LDR[31:16])
  - Logical ID (LDR[15:0])
Cluster ID is a numerical address, while Logical ID is a 16bit mask. We
can only use Logical ID to address multi destinations within a Cluster.

So we can't just 'or' all the Logical Destination in LDR registers to
get one mask for all target pCPUs. This would get a wrong destination
mask if the target Destinations are from different Clusters.

For example in ADL/RPL x2APIC LDRs for core 2-5 are 0x10001 0x10100
0x20001 0x20100. If we 'or' them together, we would get a Logical
Destination of 0x30101, which points to core 6 and another core.
If core 6 is running a RTVM, then the irq is unable to get to
core 2-5, causing the guest on core 2-5 driver fail.

Guests working in xAPIC mode may use 'Flat Model' to select an
arbitrary list of CPUs as its irq destination. HV may not be able to
include them all when transfering to physical destinations, because
the HW is working in x2APIC mode and can only use 'Cluster Model'.

There would be no perfect fix for this issue. This patch is a simple
fix, by just keep the first Cluster of all target Logical Destinations.

Tracked-On: #8435
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-07-05 17:41:16 +08:00
Jiaqing Zhao e92320cf56 hv: allow non-service vm to read MSR_PLATFORM_INFO (CEh)
Guests bootloader may read MSR_PLATFORM_INFO to get TSC frequency for
time measurement, so injecting #GP on read may crash the vm on boot.
This patch emulates MSR_PLATFORM_INFO with 0, same behavior in kvm, to
tell the guest it's a virtualized environment.

Tracked-On: #8406
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
2023-06-19 17:45:11 +08:00
Wu Zhou db83648a8d hv: hide thermal interface from guests
Thermal events are delivered through lapic thermal LVT. Currently
ACRN does not support delivering those interrupts to guests by
virtual lapic. They need to be virtualized to provide guests some
thermal management abilities. Currently we just hide thermal
lvt from guests, including:

1. Thermal LVT:
There is no way to hide thermal LVT from guests. But we need do
something to make sure no interrupt can be actually trigered:
  - skip thermal LVT in vlapic_trigger_lvt()
  - trap-and-emulate thermal LVT in lapic-pt mode

2. As We have plan to introduce virtualization of thermal monitor in the
future, we use a vm flag GUEST_FLAG_VTM which is default 0 to control
the access to it. So that it can help enabling VTM in the future.

Tracked-On: #8414
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-06-15 20:36:44 +08:00
Wu Zhou c5d019b836 hv: emulate cpuids and MSRs for VHWP
Changes made by this patch includes:
1. Emulate HWP and pstate MSRs/CPUIDs. Those are exposed to guest when
   the GUEST_FLAG_VHWP is set:
    - CPUID[6].EAX[7,9,10]: MSR_IA32_PM_ENABLE(enabled by hv, always read
      1), MSR_IA32_HWP_CAPABILITIES, MSR_IA32_HWP_REQUEST,
      MSR_IA32_HWP_STATUS,
    - CPUID[6].ECX[0]: MSR_IA32_MPERF, MSR_IA32_APERF
    - MSR_IA32_PERF_STATUS(read as base frequency when not owning pCPU)
    - MSR_IA32_PERF_CTL(ignore writes)
2. Always hide HWP interrupt and package control MSRs/CPUIDs:
    - CPUID[6].EAX[8]: MSR_IA32_HWP_INTERRUPT(currently ACRN is not able
      to deliver thermal LVT virtual interrupt to guests)
    - CPUID[6].EAX[11,22]: MSR_IA32_HWP_REQUEST_PKG, MSR_IA32_HWP_CTL

Tracked-On: #8414
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-06-09 10:06:42 +08:00
Wu Zhou 2edf141047 hv: add VHWP guest flag and its helper func
Currently CPU frequency control is hidden to guests, and controlled
by hypervisor. While it is sufficient in most cases, some guest OS may
still need CPU performance info to make multi-core scheduling decisions.
This is seen on Linux kernel, which uses HWP highest performance level
as CPU core's priority in multi-core scheduling (CONFIG_SCHED_MC_PRIO).
Enabling this kernel feature could improve performance as single thread
workloads are scheduled on the highest performance cores. This is
significantly useful for guests with hybrid cores.

The concept is to expose performance interface to guest who exclusively
owns pCPU assigned to it. So that Linux guest can load intel_pstate
driver which will then provide the kernel with each core's schedule
priority.

Intel_pstate driver also relies on CONFIG_ACPI_CPPC_LIB to implement
this mechanic, this means we also need to provide ACPI _CPC in DM.

This patch sets up a guest flag GUEST_FLAG_VHWP to indicate whether
the guest can have VHWP feature.

Tracked-On: #8414
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-06-09 10:06:42 +08:00
Jiaqing Zhao e4e25f076c hv: sgx: refactor partition_epc()
This patch refactors partition_epc() to make the code easier to
understand, also fixes the maybe-uninitialized warning for gcc-13.

Initializing 'vm_config' to get_vm_config(0) is okay here as scenario
validator ensures CONFIG_MAX_VM_NUM to be always larger than 0.

Tracked-On: #8413
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
2023-06-06 15:22:19 +08:00
Jiaqing Zhao 770cf8c434 hv: emulate MSR_PLATFORM_INFO (17h)
This patch emulates the PLATFORM_INFO MSR in hypervisor to make it
only visible to Service VM, and only processor ratios (bit 15:8,
47:40 and 55:48) and sample part bit (27) are exponsed. This is
intended to prevent Service VM from changing processor parameters
like turbo ratio.

Tracked-On: #8406
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
2023-05-30 15:10:05 +08:00
Qiang Zhang fcb8e9bb2d ptirq: Fix INTx assignment for Post-launched VM
When assigning a physical interrupt to a Post-launched VM, if it has
been assigned to ServiceVM, we should remove that mapping first to reset
ioapic pin state and rte, and build new mapping for the Post-launched
VM.

Tracked-On: #8370
Signed-off-by: Qiang Zhang <qiang4.zhang@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2023-04-13 12:24:57 +08:00
Zhangwei6 38f910e4fb hv: change the version format
The version info is mainly used to tell the user when and where the binary is
compiled and built, this will change the hv version format.

The hv follows the format:
major.minor-stable/unstable-remote_branch-acrn-commit_date-commit_id-dirty
DBG/REL(tag-current_commit_id) scenario@board build by author date.
The '(tag-current_commit_id)' is optional, it exits only when there are
tags for current commit.
e.g.
with tag:
ACRN:\>version
HV: 3.1-stable-release_3.1-2022-09-27-11:15:42-7fad37e02-dirty DBG(tag: v3.1)
scenario3.1@my_desk_3.1 build by zhangwei 2022-11-16 07:02:37
without tag:
ACRN:\>version
HV: 3.2-unstable-master-2022-11-16-14:34:49-11f53d849-dirty DBG
scenario3.1@my_desk_3.1 build by zhangwei 2022-11-16 06:49:44

Tracked-On #8303
Signed-off-by: Zhangwei6 <wei6.zhang@intel.com>
2022-11-21 09:45:26 +08:00
Yuanyuan Zhao 0a4c76357e hv: hide mwait from guest.
When CPU support MONITOR/MWAIT, OS prefer to use it enter
deeper C-state.

Now ACRN pass through MONITOR/MWAIT to guest.

For vCPUs (ie vCPU A and vCPU B) share a pCPU, if vCPU A uses MWait to enter C state,
vCPU B could run only after the time slice of vCPU A is expired. This time slice of
vCPU A is gone to waste.

For Local APIC pass-through (used for RTVM), the guest pay more attention to
timeliness than power saving.

So this patch hides MONITOR/MWAIT by:
    1. Clear vCPUID.05H, vCPUID.01H:ECX.[bit 3] and
    MSR_IA32_MISC_ENABLE_MONITOR_ENA to tell the guest VM's vCPU
    does not support MONITOR/MAIT.
    2. Enable MSR_IA32_MISC_ENABLE_MONITOR_ENA bit for
    MSR_IA32_MISC_ENABLE inject 'GP'.
    3. Trap instruction 'MONITOR' and 'MWAIT' and inject 'UD'.
    4. Clear vCPUID.07H:ECX.[bit 5] to hide 'UMONITOR/UMWAIT'.
    5. Clear  "enable user wait and pause" VM-execution control, so
    UMONITOR/MWAIT causes an 'UD'.

Tracked-On: #8253
Signed-off-by: Yuanyuan Zhao <yuanyuan.zhao@linux.intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
2022-11-04 18:55:52 +08:00
Chenli Wei dcb0f05efc hv: refine the sworld memory allocate
The current code uses a predefined sworld memory array to reserve memory
for trusty VMs, and assume all post launched VMs are trusty VM which is
not correct.

This patch statically reserved memory just for trusty VMs and save 16M
memory for every non trusty VM.

Tracked-On: #6690
Signed-off-by: Chenli Wei <chenli.wei@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2022-10-19 15:58:25 +08:00
Wu Zhou fbdc2774af hv: add ACRN CPU frequency initializer
The design of ACRN CPU performance management is to let hardware
do the autonomous frequency selection(or set to a fixed value),
and remove guest's ability to control CPU frequency.

This patch is to implement the CPU frequency initializer, which will
setup CPU frequency base on the performance policy type.

Two performance policy types are provided for user to choose from:
  - 'Performance': CPU runs at its CPU runs at its maximum frequency.
    Enable hardware autonomous frequency selection if HWP is presented.
  - 'Nominal': CPU runs at its guaranteed frequency.

The policy type is passed to hypervisor through boot parameter, as
either 'cpu_perf_policy=Nominal' or 'cpu_perf_policy=Performance'.
The default type is 'Performance'.

Both HWP and ACPI p-state are supported. HWP is the first choice, for
it provides hardware autonomous frequency selection, while keeps
frequency transaction time low.

Two functions are added to the hypervisor to call:
  - init_frequency_policy(): called by BSP at start up time. It processes
    the boot parameters, and enables HWP if it is presented.
  - apply_frequency_policy(): called after init_frequency_policy().
    It applies initial CPU frequency policy setting for each core. It
    uses a set of frequency limits data struct to quickly decide what the
    highest/nominal frequency is. The frequency limits are generated by
    config-tools.

The hypervisor will not be governing CPU frequency after initial policy
is applied.

Cores running RTVMs are fixed to nominal/guaranteed frequency, to get
more certainty in latency. This is done by setting the core's frequency
limits to highest=lowest=nominal in config-tools.

Tracked-On: #8168
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-10-08 11:13:21 +08:00