1. Enable Service VM to power off or restart the whole platform even when RTVM is running.
2. Allow Service VM stop the RTVM using acrnctl tool with option "stop -f".
3. Add 'Service VM supervisor role enabled' option in ACRN configurator
Tracked-On: #8618
Signed-off-by: YuanXin-Intel <xin.yuan@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
To maximize the cpu utilization, core 0 is usually shared by service
vm and guest vm. But there are no statistics to show the cpu occupation
of each vm.
This patch is to provide cpu usage statistic for users. To calculate
it, a new trace event is added and marked in scheduling context switch,
accompanying with a new python script to analyze the data from acrntrace
output.
Tracked-On: #8621
Signed-off-by: nacui <na.cui@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Reviewed-by: Haiwei Li <haiwei.li@intel.com>
When resume from s3, Service VM OS will hang because timer interrupt on
BSP is not triggered. Hypervisor won't update physical timer because
there are expired timers on pcpu timer list.
Add suspend and resume ops for modules that use timers.
This patch is just for Service VM OS. Support for User VM will be added
in the future.
Tracked-On: #8623
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
Now only BSP is reset. After Service VM OS resumes from s3, APs'
apic_base_msr are incorrect with x2apic bit en.
To avoid incorrect states, do `reset_vm` after resume.
Tracked-On: #8623
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
Virtio legacy device (ver < 1.0) uses a single PIO for all virtqueues.
Notifications from different virtqueues are implemented by writing
virtqueue index to the PIO. Writing different values to the same addr
needs to be mapped to different eventfds by asyncio. This is called
data match feature of asyncio.
v3 -> v4:
* Update the definition of `struct asyncio_desc`
Use `struct acrn_asyncio_info` inside it, instaed of defining the duplicated
fileds.
* Update `add_asyncio` to use `memcpy_s` rather than assigning all the fields
using 5 assignment statements.
* Update `asyncio_is_conflict` for coding style
120-character line is sufficient to write all conditions.
* Update the checks related to `wildcard`
Because we require every conditional clause to have a Boolean type
in the coding guideline.
v2 -> v3:
No change
v1 -> v2:
No change
Tracked-On: #8612
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
This patch can fetch the thermal lvt irq and propagate
it to VM.
At this stage we support the case that there is only one VM
governing thermal. And we pass the hardware thermal irq to this VM.
First, we register the handler for thermal lvt interrupt, its irq
vector is THERMAL_VECTOR and the handler is thermal_irq_handler().
Then, when a thermal irq occurs, it flags the SOFTIRQ_THERMAL bit
of softirq_pending, This bit triggers the thermal_softirq() function.
And this function will inject the virtual thermal irq to VM.
Tracked-On: #8595
Signed-off-by: Zhangwei6 <wei6.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
In hcall_vm_intr_monitor(), the default case for intr_hdr->cmd is a
wrong case. So, it should return error code back. But it returns success
code 0 in current codes.
Tracked-On: #8580
Reviewed-by: Fei Li <fei1.li@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
sbuf_put copies sbuf->ele_size of data, and puts into ring. Currently
this function assumes that data size from caller is no less than
sbuf->ele_size.
But as sbuf->ele_size is usually setup by some sources outside of the HV
(e.g., the service VM), it is not meant to be trusted. So caller should
provide the max length of the data for safety reason. sbuf_put() will
return UINT32_MAX if max_len of data is less than element size.
Additionally, a helper function sbuf_put_many() is added for putting
multiple entries.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
This patch creates vm_event support in HV, including:
1. Create vm_event data type.
2. Add vm_event sbuf and its initializer. The sbuf will be allocated by
DM in Service VM. Its page address will then be share to HV through
hypercall.
3. Add an API to send the HV generated event.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Per BVT (Borrowed Virtual Time) scheduler design, following per thread
parameters are required to tune scheduling behaviour.
- weight
The time sharing of a thread on CPU.
- warp
Boost value of virtual time of a thread (time borrowed from future) to
reduce Effective Virtual Time to prioritize the thread.
- warp_limit
Max warp time in one warp.
- unwarp_period
Min unwarp time after a warp.
As of now, only weight is in use to tune virtual time ratio of VCPU
threads from different VMs. Others parameters are for future extension.
Tracked-On: #8500
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Abstract out schedulers config data for vCPU threads and other hypervisor
threads to sched_params structure. And it's used to initialize per
thread scheduler private data. The sched_params for vCPU threads come
from vm_config generated by config tools while other hypervisor threads
need give them explicitly.
Tracked-On: #8500
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
When all vCPU threads on one pCPU are put to sleep (e.g., when all
guests execute HLT), hv would schedule to idle thread. Currently the
idle thread executes PAUSE which does not enter any c-state and consumes
a lot of power. This patch is to support HLT in the idle thread.
When we switch to HLT, we have to make sure events that would wake a
vCPU must also be able to wake the pCPU. Those events are either
generated by local interrupt or issued by other pCPUs followed by an
ipi kick.
Each of them have an interrupt involved, so they are also able to wake
the halted pCPU. Except when the pCPU has just scheduled to idle thread
but not yet halted, interrupts could be missed.
sleep-------schedule to idle------IRQ ON---HLT--(kick missed)
^
wake---kick|
This areas should be protected. This is done by a safe halt
mechanism leveraging STI instruction’s delay effect (same as Linux).
vCPUs with lapic_pt or hv with CONFIG_KEEP_IRQ_DISABLED=y does not allow
interrupts in root mode, so they could never wake from HLT (INIT kick
does not wake HLT in root mode either). They should continue using PAUSE
in idle.
Tracked-On: #8507
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
When bvt scheduler picks up a thread to run, it sets up a counter
‘run_countdown’ to determine how many ticks it should remain running.
Then the timer will decrease run_countdown by 1 on every 1000Hz tick
interrupt, until it reaches 0. The tick interrupt consumes a lot of
power during idle (if we are using HLT in idle thread).
This patch is to switch the 1000 HZ timer to a dynamic one, which only
interrupt on run_countdown expires.
Tracked-On: #8507
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
When VHWP enabled, return 0 and px_cnt = 0 on ACRN_PMCMD_GET_PX_CNT,
so that DM will write _CPC to guests' ACPI.
Tracked-On: #8414
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
When assigning a physical interrupt to a Post-launched VM, if it has
been assigned to ServiceVM, we should remove that mapping first to reset
ioapic pin state and rte, and build new mapping for the Post-launched
VM.
Tracked-On: #8370
Signed-off-by: Qiang Zhang <qiang4.zhang@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ptirq_remapping_info records which physical interrupt is mapped to
the virtual interrupt in a VM. As we need to knonw whether a physical
sid has been mapped to a VM and whether a virtual sid in a VM has been
used, there should be two hash tables to link and iterate
ptirq_remapping_info:
- One is used to lookup from physical sid, linking phys_link.
- The other is used to lookup from virtual sid in a VM, linking virt_link
Without this patch, phys_link or virt_link from different
ptirq_remapping_info was linked by one hash list head if they got the
same hash value, as shown in following diagram.
When looking for a ptirq_remapping_info from physical sid, the original
code took all hash list node as phys_link and failed to get
ptirq_remapping_info linked with virt_link and later references
to its members are wrong.
The same problem also occurred when looking for a ptirq_remapping_info
from virtual sid and vm.
---------- <- hash table
|hlist_head| --actual ptirq_remapping_info address
---------- --------- / --used as ptirq_remapping_info
| ... | |phys_link| ___/
---------- --------- --------- / ---------
|hlist_head| -> |phys_link| <-> |virt_link| <-> |phys_link|
---------- --------- --------- ---------
| ... | |virt_link| | other | |virt_link|
---------- --------- | members | ---------
| other | --------- | other |
| members | | members |
--------- ---------
Tracked-On: #8370
Signed-off-by: Qiang Zhang <qiang4.zhang@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
For an IO request, hv will check if it was registered in asyncio desc
list. If yes, put the corresponding fd to the shared buffer. If the
shared buffer is full, yield the vcpu and try again later.
Tracked-On: #8209
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add hypercall to add/remove asyncio request info. Hv will record the
info in a list, and when a new ioreq is come, hv will check if it is
in the asyncio list, if yes, queue the fd to asyncio buffer.
Tracked-On: #8209
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Current IO emulation is synchronous. The user VM need to wait for the
completion of the the I/O request before return. But Virtio Spec
introduces introduces asynchronous IO with a new register in MMIO/PIO
space named NOTIFY, to be used for FE driver to notify BE driver, ACRN
hypervisor can emulate this register by sending a notification to vCPU
in Service VM side. This way, FE side can resume to work without waiting
for the full completion of BE side response.
Tracked-On: #8209
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Extend sbuf hypercall to support other kinds of share buffer.
Tracked-On: #8209
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
sbuf is now only used for debug purpose, but later, it will be used as a
common interfaces. So, move the sbuf related code out of the debug directory.
Tracked-On: #8209
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
INIT signal has been used to kick off the partitioned pCPU, like RTVM,
whose LAPIC is pass-through. notification IPI is used to kick off
sharing pCPU.
Add mode_to_kick_pcpu in per-cpu to control the way of kicking
pCPU.
Tracked-On: #8207
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The design of ACRN CPU performance management is to let hardware
do the autonomous frequency selection(or set to a fixed value),
and remove guest's ability to control CPU frequency.
This patch is to remove guest's ability to control CPU frequency by
removing the guests' HWP/EIST CPUIDs and blocking the related MSR
accesses. Including:
- Remove CPUID.06H:EAX[7..11] (HWP)
- Remove CPUID.01H:ECX[7] (EIST)
- Inject #GP(0) upon accesses to MSR_IA32_PM_ENABLE,
MSR_IA32_HWP_CAPABILITIES, MSR_IA32_HWP_REQUEST,
MSR_IA32_HWP_STATUS, MSR_IA32_HWP_INTERRUPT,
MSR_IA32_HWP_REQUEST_PKG
- Emulate MSR_IA32_PERF_CTL. Value written to MSR_IA32_PERF_CTL
is just stored for reading. This is like how the native
environment would behavior when EIST is disabled from BIOS.
- Emulate MSR_IA32_PERF_STATUS by filling it with base frequency
state. This is consistent with Windows, which displays current
frequency as base frequency when running in VM.
- Hide the IA32_MISC_ENABLE bit 16 (EIST enable) from guests.
This bit is dependent to CPUID.01H:ECX[7] according to SDM.
- Remove CPID.06H:ECX[0] (hardware coordination feedback)
- Inject #GP(0) upon accesses to IA32_MPERF, IA32_APERF
Also DM do not need to generate _PSS/_PPC for post-launched VMs
anymore. This is done by letting hypercall HC_PM_GET_CPU_STATE sub
command ACRN_PMCMD_GET_PX_CNT and ACRN_PMCMD_GET_PX_DATA return (-1).
Tracked-On: #8168
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
BVT schedule rule:
When a new thread is wakeup and added to runqueue, it will get the
smallest avt (svt) from runqueue to initiate its avt. If the svt is
smaller than it's avt, it will keep the original avt. With the svt, it
can prevent a thread from claiming an excessive share of CPU after
sleepting for a long time.
For the reboot issue, when the VM is reboot, it means a new vcpu thread
is wakeup, but at this time, the Service VM's vcpu thread is blocked,
and removed from the runqueue, and the runqueue is empty, so the svt is
0. The new vcpu thread will get avt=0. avt=0 means very high priority,
and can run for a very long time until it catch up with other thread's
avt in runqueue.
At this time, when Service VM's vcpu thread wakeup, it will check the
svt, but the svt is very small, so will not update it's avt according to
the rule, thus has a very low priority and cannot be scheduled.
To fix it, update svt in pick_next handler to make sure svt is align
with the avt of the first obj in runqueue.
Tracked-On: #7944
Signed-off-by: Conghui <conghui.chen@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Modified the copyright year range in code, and corrected "int32_tel"
into "Intel" in two "hypervisor/include/debug/profiling.h" and
"hypervisor/include/debug/profiling_internal.h".
Tracked-On: #7559
Signed-off-by: Ziheng Li <ziheng.li@intel.com>
Commit d575edf79a changes the internal
implementation of wait_event and signal_event to use a counter instead
of a boolean value.
The background was:
ACRN utilizes vcpu_make_request and signal_event pair to shoot down
other vcpus and let them wait for signals. vcpu_make_request eventually
leads to target vcpu calling wait_event.
However vcpu_make_request/signal_event pair was not thread-safe,
and concurrent calls of this pair of API could lead to problems.
One such example is the concurrent wbinvd emulation, where vcpus may
concurrently issue vcpu_make_request/signal_event to synchronize wbinvd
emulation.
d575edf commit uses a counter in internal implementation of
wait_event/signal_event to avoid data races.
However by using a counter, the wait/signal pair now carries semantics of
semaphores instead of events. Semaphores require caller to carefully
plan their calls instead of multiply signaling any number of times to the same
event, which deviates from the original "event" semantics.
This patch changes the API implementation back to boolean-based, and
re-resolve the issue of concurrent wbinvd in next patch.
This also partially reverts commit 10963b04d1,
which was introduced because of the d575edf.
Tracked-On: #7887
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Now interrupt vector in ACRN hypervisor is maintained as global variable, not
per-CPU variable. If there're more PCI devices, the physical interrupt vectors
are not enough most likely.
This patch would not allocate physical interrupt vector for MSI/MSI-X vectors
if interrupt posting could been used to inject the MSI/MSI-X interrupt to
a VM directly.
Tracked-On: #7275
Signed-off-by: Fei Li <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Many of the license and Intel copyright headers include the "All rights
reserved" string. It is not relevant in the context of the BSD-3-Clause
license that the code is released under. This patch removes those strings
throughout the code (hypervisor, devicemodel and misc).
Tracked-On: #7254
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
NMI is used to notify LAPIC-PT RTVM, to kick its CPU into hypervisor.
But NMI could be used by system devices, like PMU (Performance Monitor
Unit). So use INIT signal as the partition CPU notification function, to
replace injecting NMI.
Also remove unused NMI as notification related code.
Tracked-On: #6966
Acked-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
The coding guideline rule C-ST-04 requires that
a 'if' statement followed by one or more 'else if'
statement shall be terminated by an 'else' statement
which contains either appropriate action or a comment.
Tracked-On: #6776
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
HC_GET_PLATFORM_INFO hypercall is not supported anymore,
hence to remove related function and data structure definition.
Tracked-On: #6690
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
This patch adds an option CONFIG_KEEP_IRQ_DISABLED to hv (default n) and
config-tool so that when this option is 'y', all interrupts in hv root
mode will be permanently disabled.
With this option to be 'y', all interrupts received in root mode will be
handled in external interrupt vmexit after next VM entry. The postpone
latency is negligible. This new configuration is a requirement from x86
TEE's secure/non-secure interrupt flow support. Many race conditions can be
avoided when keeping IRQ off.
v5:
Rename CONFIG_ACRN_KEEP_IRQ_DISABLED to CONFIG_KEEP_IRQ_DISABLED
v4:
Change CPU_IRQ_ENABLE/DISABLE to
CPU_IRQ_ENABLE_ON_CONFIG/DISABLE_ON_CONFIG and guard them using
CONFIG_ACRN_KEEP_IRQ_DISABLED
v3:
CONFIG_ACRN_DISABLE_INTERRUPT -> CONFIG_ACRN_KEEP_IRQ_DISABLED
Add more comment in commit message
Tracked-On: #6571
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Avoid failing hypercalls by returning 0 for empty PX and CX tables on
HC_PM_GET_CPU_STATE/PMCMD_GET_PX_CNT and
HC_PM_GET_CPU_STATE/PMCMD_GET_CX_CNT.
Tracked-On: #6848
Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@opensource.tttech-industrial.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
The HV will be built failed with below compiler message:
common/efi_mmap.c: In function ‘init_efi_mmap_entries’:
common/efi_mmap.c:41:11: error: unused variable ‘efi_memdesc_nr’
[-Werror=unused-variable]
uint32_t efi_memdesc_nr = uefi_info->memmap_size / uefi_info->memdesc_size;
^~~~~~~~~~~~~~
cc1: all warnings being treated as errors
The root cause is ASSERT() api is for DEBUG only so efi_memdesc_nr is not used
in RELEASE code.
The patch fix this issue by removing efi_memdesc_nr declaration;
Tracked-On: #6834
Signed-off-by: Victor Sun <victor.sun@intel.com>
Since the UUID is not a *must* set parameter for the standard post-launched
VM which doesn't depend on any static VM configuration. We can remove
the KATA related code from hypervisor as it belongs to such VM type.
v2-->v3:
separate the struce acrn_platform_info change of devicemodel
v1-->v2:
update the subject and commit msg
Tracked-On:#6685
Signed-off-by: Chenli Wei <chenli.wei@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
With current arch design the UUID is used to identify ACRN VMs,
all VM configurations must be deployed with given UUIDs at build time.
For post-launched VMs, end user must use UUID as acrn-dm parameter
to launch specified user VM. This is not friendly for end users
that they have to look up the pre-configured UUID before launching VM,
and then can only launch the VM which its UUID in the pre-configured UUID
list,otherwise the launch will fail.Another side, VM name is much straight
forward for end user to identify VMs, whereas the VM name defined
in launch script has not been passed to hypervisor VM configuration
so it is not consistent with the VM name when user list VM
in hypervisor shell, this would confuse user a lot.
This patch will resolve these issues by removing UUID as VM identifier
and use VM name instead:
1. Hypervisor will check the VM name duplication during VM creation time
to make sure the VM name is unique.
2. If the VM name passed from acrn-dm matches one of pre-configured
VM configurations, the corresponding VM will be launched,
we call it static configured VM.
If there is no matching found, hypervisor will try to allocate one
unused VM configuration slot for this VM with given VM name and get it
run if VM number does not reach CONFIG_MAX_VM_NUM,
we will call it dynamic configured VM.
3. For dynamic configured VMs, we need a guest flag to identify them
because the VM configuration need to be destroyed
when it is shutdown or creation failed.
v7->v8:
-- rename is_static_vm_configured to is_static_configured_vm
-- only set DM owned guest_flags in hcall_create_vm
-- add check dynamic flag in get_unused_vmid
v6->v7:
-- refine get_vmid_by_name, return the first matching vm_id
-- the GUEST_FLAG_STATIC_VM is added to identify the static or
dynamic VM, the offline tool will set this flag for
all the pre-defined VMs.
-- only clear name field for dynamic VM instead of clear entire
vm_config
Tracked-On: #6685
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Victor Sun<victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In previous implementation we leave MAX_EFI_MMAP_ENTRIES in config tool and
let end user to configure it. However it is hard for end user to understand
how to configure it, also it is hard for board_inspector to get this value
automatically because this info is only meaningful during the kernel boot
stage and there is no such info available after boot in Linux toolset.
This patch hardcode the value to 350, and ASSERT if the board need more efi
mmap entries to run ACRN. User could modify the MAX_EFI_MMAP_ENTRIES macro
in case ASSERT occurs in DEBUG stage.
The More size of hv_memdesc[] only consume very little memory, the overhead
is (size * sizeof(struct efi_memory_desc)), i.e. (size * 40) in bytes.
Tracked-On: #6442
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
The coding guideline rules C-TY-27 and C-TY-28, combined, requires that
assignment and arithmetic operations shall be applied only on operands of the
same kind. This patch either adds explicit type casts or adjust types of
variables to align the types of operands.
The only semantic change introduced by this patch is the promotion of the
second argument of set_vmcs_bit() and clear_vmcs_bit() to
uint64_t (formerly uint32_t). This avoids clear_vmcs_bit() to accidentally
clears the upper 32 bits of the requested VMCS field.
Other than that, this patch has no semantic change. Specifically this patch
is not meant to fix buggy narrowing operations, only to make these
operations explicit.
Tracked-On: #6776
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Rename SOS_VM_NUM to SERVICE_VM_NUM.
rename SOS_SOCKET_PORT to SERVICE_VM_SOCKET_PORT.
rename PROCESS_RUN_IN_SOS to PROCESS_RUN_IN_SERVICE_VM.
rename PCI_DEV_TYPE_SOSEMUL to PCI_DEV_TYPE_SERVICE_VM_EMUL.
rename SHUTDOWN_REQ_FROM_SOS to SHUTDOWN_REQ_FROM_SERVICE_VM.
rename PROCESS_RUN_IN_SOS to PROCESS_RUN_IN_SERVICE_VM.
rename SHUTDOWN_REQ_FROM_UOS to SHUTDOWN_REQ_FROM_USER_VM.
rename UOS_SOCKET_PORT to USER_VM_SOCKET_PORT.
rename SOS_CONSOLE to SERVICE_VM_OS_CONSOLE.
rename SOS_LCS_SOCK to SERVICE_VM_LCS_SOCK.
rename SOS_VM_BOOTARGS to SERVICE_VM_OS_BOOTARGS.
rename SOS_ROOTFS to SERVICE_VM_ROOTFS.
rename SOS_IDLE to SERVICE_VM_IDLE.
rename SEVERITY_SOS to SEVERITY_SERVICE_VM.
rename SOS_VM_UUID to SERVICE_VM_UUID.
rename SOS_REQ to SERVICE_VM_REQ.
rename RTCT_NATIVE_FILE_PATH_IN_SOS to RTCT_NATIVE_FILE_PATH_IN_SERVICE_VM.
rename CBC_REQ_T_UOS_ACTIVE to CBC_REQ_T_USER_VM_ACTIVE.
rename CBC_REQ_T_UOS_INACTIVE to CBC_REQ_T_USER_VM_INACTIV.
rename uos_active to user_vm_active.
Tracked-On: #6744
Signed-off-by: Liu Long <long.liu@linux.intel.com>
Reviewed-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
Rename gpa_uos to gpa_user_vm
rename base_gpa_in_uos to base_gpa_in_user_vm
rename UOS_VIRT_PCI_MMCFG_BASE to USER_VM_VIRT_PCI_MMCFG_BASE
rename UOS_VIRT_PCI_MMCFG_START_BUS to USER_VM_VIRT_PCI_MMCFG_START_BUS
rename UOS_VIRT_PCI_MMCFG_END_BUS to USER_VM_VIRT_PCI_MMCFG_END_BUS
rename UOS_VIRT_PCI_MEMBASE32 to USER_VM_VIRT_PCI_MEMBASE32
rename UOS_VIRT_PCI_MEMLIMIT32 to USER_VM_VIRT_PCI_MEMLIMIT32
rename UOS_VIRT_PCI_MEMBASE64 to USER_VM_VIRT_PCI_MEMBASE64
rename UOS_VIRT_PCI_MEMLIMIT64 to USER_VM_VIRT_PCI_MEMLIMIT64
rename UOS in comments message to User VM.
Tracked-On: #6744
Signed-off-by: Liu Long <long.liu@linux.intel.com>
Reviewed-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
Rename sos_vm to service_vm.
rename sos_vmid to service_vmid.
rename sos_vm_ptr to service_vm_ptr.
rename get_sos_vm to get_service_vm.
rename sos_vm_gpa to service_vm_gpa.
rename sos_vm_e820 to service_vm_e820.
rename sos_efi_info to service_vm_efi_info.
rename sos_vm_config to service_vm_config.
rename sos_vm_hpa2gpa to service_vm_hpa2gpa.
rename vdev_in_sos to vdev_in_service_vm.
rename create_sos_vm_e820 to create_service_vm_e820.
rename sos_high64_max_ram to service_vm_high64_max_ram.
rename prepare_sos_vm_memmap to prepare_service_vm_memmap.
rename post_uos_sworld_memory to post_user_vm_sworld_memory
rename hcall_sos_offline_cpu to hcall_service_vm_offline_cpu.
rename filter_mem_from_sos_e820 to filter_mem_from_service_vm_e820.
rename create_sos_vm_efi_mmap_desc to create_service_vm_efi_mmap_desc.
rename HC_SOS_OFFLINE_CPU to HC_SERVICE_VM_OFFLINE_CPU.
rename SOS to Service VM in comments message.
Tracked-On: #6744
Signed-off-by: Liu Long <long.liu@linux.intel.com>
Reviewed-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
Implement the propagate_vcbm() function:
Set vCBM to to all the vCPUs that share cache with vcpu
to mimic hardware CAT behavior
Tracked-On: #5917
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
It's difficult to configure CONFIG_HV_RAM_SIZE properly at once. This patch
not only remove CONFIG_HV_RAM_SIZE, but also we use ld linker script to
dynamically get the size of HV RAM size.
Tracked-On: #6663
Signed-off-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch adds a new priority based scheduler to support
vCPU scheduling based on their pre-configured priorities.
A vCPU can be running only if there is no higher priority
vCPU running on the same pCPU.
Tracked-On: #6571
Signed-off-by: Jie Deng <jie.deng@intel.com>
- remove vcpu->arch.nrexits which is useless.
- record full 32 bits of exit_reason to TRACE_2L(). Make the code simpler.
Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
GSI of hcall_set_irqline should be checked against target_vm's
total GSI count instead of SOS's total GSI count.
Tracked-On: #6357
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
It is used to specify the maximum number of EFI memmap entries.
On some platforms, like Tiger Lake, the number of EFI memmap entries
becomes 268 when the BIOS settings are changed.
The current value of MAX_EFI_MMAP_ENTRIES (256) defined in hypervisor
is not big enough to cover such cases.
As the number of EFI memmap entries depends on the platforms and the
BIOS settings, this patch introduces a new entry MAX_EFI_MMAP_ENTRIES
in configurations so that it can be adjusted for different cases.
Tracked-On: #6442
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Currently the sched event handling may encounter data race problem, and
as a result some vcpus might be stalled forever.
One example can be wbinvd handling where more than 1 vcpus are doing
wbinvd concurrently. The following is a possible execution of 3 vcpus:
-------
0 1 2
req [Note: 0]
req bit0 set [Note: 1]
IPI -> 0
req bit2 set
IPI -> 2
VMExit
req bit2 cleared
wait
vcpu2 descheduled
VMExit
req bit0 cleared
wait
vcpu0 descheduled
signal 0
event0->set=true
wake 0
signal 2
event2->set=true [Note: 3]
wake 2
vcpu2 scheduled
event2->set=false
resume
req
req bit0 set
IPI -> 0
req bit1 set
IPI -> 1
(doesn't matter)
vcpu0 scheduled [Note: 4]
signal 0
event0->set=true
(no wake) [Note: 2]
event0->set=false (the rest doesn't matter)
resume
Any VMExit
req bit0 cleared
wait
idle running
(blocked forever)
Notes:
0: req: vcpu_make_request(vcpu, ACRN_REQUEST_WAIT_WBINVD).
1: req bit: Bit in pending_req_bits. Bit0 stands for bit for vcpu0.
2: In function signal_event, At this time the event->waiting_thread
is not NULL, so wake_thread will not execute
3: eventX: struct sched_event of vcpuX.
4: In function wait_event, the lock does not strictly cover the execution between
schedule() and event->set=false, so other threads may kick in.
-----
As shown in above example, before the last random VMExit, vcpu0 ended up
with request bit set but event->set==false, so blocked forever.
This patch proposes to change event->set from a boolean variable to an
integer. The semantic is very similar to a semaphore. The wait_event
will add 1 to this value, and block when this value is > 0, whereas signal_event
will decrease this value by 1.
It may happen that this value was decreased to a negative number but that
is OK. As long as the wait_event and signal_event are paired and
program order is observed (that is, wait_event always happens-before signal_event
on a single vcpu), this value will eventually be 0.
Tracked-On: #6405
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>