For a pci BAR, its size aligned bits have fixed to 0(except the memory
type bits, they have another fixed value), they are read-only.
When write ~0U to BAR for sizing, (type_bits | size_mask) is written
into BAR.
So do not need to distinguish between sizing vBAR and programming vBAR.
When write a value to vBAR, always store (value & size_mask | type_bit)
to vfcg.
pci_vdev_read_vbar() is unnecessary, because it is only need to read
vcfg.
Tracked-On: #6011
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
When guest doing BAR re-programming, we should check whether
the base address of the BAR is valid.This patch does this check by:
1. whether the gpa is located in the responding MMIO window
2. whether the gpa is aligned with the BAR size
Tracked-On: #6011
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
Now we use pci_vdev_update_vbar_base to update vBAR base address when
guest re-programming BAR. For a IO BAR, we would calculate the 32 bits
base address then mask the high 16 bits. However, the mask code would
never be called since the first if condition statement is always true.
This patch fix it by move the unamsk code into the first if condition
statement.
Tracked-On: #6011
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
Add flag "allow_trigger_s5" to launch script xmls. If this flag sets to
'y' and the poweroff_channel sets to "vuart1(pty)" or "vuart1(tty)", the
"allow_trigger_s5" will appends to the end of "--pm_notify_channel
uart".
Tracked-On: #6138
Signed-off-by: Yang,Yu-chu <yu-chu.yang@intel.com>
generate_shadow_ept_entry() didn't verify the correctness of the requested
guest EPT mapping. That might leak host memory access to L2 VM.
To simplify the implementation of the guest EPT audit, hide capabilities
'map 2-Mbyte page' and 'map 1-Gbyte page' from L1 VM. In addition,
minimize the attribute bits of EPT entry when create a shadow EPT entry.
Also, for invalid requested mapping address, reflect the EPT_VIOLATION to
L1 VM.
Here, we have some TODOs:
1) Enable large page support in generate_shadow_ept_entry()
2) Evaluate if need to emulate the invalid GPA access of L2 in HV directly.
3) Minimize EPT entry attributes.
Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
L1 VM changes the guest EPT and do INVEPT to invalidate the previous
TLB cache of EPT entries. The shadow EPT replies on INVEPT instruction
to do the update.
The target shadow EPTs can be found according to the 'type' of INVEPT.
Here are two types and their target shadow EPT,
1) Single-context invalidation
Get the EPTP from the INVEPT descriptor. Then find the target
shadow EPT.
2) Global invalidation
All shadow EPTs of the L1 VM.
The INVEPT emulation handler invalidate all the EPT entries of the
target shadow EPTs.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
When a shadow EPT is not used anymore, its resources need to be
released.
free_sept_table() is introduced to walk the whole shadow EPT table and
free the pagetable pages.
Please note, the PML4E page of shadow EPT is not freed by
free_sept_table() as it still be used to present a shadow EPT pointer.
Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
With shadow EPT, the hypervisor walks through guest EPT table:
* If the entry is not present in guest EPT, ACRN injects EPT_VIOLATION
to L1 VM and resumes to L1 VM.
* If the entry is present in guest EPT, do the EPT_MISCONFIG check.
Inject EPT_MISCONFIG to L1 VM if the check failed.
* If the entry is present in guest EPT, do permission check.
Reflect EPT_VIOLATION to L1 VM if the check failed.
* If the entry is present in guest EPT but shadow EPT entry is not
present, create the shadow entry and resumes to L2 VM.
* If the entry is present in guest EPT but the GPA in the entry is
invalid, injects EPT_VIOLATION to L1 VM and resumes L1 VM.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
* Hide 5 level EPT capability, let L1 guest stick to 4 level EPT.
* Access/Dirty bits are not support currently, hide corresponding EPT
capability bits.
* "Mode-based execute control for EPT" is also not support well
currently, hide its capability bit from MSR_IA32_VMX_PROCBASED_CTLS2.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
'struct nept_desc' is used to associate guest EPTP with a shadow EPTP.
It's created in the first reference and be freed while no reference.
The life cycle seems like,
While guest VMCS VMX_EPT_POINTER_FULL is changed, the 'struct nept_desc'
of the new guest EPTP is referenced; the 'struct nept_desc' of the old
guest EPTP is dereferenced.
While guest VMCS be cleared(by VMCLEAR in L1 VM), the 'struct nept_desc'
of the old guest EPTP is dereferenced.
While a new guest VMCS be loaded(by VMPTRLD in L1 VM), the 'struct
nept_desc' of the new guest EPTP is referenced. The 'struct nept_desc'
of the old guest EPTP is dereferenced.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
To shadow guest EPT, the hypervisor needs construct a shadow EPT for each
guest EPT. The key to associate a shadow EPT and a guest EPT is the EPTP
(EPT pointer). This patch provides following structure to do the association.
struct nept_desc {
/*
* A shadow EPTP.
* The format is same with 'EPT pointer' in VMCS.
* Its PML4 address field is a HVA of the hypervisor.
*/
uint64_t shadow_eptp;
/*
* An guest EPTP configured by L1 VM.
* The format is same with 'EPT pointer' in VMCS.
* Its PML4 address field is a GPA of the L1 VM.
*/
uint64_t guest_eptp;
uint32_t ref_count;
};
Due to lack of dynamic memory allocation of the hypervisor, a array
nept_bucket of type 'struct nept_desc' is introduced to store those
association information. A guest EPT might be shared between different
L2 vCPUs, so this patch provides several functions to handle the
reference of the structure.
Interface get_shadow_eptp() also is introduced. To find the shadow EPTP
of a specified guest EPTP.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Shadow EPT uses lots of pages to construct the shadow page table. To
utilize the memory more efficient, a page poll sept_page_pool is
introduced.
For simplicity, total platform RAM size is considered to calculate the
memory needed for shadow page tables. This is not an accurate upper
bound. This can satisfy typical use-cases where there is not a lot
of overcommitment and sharing of memory between L2 VMs.
Memory of the pool is marked as reserved from E820 table in early stage.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Nested VM exits happen when vCPU is in guest mode (VMCS02 is current).
Initially we reflect all nested VM exits to L1 hypervisor. To prepare
the environment to run L1 guest:
- restore some VMCS fields to the value as what L1 hypervisor programmed.
- VMCLEAR VMCS02, VMPTRLD VMCS01 and enable VMCS shadowing.
- load the non-shadowing host states from VMCS12 to VMCS01 guest states.
- VMRESUME to L1 guest with this modified VMCS01.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
Since L2 guest vCPU mode and VPID are managed by L1 hypervisor, so we
can skip these handling in run_vcpu().
And be careful that we can't cache L2 registers in struct acrn_vcpu.
Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
invvpid and invept instructions cause VM exits unconditionally.
For initial support, we pass all the instruction operands as is
to the pCPU.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Implement the VMLAUNCH and VMRESUME instructions, allowing a L1
hypervisor to run nested guests.
- merge VMCS control fields and VMCS guest fields to VMCS02
- clear shadow VMCS indicator on VMCS02 and load VMCS02 as current
- set VMCS12 launch state to "launched" in VMLAUNCH handler
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Alex Merritt <alex.merritt@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Make up qemu.xml to compromise the static allocators which use the
new xpath based on board inspector.
Tracked-On: #6102
Signed-off-by: Yang,Yu-chu <yu-chu.yang@intel.com>
For illegal characters, replace original characters with escaped characters in board.xml.
Tracked-On: #6113
Signed-off-by: Kunhui Li <kunhuix.li@intel.com>
The latest version of RTCT specification is version 2.
This patch is to add RTCT v2 support for virtual RTCT
of guest.
Tracked-On: #6020
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Yu Wang <yu1.wang@intel.com>
Emulation of guest lapic ID has been enhanced to
indicate the topology of vCPU hierarchy.
This patch refine logic to build virtual RCTC_v1 table
of guest to adapt above lapic ID changes.
Tracked-On: #6020
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Yu Wang <yu1.wang@intel.com>
Signature of RTCT ACPI table maybe "PTCT"(v1) or "RTCT"(v2).
and the MAGIC number in CRL header is also changed from "PTCM"
to "RTCM".
This patch refine the code to detect RTCT table for both
v1 and v2.
Tracked-On: #6020
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Modify the initial value of PT_SLOT variable and
update the get slot logic that all device call the virtual_dev_slot function to get slot number directly.
Copy the launch_uos_id1.sh to launch_win.sh.
Tracked-On: #6072
Signed-off-by: Kunhui Li <kunhuix.li@intel.com>
Update the severity from "warning" to "error" for hybrid cores check.
Tracked-On: #5918
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Kunhui Li <kunhuix.li@intel.com>
We missed a section during our last update to the QEMU tutorial
that references which version (tag) of ACRN to use.
Tracked-On: #5928
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
hv is responsible to ensure PTM is always enabled on hw root port if
that root port is PTM root-capable. If PTM root is not enabled already in physical
root port before guest launch, guest OS can only enable it in root port's virtual
config space and PTM may not function as desired so we would rather not
to allow user to enable PTM on pass-thru device in this case.
Also revisit a few comments to add assumption that acrn only support a
simple PTM hierarch emulation i.e., EP (ptm requestor) is directly connected
to root port (ptm root), or ptm requestor is rcie).
V4:
- Change behavior from V3: When users tries to enable PTM while PTM root is not enabled on
physical root port, allow user to launch VM and pass thru the device
with no PTM support.
V3:
- When users tries to enable PTM while PTM root is not enabled on
physical root port, allow user to launch VM and enable PTM in virtual
root port. This is not going to enable PTM on physical root port.
Tracked-On: #5915
Signed-off-by: Rong Liu <rong.l.liu@intel.com>
Acked-by: Jason Chen <jason.cj.chen@intel.com>
For post launch VM, ACRN supports PTM under these conditions:
1. HW implements a simple PTM hierarchy: PTM requestor device (ep) is
directly connected to PTM root capable root port. Or
2. ptm requestor itself is root complex integrated ep.
Currently acrn doesn't support emulation of other type of PTM hiearchy, such
as if there is an intermediate PTM node (for example, switch) inbetween
PTM requestor and PTM root.
To avoid VM touching physical hardware, acrn hv ensures PTM is always enabled
in the hardware.
During hv's pci init, if root port is ptm capable,
hv will enable PTM on that root port. In addition,
log error (and don't enable PTM) if ptm root
capability is on intermediate node other than root port.
V2:
- Modify commit messages to clarify the limitation
of current PTM implementation.
- Fix code that may fail FUSA
- Remove pci_ptm_info() and put info log inside pci_enable_ptm_root().
Tracked-On: #5915
Signed-off-by: Rong Liu <rong.l.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In physical destination mode, the destination processor is specified by its
local APIC ID. When a CPU switch xAPIC Mode to x2APIC Mode or vice versa,
the local APIC ID is not changed. So a vcpu in x2APIC Mode could use physical
Destination Mode to send an IPI to another vcpu in xAPIC Mode by writing ICR.
This patch adds support for a vCPU A could write ICR to send IPI to another
vCPU B which is in different APIC mode.
Tracked-On: #5923
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Retrieve physical APIC IDs from board xml file and use them to fill in the ACPI MADT table
for pre-Launched VMs.
Note that the config-tool will throw an error if the processors/die/core/thread tags are absent.
User needs to run board_inspector.py to regenerate the board xml file when this commit is merged,
if the processors/die/core/thread tags are missing in the board xml file.
Tracked-On: #6020
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Two utility functions are copied and adapted from hyerpervisor:
ffs64
bitmap_clear_nolock
Two public functions are provided for future use (such as for RTCTv2)
pcpuid_from_vcpuid
lapicid_from_pcpuid
Tracked-On: #6020
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
This allows users to retrieve and use the requested platform_info information from hypervisor
Tracked-On: #6020
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
So that DM can retrieve physical APIC IDs and use them to fill in the ACPI MADT table for
post-launched VMs.
Note:
1. DM needs to use the same logic as hypervisor to calculate vLAPIC IDs based on physical APIC IDs
and CPU affinity setting
2. Using reserved0[] in struct hc_platform_info to pass physical APIC IDs means we can only support at
most 116 cores. And it assumes LAPIC ID is 8bits (X2APIC mode supports 32 bits).
Cat IDs shift will be used by DM RTCT V2
Tracked-On: #6020
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Using physical APIC IDs as vLAPIC IDs for pre-Launched and post-launched VMs
is not sufficient to replicate the host CPU and cache topologies in guest VMs,
we also need to passthrough host CPUID leaf.0BH to guest VMs, otherwise,
guest VMs may see weird CPU topology.
Note that in current code, ACRN has already passthroughed host cache CPUID
leaf 04H to guest VMs
Tracked-On: #6020
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
In current code, ACRN uses physical APIC IDs as vLAPIC IDs for SOS,
and vCPU ids (contiguous) as vLAPIC IDs for pre-Launched and post-Launched VMs.
Using vCPU ids as vLAPIC IDs for pre-Launched and post-Launched VMs
would result in wrong CPU and cache topologies showing in the guest VMs,
and could adversely affect performance if the guest VM chooses to detect
CPU and cache topologies and optimize its behavior accordingly.
Uses physical APIC IDs as vLAPIC IDs (and related CPU/cache topology enumeration
CPUIDs passthrough) will replicate the host CPU and cache topologies in pre-Launched
and post-Launched VMs.
Tracked-On: #6020
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Remove the direct calls to exec_vmptrld() or exec_vmclear(), and replace
with the wrapper APIs load_va_vmcs() and clear_va_vmcs().
Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
commit 873ed75 ("misc: sanity check VM config for nested virtualization")
requires that the guest_flag tag can't be empty, or it will fail to build.
This patch changes adl instances of "<guest_flag></guest_flag>" to
"<guest_flag>0</guest_flag>".
Tracked-On: #5923
Signed-off-by: Jiang, Yanting <yanting.jiang@intel.com>
Configure PTM in post-launched VM using <PTM> element. If the //vm/PTM
sets to 'y', pci_dev.c.xsl appends the virtual root port to
corresponding struct acrn_vm_pci_dev_config of that VM. Currently it
supports only post-launched VMs.
Configure enable_ptm for dm argument. If a uos/enable_ptm with uos id
= 'vm_id 'sets to 'y' and the vm/PTM with the same vm_id sets to 'y',
append an "enable_ptm" flag to the end of passthrough ethernet devices.
Currently there is only ethernet card can support the "enable_ptm"flag.
For the schema validation, the <PTM> can only be ['y', 'n'].
For the launched script validation, the <enable_ptm> can only be ['y',
'n']. If the <enable_ptm> sets to 'y' but the corresponding <PTM> sets
to 'n', the launch script will fail to generate.
Tracked-On: #6054
Signed-off-by: Yang,Yu-chu <yu-chu.yang@intel.com>
Add a number of steps and details that were not called
out in the "Getting Started Guide for ACRN Hybrid Mode". Those
are not obvious to the first-time or novice user so the user
guide was hard to follow and confusing. At a high-level:
* How to build Zephyr
* How to install ACRN
* How to install the ACRN kernel
The hybrid scenario overview diagram has been updated too.
Tracked-On: #5992
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
Add an xslt file "board_info.h.xsl". This file is used to
generate board_info.h which is used by hypervisor.
Tracked-On: #6024
Signed-off-by: Yang,Yu-chu <yu-chu.yang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Add an xslt file "pci_dev.c.xsl". This file is used to
generate pci_dev.c which is used by hypervisor.
Tracked-On: #6024
Signed-off-by: Yang,Yu-chu <yu-chu.yang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
acrn:get-vbdf: get the virtual bdf from allocation.xml based on vmid and device name
acrn:get-pbdf: get physical bdf from <pci_dev>
acrn:ptdev-name-suffix: fix the name to look up allocation.xml
acrn:get-hidden-device-num: get the number of hidden devices based on
board name
acrn:is-vmsix-supported-device: check if a device is a vmsix supported
device based on the vendor and identifier
Tracked-On: #6024
Signed-off-by: Yang,Yu-chu <yu-chu.yang@intel.com>
Assign bdf to pci emulated and passthrough devices.
For pre-launched VM, assigns unique bdf to passthrough devices, inter-vm
shared memory, pci vuart(console and communication vuarts).
For SOS vm, assigns unique bdf to inter-vm shared memory and pci
vuart(console and communication vuarts).
The bdf follows the rules below:
- the bdf 00:00.0 is reserved for pci hostbridge
- the assigned bdf range: bus is 0x00, dev is in range [0x1, 0x20)
and the fuc is 0x00
- the bdf must be unique, which means any vm's emulated devices cannot
share the same bdf with existing devices
- some devices's bdf is hardcoded, modify its bdf would leads the
device cannot be dicoverd by os. A HARDCODED_BDF_LIST in bdf.py documents
them
- the passthrough devices' bdf can be reused in SOS vm
Tracked-On: #6024
Signed-off-by: Yang,Yu-chu <yu-chu.yang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>