Commit Graph

3371 Commits

Author SHA1 Message Date
Zide Chen ad37553873 hv: nested: redundant permission check on nested_vmentry()
check_vmx_permission() is called in vmresume_vmexit_handler() and
vmlaunch_vmexit_handler() already.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-08-20 08:14:40 +08:00
Yifan Liu d575edf79a hv: Change sched_event structure to resolve data race in event handling
Currently the sched event handling may encounter data race problem, and
as a result some vcpus might be stalled forever.

One example can be wbinvd handling where more than 1 vcpus are doing
wbinvd concurrently. The following is a possible execution of 3 vcpus:
-------
0                            1                           2
                             req [Note: 0]
                             req bit0 set [Note: 1]
                             IPI -> 0
                             req bit2 set
                             IPI -> 2
                                                         VMExit
                                                         req bit2 cleared
                                                         wait
                                                         vcpu2 descheduled

VMExit
req bit0 cleared
wait
vcpu0 descheduled
                             signal 0
                             event0->set=true
                             wake 0

                             signal 2
                             event2->set=true [Note: 3]
                             wake 2
                                                         vcpu2 scheduled
                                                         event2->set=false
                                                         resume

                                                         req
                                                         req bit0 set
                                                         IPI -> 0
                                                         req bit1 set
                                                         IPI -> 1
                             (doesn't matter)
vcpu0 scheduled [Note: 4]
                                                         signal 0
                                                         event0->set=true
                                                         (no wake) [Note: 2]
event0->set=false                                        (the rest doesn't matter)
resume

Any VMExit
req bit0 cleared
wait
idle running

(blocked forever)

Notes:
0: req: vcpu_make_request(vcpu, ACRN_REQUEST_WAIT_WBINVD).
1: req bit: Bit in pending_req_bits. Bit0 stands for bit for vcpu0.
2: In function signal_event, At this time the event->waiting_thread
    is not NULL, so wake_thread will not execute
3: eventX: struct sched_event of vcpuX.
4: In function wait_event, the lock does not strictly cover the execution between
    schedule() and event->set=false, so other threads may kick in.
-----

As shown in above example, before the last random VMExit, vcpu0 ended up
with request bit set but event->set==false, so blocked forever.

This patch proposes to change event->set from a boolean variable to an
integer. The semantic is very similar to a semaphore. The wait_event
will add 1 to this value, and block when this value is > 0, whereas signal_event
will decrease this value by 1.

It may happen that this value was decreased to a negative number but that
is OK. As long as the wait_event and signal_event are paired and
program order is observed (that is, wait_event always happens-before signal_event
on a single vcpu), this value will eventually be 0.

Tracked-On: #6405
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
2021-08-20 08:11:40 +08:00
Zhou, Wu b394777908 HV: Add implements of 32bit and 64bit elf loader
This is a simply implement for the 32bit and 64bit elf loader.

The loading function first reads the image header, and finds the program
entries that are marked as PT_LOAD, then loads segments from elf file to
guest ram. After that, it finds the bss section in the elf section entries, and
clear the ram area it points to.

Limitations:
1. The e_type of the elf image must be ET_EXEC(executable). Relocatable or
   dynamic code is not supported.
2. The loader only copies program segments that has a p_type of
   PT_LOAD(loadable segment). Other segments are ignored.
3. The loader doesn’t support Sections that are relocatable
   (sh_type is SHT_REL or SHT_RELA)
4. The 64bit elf’s entry address must below 4G.
5. The elf is assumed to be able to put segments to valid guest memory.

Tracked-On: #6323

Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Zhou, Wu c2468d2791 HV: Add elf loader sketch
This patch adds a function elf_loader() to load elf image.
It checks the elf header, get its 32/64 bit type, then calls
the corresponding loading routines, which are empty, and
will be realized later.

Tracked-On: #6323

Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Zhou, Wu 537f69dde9 HV: Add elf header file for elf loader
Source: https://github.com/freebsd/freebsd-src/blob/main/sys/sys/elf_common.h
Trimed to meet the minimal requirements for the Zephyr elf file to be loaded
Also added elf file header data struct and program/section entry data structs.

Tracked-On: #6323

Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
Reviewed-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Zhou, Wu 8100b1dd56 HV: Remove 'vm_' of vm_elf_loader and etc.
In order to make better sense, vm_elf_loader, vm_bzimage_loader and
vm_rawimage_loader are changed to elf_loaer, bzimage_loaer and
rawimage_loader.

Tracked-On: #6323

Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Zhou, Wu 53f6720d13 HV: Combine the acpi loading fucntion to one place
Remove the acpi loading function from elf_loader, rawimage_loaer and
bzimage_loader, and call it together in vm_sw_loader.

Now the vm_sw_loader's job is not just loading sw, so we rename it to
prepare_os_image.

Tracked-On: #6323

Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
Reviewed-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Zhou, Wu e78aacbe55 HV: Correct some naming issues
For the guest OS loaders, prapare_loading_xxx are not accurate for
what those functions actually do. Now they are changed to load_xxx:
load_rawimage, load_bzimage.

And the 'bsp' expression is confusing in the comments for
init_vcpu_protect_mode_regs, changed to a better way.

Tracked-On: #6323

Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
Reviewed-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Victor Sun 3124018917 HV: vm_load: rename vboot_info.h to vboot.h
vboot_info.h declares vm loader function also, so rename the file name to
vboot.h;

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Victor Sun 9b632c0e4b HV: vm_load: split vm_load.c to support diff kernel format
The patch splits the vm_load.c to three parts, the loader function of bzImage
kernel is moved to bzimage_loader.c, the loader function of raw image kernel
is moved to rawimage_loader.c, the stub is still stayed in vm_load.c to load
the corresponding kernel loader function. Each loader function could be
isolated by CONFIG_GUEST_KERNEL_XXX macro which generated by config tool.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Victor Sun 2524572fb2 HV: vm_load: refine vm_sw_loader API
Change if condition to switch in vm_sw_loader() so that the sw loader
could be compiled conditionally.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Yang,Yu-chu 73dc610d90 config-tool: refine guest kernel types
Rename KERNEL_ZEPHYR to KERNEL_RAWIMAGE. Added new type "KERNEL_ELF".

Add CONFIG_GUEST_KERNEL_RAWIMAGE, CONFIG_GUEST_KERNEL_ELF and/or
CONFIG_GUEST_KERNEL_BZIMAGE to config.h if it's configured.

Tracked-On: #6323

Signed-off-by: Yang,Yu-chu <yu-chu.yang@intel.com>
Reviewed-by: Victor Sun <victor.sun@intel.com>
2021-08-19 20:00:45 +08:00
Victor Sun 178b3e85e3 HV: vm_load: change kernel type for zephyr image
Previously we only support loading raw format of zephyr image as prelaunched
Zephyr VM, this would cause guest F segment overridden issue because the zephyr
raw image covers memory space from 0x1000 to 0x100000 upper. To fix this issue,
we should support ELF format image loading so that parse and load the multiple
segments from ELF image directly.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Fei Li 2e7491a8ec hv: mmiodev: a minor bug fix about refine acrn_mmiodev data structure
Rename base_hpa to host_pa in acrn_mmiodev data structure.

Tracked-On: #6366
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-08-19 12:01:35 +08:00
Liu,Junming 2c5c8754de hv:enable GVT-d for pre-launched linux guest in logical partion mode
When pass-thru GPU to pre-launched Linux guest,
need to pass GPU OpRegion to the guest.
Here's the detailed steps:
1. reserve a memory region in ve820 table for GPU OpRegion
2. build EPT mapping for GPU OpRegion to pass-thru OpRegion to guest
3. emulate the pci config register for OpRegion
For the third step, here's detailed description:
The address of OpRegion locates on PCI config space offset 0xFC,
Normal Linux guest won't write this register,
so we can regard this register as read-only.
When guest reads this register, return the emulated value.
When guest writes this register, ignore the operation.

Tracked-On: #6387

Signed-off-by: Liu,Junming <junming.liu@intel.com>
2021-08-19 11:56:26 +08:00
Jian Jun Chen dc77ef9e52 hv: ivshmem: map SHM BAR with PAT ignored
ACRN does not support the variable range vMTRR. The default
memory type of vMTRR is UC. With this vMTRR emulation guest VM
such as Linux refuses to map the MMIO address space as WB. In
order to get better performance SHM BAR of ivshmem is mapped
with PAT ignored and memory type of SHM BAR is fixed to WB.

Tracked-On: #6389
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-13 11:17:15 +08:00
Yang,Yu-chu d997f4bbc1 config-tools: refine bin_gen.py and create virtual TPM2 acpi table
Create virtual acpi table of tpm2 based on the raw data if the TPM2
device is presented and the passthrough tpm2 is enabled.

Refine the arguments of bin_gen.py. The --board and --scenario take the
path to the XMLs as the argument. The allocation.xml is needed for
bin_gen.py to generate tpm2 acpi table.

Refine the condition of tpm2_acpi_gen. The tpm2 device "MSFT0101" can be
present in device id or compatible_id(CID). Check both attributes and
child node of tpm2 device.

Tracked-On: #6320
Signed-off-by: Yang,Yu-chu <yu-chu.yang@intel.com>
2021-08-11 14:45:55 +08:00
Fei Li a705ff2dac hv: relocate ACPI DATA address to 0x7fe00000
Relocate ACPI address to 0x7fe00000 and ACPI NVS to 0x7ff00000 correspondingly.
In this case, we could include TPM event log region [0x7ffb0000, 0x80000000)
into ACPI NVS.

Tracked-On: #6320
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-08-11 14:45:55 +08:00
Fei Li 74e68e39d1 hv: tpm2: do tpm2 fixup for security vm
ACRN used to prepare the vTPM2 ACPI Table for pre-launched VM at the build stage
using config tools. This is OK if the TPM2 ACPI Table never changes. However,
TPM2 ACPI Table may be changed in some conditions: change BIOS configuration or
update BIOS.

This patch do TPM2 fixup to update the vTPM2 ACPI Table and TPM2 MMIO resource
configuration according to the physical TPM2 ACPI Table.

Tracked-On: #6366
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-08-11 14:45:55 +08:00
Fei Li f81b39225c HV: refine acrn_mmiodev data structure
1. add a name field to indicate what the MMIO Device is.
2. add two more MMIO resource to the acrn_mmiodev data structure.

Tracked-On: #6366
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-08-11 14:45:55 +08:00
Fei Li 20061b7c39 hv: remove xsave dependence
ACRN could run without XSAVE Capability. So remove XSAVE dependence to support
more (hardware or virtual) platforms.

Tracked-On: #6287
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-08-10 16:36:15 +08:00
Fei Li 84235bf07c hv: vtd: a minor refine about dmar_wait_completion
Check whether condition is met before check whether time is out after iommu_read32.
This is because iommu_read32 would cause time out on some virtual platform in
spite of the current DMAR status meets the pre_condition.

Tracked-On: #6371
Signed-off-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-10 16:36:15 +08:00
Tao Yuhong 171856c46b hv: uc-lock: Fix do not trap #GP
If HV enable trigger #GP for uc-lock, and is about to emulate guest uc-lock
instructions, should trap guest #GP. Guest uc-lock instrucction trigger #GP,
cause vmexit for #GP, HV handle this vmexit and emulate uc-lock
instruction.

Tracked-On: #6299
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
2021-08-09 15:33:12 +08:00
liu hang1 d07bd78b13 Makefile:add targz-pkg entry in Makefile
User could use make targz-pkg command to generate tar package in
build directory,which could help user simplify the process
of installing acrn hypervisor in target board. user need to copy the
tarball package to target board,and extract it to "/" directory.

Tracked-On: #6355
Signed-off-by: liu hang1 <hang1.liu@intel.com>
Reviewed-by: VanCutsem, Geoffroy <geoffroy.vancutsem@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2021-08-09 11:52:27 +08:00
Kunhui-Li 578c18b962 config_tools: remove obsolete kconfig files
Remove obsolete Kconfig files;
Update Kconfig related README and error message.

Tracked-On: #6315
Signed-off-by: Kunhui-Li <kunhuix.li@intel.com>
2021-08-09 09:25:02 +08:00
Victor Sun 4a53a23faa HV: debug: support 64bit BAR pci uart with 32bit space
Currently the HV console does not support PCI UART with 64bit BAR, but in the
case that the BAR is in 64bit and the BAR space is below 4GB (i.e. the high
32bit address of the 64bit BAR is zero), HV should be able to support it.

Tracked-On: #6334

Signed-off-by: Victor Sun <victor.sun@intel.com>
2021-08-04 10:10:35 +08:00
Victor Sun 2fbc4c26e6 HV: vm_load: remove kernel_load_addr in sw_kernel_info struct
When guest kernel has multiple loading segments like ELF format image, just
define one load address in sw_kernel_info struct is meaningless.

The patch removes kernel_load_addr member in struct sw_kernel_info, the load
address should be parsed in each specified format image processing.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-08-03 13:44:51 +08:00
Victor Sun d1d59437ea HV: vm_load: correct needed size of bzImage kernel
The previous code did not load bzImage start from protected mode part, result
in the protected mode part un-align with kernel_alignment field and then cause
kernel decompression start from a later aligned address. In this case we had
to enlarge the needed size of bzImage kernel to kernel_init_size plus double
size of kernel_alignment.

With loading issue of bzImage protected mode part fixed, the kernel needed size
is corrected in this patch.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-08-03 13:44:51 +08:00
Victor Sun 2b5bd2e87a HV: vm_load: load protected mode code only for bzImage
When LaaG boots with bzImage module file, only protected mode code need
to be loaded to guest space since the VM will boot from protected mode
directly. Futhermore, per Linux boot protocol the protected mode code
better to be aligned with kernel_alignment field in zeropage, otherwise
kernel will take time to do "rep movs" to the aligned address.

In previous code, the bzImage is loaded to the address where aligned with
kernel_alignment, this would make the protected mode code unalign with
kernel_alignment. If the kernel is configured with CONFIG_RELOCATABLE=n,
the guest would not boot. This patch fixed this issue.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-08-03 13:44:51 +08:00
Victor Sun 9caff7360f HV: vm_load: set kernel load addr in vm_load.c
This patch moves get_bzimage_kernel_load_addr() from init_vm_sw_load() to
vm_sw_loader() stage so will set kernel load address of bzImage type kernel
in vm_bzimage_loader() in vm_load.c.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-08-03 13:44:51 +08:00
Victor Sun e40a258102 HV: vm_load: set ramdisk load addr in vm_load.c
This patch moves get_initrd_load_addr() API from init_vm_sw_load() to
vm_sw_loader() stage. The patch assumes that the kernel image have been
loaded to guest space already.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-08-03 13:44:51 +08:00
Victor Sun afe24731a7 HV: vm_load: remove load_sw_modules() api
In load_sw_modules() implementation, we always assuming the guest kernel
module has one load address and then the whole kernel image would be loaded
to guest space from its load address. This is not true when guest kernel
has multiple load addresses like ELF format kernel image.

This patch removes load_sw_modules() API, and the loading method of each
format of kernel image could be specified in prepare_loading_xxximage() API.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-08-03 13:44:51 +08:00
Victor Sun 2938eca363 HV: vm_load: refine api of get_bzimage_kernel_load_addr()
As the previous commit said the kernel load address should be moved
from init_vm_sw_load() to vm_sw_loader() stage. This patch refines
the API of get_bzimage_kernel_load_addr() in init_vm_kernel_info()
for later use.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-08-03 13:44:51 +08:00
Victor Sun 33d226bf58 HV: vm_load: refine api of get_initrd_load_addr()
Currently the guest kernel load address and ramdisk load address are
initialized during init_vm_sw_load() stage, this is meaningless when
guest kernel has multiple segments with different loading addresses.
In that case, the kernel load addresses should be parsed and loaded
in vm_sw_loader() stage, the ramdisk load address should be set in
that stage also because it is depended on kernel load address.

This patch refines the API of get_initrd_load_addr() which will set
proper initrd load address of bzImage type kernel for later use.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-08-03 13:44:51 +08:00
Fei Li bc5c3a0bb7 hv: vpci: modify Interrupt Line Register as writable
According to PCIe Spec, for a RW register bits, If the optional feature
that is associated with the bits is not implemented, the bits are permitted
to be hardwired to 0b. However Zephyr would use INTx Line Register as writable
even this PCI device has no INTx, so emulate INTx Line Register as writable.

Tracked-On: #6330
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-08-03 11:01:24 +08:00
Fei Li 481e9fba9d hv: remove the constraint "MMU and EPT must both support large page or not"
There're some virtual platform which doesn't meet this constraint. So remove
this constraint.

Tracked-On: #6329
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-08-03 11:01:24 +08:00
Minggui Cao 80ae3224d9 hv: expose PMC to core partition VM
for core partition VM (like RTVM), PMC is always used for performance
profiling / tuning, so expose PMC capability and pass-through its MSRs
to the VM.

Tracked-On: #6307
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-07-27 14:58:28 +08:00
Minggui Cao eba8c4e78b hv: use ARRAY_SIZE to calc local array size
if one array just used in local only, and its size not used extern,
use ARRAY_SIZE macro to calculate its size.

Tracked-On: #6307
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2021-07-27 14:58:28 +08:00
Yifan Liu 69fef2e685 hv: debug: Add hv console callback to VM-exit event
In some scenarios (e.g., nested) where lapic-pt is enabled for a vcpu
running on a pcpu hosting console timer, the hv console will be
inaccessible.

This patch adds the console callback to every VM-exit event so that the
console can still be somewhat functional under such circumstance.

Since this is VM-exit driven, the VM-exit/second can be low in certain
cases (e.g., idle or running stress workload). In extreme cases where
the guest panics/hangs, there will be no VM-exits at all.

In most cases, the shell is laggy but functional (probably enough for
debugging purpose).

Tracked-On: #6312
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
2021-07-22 10:08:23 +08:00
Tao Yuhong 8360c3dfe6 HV: enable #GP for UC lock
For an atomic operation using bus locking, it would generate LOCK# bus
signal, if it has Non-WB memory operand. This is an UC lock. It will
ruin the RT behavior of the system.
If MSR_IA32_CORE_CAPABILITIES[bit4] is 1, then CPU can trigger #GP
for instructions which cause UC lock. This feature is controlled by
MSR_TEST_CTL[bit28].
This patch enables #GP for guest UC lock.

Tracked-On: #6299
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-07-21 11:25:47 +08:00
Tao Yuhong 2aba7f31db HV: rename splitlock file name
Because the emulation code is for both split-lock and uc-lock,
rename splitlock.c/splitlock.h to lock_instr_emul.c/lock_instr_emul.h

Tracked-On: #6299
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-07-21 11:25:47 +08:00
Tao Yuhong 7926504011 HV: rename split-lock emulation APIs
Because the emulation code is for both split-lock and uc-lock, Changed
these API names:
vcpu_kick_splitlock_emulation() -> vcpu_kick_lock_instr_emulation()
vcpu_complete_splitlock_emulation() -> vcpu_complete_lock_instr_emulation()
emulate_splitlock() -> emulate_lock_instr()

Tracked-On: #6299
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-07-21 11:25:47 +08:00
Tao Yuhong bbd7b7091b HV: re-use split-lock emulation code for uc-lock
Split-lock emulation can be re-used for uc-lock. In emulate_splitlock(),
it only work if this vmexit is for #AC trap and guest do not handle
split-lock and HV enable #AC for splitlock.
Add another condition to let emulate_splitlock() also work for #GP trap
and guest do not handle uc-lock and HV enable #GP for uc-lock.

Tracked-On: #6299
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-07-21 11:25:47 +08:00
Tao Yuhong 553d59644b HV: Fix decode_instruction() trigger #UD for emulating UC-lock
When ACRN uses decode_instruction to emulate split-lock/uc-lock
instruction, It is actually a try-decode to see if it is XCHG.
If the instruction is XCHG instruction, ACRN must emulate it
(inject #PF if it is triggered) with peer VCPUs paused, and advance
the guest IP. If the instruction is a LOCK prefixed instruction
with accessing the UC memory, ACRN Halted the peer VCPUs, and
advance the IP to skip the LOCK prefix, and then let the VCPU
Executes one instruction by enabling IRQ Windows vm-exit. For
other cases, ACRN injects the exception back to VCPU without
emulating it.
So change the API to decode_instruction(vcpu, bool full_decode),
when full_decode is true, the API does same thing as before. When
full_decode is false, the different is if decode_instruction() meet unknown
instruction, will keep return = -1 and do not inject #UD. We can use
this to distinguish that an #UD has been skipped, and need inject #AC/#GP back.

Tracked-On: #6299
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
2021-07-21 11:25:47 +08:00
Yonghua Huang 48908d522f hv: minor coding style fix in list.h
To add brakets for '(char *)(ptr)' in MACRO
 container_of(), which may be used recursively.

Tracked-On: #6284
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2021-07-15 14:18:13 +08:00
Shuo A Liu 4896faaebb hv: dm: Remove aligned attribute of common structures
Common structures are used by DM, kernel, HV. Aligned attribute might
caused structures size mismatch between DM/HV and kernel, as kernel uses
default GCC alignment.

So, make DM/HV also use the default GCC alignment.

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Shuo A Liu d170336e90 hv: Remove unused definition
Below data structures definition are deprecated, remove them.

struct acrn_create_vcpu
struct acrn_nmi_entry
struct acrn_vm_pci_msix_remap

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Shuo A Liu 6e0b12180c hv: dm: Use new power management data structures
struct cpu_px_data		->	struct acrn_pstate_data
struct cpu_cx_data		->	struct acrn_cstate_data
enum pm_cmd_type		->	enum acrn_pm_cmd_type
struct acpi_generic_address	->	struct acrn_acpi_generic_address
cpu_cx_data			->	acrn_cstate_data
cpu_px_data			->	acrn_pstate_data

IC_PM_GET_CPU_STATE		->	ACRN_IOCTL_PM_GET_CPU_STATE

PMCMD_GET_PX_CNT		->	ACRN_PMCMD_GET_PX_CNT
PMCMD_GET_CX_CNT		->	ACRN_PMCMD_GET_CX_CNT
PMCMD_GET_PX_DATA		->	ACRN_PMCMD_GET_PX_DATA
PMCMD_GET_CX_DATA		->	ACRN_PMCMD_GET_CX_DATA

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Shuo A Liu 98c80d75b8 hv: dm: Use new virtual device management ioctls
IC_ADD_HV_VDEV		->	ACRN_IOCTL_CREATE_VDEV
IC_REMOVE_HV_VDEV	->	ACRN_IOCTL_DESTROY_VDEV
struct acrn_emul_dev	->	struct acrn_vdev

Also, move struct acrn_vdev to acrn_common.h as this structure is used
by both DM and HV.

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Shuo A Liu 9e7abbb38c dm: Use new MMIO device passthrough management ioctls
IC_ASSIGN_MMIODEV	->	ACRN_IOCTL_ASSIGN_MMIODEV
IC_DEASSIGN_MMIODEV	->	ACRN_IOCTL_DEASSIGN_MMIODEV

struct acrn_mmiodev has slight change. Move struct acrn_mmiodev into
acrn_common.h because it is used by both DM and HV.

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Shuo A Liu 3625eb7a99 hv: dm: Use new pci device passthrough management ioctls
IC_ASSIGN_PCIDEV		->	ACRN_IOCTL_ASSIGN_PCIDEV
IC_DEASSIGN_PCIDEV		->	ACRN_IOCTL_DEASSIGN_PCIDEV
QUIRK_PTDEV			->	ACRN_PTDEV_QUIRK_ASSIGN
struct acrn_assign_pcidev	->	struct acrn_pcidev

Move struct acrn_pcidev into acrn_common.h because it is used by both
DM and HV.

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Shuo A Liu 9c910bae44 hv: dm: Use new I/O request data structures
struct vhm_request		->	struct acrn_io_request
union vhm_request_buffer	->	struct acrn_io_request_buffer
struct pio_request		->	struct acrn_pio_request
struct mmio_request		->	struct acrn_mmio_request
struct ioreq_notify		->	struct acrn_ioreq_notify

VHM_REQ_PIO_INVAL		->	IOREQ_PIO_INVAL
VHM_REQ_MMIO_INVAL		->	IOREQ_MMIO_INVAL
REQ_PORTIO			->	ACRN_IOREQ_TYPE_PORTIO
REQ_MMIO			->	ACRN_IOREQ_TYPE_MMIO
REQ_PCICFG			->	ACRN_IOREQ_TYPE_PCICFG
REQ_WP				->	ACRN_IOREQ_TYPE_WP

REQUEST_READ			->	ACRN_IOREQ_DIR_READ
REQUEST_WRITE			->	ACRN_IOREQ_DIR_WRITE
REQ_STATE_PROCESSING		->	ACRN_IOREQ_STATE_PROCESSING
REQ_STATE_PENDING		->	ACRN_IOREQ_STATE_PENDING
REQ_STATE_COMPLETE		->	ACRN_IOREQ_STATE_COMPLETE
REQ_STATE_FREE			->	ACRN_IOREQ_STATE_FREE

IC_CREATE_IOREQ_CLIENT		->	ACRN_IOCTL_CREATE_IOREQ_CLIENT
IC_DESTROY_IOREQ_CLIENT		->	ACRN_IOCTL_DESTROY_IOREQ_CLIENT
IC_ATTACH_IOREQ_CLIENT		->	ACRN_IOCTL_ATTACH_IOREQ_CLIENT
IC_NOTIFY_REQUEST_FINISH	->	ACRN_IOCTL_NOTIFY_REQUEST_FINISH
IC_CLEAR_VM_IOREQ		->	ACRN_IOCTL_CLEAR_VM_IOREQ
HYPERVISOR_CALLBACK_VHM_VECTOR	->	HYPERVISOR_CALLBACK_HSM_VECTOR

arch_fire_vhm_interrupt()	->	arch_fire_hsm_interrupt()
get_vhm_notification_vector()	->	get_hsm_notification_vector()
set_vhm_notification_vector()	->	set_hsm_notification_vector()
acrn_vhm_notification_vector	->	acrn_hsm_notification_vector
get_vhm_req_state()		->	get_io_req_state()
set_vhm_req_state()		->	set_io_req_state()

Below structures have slight difference with former ones.

  struct acrn_ioreq_notify
  strcut acrn_io_request

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Shuo A Liu 107cae316a hv: dm: Use new ioctl ACRN_IOCTL_SET_VCPU_REGS
struct acrn_set_vcpu_regs	->	struct acrn_vcpu_regs
struct acrn_vcpu_regs		->	struct acrn_regs
IC_SET_VCPU_REGS		->	ACRN_IOCTL_SET_VCPU_REGS

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Shuo A Liu f476ca55ab hv: dm: Use new VM management ioctls
IC_CREATE_VM		->	ACRN_IOCTL_CREATE_VM
IC_DESTROY_VM		->	ACRN_IOCTL_DESTROY_VM
IC_START_VM		->	ACRN_IOCTL_START_VM
IC_PAUSE_VM		->	ACRN_IOCTL_PAUSE_VM
IC_RESET_VM		->	ACRN_IOCTL_RESET_VM

struct acrn_create_vm	->	struct acrn_vm_creation

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Shuo A Liu 7efe18a84b hv: Use new struct acrn_platform_info to adapt new HSM driver
struct hc_platform_info	->	struct acrn_platform_info
MAX_PLATFORM_LAPIC_IDS	->	ACRN_PLATFORM_LAPIC_IDS_MAX

A layout change to the struct hc_platform_info is that move
max_kata_containers to back of vm_config_size,
		uint16_t max_vcpus_per_vm;
		uint16_t max_vms;
		uint32_t vm_config_size;
		uint64_t max_kata_containers;
Then, they are nature 64-bits aligned.

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Shuo A Liu 3deb973b7a dm: Use new ioctl ACRN_IOCTL_GET_PLATFORM_INFO
IC_GET_PLATFORM_INFO	->	ACRN_IOCTL_GET_PLATFORM_INFO
struct acrn_vm_config	->	struct acrn_vm_config_header(DM only)
struct platform_info	->	struct acrn_platform_info

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Shuo A Liu 9c1caad25a hv: nested: Keep privilege bits sync in shadow EPT entry
Guest may not use INVEPT instruction after enabling any of bits 2:0 from
0 to 1 of a present EPT entry, then the shadow EPT entry has no chance
to sync guest EPT entry. According to the SDM,
"""
Software may use the INVEPT instruction after modifying a present EPT
paging-structure entry (see Section 28.2.2) to change any of the
privilege bits 2:0 from 0 to 1.1 Failure to do so may cause an EPT
violation that would not otherwise occur. Because an EPT violation
invalidates any mappings that would be used by the access that caused
the EPT violation (see Section 28.3.3.1), an EPT violation will not
recur if the original access is performed again, even if the INVEPT
instruction is not executed.
"""

Sync the afterthought of privilege bits from guest EPT entry to shadow
EPT entry to cover above case.

Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-07-02 09:24:12 +08:00
Shuo A Liu a431cff94e hv: Use 64 bits definition for 64 bits MSR_IA32_VMX_EPT_VPID_CAP operation
MSR_IA32_VMX_EPT_VPID_CAP is 64 bits. Using 32 bits MACROs with it may
cause the bit expression wrong.

Unify the MSR_IA32_VMX_EPT_VPID_CAP operation with 64 bits definition.

Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-07-02 09:24:12 +08:00
Rong Liu 321e560968 hv: add max payload to vrp
It seems important that passthru device's max payload settings match
the settings on the native device otherwise passthru device may not work.
So we have to set vrp's max payload capacity as native root port
otherwise we may accidentally change passthru device's max payload
since during guest OS's pci device enumeration, pass-thru device will
renegotiate its max payload's setting with vrp.

Tracked-On: #5915

Signed-off-by: Rong Liu <rong.l.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-15 08:53:53 +08:00
Victor Sun 50868dd594 HV: ramdisk and kernel load addr improve
For ramdisk, need to double check the limit of ramdisk GPA when locate
ramdisk load addr;

For SOS kernel load addr, need not to consider position of hypervisor
start and end address since the range has been set to e820 RESERVED.

Tracked-On: #5879

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 21:50:22 +08:00
Victor Sun e371432695 HV: avoid pre-launched VM modules being corrupted by SOS kernel load
When hypervisor boots, the multiboot modules have been loaded to host space
by bootloader already. The space range of pre-launched VM modules is also
exposed to SOS VM, so SOS VM kernel might pick this range to extract kernel
when KASLR enabled. This would corrupt pre-launched VM modules and result in
pre-launched VM boot fail.

This patch will try to fix this issue. The SOS VM will not be loaded to guest
space until all pre-launched VMs are loaded successfully.

Tracked-On: #5879

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun 1b3a75c984 HV: place kernel and ramdisk by find_space_from_ve820()
We should not hardcode the VM ramdisk load address right after kernel
load address because of two reasons:
	1. Per Linux kernel boot protocol, the Kernel need a size of
	   contiguous memory(i.e. init_size field in zeropage) from
	   its load address to boot, then the address would overlap
	   with ramdisk;
	2. The hardcoded address could not be ensured as a valid address
	   in guest e820 table, especially with a huge ramdisk;

Also we should not hardcode the VM kernel load address to its pref_address
which work for non-relocatable kernel only. For a relocatable kernel,
it could run from any valid address where bootloader load to.

The patch will set the VM kernel and ramdisk load address by scanning
guest e820 table with find_space_from_ve820() api:
	1. For SOS VM, the ramdisk has been loaded by multiboot bootloader
	   already so set the load address as module source address,
	   the relocatable kernel would be relocated to a appropriate address
	   out space of hypervisor and boot modules to avoid guest memory
	   copy corruption;
	2. For pre-launched VM, the kernel would be loaded to pref_address
	   first, then ramdisk will be put to a appropriate address out space
	   of kernel according to guest memory layout and maximum ramdisk
	   address limit under 4GB;

Tracked-On: #5879

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun eca245760a HV: create guest efi memmap for SOS VM
The SOS VM should not use host efi memmap directly, since there are some
memory ranges which reserved by hypersior and pre-launched VM should not
be exposed to SOS VM. These memory ranges should be filtered from SOS VM
efi memmap, otherwise it would caused unexpected issues. For example, The
SOS kernel kaslr will try to find the random address for extracted kernel
image in EFI table first. So it's possible that these reserved memory is
picked for extracted kernel image. This will make SOS kernel boot fail.

The patch would create efi memmory map for SOS VM and pass the memory map
info to zeropage for loading SOS VM kernel. The boot service related region
in host efi memmap is also kept for SOS VM so that SOS VM could have full
capability of EFI services as host.

Tracked-On: #5626

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun a966ed70c2 HV: correct bootargs module size
The bootargs module represents a string buffer and there is a NULL char at
the end so its size should not be calculated by strnlen_s(), otherwise the
NULL char will be ignored in gpa copy and result in kernel boot fail;

Tracked-On: #6162

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun 268d4c3f3c HV: boot guest with boot params
Previously the load GPA of LaaG boot params like zeropage/cmdline and
initgdt are all hard-coded, this would bring potential LaaG boot issues.

The patch will try to fix this issue by finding a 32KB load_params memory
block for LaaG to store these guest boot params.

For other guest with raw image, in general only vgdt need to be cared of so
the load_params will be put at 0x800 since it is a common place that most
guests won't touch for entering protected mode.

Tracked-On: #5626

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun ed97022646 HV: add find_space_from_ve820() api
The API would search ve820 table and return a valid GPA when the requested
size of memory is available in the specified memory range, or return
INVALID_GPA if the requested memory slot is not available;

Tracked-On: #5626

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun 6127c0c5d2 HV: modify low 1MB area for pre-launched VM e820
The memory range of [0xA0000, 0xFFFFF] is a known reserved area for BIOS,
actually Linux kernel would enforce this area to be reserved during its
boot stage. Set this area to usable would cause potential compatibility
issues.

The patch set the range to reserved type to make it consistent with the
real world.

BTW, There should be a EBDA(Entended BIOS DATA Area) with reserved type
exist right before 0xA0000 in real world for non-EFI boot. But given ACRN
has no legacy BIOS emulation, we simply skipped the EBDA in vE820.

Tracked-On: #5626

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun 9dfac7a7a3 HV: init hv_e820 from efi mmap if boot from uefi
Hypervisor use e820_alloc_memory() api to allocate memory for trampoline code
and ept pages, whereas the usable ram in hv_e820 might include efi boot service
region if system boot from uefi environment, this would result in some uefi
service broken in SOS. These boot service region should be filtered from
hv_e820.
This patch will parse the efi memory descriptor entries info from efi memory
map pointer when system boot from uefi environment, and then initialize hv_e820
accordingly, that all efi boot service region would be kept as reserved in
hv_e820.

Please note the original efi memory map could be above 4GB address space,
so the efi memory parsing process must be done after enable_paging().

Tracked-On: #5626

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun 9ac8e292fd HV: add efi memory map parsing function
When hypervisor boot from efi environment, the efi memory layout should be
considered as main memory map reference for hypervisor use. This patch add
function that parses the efi memory descriptor entries info from efi memory
map pointer and stores the info into a static hv_memdesc[] array.

Tracked-On: #5626

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun 4e1deab3d9 HV: init paging before init e820
With this patch, the hv_e820 will be initialized after enable paging. This
is because the hv_e820 will be initialized from efi mmap when system boot
from uefi, which the efi mmap could be above 4G space.

Tracked-On: #5626

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun 82c28af404 HV: modularization: rename mi_acpi_rsdp_va to acpi_rsdp_va
The simply rename mi_acpi_rsdp_va in acrn_boot_info struct to acpi_rsdp_va;

Tracked-On: #5661

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun 4774c79da0 HV: modularizatoin: refine efi_info struct usage in acrn boot info
This patch has below changes:
	1. rename mi_efi_info to uefi_info in struct acrn_boot_info;

	2. remove redundant "efi_" prefix for efi_info struct members;

	3. The efi_info structure in acrn_boot_info struct is defined as
	   same as Linux kernel so the native efi info from boot loader
	   is passed to SOS zeropage with memcpy() api directly. Now replace
	   memcpy() with detailed struct member assignment;

	4. add boot_from_uefi() api;

Tracked-On: #5661

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun 82a1d4406c HV: modularization: use abi_mmap struct in acrn boot info
Use more generic abi_mmap struct to replace multiboot_mmap struct in
acrn_boot_info;

Tracked-On: #5661

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun c59ea6c250 HV: modularization: use abi_module struct in acrn boot info
Use more generic abi_module struct to replace multiboot_module struct in
acrn_boot_info;

Tracked-On: #5661

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun 16624bab5e HV: modularization: use loader_name char array in acrn boot info
The patch has below changes:
	1. rename mi_loader_name in acrn_boot_info struct to loader_name;
	2. change loader_name type from pointer to array to avoid accessing
	   original multiboot info region;
	3. remove mi_drivers_length and mi_drivers_addr which are never used;

Tracked-On: #5661

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun 484d3ec9df HV: modularization: use cmdline char array in acrn boot info
The name of mi_cmdline in acrn_boot_info structure would cause confusion with
mi_cmdline in multiboot_info structure, rename it to cmdline. At the same time,
the data type is changed from pointer to array to avoid accessing the original
multiboot info region which might be used by other software modules.

Tracked-On: #5661

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun b11dfb6f20 HV: modularization: add boot.c to wrap multiboot module
Add a wrapper API init_acrn_boot_info() so that it could be used to boot
ACRN with any boot protocol;

Another change is change term of multiboot1 to multiboot because there is
no such term officially;

Tracked-On: #5661

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun 28b7cee412 HV: modularization: rename multiboot.h to boot.h
Given the structure in multiboot.h could be used for any boot protocol,
use a more generic name "boot.h" instead;

Tracked-On: #5661

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun e8f726e321 HV: modularization: remove mi_flags from acrn boot info
The mi_flags is not needed any more so remove it from acrn_boot_info struct;

Tracked-On: #5661

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun 8f24d91108 HV: modularization: name change on acrn_multiboot_info
The acrn_multiboot_info structure stores acrn specific boot info and should
not be limited to support multiboot protocol related structure only.

This patch only do below changes:

	1. change name of acrn_multiboot_info to acrn_boot_info;
	2. change name of mbi to abi because of the change in 1, also the
	   naming might bring confusion with native multiboot info;

Tracked-On: #5661

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun b0e1d610d2 HV: modularization: move module check to sanitize multiboot info
ACRN used to support deprivileged boot mode which do not need multiboot
modules, while direct boot mode need multiboot modules at lease for
service VM bzImage, so ACRN postponed the multiboot modules sanity check
in init_vm_boot_info.

Now deprivileged boot mode was totally removed, so we can do multiboot
module check in sanitize_acrn_multiboot_info().

Tracked-On: #5661

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-11 10:06:02 +08:00
Liang Yi 400d31916a doc: update timer HLD doc after modularization
Replace rdstc() and get_tsc_khz() with their architectural agnostic
counterparts cpu_ticks() and cpu_tickrate().

Tracked-On: #5920
Signed-off-by: Yi Liang <yi.liang@intel.com>
2021-06-09 17:11:25 -04:00
Shuo A Liu d965f6e6a1 hv: Enlarge E820_MAX_ENTRIES to 64
e820_alloc_memory() splits one E820 entry into two entries. With vEPT
enabled, e820_alloc_memory() is called one more. On some platforms, the
e820 entries might exceed 32.

Enlarge E820_MAX_ENTRIES to 64. Please note, it must be less than 128
due to constrain of zeropage. Linux kernel defines it as 128.

Tracked-On: #6168
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-09 10:07:05 +08:00
Shuo A Liu 9ae32f96af hv: Wrap same code as a static function
vmptrld_vmexit_handler() has a same code snippet with
vmclear_vmexit_handler(). Wrap the same code snippet as a static
function clear_vmcs02().

There is only a small logic change that add
	nested->current_vmcs12_ptr = INVALID_GPA
in vmptrld_vmexit_handler() for the old VMCS. That's reasonable.

Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-09 10:07:05 +08:00
Shuo A Liu 387ea23961 hv: Rename get_ept_entry() to get_eptp()
get_ept_entry() actually returns the EPTP of a VM. So rename it to
get_eptp() for readability.

Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-09 10:07:05 +08:00
Zide Chen b6b5373818 hv: deny access to HV owned legacy PIO UART from SOS
We need to deny accesses from SOS to the HV owned UART device, otherwise
SOS could have direct access to this physical device and mess up the HV
console.

If ACRN debug UART is configured as PIO based, For example,
CONFIG_SERIAL_PIO_BASE is generated from acrn-config tool, or the UART
config is overwritten by hypervisor parameter "uart=port@<port address>",
it could run into problem if ACRN doesn't emulate this UART PIO port
to SOS. For example:

- none of the ACRN emulated vUART devices has same PIO port with the
  port of the debug UART device.
- ACRN emulates PCI vUART for SOS (configure "console_vuart" with
  PCI_VUART in the scenario configuration)

This patch fixes the above issue by masking PIO accesses from SOS.
deny_hv_owned_devices() is moved after setup_io_bitmap() where
vm->arch_vm.io_bitmap is initialized.

Commit 50d852561 ("HV: deny HV owned PCI bar access from SOS") handles
the case that ACRN debug UART is configured as a PCI device. e.g.,
hypervisor parameter "uart=bdf@<BDF value>" is appended.

If the hypervisor debug UART is MMIO based, need to configured it as
a PCI type device, so that it can be hidden from SOS.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-08 16:16:14 +08:00
Yonghua Huang 25c0e3817e hv: validate input for dmar_free_irte function
Malicious input 'index' may trigger buffer
 overflow on array 'irte_alloc_bitmap[]'.

 This patch validate that 'index' shall be
 less than 'CONFIG_MAX_IR_ENTRIES' and also
 remove unnecessary check on 'index' in
 'ptirq_free_irte()' function with this fix.

Tracked-On: #6132
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2021-06-08 09:03:10 +08:00
Yonghua Huang 4acaeb91bd hv: remove unnecessary ASSERT in vlapic_write
vlapic_write handle 'offset' that is valid and ignore
 all other invalid 'offset'. so ASSERT on this 'offset'
 input is unnecessary.

 This patch removes above ASSERT to avoid potential
 hypervisor crash by guest malicious input when debug
 build is used.

Tracked-On: #6131
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2021-06-08 09:03:10 +08:00
Tao Yuhong cb75de2163 HV: vpci: refine vbar sizing
For a pci BAR, its size aligned bits have fixed to 0(except the memory
type bits, they have another fixed value), they are read-only.
When write ~0U to BAR for sizing, (type_bits | size_mask) is written
into BAR.
So do not need to distinguish between sizing vBAR and programming vBAR.
When write a value to vBAR, always store (value & size_mask | type_bit)
to vfcg.
pci_vdev_read_vbar() is unnecessary, because it is only need to read
vcfg.

Tracked-On: #6011
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
2021-06-08 08:39:01 +08:00
Tao Yuhong 5ecca6b256 HV: vpci: check if address is in VM BAR MMIO space
When guest doing BAR re-programming, we should check whether
the base address of the BAR is valid.This patch does this check by:
1. whether the gpa is located in the responding MMIO window
2. whether the gpa is aligned with the BAR size

Tracked-On: #6011
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
2021-06-08 08:39:01 +08:00
Tao Yuhong 7da53ce138 HV: vpci: Fix do not mask I/O BAR upper 16-bit
Now we use pci_vdev_update_vbar_base to update vBAR base address when
guest re-programming BAR. For a IO BAR, we would calculate the 32 bits
base address then mask the high 16 bits. However, the mask code would
never be called since the first if condition statement is always true.

This patch fix it by move the unamsk code into the first if condition
statement.

Tracked-On: #6011
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
2021-06-08 08:39:01 +08:00
Shuo A Liu 15e6c5b9cf hv: nested: audit guest EPT mapping during shadow EPT entries setup
generate_shadow_ept_entry() didn't verify the correctness of the requested
guest EPT mapping. That might leak host memory access to L2 VM.

To simplify the implementation of the guest EPT audit, hide capabilities
'map 2-Mbyte page' and 'map 1-Gbyte page' from L1 VM. In addition,
minimize the attribute bits of EPT entry when create a shadow EPT entry.
Also, for invalid requested mapping address, reflect the EPT_VIOLATION to
L1 VM.

Here, we have some TODOs:
1) Enable large page support in generate_shadow_ept_entry()
2) Evaluate if need to emulate the invalid GPA access of L2 in HV directly.
3) Minimize EPT entry attributes.

Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-04 13:53:47 +08:00
Shuo A Liu 3110e70d0a hv: nested: INVEPT emulation supports shadow EPT
L1 VM changes the guest EPT and do INVEPT to invalidate the previous
TLB cache of EPT entries. The shadow EPT replies on INVEPT instruction
to do the update.

The target shadow EPTs can be found according to the 'type' of INVEPT.
Here are two types and their target shadow EPT,
  1) Single-context invalidation
     Get the EPTP from the INVEPT descriptor. Then find the target
     shadow EPT.
  2) Global invalidation
     All shadow EPTs of the L1 VM.

The INVEPT emulation handler invalidate all the EPT entries of the
target shadow EPTs.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-04 13:53:47 +08:00
Shuo A Liu 1dc7b7f798 hv: nested: Introduce shadow EPT release function
When a shadow EPT is not used anymore, its resources need to be
released.

free_sept_table() is introduced to walk the whole shadow EPT table and
free the pagetable pages.

Please note, the PML4E page of shadow EPT is not freed by
free_sept_table() as it still be used to present a shadow EPT pointer.

Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-04 13:53:47 +08:00
Shuo A Liu b10b5658bd hv: nested: Introduce L2 VM EPT VIOLATION handler
With shadow EPT, the hypervisor walks through guest EPT table:

  * If the entry is not present in guest EPT, ACRN injects EPT_VIOLATION
    to L1 VM and resumes to L1 VM.

  * If the entry is present in guest EPT, do the EPT_MISCONFIG check.
    Inject EPT_MISCONFIG to L1 VM if the check failed.

  * If the entry is present in guest EPT, do permission check.
    Reflect EPT_VIOLATION to L1 VM if the check failed.

  * If the entry is present in guest EPT but shadow EPT entry is not
    present, create the shadow entry and resumes to L2 VM.

  * If the entry is present in guest EPT but the GPA in the entry is
    invalid, injects EPT_VIOLATION to L1 VM and resumes L1 VM.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-04 13:53:47 +08:00
Shuo A Liu 8565750bbe hv: nested: Hide some capability bits from L1 guest
* Hide 5 level EPT capability, let L1 guest stick to 4 level EPT.

 * Access/Dirty bits are not support currently, hide corresponding EPT
   capability bits.

 * "Mode-based execute control for EPT" is also not support well
   currently, hide its capability bit from MSR_IA32_VMX_PROCBASED_CTLS2.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-04 13:53:47 +08:00
Shuo A Liu 540a484147 hv: nested: Manage shadow EPTP according to guest VMCS change
'struct nept_desc' is used to associate guest EPTP with a shadow EPTP.
It's created in the first reference and be freed while no reference.

The life cycle seems like,

While guest VMCS VMX_EPT_POINTER_FULL is changed, the 'struct nept_desc'
of the new guest EPTP is referenced; the 'struct nept_desc' of the old
guest EPTP is dereferenced.

While guest VMCS be cleared(by VMCLEAR in L1 VM), the 'struct nept_desc'
of the old guest EPTP is dereferenced.

While a new guest VMCS be loaded(by VMPTRLD in L1 VM), the 'struct
nept_desc' of the new guest EPTP is referenced. The 'struct nept_desc'
of the old guest EPTP is dereferenced.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-04 13:53:47 +08:00
Shuo A Liu 10ec896f99 hv: nested: Introduce shadow EPT infrastructure
To shadow guest EPT, the hypervisor needs construct a shadow EPT for each
guest EPT. The key to associate a shadow EPT and a guest EPT is the EPTP
(EPT pointer). This patch provides following structure to do the association.

	struct nept_desc {
	       /*
	        * A shadow EPTP.
	        * The format is same with 'EPT pointer' in VMCS.
	        * Its PML4 address field is a HVA of the hypervisor.
	        */
	       uint64_t shadow_eptp;
	       /*
	        * An guest EPTP configured by L1 VM.
	        * The format is same with 'EPT pointer' in VMCS.
	        * Its PML4 address field is a GPA of the L1 VM.
	        */
	       uint64_t guest_eptp;
	       uint32_t ref_count;
	};

Due to lack of dynamic memory allocation of the hypervisor, a array
nept_bucket of type 'struct nept_desc' is introduced to store those
association information. A guest EPT might be shared between different
L2 vCPUs, so this patch provides several functions to handle the
reference of the structure.

Interface get_shadow_eptp() also is introduced. To find the shadow EPTP
of a specified guest EPTP.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-04 13:53:47 +08:00
Shuo A Liu 17bc7f08c9 hv: nested: Create a page pool for shadow EPT construction
Shadow EPT uses lots of pages to construct the shadow page table. To
utilize the memory more efficient, a page poll sept_page_pool is
introduced.

For simplicity, total platform RAM size is considered to calculate the
memory needed for shadow page tables. This is not an accurate upper
bound.  This can satisfy typical use-cases where there is not a lot
of overcommitment and sharing of memory between L2 VMs.

Memory of the pool is marked as reserved from E820 table in early stage.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-04 13:53:47 +08:00
Zide Chen 811e367ad9 hv: nested: implement nested VM exit handler
Nested VM exits happen when vCPU is in guest mode (VMCS02 is current).
Initially we reflect all nested VM exits to L1 hypervisor. To prepare
the environment to run L1 guest:

- restore some VMCS fields to the value as what L1 hypervisor programmed.
- VMCLEAR VMCS02, VMPTRLD VMCS01 and enable VMCS shadowing.
- load the non-shadowing host states from VMCS12 to VMCS01 guest states.
- VMRESUME to L1 guest with this modified VMCS01.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
2021-06-03 15:23:25 +08:00
Zide Chen 22d225663f hv: nested: update run_vcpu() function for nested case
Since L2 guest vCPU mode and VPID are managed by L1 hypervisor, so we
can skip these handling in run_vcpu().

And be careful that we can't cache L2 registers in struct acrn_vcpu.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-03 15:23:25 +08:00
Zide Chen 4acc65eacc hv: nested: support for INVEPT and INVVPID emulation
invvpid and invept instructions cause VM exits unconditionally.
For initial support, we pass all the instruction operands as is
to the pCPU.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-03 15:23:25 +08:00
Zide Chen 4c29a0bb29 hv: nested: support for VMLAUNCH and VMRESUME emulation
Implement the VMLAUNCH and VMRESUME instructions, allowing a L1
hypervisor to run nested guests.

- merge VMCS control fields and VMCS guest fields to VMCS02
- clear shadow VMCS indicator on VMCS02 and load VMCS02 as current
- set VMCS12 launch state to "launched" in VMLAUNCH handler

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Alex Merritt <alex.merritt@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-03 15:23:25 +08:00
Yonghua Huang 1a6ead9af5 hv: update RTCT ACPI table detecting
Signature of RTCT ACPI table maybe "PTCT"(v1) or "RTCT"(v2).
 and the MAGIC number in CRL header is also changed from "PTCM"
 to "RTCM".

 This patch refine the code to detect RTCT table for both
 v1 and v2.

Tracked-On: #6020
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2021-06-01 08:22:20 +08:00
Rong Liu 43acbdd345 hv: ensure PTM root is always enabled in hw
For post launch VM, ACRN supports PTM under these conditions:
1. HW implements a simple PTM hierarchy: PTM requestor device (ep) is
directly connected to PTM root capable root port. Or
2. ptm requestor itself is root complex integrated ep.
Currently acrn doesn't support emulation of other type of PTM hiearchy, such
as if there is an intermediate PTM node (for example, switch) inbetween
PTM requestor and PTM root.
To avoid VM touching physical hardware, acrn hv ensures PTM is always enabled
in the hardware.
During hv's pci init, if root port is ptm capable,
hv will enable PTM on that root port.  In addition,
log error (and don't enable PTM) if ptm root
capability is on intermediate node other than root port.

V2:
 - Modify commit messages to clarify the limitation
of current PTM implementation.
 - Fix code that may fail FUSA
 - Remove pci_ptm_info() and put info log inside pci_enable_ptm_root().

Tracked-On: #5915
Signed-off-by: Rong Liu <rong.l.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-27 09:00:50 +08:00
Li Fei1 0d5f12e281 hv: vlapic: a minor refine about vlapic_x2apic_pt_icr_access
In physical destination mode, the destination processor is specified by its
local APIC ID. When a CPU switch xAPIC Mode to x2APIC Mode or vice versa,
the local APIC ID is not changed. So a vcpu in x2APIC Mode could use physical
Destination Mode to send an IPI to another vcpu in xAPIC Mode by writing ICR.

This patch adds support for a vCPU A could write ICR to send IPI to another
vCPU B which is in different APIC mode.

Tracked-On: #5923
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-05-27 09:00:08 +08:00
dongshen 3e1fa0fb98 hv: hypercall: 72 reserved bytes are used in struct hc_platform_info to pass physical APIC and cat IDs shift to DM
So that DM can retrieve physical APIC IDs and use them to fill in the ACPI MADT table for
post-launched VMs.

Note:
1. DM needs to use the same logic as hypervisor to calculate vLAPIC IDs based on physical APIC IDs
and CPU affinity setting

2. Using reserved0[] in struct hc_platform_info to pass physical APIC IDs means we can only support at
most 116 cores. And it assumes LAPIC ID is 8bits (X2APIC mode supports 32 bits).

Cat IDs shift will be used by DM RTCT V2

Tracked-On: #6020
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
2021-05-26 11:23:06 +08:00
dongshen 5e3c6ae941 hv: vcpuid: passthrough host CPUID leaf.0BH to guest VMs
Using physical APIC IDs as vLAPIC IDs for pre-Launched and post-launched VMs
is not sufficient to replicate the host CPU and cache topologies in guest VMs,
we also need to passthrough host CPUID leaf.0BH to guest VMs, otherwise,
guest VMs may see weird CPU topology.

Note that in current code, ACRN has already passthroughed host cache CPUID
leaf 04H to guest VMs

Tracked-On: #6020
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
2021-05-26 11:23:06 +08:00
dongshen f332ef15b2 hv: vlapic: use physical APIC IDs as vLAPIC IDs for pre-launched and post-launched VMs
In current code, ACRN uses physical APIC IDs as vLAPIC IDs for SOS,
and vCPU ids (contiguous) as vLAPIC IDs for pre-Launched and post-Launched VMs.

Using vCPU ids as vLAPIC IDs for pre-Launched and post-Launched VMs
would result in wrong CPU and cache topologies showing in the guest VMs,
and could adversely affect performance if the guest VM chooses to detect
CPU and cache topologies and optimize its behavior accordingly.

Uses physical APIC IDs as vLAPIC IDs (and related CPU/cache topology enumeration
CPUIDs passthrough) will replicate the host CPU and cache topologies in pre-Launched
and post-Launched VMs.

Tracked-On: #6020
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
2021-05-26 11:23:06 +08:00
Zide Chen 6428ca8f5b hv: VMPTRLD and VMCLEAR VMCS with the common APIs
Remove the direct calls to exec_vmptrld() or exec_vmclear(), and replace
with the wrapper APIs load_va_vmcs() and clear_va_vmcs().

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-05-26 11:22:26 +08:00
Yang,Yu-chu 1de47b0433 config-tools: build acrn with xslt generated pci_dev.c and board_info.h
Build acrn using xslt transformed pci_dev.c and board_info.h.

Tracked-On: #6024
Signed-off-by: Yang,Yu-chu <yu-chu.yang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2021-05-24 21:53:22 +08:00
Zide Chen 6d69058a9d hv: nested: support for VMREAD and VMWRITE emulation
This patch implements the VMREAD and VMWRITE instructions.

When L1 guest is running with an active VMCS12, the “VMCS shadowing”
VM-execution control is always set to 1 in VMCS01. Thus the possible
behavior of VMREAD or VMWRITE from L1 could be:

- It causes a VM exit to L0 if the bit corresponds to the target VMCS
  field in the VMREAD bitmap or VMWRITE bitmap is set to 1.
- It accesses the VMCS referenced by VMCS01 link pointer (VMCS02 in
  our case) if the above mentioned bit is set to 0.

This patch handles the VMREAD and VMWRITE VM exits in this way:

- on VMWRITE, it writes the desired VMCS value to the respective field
  in the cached VMCS12. For VMCS fields that need to be synced to VMCS02,
  sets the corresponding dirty flag.

- on VMREAD, it reads the desired VMCS value from the cached VMCS12.

Tracked-On: #5923
Signed-off-by: Alex Merritt <alex.merritt@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen 2bd269c11c hv: nested: support for VMCLEAR emulation
This patch is to emulate VMCLEAR instruction.

L1 hypervisor issues VMCLEAR on a VMCS12 whose state could be any of
these: active and current, active but not current, not yet VMPTRLDed.

To emulate the VMCLEAR instruction, ACRN sets the VMCS12 launch state to
"clear", and if L0 already cached this VMCS12, need to sync it back to
guest memory:

- sync shadow fields from shadow VMCS VMCS to cache VMCS12
- copy cache VMCS12 to L1 guest memory

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen 5379b14108 hv: nested: define VMCS shadow fields
Enable VMCS shadowing for most of the VMCS fields, so that execution of
the VMREAD or VMWRITE on these shadow VMCS fields from L1 hypervisor
won't cause VM exits, but read from or write to the shadow VMCS.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen 863e58e539 hv: nested: define software layout for VMCS12 and helper functions
Software layout of VMCS12 data is a contract between L1 guest and L0
hypervisor to run a L2 guest.

ACRN hypervisor caches the VMCS12 which is passed down from L1 hypervisor
by the VMPTRLD instructin. At the time of VMCLEAR, ACRN syncs the cached
VMCS12 back to L1 guest memory.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen f5744174b5 hv: nested: support for VMPTRLD emulation
This patch emulates the VMPTRLD instruction. L0 hypervisor (ACRN) caches
the VMCS12 that is passed down from the VMPTRLD instruction, and merges it
with VMCS01 to create VMCS02 to run the nested VM.

- Currently ACRN can't cache multiple VMCS12 on one vCPU, so it needs to
  flushes active but not current VMCS12s to L1 guest.
- ACRN creates VMCS02 to run nested VM based on VMCS12:
  1) copy VMCS12 from guest memory to the per vCPU cache VMCS12
  2) initialize VMCS02 revision ID and host-state area
  3) load shadow fields from cache VMCS12 to VMCS02
  4) enable VMCS shadowing before L1 Vm entry

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen 0a1ac2f4a0 hv: nested: support for VMXOFF emulation
This patch implements the VMXOFF instruction. By issuing VMXOFF,
L1 guest Leaves VMX Operation.

- cleanup VCPU nested virtualization context states in VMXOFF handler.
- implement check_vmx_permission() to check permission for VMX operation
  for VMXOFF and other VMX instructions.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen 3fdad3c6d1 hv: nested: check prerequisites to enter VMX operation
According to VMXON Instruction Reference, do the following checks in the
virtual hardware environment: vCPU CPL, guest CR0, CR4, revision ID
in VMXON region, etc.

Currently ACRN doesn't support 32-bit L1 hypervisor, and injects an #UD
exception if L1 hypervisor is not running in 64-bit mode.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen fc8f07e740 hv: nested: support for VMXON emulation
This patch emulates VMXON instruction. Basically checks some
prerequisites to enable VMX operation on L1 guest (next patch), and
prepares some virtual hardware environment in L0.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Tao Yuhong b93d6b2ef0 HV: Fix mistake use stac() & clac()
The commit 2ab70f43e5
HV: cache: Fix page fault by flushing cache for VM trusty RAM in HV

It is wrong in using stac()/clac()

Tracked-On: #6020
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
2021-05-24 10:32:54 +08:00
Li Fei1 6077152c4b hv: vlapic: extend vlapic_x2apic_pt_icr_access to support more destination mode
Now guest would use `Destination Shorthand` to broadcast IPIs if there're more
than one destination. However, it is not supported when the guest is in LAPIC
passthru situation, and all active VCPUs are working in X2APIC mode. As a result,
the guest would not work properly since this kind broadcast IPIs was ignored
by ACRN. What's worse, ACRN Hypervisor would inject GP to the guest in this case.

This patch extend vlapic_x2apic_pt_icr_access to support more destination modes
(both `Physical` and `Logical`) and destination shorthand (`No Shorthand`, `Self`,
`All Including Self` and `All Excluding Self`).

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-24 10:27:32 +08:00
Li Fei1 a69e67b58b hv: vlapic: wrap a function to calculate destination vcpu mask by shorthand
1. Rename vlapic_calc_dest to vlapic_calc_dest_noshort
2. Remove vlapic_calc_dest_lapic_pt, use vlapic_calc_dest_noshort instead
3. Wrap vlapic_calc_dest to calculate destination vcpu mask according shorthand

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-24 10:27:32 +08:00
Tao Yuhong 2ab70f43e5 HV: cache: Fix page fault by flushing cache for VM trusty RAM in HV
The accrss right of HV RAM can be changed to PAGE_USER (eg. trusty RAM
of post-launched VM). So before using clflush(or clflushopt) to flush
HV RAM cache, must allow explicit supervisor-mode data accesses to
user-mode pages. Otherwise, it may trigger page fault.

Tracked-On: #6020
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
2021-05-21 09:20:46 +08:00
Victor Sun 5df3455e8d HV: remove SOS kernel hugepage related bootargs
Hypervisor does not need to care about hugepage settings in SOS kernel, user
could enable these settings in the scenario config file or GRUB menu.

Tracked-On: #5815
Signed-off-by: Victor Sun <victor.sun@intel.com>
2021-05-20 13:31:56 +08:00
Victor Sun ef737c3089 HV: refine init_vm_bootargs_info()
changes:
	1. The VM load order type condition is not needed, since the function
	   is called only when create SOS VM or pre-launched VM;
	2. Fixed wrong parameter of fill_seed_arg() which introduced by commit
	   80262f0602.
	3. More comments on why multiboot string could override the pre-
	   configured VM bootargs and why append multiboot cmdline to SOS VM
	   bootargs;

Tracked-On: #5815
Signed-off-by: Victor Sun <victor.sun@intel.com>
2021-05-20 13:31:56 +08:00
Helmut Buchsbaum b673bc36b1 Makefile: honor BUILD_VERSION and BUILD_TAG
Use BUILD_VERSION an BUILD_TAG variable also for hypervisor,
acrnprobe and crashlog. This eases build from an archive without
git available.

Tracked-On: #6035
Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@opensource.tttech-industrial.com>
2021-05-20 10:02:33 +08:00
Helmut Buchsbaum 596f27bad3 Makefile: Make builds reproducible
Make builds reproducible by honoring SOURCE_DATE_EPOCH and USER
environment variables in the respective Makefiles. Just follow the
recommendations at https://reproducible-builds.org/

Build tools (e.g. Debian packaging, Yocto) use this to ensure reproducibility
of packages.

Tracked-On: #6035
Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@opensource.tttech-industrial.com>
2021-05-20 10:02:33 +08:00
Rong Liu 3db4491e1c hv: PTM: Create virtual root port
Create virtual root port through add_vdev hypercall. add_vdev
identifies the virtual device to add by its vendor id and device id, then
call the corresponding function to create virtual device.

	-create_vrp(): Find the right virtual root port to create
by its secondary bus number, then initialize the virtual root port.
And finally initialize PTM related configurations.

	-destroy_vrp(): nothing to destroy

Tracked-On: #5915
Signed-off-by: Rong Liu <rong.l.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Jason Chen <jason.cj.chen@intel.com>
Acked-by: Yu Wang <yu1.wang@intel.com>
2021-05-19 13:54:24 +08:00
Rong Liu d57bf51c89 hv: PTM: Add virtual root port
Add virtual root port that supports the most basic pci-e bridge and root port operations.

	- init_vroot_port(): init vroot_port's basic registers.

	- deinit_vroot_port(): reset vroot_port

	- read_vroot_port_cfg(): read from vroot_port's virtual config space.

	- write_vroot_port_cfg(): write to vroot_port's virtual config space.

Tracked-On: #5915
Signed-off-by: Rong Liu <rong.l.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Jason Chen <jason.cj.chen@intel.com>
Acked-by: Yu Wang <yu1.wang@intel.com>
2021-05-19 13:54:24 +08:00
Rong Liu 7e93f31e2c dm: PTM: Add virtual root port to vm
If PTM can be enabled on passthru device, a virtual root port
	is added to vm to act as ptm root.  And the passthru device is
	connected to the virtual root port instead of the virtual host bridge.

Tracked-On: #5915
Signed-off-by: Rong Liu <rong.l.liu@intel.com>
Acked-by: Yu Wang <yu1.wang@intel.com>
2021-05-19 13:54:24 +08:00
Liang Yi 6805510d77 hv/mod_timer: refine timer interface
1. do not allow external modules to touch internal field of a timer.
2. make timer mode internal, period_in_ticks will decide the mode.

API wise:
1. the "mode" parameter was taken out of initialize_timer().
2. a new function update_timer() was added to update the timeout and
   period fields.
3. the timer_expired() function was extended with an output parameter
   to return the remaining cycles before expiration.

Also, the "fire_tsc" field name of hv_timer was renamed to "timeout".
With the new API, however, this change should not concern user code.

Tracked-On: #5920

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-05-18 16:43:28 +08:00
Liang Yi 3547c9cd23 hv/mod_timer: make timer into an arch-independent module
x86/timer.[ch] was moved to the common directory largely unchanged.

x86 specific code now resides in x86/tsc_deadline_timer.c and its
interface was defined in hw/hw_timer.h. The interface defines two
functions: init_hw_timer() and set_hw_timeout() that provides HW
specific initialization and timer interrupt source.

Other than these two functions, the timer module is largely arch
agnostic.

Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-05-18 16:43:28 +08:00
Liang Yi 51204a8d11 hv/mod_timer: separate delay functions from the timer module
Modules that use udelay() should include "delay.h" explicitly.

Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-05-18 16:43:28 +08:00
Liang Yi 5a2b89b0a4 hv/mod_timer: split tsc handling code from timer.
Generalize and split basic cpu cycle/tick routines from x86/timer:
- Instead of rdstc(), use cpu_ticks() in generic code.
- Instead of get_tsc_khz(), use cpu_tickrate() in generic code.
- Include "common/ticks.h" instead of "x86/timer.h" in generic code.
- CYCLES_PER_MS is renamed to TICKS_PER_MS.

The x86 specific API rdstc() and get_tsc_khz(), as well as TSC_PER_MS
are still available in arch/x86/tsc.h but only for x86 specific usage.

Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Signed-off-by: Yi Liang <yi.liang@intel.com>
2021-05-18 16:43:28 +08:00
Yonghua Huang 00b3a28d5d hv: update RTCT parser to support RTCT version 2
RTCT has been updated to version 2,
  this patch updates hypervisor RTCT parser to support
  both version 1 and version 2 of RTCT.

Tracked-On: #6020
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Jason CJ Chen <jason.cj.chen@intel.com>
2021-05-17 17:19:11 +08:00
Yonghua Huang 9facbb43b3 config-tool: rename PSRARM to SSRAM
'psram' and 'PSRAM' are legacy names and replaced
  with 'ssram' and 'SSRAM' respectively.

Tracked-On: #6012
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Shuang Zheng <shuang.zheng@intel.com>
2021-05-17 14:31:42 +08:00
Zide Chen c9982e8c7e hv: nested: setup emulated VMX MSRs
We emulated these MSRs:

- MSR_IA32_VMX_PINBASED_CTLS
- MSR_IA32_VMX_PROCBASED_CTLS
- MSR_IA32_VMX_PROCBASED_CTLS2
- MSR_IA32_VMX_EXIT_CTLS
- MSR_IA32_VMX_ENTRY_CTLS
- MSR_IA32_VMX_BASIC: emulate VMCS revision ID, etc.
- MSR_IA32_VMX_MISC

For the following MSRs, we pass through the physical value to L1 guests:

- MSR_IA32_VMX_EPT_VPID_CAP
- MSR_IA32_VMX_VMCS_ENUM
- MSR_IA32_VMX_CR0_FIXED0
- MSR_IA32_VMX_CR0_FIXED1
- MSR_IA32_VMX_CR4_FIXED0
- MSR_IA32_VMX_CR4_FIXED1

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-16 19:05:21 +08:00
Zide Chen 4930992118 hv: nested: implement the framework for VMX MSR emulation
Define LIST_OF_VMX_MSRS which includes a list of MSRs that are visible to
L1 guests if nested virtualization is enabled.
- If CONFIG_NVMX_ENABLED is set, these MSRs are included in
  emulated_guest_msrs[].
- otherwise, they are included in unsupported_msrs[].

In this way we can take advantage of the existing infrastructure to
emulate these MSRs.

Tracked-On: #5923
Spick igned-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-16 19:05:21 +08:00
Zide Chen 97df220f49 hv: vmsr: emulate IA32_FEATURE_CONTORL MSR for nested virtualization
In order to support nested virtualization, need to expose the "Enable VMX
outside SMX operation" bit to L1 hypervisor.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-16 19:05:21 +08:00
Yonghua Huang e9870893a3 hv: rename some software SRAM local names
For simplification purpose, use 'ssram' instead of
 'software sram' for local names inside rtcm module.

Tracked-On: #6015
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-16 10:08:17 +08:00
Li Fei1 30febed0e1 hv: cache: wrap common APIs
Wrap three common Cache APIs:
- flush_invalidate_all_cache
- flush_cacheline
- flush_cache_range

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-05-14 09:18:00 +08:00
Li Fei1 77e64f6092 hv: tlb: wrap common APIs
Wrap two common TLB APIs: flush_tlb and flush_tlb_range.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-05-14 09:18:00 +08:00
Li Fei1 d94582389e hv: mmu: move arch specific parts into cpu.h
Move Cache/TLB arch specific parts into cpu.h
After this change, we should not expose arch specific parts out from mmu.h

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-05-14 09:18:00 +08:00
Li Fei1 d6362b6e0a hv: paging: rename ppt_set/clear_ATTR to set_paging_ATTR
Rename ppt_set/clear_(attribute) to set_paging_(attribute)

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-05-14 09:18:00 +08:00
Zide Chen ccfdf9cdd7 hv: nested: enable nested virtualization
Allow guest set CR4_VMXE if CONFIG_NVMX_ENABLED is set:

- move CR4_VMXE from CR4_EMULATED_RESERVE_BITS to CR4_TRAP_AND_EMULATE_BITS
  so that CR4_VMXE is removed from cr4_reserved_bits_mask.
- force CR4_VMXE to be removed from cr4_rsv_bits_guest_value so that CR4_VMXE
  is able to be set.

Expose VMX feature (CPUID01.01H:ECX[5]) to L1 guests whose GUEST_FLAG_NVMX_ENABLED
is set.

Assuming guest hypervisor (L1) is KVM, and KVM uses EPT for L2 guests.

Constraints on ACRN VM.
- LAPIC passthrough should be enabled.
- use SCHED_NOOP scheduler.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-13 16:16:30 +08:00
Zide Chen dd90eccc25 hv: move invvpid and invept helper code from mmu.c to mmu.h
moving invvpid and invept helper code from mmu.c to mmu.h, so that they
can be accessed by the nested virtualization code.

No logical changes.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-13 16:16:30 +08:00
Zide Chen 7e1ac8a74e config-tools: add NVMX_ENABLED feature and GUEST_FLAG_NVMX_ENABLED flag
NVMX_ENABLED: ACRN is built to support nested virtualization if set.

GUEST_FLAG_NVMX_ENABLED: indicates that the VMX capability can be present
in this guest to run nested VMs.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-13 16:16:30 +08:00
Shuo A Liu 3fffa68665 hv: Support WAITPKG instructions in guest VM
TPAUSE, UMONITOR or UMWAIT instructions execution in guest VM cause
a #UD if "enable user wait and pause" (bit 26) of VMX_PROCBASED_CTLS2
is not set. To fix this issue, set the bit 26 of VMX_PROCBASED_CTLS2.

Besides, these WAITPKG instructions uses MSR_IA32_UMWAIT_CONTROL. So
load corresponding vMSR value during context switch in of a vCPU.

Please note, the TPAUSE or UMWAIT instruction causes a VM exit if the
"RDTSC exiting" and "enable user wait and pause" are both 1. In ACRN
hypervisor, "RDTSC exiting" is always 0. So TPAUSE or UMWAIT doesn't
cause a VM exit.

Performance impact:
    MSR_IA32_UMWAIT_CONTROL read costs ~19 cycles;
    MSR_IA32_UMWAIT_CONTROL write costs ~63 cycles.

Tracked-On: #6006
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-05-13 14:19:50 +08:00
dongshen ebadf00de8 hv: some coding style fixes
Fix issues reported by checkpatch.pl

Tracked-On: #5917
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
2021-05-12 16:50:34 +08:00
Junjie Mao ea4eadf0a5 hv: hypercalls: refactor permission-checking and dispatching logic
The current permission-checking and dispatching mechanism of hypercalls is
not unified because:

  1. Some hypercalls require the exact vCPU initiating the call, while the
     others only need to know the VM.
  2. Different hypercalls have different permission requirements: the
     trusty-related ones are enabled by a guest flag, while the others
     require the initiating VM to be the Service OS.

Without a unified logic it could be hard to scale when more kinds of
hypercalls are added later.

The objectives of this patch are as follows.

  1. All hypercalls have the same prototype and are dispatched by a unified
     logic.
  2. Permissions are checked by a unified logic without consulting the
     hypercall ID.

To achieve the first objective, this patch modifies the type of the first
parameter of hcall_* functions (which are the callbacks implementing the
hypercalls) from `struct acrn_vm *` to `struct acrn_vcpu *`. The
doxygen-style documentations are updated accordingly.

To achieve the second objective, this patch adds to `struct hc_dispatch` a
`permission_flags` field which specifies the guest flags that must ALL be
set for a VM to be able to invoke the hypercall. The default value (which
is 0UL) indicates that this hypercall is for SOS only. Currently only the
`permission_flag` of trusty-related hypercalls have the non-zero value
GUEST_FLAG_SECURE_WORLD_ENABLED.

With `permission_flag`, the permission checking logic of hypercalls is
unified as follows.

  1. General checks
     i. If the VM is neither SOS nor having any guest flag that allows
        certain hypercalls, it gets #UD upon executing the `vmcall`
        instruction.
    ii. If the VM is allowed to execute the `vmcall` instruction, but
        attempts to execute it in ring 1, 2 or 3, the VM gets #GP(0).
  2. Hypercall-specific checks
     i. If the hypercall is for SOS (i.e. `permission_flag` is 0), the
        initiating VM must be SOS and the specified target VM cannot be a
        pre-launched VM. Otherwise the hypercall returns -EINVAL without
        further actions.
    ii. If the hypercall requires certain guest flags, the initiating VM
        must have all the required flags. Otherwise the hypercall returns
        -EINVAL without further actions.
   iii. A hypercall with an unknown hypercall ID makes the hypercall
        returns -EINVAL without further actions.

The logic above is different from the current implementation in the
following aspects.

  1. A pre-launched VM now gets #UD (rather than #GP(0)) when it attempts
     to execute `vmcall` in ring 1, 2 or 3.
  2. A pre-launched VM now gets #UD (rather than the return value -EPERM)
     when it attempts to execute a trusty hypercall in ring 0.
  3. The SOS now gets the return value -EINVAL (rather than -EPERM) when it
     attempts to invoke a trusty hypercall.
  4. A post-launched VM with trusty support now gets the return value
     -EINVAL (rather than #UD) when it attempts to invoke a non-trusty
     hypercall or an invalid hypercall.

v1 -> v2:
 - Update documentation that describe hypercall behavior.
 - Fix Doxygen warnings

Tracked-On: #5924
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-12 13:43:41 +08:00
Geoffroy Van Cutsem 86176a30a0 config-tools: fix a couple of typos in helper script
Fix a couple of typos in text displayed by a helper script
used when building ACRN. No functional change made to the
script itself.

Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
2021-05-10 13:18:08 +08:00
Liang Yi 688a41c290 hv: mod: do not use explicit arch name when including headers
Instead of "#include <x86/foo.h>", use "#include <asm/foo.h>".

In other words, we are adopting the same practice in Linux kernel.

Tracked-On: #5920
Signed-off-by: Liang Yi <yi.liang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-05-08 11:15:46 +08:00
Yang,Yu-chu 5f2f82f4d8 config-tools: introduce xslt transform and clang-format in genconf.sh
Add "transform" to generate following files with xsltproc in genconf.sh:
  - ivshmem_cfg.h
  - misc_cfg.h
  - pt_intx.c
  - vm_configurations.c
  - vm_configurations.h

Add code formatter using clang-format. It formats the gernerated code
with customized condfiguration if clang-format package and configuraion
file ".clang-format" exist.

Add sed in genconf.sh "transform" to replace the copyright "YEAR" of generated files.

Tracked-On: #5980
Signed-off-by: Yang,Yu-chu <yu-chu.yang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2021-05-07 14:39:08 +08:00
Li Fei1 f3327364c3 hv: mmu: fix a minor bug
We should only map [low32_max_ram, 4G) MMIO region as UC attribute,
not map [low32_max_ram, low32_max_ram + 4G) region as UC attribute.
Otherwise, the HV will complain [4G, low32_max_ram + 4G) region has
already mapped.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-29 08:57:13 +08:00
Geoffroy Van Cutsem 9e838248c3 hv: enable uart=bdf@ for PCI serial ports which bar0 is not MMIO
This patch fixes the 'uart=bdf@XXX' mechanism for the PCI serial
port devices which bar0 is not MMIO.

Tracked-On: #5968
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
Signed-off-by: Li Fei <fei1.li@intel.com>
2021-04-29 08:56:33 +08:00
Shuo A Liu dc88c2e397 hv: Save/restore MSR_IA32_CSTAR during context switch
Both Windows guest and Linux guest use the MSR MSR_IA32_CSTAR, while
Linux uses it rarely. Now vcpu context switch doesn't save/restore it.
Windows detects the change of the MSR and rises a exception.

Do the save/resotre MSR_IA32_CSTAR during context switch.

Tracked-On: #5899
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-04-23 11:21:52 +08:00
Jian Jun Chen 31b8b698ce hv: TLFS: Add tsc_offset support for reference time
TLFS spec defines that when a VM is created, the value of
HV_X64_MSR_TIME_REF_COUNT is set to zero. Now tsc_offset is not
supported properly, so guest get a drifted reference time.

This patch implements tsc_offset. tsc_scale and tsc_offset
are calculated when a VM is launched and are saved in
struct acrn_hyperv of struct acrn_vm.

Tracked-On: #5956
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-04-23 10:48:07 +08:00
Jian Jun Chen b4312efbd7 hv: TLFS: inject #GP to guest VM for writing of read-only MSRs
TLFS spec defines that HV_X64_MSR_VP_INDEX and HV_X64_MSR_TIME_REF_COUNT
are read-only MSRs. Any attempt to write to them results in a #GP fault.

Fix the issue by returning error in handler hyperv_wrmsr() of MSRs
HV_X64_MSR_VP_INDEX/HV_X64_MSR_TIME_REF_COUNT emulation.

Tracked-On: #5956
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-04-23 10:48:07 +08:00
Jian Jun Chen dd524d076d hv: TLFS: Setup hypercall page according to the vcpu mode
TLFS spec defines different hypercall ABIs for X86 and x64. Currently
x64 hypercall interface is not supported well.

Setup the hypercall interface page according to the vcpu mode.

Tracked-On: #5956
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-04-23 10:48:07 +08:00
Yonghua Huang 6d5759a260 hv: bugfix in min() and max() MACROs
These two MACROs shall be wrapped as a single
 value respectively, hence brackets should be used.

Tracked-On: #5951
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2021-04-23 09:58:21 +08:00
Li Fei1 628bca5cad hv: pgtable: use new algo to calculate PPT/EPT_PD_PAGE_NUM
In order to support platform (such as Ander Lake) which physical address width
bits is 46, the current code need to reserve 2^16 PD page ((2^46) / (2^30)).
This is a complete waste of memory.

This patch would reserve PD page by three parts:
1. DRAM - may take PD_PAGE_NUM(CONFIG_PLATFORM_RAM_SIZE) PD pages at most;
2. low MMIO - may take PD_PAGE_NUM(MEM_1G << 2U) PD pages at most;
3. high MMIO - may takes (CONFIG_MAX_PCI_DEV_NUM * 6U) PD pages (may plus
PDPT entries if its size is larger than 1GB ) at most for:
(a) MMIO BAR size must be a power of 2 from 16 bytes;
(b) MMIO BAR base address must be power of two in size and are aligned with
its size.

Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-22 14:35:57 +08:00
Li Fei1 053c09e764 hv: cpu_cap: PAW over 39 bits must support 1GB large page
The platform which physical-address width over 39 bits must support
1GB large page (Both MMU and VMX sides ). This could save lots of
page table pages for EPT MMIO mapping.

Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-22 14:35:57 +08:00
Li Fei1 41e2d40d1f hv: e820: remove get_mem_range_info
No one uses get_mem_range_info to get the top/bottom/size of the physical memory.
We could get these informations by e820 table easily.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: eddie Dong <eddie.dong@intel.com>
2021-04-21 14:00:44 +08:00
Li Fei1 3a465388d4 hv: guest: remove get_mem_range_info in prepare_sos_vm_memmap
We used get_mem_range_info to get the top memory address and then use this address
as the high 64 bits max memory address of SOS. This assumes the platform must have
high memory space.

This patch removes the assumption. It will set high 64 bits max memory address of
SOS to 4G by default (Which means there's no 64 bits high memory), then update
the high 64 bits max memory address if the SOS really has high memory space.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: eddie Dong <eddie.dong@intel.com>
2021-04-21 14:00:44 +08:00
Li Fei1 901e8c869e hv: vE820: calculate SOS memory size by vE820 tables
SOS's memory size could be calculated by its vE820 Tables easily.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: eddie Dong <eddie.dong@intel.com>
2021-04-21 14:00:44 +08:00
Li Fei1 ad15053304 hv: mmu: remove get_mem_range_info in init_paging
We used get_mem_range_info to get the top memory address and then use this address
as the high 64 bits max memory address. This assumes the platform must have high
memory space.

This patch calculates the high 64 bits max memory address according the e820 tables
and removes the assumption "The platform must have high memory space" by map the
low RAM region and high RAM region separately.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: eddie Dong <eddie.dong@intel.com>
2021-04-21 14:00:44 +08:00
Li Fei1 6137347411 hv: smp: fix an isuue about SMP sync
Now BSP may launch VMs before APs have not done its initilization,
for example, sched_control for per-cpu. However, when we initilize
the vcpu thread data, it will access the object (scheduler) of the
sched_control of APs. As a result, it will trigger the PF.

This patch would waits each physical has done its initilization before
to continue to execute.

Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-21 10:54:48 +08:00
Li Fei1 5f281df548 hv: serializng: use mfence to ensure trampoline code was updated
Using the MFENCE to make sure trampoline code
has been updated (clflush) into memory beforing start APs.

Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-21 10:54:48 +08:00
Li Fei1 e049abb542 hv: vcpuid: hide new cpuid 0x1b/0x1f
Hide CPUID 0x1b (PCONFIG) and 0x1f (Extended Topology Enumeration Leaf)

Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-20 13:28:44 +08:00
Li Fei1 31f48d12a2 hv: memory order: use mfence to strengthen the fast string operations order
Use MFENCE to strengthen the fast string operations execute order to ensure
all trampoline code was updated before flush it into the memory.

Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-20 13:28:44 +08:00
Yifan Liu b80c388b52 hv: Hide HLAT to guest
For platform with HLAT (Hypervisor-managed Linear Address Translation)
capability, the hypervisor shall hide this feature to its guest.

This patch adds MSR_IA32_VMX_PROCBASED_CTLS3 MSR to unsupported MSR
list.

The presence of this MSR is determined by 1-setting of bit 49 of MSR
MSR_IA32_VMX_PROCBASED_CTLS. which is already in unsupported MSR list. [2]

Related documentations:
[1] Intel Architecture Instruction Set Extensions, version Feb 16, 2021,
Ch 6.12
[2] Intel KeyLocker Specification, Sept 2020, Ch 7.2

Tracked-On: #5895
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-04-07 13:47:47 +08:00
Junjie Mao 14e8e68d39 Makefile: add missing dependencies for parallel execution of make
This patch adds the following dependencies among recipes:

 - Building of any C file depends on $(HV_CONFIG_TIMESTAMP) which indicates
   the presence of generated configuration files.
 - Source files listed in $(VM_CFG_C_SRCS), which are the generated
   configuration files, depends on $(HV_CONFIG_TIMESTAMP)

With the dependencies above, the build system can now safely be executed in
parallel, e.g. `make -j4`.

Tracked-On: #5874
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
2021-03-29 15:45:56 +08:00
Li Fei1 d1ae797742 hv: pgtable: move sanitize_pte into pagetable.c
sanitize_pte is used to set page table entry to map to an sanitized page to
mitigate l1tf. It should belongs to pgtable module. So move it to pagetable.c

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-03-29 13:28:55 +08:00
Li Fei1 ef90bb6db3 hv:pgtable: rename lookup_address to pgtable_lookup_entry
lookup_address is used to lookup a pagetable entry by an address. So rename it
to pgtable_lookup_entry to indicate this clearly.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-29 13:28:55 +08:00
Li Fei1 36ddd87a09 hv: pgtable: remove alloc_ept_page
alloc_page/free_page should been called in pagetable module. In order to do this,
we add pgtable_create_root and pgtable_create_trusty_root to create PML4 page table
page for normal world and secure world.

After this done, no one uses alloc_ept_page. So remove it.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-29 13:28:55 +08:00
Li Fei1 ea701c63c7 hv: pgtable: add pgtable_create_trusty_root
Add pgtable_create_trusty_root to allocate a page for trusty PML4 page table page.
This function also copy PDPT entries from Normal world to Secure world.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-29 13:28:55 +08:00
Li Fei1 596c349600 hv: pgtable: add pgtable_create_root
Add pgtable_create_root to allocate a page for PMl4 page table page.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-29 13:28:55 +08:00
Li Fei1 eb52e2193a hv: pgtable: refine name for pgtable add/modify/del
Rename mmu_add to pgtable_add_map;
Rename mmu_modify_or_del to pgtable_modify_or_del_map.
And move these functions declaration into pgtable.h

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-29 13:28:55 +08:00
liujunming ffd13a2fe1 hv: vpci: fix msi enable issue under some cases
In VT-d scenario, if MSI interrupt has been enabled,
vCPU writes the content in MSI registers,
and all bits of the content are read-only.

In this case, hypervisor code will call
enable_disable_msi(vdev, false), which will disable MSI.
And there's no chance to call remap_vmsi.
This is wrong behavior, which will result in the disable of MSI.

Tracked-On: #5847

Reviewed-by: Li Fei1 <fei1.li@intel.com>
Signed-off-by: liujunming <junming.liu@intel.com>
2021-03-25 09:39:31 +08:00
Liang Yi 33ef656462 hv/mod-irq: use arch specific header files
Requires explicit arch path name in the include directive.

The config scripts was also updated to reflect this change.

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi df36da1b80 hv/mod_irq: do not include x86/irq.h in common/irq.h
Each .c file includes the arch specific irq header file (with full
path) by itself if required.

Tracked-On: #5825
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi 741a208a02 hv/mod_irq: cleanup x86 lapic/ioapic header files
Declarations referenced nowhere else are moved into the c file.

Tracked-On: #5825
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi 6f0a7016d3 hv/mod_irq: move IPI declarations out of x86/irq.h
They are moved into the new header file x86/notify.h.

Tracked-On: #5825
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi ff732cfb2a hv/mod_irq: move guest interrupt API out of x86/irq.h
A new x86/guest/virq.h head file now contains all guest
related interrupt handling API.

Tracked-On: #5825
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi 798015876c hv/mod_irq: move NMI and exception handler out of x86/irq.c
Each of them now resides in a separate .c file.

Tracked-On: #5825

Signed-off-by: Yang, Yu-chu <yu-chu.yang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi 6098648373 hv/mod_irq: cleanup x86/irq.h
Move exception stack layout struct and exception/NMI handling
declarations from x86/irq.h into x86/cpu.h.

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi 3a50f949e1 hv/mod_irq: split irq.c into arch/x86/irq.c and common/irq.c
The common irq file is responsible for managing the central
irq_desc data structure and provides the following APIs for
host interrupt handling.
- init_interrupt()
- reserve_irq_num()
- request_irq()
- free_irq()
- set_irq_trigger_mode()
- do_irq()

API prototypes, constant and data structures belonging to common
interrupt handling are all moved into include/common/irq.h.

Conversely, the following arch specific APIs are added which are
called from the common code at various points:
- init_irq_descs_arch()
- setup_irqs_arch()
- init_interrupt_arch()
- free_irq_arch()
- request_irq_arch()
- pre_irq_arch()
- post_irq_arch()

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi c46e3c71ac hv/mod_irq: decouple irq number reservation from ioapic
This is done be adding irq_rsvd_bitmap as an auxiliary bitmap
besides irq_alloc_bitmap.

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi 038e0cae92 hv/mod_irq: split IRQ handling into common and arch specific parts
The common IRQ handling routine calls arch specific functions
pre_irq_arch() and post_irq_arch() before and after calling the
registered action function respectively.

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi ac3e0a1718 hv/mod_irq: split irq initialization into common and arch specific parts
The common part initializes the global irq_desc data structure while the
arch specific part initialize the HW and its own irq data.

This is one of the preparation steps for spliting IRQ handling into common
and architecture specific parts.

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi f3cae9e258 hv/mod_irq: hide arch specific data in irq_desc
Arch specific IRQ data is now an opaque pointer in irq_desc.

This is a preparation step for spliting IRQ handling into common
and architecture specific parts.

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Geoffroy Van Cutsem a7e53dd32f doc: update BDF information for 'uart=' hypervisor parameter
The 'uart=' parameter for the hypervisor takes multiple forms. One
is to specify the BDF (Bus, Device, Function) value of the serial
port PCI device. The description in the documentation used the
previous format (e.g. '0:18.1') but a 16-bit WORD in HEX needs
to be passed nowadays. E.g.: '0:18.1' is specified by 'uart=0xc1'

Tracked-On: #5842
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
Signed-off-by: Benjamin Fitch <benjamin.fitch@intel.com>
2021-03-23 13:54:10 -07:00
Li Fei1 9000381f34 hv: pgtable: move pgtable definition to pgtable.h
This patch moves pgtable definition to pgtable.h and include the proper
header file for page module.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-11 13:48:52 +08:00
Li Fei1 0278a3f46e hv: pgatble: move the EPT page table related APIs to ept.c
Move the EPT page table related APIs to ept.c. page module only provides APIs to
allocate/free page for page table page. pagetabl module only provides APIs to
add/modify/delete/lookup page table entry. The page pool and the page table
related APIs for EPT should defined in EPT module.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-11 13:48:52 +08:00
Li Fei1 5c71ca456a hv: pgatble: move the MMU page table related APIs to mmu.c
Move the MMU page table related APIs to mmu.c. page module only provides APIs to
allocate/free page for page table page. pagetabl module only provides APIs to
add/modify/delete/lookup page table entry. The page pool and the page table
related APIs for MMU should defined in MMU module.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-11 13:48:52 +08:00
Li Fei1 15d68675e9 hv: pgtable: separate common APIs for MMU/EPT
We would move the MMU page table related APIs to mmu.c and move the EPT related
APIs to EPT.c. The page table module only provides APIs to add/modify/delete/lookup
page table entry.

This patch separates common APIs and adds separate APIs of page table module
for MMU/EPT.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-03-11 13:48:52 +08:00
Li Fei1 80bd3ac02a hv: trusty: move post_uos_sworld_memory into vm.c
post_uos_sworld_memory are used for post-launched VM which support trusty.
It's more VM related. So move it definition into vm.c

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-11 13:48:52 +08:00
Yonghua Huang 1a011bd91b hv: disable guest MONITOR-WAIT support when SW SRAM is configured
Per-core software SRAM L2 cache may be flushed by 'mwait'
extension instruction, which guest VM may execute to enter
core deep sleep. Such kind of flushing is not expected when
software SRAM is enabled for RTVM.

Hypervisor disables MONITOR-WAIT support on both hypervisor
and VMs sides to protect above software SRAM from being flushed.

This patch disable ACRN guest MONITOR-WAIT support if software
SRAM is configured.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-11 09:42:44 +08:00
Yonghua Huang ae43b2a847 hv: disable host MONITOR-WAIT support when SW SRAM is enabled
Per-core software SRAM L2 cache may be flushed by 'mwait'
extension instruction, which guest VM may execute to enter
core deep sleep. Such kind of flushing is not expected when
software SRAM is enabled for RTVM.

Hypervisor disables MONITOR-WAIT support on both hypervisor
and VMs sides to protect above software SRAM from being flushed.

This patch disable hypervisor(host) MONITOR-WAIT support and refine
software sram initializaion flow.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-11 09:42:44 +08:00
Yonghua Huang ea44bb6c4d hv: wrap function to check software SRAM support
Below boolean function are defined in this patch:
 - is_software_sram_enabled() to check if SW SRAM
   feature is enabled or not.
 - set global variable 'is_sw_sram_initialized'
   to file static.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-11 09:42:44 +08:00
Li Fei1 768e483cd2 hv: pgtable: rename 'struct memory_ops' to 'struct pgtable'
The fields and APIs in old 'struct memory_ops' are used to add/modify/delete
page table (page or entry). So rename 'struct memory_ops' to 'struct pgtable'.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-10 11:42:13 +08:00
Li Fei1 ef98fa69ce hv: pgtable: remove get_default_access_right API
Use default_access_right field to replace get_default_access_right API.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-10 11:42:13 +08:00
Li Fei1 7c6a52037a refine ept_flush_leaf_page
Refine the logic how to skip the pSRAM region when flushing cache.

Tracked-On: #5330
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-03-03 14:44:25 +08:00
Li Fei1 1db32f4d03 hv: ept: build 4KB page mapping in EPT for code pages of rtvm
RTVM is enforced to use 4KB pages to mitigate CVE-2018-12207 and performance jitter,
which may be introduced by splitting large page into 4KB pages on demand. It works
fine in previous hardware platform where the size of address space for the RTVM is
relatively small. However, this is a problem when the platforms support 64 bits
high MMIO space, which could be super large and therefore consumes large # of
EPT page table pages.

This patch optimize it by using large page for purely data pages, such as MMIO spaces,
even for the RTVM.

Signed-off-by: Li Fei1 <fei1.li@intel.com>
Tracked-On: #5788
2021-03-03 13:46:49 +08:00
Li Fei1 01b54241c6 hv: ept: only treak execution right for large pages
To mitigate the page size change MCE vulnerability (CVE-2018-12207), ACRN would
clear the execution permission in the EPT paging-structure entries for large pages
and then intercept an EPT execution-permission violation caused by an attempt to
execution an instruction in the guest.

However, the current code would clear the execution permission in the EPT paging-
structure entries for small pages too when we clearing the the execution permission
for large pages. This would trigger extra EPT violation VM exits.

This patch fix this issue.

Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Tracked-On: #5788
2021-03-03 13:46:49 +08:00
Junjie Mao 22064f71c1 Makefile: do not define default BOARD/SCENARIO in top-level Makefile
The top-level Makefile should not define any default value as the
hypervisor may have its own configurations set by previous builds.

This patch also changes the hypervisor default RELEASE to `n`.

Tracked-On: #5772
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
2021-03-03 09:13:44 +08:00
Junjie Mao d5c11d5e79 Makefile: fixes to bugs that break diffconfig and applydiffconfig
This patch resolves the following bugs that break the targets `diffconfig`
and `applydiffconfig`:

 - Comments after variable definitions cause the varaible to contain
   unintended trailing whitespaces.

 - HV_CONFIG_XML is no longer defined; it is now HV_SCENARIO_XML.

 - '*.asl' files are also generated and should be involved when comparing
   the generated configuration files.

 - Strings between diacritic marks (`) are intepreted as shell commands
   even they are part of informative messages.

 - HV_DIFFCONFIG_LIST should not contain duplicated lines.

Tracked-On: #5772
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
2021-03-03 09:13:44 +08:00
Junjie Mao 0edaaa880f Makefile: prefer RELEASE=y|n over RELEASE=0|1
For clarity, we now prefer y|n over 0|1 as the values of boolean options on
make command lines. This patch applies this preference to the Makefile of
the device model and tools, while RELEASE=0|1 is still supported for
backward compatibility.

Tracked-On: #5772
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
2021-03-03 09:13:44 +08:00
Li Fei1 97a9c5151b kv: kconfig: remove some unused ram size kconfig
SOS_RAM_SIZE/UOS_RAM_SIZE Kconfig are only used to calculate how many pages we
should reserve for the VM EPT mapping.

Now we reserve pages for each VM EPT pagetable mapping by the PLATFORM_RAM_SIZE
not the VM RAM SIZE. This could simplify the reserve logic for us: not need to
take care variable corner cases. We could make assume we reserve enough pages
base on the VM could not use the resources beyond the platform hardware resources.

So remove these two unused VM ram size kconfig.

Signed-off-by: Li Fei1 <fei1.li@intel.com>
Tracked-On: #5788
2021-03-01 13:10:04 +08:00
Li Fei1 0579e2ee24 hv: page: add free_page
Add free_page to free page when unmap pagetable.

Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Tracked-On: #5788
2021-03-01 13:10:04 +08:00
Li Fei1 8d9f12f3b7 hv: page: use dynamic page allocation for pagetable mapping
For FuSa's case, we remove all dynamic memory allocation use in ACRN HV. Instead,
we use static memory allocation or embedded data structure. For pagetable page,
we prefer to use an index (hva for MMU, gpa for EPT) to get a page from a special
page pool. The special page pool should be big enougn for each possible index.
This is not a big problem when we don't support 64 bits MMIO. Without 64 bits MMIO
support, we could use the index to search addrss not larger than DRAM_SIZE + 4G.

However, if ACRN plan to support 64 bits MMIO in SOS, we could not use the static
memory alocation any more. This is because there's a very huge hole between the
top DRAM address and the bottom 64 bits MMIO address. We could not reserve such
many pages for pagetable mapping as the CPU physical address bits may very large.

This patch will use dynamic page allocation for pagetable mapping. We also need
reserve a big enough page pool at first. For HV MMU, we don't use 4K granularity
page table mapping, we need reserve PML4, PDPT and PD pages according the maximum
physical address space (PPT va and pa are identical mapping); For each VM EPT,
we reserve PML4, PDPT and PD pages according to the maximum physical address space
too, (the EPT address sapce can't beyond the physical address space), and we reserve
PT pages by real use cases of DRAM, low MMIO and high MMIO.

Signed-off-by: Li Fei1 <fei1.li@intel.com>
Tracked-On: #5788
2021-03-01 13:10:04 +08:00
Li Fei1 5621fabbcb hv: memory: remove get_sworld_memory_base API
memory_ops structure will be changed to store page table related fields.
However, secure world memory base address is not one of them, it's VM
related. So save sworld_memory_base_hva in vm_arch structure directly.

Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Tracked-On: #5788
2021-03-01 13:10:04 +08:00
Victor Sun 26abc82f3c HV: panic on 0 address when do e820_alloc_memory
Current memory allocation algorithm is to find the available address from
the highest possible address below max_address. If the function returns 0,
means all memory is used up and we have to put the resource at address 0,
this is dangerous for a running hypervisor.

Also returns 0 would make code logic very complicated, since memcpy_s()
doesn't support address 0 copy.

Tracked-On: #5626

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-02-26 16:38:32 +08:00
Victor Sun 2e72bb97e7 HV: refine acpi rsdp initialize interface
In previous code, the rsdp initialization is done in get_rsdp() api implicitly.
The function is called multiple times in following acpi table parsing functions
and the condition (rsdp == NULL) need to be added in each parsing function.
This is not needed since the panic would occur if rsdp is NULL when do acpi
initialization.

Tracked-On: #5626

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-02-26 16:38:32 +08:00
Victor Sun 0588ef3ae3 HV: merge multiboot standard data structures in one header
In this way, all multiboot standard data structure could be found in
multiboot_std.h. The multiboot_priv.h stores all private definitions
and multiboot.h is the only public API header file.

Tracked-On: #5661

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-02-26 16:38:32 +08:00
Yonghua Huang fdfd28b140 hv: unmap software region of pre-RTVM from Service VM EPT
Accessing to software SRAM region is not allowed when
 software SRAM is pass-thru to prelaunch RTVM.

 This patch removes software SRAM region from service VM
 EPT if it is enabled for prelaunch RTVM.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2021-02-25 09:35:31 +08:00
Sainath Grandhi 80a91987f4 hv: Fix incorrect struct definition for ir_bits
Fixing an incorrect struct definition for ir_bits in ioapic_rte. Since bits after
the delivery status in the lower 32 bits are not touched by code,
this has never showed up as an issue. And the higher 32 bits in the RTE
are aligned by the compiler.

Tracked-On: #5773
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
2021-02-25 09:34:49 +08:00
Victor Sun 6bb7a45672 HV: init VM bootargs only for LaaG
Currently the VM bootargs load address is hard-coded at 8KB right before
kernel load address, this should work for Linux kernel only since Linux
kernel is guaranteed to be loadered high than GPA 8K so its load address
would never be overflowed, other OS like Zephyr has no such assumption.

Tracked-On: #5689

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-02-09 09:00:46 +08:00
Jian Jun Chen aae7a89480 hv: ivshmem: BAR0 size should be 256 Bytes
ivshmem spec says that the size of BAR0 is 256 bytes. Windows
ivshmem driver will check the size of BAR0. It will refuse to
load the ivshmem driver if BAR0 size is not 256.
For post-launched VM hv land ivshmem BARs are allocated by
device model. For pre-launched VM hv land ivshmem BARs are
allocated by acrn-config tool. Both device model and acrn-config
tool should make sure that the BAR base addr are aligned to 4K
at least.

Tracked-On: #5717
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-02-09 08:57:50 +08:00
Tao Yuhong 50d8525618 HV: deny HV owned PCI bar access from SOS
This patch denies Service VM the access permission to device resources
owned by hypervisor.
HV may own these devices: (1) debug uart pci device for debug version
(2) type 1 pci device if have pre-launched VMs.
Current implementation exposes the mmio/pio resource of HV owned devices
to SOS, should remove them from SOS.

Tracked-On: #5615
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
2021-02-03 14:01:23 +08:00
Tao Yuhong 6e7ce4a73f HV: deny pre-launched VM ptdev bar access from SOS
This patch denies Service VM the access permission to device
resources owned by pre-launched VMs.
Rationale:
 * Pre-launched VMs in ACRN are independent of service VM,
   and should be immune to attacks from service VM. However,
   current implementation exposes the bar resource of passthru
   devices to service VM for some reason. This makes it possible
   for service VM to crash or attack pre-launched VMs.
 * It is same for hypervisor owned devices.

NOTE:
 * The MMIO spaces pre-allocated to VFs are still presented to
  Service VM. The SR-IOV capable devices assigned to pre-launched
  VMs doesn't have the SR-IOV capability. So the MMIO address spaces
  pre-allocated by BIOS for VFs are not decoded by hardware and
  couldn't be enabled by guest. SOS may live with seeing the address
  space or not. We will revisit later.

Tracked-On: #5615
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-02-03 14:01:23 +08:00
Shuo A Liu d4aaf99d86 hv: keylocker: Support keylocker backup MSRs for Guest VM
The logical processor scoped IWKey can be copied to or from a
platform-scope storage copy called IWKeyBackup. Copying IWKey to
IWKeyBackup is called ‘backing up IWKey’ and copying from IWKeyBackup to
IWKey is called ‘restoring IWKey’.

IWKeyBackup and the path between it and IWKey are protected against
software and simple hardware attacks. This means that IWKeyBackup can be
used to distribute an IWKey within the logical processors in a platform
in a protected manner.

Linux keylocker implementation uses this feature, so they are
introduced by this patch.

Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-02-03 13:54:45 +08:00
Shuo A Liu 38cd5b481d hv: keylocker: host keylocker iwkey context switch
Different vCPU may have different IWKeys. Hypervisor need do the iwkey
context switch.

This patch introduce a load_iwkey() function to do that. Switches the
host iwkey when the switch_in vCPU satisfies:
  1) keylocker feature enabled
  2) Different from the current loaded one.

Two opportunities to do the load_iwkey():
  1) Guest enables CR4.KL bit.
  2) vCPU thread context switch.

load_iwkey() costs ~600 cycles when do the load IWKey action.

Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-02-03 13:54:45 +08:00
Shuo A Liu c11c07e0fe hv: keylocker: Support Key Locker feature for guest VM
KeyLocker is a new security feature available in new Intel CPUs that
protects data-encryption keys for the Advanced Encryption Standard (AES)
algorithm. These keys are more valuable than what they guard. If stolen
once, the key can be repeatedly used even on another system and even
after vulnerability closed.

It also introduces a CPU-internal wrapping key (IWKey), which is a key-
encryption key to wrap AES keys into handles. While the IWKey is
inaccessible to software, randomizing the value during the boot-time
helps its value unpredictable.

Keylocker usage:
 - New “ENCODEKEY” instructions take original key input and returns HANDLE
   crypted by an internal wrap key (IWKey, init by “LOADIWKEY” instruction)
 - Software can then delete the original key from memory
 - Early in boot/software, less likely to have vulnerability that allows
   stealing original key
 - Later encrypt/decrypt can use the HANDLE through new AES KeyLocker
   instructions
 - Note:
      * Software can use original key without knowing it (use HANDLE)
      * HANDLE cannot be used on other systems or after warm/cold reset
      * IWKey cannot be read from CPU after it's loaded (this is the
        nature of this feature) and only 1 copy of IWKey inside CPU.

The virtualization implementation of Key Locker on ACRN is:
 - Each vCPU has a 'struct iwkey' to store its IWKey in struct
   acrn_vcpu_arch.
 - At initilization, every vCPU is created with a random IWKey.
 - Hypervisor traps the execution of LOADIWKEY (by 'LOADIWKEY exiting'
   VM-exectuion control) of vCPU to capture and save the IWKey if guest
   set a new IWKey. Don't support randomization (emulate CPUID to
   disable) of the LOADIWKEY as hypervisor cannot capture and save the
   random IWKey. From keylocker spec:
   "Note that a VMM may wish to enumerate no support for HW random IWKeys
   to the guest (i.e. enumerate CPUID.19H:ECX[1] as 0) as such IWKeys
   cannot be easily context switched. A guest ENCODEKEY will return the
   type of IWKey used (IWKey.KeySource) and thus will notice if a VMM
   virtualized a HW random IWKey with a SW specified IWKey."
 - In context_switch_in() of each vCPU, hypervisor loads that vCPU's
   IWKey into pCPU by LOADIWKEY instruction.
 - There is an assumption that ACRN hypervisor will never use the
   KeyLocker feature itself.

This patch implements the vCPU's IWKey management and the next patch
implements host context save/restore IWKey logic.

Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-02-03 13:54:45 +08:00
Shuo A Liu 4483e93bd1 hv: keylocker: Enable the tertiary VM-execution controls
In order for a VMM to capture the IWKey values of guests, processors
that support Key Locker also support a new "LOADIWKEY exiting"
VM-execution control in bit 0 of the tertiary processor-based
VM-execution controls.

This patch enables the tertiary VM-execution controls.

Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-02-03 13:54:45 +08:00
Shuo A Liu e9247dbca0 hv: keylocker: Simulate CPUID of keylocker caps for guest VM
KeyLocker is a new security feature available in new Intel CPUs that
protects data-encryption keys for the Advanced Encryption Standard (AES)
algorithm.

This patch emulates Keylocker CPUID leaf 19H to support Keylocker
feature for guest VM.

To make the hypervisor being able to manage the IWKey correctly, this
patch doesn't expose hardware random IWKey capability
(CPUID.0x19.ECX[1]) to guest VM.

Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-02-03 13:54:45 +08:00
Shuo A Liu 15c967ad34 hv: keylocker: Add CR4 bit CR4_KL as CR4_TRAP_AND_PASSTHRU_BITS
Bit19 (CR4_KL) of CR4 is CPU KeyLocker feature enable bit. Hypervisor
traps the bit's writing to track the keylocker feature on/off of guest.
While the bit is set by guest,
 - set cr4_kl_enabled to indicate the vcpu's keylocker feature enabled status
 - load vcpu's IWKey in host (will add in later patch)
While the bit is clear by guest,
 - clear cr4_kl_enabled

This patch trap and passthru the CR4_KL bit to guest for operation.

Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-02-03 13:54:45 +08:00
Li Fei1 94a980c923 hv: hypercall: prevent sos can touch hv/pre-launched VM resource
Current implementation, SOS may allocate the memory region belonging to
hypervisor/pre-launched VM to a post-launched VM. Because it only verifies
the start address rather than the entire memory region.

This patch verifies the validity of the entire memory region before
allocating to a post-launched VM so that the specified memory can only
be allocated to a post-launched VM if the entire memory region is mapped
in SOS’s EPT.

Tracked-On: #5555
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Yonghua Huang  <yonghua.huang@intel.com>
2021-02-02 16:55:40 +08:00
Xie, nanlin 0b6840d1be acrn-config: Update generated configuration source code
1.Reorg generated configuration source code structure
2.Upstream generated configuration source code based on generic board infomation
3.Update license date from 2020 to 2021

Tracked-On: #5644
Signed-off-by: Xie, nanlin <nanlin.xie@intel.com>
2021-02-02 16:53:56 +08:00
Yonghua Huang 8bec63a6ea hv: remove the hardcoding of Software SRAM GPA base
Currently, we hardcode the GPA base of Software SRAM
 to an address that is derived from TGL platform,
 as this GPA is identical with HPA for Pre-launch VM,
 This hardcoded address may not work on other platforms
 if the HPA bases of Software SRAM are different.

 Now, Offline tool configures above GPA based on the
 detection of Software SRAM on specific platform.

 This patch removes the hardcoding GPA of Software SRAM,
 and also renames MACRO 'SOFTWARE_SRAM_BASE_GPA' to
 'PRE_RTVM_SW_SRAM_BASE_GPA' to avoid confusing, as it
 is for Prelaunch VM only.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-01-30 13:41:02 +08:00
Yonghua Huang c9ca23d268 hv: refine RTCM initialization code
- RTCM is initialized in hypervisor only
   if RTCM binaries are detected.
 - Remove address space of RTCM binary from
   Software SRAM region.
 - Refine parse_rtct() function, validity of
   ACPI RTCT table shall be checked by caller.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-01-28 11:29:25 +08:00
Yonghua Huang a6e666dbe7 hv: remove hardcoding of SW SRAM HPA base
Physical address to SW SRAM region maybe different
 on different platforms, this hardcoded address may
 result in address mismatch for SW SRAM operations.

 This patch removes above hardcoded address and uses
 the physical address parsed from native RTCT.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-01-28 11:29:25 +08:00
Yonghua Huang a6420e8cfa hv: cleanup legacy terminologies in RTCM module
This patch updates below terminologies according
 to the latest TCC Spec:
  PTCT -> RTCT
  PTCM -> RTCM
  pSRAM -> Software SRAM

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2021-01-28 11:29:25 +08:00
Yonghua Huang 806f479108 hv: rename RTCM source files
'ptcm' and 'ptct' are legacy name according
   to the latest TCC spec, hence rename below files
   to avoid confusing:

  ptcm.c -> rtcm.c
  ptcm.h -> rtcm.h
  ptct.h -> rtct.h

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2021-01-28 11:29:25 +08:00
Junjie Mao 0de004e5c9 Makefile: fix build issues due to reorg of misc/
This patch fixes the following issues that break the build system:

 1. The tag of the root nodes of board/scenario XML files are still acrn-config,
    not config_tools. This patch reverts the XPATH that refers to these nodes.

 2. HV_PREDEFINED_BOARD_DIR now also relies on BOARD which may not be
    available at the time the variable is defined. As both board and
    scenario XML files are placed under the same directory, this patch
    refines the path calculation logic to get rid of mixing variables of
    the different flavors.

Tracked-On: #5644
Fixes: 97c9b24030 ("acrn-config: Reorg config tool folder")
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
2021-01-28 10:20:38 +08:00
Liang Yi e8a76868c9 hv: modularization: remove global variable efiloader_sig.
Simplify multiboot API by removing the global variable efiloader_sig.
Replaced by constant at the use site.

Tracked-On: #5661
Signed-off-by: Yi Liang <yi.liang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Liang Yi 67926cee81 hv: modularization: remove include/boot.h.
Remove include/boot.h since it contains only assembly variables that
should only be accessed in arch/x86/init.c.

Tracked-On: #5661
Signed-off-by: Yi Liang <yi.liang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Liang Yi 1de396363f hv: modularization: avoid dependency of multiboot on zeropage.h.
Split off definition of "struct efi_info" into a separate header
file lib/efi.h.

Tracked-On: #5661
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Liang Yi 7c02dc0801 hv: modularization: remove multiboot dependency on e820.h.
This is done by adding the MAX_MMAP_ENTRIES macro in multiboot.h.
This macro has to be sync-ed with E820_MAX_ENTRIES manually though.

Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Liang Yi 681688fbe4 hv: modularization: change of multiboot API.
The init_multiboot_info() and sanitize_multiboot_ifno() APIs now
require parameters instead of implicitly relying on global boot
variables.

Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Liang Yi 66599e0aa7 hv: modularization: multiboot
Calling sanitize_multiboot() from init.c instead of cpu.c.

Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Liang Yi c23e557a18 hv: modularization: make parse_hv_cmdline() an internal function.
This way, we void exposing acrn_mbi as a global variable.

Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Liang Yi f6aa0a70c5 hv: modularization: remove get_rsdp_ptr() from multboot API.
This function is a derivative of get_multiboot_info().

Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Liang Yi 101004603a hv: modularization: hide "struct multiboot_info"
Move "struct multboot_info" from multiboot.h into multboot.c.

Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Liang Yi 8f9ec59a53 hv: modularization: cleanup boot.h
Move multiboot specific declarations from boot.h to multiboot.h.

Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Liang Yi 153e83a19e hv: modularization: multiboot private header
Create multiboot_pri.h and move the relevant declarations into this
file.

Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Liang Yi 0ace40b679 hv: modularization: move multiboot files to dedicate sub-directory
Create a multiboot module under the boot directory and move multiboot
files as part of this.

Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Xie, nanlin 97c9b24030 acrn-config: Reorg config tool folder
Remove vm_configs folder and move all the XML files and generic code example into config_tools/data

Tracked-On: #5644
Signed-off-by: Xie, nanlin <nanlin.xie@intel.com>
2021-01-27 11:08:28 +08:00
Junjie Mao 775214b710 Makefile: create and apply patches to generated sources
In order to enable changing the generated C configuration files manually,
this patch introduces the target `diffconfig` to the build system.

After generating the configuration files, a developer can manually modify
these sources (which are placed under build/configs) and invoke `make
diffconfig` to generate a patch that shows the made differences. Such
patches can be registered to a build by invoking the `applydiffconfig`
target. The build system will always apply them whenever the configuration
files are regenerated.

A typical workflow to create a patch is as follows.

    # The pre_build target relies on generated configuration files
    hypervisor$ make BOARD=xxx SCENARIO=yyy pre_build

    (manually edit files under build/configs/boards and
      build/configs/scenarios)

    hypervisor$ make diffconfig   # Patch generated to build/config.patch
    hypervisor$ cp build/config.patch /path/to/patch

The following steps apply apply the patch to another build.

    hypervisor$ make BOARD=xxx SCENARIO=yyy defconfig
    hypervisor$ make applydiffconfig PATCH=/path/to/patch-file-or-directory
    hypervisor$ make

After any patch is registered for a build, the configuration files will be
automatically regenerated the next time `make` is invoked.

To show a list of registered patches for generated configuration files,
invoke `make applydiffconfig` without specifying `PATCH`.

v2:
 * Add target `applydiffconfig` which accepts a PATCH variable to register
   an arbitrary patch file or a directory containing patch file(s) for a
   build. `.config_patches` is no longer used.

Tracked-On: #5644
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
2021-01-20 17:58:00 +08:00
Junjie Mao 866c0881a3 Makefile: generate C configuration files at build time
This patch makes the build system of the hypervisor to cache the board and
scenario XML files in the build directory and generate C configuration
files from them at build time. The C configuration files that are cached in
the git repo is no longer used or updated. Paths to these generated files
in the prebuild Makefile is updated accordingly.

The following targets are introduced or modified.

    * defconfig: Copy default configuration XMLs to the build directory and
                 generate C configuration files.

    * oldconfig: No action.

    * menuconfig: Print a message to redirect users to use the config app
                  and exit.

    * showconfig: Print the BOARD, SCENARIO and RELEASE configured for the
                  current build.

    * update_config: No action.

    * (default): Build the hypervisor with defined configurations.

The following variables can be set on the command line to specify the
default configuration to be used.

    * BOARD: Either a name of the target board or a path to a customized
             board XML. When a board name is specified, the board XML file
             is expected to be available under
             misc/acrn-config/xmls/board-xmls.

    * SCENARIO: Either a name of the scenario of a path to a customized
                scenario XML. When a scenario name is specified, the
                scenario XML file is expected to be available under
                misc/acrn-config/xmls/config-xmls/$(BOARD).

    * BOARD_FILE: Path to the board XML file to be used. This is now
                  obsoleted as BOARD provides the same functionality.

    * SCENARIO_FILE: Path to the scenario XML file to be used. This is now
                     obsoleted as BOARD provides the same functionality.

BOARD/SCENARIO or BOARD_FILE/SCENARIO_FILE shall be used in pair, and
BOARD_FILE/SCENARIO_FILE shall point to valid files when specified. Any
violation to those constraints will stop the build with error
messages. When BOARD/SCENARIO and BOARD_FILE/SCENARIO_FILE are both defined
on the command line, the former takes precedence as the latter are to be
obsoleted.

Additionally, users can define the RELEASE variable to specify a debug or
release build. In case a previous build exists but is configured for a
different build type, the build system will automatically update the
scenario XML and rebuild the sources.

This patch also includes the following tweaks:

    1. Do not use `realpath` to process search paths for generated
       headers. `realpath` only accepts paths of existing files, while the
       directories for generated headers may not be created at the time the
       search paths are calculated.
    2. Always expect `pci_dev.c` to be in place.
    3. HV_CONFIG_* series now encodes absolute paths.

v3:
 * Do not validate BOARD_FILE/SCENARIO_FILE if BOARD/SCENARIO are given.

v2:
 * `defconfig` now also generates the C configuration files.
 * BOARD/SCENARIO now accept either board/scenario names or XML file paths.
 * Adapt to the new allocation.xml & unified.xml.
 * Cleanup names of internal variables in config.mk for brevity.

Tracked-On: #5644
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
2021-01-20 17:58:00 +08:00