Commit Graph

8231 Commits

Author SHA1 Message Date
Zhang Chen 4a176212eb HV: elf_loader: Fix copy gpa bug in load elf32
The elf images can't be loaded correctly because
the elf_loader copy_to_gpa with wrong size.
The p_filesz and p_memsz both belong to elf32_prog_entry,
this data structure describes segments loaded in ram.
p_filesz means size of segment in file and p_memsz
means size of segment in memory.
ELF loader should copy elf_img to gpa with the
size of p_prg_tbl_head32->p_filesz.

Tracked-On: #8642

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-07-10 15:26:02 +08:00
Zhang Chen 1933ee93cb HV: elf_loader: enable guest multiboot support
This patch enable guest multiboot support. Try to find
the multiboot header in normal elf guest image.
Introduce the multiboot related basic functions to
initialize multiboot structure. Including
prepare_multiboot_mmap, prepare_loader_name and
find_img_multiboot_header.

Tracked-On: #8642

Signed-off-by: Victor Sun <victor.sun@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-07-10 15:26:02 +08:00
Zhang Chen 1d4bdd452c HV: elf_loader: introduce the multiboot_header data structure
Define the multiboot_header data structure and
MULTIBOOT_MEMORY related definitions.

Tracked-On: #8642

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-07-10 15:26:02 +08:00
Zhang Chen b808c0ef32 HV: elf_loader: Prepare to extend elf loader for multiboot protocol
For the TEE and android kernelflinger boot requirements,
elf_loader need to support the multiboot protocol.
This patch define a memory block to store ELF format VM load
params in guest address space. At the same time, prepare the elf
cmdline field and memory map for the guest kernel.

Tracked-On: #8642

Signed-off-by: Victor Sun <victor.sun@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-07-10 15:26:02 +08:00
Victor Sun 49a02f599b HV: elf_loader: Make VM bootargs support elf guest
Except Linux guest, elf guest also need support bootargs.
Currently VM bootargs support all type of guest.

Tracked-On: #8642

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-07-10 15:26:02 +08:00
Haiwei Li 529ade37a4 config_tools: support vUART Timer pCPU configuration
This patch is to allow user to pin vUART timer to specific pCPU via ACRN
config tool. User can configure by setting "vUART timer pCPU ID" under
Hypervisor->Advanced Parameters.

Tracked-On: #8648
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-07-10 10:26:21 +08:00
Haiwei Li f47b2b6860 config_tools: support vUART Tx/Rx buffer size configuration
Introduce an interface to define Tx/Tx buffer size via ACRN config tool.
User can configure under Hypervisor->Advanced Parameters.

Tracked-On: #8644
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-07-10 10:26:21 +08:00
Gao, Shiqing cc9013541f hv: vrtc: remove the unused function `rtc_halted()`
rtc_halted() is not invoked anywhere in the code.
This patch removes this unused function to fix below error.
  error: unused function 'rtc_halted' [-Werror,-Wunused-function]

Tracked-On: #861

Signed-off-by: Gao, Shiqing <shiqing.gao@intel.com>
2024-07-03 14:55:43 +08:00
Gao, Shiqing 0bcf469758 hv: vtd: fix use of uninitialized variable in dmar_free_irte
This patch fixes the following error:
  error: variable 'sid' is used uninitialized whenever 'if' condition is true
  [-Werror,-Wsometimes-uninitialized]

Tracked-On: #861

Signed-off-by: Gao, Shiqing <shiqing.gao@intel.com>
2024-07-03 14:55:43 +08:00
Gao, Shiqing 2aed0c7aa1 hv: reloc: enclose `elf64_r_type()` with `#ifdef CONFIG_RELOC`
elf64_r_type() is only invoked when CONFIG_RELOC is defined.
This patch encloses its definition with `#ifdef CONFIG_RELOC`,
otherwise, it is dead code.

Tracked-On: #861

Signed-off-by: Gao, Shiqing <shiqing.gao@intel.com>
2024-07-03 14:55:43 +08:00
Gao, Shiqing f398b9c29e release: fix the compilation error in release mode
Commit `512c98fd7 hv: trace: show cpu usage of vms in pcpu sharing case`
causes the compilation error in release mode:
  hypervisor/common/schedule.c:190: undefined reference to `TRACE_16STR'

This patch fixes this issue.

Tracked-On: #861

Signed-off-by: Gao, Shiqing <shiqing.gao@intel.com>
2024-07-03 11:26:01 +08:00
YuanXin-Intel e4b1584577 Change Service VM to supervisor role
1. Enable Service VM to power off or restart the whole platform even when RTVM is running.
2. Allow Service VM stop the RTVM using acrnctl tool with option "stop -f".
3. Add 'Service VM supervisor role enabled' option in ACRN configurator

Tracked-On: #8618

Signed-off-by: YuanXin-Intel <xin.yuan@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
2024-06-28 13:35:07 +08:00
nacui 512c98fd79 hv: trace: show cpu usage of vms in pcpu sharing case
To maximize the cpu utilization, core 0 is usually shared by service
vm and guest vm. But there are no statistics to show the cpu occupation
of each vm.

This patch is to provide cpu usage statistic for users. To calculate
it, a new trace event is added and marked in scheduling context switch,
accompanying with a new python script to analyze the data from acrntrace
output.

Tracked-On: #8621
Signed-off-by: nacui <na.cui@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Reviewed-by: Haiwei Li <haiwei.li@intel.com>
2024-06-28 12:55:23 +08:00
Wu Zhou 926f2346df config_tools: remove ivshmem size from hv_ram_size
As ivshmem has switched from static allocation to E820 allocation,
the hv_ram_size no longer needs to include ivshmem size.

Tracked-On: #8522
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
2024-06-28 10:00:41 +08:00
Haiwei Li 3d6ca845e2 hv: s3: add timer support
When resume from s3, Service VM OS will hang because timer interrupt on
BSP is not triggered. Hypervisor won't update physical timer because
there are expired timers on pcpu timer list.

Add suspend and resume ops for modules that use timers.

This patch is just for Service VM OS. Support for User VM will be added
in the future.

Tracked-On: #8623
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-06-27 11:26:09 +08:00
Haiwei Li 5283c147ef hv: pci: Add guest cfg header access handling of type 1 device
When guests resume form s3, an error occurs in guest:

```
pcieport 0000:00:1c.0: refused to change power state from D0 to D3hot
```

PCI bridge (type 1 device) will access configuration space header but
now acrn is not supported. So add handling support.

Tracked-On: #8623
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-06-27 11:26:09 +08:00
Haiwei Li 2cd0edaf9c hv: pci: restore bus and memory/IO info after reset
After some kind of reset, such as s3, pci bridge tries to restore the
bus and memory/IO info (from 0x18 to 0x32, except for Secondary Latency
Timer 0x1b) to resume device state.

This patch is to restore these info by hypervisor.

Tracked-On: #8623
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-06-27 11:26:09 +08:00
Haiwei Li 81935737ff hv: s3: reset vm after resume
Now only BSP is reset. After Service VM OS resumes from s3, APs'
apic_base_msr are incorrect with x2apic bit en.

To avoid incorrect states, do `reset_vm` after resume.

Tracked-On: #8623
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-06-27 11:26:09 +08:00
Haiwei Li 9c139681f2 hv: s3: hwp: enable hwp after resume from s3
After Service OS resume from s3, an error occurs:

[3649827us][cpu=1][idle1][sev=2][seq=1749]:= Unhandled exception: 13 (General Protection)

[3658622us][cpu=1][idle1][sev=2][seq=1750]:
Host Registers:

[3664881us][cpu=1][idle1][sev=2][seq=1751]:=  Vector=0x000000000000000D  RIP=0x000000000040F9F0

[3674213us][cpu=1][idle1][sev=2][seq=1752]:=     RAX=0x0000000080003801  RBX=0x0000000001800800  RCX=0x0000000000000774

[3685787us][cpu=1][idle1][sev=2][seq=1753]:=     RDX=0x0000000000000000  RDI=0x0000000000000080  RSI=0x0000000000000000

[3697371us][cpu=1][idle1][sev=2][seq=1754]:=     RSP=0x0000000000616C18  RBP=0x0000000000616C38  RBX=0x0000000001800800

[3708947us][cpu=1][idle1][sev=2][seq=1755]:=      R8=0x0000000000000038   R9=0x0000000000000001  R10=0x00000000000003F8

[3720539us][cpu=1][idle1][sev=2][seq=1756]:=     R11=0x000000000000000D  R12=0x0000000000458245  R13=0x0000000000000000

[3732114us][cpu=1][idle1][sev=2][seq=1757]:=  RFLAGS=0x0000000000010202  R14=0x0000000000000000  R15=0x0000000000000000

[3743699us][cpu=1][idle1][sev=2][seq=1758]:= ERRCODE=0x0000000000000000   CS=0x0000000000000008   SS=0x0000000000000010

[3755305us][cpu=1][idle1][sev=2][seq=1759]:= CR2=0x0000000000000000

The error occurs in `msr_write(MSR_IA32_HWP_REQUEST, reg)`, when HWP is
not available.

This patch is to initialize HWP after resume.

Tracked-On: #8623
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-06-27 11:26:09 +08:00
Haiwei Li cdfd35ed3d hv: s3: enable lapic earlier
After Service VM OS resumes from s3, BSP starts APs asynchronously,
followed by IPIs to APs to resume tsc. This process takes place in
function `host_enter_s3`. While, APs' lapic are not ready to accept IPI
interrupt, so BSP fails to resume tsc.

So enable lapic earlier to make sure that APs are ready.

Tracked-On: #8623
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-06-27 11:26:09 +08:00
Xin Zhang d0fed9901d Fix Debian packaging postinst partition finding
If the root partition is bind mounted with / and another, the current
postinst script (using command lsblk) will fail to find the partition:
$type will be "/" only and cause the following command may find the
wrong partition.

Ubuntu 22.04 desktop with firefox snap by default:
```
> lsblk
nvme0n1      259:19   0 931.5G  0 disk
├─nvme0n1p1  259:20   0   243M  0 part /boot/efi
├─nvme0n1p2  259:21   0 927.5G  0 part /var/snap/firefox/common/host-hunspell
│                                      /
```

And current command forces the root partition to be ext4.

This patch fixes the two issues.

Tracked-On: #8532

Signed-off-by: Xin Zhang <xin.x.zhang@intel.com>
2024-06-26 13:56:01 +08:00
Haiwei Li fbe30d4001 dm: pm: add s3 support for an User VM with elf/bzImage
For an elf-loaded or beImage-loaded User VM, acrn-dm is responsible for
handling s3 related matters.

After resume from S3, acrn-dm should read waking_vector and set related
registers to make guest to resume.

Tracked-On: #8536
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-06-26 12:25:46 +08:00
Haiwei Li 374ad1c9ed dm: pm: add s3 support for an ovmf-booted User VM
For ovmf-booted User VM, we should set CMOS shutdown status register
(index 0xF) as S3_resume(0xFE). So ovmf will read it and start S3 resume
at POST entry.

And ovmf will read waking vector from FACS table and transfer control to
guest.

Tracked-On: #8624
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-06-26 12:25:46 +08:00
Haiwei Li c42b45e9a3 OVMF release v3.3
- OvmfPkg: resolve AcrnS3Lib
- OvmfPkg: add AcrnS3Lib to support S3
- OvmfPkg: introduce AcrnS3Lib class
- OVMF:ACRN:PCI: Try to load ROM image for the PCI device with PCI_ROM
- OVMF:ACRN:PCI: Add LoadOpRomImageLight to Load the PCI Rom
- OVMF:ACRN:PCI: Write back the original value of PCI ROM

The first three above are related to S3.

Tracked-On: #8624
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-06-26 12:25:46 +08:00
Qiang Zhang 29137b9e9c doc: update doc for vUART to hypervisor console switch key
Things changed since following commit
(c623e1112 debug: vuart: add guest break key support).

Tracked-On: #8583
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
2024-06-25 11:07:21 +08:00
Yi Sun cc0364f1f7 doc: Correct the registers names in ivshmem hld
- In 'MMIO Registers Definition', the names of interrupt status/mask registers are wrong

Tracked-On: #8568
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
2024-06-25 11:03:02 +08:00
Jiaqing Zhao 53825c5cac e820: properly reserve memory for multiboot modules
In current implementation, if there are multiple continous 4k-aligned
modules, 0-sized e820 entries will be created between these regions.
And for non-4k-aligned modules, when two of them are located in one
page, the second memory range will not be reserved as it was not in
one e820 entry after the first is reserved, making it vulnerable.

This patch fixes it by marking the exact memory range of multiboot
modules as unusable first, then shrinking the e820 entries to page
boundary. If the module crosses multiple e820 entries, possibly due
to a buggy bootloader, hypervisor will panic immediately to prevent
modules getting corrupted.

Tracked-On: #8617
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-06-20 09:10:27 +08:00
Haiwei Li b31fcd3519 hv: cpuid: fix hybrid related cpuid error
Some cpuids will return invalid values on hybrid platform because of the
error in the pointer arithmetic. Add `(void *)` before
`cpu_cpuids.leaves`.

Leaf 0x14 is used to report Intel Processor Trace Enumeration and varies
between P-cores and E-cores on hybrid platform. So add it to
`hybrid_leaves`.

Tracked-On: #8608
Fixes: 59a8cc4c2 ("hv: cpuid: make leaf 0x4 per-cpu in hybrid architecture")
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-06-19 17:07:10 +08:00
Jiaqing Zhao dc8ea42297 dm: deinit iothreads on vm reset
iothreads are created by emulated block devices like virtio. These
devices are resetted on vm reset, but these iothreads are not freed,
causing a resource leak. Fix it by deinit all iothreads on vm reset.

Tracked-On: #8612
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
2024-06-19 16:14:33 +08:00
andi6 46a860bf04 hv: fix using cpuid does not clear the upper 32-bit registers.
In HV, cpuid uses the lower 32 bits of rax\rbx\rcx\rdx registers to pass parameters,
But the software does not clear the upper 32-bit registers,  if the guest
uses 64-bit variables to pass parameters to cpuid,guest will use rax\rbx\rcx\rdx,
not eax\ebx\ecx\edx, the previous value of the high 32 registers will affect the guest.

Tracked-On: #8605
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: andi6 <andi6@xiaomi.com>
2024-06-19 15:35:26 +08:00
nacui 30aeeedcb6 dm: enable lpss device passthrough
Passthrough of lpss devices, such as sdio, spi, uart, is not supported for user
vm due to irq and acpi info missing.

Here provides new pci device passthrough options to pass irq and acpi dsdt info
by users. Considering spi dsdt info varies from HW, to add the flexibility of
configuration, it is designed to pass dsdt file of spi device by users rather
than hard code. Besides, remove the limit of the lpss device passthrough for rtvm.

Tracked-On: #8615

Signed-off-by: nacui <na.cui@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
2024-06-19 14:48:34 +08:00
dongpingx 23b1a44d40 misc: fix Vue3 version & update braces's version
Although my former patch can pass through build procedure but when
I launch configurator and try to load board.xml, the loading
procedure wont finish. So we cannot step forward anymore.

I cannot find a solution right now, so I have to fix the version
to v3.2.33 for several weeks.

This patch is applied to fix vulnerability scanned by Trivy also.
Vulnerability ID is CVE-2024-4068 & fixed version of dependency is 3.0.3.
I added one configuration item named override for package.json.

I tested and confirmed the fix is ok.

Signed-off-by: dongpingx <dongpingx.wu@intel.com>
Tracked-On: #8626
2024-06-18 10:26:32 +08:00
dongpingx 7739f0ef2a misc: add checking while append new vm
This patch will add checking cpu affinity while user click to add new vm.
When I was following client's findings up I found that if I click to add
a new post-launched vm for step 3.Configure settings for scenario and launch
scripts, it failed to show error messages. The current version will check cpu
affinity and serial port for post-launched and hv when creating a new vm, it
wont verify when adding new post-launched & pre-launched vms, it will fail to
save scenario configuration file without any explanation. I've rebuilt and run
configurator, confirmed the checking procedure works.

Signed-off-by: dongpingx <dongpingx.wu@intel.com>
Tracked-On: #8601
Reviewed-by: Junjie Mao junjie.mao@intel.com
2024-06-17 11:40:24 +08:00
Qiang Zhang 5c9e1c0186 board_inspector: fix typo in PCIe PTM Capability name
PCIe extended capability with ID 0x1F is Precise Time Measurement. So
fix typo "TPM" which may confuse users.

Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Tracked-On: #5915
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
2024-06-17 10:27:36 +08:00
Shiqing Gao 80b1edabf5 dm: block_if: support bypassing BST_BLOCK logic
With current implementation, in blockif_dequeue/blockif_complete,
if the current request is consecutive to any request in penq or busyq,
current request's status is set to BST_BLOCK. Then, this request is blocked
until the prior request, which blocks it, is completed.
It indicates that consecutive requests are executed sequentially.

This patch adds a flag `no_bst_block` to bypass such logic because:
1. the benefit of this logic is not noticeable;
2. there is a chance that a request is enqueued in block_if_queue but
   not dequeued when this logic is triggered along with the io_uring mechanism;

Example to use this flag:
`add_virtual_device                     5 virtio-blk /dev/nvme1n1,no_bst_block`

Note:
When io_uring is enabled, the BST_BLOCK logic would be bypassed.

Tracked-On: #8612

Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2024-06-05 15:23:33 +08:00
Shiqing Gao 11c8907464 dm: virtio-blk: fix virtio_blk_ops bug
When multiple virtio-blk instances are created for one VM,
using the same `static struct virtio_ops virtio_blk_ops` for all instances
is buggy. It only works when all instances are created with the same number
of the virtqueues.

This patch fixes this issue by introducing a member in `struct virtio_blk`
to store the ops info for each virtio-blk instance.

Tracked-On: #8612

Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2024-06-05 15:23:33 +08:00
Shiqing Gao f92b0f43e6 dm: block_if: io_uring: flush the modified in-core data on demand
When `io_uring` is used, `blockif_flush_cache` is missing when an WRITE
operation is completed. `blockif_flush_cache` would flush the modified
in-core data to the disk device according to the setting of the cache mode.

Tracked-On: #8612

Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2024-06-05 15:23:33 +08:00
Shiqing Gao 5306d9e7db dm: update the `iothread` option to specify the CPU affinity
This patch updates the `iothread` option to specify the CPU affinity
of the iothread. Setting the iothread's CPU affinity could benefit the
Service VM's CPU utilization when Service VM owns limited dedicated CPUs.

It could be helpful to ensure the I/O mediator Quality of Service (QoS).
Once the performance tuning is done, the specific CPU affinity config could
pass to acrn-dm directly, letting the deployment more easily.

The format looks like below:
iothread=<num_iothread>@<cpu_affinity>
"@" is used to separate the following two settings:
 - the number of iothread instances
 - the CPU affinity settings for each iothread instance.

The format of `cpu_affinity` looks like below:
<cpu_affinity_0>/<cpu_affinity_1>/<cpu_affinity_2>/...
1. "/" is used to separate the CPU affinity setting for each iothread instance
   (sequentially).
2. char '*' can be used to skip the setting for the specific iothread instance.
3. the number of cpu_affinity_x vs. the number of iothread instances
   - If # of cpu_affinity_x is less than # of iothread instances,
     no CPU affinity settings for the last few iothread instances.
   - If # of cpu_affinity_x is more than # of iothread instances,
     the extra cpu_affinity_x are discarded.
4. ":" is used to separate different CPU cores for each CPU affinity setting.

Examples to specify the CPU affinity of the iothread:
1. iothread=3@0:1:2/0:1
   `add_virtual_device    9 virtio-blk iothread=3@0:1:2/0:1,mq=3,/dev/nvme1n1`
   a) 3 iothread instances are created.
   b) CPU affinity of iothread instances for this virtio-blk device:
      - 1st iothread instance <-> pins to Service VM CPU 0,1,2
      - 2nd iothread instance <-> pins to Service VM CPU 0,1
      - 3rd iothread instance <-> No CPU affinity settings

2. iothread=3@0/*/1
   `add_virtual_device    9 virtio-blk iothread=3@0/*/1,mq=3,/dev/nvme1n1`
   a) 3 iothread instances are created.
   b) CPU affinity of iothread instances for this virtio-blk device:
      - 1st iothread instance <-> pins to Service VM CPU 0
      - 2nd iothread instance <-> No CPU affinity settings
      - 3rd iothread instance <-> pins to Service VM CPU 1

v1 -> v2:
 * encapsulate one API in iothread.c to parse the iothread options, so that
   other BE can also use it.

v2 -> v3:
 * introduce one API iothread_free_options to free the elements that
   are allocated dynamically in iothread_parse_options().

Tracked-On: #8612

Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2024-06-05 15:23:33 +08:00
Shiqing Gao a90aa4fd26 dm: iothread: rename the thread for better readability
This patch renames the iothread for better readability. For instance,
the new name of the iothread for virtio-blk device looks like `iothr-0-blk9:0`.

It could be helpful when tuning the performance and the CPU utilization.

v1 -> v2:
 * add `const` qualifier for the input parameter of `iothread_create`

Tracked-On: #8612

Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2024-06-05 15:23:33 +08:00
Shiqing Gao 14c20fa31c dm: block_if: support misaligned request when O_DIRECT is used
Use of O_DIRECT flag could be a performance option.
But this flag may impose alignment restrictions on the length
and address of user-space buffers and the file offset of I/Os.

To support the use of O_DIRECT flag in block_if, this patch adds the support
to handle the misaligned request.
 - When O_DIRECT flag is used (`nocache` is specified in acrn-dm parameters),
    * if the original I/O request is aligned,
      the original I/O request is submitted directly.
    * if the original I/O request is not aligned (either due to the buffer
        address/length misalignment, or the offset misalignment),
      the misaligned request is converted to an aligned request before
      submission.

 - When O_DIRECT flag is not used,
   the original I/O request is submitted directly.

v1 -> v2:
 * cleanup the free() logic in `blockif_init_bounced_write`

Tracked-On: #8612

Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2024-06-05 15:23:33 +08:00
Shiqing Gao e2da306755 dm: block_if: support bypassing the Service VM's page cache
This patch adds an acrn-dm option `nocache` to bypass the Service VM's
page cache.
 - By default, the Service VM's page cache is utilized.
 - If `nocache` is specified in acrn-dm parameters, the Service VM's page cache
   would be bypassed (opening the file/block with O_DIRECT flag).

Example to bypass the Service VM's page cache:
`add_virtual_device    5 virtio-blk iothread,mq=2,/dev/nvme2n1,writeback,nocache`

Tracked-On: #8612

Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2024-06-05 15:23:33 +08:00
Jian Jun Chen 63d41a75fa dm: set iothread nice value to PRIO_MIN
To improve the performance of the virtual device who utilizes iothread
(such as virtio-blk), this patch sets iothread nice value to PRIO_MIN,
so that it could get higher priority on scheduling.

This patch does:
 - introduce `set_thread_priority` to set the priority of the current running
   thread.
   The priority could be any value in the range PRIO_MIN to PRIO_MAX.
   Lower numerical value causes more favorable scheduling.

 - set iothread nice value to PRIO_MIN.

Tracked-On: #8612

Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2024-06-05 15:23:33 +08:00
Shiqing Gao 7e6a239646 dm: improve the flexibility of the iothread support
Prior to this patch, one single iothread instance is created and initialized
in the `main` function. This single iothread monitors all the registered fds
and handles all the corresponding requests. It leads to the limited flexibility
of the iothread support.

To improve the flexibility of the iothread support, this patch does:
- add the support of multiple iothread instances.
  `iothread_create` is introduced to create a certain number of iothread
  instances. It shall be called at first by each virtual device owner (such as
  virtio-blk BE) on initialization phase. Then, `iothread_add` can be called
  to add the to be monitored fd to the specified iothread.

- update virtio-blk BE to let the acrn-dm option `iothread` accept a number
  as the number of iothread instances to be created.
  If `iothread` is contained in the parameters, but the number is not specified,
  one iothread instance would be created by default.
  Examples to specify the number of iothread instances:
  1. Create 2 iothread instances
  `add_virtual_device    9 virtio-blk iothread=2,mq=2,/dev/nvme1n1,writeback,aio=io_uring`
  2. Create 1 iothread instances (by default)
  `add_virtual_device    9 virtio-blk iothread,mq=2,/dev/nvme1n1,writeback,aio=io_uring`

- update virtio-blk BE to separate the request handling of different virtqueues
  to different iothreads.
  The request from one or more virtqueues can be handled in one iothread.
  The mapping between virtqueues and iothreads is based on round robin.

v1 -> v2:
 * add a mutex to protect the free ioctx slot allocation

Tracked-On: #8612

Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2024-06-05 15:23:33 +08:00
Shiqing Gao fed8ce513c dm: block_if: add the io_uring support
io_uring is a high-performance asynchronous I/O framework, primarily designed
to improve the efficiency of input and output (I/O) operations in user-space
applications.
This patch enables io_uring in block_if module. It utilizes the interfaces
provided by the user-space library `liburing` to interact with io_uring
in kernel-space.

To build the acrn-dm with io_uring support, `liburing-dev` package needs to be
installed. For example, it can be installed like below in Ubuntu 22.04.
        sudo apt install liburing-dev

In order to support both the thread pool mechanism and the io_uring mechanism,
an acrn-dm option `aio` is introduced. By default, thread pool mechanism is
selected.
- Example to use io_uring:
  `add_virtual_device    9 virtio-blk iothread,mq=2,/dev/nvme1n1,writeback,aio=io_uring`
- Example to use thread pool:
  `add_virtual_device    9 virtio-blk iothread,mq=2,/dev/nvme1n1,writeback,aio=threads`
- Example to use thread pool (by default):
  `add_virtual_device    9 virtio-blk iothread,mq=2,/dev/nvme1n1,writeback`

v2 -> v3:
 * Update iothread_handler
    - Use the unified eventfd interfaces to read the counter value of the
      ioeventfd.
    - Remove the while loop to read the ioeventfd. It is not necessary
      because one read would reset the counter value to 0.
 * Update iou_submit_sqe to return an error code
   The caller of iou_submit_sqe shall check the return value.
   If there is NO available submission queue entry in the submission queue,
   need to break the while loop. Request can only be submitted when SQE is
   available.

v1 -> v2:
 * move the logic of reading out ioeventfd from iothread.c to virtio.c, because
   it is specific to the virtqueue handling.

Tracked-On: #8612

Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2024-06-05 15:23:33 +08:00
Jian Jun Chen edb392e7ed dm: block_if: add multiple queues support
block_if is the backend of ahci and virtio-blk. Only one queue is
supported by block_if now. Several worker threads are created as
the thread pool for the queue. One BIG mutex is used for the queue
and thread operation. With this patch block_if can support multiple
queues and each queue is backed by several worker threads. blockif_req
can be submited/enqueued into one specified queue. By spliting into
several queues contention from the BIG mutex can be relieved/eliminated.
This is used to support virtio-blk multiple queues feature.

Tracked-On: #8612

Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2024-06-05 15:23:33 +08:00
Jian Jun Chen 562c22fb4e dm: virtio-blk: add multiple queues (mq) support
Virtio-blk can support multiple virtqueues (mq) which is negotiated
between FE and BE by the feature bit VIRTIO_BLK_F_MQ. The virtqueue
number of virtio-blk can be specified by "mq=x" in the parameter.
For example: "virtio-blk,iothread,mq=2,..."

Tracked-On: #8612

Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2024-06-05 15:23:33 +08:00
Jian Jun Chen 74bc2f7cfb hv: asyncio: support data match of the same addr
Virtio legacy device (ver < 1.0) uses a single PIO for all virtqueues.
Notifications from different virtqueues are implemented by writing
virtqueue index to the PIO. Writing different values to the same addr
needs to be mapped to different eventfds by asyncio. This is called
data match feature of asyncio.

v3 -> v4:
 * Update the definition of `struct asyncio_desc`
   Use `struct acrn_asyncio_info` inside it, instaed of defining the duplicated
   fileds.
 * Update `add_asyncio` to use `memcpy_s` rather than assigning all the fields
   using 5 assignment statements.
 * Update `asyncio_is_conflict` for coding style
   120-character line is sufficient to write all conditions.
 * Update the checks related to `wildcard`
   Because we require every conditional clause to have a Boolean type
   in the coding guideline.

v2 -> v3:
No change

v1 -> v2:
No change

Tracked-On: #8612

Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2024-06-05 15:23:33 +08:00
Jian Jun Chen 2737281010 dm: virtio: add per queue mutex
ACRN virtio devices are using a per device mutex to protect the
concurrent operations on the device's PIO/MMIO. This introduces
big contention in fast IO hence downgrades the IO performance,
for example virtio-blk with asyncio enabled. This patch introduces
per queue mutex to relieve such issues. Currently the per queue
mutex is only used in the asycio path when iothread is enabled.

Tracked-On: #8612

Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2024-06-05 15:23:33 +08:00
Jian Jun Chen 26aece0492 dm: virtio: fix a asyncio/ioeventfd bug
ACRN_IOEVENTFD_FLAG_ASYNCIO is not set when unregister ioeventfd
in the current implementation which will cause the old asyncio_desc
will be remained in hypervisor link list when switching from OVMF to
kernel.

Tracked-On: #8612

Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2024-06-05 15:23:33 +08:00
Jiaqing Zhao 91e0612e88 hv: dm: refine create/destroy functions
The create function of hv-emulated device must check the return value
of vpci_init_vdev() as it returns NULL pointer on failure, and that
function should be called atomically.

Also, the destory function should deinit the vpci devices created to
prevent resource leak.

Tracked-On: #8590
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-06-04 09:38:34 +08:00