.. _hv-device-passthrough:

Device Passthrough
##################

A critical part of virtualization is virtualizing devices: exposing all
aspects of a device, including its I/O, interrupts, DMA, and configuration.
There are three typical device virtualization methods: emulation,
para-virtualization, and passthrough. The ACRN project uses both emulation
and passthrough. Device emulation is discussed in :ref:`hld-io-emulation`;
device passthrough is discussed here.

In the ACRN project, device emulation means emulating all existing hardware
resources through a software component, the device model, running in the
Service OS (SOS). Device emulation must maintain the same software interface
as a native device, providing transparency to the VM software stack.
Passthrough, implemented in the hypervisor, assigns a physical device to a VM
so the VM can access the hardware device directly with minimal (if any) VMM
involvement.

The difference between device emulation and passthrough is shown in
:numref:`emu-passthru-diff`. Device emulation has a longer access path,
which results in worse performance than passthrough. Passthrough can deliver
near-native performance, but it cannot support device sharing.

.. figure:: images/passthru-image30.png
   :align: center
   :name: emu-passthru-diff

   Difference between emulation and passthrough

Passthrough in the hypervisor provides the following functionality to
allow a VM to access PCI devices directly:

- DMA remapping by VT-d for PCI devices: the hypervisor sets up DMA
  remapping during the VM initialization phase.
- MMIO remapping between virtual and physical BARs
- Device configuration emulation
- Remapping of interrupts for PCI devices
- ACPI configuration virtualization
- GSI sharing violation check

The following diagram details the passthrough initialization control flow in
ACRN:

.. figure:: images/passthru-image22.png
   :align: center

   Passthrough device initialization control flow

Passthrough Device status
*************************

Most common devices on supported platforms are enabled for
passthrough, as detailed here:

.. figure:: images/passthru-image77.png
   :align: center

   Passthrough Device Status

DMA Remapping
*************

Passthrough requires address translation for DMA: a VM programs devices with
guest physical addresses (GPA), while physical DMA requires host physical
addresses (HPA). One workaround is to build an identity mapping so that GPA
equals HPA, but this is not recommended because some VMs don't support
relocation well. To address this issue, Intel introduced VT-d in the chipset,
which adds a remapping engine that translates GPA to HPA for DMA operations.

Each VT-d engine (DMAR unit) maintains a remapping structure
similar to a page table, with the device BDF (Bus/Device/Function) as input
and the final page table for GPA-to-HPA translation as output. The GPA-to-HPA
translation page table is similar to a normal multi-level page table.

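As a rough illustration of the lookup described above, the following sketch
models a remapping structure that takes a BDF, selects a per-device
translation table, and then maps a GPA to an HPA. Every name here
(``toy_dmar``, ``toy_domain``, ``toy_translate``), the single-level table,
and the 4 KiB page size are assumptions made for illustration only; real VT-d
hardware walks root and context tables followed by a multi-level page table.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   #define TOY_PAGE_SHIFT 12U          /* assumption: 4 KiB pages */
   #define TOY_PAGES      16U          /* pages covered by one toy table */
   #define TOY_INVALID    UINT64_MAX   /* marks "no translation" */

   /* Pack bus/device/function into the usual 16-bit BDF layout. */
   static inline uint16_t toy_bdf(uint8_t bus, uint8_t dev, uint8_t func)
   {
       assert(dev < 32U && func < 8U);
       return (uint16_t)(((uint16_t)bus << 8) | ((uint16_t)dev << 3) | func);
   }

   /* One translation table: index is the GPA page number, value is the HPA page base. */
   struct toy_domain {
       uint64_t hpa_page[TOY_PAGES];
   };

   /* Toy remapping structure: BDF in, translation table out (NULL if unassigned). */
   struct toy_dmar {
       struct toy_domain *context[65536];
   };

   /* Translate a device DMA address; returns TOY_INVALID where hardware would fault. */
   static uint64_t toy_translate(const struct toy_dmar *dmar, uint16_t bdf, uint64_t gpa)
   {
       const struct toy_domain *dom = dmar->context[bdf];
       uint64_t pfn = gpa >> TOY_PAGE_SHIFT;

       if (dom == NULL || pfn >= TOY_PAGES || dom->hpa_page[pfn] == TOY_INVALID) {
           return TOY_INVALID;
       }
       /* Keep the offset within the page, replace the page base. */
       return dom->hpa_page[pfn] | (gpa & ((1ULL << TOY_PAGE_SHIFT) - 1U));
   }

In this model, two devices assigned to the same VM could simply share one
``toy_domain``, while an unassigned BDF has no table and every DMA from it is
rejected.
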
VM DMA depends on Intel VT-d to translate GPA to HPA, so the VT-d IOMMU
engine must be enabled in ACRN before any device can be passed through. The
SOS in ACRN is a VM running in non-root mode, so it also depends
on VT-d to access a device. In the SOS DMA remapping
engine settings, GPA is equal to HPA.

The ACRN hypervisor checks the DMA Remapping Hardware Unit Definition (DRHD)
structures in the host DMAR ACPI table to get basic information, then sets up
each DMAR unit. For simplicity, ACRN reuses the EPT table as the translation
table in the DMAR unit for each passthrough device. The control flow is shown
in the following figures:

.. figure:: images/passthru-image72.png
   :align: center

   DMA Remapping control flow during HV init

.. figure:: images/passthru-image86.png
   :align: center

   ptdev assignment control flow

.. figure:: images/passthru-image42.png
   :align: center

   ptdev de-assignment control flow

MMIO Remapping
**************

For a PCI MMIO BAR, the hypervisor builds an EPT mapping between the virtual
BAR and the physical BAR, so the VM can access MMIO directly.

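The address arithmetic implied by such a mapping can be sketched as follows.
This is a minimal illustration, not ACRN code: the ``toy_bar_map`` structure
and the ``toy_bar_gpa_to_hpa`` helper are hypothetical, and in a real system
the translation is performed by the EPT hardware walk rather than by software
on each access.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical record of one BAR mapping (not an ACRN structure). */
   struct toy_bar_map {
       uint64_t vbar_gpa;  /* base of the virtual BAR in guest-physical space */
       uint64_t pbar_hpa;  /* base of the physical BAR in host-physical space */
       uint64_t size;      /* BAR size in bytes */
   };

   /* Return true and store the HPA if gpa falls inside the virtual BAR. */
   static bool toy_bar_gpa_to_hpa(const struct toy_bar_map *m, uint64_t gpa, uint64_t *hpa)
   {
       assert(m != NULL && hpa != NULL);
       if (gpa < m->vbar_gpa || (gpa - m->vbar_gpa) >= m->size) {
           return false;  /* outside the BAR: not covered by this mapping */
       }
       *hpa = m->pbar_hpa + (gpa - m->vbar_gpa);
       return true;
   }

Because the offset within the BAR is preserved, the guest driver can use the
same register offsets it would use on bare metal.
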
Device configuration emulation
******************************

PCI configuration is based on access to I/O ports 0xCF8/0xCFC. ACRN
implements PCI configuration emulation to handle 0xCF8/0xCFC accesses and
control PCI devices through one of two paths: in the hypervisor or in the SOS
device model.

- When configuration emulation is in the hypervisor, intercepting the
  0xCF8/0xCFC ports and emulating PCI configuration space accesses is
  tricky and unclean. Therefore, the final solution is to reuse the
  PCI emulation infrastructure of the SOS device model. The hypervisor
  routes UOS 0xCF8/0xCFC accesses to the device model and stays blind to the
  physical PCI devices. Upon receiving a UOS PCI configuration space access
  request, the device model needs to emulate some critical registers, for
  instance, the BARs, the MSI capability, and INTLINE/INTPIN.

- For other accesses, the device model reads or writes the physical
  configuration space on behalf of the UOS. To do this, the device model is
  linked with libpciaccess to access the physical PCI device.

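To make the two paths above more concrete, the sketch below decodes a value
written to port 0xCF8 using the standard PCI configuration address layout
(enable bit 31, bus in bits 23:16, device in bits 15:11, function in bits
10:8, register in bits 7:2) and then classifies the target register. The
``toy_`` names are hypothetical, and the classification covers only the
fixed-offset type 0 header registers (BARs at 0x10 to 0x27, INTLINE at 0x3C,
INTPIN at 0x3D); locating the MSI capability would additionally require
walking the capability list, which is omitted here.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   #define PCI_CFG_ENABLE 0x80000000U

   /* Fields decoded from a 32-bit value written to port 0xCF8. */
   struct toy_cfg_addr {
       bool    enabled;
       uint8_t bus, dev, func;
       uint8_t reg;   /* dword-aligned register offset, 0x00..0xFC */
   };

   static struct toy_cfg_addr toy_decode_cf8(uint32_t val)
   {
       struct toy_cfg_addr a;

       a.enabled = (val & PCI_CFG_ENABLE) != 0U;
       a.bus  = (uint8_t)((val >> 16) & 0xFFU);
       a.dev  = (uint8_t)((val >> 11) & 0x1FU);
       a.func = (uint8_t)((val >> 8) & 0x07U);
       a.reg  = (uint8_t)(val & 0xFCU);
       return a;
   }

   /* Byte offset in configuration space for an access to port 0xCFC..0xCFF. */
   static uint8_t toy_cfg_offset(const struct toy_cfg_addr *a, uint16_t port)
   {
       assert(port >= 0xCFCU && port <= 0xCFFU);
       return (uint8_t)(a->reg + (port - 0xCFCU));
   }

   /* True for the header registers this sketch treats as emulated. */
   static bool toy_reg_is_emulated(uint8_t offset)
   {
       bool is_bar  = (offset >= 0x10U) && (offset < 0x28U);
       bool is_intx = (offset == 0x3CU) || (offset == 0x3DU);

       return is_bar || is_intx;
   }

An access for which ``toy_reg_is_emulated`` returns false would, in this
model, be forwarded to the physical configuration space on behalf of the UOS.
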
Interrupt Remapping
*******************

When a physical interrupt of a passthrough device occurs, the hypervisor has
to distribute it to the relevant VM according to the interrupt remapping
relationships. The structure ``ptirq_remapping_info`` defines
the relationship between a physical interrupt and its VM, the
virtual destination, and so on. See the following figure for details:

.. figure:: images/passthru-image91.png
   :align: center

   Remapping of physical interrupts

There are two types of interrupt source: IOAPIC and MSI.
The hypervisor records different information for interrupt
distribution depending on the type: the physical and virtual IOAPIC pins for
an IOAPIC source, and the physical and virtual BDF and other information for
an MSI source.

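The per-source information described above could be modeled roughly as
follows. This is a hypothetical sketch: ``toy_remap_entry`` and the lookup
helpers are illustrative names, and the actual layout of
``ptirq_remapping_info`` in the hypervisor may differ.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   enum toy_intr_type { TOY_INTR_IOAPIC, TOY_INTR_MSI };

   /* Hypothetical remapping entry; not the real ptirq_remapping_info layout. */
   struct toy_remap_entry {
       enum toy_intr_type type;
       uint16_t vm_id;   /* VM that owns the interrupt */
       union {
           struct { uint32_t phys_pin, virt_pin; } ioapic;
           struct { uint16_t phys_bdf, virt_bdf; } msi;
       } src;
   };

   /* Find the entry for a physical IOAPIC pin, or NULL if none is registered. */
   static const struct toy_remap_entry *
   toy_lookup_ioapic(const struct toy_remap_entry *tbl, size_t n, uint32_t phys_pin)
   {
       assert(tbl != NULL || n == 0U);
       for (size_t i = 0; i < n; i++) {
           if (tbl[i].type == TOY_INTR_IOAPIC && tbl[i].src.ioapic.phys_pin == phys_pin) {
               return &tbl[i];
           }
       }
       return NULL;
   }

   /* Find the entry for a physical MSI source identified by BDF, or NULL. */
   static const struct toy_remap_entry *
   toy_lookup_msi(const struct toy_remap_entry *tbl, size_t n, uint16_t phys_bdf)
   {
       assert(tbl != NULL || n == 0U);
       for (size_t i = 0; i < n; i++) {
           if (tbl[i].type == TOY_INTR_MSI && tbl[i].src.msi.phys_bdf == phys_bdf) {
               return &tbl[i];
           }
       }
       return NULL;
   }

When a physical interrupt arrives, a lookup of this kind yields both the
owning VM and the virtual source to present to it.
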
SOS passthrough is also within the scope of interrupt remapping, which is
done on demand rather than at hypervisor initialization.

.. figure:: images/passthru-image102.png
   :align: center
   :name: init-remapping

   Initialization of remapping of virtual IOAPIC interrupts for SOS

:numref:`init-remapping` above illustrates how (virtual) IOAPIC
interrupts are remapped for the SOS. A VM exit occurs whenever the SOS tries
to unmask an interrupt in the (virtual) IOAPIC by writing to the Redirection
Table Entry (RTE). The hypervisor then invokes the IOAPIC emulation
handler (refer to :ref:`hld-io-emulation` for details on I/O emulation), which
calls APIs to set up a remapping for the to-be-unmasked interrupt.

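The unmask detection described above can be sketched as follows. In an IOAPIC
redirection table entry, bit 16 of the low dword is the interrupt mask bit.
The ``toy_vioapic`` structure, the 24-pin table size, and the counter that
stands in for the remapping setup call are all assumptions made for
illustration; they are not the hypervisor's actual emulation handler.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   #define TOY_RTE_MASK_BIT (1U << 16)   /* interrupt mask bit in the low RTE dword */
   #define TOY_IOAPIC_PINS  24U          /* assumption: 24-pin IOAPIC */

   /* Hypothetical virtual IOAPIC state. */
   struct toy_vioapic {
       uint32_t rte_lo[TOY_IOAPIC_PINS];
       unsigned remaps_set_up;   /* counts how often a remapping was set up */
   };

   /* True when a write clears a previously set mask bit. */
   static bool toy_rte_unmasked(uint32_t old_lo, uint32_t new_lo)
   {
       return ((old_lo & TOY_RTE_MASK_BIT) != 0U) && ((new_lo & TOY_RTE_MASK_BIT) == 0U);
   }

   /* Emulate a trapped write to the low dword of one RTE. */
   static void toy_vioapic_write_lo(struct toy_vioapic *v, unsigned pin, uint32_t val)
   {
       assert(pin < TOY_IOAPIC_PINS);
       if (toy_rte_unmasked(v->rte_lo[pin], val)) {
           v->remaps_set_up++;   /* stand-in for the call that sets up the remapping */
       }
       v->rte_lo[pin] = val;
   }

Setting up the remapping only on the masked-to-unmasked transition is what
makes the scheme on demand: pins the SOS never unmasks never get an entry.
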
Remapping of (virtual) PIC interrupts is set up in a similar sequence.

.. figure:: images/passthru-image98.png
   :align: center

   Initialization of remapping of virtual MSI for SOS

This figure illustrates how mappings of MSI or MSI-X are set up for
the SOS. The SOS is responsible for issuing a hypercall to notify the
hypervisor before it configures the PCI configuration space to enable an
MSI. The hypervisor takes this opportunity to set up a remapping for the
given MSI or MSI-X before it is actually enabled by the SOS.

When the UOS needs to access the physical device by passthrough, it uses
the following steps:

- The UOS gets a virtual interrupt.
- A VM exit happens, and the trapped vCPU is the target where the interrupt
  will be injected.
- The hypervisor handles the interrupt and translates the vector
  according to ``ptirq_remapping_info``.
- The hypervisor delivers the interrupt to the UOS.

When the SOS needs to use the physical device, passthrough is also
active because the SOS is the first VM. The detailed steps are:

- The SOS gets all physical interrupts. It assigns different interrupts to
  different VMs during initialization and reassigns them when a VM is created
  or deleted.
- When a physical interrupt is trapped, an exception occurs after the VMCS
  has been set.
- The hypervisor handles the VM exit according to
  ``ptirq_remapping_info`` and translates the vector.
- The interrupt is injected in the same way as a virtual interrupt.

ACPI Virtualization
*******************

ACPI virtualization is designed in ACRN with these assumptions:

- The HV has no knowledge of ACPI,
- The SOS owns all physical ACPI resources,
- The UOS sees virtual ACPI resources emulated by the device model.

Some passthrough devices require a physical ACPI table entry for
initialization. The device model creates such a device entry based on
the physical one, according to the vendor ID and device ID. This
virtualization is implemented in the SOS device model and is not in the scope
of the hypervisor.

GSI Sharing Violation Check
***************************

All PCI devices that share the same GSI should be assigned to
the same VM to avoid physical GSI sharing between multiple VMs. For
devices that don't support MSI, the ACRN DM puts the devices that share the
same GSI pin into a GSI sharing group. The devices in the same group should
be assigned together to the current VM; otherwise, none of them should be
assigned to the current VM. A device that violates this rule is rejected for
passthrough. The checking logic is implemented in the Device Model and is not
in the scope of the hypervisor.

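The all-or-nothing rule above can be expressed as a small check. This is a
minimal sketch under stated assumptions: ``toy_gsi_dev`` and
``toy_gsi_sharing_ok`` are hypothetical names, and the quadratic pairwise
comparison is chosen for clarity rather than taken from the Device Model
implementation.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical view of one PCI device that lacks MSI support. */
   struct toy_gsi_dev {
       uint16_t bdf;       /* bus/device/function */
       uint32_t gsi;       /* GSI the device's INTx pin is routed to */
       bool     assigned;  /* requested for passthrough to the current VM */
   };

   /* Return true if every GSI sharing group is assigned all-or-nothing. */
   static bool toy_gsi_sharing_ok(const struct toy_gsi_dev *devs, size_t n)
   {
       assert(devs != NULL || n == 0U);
       for (size_t i = 0; i < n; i++) {
           for (size_t j = i + 1; j < n; j++) {
               if (devs[i].gsi == devs[j].gsi && devs[i].assigned != devs[j].assigned) {
                   return false;  /* same group, but only partly assigned */
               }
           }
       }
       return true;
   }

A caller would run such a check over all devices without MSI support before
accepting a passthrough request and reject the request if it returns false.
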
Data structures and interfaces
******************************

The following APIs are provided to initialize interrupt remapping for
SOS:

.. doxygenfunction:: ptirq_intx_pin_remap
   :project: Project ACRN

.. doxygenfunction:: ptirq_msix_remap
   :project: Project ACRN

The following APIs are provided to manipulate the interrupt remapping
for UOS:

.. doxygenfunction:: ptirq_add_intx_remapping
   :project: Project ACRN

.. doxygenfunction:: ptirq_remove_intx_remapping
   :project: Project ACRN

.. doxygenfunction:: ptirq_add_msix_remapping
   :project: Project ACRN

.. doxygenfunction:: ptirq_remove_msix_remapping
   :project: Project ACRN

The following API is provided to acknowledge a virtual interrupt:

.. doxygenfunction:: ptirq_intx_ack
   :project: Project ACRN