.. _interrupt-hld:
Interrupt Management high-level design
######################################
Overview
********
This document describes the interrupt management high-level design for
the ACRN hypervisor.
The ACRN hypervisor implements a simple but fully functional framework
to manage interrupts and exceptions, as shown in
:numref:`interrupt-modules-overview`. In its native layer, it configures
the physical PIC, IOAPIC, and LAPIC to support different interrupt
sources from local timer/IPI to external INTx/MSI. In its virtual guest
layer, it emulates virtual PIC, virtual IOAPIC and virtual LAPIC, and
provides full APIs allowing virtual interrupt injection from emulated or
pass-thru devices.

.. figure:: images/interrupt-image3.png
   :align: center
   :width: 600px
   :name: interrupt-modules-overview

   ACRN Interrupt Modules Overview

In the software modules view shown in :numref:`interrupt-sw-modules`,
the ACRN hypervisor sets up the physical interrupt in its basic
interrupt modules (e.g., IOAPIC/LAPIC/IDT). It dispatches the interrupt
in the hypervisor interrupt flow control layer to the corresponding
handlers, which could be pre-defined IPI notification, timer, or runtime
registered pass-thru devices. The ACRN hypervisor then uses its VM
interfaces, based on the vPIC, vIOAPIC, and vMSI modules, to inject the
necessary virtual interrupt into the specific VM.

.. figure:: images/interrupt-image2.png
   :align: center
   :width: 600px
   :name: interrupt-sw-modules

   ACRN Interrupt SW Modules Overview

Hypervisor Physical Interrupt Management
****************************************
The ACRN hypervisor is responsible for all the physical interrupt
handling. All physical interrupts are first handled in VMX root-mode.
The "external-interrupt exiting" bit in the VM-Execution controls field
is set to support this. The ACRN hypervisor also initializes all the
interrupt related modules such as IDT, PIC, IOAPIC, and LAPIC.
Only a few physical interrupts (such as TSC-Deadline timer and IOMMU)
are fully serviced in the hypervisor. Most interrupts come from pass-thru
devices whose interrupts are remapped to a virtual INTx/MSI source and
injected to the SOS or UOS, according to the pass-thru device
configuration.
The ACRN hypervisor does not handle exceptions, and any exception coming
from VMX root mode will lead to the CPU halting. For guest exceptions, the
hypervisor only traps #MC (machine check), prints a warning message, and
injects the exception back into the guest OS.
Physical Interrupt Initialization
=================================
After the ACRN hypervisor gets control from the bootloader, it
initializes all physical interrupt-related modules for all the CPUs. The
ACRN hypervisor creates a framework to manage the physical interrupt for
hypervisor-local devices, pass-thru devices, and IPI between CPUs.
IDT
---
The ACRN hypervisor builds its native Interrupt Descriptor Table (IDT) during
interrupt initialization. For exceptions, it links to function
``dispatch_exception``, and for external interrupts it links to function
``dispatch_interrupt``. Please refer to ``arch/x86/idt.S`` for more details.
LAPIC
-----
The ACRN hypervisor resets the LAPIC for each CPU, and provides basic APIs
used, for example, by the local timer (TSC Deadline)
program and IPI notification program. These APIs include
write_lapic_reg32, send_lapic_eoi, send_startup_ipi, and
send_single_ipi.

.. comment
   Need reference to API doc generated from doxygen comments
   in hypervisor/include/arch/x86/lapic.h

PIC/IOAPIC
----------
The ACRN hypervisor masks all interrupts from PIC, so all the
legacy interrupts from PIC (<16) are linked to IOAPIC, as shown in
:numref:`interrupt-pic-pin`.
ACRN will pre-allocate vectors and mask them for these legacy interrupts
in IOAPIC RTE. For others (>= 16) ACRN will mask them with vector 0 in
RTE, and the vector will be dynamically allocated on demand.

.. figure:: images/interrupt-image5.png
   :align: center
   :width: 600px
   :name: interrupt-pic-pin

   PIC & IOAPIC Pin Connection

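
The initial RTE setup described above can be sketched as follows. This is
only an illustration, not the ACRN implementation: the function name
``initial_rte``, the pin limit, and the legacy vector base ``0x20`` are
assumptions made for the example. The bit positions (vector in bits 7:0,
mask in bit 16) follow the standard IOAPIC redirection table entry layout.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define RTE_VECTOR_MASK     0xFFULL        /* bits 7:0 hold the vector */
   #define RTE_INTR_MASKED     (1ULL << 16)   /* bit 16 masks the pin */
   #define NR_LEGACY_IRQS      16U
   #define SKETCH_MAX_PINS     240U           /* hypothetical upper bound */
   #define SKETCH_LEGACY_BASE  0x20U          /* hypothetical first legacy vector */

   /* Build the RTE value a pin would get at initialization time. */
   static uint64_t initial_rte(uint32_t pin)
   {
           uint64_t rte = RTE_INTR_MASKED;    /* every pin starts masked */

           assert(pin < SKETCH_MAX_PINS);
           if (pin < NR_LEGACY_IRQS) {
                   /* Legacy pins get a pre-allocated vector. */
                   rte |= ((uint64_t)SKETCH_LEGACY_BASE + pin) & RTE_VECTOR_MASK;
           }
           /* Other pins keep vector 0 until one is allocated on demand. */
           return rte;
   }

With these assumptions, ``initial_rte(3)`` yields a masked entry with
vector 0x23, while ``initial_rte(16)`` yields a masked entry with vector 0.
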
Irq Desc
--------
The ACRN hypervisor maintains a global ``irq_desc[]`` array shared among the
CPUs and uses a flat mode to manage the interrupts. The same
vector is linked to the same IRQ number for all CPUs.

.. comment
   Need reference to API doc generated from doxygen comments
   for ``struct irq_desc`` in hypervisor/include/common/irq.h

The ``irq_desc[]`` array is indexed by the IRQ number. An
``irq_handler`` field can be set to a common edge, level, or quick
handler called from ``dispatch_interrupt``. The ``irq_desc`` structure
also contains the ``dev_list`` field to maintain this IRQ's action
handler list.
The global array ``vector_to_irq[]`` is used to manage the vector
resource. This array is initialized with value ``IRQ_INVALID`` for all
vectors, and will be set to a valid IRQ number after the corresponding
vector is registered.
For example, if the local timer registers an interrupt with IRQ number 271
and vector 0xEF, then the arrays mentioned above will be set to::

   irq_desc[271].irq = 271;
   irq_desc[271].vector = 0xEF;
   vector_to_irq[0xEF] = 271;

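
A minimal sketch of the flat IRQ/vector bookkeeping follows. It is not the
ACRN code: the structure ``irq_desc_sketch``, the helper names, the table
sizes, and the value chosen for ``IRQ_INVALID`` are assumptions for
illustration. Only the array names and the example values come from the
text above.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define NR_IRQS      272U          /* hypothetical table size */
   #define NR_VECTORS   256U
   #define IRQ_INVALID  0xFFFFFFFFU   /* hypothetical sentinel value */

   struct irq_desc_sketch {
           uint32_t irq;      /* index of this entry in irq_desc[] */
           uint32_t vector;   /* vector bound to the IRQ, 0 if none yet */
   };

   static struct irq_desc_sketch irq_desc[NR_IRQS];
   static uint32_t vector_to_irq[NR_VECTORS];

   /* Mark every vector as unused. */
   static void init_irq_tables(void)
   {
           for (uint32_t i = 0U; i < NR_IRQS; i++) {
                   irq_desc[i].irq = i;
                   irq_desc[i].vector = 0U;
           }
           for (uint32_t v = 0U; v < NR_VECTORS; v++) {
                   vector_to_irq[v] = IRQ_INVALID;
           }
   }

   /* Bind a vector to an IRQ; returns 0 on success, -1 if the vector is taken. */
   static int bind_irq_to_vector(uint32_t irq, uint32_t vector)
   {
           assert(irq < NR_IRQS && vector < NR_VECTORS);
           if (vector_to_irq[vector] != IRQ_INVALID) {
                   return -1;
           }
           irq_desc[irq].vector = vector;
           vector_to_irq[vector] = irq;
           return 0;
   }

Calling ``init_irq_tables()`` and then ``bind_irq_to_vector(271, 0xEF)``
reproduces the three assignments listed in the example. Because the tables
are global and not per-CPU, the same vector maps to the same IRQ on every
CPU.
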
Physical Interrupt Flow
=======================
When a physical interrupt occurs and the CPU is running in VMX root
mode, the interrupt is handled through the standard native irq flow:
interrupt gate to irq handler. However, if the CPU is running under VMX
non-root mode, an external interrupt will trigger a VM exit for reason
"external-interrupt". See :numref:`interrupt-handle-flow`.

.. figure:: images/interrupt-image4.png
   :align: center
   :width: 800px
   :name: interrupt-handle-flow

   ACRN Hypervisor Interrupt Handle Flow

After an interrupt happens (in either case noted above), the ACRN
hypervisor jumps to ``dispatch_interrupt``. This function will check
which vector caused this interrupt, and the corresponding ``irq_desc``
structure's ``irq_handler`` will be called for the service.
Several ``irq_handler`` functions are defined in the ACRN hypervisor, as
shown in :numref:`interrupt-handle-flow`, designed for different uses. For
example, ``quick_handler_nolock`` is used when no critical data needs
protection in the action handlers; the VCPU notification IPI and local
timer are good examples of this use case.
The more complicated ``common_dev_handler_level`` handler is intended
for pass-thru devices with level triggered interrupts. To avoid
continuously triggering the interrupt, it initially masks IOAPIC pin and
unmasks it only when the corresponding vIOAPIC pin gets an explicit EOI
ACK from the guest.
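
The mask-until-EOI behavior can be sketched as a small state machine. This
is a simplified model, not the ``common_dev_handler_level`` code: the
arrays and function names below are hypothetical, and real hardware
masking and virtual interrupt injection are replaced by a flag and a
counter.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   #define SKETCH_NR_PINS 24U   /* hypothetical number of IOAPIC pins */

   static bool pin_masked[SKETCH_NR_PINS];
   static uint32_t injected_count[SKETCH_NR_PINS];

   /* Physical level-triggered interrupt arrives on a pin. */
   static void level_irq_arrives(uint32_t pin)
   {
           assert(pin < SKETCH_NR_PINS);
           if (pin_masked[pin]) {
                   return;   /* a masked pin delivers nothing */
           }
           pin_masked[pin] = true;   /* stop the line from re-triggering */
           injected_count[pin]++;    /* stands in for the virtual injection */
   }

   /* The guest wrote an EOI for the matching vIOAPIC pin. */
   static void guest_eoi(uint32_t pin)
   {
           assert(pin < SKETCH_NR_PINS);
           pin_masked[pin] = false;  /* the device may interrupt again */
   }

In this model a second ``level_irq_arrives()`` call before ``guest_eoi()``
has no effect, which is the property that prevents a still-asserted level
line from flooding the hypervisor.
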
All the irq handlers finally call their own action handler list, as
shown here:

.. code-block:: c

   struct dev_handler_node *dev = desc->dev_list;

   /* Walk the action list and call every registered device handler. */
   while (dev != NULL) {
           if (dev->dev_handler != NULL)
                   dev->dev_handler(desc->irq, dev->dev_data);
           dev = dev->next;
   }

The common APIs for registering, updating, and unregistering
interrupt handlers include irq_to_vector, dev_to_irq, dev_to_vector,
pri_register_handler, normal_register_handler,
unregister_handler_common, and update_irq_handler.

.. comment
   Need reference to API doc generated from doxygen comments
   in hypervisor/include/common/irq.h

.. _physical_interrupt_source:
Physical Interrupt Source
=========================
The ACRN hypervisor handles interrupts from many different sources, as
shown in :numref:`interrupt-source`:

.. list-table:: Physical Interrupt Source
   :widths: 15 10 60
   :header-rows: 1
   :name: interrupt-source

   * - Interrupt Source
     - Vector
     - Description

   * - TSC Deadline Timer
     - 0xEF
     - The TSC deadline timer implements the timer framework in
       the hypervisor based on the LAPIC TSC deadline. This interrupt's
       target is specific to the CPU to which the LAPIC belongs.

   * - CPU Startup IPI
     - N/A
     - The BSP needs to trigger an INIT-SIPI sequence to wake up the
       APs. This interrupt's target is specified by the BSP calling
       ``start_cpus()``.

   * - VCPU Notify IPI
     - 0xF0
     - Used when the hypervisor needs to kick the VCPU out of VMX non-root
       mode to handle requests such as virtual interrupt injection or EPT
       flush. This interrupt's target is specified by function
       ``send_single_ipi()``.

   * - IOMMU MSI
     - dynamic
     - The IOMMU device supports an MSI interrupt. The VT-d device driver
       in the hypervisor registers an interrupt to handle DMAR faults.
       This interrupt's target is specified by the VT-d device driver.

   * - PTdev INTx
     - dynamic
     - All native devices are owned by the guest (SOS or UOS), taking
       advantage of the pass-thru method. Each pass-thru device connected
       with IOAPIC/PIC (PTdev INTx) will register an interrupt when
       its attached interrupt controller pin first gets unmasked.
       This interrupt's target is defined by an RTE in the IOAPIC.

   * - PTdev MSI
     - dynamic
     - All native devices are owned by the guest (SOS or UOS), taking
       advantage of the pass-thru method. Each pass-thru device with
       enabled MSI (PTdev MSI) will register an interrupt when the SOS
       does an explicit hypercall. This interrupt's target is defined
       by an MSI address entry.

Softirq
=======
The ACRN hypervisor implements a simple bottom-half softirq to execute the
interrupt handler, as shown in :numref:`interrupt-handle-flow`.
The softirq is executed when interrupts are enabled. Several APIs for
softirq are defined, including enable_softirq, disable_softirq,
raise_softirq, and exec_softirq.

.. comment
   Need reference to API doc generated from doxygen comments
   in hypervisor/include/common/softirq.h

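
The bottom-half idea can be illustrated with a pending bitmap: the
interrupt path only raises a bit, and the deferred work runs later. The
sketch below is a single-CPU, non-atomic model with hypothetical
``sketch_*`` names and a made-up demo handler. It does not reproduce the
signatures or the locking of the real softirq APIs, and it models
enable/disable as a simple gate, which is an assumption.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   #define SKETCH_NR_SOFTIRQS 8U

   typedef void (*softirq_fn)(void);

   static softirq_fn softirq_table[SKETCH_NR_SOFTIRQS];
   static uint32_t softirq_pending;        /* one bit per softirq number */
   static bool softirq_enabled = true;
   static uint32_t demo_runs;              /* counts demo handler calls */

   static void demo_softirq(void) { demo_runs++; }

   static void sketch_enable_softirq(void)  { softirq_enabled = true; }
   static void sketch_disable_softirq(void) { softirq_enabled = false; }

   static void sketch_register_softirq(uint32_t nr, softirq_fn fn)
   {
           assert(nr < SKETCH_NR_SOFTIRQS);
           softirq_table[nr] = fn;
   }

   /* Called from the interrupt path: only records that work is pending. */
   static void sketch_raise_softirq(uint32_t nr)
   {
           assert(nr < SKETCH_NR_SOFTIRQS);
           softirq_pending |= (1U << nr);
   }

   /* Called later, outside the top half: runs all pending handlers. */
   static void sketch_exec_softirq(void)
   {
           if (!softirq_enabled) {
                   return;
           }
           while (softirq_pending != 0U) {
                   for (uint32_t nr = 0U; nr < SKETCH_NR_SOFTIRQS; nr++) {
                           uint32_t bit = 1U << nr;

                           if ((softirq_pending & bit) != 0U) {
                                   softirq_pending &= ~bit;
                                   if (softirq_table[nr] != NULL) {
                                           softirq_table[nr]();
                                   }
                           }
                   }
           }
   }

A handler raised while softirq execution is disabled stays pending and
runs on the first ``sketch_exec_softirq()`` call after it is enabled
again.
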
Physical Exception Handling
===========================
As mentioned earlier, the ACRN hypervisor does not handle any
physical exceptions. The VMX root mode code path should guarantee no
exceptions are triggered while the hypervisor is running.
Guest Virtual Interrupt Management
**********************************
The previous sections describe physical interrupt management in the ACRN
hypervisor. After a physical interrupt happens, a registered action
handler is executed. Usually, the action handler represents a service
for virtual interrupt injection. For example, if an interrupt is
triggered from a pass-thru device, the appropriate virtual interrupt
should be injected into its guest VM.
The virtual interrupt injection could also come from an emulated device.
The I/O mediator in the Service OS (SOS) could trigger an interrupt
through a hypercall, and then do the virtual interrupt injection in the
hypervisor.
The following sections give an introduction to the ACRN guest virtual
interrupt management, including VCPU request for virtual interrupt kick
off, vPIC/vIOAPIC/vLAPIC for virtual interrupt injection interfaces,
physical-to-virtual interrupt mapping for a pass-thru device, and the
process of VMX interrupt/exception injection.
VCPU Request
============
As mentioned in `physical_interrupt_source`_, physical vector 0xF0 is
used to kick the VCPU out of its VMX non-root mode, and make a request
for virtual interrupt injection or other requests such as flush EPT.
The request-make API (vcpu_make_request) and eventid support virtual
interrupt injection.

.. comment
   Need reference to API doc generated from doxygen comments
   in hypervisor/include/common/irq.h

There are requests for exception injection (ACRN_REQUEST_EXCP), vLAPIC
event (ACRN_REQUEST_EVENT), external interrupt from vPIC
(ACRN_REQUEST_EXTINT), and non-maskable interrupt (ACRN_REQUEST_NMI).
A call to ``vcpu_make_request`` is necessary for a virtual interrupt
injection. If the target VCPU is running in VMX non-root mode, the call
sends an IPI to kick it out, which results in an external-interrupt
VM-Exit. The flow of :numref:`interrupt-handle-flow` can then be executed
to complete the injection of a virtual interrupt.
There are some cases that do not need to send an IPI when making a
request because the CPU making the request is the target VCPU. For
example, the #GP exception request always happens on the current CPU
when an invalid emulation happens. An external interrupt for a pass-thru
device always happens on the VCPUs the device belongs to, so after it
triggers an external-interrupt VM-Exit, the current CPU is also the
target VCPU.
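
The decision of whether to send a notification IPI can be sketched as
below. This is a simplified model under stated assumptions: the structure
``sketch_vcpu``, the enum values, and the ``current_pcpu`` parameter are
hypothetical and do not mirror the signature of ``vcpu_make_request``.
The IPI itself is replaced by a counter.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   enum sketch_request {
           SKETCH_REQ_EXCP = 0,
           SKETCH_REQ_EVENT,
           SKETCH_REQ_EXTINT,
           SKETCH_REQ_NMI,
           SKETCH_REQ_COUNT
   };

   struct sketch_vcpu {
           uint32_t pcpu_id;       /* physical CPU this VCPU runs on */
           bool in_non_root;       /* true while the guest is executing */
           uint32_t pending_req;   /* one bit per sketch_request */
   };

   static uint32_t ipis_sent;     /* stands in for the vector 0xF0 IPI */

   static void sketch_make_request(struct sketch_vcpu *vcpu,
                                   enum sketch_request req,
                                   uint32_t current_pcpu)
   {
           assert(vcpu != NULL && req < SKETCH_REQ_COUNT);
           vcpu->pending_req |= (1U << (uint32_t)req);

           /* No IPI is needed when the requester is the target CPU. */
           if (vcpu->in_non_root && (vcpu->pcpu_id != current_pcpu)) {
                   ipis_sent++;
           }
   }

A request made from another physical CPU increments ``ipis_sent``, while
a request made on the VCPU's own physical CPU only sets the pending bit.
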
Virtual PIC
===========
The ACRN hypervisor emulates a vPIC for each VM based on IO ranges
0x20-0x21, 0xa0-0xa1, and 0x4d0-0x4d1.
If an interrupt source from vPIC needs to inject an interrupt,
the vpic_assert_irq, vpic_deassert_irq, or vpic_pulse_irq functions can
be called to make a request for ACRN_REQUEST_EXTINT or
ACRN_REQUEST_EVENT.

.. comment
   Need reference to API doc generated from doxygen comments
   in hypervisor/include/common/vpic.h

The vpic_pending_intr and vpic_intr_accepted APIs are used to query the
vector to be injected and to acknowledge the service by moving the
interrupt from the Interrupt Request Register (IRR) to the In-Service
Register (ISR).
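
The IRR-to-ISR handoff can be illustrated with a single 8-pin controller
using fixed priority (pin 0 highest). This is a sketch under those
assumptions, with hypothetical names. It does not model the cascaded
master/slave pair, the interrupt mask register, or the actual
``vpic_pending_intr``/``vpic_intr_accepted`` signatures.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   struct sketch_pic {
           uint8_t irr;           /* pins requesting service */
           uint8_t isr;           /* pins currently in service */
           uint8_t vector_base;   /* vector of pin 0 */
   };

   /* Return the vector to inject next, or -1 if nothing can be delivered. */
   static int sketch_pending_intr(const struct sketch_pic *pic)
   {
           assert(pic != NULL);
           for (uint32_t pin = 0U; pin < 8U; pin++) {
                   uint8_t bit = (uint8_t)(1U << pin);

                   if ((pic->isr & bit) != 0U) {
                           return -1;   /* equal or higher priority in service */
                   }
                   if ((pic->irr & bit) != 0U) {
                           return (int)pic->vector_base + (int)pin;
                   }
           }
           return -1;
   }

   /* Acknowledge delivery: move the pin from IRR to ISR. */
   static void sketch_intr_accepted(struct sketch_pic *pic, int vector)
   {
           assert(pic != NULL);
           uint32_t pin = (uint32_t)(vector - (int)pic->vector_base);

           assert(pin < 8U);
           uint8_t bit = (uint8_t)(1U << pin);

           pic->irr &= (uint8_t)~bit;
           pic->isr |= bit;
   }

With pins 1 and 3 pending and a vector base of 0x20, the query returns
0x21. After that vector is accepted, pin 3 stays in the IRR and is held
back until the in-service bit for pin 1 is cleared.
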
Virtual IOAPIC
==============
ACRN hypervisor emulates a vIOAPIC for each VM based on MMIO
VIOAPIC_BASE.
If an interrupt source from vIOAPIC needs to inject an interrupt, the
vioapic_assert_irq, vioapic_deassert_irq, and vioapic_pulse_irq APIs are
used to make a request for ACRN_REQUEST_EVENT.
As the vIOAPIC is always associated with a vLAPIC, the virtual interrupt
injection from vIOAPIC will finally trigger a request for a vLAPIC
event.
Virtual LAPIC
=============
The ACRN hypervisor emulates a vLAPIC for each VCPU based on MMIO
DEFAULT_APIC_BASE.
If an interrupt source from vLAPIC needs to inject an interrupt (e.g.,
from LVT such as an LAPIC timer, from vIOAPIC for a pass-thru device
interrupt, or from an emulated device for an MSI), the vlapic_intr_level,
vlapic_intr_edge, vlapic_set_local_intr, vlapic_intr_msi, and
vlapic_deliver_intr APIs need to be called, resulting in a request for
ACRN_REQUEST_EVENT.

.. comment
   Need reference to API doc generated from doxygen comments
   in hypervisor/include/common/vlapic.h

The vlapic_pending_intr and vlapic_intr_accepted APIs are used to query
the vector that needs to be injected and to acknowledge the service by
moving the interrupt from the Interrupt Request Register (IRR) to the
In-Service Register (ISR).
By default, the ACRN hypervisor enables vAPIC to improve the performance of
vLAPIC emulation.
Virtual Exception
=================
When doing emulation, an exception may need to be raised by the
hypervisor. For example, if the guest accesses an invalid vMSR register,
the hypervisor needs to inject a #GP; or, during instruction emulation, an
instruction fetch may access a non-existent page from rip_gva, and a #PF
must be injected.
The ACRN hypervisor implements virtual exception injection using the
vcpu_queue_exception, vcpu_inject_gp, and vcpu_inject_pf APIs.

.. comment
   Need reference to API doc generated from doxygen comments
   in hypervisor/include/common/irq.h

The ACRN hypervisor uses the vcpu_inject_gp/vcpu_inject_pf functions to
queue exception requests, and follows `Intel Software
Developer Manual, Vol 3. <SDM vol3_>`_ - 6.15, Table 6-5,
which lists the conditions for generating a double fault.
.. _SDM vol3: https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-system-programming-manual-325384.html
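
The double-fault conditions from that table can be sketched as a small
classifier. This is an illustration rather than the ACRN queueing code:
the function names are hypothetical, and only the classic vectors are
covered (#DE, #TS, #NP, #SS, #GP as contributory, #PF as page fault,
everything else treated as benign). Escalation from #DF to a triple fault
is out of scope.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define VEC_DF 8U   /* double fault */

   enum exc_class { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PAGE_FAULT };

   static enum exc_class classify_exception(uint32_t vector)
   {
           assert(vector < 32U);   /* architectural exception range */
           switch (vector) {
           case 0U:    /* #DE */
           case 10U:   /* #TS */
           case 11U:   /* #NP */
           case 12U:   /* #SS */
           case 13U:   /* #GP */
                   return EXC_CONTRIBUTORY;
           case 14U:   /* #PF */
                   return EXC_PAGE_FAULT;
           default:
                   return EXC_BENIGN;
           }
   }

   /*
    * Given an exception that is already queued (first) and a new one
    * (second), return the vector that should end up queued.
    */
   static uint32_t merge_exception(uint32_t first, uint32_t second)
   {
           enum exc_class c1 = classify_exception(first);
           enum exc_class c2 = classify_exception(second);

           if ((c1 == EXC_CONTRIBUTORY) && (c2 == EXC_CONTRIBUTORY)) {
                   return VEC_DF;
           }
           if ((c1 == EXC_PAGE_FAULT) && (c2 != EXC_BENIGN)) {
                   return VEC_DF;
           }
           return second;   /* handled serially */
   }

For instance, a #GP raised while a #PF is queued merges into a #DF,
whereas a #PF raised while a #GP is queued is handled serially.
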
Interrupt Mapping for a Pass-thru Device
========================================
A VM can control a PCI device directly through pass-thru device
assignment. The pass-thru entry is the major info object, and it contains:
- A physical interrupt source, which could be an MSI/MSIX entry, PIC pins,
  or IOAPIC pins
- Pass-thru remapping information between the physical and virtual
  interrupt source. For MSI/MSIX, it is identified by a PCI device's BDF.
  For PIC/IOAPIC, it is identified by the pin number.

.. figure:: images/interrupt-image7.png
   :align: center
   :width: 600px
   :name: interrupt-pass-thru

   Pass-thru Device Entry Assignment

As shown in :numref:`interrupt-pass-thru` above, a UOS has its
pass-thru device entry assigned by the DM, and the entry info is filled
in from:
- vPIC/vIOAPIC interrupt mask/unmask
- MSI IOReq from UOS then MSI hypercall from SOS
The SOS adds its pass-thru device entry at runtime and fills info for:
- vPIC/vIOAPIC interrupt mask/unmask
- MSI hypercall from SOS
While the pass-thru device entry info is being filled, the hypervisor
builds the native IOAPIC RTE/MSI entry based on the vIOAPIC/vPIC/vMSI
configuration, and registers the physical interrupt handler for it. Then,
with the pass-thru device entry as the handler private data, the physical
interrupt can be linked to a virtual pin of a guest's vPIC/vIOAPIC or a
virtual vector of a guest's vMSI. The handler then injects the
corresponding virtual interrupt into the guest, based on the
vPIC/vIOAPIC/vLAPIC APIs described earlier.
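
The role of the pass-thru entry as a physical-to-virtual lookup record can
be sketched as follows. The structure layout and every name in it are
assumptions made for illustration. The real entry carries more state, and
the hypervisor passes it to the handler as private data instead of
searching a table.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   enum sketch_src_type { SRC_INTX, SRC_MSI };

   struct sketch_ptdev_entry {
           enum sketch_src_type type;
           uint32_t phys_id;   /* physical pin for INTx, PCI BDF for MSI */
           uint32_t virt_id;   /* virtual pin for INTx, virtual vector for MSI */
           uint16_t vm_id;     /* guest that owns the device */
   };

   /* Find the entry for a physical source; NULL if it is not assigned. */
   static const struct sketch_ptdev_entry *
   find_ptdev_entry(const struct sketch_ptdev_entry *table, size_t count,
                    enum sketch_src_type type, uint32_t phys_id)
   {
           assert(table != NULL);
           for (size_t i = 0U; i < count; i++) {
                   if ((table[i].type == type) && (table[i].phys_id == phys_id)) {
                           return &table[i];
                   }
           }
           return NULL;
   }

Once the entry is found, ``virt_id`` and ``vm_id`` tell the handler which
virtual pin or vector to raise and in which guest.
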
Interrupt Storm Mitigation
==========================
When the Device Model (DM) launches a User OS (UOS), the ACRN hypervisor
remaps the interrupts for this UOS's pass-thru devices. When an interrupt
occurs for a pass-thru device, the CPU core assigned to that UOS gets
trapped into the hypervisor. The benefit of such a mechanism is that,
should an interrupt storm happen in a particular UOS, it will have only a
minimal effect on the performance of the Service OS.
Interrupt/Exception Injection Process
=====================================
As shown in :numref:`interrupt-handle-flow`, the ACRN hypervisor injects
a virtual interrupt/exception into the guest before its VM-Entry.
This is done by updating the VMX_ENTRY_INT_INFO_FIELD of the VCPU's
VMCS. As this field is unique, interrupt/exception injection must
follow a priority rule and handle the pending items one by one.
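
For reference, the value written to the VM-entry interruption-information
field has a fixed layout in the Intel SDM: vector in bits 7:0, type in
bits 10:8, "deliver error code" in bit 11, and "valid" in bit 31. The
helper below is a sketch of that encoding. Its name and macro names are
hypothetical, and it does not perform the VMCS write itself.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   #define INTR_INFO_VALID      (1U << 31)
   #define INTR_INFO_ERR_CODE   (1U << 11)
   #define INTR_TYPE_EXT_INTR   0U   /* external interrupt */
   #define INTR_TYPE_NMI        2U   /* non-maskable interrupt */
   #define INTR_TYPE_HW_EXCP    3U   /* hardware exception */

   /* Build the 32-bit value for the VM-entry interruption-information field. */
   static uint32_t encode_entry_intr_info(uint32_t type, uint32_t vector,
                                          bool has_err_code)
   {
           assert((type <= 7U) && (vector <= 0xFFU));
           uint32_t info = INTR_INFO_VALID | (type << 8) | vector;

           if (has_err_code) {
                   info |= INTR_INFO_ERR_CODE;
           }
           return info;
   }

Because the field holds a single vector and type, only one event can be
delivered per VM-Entry. A #GP with an error code, for example, encodes as
0x80000B0D.
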
:numref:`interrupt-injection` below shows the rules for injecting virtual
interrupts/exceptions one by one. If a higher-priority
interrupt/exception has already been injected, the next pending
interrupt/exception enables the interrupt window, and its injection is
done on the following VM-Exit, which is triggered by that interrupt
window.

.. figure:: images/interrupt-image6.png
   :align: center
   :width: 600px
   :name: interrupt-injection

   ACRN Hypervisor Interrupt/Exception Injection Process

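
The one-at-a-time rule can be modeled as below. This is a sketch only: the
priority order used here (exception, then NMI, then external interrupt,
then vLAPIC event) is an assumption for the example, and the names are
hypothetical. Opening the interrupt window is reduced to a flag. In
practice the window mechanism applies to maskable interrupts and is
programmed through the VM-execution controls.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Assumed priority order, highest first. */
   enum sketch_req { SK_EXCP = 0, SK_NMI, SK_EXTINT, SK_EVENT, SK_COUNT };

   struct sketch_entry_plan {
           int injected;            /* request injected now, -1 if none */
           bool open_intr_window;   /* more work is pending for a later exit */
   };

   /* Decide what to inject on this VM-Entry from a bitmap of pending requests. */
   static struct sketch_entry_plan plan_vm_entry(uint32_t *pending)
   {
           struct sketch_entry_plan plan = { -1, false };

           assert(pending != NULL);
           for (int req = 0; req < SK_COUNT; req++) {
                   uint32_t bit = 1U << (uint32_t)req;

                   if ((*pending & bit) == 0U) {
                           continue;
                   }
                   if (plan.injected < 0) {
                           *pending &= ~bit;      /* the single injection slot */
                           plan.injected = req;
                   } else {
                           plan.open_intr_window = true;
                           break;
                   }
           }
           return plan;
   }

With an exception and a vLAPIC event both pending, the first call injects
the exception and asks for the interrupt window. The second call, made on
the resulting VM-Exit, injects the event.
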