.. _hld-overview:

ACRN High-Level Design Overview
###############################

ACRN is an open-source reference hypervisor (HV) that runs on
:ref:`Intel platforms <hardware>` for heterogeneous use cases such as
Software-Defined Cockpit (SDC) and In-Vehicle Experience (IVE) for
automotive, or human-machine interface (HMI) plus a real-time OS for
industry. ACRN provides embedded hypervisor vendors with a reference I/O
mediation solution under a permissive license, and provides automakers and
industry users with a reference software stack for these uses.

ACRN Use Cases
**************

Software-Defined Cockpit
========================

The SDC system consists of multiple subsystems: the instrument cluster (IC)
system, the In-Vehicle Infotainment (IVI) system, and one or more rear seat
entertainment (RSE) systems. Each subsystem runs as a VM for better
isolation.

The instrument cluster (IC) system manages graphic displays of:

- driving speed, engine RPM, temperature, fuel level, odometer, trip mileage, etc.
- alerts for low fuel or tire pressure
- rear-view camera (RVC) and surround-camera views for driving assistance

In-Vehicle Infotainment
=======================

A typical In-Vehicle Infotainment (IVI) system supports:

- Navigation systems
- Radio, audio, and video playback
- Mobile device connectivity for calls, music, and applications via voice
  recognition, gesture recognition, and/or touch
- Rear seat entertainment (RSE) services such as:

  - entertainment system
  - virtual office
  - connection to the IVI front system and mobile devices (cloud
    connectivity)

ACRN supports Linux and Android guest OSes. OEMs can use the ACRN hypervisor
and the Linux or Android guest OS reference code to implement their own VMs
for a customized IC/IVI/RSE.

Industry Usage
==============

A typical industry usage includes one Windows HMI VM plus one real-time VM
(RTVM):

- a Windows HMI as a guest OS with a display to provide the human-machine
  interface
- an RTVM running a specific RTOS to handle real-time workloads such as PLC
  control

ACRN supports a Windows* guest OS for such HMI capability. ACRN continues to
add features that enhance its real-time performance to meet hard-RT key
performance indicators for its RTVM:

- Cache Allocation Technology (CAT)
- Memory Bandwidth Allocation (MBA)
- LAPIC passthrough
- Polling mode driver
- Always Running Timer (ART)
- Intel Time Coordinated Computing (TCC) features, such as split lock
  detection and cache locking

Hardware Requirements
*********************

Mandatory IA CPU features:

- Long mode
- MTRR
- TSC deadline timer
- NX, SMAP, SMEP
- Intel VT, including VMX, EPT, VT-d, APICv, VPID, INVEPT, and INVVPID

Recommended memory: 4 GB minimum, 8 GB preferred.

ACRN Architecture
*****************

ACRN is a type 1 hypervisor that runs on bare metal. It supports certain
:ref:`Intel platforms <hardware>` and can be easily extended to support
future platforms. ACRN implements a hybrid VMM architecture, using a
privileged Service VM to manage I/O devices and provide I/O mediation.
Multiple User VMs can be supported, running Ubuntu, Android, Windows, or an
RTOS such as Zephyr.

ACRN 1.0
========

ACRN 1.0 is designed mainly for automotive use cases such as SDC and IVI.

Instrument cluster applications are critical in the SDC use case, and may
require functional safety certification in the future. Running the IC system
in a separate VM isolates it from other VMs and their applications, thereby
reducing the attack surface and minimizing potential interference. However,
running the IC system in a separate VM introduces additional latency for the
IC applications. Some countries' regulations require an IVE system to show a
rear-view camera (RVC) image within 2 seconds, which is difficult to achieve
if a separate instrument cluster VM is started after the User VM is booted.

:numref:`overview-arch1.0` shows the architecture of ACRN 1.0 together with
the IC VM and Service VM. As shown, the Service VM owns most of the platform
devices and provides I/O mediation to VMs. Some PCIe devices are passed
through to User VMs according to the VM configuration. In addition, the
Service VM can run the IC applications and HV helper applications such as
the Device Model and VM Manager, where the VM Manager is responsible for VM
start/stop/pause and virtual CPU pause/resume.

.. figure:: images/over-image34.png
   :align: center
   :name: overview-arch1.0

   ACRN 1.0 Architecture

ACRN 2.0
========

ACRN 2.0 extended ACRN to support a pre-launched VM (mainly for safety VMs)
and real-time (RT) VMs.

:numref:`overview-arch2.0` shows the architecture of ACRN 2.0; the main
differences compared to ACRN 1.0 are that:

- ACRN 2.0 supports a pre-launched VM, with isolated resources including
  CPU, memory, and hardware devices.

- ACRN 2.0 adds a few necessary device emulations in the hypervisor, such as
  vPCI and vUART, to avoid interference between different VMs.

- ACRN 2.0 supports an RTVM as a post-launched User VM, with features such
  as LAPIC passthrough and a PMD virtio driver.

.. figure:: images/over-image35.png
   :align: center
   :name: overview-arch2.0

   ACRN 2.0 Architecture

.. _intro-io-emulation:

Device Emulation
================

ACRN adopts three approaches for emulating devices for the User VM (an
illustrative sketch of the first approach follows this list):

- **Emulated device**: A virtual device using this approach is emulated in
  the Service VM by trapping accesses to the device in the User VM. Two
  sub-categories exist for emulated devices:

  - fully emulated, allowing native drivers to be used unmodified in the
    User VM, and
  - para-virtualized, requiring front-end drivers in the User VM to
    function.

- **Passthrough device**: A device passed through to the User VM is fully
  accessible to the User VM without interception. However, interrupts are
  first handled by the hypervisor before being injected into the User VM.

- **Mediated passthrough device**: A mediated passthrough device is a hybrid
  of the previous two approaches. Performance-critical resources (mostly
  data-plane related) are passed through to the User VMs, while other
  resources (mostly control-plane related) are emulated.
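
As a minimal sketch of the fully emulated approach (all names here are
hypothetical, not the actual ACRN DM interfaces), a device emulator
registers callbacks that are invoked whenever a trapped guest access
reaches it:

.. code-block:: c

   #include <stdint.h>

   /* Callbacks invoked when a trapped guest access reaches the emulator. */
   struct emu_dev_ops {
       uint32_t (*read)(void *dev, uint64_t offset, int size);
       void     (*write)(void *dev, uint64_t offset, int size, uint32_t val);
   };

   /* A trivial device with one 32-bit status register at offset 0. */
   struct toy_dev {
       uint32_t status;
   };

   static uint32_t toy_read(void *dev, uint64_t offset, int size)
   {
       struct toy_dev *d = dev;
       return (offset == 0) ? d->status : ~0U; /* unknown regs read as 1s */
   }

   static void toy_write(void *dev, uint64_t offset, int size, uint32_t val)
   {
       struct toy_dev *d = dev;
       if (offset == 0)
           d->status = val;
   }

   static const struct emu_dev_ops toy_ops = { toy_read, toy_write };

A para-virtualized device replaces this register-level emulation with a
shared-ring protocol (for example, virtio), trading drop-in driver
compatibility for fewer traps and better performance.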

.. _ACRN-io-mediator:

I/O Emulation
-------------

The Device Model (DM) manages User VM devices: it allocates memory for the
User VM, configures and initializes the devices shared by the guest, loads
the virtual BIOS, initializes the virtual CPU state, and invokes the
hypervisor service to execute the guest instructions.

The following diagram illustrates the control flow of emulating a port I/O
read from the User VM.

.. figure:: images/over-image29.png
   :align: center
   :name: overview-io-emu-path

   I/O (PIO/MMIO) Emulation Path

:numref:`overview-io-emu-path` shows an example I/O emulation flow path.

When a guest executes an I/O instruction (port I/O or MMIO), a VM exit
happens. The HV takes control and handles the request based on the VM exit
reason, for example ``VMX_EXIT_REASON_IO_INSTRUCTION`` for port I/O access.
The HV fetches any additional guest instructions and processes the port I/O
instruction at a pre-configured port address (``in AL, 20h``, for example).
The HV places the decoded information, such as the port I/O address, size of
access, read/write direction, and target register, into an I/O request in
the I/O request buffer (shown in :numref:`overview-io-emu-path`) and then
notifies/interrupts the Service VM for processing.
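
As a simplified sketch of this hand-off (field and function names are
illustrative, not ACRN's actual definitions), the exit handler decodes the
instruction into a request record inside the shared buffer and kicks the
Service VM:

.. code-block:: c

   #include <stdint.h>

   #define IO_REQ_READ   0U
   #define IO_REQ_WRITE  1U

   /* One slot per vCPU in the shared I/O request buffer. */
   struct io_request {
       uint32_t state;       /* FREE -> PENDING -> PROCESSING -> COMPLETE */
       uint32_t direction;   /* IO_REQ_READ or IO_REQ_WRITE */
       uint64_t address;     /* port number (PIO) or GPA (MMIO) */
       uint32_t size;        /* 1, 2, or 4 bytes */
       uint32_t value;       /* data written, or result of a read */
   };

   /* Hypothetical helper called from the VM-exit handler on a PIO exit. */
   static void submit_pio_request(struct io_request *req,
                                  uint16_t port, uint32_t size,
                                  uint32_t direction, uint32_t value)
   {
       req->direction = direction;
       req->address   = port;
       req->size      = size;
       req->value     = value;       /* meaningful for writes only */
       req->state     = 1U;          /* PENDING */
       /* Notify the Service VM, e.g., by injecting an upcall interrupt. */
   }

The vCPU that triggered the exit stays paused until the request reaches the
COMPLETE state, preserving the guest's illusion of a synchronous I/O
instruction.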

The Hypervisor Service Module (HSM) in the Service VM intercepts the HV
interrupts and accesses the I/O request buffer for the port I/O
instructions. It then checks whether any kernel device claims ownership of
the I/O port; if one does, that device executes the requested operations on
behalf of the VM. Otherwise, the HSM leaves the I/O request in the request
buffer and wakes up the DM thread for processing.

The DM follows the same mechanism as the HSM. The I/O processing thread of
the DM queries the I/O request buffer to get the PIO instruction details and
checks whether any (guest) device emulation module claims ownership of the
I/O port. If one does, the owning module is invoked to execute the requested
operations.
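
A hedged sketch of this ownership check (a simplified dispatch table, not
the DM's actual implementation) might look like:

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   struct io_request;  /* as sketched above */

   /* A guest device emulation module claims a range of I/O ports. */
   struct pio_handler {
       uint16_t base;
       uint16_t len;
       int (*handle)(struct io_request *req, void *arg);
       void *arg;
   };

   #define MAX_PIO_HANDLERS 64
   static struct pio_handler pio_handlers[MAX_PIO_HANDLERS];
   static size_t num_pio_handlers;

   /* Called by the DM I/O thread for each pending PIO request. */
   static int dispatch_pio(struct io_request *req, uint64_t port)
   {
       for (size_t i = 0; i < num_pio_handlers; i++) {
           struct pio_handler *h = &pio_handlers[i];
           if (port >= h->base && port < (uint64_t)h->base + h->len)
               return h->handle(req, h->arg); /* owning module found */
       }
       return -1;  /* no owner: complete the request with a default value */
   }

Unclaimed ports typically read as all ones and ignore writes, mimicking the
behavior of an unpopulated bus.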

When the DM completes the emulation (the port I/O 20h access in this
example) by a device such as uDev1, uDev1 puts the result into the request
buffer (register ``AL``). The DM returns control to the HV to indicate
completion of the I/O instruction emulation, typically through the
HSM/hypercall. The HV then stores the result in the guest register context,
advances the guest IP to indicate completion of the instruction execution,
and resumes the guest.

The MMIO access path is similar, except that the VM exit reason is an *EPT
violation* (``VMX_EXIT_REASON_EPT_VIOLATION``).

DMA Emulation
-------------

The only fully virtualized devices for the User VM are the USB xHCI
controller, the UART, and the automotive I/O controller. None of these
requires emulating DMA transactions; ACRN does not support virtual DMA.

Hypervisor
**********

ACRN takes advantage of Intel Virtualization Technology (Intel VT). The ACRN
HV runs in Virtual Machine Extensions (VMX) root operation, also called host
mode or VMM mode, while the Service VM and User VM guests run in VMX
non-root operation, or guest mode. (We'll use "root mode" and "non-root
mode" for simplicity.)

Root mode has four privilege rings. ACRN runs the HV in ring 0 only and
leaves rings 1-3 unused. A guest running in non-root mode has its own full
set of rings (ring 0 to 3): the guest kernel runs in ring 0 of guest mode,
while guest userland applications run in ring 3 of guest mode (rings 1 and 2
are usually not used by commercial OSes).

.. figure:: images/over-image11.png
   :align: center
   :name: overview-arch-hv

   Architecture of ACRN Hypervisor

:numref:`overview-arch-hv` shows an overview of the ACRN hypervisor
architecture.

- A platform initialization layer provides an entry point, checking hardware
  capabilities and initializing the processors, memory, and interrupts.
  Relocation of the hypervisor image and derivation of encryption seeds are
  also supported by this component.

- A hardware management and utilities layer provides services for managing
  physical resources at runtime. Examples include handling physical
  interrupts and low-power state changes.

- A layer sitting on top of hardware management enables virtual CPUs (or
  vCPUs), leveraging Intel VT. A vCPU loop runs a vCPU in non-root mode and
  handles VM exit events triggered by the vCPU (see the sketch after this
  list). This layer handles CPU- and memory-related VM exits and provides a
  way to inject exceptions or interrupts into a vCPU.

- On top of vCPUs are three components for device emulation: one for
  emulation inside the hypervisor, another for communicating with the
  Service VM for mediation, and a third for managing passthrough devices.

- The highest layer is a VM management module providing VM lifecycle and
  power operations.

- A library component provides basic utilities for the rest of the
  hypervisor, including encryption algorithms, mutual-exclusion primitives,
  etc.
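
A minimal sketch of such a vCPU loop (hypothetical helper names; the real
ACRN code differs) alternates between entering non-root mode and dispatching
on the exit reason:

.. code-block:: c

   #include <stdint.h>

   struct vcpu;  /* per-vCPU state: VMCS, guest registers, pending events */

   /* Hypothetical primitives provided by the layers described above. */
   extern void     vmentry_run(struct vcpu *v);      /* VMLAUNCH/VMRESUME */
   extern uint32_t vmexit_reason(struct vcpu *v);    /* read from VMCS */
   extern int      handle_io_exit(struct vcpu *v);   /* PIO emulation */
   extern int      handle_ept_violation(struct vcpu *v); /* MMIO emulation */
   extern void     inject_pending_events(struct vcpu *v);

   /* Runs on the physical CPU dedicated to this vCPU. */
   static void vcpu_loop(struct vcpu *v)
   {
       for (;;) {
           inject_pending_events(v);   /* interrupts/exceptions, if any */
           vmentry_run(v);             /* guest executes until a VM exit */

           switch (vmexit_reason(v)) {
           case 30: /* VMX_EXIT_REASON_IO_INSTRUCTION */
               handle_io_exit(v);      /* may block on the Service VM */
               break;
           case 48: /* VMX_EXIT_REASON_EPT_VIOLATION */
               handle_ept_violation(v);
               break;
           default:
               /* CPUID, MSR access, interrupt window, etc. */
               break;
           }
       }
   }

Exit reason numbers 30 and 48 are the architectural basic exit reasons for
I/O instructions and EPT violations, respectively, per the Intel SDM.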

The hypervisor interacts with the Service VM in three ways: VM exits
(including hypercalls), upcalls, and the I/O request buffer. Interaction
between the hypervisor and the User VM is more restricted, including only VM
exits and hypercalls related to Trusty.

Service VM
**********

The Service VM is an important guest OS in the ACRN architecture. It runs in
non-root mode and contains many critical components, including the VM
Manager, the Device Model (DM), ACRN services, kernel mediators, and the
virtio and hypercall module (HSM). The DM manages the User VM and provides
device emulation for it. The Service VM also provides services for system
power lifecycle management through the ACRN service and VM Manager, and
services for system debugging through the ACRN log/trace tools.

DM
==

The Device Model (DM) is a user-level, QEMU-like application in the Service
VM responsible for creating the User VM and then performing device emulation
based on command-line configurations.

Based on the HSM kernel module, the DM interacts with the VM Manager to
create the User VM. It then emulates devices through full virtualization at
the DM user level, para-virtualization based on kernel mediators (such as
virtio and GVT), or passthrough based on kernel HSM APIs.

Refer to :ref:`hld-devicemodel` for more details.

VM Manager
==========

The VM Manager is a user-level service in the Service VM that handles User
VM creation and VM state management, according to application requirements
or system power operations.

The VM Manager creates the User VM based on the DM application, and manages
User VM states by interacting with the lifecycle service in the ACRN
service.

Refer to :ref:`hv-vm-management` for more details.

ACRN Service
============

The ACRN service provides system lifecycle management based on IOC polling.
It communicates with the VM Manager to handle User VM states, such as S3 and
power-off.

HSM
===

The Hypervisor Service Module (HSM) kernel module is the Service VM kernel
driver supporting User VM management and device emulation. The Device Model
follows the standard Linux char device API (ioctl) to access HSM
functionality. The HSM communicates with the ACRN hypervisor through
hypercalls or upcall interrupts.

Refer to :ref:`hld-devicemodelhsm` for more details.
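
As a hedged illustration of this char-device pattern (the ioctl commands and
parameter struct below are placeholders, not the exact ACRN definitions), a
user-level client such as the DM drives the HSM like this:

.. code-block:: c

   #include <fcntl.h>
   #include <stdio.h>
   #include <sys/ioctl.h>
   #include <unistd.h>

   struct vm_create_params {
       unsigned long memory_size;   /* bytes of guest RAM requested */
       unsigned int  vcpu_num;      /* number of virtual CPUs */
   };

   /* Placeholder ioctl numbers; the real ones live in the HSM UAPI header. */
   #define HSM_IOCTL_CREATE_VM  _IOW('H', 0x10, struct vm_create_params)
   #define HSM_IOCTL_START_VM   _IO('H', 0x11)

   int main(void)
   {
       /* Open the HSM char device exposed by the Service VM kernel. */
       int fd = open("/dev/acrn_hsm", O_RDWR);
       if (fd < 0) {
           perror("open");
           return 1;
       }

       struct vm_create_params params = {
           .memory_size = 2UL << 30,  /* 2 GB */
           .vcpu_num    = 1,
       };

       /* Each ioctl typically maps to one hypercall issued by the HSM. */
       if (ioctl(fd, HSM_IOCTL_CREATE_VM, &params) < 0)
           perror("create vm");
       else if (ioctl(fd, HSM_IOCTL_START_VM) < 0)
           perror("start vm");

       close(fd);
       return 0;
   }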

Kernel Mediators
================

Kernel mediators are kernel modules providing a para-virtualization method
for the User VMs, for example, the i915 GVT driver.

Log/Trace Tools
===============

The ACRN log/trace tools are user-level applications used to capture ACRN
hypervisor log and trace data. The HSM kernel module provides a middle layer
to support these tools.

Refer to :ref:`hld-trace-log` for more details.

User VM
*******

ACRN can boot Linux and Android guest OSes. For an Android guest OS, ACRN
provides a VM environment with two worlds: the normal world and the trusty
world. The Android OS runs in the normal world; the trusty OS and
security-sensitive applications run in the trusty world. The trusty world
can see the memory of the normal world, but the normal world cannot see the
trusty world.

Guest Physical Memory Layout - User VM E820
===========================================

The DM creates an E820 table for a User VM based on these simple rules (a
sketch of the computation follows the figure below):

- If the requested VM memory size is not larger than the low memory limit
  (2 GB, defined in the DM), then the low memory range is [0, requested VM
  memory size].

- If the requested VM memory size is larger than the low memory limit, then
  the low memory range is [0, 2 GB], and the high memory range is
  [4 GB, 4 GB + requested VM memory size - 2 GB].

.. figure:: images/over-image13.png
   :align: center

   User VM Physical Memory Layout
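
A minimal sketch of these two rules in C (constant and struct names are
illustrative):

.. code-block:: c

   #include <stdint.h>

   #define LOW_MEM_LIMIT  (2ULL << 30)   /* 2 GB low memory limit */
   #define HIGH_MEM_BASE  (4ULL << 30)   /* high memory starts at 4 GB */

   struct mem_range {
       uint64_t base;
       uint64_t size;   /* zero means the range is absent */
   };

   /* Compute the low and high RAM ranges for a requested guest size. */
   static void compute_e820_ranges(uint64_t requested,
                                   struct mem_range *low,
                                   struct mem_range *high)
   {
       if (requested <= LOW_MEM_LIMIT) {
           *low  = (struct mem_range){ 0, requested };
           *high = (struct mem_range){ 0, 0 };
       } else {
           *low  = (struct mem_range){ 0, LOW_MEM_LIMIT };
           *high = (struct mem_range){ HIGH_MEM_BASE,
                                       requested - LOW_MEM_LIMIT };
       }
   }

The 2 GB to 4 GB gap is typically reserved for MMIO regions such as the PCI
hole and local APIC, which is why high memory resumes at 4 GB.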

User VM Memory Allocation
=========================

By default, the DM allocates User VM memory using the hugetlb mechanism. The
real memory mapping may be scattered across the Service VM physical memory
space, as shown in :numref:`overview-mem-layout`:

.. figure:: images/over-image15.png
   :align: center
   :name: overview-mem-layout

   User VM Physical Memory Layout Based on Hugetlb

The User VM's memory is allocated by the Service VM DM application; it may
come from different huge pages in the Service VM, as shown in
:numref:`overview-mem-layout`.

As the Service VM knows the size of these huge pages, and both
GPA\ :sup:`service_vm` and GPA\ :sup:`user_vm`, it works with the hypervisor
to complete the User VM's host-to-guest mapping using this pseudo code:

.. code-block:: none

   for x in allocated huge pages do
      x.hpa = gpa2hpa_for_service_vm(x.service_vm_gpa)
      host2guest_map_for_user_vm(x.hpa, x.user_vm_gpa, x.size)
   end
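
A hedged sketch of the allocation side in C (the translation and mapping
calls mirror the pseudo code above and are hypothetical wrappers around HSM
ioctls/hypercalls, not real APIs):

.. code-block:: c

   #define _GNU_SOURCE
   #include <stdint.h>
   #include <sys/mman.h>

   /* Hypothetical wrappers corresponding to the pseudo code above:
    * translate a pinned Service VM buffer to its host physical address,
    * then install the host-to-guest mapping for the User VM. */
   extern uint64_t gpa2hpa_for_service_vm(void *service_vm_va);
   extern int host2guest_map_for_user_vm(uint64_t hpa,
                                         uint64_t user_vm_gpa,
                                         uint64_t size);

   #define HUGE_PAGE_SIZE (2UL << 20)   /* assume 2 MB huge pages */

   /* Back one huge page of User VM RAM at the given guest address. */
   static int back_one_huge_page(uint64_t user_vm_gpa)
   {
       /* Service VM address range backed by a huge page. */
       void *va = mmap(NULL, HUGE_PAGE_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
       if (va == MAP_FAILED)
           return -1;

       /* Install the mapping in the User VM's address space via the HV. */
       uint64_t hpa = gpa2hpa_for_service_vm(va);
       return host2guest_map_for_user_vm(hpa, user_vm_gpa, HUGE_PAGE_SIZE);
   }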

OVMF Bootloader
===============

Open Virtual Machine Firmware (OVMF) is the virtual bootloader supporting
EFI boot of the User VM on the ACRN hypervisor platform.

The VM Manager in the Service VM copies OVMF into User VM memory while
creating the User VM virtual BSP, and passes the OVMF start address and
related information to the HV. The HV sets the guest RIP of the User VM
virtual BSP to the start of OVMF, sets the related guest registers, and
launches the User VM virtual BSP. OVMF starts running in virtual real mode
within the User VM. Conceptually, OVMF is part of the User VM runtime.
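
A minimal sketch of that virtual BSP hand-off (hypothetical structures and
register-setting helpers; real-mode segment setup is deliberately
simplified):

.. code-block:: c

   #include <stdint.h>

   struct vcpu;  /* virtual BSP of the User VM */

   /* Hypothetical VMCS guest-state accessors. */
   extern void vcpu_set_rip(struct vcpu *v, uint64_t rip);
   extern void vcpu_set_cs_base(struct vcpu *v, uint64_t base);
   extern void vcpu_set_real_mode(struct vcpu *v);
   extern void vcpu_launch(struct vcpu *v);

   /* Point the virtual BSP at the OVMF image copied into guest memory. */
   static void launch_user_vm_bsp(struct vcpu *bsp, uint64_t ovmf_entry_gpa)
   {
       vcpu_set_real_mode(bsp);                    /* virtual real mode */
       vcpu_set_cs_base(bsp, ovmf_entry_gpa & ~0xFULL);
       vcpu_set_rip(bsp, ovmf_entry_gpa & 0xFULL); /* CS:IP addressing */
       vcpu_launch(bsp);                           /* first VM entry */
   }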

Freedom From Interference
*************************

The hypervisor is critical for preventing inter-VM interference, using the
following mechanisms:

- Each physical CPU is dedicated to one vCPU.

  CPU sharing is on the TODO list. With respect to inter-VM interference,
  sharing a physical CPU among multiple vCPUs gives rise to multiple sources
  of interference, such as the vCPU of one VM flushing the L1 and L2 caches
  used by another, or a flood of interrupts for one VM delaying the
  execution of another. It also requires the vCPU scheduling in the
  hypervisor to consider more complexities, such as scheduling latency and
  vCPU priority, exposing more opportunities for one VM to interfere with
  another.

  To prevent such interference, the ACRN hypervisor adopts static core
  partitioning, dedicating each physical CPU to one vCPU. The physical CPU
  loops in idle when its vCPU is paused by I/O emulation. This makes vCPU
  scheduling deterministic, and physical resource sharing is minimized.

- Hardware mechanisms including EPT, VT-d, SMAP, and SMEP are leveraged to
  prevent unintended memory accesses.

  Memory corruption can be a common failure mode. The ACRN hypervisor
  properly sets up the memory-related hardware mechanisms to ensure that:

  1. The Service VM cannot access the memory of the hypervisor, unless
     explicitly allowed.

  2. The User VM cannot access the memory of the Service VM or the
     hypervisor.

  3. The hypervisor does not unintentionally access the memory of the
     Service VM or User VM.

- The destination of an external interrupt is set to the physical core where
  the VM that handles it is running.

  External interrupts are always handled by the hypervisor in ACRN.
  Excessive interrupts to one VM (say VM A) could slow down another VM
  (VM B) if they were handled by the physical core running VM B instead of
  VM A. Two mechanisms are designed to mitigate such interference (a
  rate-limiting sketch follows this list):

  1. The destination of an external interrupt is set to the physical core
     that runs the vCPU where virtual interrupts will be injected.

  2. The hypervisor maintains statistics on the total number of received
     interrupts, exposed to the Service VM via a hypercall, and has a delay
     mechanism to temporarily block certain virtual interrupts from being
     injected. This allows the Service VM to detect an interrupt storm and
     control the interrupt injection rate when necessary.
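
As a hedged illustration of the delay mechanism (a simple token-bucket rate
limiter; ACRN's actual policy may differ), virtual interrupt injection could
be throttled per vCPU like this:

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Per-vCPU token bucket: allows short bursts, caps the sustained rate. */
   struct irq_rate_limit {
       uint64_t tokens;          /* injections currently allowed */
       uint64_t max_tokens;      /* burst capacity */
       uint64_t refill_per_ms;   /* sustained injections per millisecond */
       uint64_t last_refill_ms;  /* timestamp of the last refill */
   };

   /* Returns true if the virtual interrupt may be injected now;
    * otherwise the caller leaves it pending and retries later. */
   static bool irq_rate_limit_allow(struct irq_rate_limit *rl,
                                    uint64_t now_ms)
   {
       uint64_t elapsed = now_ms - rl->last_refill_ms;
       rl->tokens += elapsed * rl->refill_per_ms;
       if (rl->tokens > rl->max_tokens)
           rl->tokens = rl->max_tokens;
       rl->last_refill_ms = now_ms;

       if (rl->tokens == 0)
           return false;   /* storm detected: delay this injection */
       rl->tokens--;
       return true;
   }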

Boot Flow
*********

.. figure:: images/over-image85.png
   :align: center

.. figure:: images/over-image134.png
   :align: center

   ACRN Boot Flow

Power Management
****************

CPU P-State & C-State
=====================

In ACRN, CPU P-states and C-states (Px/Cx) are controlled by the guest OS.
The corresponding governors are managed in the Service VM or User VM for
best power efficiency and simplicity.

Guests should be able to process the ACPI P-state and C-state requests from
OSPM. The ACPI objects needed for P-state and C-state management should be
ready in an ACPI table.

The hypervisor can restrict a guest's P-state and C-state requests (per
customer requirements). MSR accesses for P-state requests can be intercepted
by the hypervisor and forwarded to the host directly if the requested
P-state is valid. Guest MWAIT or port I/O accesses for C-state control can
be passed through to the host with no hypervisor interception to minimize
performance impact.
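
A minimal sketch of that P-state MSR interception (the validation logic and
helper names are hypothetical; ``IA32_PERF_CTL`` at 0x199 is the
architectural P-state request MSR):

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   #define MSR_IA32_PERF_CTL 0x199U

   struct vcpu;

   /* Hypothetical helpers for this sketch. */
   extern bool pstate_request_is_valid(struct vcpu *v, uint64_t val);
   extern void host_wrmsr(uint32_t msr, uint64_t val);
   extern void inject_gp(struct vcpu *v);  /* #GP for bad requests */

   /* Called from the WRMSR VM-exit handler. */
   static void handle_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
   {
       if (msr == MSR_IA32_PERF_CTL) {
           /* Forward only requests within the range permitted for
            * this guest; reject everything else. */
           if (pstate_request_is_valid(v, val))
               host_wrmsr(msr, val);
           else
               inject_gp(v);
           return;
       }
       /* ... other intercepted MSRs ... */
   }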

This diagram shows the CPU P-state and C-state management blocks:

.. figure:: images/over-image4.png
   :align: center

   CPU P-State and C-State Management Block Diagram

System Power State
==================

ACRN supports the ACPI standard-defined system-level power states S3 and S5.
For each guest, ACRN assumes the guest implements OSPM and controls its own
power state accordingly. ACRN doesn't involve itself in the guest's OSPM;
instead, it traps the power state transition requests from the guest and
emulates them.

.. figure:: images/over-image21.png
   :align: center
   :name: overview-pm-block

   ACRN Power Management Block Diagram

:numref:`overview-pm-block` shows the basic block diagram for ACRN power
management. The OSPM in each guest manages the guest power state transition.
The Device Model running in the Service VM traps and emulates the power
state transition of the User VM (the Linux VM or Android VM in
:numref:`overview-pm-block`). The VM Manager knows all User VM power states
and notifies the OSPM of the Service VM once the User VM is in the required
power state.

Then the OSPM of the Service VM starts the Service VM power state
transition, which is trapped to the "Sx Agency" in ACRN, which carries out
the actual transition.

Some details about the ACPI tables for the User VM and Service VM:

- The ACPI table in the User VM is emulated by the Device Model. The Device
  Model knows which register the User VM writes to trigger power state
  transitions, and must register an I/O handler for it.

- The ACPI table in the Service VM is passed through. There is no ACPI
  parser in the ACRN HV; the power-management-related ACPI table is
  generated offline and hard-coded in the ACRN HV.