.. _primer:

Developer Primer
################

This Developer Primer introduces the fundamental components of ACRN and
the virtualization technology used by this open source reference stack.
Code-level documentation and additional details can be found by
consulting the :ref:`acrn_apis` documentation and the `source code in
GitHub`_.

.. _source code in GitHub: https://github.com/projectacrn

The ACRN Hypervisor acts as a host with full control of the processor(s)
and the hardware (physical memory, interrupt management, and I/O). It
provides the User OS with an abstraction of a virtual platform, allowing
the guest to behave as if it were executing directly on a logical
processor.

.. _source tree structure:

Source Tree Structure
*********************

Understanding the ACRN hypervisor and the ACRN device model source tree
structure is helpful for locating the code associated with a particular
hypervisor or device-emulation feature.

The ACRN source code (and documentation) are maintained in the
https://github.com/projectacrn/acrn-hypervisor repo, with the
hypervisor, device model, tools, and documentation in their own
folders::

   acrn-hypervisor
   ├─ hypervisor
   ├─ devicemodel
   ├─ tools
   └─ doc

Here's a brief description of each of these source tree folders:

ACRN hypervisor source tree
===========================

**arch/x86/**
   hypervisor architecture, including the x86-related source files
   needed to run the hypervisor, covering CPU, memory, interrupt, and
   VMX support

**boot/**
   boot code, mainly ACPI related

**bsp/**
   board support package, used to support the NUC with UEFI

**common/**
   common hypervisor source files, including the VM hypercall
   definitions, VM main loop, and VM software loader

**debug/**
   debug-related source files (not compiled into the release version),
   mainly including the console, UART, logmsg, and shell

**include/**
   include files for all public APIs (doxygen comments in these source
   files are used to generate the :ref:`acrn_apis` documentation)

**lib/**
   runtime service libraries

ACRN Device Model source tree
=============================

**arch/x86/**
   architecture-specific source files needed for the device model

**core/**
   ACRN Device Model core logic (main loop, SOS interface, etc.)

**hw/**
   hardware emulation code, with the following subdirectories:

   **pci/**
      PCI devices, including VBS-Us (Virtio backend drivers in
      user-space)

   **platform/**
      platform devices such as the UART and keyboard

**include/**
   include files for all public APIs (doxygen comments in these source
   files are used to generate the :ref:`acrn_apis` documentation)

**samples/**
   scripts (included in the Clear Linux build) for setting up the
   network and launching the User OS on the platform

ACRN Tools source tree
======================

The tools folder holds source code for ACRN-provided tools such as:

**acrnlog**
   a userland tool to capture the log output from the currently running
   hypervisor, and from the previous run if the hypervisor crashed

**acrnctl**
   a utility to create, delete, list, launch, and stop a User OS (UOS)

**acrntrace**
   a Service OS (SOS) utility to capture trace data, plus scripts to
   analyze the collected data

ACRN documentation source tree
==============================

Project ACRN documentation is written using the reStructuredText markup
language (.rst file extension) with Sphinx extensions, and processed
using Sphinx to create a formatted stand-alone website (the one you're
reading now). Developers can view this content either in its raw form
as .rst markup files in the repo's doc folder, or they can generate the
HTML content and view it with a web browser directly on their
workstation, which is useful when contributing documentation to the
project.

**api/**
   ReST files for API document generation

**custom-doxygen/**
   customization files for doxygen-generated HTML output (we currently
   don't include the doxygen HTML output, but do use the doxygen XML
   output to feed into the Sphinx generation process)

**getting_started/**
   ReST files and images for the Getting Started Guide

**howtos/**
   ReST files and images for technical and process how-to articles

**images/**
   image files not specific to a document (logos, and such)

**introduction/**
   ReST files and images for the Introduction to Project ACRN

**primer/**
   ReST files and images for this Developer Primer

**scripts/**
   files used to assist building the documentation set

**static/**
   Sphinx folder for extras added to the generated output (such as
   custom CSS additions)

**_templates/**
   Sphinx configuration updates for the standard read-the-docs
   templates used to format the generated HTML output

CPU virtualization
******************

The ACRN hypervisor uses static partitioning of the physical CPU cores,
providing each User OS a virtualized environment containing at least
one statically assigned physical CPU core. The CPUID features of a
partitioned physical core are the same as the native CPU features. CPU
power management (Cx/Px) is managed by the User OS.

The supported Intel |reg| NUC platform (see :ref:`hardware`) has a CPU
with four cores. The Service OS is assigned one core, and the other
three cores are assigned to the User OS. The ``XSAVE`` and ``XRSTOR``
instructions (used to perform a full save/restore of the extended state
in the processor to/from memory) are currently not supported in the
User OS, so the kernel boot parameters must specify ``noxsave``.
Processor core sharing among User OSes is planned for a future release.

The following sections introduce CPU virtualization related
concepts and technologies.

Host GDT
========

The ACRN hypervisor initializes the host Global Descriptor Table (GDT),
used to define the characteristics of the various memory areas during
program execution. Code Segment ``CS:0x8`` and Data Segment ``DS:0x10``
are configured as hypervisor selectors, with their settings in the host
GDT as shown in :numref:`host-gdt`:

.. figure:: images/primer-host-gdt.png
   :align: center
   :name: host-gdt

   Host GDT

Host IDT
========

The ACRN hypervisor installs interrupt gates for both exceptions and
interrupt vectors, which means that delivering an exception or
interrupt automatically disables interrupts. The
``HOST_GDT_RING0_CODE_SEL`` selector is used in the host IDT table.

Guest SMP Booting
=================

The Bootstrap Processor (BSP) vCPU of the User OS boots into x64 long
mode directly, while the Application Processor (AP) vCPUs boot into
real mode. The virtualized Local Advanced Programmable Interrupt
Controller (vLAPIC) for the User OS in the hypervisor emulates the
INIT/STARTUP signals.

An AP vCPU belonging to the User OS begins in an infinite loop, waiting
for an INIT signal. Once the User OS issues a Startup IPI (SIPI) signal
to another vCPU, the vLAPIC traps the request, resets the target vCPU,
and then enters the ``INIT->STARTUP#1->STARTUP#2`` cycle to boot the
vCPUs for the User OS.

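To illustrate the cycle, here is a simplified C sketch of how a vLAPIC
might react to INIT and SIPI requests. The names (``vcpu_t``,
``reset_vcpu``, ``start_vcpu_real_mode``) are hypothetical stand-ins
for the corresponding hypervisor internals, not ACRN's actual code:

.. code-block:: c

   #include <stdint.h>
   #include <stdio.h>

   typedef enum { VCPU_WAIT_INIT, VCPU_WAIT_SIPI, VCPU_RUNNING } vcpu_state_t;

   typedef struct {
       int id;
       vcpu_state_t state;
   } vcpu_t;

   /* Stubs standing in for the real hypervisor operations. */
   static void reset_vcpu(vcpu_t *v)
   {
       printf("vcpu%d: reset\n", v->id);
   }

   static void start_vcpu_real_mode(vcpu_t *v, uint32_t start_ip)
   {
       printf("vcpu%d: start in real mode at 0x%x\n", v->id, start_ip);
   }

   /* Called when the guest writes an INIT or STARTUP IPI to the vLAPIC ICR. */
   void vlapic_deliver_ipi(vcpu_t *target, int is_init, uint8_t sipi_vector)
   {
       if (is_init) {
           reset_vcpu(target);                 /* INIT: reset the target AP */
           target->state = VCPU_WAIT_SIPI;
       } else if (target->state == VCPU_WAIT_SIPI) {
           /* First SIPI: run the AP in real mode at (vector << 12);
            * a second SIPI to an already-running vCPU is ignored. */
           start_vcpu_real_mode(target, (uint32_t)sipi_vector << 12);
           target->state = VCPU_RUNNING;
       }
   }
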
VMX configuration
=================

The ACRN hypervisor uses the Virtual Machine Extensions (VMX)
configuration shown in :numref:`VMX_MSR` below. (These configuration
settings may change in the future, according to virtualization
policies.)

.. table:: VMX Configuration
|
||
:align: center
|
||
:widths: auto
|
||
:name: VMX_MSR
|
||
|
||
+----------------------------------------+----------------+---------------------------------------+
|
||
| **VMX MSR** | **Bits** | **Description** |
|
||
+========================================+================+=======================================+
|
||
| **MSR\_IA32\_VMX\_PINBASED\_CTLS** | Bit0 set | Enable External IRQ VM Exit |
|
||
+ +----------------+---------------------------------------+
|
||
| | Bit6 set | Enable HV pre-40ms Preemption timer |
|
||
+ +----------------+---------------------------------------+
|
||
| | Bit7 clr | Post interrupt did not support |
|
||
+----------------------------------------+----------------+---------------------------------------+
|
||
| **MSR\_IA32\_VMX\_PROCBASED\_CTLS** | Bit25 set | Enable I/O bitmap |
|
||
+ +----------------+---------------------------------------+
|
||
| | Bit28 set | Enable MSR bitmap |
|
||
+ +----------------+---------------------------------------+
|
||
| | Bit19,20 set | Enable CR8 store/load |
|
||
+----------------------------------------+----------------+---------------------------------------+
|
||
| **MSR\_IA32\_VMX\_PROCBASED\_CTLS2** | Bit1 set | Enable EPT |
|
||
+ +----------------+---------------------------------------+
|
||
| | Bit7 set | Allow guest real mode |
|
||
+----------------------------------------+----------------+---------------------------------------+
|
||
| **MSR\_IA32\_VMX\_EXIT\_CTLS** | Bit15 | VMX Exit auto ack vector |
|
||
+ +----------------+---------------------------------------+
|
||
| | Bit18,19 | MSR IA32\_PAT save/load |
|
||
+ +----------------+---------------------------------------+
|
||
| | Bit20,21 | MSR IA32\_EFER save/load |
|
||
+ +----------------+---------------------------------------+
|
||
| | Bit9 | 64-bit mode after VM Exit |
|
||
+----------------------------------------+----------------+---------------------------------------+
|
||
|
||
|
||
CPUID and Guest TSC calibration
===============================

User OS accesses to CPUID are trapped by the ACRN hypervisor; however,
the hypervisor passes most of the native CPUID information through to
the guest, except for the virtualized CPUID leaf 0x1 (used to provide a
fake x86_model).

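As an illustration, a trap-and-passthrough CPUID handler could look
like the following simplified C sketch. The ``guest_cpuid`` entry point
and the fake model value are hypothetical, not the actual ACRN
implementation:

.. code-block:: c

   #include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */

   /* Sketch of a CPUID VM Exit handler: pass native leaves through,
    * but rewrite leaf 0x1 to present a fake x86 model to the guest. */
   static void guest_cpuid(unsigned int leaf, unsigned int subleaf,
                           unsigned int *eax, unsigned int *ebx,
                           unsigned int *ecx, unsigned int *edx)
   {
       __get_cpuid_count(leaf, subleaf, eax, ebx, ecx, edx);

       if (leaf == 0x1U) {
           /* EAX bits 7:4 hold the model, bits 19:16 the extended
            * model; overwrite them with a virtualized model number. */
           *eax &= ~((0xFU << 4) | (0xFU << 16));
           *eax |= (0x7U << 4);   /* fake x86_model = 7 (illustrative) */
       }
   }
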
The Time Stamp Counter (TSC) is a 64-bit register present on all x86
processors that counts the number of cycles since reset. The ACRN
hypervisor also virtualizes ``MSR_PLATFORM_INFO`` and
``MSR_ATOM_FSB_FREQ``, so the guest can calibrate its TSC frequency
from these MSRs instead of measuring it.

RDTSC/RDTSCP
============

User OS vCPU reads of ``RDTSC``, ``RDTSCP``, or ``MSR_IA32_TSC_AUX`` do
not cause a VM Exit to the hypervisor. Thus, the vCPU ID provided by
``MSR_IA32_TSC_AUX`` can be changed by the User OS.

The ``RDTSCP`` instruction is widely used by the ACRN hypervisor to
identify the current CPU (and read the current value of the processor's
time-stamp counter). Because there is no VM Exit for the
``MSR_IA32_TSC_AUX`` register, the hypervisor saves and restores the
``MSR_IA32_TSC_AUX`` value on every VM Exit and VM Enter. Until the
hypervisor has restored the host value, it must not use a ``RDTSCP``
instruction, because it would return the vCPU ID instead of the host
CPU ID.

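The save/restore dance could be sketched like this in C. The helper
names and the context structure are hypothetical, and the MSR accessors
here operate on a simulated variable so the sketch is self-contained; a
real hypervisor implements them with the RDMSR/WRMSR instructions:

.. code-block:: c

   #include <stdint.h>

   #define MSR_IA32_TSC_AUX 0x103U

   /* Simulated MSR storage so this sketch is self-contained. */
   static uint64_t fake_tsc_aux_msr;
   static uint64_t rdmsr(uint32_t msr) { (void)msr; return fake_tsc_aux_msr; }
   static void wrmsr(uint32_t msr, uint64_t v) { (void)msr; fake_tsc_aux_msr = v; }

   struct vcpu_ctx {
       uint64_t guest_tsc_aux;   /* guest's MSR_IA32_TSC_AUX value */
       uint64_t host_tsc_aux;    /* host CPU ID kept in TSC_AUX    */
   };

   /* On VM Exit: stash the guest value, restore the host CPU ID.
    * RDTSCP must not be used before the host value is back. */
   static void on_vmexit(struct vcpu_ctx *v)
   {
       v->guest_tsc_aux = rdmsr(MSR_IA32_TSC_AUX);
       wrmsr(MSR_IA32_TSC_AUX, v->host_tsc_aux);
   }

   /* On VM Enter: give the guest its own TSC_AUX value back. */
   static void on_vmenter(struct vcpu_ctx *v)
   {
       wrmsr(MSR_IA32_TSC_AUX, v->guest_tsc_aux);
   }
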
CR Register virtualization
==========================

A guest CR8 access causes a VM Exit and is emulated in the hypervisor
so the vLAPIC can update its PPR register. Guest accesses to CR3 do not
cause a VM Exit.

MSR BITMAP
==========

In the ACRN hypervisor, only these model-specific registers (MSRs) are
intercepted and emulated:

**MSR_IA32_TSC_DEADLINE**
   emulates the Guest TSC deadline timer

**MSR_PLATFORM_INFO**
   emulates a fake x86 model

**MSR_ATOM_FSB_FREQ**
   provides the CPU frequency directly via this MSR to avoid TSC
   calibration

I/O BITMAP
==========

All User OS I/O port accesses are trapped into the ACRN hypervisor by
default. Most Service OS I/O port accesses are not trapped into the
ACRN hypervisor, allowing the Service OS direct access to the hardware
ports.

The Service OS I/O trap policy is:

**0x3F8/0x3FC**
   trapped, for the vUART emulated inside the hypervisor (SOS only)

**0x20/0xA0/0x460**
   trapped, for the vPIC emulation in the hypervisor

**0xCF8/0xCFC**
   trapped, for hypervisor PCI device interception

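Conceptually, the VMX I/O bitmaps are two 4 KiB pages where each bit
selects whether an access to one port traps. A minimal sketch of
marking the ports above as trapped might look like this (the bitmap
layout follows the Intel SDM; the helper names are hypothetical):

.. code-block:: c

   #include <stdint.h>
   #include <string.h>

   /* VMX uses two 4 KiB I/O bitmaps: A covers ports 0x0000-0x7FFF,
    * B covers 0x8000-0xFFFF. One bit per port; bit set => VM Exit. */
   static uint8_t io_bitmap_a[4096];
   static uint8_t io_bitmap_b[4096];

   static void trap_io_port(uint16_t port)
   {
       uint8_t *bitmap = (port < 0x8000U) ? io_bitmap_a : io_bitmap_b;
       uint16_t off = port & 0x7FFFU;
       bitmap[off / 8U] |= (uint8_t)(1U << (off % 8U));
   }

   /* Sketch of the SOS trap policy described above: leave most ports
    * untrapped, then mark the emulated ones. */
   static void setup_sos_io_bitmap(void)
   {
       memset(io_bitmap_a, 0, sizeof(io_bitmap_a));   /* pass through */
       memset(io_bitmap_b, 0, sizeof(io_bitmap_b));

       trap_io_port(0x3F8U);    /* vUART (SOS only)          */
       trap_io_port(0x3FCU);
       trap_io_port(0x20U);     /* vPIC                      */
       trap_io_port(0xA0U);
       trap_io_port(0x460U);
       trap_io_port(0xCF8U);    /* PCI device interception   */
       trap_io_port(0xCFCU);
   }
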
Exceptions
==========

The User OS handles its own exceptions inside the VM, including page
faults, #GP, and so on. #MC and #DB exceptions cause a VM Exit and are
reported on the ACRN hypervisor console.

Memory virtualization
*********************

The ACRN hypervisor provides memory virtualization by using a static
partition of system memory. Each virtual machine owns its own
contiguous partition of memory, with the Service OS staying in lower
memory and the User OS instances in high memory. (High memory is memory
that is not permanently mapped in the kernel address space, while low
memory is always mapped, so you can access it in the kernel simply by
dereferencing a pointer.) In future implementations, this will evolve
to utilize EPT/VT-d.

ACRN hypervisor memory is not visible to any User OS. In the ACRN
hypervisor, there are a few kinds of memory accesses that need to work
efficiently:

- the ACRN hypervisor accessing host memory
- a VM's vCPUs accessing guest memory
- a VM's vCPUs accessing host memory
- a VM's vCPUs accessing MMIO memory

The rest of this section introduces how these kinds of memory accesses
are managed. It gives an overview of the physical memory layout,
Paravirtualization (MMU) memory mapping in the hypervisor and VMs, and
Host-Guest Extended Page Table (EPT) memory mapping for each VM.

Physical Memory Layout
======================

An example physical memory layout for the Service OS and User OS is
shown in :numref:`primer-mem-layout` below:

.. figure:: images/primer-mem-layout.png
   :align: center
   :name: primer-mem-layout

   Memory Layout

:numref:`primer-mem-layout` shows an example of the physical memory
layout of the Service and User OS. The Service OS accepts the whole
e820 table (all usable memory address ranges not reserved for use by
the BIOS) after the hypervisor's own memory is filtered out. From the
SOS's point of view, it takes control of all available physical memory
not used by the hypervisor (or BIOS), including User OS memory. Each
User OS's memory is allocated from (high) SOS memory, and each User OS
controls only its own section of memory.

Some of the physical memory of a 32-bit machine must be sacrificed by
hiding it so that memory-mapped I/O (MMIO) devices have room to
communicate. This creates an MMIO hole: VMs may access some ranges of
MMIO addresses directly to communicate with devices, or they may need
the hypervisor to trap some ranges of MMIO to do device emulation. This
access control is done through EPT mapping.

PV (MMU) Memory Mapping in the Hypervisor
=========================================

.. figure:: images/primer-pv-mapping.png
   :align: center
   :name: primer-pv-mapping

   ACRN Hypervisor PV Mapping Example

The ACRN hypervisor is trusted and can access and control all system
memory, as shown in :numref:`primer-pv-mapping`. Because the hypervisor
runs in protected mode, an MMU page table must be prepared for its PV
translation. To simplify things, the PV translation page table is set
up as a 1:1 mapping. Some MMIO range mappings could be removed if they
are not needed. This PV page table is created when the hypervisor
memory is first initialized.

PV (MMU) Memory Mapping in VMs
==============================

As mentioned earlier, the primary vCPU starts to run in protected mode
when its VM is started. But before it begins, a temporary PV (MMU) page
table must be prepared.

This page table is a 1:1 mapping for 4 GB, and it only lives for a
short time when the vCPU first runs. After the vCPU starts to run its
kernel image (for example, Linux\*), the kernel creates its own PV page
tables, after which the temporary page table becomes obsolete.

Host-Guest (EPT) Memory Mapping
===============================

The VMs (both SOS and UOS) need an Extended Page Table (EPT) to access
host physical memory based on their guest physical memory. The guest
VMs also need MMIO traps that trigger EPT violations for device
emulation (such as the IOAPIC and LAPIC). This memory layout is shown
in :numref:`primer-sos-ept-mapping`:

.. figure:: images/primer-sos-ept-mapping.png
   :align: center
   :name: primer-sos-ept-mapping

   SOS EPT Mapping Example

The SOS takes control of all the host physical memory space: its EPT
mapping covers almost all of host memory, except the memory reserved
for the hypervisor (HV) and a few MMIO trap ranges for IOAPIC and LAPIC
emulation. The guest-to-host mapping for the SOS is 1:1.

.. figure:: images/primer-uos-ept-mapping.png
   :align: center
   :name: primer-uos-ept-mapping

   UOS EPT Mapping Example

For the UOS, however, the memory EPT mapping is linear but with an
offset (as shown in :numref:`primer-uos-ept-mapping`). The MMIO hole is
left unmapped to trap all MMIO accesses from the UOS (with emulation
done in the device model). To support pass-through devices in the
future, some MMIO range mappings may be added.

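The UOS mapping can be pictured as a simple translation function. The
following sketch uses illustrative constants, not ACRN's actual layout:
a linear offset translation that refuses to map the MMIO hole so those
accesses fault into the emulation path:

.. code-block:: c

   #include <stdint.h>
   #include <stdbool.h>

   /* Illustrative layout constants, not ACRN's actual values. */
   #define UOS_GPA_BASE   0x00000000ULL    /* guest physical base  */
   #define UOS_HPA_BASE   0x100000000ULL   /* host offset for UOS  */
   #define UOS_MEM_SIZE   0x40000000ULL    /* 1 GB of guest RAM    */
   #define MMIO_HOLE_LOW  0xC0000000ULL    /* 32-bit MMIO hole     */
   #define MMIO_HOLE_HIGH 0x100000000ULL

   /* Linear guest-to-host translation with an unmapped MMIO hole.
    * Returns false when the GPA must instead trap for emulation. */
   static bool uos_gpa_to_hpa(uint64_t gpa, uint64_t *hpa)
   {
       if (gpa >= MMIO_HOLE_LOW && gpa < MMIO_HOLE_HIGH)
           return false;                   /* EPT violation path   */
       if (gpa >= UOS_GPA_BASE + UOS_MEM_SIZE)
           return false;                   /* outside guest RAM    */
       *hpa = gpa + (UOS_HPA_BASE - UOS_GPA_BASE);
       return true;
   }
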
Graphic mediation
*****************

Intel |reg| Graphics Virtualization Technology (Intel |reg| GVT-g)
provides GPU sharing capability to multiple VMs by using a mediated
pass-through technique. This allows a VM to access performance-critical
I/O resources (usually partitioned) directly, without intervention from
the hypervisor in most cases.

Privileged operations from such a VM are trapped and emulated to
provide secure isolation among VMs. The hypervisor must ensure that no
vulnerability is exposed when assigning performance-critical resources
to each VM. When a performance-critical resource cannot be partitioned,
a scheduler must be implemented (either in software or hardware) to
allow time-based sharing among multiple VMs. In this case, the device
must allow the hypervisor to save and restore the hardware state
associated with the shared resource, either through direct I/O register
read/write (when there is no software-invisible state) or through a
device-specific context save/restore mechanism (when there is a
software-invisible state).

In the initial release of Project ACRN, graphics mediation is not
enabled; it is planned for a future release.

I/O emulation
*************

The I/O path is explained in the :ref:`ACRN-io-mediator` section of the
:ref:`introduction`. The following sections provide an additional
introduction to device assignment management and the PIO/MMIO trap
flow.

Device Assignment Management
============================

The ACRN hypervisor manages device assignment. Since the hypervisor
owns all native vectors and IRQs, there must be a mapping table to
translate guest IRQs/vectors to host IRQs/vectors. Currently, all
devices except the UART are assigned to VM0.

If a PCI device (with MSI/MSI-x) is assigned to a guest, the User OS
will program the PCI config space and set the guest vector for this
device. A hypercall, ``CWP_VM_PCI_MSIX_FIXUP``, is provided: once the
guest programs the guest vector, the User OS may call this hypercall to
notify the ACRN hypervisor. The hypervisor then allocates a host
vector, creates a guest-host mapping relation, and replaces the guest
vector with a real native vector for the device:

**PCI MSI/MSI-X**
   PCI Message Signaled Interrupt (MSI/MSI-x) device assignment can be
   triggered by a hypercall when a guest programs the vectors. All PCI
   devices are programmed with real vectors allocated by the
   hypervisor.

**PCI/INTx**
   Device assignment is triggered when the guest programs the virtual
   I/O Advanced Programmable Interrupt Controller (vIOAPIC)
   Redirection Table Entries (RTE).

**Legacy**
   Legacy devices are assigned to VM0.

User OS device assignment is similar to the above, except the User OS
doesn't issue the hypercall itself. Instead, the guest's programming of
the PCI configuration space is trapped into the Device Model, and the
Device Model may issue the hypercall to notify the hypervisor that the
guest vector is changing.

Currently, two types of I/O emulation are supported: MMIO and PORTIO
trap handling. MMIO emulation is triggered by an EPT violation VM Exit
only. If an EPT misconfiguration VM Exit occurs, the hypervisor halts
the system. (Because the hypervisor sets up all EPT page table mappings
at the beginning of guest boot, there should never be an EPT
misconfiguration.)

There are multiple places where I/O emulation can happen: in the ACRN
hypervisor, in the Service OS kernel VHM module, or in the Service OS
userland ACRN Device Model.

PIO/MMIO trap Flow
==================

Here is a description of the PIO/MMIO trap flow:

#. Instruction decoder: get the Guest Physical Address (GPA) from the
   VM Exit, going through the gla2gpa() page walker if necessary.

#. Emulate the instruction. Here the hypervisor does an address range
   check to see whether the hypervisor itself is interested in this
   I/O port or MMIO GPA access.

#. The hypervisor emulates only the vLAPIC, vIOAPIC, vPIC, and vUART
   (the vUART for the Service OS only). Any other emulation requests
   are forwarded to the SOS for handling. The vCPU raising the I/O
   request is halted until the I/O request is processed successfully.
   An IPI is sent to vCPU0 of the SOS to notify it that an I/O request
   is waiting for service.

#. The Service OS VHM module takes the I/O request and dispatches it
   to one of multiple clients. These clients can be the SOS
   kernel-space VBS-K, MPT, or the userland Device Model. The VHM I/O
   request server selects a default fallback client responsible for
   handling any I/O request not handled by other clients. (The Device
   Model is the default fallback client.) Each client registers its
   I/O range or specific PCI bus/device/function (BDF) numbers. If an
   I/O request falls into a client's range, the I/O request server
   sends the request to that client, as sketched below.

#. Multiple clients can be registered: the fallback client (the Device
   Model in userland), VBS-K clients, and MPT clients. Once the I/O
   request emulation completes, the client updates the request status
   and notifies the hypervisor through a hypercall. The hypervisor
   picks up that request, does any necessary cleanup, and resumes the
   guest vCPU.

Most I/O emulation tasks are done by the SOS CPU, and requests come
from UOS vCPUs.

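The dispatch in step 4 can be pictured as a range lookup over
registered clients with a fallback. This simplified C sketch uses
hypothetical types and names (``vhm_client``, ``dispatch_io_req``), not
the actual VHM API:

.. code-block:: c

   #include <stdint.h>
   #include <stddef.h>

   /* Hypothetical VHM-style client table; not the real VHM API. */
   struct io_req {
       uint64_t addr;                       /* port or MMIO GPA        */
   };

   struct vhm_client {
       uint64_t start, end;                 /* registered I/O range    */
       void (*handle)(struct io_req *req);  /* client handler          */
   };

   #define MAX_CLIENTS 8
   static struct vhm_client clients[MAX_CLIENTS];
   static size_t num_clients;
   static struct vhm_client *fallback;      /* the Device Model        */

   static void dispatch_io_req(struct io_req *req)
   {
       for (size_t i = 0; i < num_clients; i++) {
           if (req->addr >= clients[i].start && req->addr < clients[i].end) {
               clients[i].handle(req);      /* matched registered range */
               return;
           }
       }
       if (fallback != NULL)
           fallback->handle(req);           /* unclaimed => Device Model */
   }
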
Virtual interrupt
*****************

All interrupts received by the User OS come from a virtual interrupt
injected by a vLAPIC, vIOAPIC, or vPIC. All device emulation is done
inside the SOS userspace Device Model. However, for performance
reasons, the vLAPIC, vIOAPIC, and vPIC devices are emulated inside the
ACRN hypervisor directly. From the guest's point of view, the vPIC uses
Virtual Wire Mode via the vIOAPIC.

The symmetric I/O mode is shown in :numref:`primer-symmetric-io`:

.. figure:: images/primer-symmetric-io.png
   :align: center
   :name: primer-symmetric-io

   Symmetric I/O Mode

**Kernel boot param with vPIC**
   add ``maxcpus=0`` to the User OS kernel parameters to use the PIC

**Kernel boot param with vIOAPIC**
   add ``maxcpus=1`` (any value other than 0) and the User OS will use
   the IOAPIC, keeping IOAPIC pin2 as the source of PIC interrupts

Virtual LAPIC
=============

The LAPIC (Local Advanced Programmable Interrupt Controller) is
virtualized for the SOS and UOS. The vLAPIC is currently emulated by a
guest MMIO trap on the GPA address range 0xFEE00000 - 0xFEF00000
(1 MB). The ACRN hypervisor will support APICv and posted interrupts in
a future release.

The vLAPIC provides the same features as a native LAPIC:

- Mask/unmask vectors
- Inject virtual vectors (level or edge trigger mode) to a vCPU
- Notify the vIOAPIC of EOI processing
- Provide TSC timer service
- Support CR8 to update the TPR
- INIT/STARTUP handling

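Vector injection can be pictured as setting a bit in the vLAPIC's
256-bit Interrupt Request Register (IRR) and then kicking the target
vCPU. The sketch below uses hypothetical names (``vlapic_t``,
``kick_vcpu``) rather than ACRN's actual internals:

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical vLAPIC state: 256 interrupt vectors tracked in an
    * 8 x 32-bit IRR, as in a physical LAPIC. */
   typedef struct {
       uint32_t irr[8];       /* Interrupt Request Register      */
       uint32_t tmr[8];       /* Trigger Mode Register           */
       int vcpu_id;
   } vlapic_t;

   static void kick_vcpu(int vcpu_id) { (void)vcpu_id; /* wake vCPU */ }

   /* Inject a virtual vector; 'level' selects level vs. edge trigger. */
   static void vlapic_set_irq(vlapic_t *vlapic, uint8_t vector, int level)
   {
       uint32_t idx = vector >> 5;          /* which 32-bit word       */
       uint32_t bit = 1U << (vector & 31U); /* which bit in that word  */

       vlapic->irr[idx] |= bit;
       if (level)
           vlapic->tmr[idx] |= bit;         /* remember trigger mode   */
       else
           vlapic->tmr[idx] &= ~bit;

       kick_vcpu(vlapic->vcpu_id);          /* force injection attempt */
   }
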
Virtual IOAPIC
==============

A vIOAPIC is emulated by the hypervisor when the guest accesses the
MMIO GPA range 0xFEC00000 - 0xFEC01000. The vIOAPIC for the SOS
matches the pin numbers of the native hardware IOAPIC. The vIOAPIC for
the UOS provides only 24 pins. When a vIOAPIC pin is asserted, the
vIOAPIC calls vLAPIC APIs to inject the vector into the guest.

Virtual PIC
===========

A vPIC is required for TSC calibration. Normally the UOS boots with the
vIOAPIC; the vPIC is another source of external interrupts to the
guest. On every VM Exit, the hypervisor checks whether there are
pending external PIC interrupts.

Virtual Interrupt Injection
===========================

The source of virtual interrupts comes from either the Device Model or
from assigned devices:

**SOS assigned devices**
   Because all devices are assigned to the SOS directly, whenever a
   device's physical interrupt arrives, the corresponding virtual
   interrupt is injected into the SOS via the vLAPIC/vIOAPIC. In this
   case, the SOS doesn't use the vPIC and doesn't have emulated
   devices.

**UOS assigned devices**
   Only PCI devices are assigned to the UOS, and virtual interrupt
   injection works the same way as for the SOS: a virtual interrupt
   injection operation is triggered when a device's physical interrupt
   is triggered.

**UOS emulated devices**
   The Device Model (the userland device model) is responsible for the
   interrupt lifecycle management of UOS emulated devices. The Device
   Model knows when an emulated device needs to assert a virtual
   IOAPIC/PIC pin or needs to send a virtual MSI vector to the guest.
   This logic is entirely handled by the Device Model.

:numref:`primer-hypervisor-interrupt` shows how the hypervisor handles
interrupt processing and pending interrupts (acrn_do_intr_process):

.. figure:: images/primer-hypervisor-interrupt.png
   :align: center
   :name: primer-hypervisor-interrupt

   Hypervisor Interrupt handler

There are many cases where the guest RFLAGS.IF is cleared and
interrupts are disabled, so the hypervisor checks whether the guest IRQ
window is available before injection. An NMI is injected regardless of
the guest IRQ window status. If the IRQ window is not available, the
hypervisor enables ``MSR_IA32_VMX_PROCBASED_CTLS_IRQ_WIN``
(PROCBASED_CTRL.bit[2]) and does a VM Enter directly. The injection is
then performed on the next VM Exit, once the guest issues an STI
(guest RFLAGS.IF = 1).

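In C, this gate could be sketched roughly as follows. The field and
helper names are hypothetical; the control bit is the interrupt-window
exiting bit from the Intel SDM:

.. code-block:: c

   #include <stdint.h>
   #include <stdbool.h>

   #define RFLAGS_IF          (1UL << 9)   /* interrupt enable flag     */
   #define CTRL_IRQ_WIN_EXIT  (1U << 2)    /* interrupt-window exiting  */

   /* Hypothetical vCPU context; a real hypervisor reads these fields
    * from the VMCS. */
   struct vcpu {
       uint64_t guest_rflags;
       uint32_t procbased_ctrl;
       bool pending_nmi;
       bool pending_irq;
   };

   static void inject_pending_event(struct vcpu *v)
   {
       if (v->pending_nmi) {
           /* NMI injection ignores the guest IRQ window. */
           v->pending_nmi = false;
           /* ... program VMCS entry-interruption info as NMI ... */
           return;
       }

       if (v->pending_irq) {
           if (v->guest_rflags & RFLAGS_IF) {
               v->pending_irq = false;
               /* ... program VMCS entry-interruption info as ext IRQ ... */
           } else {
               /* IRQ window closed: request a VM Exit as soon as the
                * guest re-enables interrupts (STI), then inject there. */
               v->procbased_ctrl |= CTRL_IRQ_WIN_EXIT;
           }
       }
   }
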
VT-x and VT-d
*************

Since 2006, Intel CPUs have supported hardware-assisted virtualization
with VT-x instructions, where the CPU itself traps specific guest
instructions and register accesses directly into the VMM without the
need for binary translation (and modification) of the guest operating
system. Guest operating systems can run natively without modification,
although it is common to still install virtualization-aware
paravirtualized drivers into the guests to improve functionality. One
common example is access to storage via emulated SCSI devices.

Intel CPUs and chipsets support various Virtualization Technology (VT)
features, such as VT-x and VT-d. Physical events on the platform, such
as physical device interrupts, trigger CPU **VM Exits** (a trap into
the VMM).

In the ACRN hypervisor design, VT-d can be used to do DMA remapping,
such as address translation and isolation.
:numref:`primer-dma-address-mapping` is an example of address
translation:

.. figure:: images/primer-dma-address-mapping.png
   :align: center
   :name: primer-dma-address-mapping

   DMA address mapping

Hypercall
*********

The ACRN hypervisor currently supports fewer than a dozen
:ref:`hypercall_apis` and VHM upcall APIs to support the necessary VM
management, I/O request distribution, and guest memory mappings. The
hypervisor and Service OS (SOS) reserve vector 0xF4 for hypervisor
notification to the SOS. This upcall is necessary whenever device
emulation is required by the SOS. The upcall vector 0xF4 is injected
into SOS vCPU0.

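As an illustration, a VMX hypercall is ultimately just the ``vmcall``
instruction executed with an ID and parameters in agreed-upon
registers. The register convention below is purely illustrative, not
ACRN's actual ABI (see the :ref:`acrn_apis` documentation for that):

.. code-block:: c

   #include <stdint.h>

   /* Illustrative hypercall wrapper: executes VMCALL with an ID in RAX
    * and one parameter in RDI, returning the result from RAX. This
    * register convention is an example only, not ACRN's actual ABI. */
   static inline int64_t hypercall_1(uint64_t id, uint64_t param)
   {
       int64_t ret;

       __asm__ __volatile__("vmcall"
                            : "=a"(ret)
                            : "a"(id), "D"(param)
                            : "memory");
       return ret;
   }
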
Refer to the :ref:`acrn_apis` documentation for details.

Device emulation
****************

The ACRN Device Model emulates different kinds of platform devices,
such as RTC, LPC, UART, PCI devices, and the Virtio block device. The
most important part of device emulation is handling the I/O requests
from different devices. An I/O request can be a PIO, MMIO, or PCI CFG
SPACE access. For example:

- a CMOS RTC device may access 0x70/0x71 PIO to get the CMOS time,
- a GPU PCI device may access its MMIO or PIO BAR space to complete
  its frame buffer rendering, or
- the bootloader may access a PCI device's CFG SPACE for BAR
  reprogramming.

The ACRN Device Model also injects interrupts/MSIs to its frontend
devices when necessary: for example, an RTC device needs to get its
ALARM interrupt, and a PCI device with MSI capability needs to get its
MSI. The Device Model also provides a PIRQ routing mechanism for
platform devices.

Virtio Devices
**************

This section introduces the Virtio devices supported by ACRN.
Currently, all the back-end (BE) Virtio drivers are implemented using
the Virtio APIs, and the front-end (FE) drivers reuse the standard
Linux front-end Virtio drivers.

Virtio-rnd
==========

The Virtio-rnd entropy device supplies high-quality randomness for
guest use. The Virtio device ID of the Virtio-rnd device is 4, and it
supports one virtqueue of 64 entries (configurable in the source code).
No feature bits are defined.

When the FE driver requires random bytes, the BE device places bytes of
random data onto the virtqueue.

To launch the Virtio-rnd device, you can use the following command:

.. code-block:: bash

   ./acrn-dm -A -m 1168M \
     -s 0:0,hostbridge \
     -s 1,virtio-blk,./uos.img \
     -s 2,virtio-rnd \
     -k bzImage \
     -B "root=/dev/vda rw rootwait noxsave maxcpus=0 nohpet \
        console=hvc0 no_timer_check ignore_loglevel \
        log_buf_len=16M consoleblank=0 tsc=reliable" vm1

To verify the result on the User OS side, you can use the following
command:

.. code-block:: bash

   od /dev/random

Virtio-blk
==========

The Virtio-blk device is a simple virtual block device. The FE driver
places read, write, and other requests onto the virtqueue, so that the
BE driver can process them accordingly.

The Virtio device ID of the Virtio-blk is 2, and it supports one
virtqueue with 64 entries, configurable in the source code. The feature
bits supported by the BE device are as follows:

**VTBLK_F_SEG_MAX (bit 2)**
   maximum number of segments in a request is in seg_max

**VTBLK_F_BLK_SIZE (bit 6)**
   block size of the disk is in blk_size

**VTBLK_F_FLUSH (bit 9)**
   cache flush command support

**VTBLK_F_TOPOLOGY (bit 10)**
   device exports information on optimal I/O alignment

To use the Virtio-blk device, use the following command:

.. code-block:: bash

   ./acrn-dm -A -m 1168M \
     -s 0:0,hostbridge \
     -s 1,virtio-blk,./uos.img \
     -k bzImage -B "root=/dev/vda rw rootwait noxsave maxcpus=0 \
        nohpet console=hvc0 no_timer_check ignore_loglevel \
        log_buf_len=16M consoleblank=0 tsc=reliable" vm1

To verify the result, you should expect the User OS to boot
successfully.

Virtio-net
==========

The Virtio-net device is a virtual Ethernet device. The Virtio device
ID of the Virtio-net is 1. The Virtio-net device supports two
virtqueues, one for transmitting packets and the other for receiving
packets. The FE driver places empty buffers onto one virtqueue for
receiving packets, and enqueues outgoing packets onto the other
virtqueue for transmission. Currently the size of each virtqueue is
1000, configurable in the source code.

To access the external network from the User OS, an L2 virtual switch
should be created in the Service OS, and the BE driver is bonded to a
tap/tun device linked under the L2 virtual switch. See
:numref:`primer-virtio-net`:

.. figure:: images/primer-virtio-net.png
   :align: center
   :name: primer-virtio-net

   Accessing external network from User OS

Currently the feature bits supported by the BE device are:

**VIRTIO_NET_F_MAC (bit 5)**
   device has a given MAC address

**VIRTIO_NET_F_MRG_RXBUF (bit 15)**
   BE driver can merge receive buffers

**VIRTIO_NET_F_STATUS (bit 16)**
   configuration status field is available

**VIRTIO_F_NOTIFY_ON_EMPTY (bit 24)**
   device will issue an interrupt if it runs out of available
   descriptors on a virtqueue

To enable the Virtio-net device, use the following command:

.. code-block:: bash

   ./acrn-dm -A -m 1168M \
     -s 0:0,hostbridge \
     -s 1,virtio-blk,./uos.img \
     -s 2,virtio-net,tap0 \
     -k bzImage -B "root=/dev/vda rw rootwait noxsave maxcpus=0 \
        nohpet console=hvc0 no_timer_check ignore_loglevel \
        log_buf_len=16M consoleblank=0 tsc=reliable" vm1

To verify that the device works correctly, the external network should
be accessible from the User OS.

Virtio-console
==============

The Virtio-console device is a simple device for data input and output.
The Virtio device ID of the Virtio-console device is 3. A device can
have from one to 16 ports. Each port has a pair of input and output
virtqueues used to communicate information between the FE and BE
drivers. Currently the size of each virtqueue is 64, configurable in
the source code.

Similar to the Virtio-net device, the two virtqueues specific to a port
are a transmit virtqueue and a receive virtqueue. The FE driver places
empty buffers onto the receive virtqueue for incoming data, and
enqueues outgoing characters onto the transmit virtqueue.

Currently the feature bits supported by the BE device are:

**VTCON_F_SIZE (bit 0)**
   configuration columns and rows are valid

**VTCON_F_MULTIPORT (bit 1)**
   device supports multiple ports, and control virtqueues will be used

**VTCON_F_EMERG_WRITE (bit 2)**
   device supports emergency write

Virtio-console supports redirecting guest output to various backend
devices, including stdio, pty, and tty. Use the following syntax to
specify which backend to use:

.. code-block:: none

   virtio-console,[@]stdio|tty|pty:portname[=portpath][,[@]stdio|tty|pty:portname[=portpath]]

For example, to use stdio as a Virtio-console backend, use the
following command:

.. code-block:: bash

   ./acrn-dm -A -m 1168M \
     -s 0:0,hostbridge \
     -s 1,virtio-blk,./uos.img \
     -s 3,virtio-console,@stdio:stdio_port \
     -k bzImage -B "root=/dev/vda rw rootwait noxsave maxcpus=0 \
        nohpet console=hvc0 no_timer_check ignore_loglevel \
        log_buf_len=16M consoleblank=0 tsc=reliable" vm1

Then the user can log into the User OS:

.. code-block:: none

   Ubuntu 17.04 xubuntu hvc0
   xubuntu login: root
   Password:

To use pty as a Virtio-console backend, use the following command:

.. code-block:: bash

   ./acrn-dm -A -m 1168M \
     -s 0:0,hostbridge \
     -s 1,virtio-blk,./uos.img \
     -s 2,virtio-net,tap0 \
     -s 3,virtio-console,@pty:pty_port \
     -k ./bzImage -B "root=/dev/vda rw rootwait noxsave maxcpus=0 \
        nohpet console=hvc0 no_timer_check ignore_loglevel \
        log_buf_len=16M consoleblank=0 tsc=reliable" vm1 &

When ACRN-DM boots the User OS successfully, a log similar to the
following is shown:

.. code-block:: none

   **************************************************************
   virt-console backend redirected to /dev/pts/0
   **************************************************************

You can then use the following command to log in to the User OS:

.. code-block:: bash

   minicom -D /dev/pts/0

or

.. code-block:: bash

   screen /dev/pts/0