doc: add developer primer

Add the Developer Primer and its images, plus a tweak to figure
formatting. Also renamed from Hypervisor Primer to just Developer
Primer since the doc talks about the Device Model too.

Signed-off-by: David B. Kinder <david.b.kinder@intel.com>
David B. Kinder 2018-03-09 14:48:04 -08:00 committed by Jack Ren
parent df5c261362
commit b9b20fa6a8
15 changed files with 912 additions and 26 deletions

4
doc/.gitignore vendored

@@ -1,4 +1,4 @@
doxygen
_build
devicemodel
hypervisor
*.bak
*.sav

doc/hypervisor_primer/index.rst (deleted)

@@ -1,23 +0,0 @@
.. _hypervisor_primer:
Hypervisor Developer Primer
###########################
This Developer Primer introduces the fundamental components and
virtualization technology used by this open source reference hypervisor
stack. Code level documentation and additional details can be found by
consulting the :ref:`hypercall_apis` documentation and the source code
in GitHub.
The Hypervisor acts as a host with full control of the processor(s) and
the hardware (physical memory, interrupt management and I/O). It
provides the Guest OS with an abstraction of a virtual processor,
allowing the guest to think it is executing directly on a logical
processor.
.. _source tree structure:
Source Tree Structure
*********************
blah blah

doc/index.rst

@@ -27,7 +27,7 @@ Sections
introduction/index.rst
hardware.rst
getting_started/index.rst
hypervisor_primer/index.rst
primer/index.rst
release_notes.rst
contribute.rst
api/index.rst

(10 new binary image files added (the figures used by the primer); contents not shown in the diff.)

903
doc/primer/index.rst Normal file

@@ -0,0 +1,903 @@
.. _primer:
Developer Primer
################
This Developer Primer introduces the fundamental components of ACRN and
the virtualization technology used by this open source reference stack.
Code level documentation and additional details can be found by
consulting the :ref:`acrn_apis` documentation and the `source code in
GitHub`_.
.. _source code in GitHub: https://github.com/projectacrn
The ACRN Hypervisor acts as a host with full control of the processor(s)
and the hardware (physical memory, interrupt management and I/O). It
provides the User OS with an abstraction of a virtual platform, allowing
the guest to behave as if it were executing directly on a logical
processor.
.. _source tree structure:
Source Tree Structure
*********************
Understanding the ACRN hypervisor and the ACRN device model source tree
structure is helpful for locating the code associated with a particular
hypervisor or device emulation feature. The ACRN hypervisor and ACRN
device model source trees provide the following top-level directories:
ACRN hypervisor source tree
===========================
**arch/x86/**
hypervisor architecture; x86-specific source files for running the
hypervisor, covering CPU, memory, interrupt, and VMX support
**boot/**
boot-related code, mainly ACPI handling
**bsp/**
board support package, used to support the NUC with UEFI
**common/**
common hypervisor source files, including the VM hypercall
definitions, VM main loop, and VM software loader
**debug/**
debug-related source files (not compiled into release builds), mainly
the console, UART, logmsg, and shell support
**include/**
include files for all public APIs (doxygen comments in these source
files are used to generate the :ref:`acrn_apis` documentation)
**lib/**
runtime service libraries
ACRN Device Model source tree
=============================
**core/**
ACRN Device model core logic (main loop, SOS interface, etc.)
**hw/**
Hardware emulation code, with the following subdirectories:
**acpi/**
ACPI table generator.
**pci/**
PCI devices, including VBS-Us (virtio backend drivers in user-space).
**platform/**
platform devices, such as the UART and keyboard.
**include/**
include files for all public APIs (doxygen comments in these source
files are used to generate the :ref:`acrn_apis` documentation)
**samples/**
sample configuration files and scripts (for example, for launching a
User OS)
ACRN documentation source tree
==============================
Project ACRN documentation is written using the reStructuredText markup
language (.rst file extension) with Sphinx extensions, and processed
using Sphinx to create a formatted stand-alone website (the one you're
reading now). Developers can view this content either in its raw form as
.rst markup files in the acrn-documentation repo, or they can generate
the HTML content and view it with a web browser directly on their
workstation, which is useful when contributing documentation to the project.
**api/**
ReST files for API document generation
**custom-doxygen/**
Customization files for the doxygen-generated HTML output (we
currently don't publish the doxygen HTML output, but we do use the
doxygen XML output to feed the Sphinx-generation process)
**getting_started/**
ReST files and images for the Getting Started Guide
**primer/**
ReST files and images for the Developer Primer
**images/**
Image files not specific to a document (logos, and such)
**introduction/**
ReST files and images for the Introduction to Project ACRN
**scripts/**
Files used to assist building the documentation set
**static/**
Sphinx folder for extras added to the generated output (such as custom
CSS additions)
CPU virtualization
******************
The ACRN hypervisor uses static partitioning of the physical CPU cores,
providing each User OS a virtualized environment containing at least one
statically assigned physical CPU core. The CPUID features of a
partitioned physical core are the same as the native CPU features. CPU
power management (Cx/Px) is managed by the User OS.
The supported Intel |reg| NUC platform (see :ref:`hardware`) has a CPU
with four cores. The Service OS is assigned one core and the other three
cores are assigned to the User OS. ``XSAVE`` and ``XRSTOR`` instructions
(used to perform a full save/restore of the extended state in the
processor to/from memory) are currently not supported in the User OS.
(The kernel boot parameters must specify ``noxsave``). Processor core
sharing among User OSes is planned for a future release.
The following sections introduce CPU virtualization related
concepts and technologies.
Host GDT
========
The ACRN hypervisor initializes the host Global Descriptor Table (GDT),
used to define the characteristics of the various memory areas during
program execution. Code Segment ``CS:0x8`` and Data Segment ``DS:0x10``
are configured as Hypervisor selectors, with their settings in the host
GDT as shown in :numref:`host-gdt`:
.. figure:: images/primer-host-gdt.png
:align: center
:name: host-gdt
Host GDT
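As a concrete (and simplified) point of reference, the sketch below shows how such a three-entry host GDT could be expressed in C. This is a minimal illustration, not ACRN's actual definitions; the descriptor field values are assumptions chosen to match the ``CS:0x8``/``DS:0x10`` layout described above.

.. code-block:: c

   #include <stdint.h>

   /* Illustrative 64-bit segment descriptor layout (not ACRN's actual code). */
   struct gdt_descriptor {
       uint16_t limit_low;
       uint16_t base_low;
       uint8_t  base_mid;
       uint8_t  access;      /* type, S, DPL, P */
       uint8_t  limit_flags; /* limit[19:16], AVL, L, D/B, G */
       uint8_t  base_high;
   } __attribute__((packed));

   /* Hypothetical host GDT: selector 0x8 = ring-0 64-bit code, 0x10 = data. */
   static struct gdt_descriptor host_gdt[3] = {
       { 0, 0, 0, 0x00, 0x00, 0 },  /* 0x00: null descriptor          */
       { 0, 0, 0, 0x9B, 0x20, 0 },  /* 0x08: code, present, DPL0, L=1 */
       { 0, 0, 0, 0x93, 0x00, 0 },  /* 0x10: data, present, DPL0      */
   };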
Host IDT
========
The ACRN hypervisor installs interrupt gates for both exceptions and
interrupt vectors, which means interrupts are automatically disabled
when an exception or interrupt handler is entered. The
``HOST_GDT_RING0_CODE_SEL`` selector is used in the Host IDT table.
Guest SMP Booting
=================
The Bootstrap Processor (BSP) vCPU for the User OS boots into x64 long
mode directly, while the Application Processor (AP) vCPUs boot into
real mode. The virtualized Local Advanced Programmable Interrupt
Controller (vLAPIC) for the User OS in the hypervisor emulates the
INIT/STARTUP signals.
Each AP vCPU belonging to the User OS begins in an infinite loop, waiting
for an INIT signal. Once the User OS issues a Startup IPI (SIPI) signal
to another vCPU, the vLAPIC traps the request, resets the target vCPU,
and then goes through the ``INIT->STARTUP#1->STARTUP#2`` cycle to boot the
vCPUs for the User OS.
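The sketch below illustrates, in simplified C, the SIPI trap-and-boot step described above. The structure and function are hypothetical placeholders (ACRN's real vLAPIC code is more involved); it only shows how the SIPI vector selects the AP's real-mode entry point.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical vCPU state used only for this sketch. */
   struct vcpu {
       int      id;
       bool     launched;
       uint32_t start_rip;   /* real-mode entry point for the AP */
   };

   /* INIT -> STARTUP#1 -> STARTUP#2: the vLAPIC traps the SIPI issued by
    * the BSP, resets the target AP vCPU, and points it at the real-mode
    * trampoline encoded in the SIPI vector before letting it run.
    */
   static void vlapic_handle_sipi(struct vcpu *ap, uint8_t sipi_vector)
   {
       if (ap->launched)
           return;                    /* only the first SIPI takes effect */

       /* The vector selects a 4 KB-aligned real-mode start address. */
       ap->start_rip = (uint32_t)sipi_vector << 12;
       ap->launched  = true;          /* leave the wait-for-INIT loop */
   }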
VMX configuration
=================
The ACRN hypervisor uses the Virtual Machine Extensions (VMX)
configuration shown in :numref:`VMX_MSR` below. (These configuration
settings may change in the future, according to virtualization
policies.)
.. table:: VMX Configuration
:align: center
:widths: auto
:name: VMX_MSR

+----------------------------------------+----------------+---------------------------------------+
| **VMX MSR**                            | **Bits**       | **Description**                       |
+========================================+================+=======================================+
| **MSR\_IA32\_VMX\_PINBASED\_CTLS**     | Bit0 set       | Enable External IRQ VM Exit           |
+                                        +----------------+---------------------------------------+
|                                        | Bit6 set       | Enable HV pre-40ms Preemption timer   |
+                                        +----------------+---------------------------------------+
|                                        | Bit7 clr       | Posted interrupts not supported       |
+----------------------------------------+----------------+---------------------------------------+
| **MSR\_IA32\_VMX\_PROCBASED\_CTLS**    | Bit25 set      | Enable I/O bitmap                     |
+                                        +----------------+---------------------------------------+
|                                        | Bit28 set      | Enable MSR bitmap                     |
+                                        +----------------+---------------------------------------+
|                                        | Bit19,20 set   | Enable CR8 store/load                 |
+----------------------------------------+----------------+---------------------------------------+
| **MSR\_IA32\_VMX\_PROCBASED\_CTLS2**   | Bit1 set       | Enable EPT                            |
+                                        +----------------+---------------------------------------+
|                                        | Bit7 set       | Allow guest real mode                 |
+----------------------------------------+----------------+---------------------------------------+
| **MSR\_IA32\_VMX\_EXIT\_CTLS**         | Bit15          | VMX Exit auto ack vector              |
+                                        +----------------+---------------------------------------+
|                                        | Bit18,19       | MSR IA32\_PAT save/load               |
+                                        +----------------+---------------------------------------+
|                                        | Bit20,21       | MSR IA32\_EFER save/load              |
+                                        +----------------+---------------------------------------+
|                                        | Bit9           | 64-bit mode after VM Exit             |
+----------------------------------------+----------------+---------------------------------------+
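As background for the table above, VMX control fields are always constrained by the corresponding capability MSR: its low 32 bits report bits that must be 1, and its high 32 bits report bits that are allowed to be 1. The sketch below shows how a hypervisor might derive the pin-based controls; the ``rdmsr()`` helper is an assumption, and the code is illustrative rather than ACRN's implementation.

.. code-block:: c

   #include <stdint.h>

   #define MSR_IA32_VMX_PINBASED_CTLS  0x481U
   #define PINBASED_EXT_INTR_EXIT      (1U << 0)  /* Bit0: external IRQ VM Exit */
   #define PINBASED_PREEMPT_TIMER      (1U << 6)  /* Bit6: VMX preemption timer */

   /* Assumed helper; a real hypervisor has its own MSR access routines. */
   extern uint64_t rdmsr(uint32_t msr);

   static uint32_t build_pinbased_ctls(void)
   {
       uint64_t cap     = rdmsr(MSR_IA32_VMX_PINBASED_CTLS);
       uint32_t must_1  = (uint32_t)cap;          /* bits the CPU requires set  */
       uint32_t may_1   = (uint32_t)(cap >> 32);  /* bits the CPU allows to set */
       uint32_t desired = PINBASED_EXT_INTR_EXIT | PINBASED_PREEMPT_TIMER;

       /* Desired bits are OR-ed in, then clamped to what the CPU allows. */
       return (must_1 | desired) & may_1;
   }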
CPUID and Guest TSC calibration
===============================
User OS accesses to CPUID are trapped by the ACRN hypervisor; however,
the hypervisor passes most of the native CPUID information through to
the guest, except for the virtualized CPUID leaf 0x1 (which provides a
fake x86_model).
The Time Stamp Counter (TSC) is a 64-bit register present on all x86
processors that counts the number of cycles since reset. For guest TSC
calibration, the ACRN hypervisor also virtualizes ``MSR_PLATFORM_INFO``
and ``MSR_ATOM_FSB_FREQ``.
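A minimal sketch of this CPUID interception is shown below, assuming a hypothetical guest register structure; only the handling of leaf 0x1 (where the model field lives in EAX[7:4]) is shown, and the substituted model value is arbitrary.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical guest register context; ACRN's real structures differ. */
   struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

   static inline void native_cpuid(uint32_t leaf, struct cpuid_regs *r)
   {
       __asm__ volatile("cpuid"
                        : "=a"(r->eax), "=b"(r->ebx), "=c"(r->ecx), "=d"(r->edx)
                        : "a"(leaf), "c"(0));
   }

   /* Pass native CPUID values through, but rewrite the model field of
    * leaf 0x1 so the guest sees a virtualized x86_model.  The value 0xF
    * below is only an illustration.
    */
   static void emulate_cpuid(uint32_t leaf, struct cpuid_regs *r)
   {
       native_cpuid(leaf, r);
       if (leaf == 0x1U)
           r->eax = (r->eax & ~0xF0U) | (0xFU << 4);  /* fake model number */
   }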
RDTSC/RDTSCP
============
User OS vCPU reads of ``RDTSC``, ``RDTSCP``, or ``MSR_IA32_TSC_AUX``
do not cause a VM Exit to the hypervisor. Thus the vCPU ID provided by
``MSR_IA32_TSC_AUX`` can be changed by the User OS.
The ``RDTSCP`` instruction is widely used by the ACRN hypervisor to
identify the current CPU (and read the current value of the processor's
time-stamp counter). Because there is no VM Exit for the
``MSR_IA32_TSC_AUX`` register, the hypervisor saves and restores
the ``MSR_IA32_TSC_AUX`` value on every VM Exit and VM Entry. Until the
hypervisor has restored the host CPU ID, it must not use the ``RDTSCP``
instruction because it would return the vCPU ID instead of the host CPU ID.
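The save/restore rule above can be pictured with the short sketch below; the ``rdmsr()``/``wrmsr()`` helpers and the state structure are assumptions, not ACRN's actual code.

.. code-block:: c

   #include <stdint.h>

   #define MSR_IA32_TSC_AUX 0xC0000103U

   /* Assumed MSR access helpers. */
   extern uint64_t rdmsr(uint32_t msr);
   extern void     wrmsr(uint32_t msr, uint64_t val);

   struct vcpu_msr_state {
       uint64_t guest_tsc_aux;   /* vCPU ID seen by the guest */
       uint64_t host_tsc_aux;    /* physical CPU ID           */
   };

   /* Because TSC_AUX accesses do not VM Exit, the hypervisor swaps the
    * guest and host values around every VM Exit/Entry, and must not run
    * RDTSCP before the host value is back in place.
    */
   static void on_vm_exit(struct vcpu_msr_state *s)
   {
       s->guest_tsc_aux = rdmsr(MSR_IA32_TSC_AUX);   /* value the guest left   */
       wrmsr(MSR_IA32_TSC_AUX, s->host_tsc_aux);     /* restore host pCPU ID   */
   }

   static void before_vm_entry(struct vcpu_msr_state *s)
   {
       wrmsr(MSR_IA32_TSC_AUX, s->guest_tsc_aux);    /* give the vCPU ID back  */
   }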
CR Register virtualization
==========================
Guest CR8 accesses cause a VM Exit and are emulated in the
hypervisor so the vLAPIC can update its PPR register. Guest accesses to
CR3 do not cause a VM Exit.
MSR BITMAP
==========
In the ACRN hypervisor, only these Model-Specific Registers (MSRs) are
supported:
**MSR_IA32_TSC_DEADLINE**
emulates the guest TSC deadline timer programming
**MSR_PLATFORM_INFO**
emulates a fake x86 model
**MSR_ATOM_FSB_FREQ**
provides the CPU frequency directly via this MSR to avoid TSC calibration
I/O BITMAP
==========
All User OS I/O port accesses are trapped into the ACRN hypervisor by
default. Most of the Service OS I/O port accesses are not trapped into
the ACRN hypervisor, allowing the Service OS direct access to the
hardware port.
The Service OS I/O trap policy is (see the sketch after this list):
**0x3F8/0x3FC**
trapped, for the vUART emulated inside the hypervisor (SOS only)
**0x20/0xA0/0x460**
trapped, for vPIC emulation in the hypervisor
**0xCF8/0xCFC**
trapped, for hypervisor PCI device interception
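A simplified sketch of how such a trap policy maps onto the VMX I/O bitmaps is shown below (one bit per port; a set bit causes a VM Exit). The helper names are hypothetical and the code is illustrative, not ACRN's bitmap management code.

.. code-block:: c

   #include <stdint.h>

   /* VMX uses two 4 KB I/O bitmaps: A covers ports 0x0000-0x7FFF and
    * B covers 0x8000-0xFFFF.  A set bit makes the access VM Exit.
    */
   static uint8_t io_bitmap[2][4096];

   static void trap_io_port(uint16_t port)
   {
       io_bitmap[port >> 15][(port & 0x7FFFU) >> 3] |= (uint8_t)(1U << (port & 0x7U));
   }

   /* SOS trap policy from the list above: vUART, vPIC, and PCI CFG ports. */
   static void setup_sos_io_traps(void)
   {
       static const uint16_t trapped[] = {
           0x3F8, 0x3FC,        /* emulated vUART                 */
           0x20, 0xA0, 0x460,   /* vPIC emulation                 */
           0xCF8, 0xCFC,        /* PCI CONFIG_ADDR / CONFIG_DATA  */
       };
       for (unsigned int i = 0; i < sizeof(trapped) / sizeof(trapped[0]); i++)
           trap_io_port(trapped[i]);
   }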
Exceptions
==========
The User OS handles its exceptions inside the VM, including page fault,
#GP, and so on. #MC and #DB exceptions cause a VM Exit and are reported
on the ACRN hypervisor console.
Memory virtualization
*********************
ACRN hypervisor provides memory virtualization by using a static
partition of system memory. Each virtual machine owns its own contiguous
partition of memory, with the Service OS staying in lower memory and the
User OS instances in high memory. (High memory is memory which is not
permanently mapped in the kernel address space, while Low Memory is
always mapped, so you can access it in the kernel simply by
dereferencing a pointer.) In future implementations, this will evolve to
utilize EPT/VT-d.
ACRN hypervisor memory is not visible to any User OS. In the ACRN
hypervisor, there are a few memory accesses that need to work
efficiently:
- ACRN hypervisor to access host memory
- vCPU per VM to access guest memory
- vCPU per VM to access host memory
- vCPU per VM to access MMIO memory
The rest of this section introduces how these kinds of memory accesses
are managed. It gives an overview of physical memory layout,
Paravirtualization (MMU) memory mapping in the hypervisor and VMs, and
Host-Guest Extended Page Table (EPT) memory mapping for each VM.
Physical Memory Layout
======================
The Physical Memory Layout Example for Service OS & User OS is shown in
:numref:`primer-mem-layout` below:
.. figure:: images/primer-mem-layout.png
:align: center
:name: primer-mem-layout
Memory Layout
:numref:`primer-mem-layout` shows an example of the physical memory layout
of the Service and User OS. The Service OS accepts the whole e820 table
(all usable memory address ranges not reserved for use by the BIOS)
after the hypervisor memory has been filtered out. From the SOS's point of
view, it takes control of all available physical memory not used by the
hypervisor (or BIOS), including User OS memory. Each User OS's memory
is allocated from (high) SOS memory, and each User OS controls only its
own section of memory.
Some of the physical memory of a 32-bit machine needs to be sacrificed
by hiding it, so that memory-mapped I/O (MMIO) devices have room to
communicate. This creates an MMIO hole: VMs may access some ranges of
MMIO addresses directly to communicate with devices, or they may need
the hypervisor to trap some ranges of MMIO to do device emulation. This
access control is done through EPT mapping.
PV (MMU) Memory Mapping in the Hypervisor
=========================================
.. figure:: images/primer-pv-mapping.png
:align: center
:name: primer-pv-mapping
ACRN Hypervisor PV Mapping Example
The ACRN hypervisor is trusted and can access and control all system
memory, as shown in :numref:`primer-pv-mapping`. Because the hypervisor
is running in protected mode, an MMU page table must be prepared for its
PV translation. To simplify things, the PV translation page table is set
as a 1:1 mapping. Some MMIO range mappings could be removed if they are
not needed. This PV page table is created when the hypervisor memory is
first initialized.
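A 1:1 (identity) mapping simply means that every virtual address translates to the same physical address. The sketch below shows the idea for one page-directory level using 2 MB pages; it is a simplification and leaves out the PML4/PDPT levels and MMIO attribute handling that real hypervisor code performs.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   #define PAGE_PRESENT (1UL << 0)
   #define PAGE_RW      (1UL << 1)
   #define PAGE_PS      (1UL << 7)   /* 2 MB large page in a PD entry */

   /* Fill one page directory so that virtual == physical for a 1 GB
    * region starting at base_pa (512 entries x 2 MB each).
    */
   static void build_identity_pd(uint64_t *pd, uint64_t base_pa)
   {
       for (size_t i = 0; i < 512; i++)
           pd[i] = (base_pa + i * 0x200000UL) | PAGE_PRESENT | PAGE_RW | PAGE_PS;
   }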
PV (MMU) Memory Mapping in VMs
==============================
As mentioned earlier, the Primary vCPU starts to run in protected mode
when its VM is started. But before it begins, a temporary PV (MMU) page
table must be prepared.
This page table is a 1:1 mapping for 4 GB, and only lives for a short
time when the vCPU first runs. After the vCPU starts to run its kernel
image (for example Linux\*), the kernel creates its own PV page
tables, after which the temporary page table becomes obsolete.
Host-Guest (EPT) Memory Mapping
===============================
Each VM (both SOS and UOS) needs an Extended Page Table (EPT) to
translate its guest physical memory to host physical memory. The
guest VMs also need MMIO traps, which trigger EPT violations for
device emulation (such as IOAPIC and LAPIC). This memory layout is
shown in :numref:`primer-sos-ept-mapping`:
.. figure:: images/primer-sos-ept-mapping.png
:align: center
:name: primer-sos-ept-mapping
SOS EPT Mapping Example
The SOS takes control of all the host physical memory space: its EPT
mapping covers almost all of the host memory except that reserved for
the hypervisor (HV) and a few MMIO trap ranges for IOAPIC & LAPIC
emulation. The guest to host mapping for SOS is 1:1.
.. figure:: images/primer-uos-ept-mapping.png
:align: center
:name: primer-uos-ept-mapping
UOS EPT Mapping Example
For the UOS, however, the memory EPT mapping is linear but with an
offset (as shown in :numref:`primer-uos-ept-mapping`). The MMIO hole is
not mapped, so that all MMIO accesses from the UOS are trapped (and
emulated in the device model). To support pass-through devices in the
future, some MMIO range mappings may be added.
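The linear-with-offset relation can be summarized by the small sketch below; the layout structure is hypothetical and exists only to illustrate how a UOS guest physical address would translate to a host physical address, with the MMIO hole left unmapped.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical per-VM layout used only for this sketch. */
   struct uos_mem_layout {
       uint64_t host_base;    /* HPA where this UOS's partition starts */
       uint64_t size;         /* size of the RAM partition             */
       uint64_t mmio_start;   /* start of the unmapped MMIO hole (GPA) */
       uint64_t mmio_end;     /* end of the MMIO hole (GPA)            */
   };

   /* Returns true and fills *hpa when the GPA is backed by RAM;
    * MMIO-hole and out-of-range accesses are left to fault into the
    * hypervisor/device model.
    */
   static bool uos_gpa_to_hpa(const struct uos_mem_layout *vm, uint64_t gpa,
                              uint64_t *hpa)
   {
       if (gpa >= vm->mmio_start && gpa < vm->mmio_end)
           return false;              /* trapped for device emulation */
       if (gpa >= vm->size)
           return false;              /* outside this UOS's partition */
       *hpa = vm->host_base + gpa;    /* linear mapping with an offset */
       return true;
   }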
Graphic mediation
*****************
Intel |reg| Graphics Virtualization Technology (Intel |reg| GVT-g)
provides GPU sharing capability to multiple VMs by using a mediated
pass-through technique. This allows a VM to access performance-critical
I/O resources (usually partitioned) directly, without intervention from
the hypervisor in most cases.
Privileged operations from this VM are trap-and-emulated to provide
secure isolation among VMs. The Hypervisor must ensure that no
vulnerability is exposed when assigning performance-critical resource to
each VM. When a performance-critical resource cannot be partitioned, a
scheduler must be implemented (either in software or hardware) to allow
time-based sharing among multiple VMs. In this case, the device must
allow the hypervisor to save and restore the hardware state associated
with the shared resource, either through direct I/O register read/write
(when there is no software invisible state) or through a device-specific
context save/restore mechanism (where there is a software invisible
state).
In the initial release of Project ACRN, graphic mediation is not
enabled, and is planned for a future release.
I/O emulation
*************
The I/O path is explained in the :ref:`ACRN-io-mediator` section of the
:ref:`introduction`. The following sections provide additional detail on
device assignment management and the PIO/MMIO trap flow.
Device Assignment Management
============================
The ACRN hypervisor provides device assignment management. Since the
hypervisor owns all native vectors and IRQs, there must be a mapping
table to translate Guest IRQs/Vectors to Host IRQs/Vectors. Currently we
assign all devices to VM0 except the UART.
If a PCI device (with MSI/MSI-X) is assigned to a Guest, the User OS will
program the PCI config space and set the guest vector for this device. A
hypercall, ``CWP_VM_PCI_MSIX_FIXUP``, is provided. Once the guest programs
the guest vector, the User OS may call this hypercall to notify the ACRN
hypervisor. The hypervisor allocates a host vector, creates a guest-host
mapping relation, and replaces the guest vector with a real native
vector for the device:
**PCI MSI/MSI-X**
PCI Message Signalled Interrupts (MSI/MSI-X) from
devices can be triggered from a hypercall when a guest programs its
vectors. All PCI devices are programmed with real vectors
allocated by the Hypervisor.
**PCI/INTx**
Device assignment is triggered when the guest programs
the virtual I/O Advanced Programmable Interrupt Controller
(vIOAPIC) Redirection Table Entries (RTE).
**Legacy**
Legacy devices are assigned to VM0.
User OS device assignment is similar to the above, except that the User
OS doesn't issue the hypercall itself. Instead, guest writes to the PCI
configuration space are trapped into the Device Model, and the Device
Model issues the hypercall to notify the hypervisor that the guest
vector is changing.
Currently, two types of I/O emulation are supported: MMIO and
PORTIO trap handling. MMIO emulation is triggered by an EPT violation
VM Exit only. If an EPT misconfiguration VM Exit occurs, the
hypervisor halts the system. (Because the hypervisor sets up all EPT
page table mappings at the beginning of the Guest boot, there should not
be an EPT misconfiguration.)
There are multiple places where I/O emulation can happen: in the ACRN
hypervisor, in the Service OS kernel VHM module, or in the Service OS
user-land ACRN Device Model.
PIO/MMIO trap Flow
==================
Here is a description of the PIO/MMIO trap flow:
1. Instruction decoder: get the Guest Physical Address (GPA) from VM
Exit, go through gla2gpa() page walker if necessary.
2. Emulate the instruction. Here the hypervisor will have an address
range check to see if the hypervisor is interested in this IO
port or MMIO GPA access.
3. The hypervisor emulates the vLAPIC, vIOAPIC, vPIC, and vUART only
(the vUART for the Service OS only). Any other emulation requests
are forwarded to the SOS for handling. The vCPU raising the I/O
request halts until the request is processed successfully. An IPI is
sent to vCPU0 of the SOS to notify it that an I/O request is waiting
for service.
4. Service OS VHM module takes the I/O request and dispatches the request
to multiple clients. These clients could be SOS kernel space
VBS-K, MPT, or User-land Device model. VHM I/O request server
selects a default fallback client responsible for handling any I/O
request not handled by other clients. (The Device Manager is the
default fallback client.) Each client needs to register its I/O
range or specific PCI bus/device/function (BDF) numbers. If an I/O
request falls into the client range, the I/O request server will
send the request to that client.
5. Multiple clients - fallback client (Device Model in user-land),
VBS-K client, MPT client.
Once the I/O request emulation completes, the client updates the
request status and notifies the hypervisor by a hypercall. The
hypervisor picks up that request, does any necessary cleanup,
and resumes the Guest vCPU.
Most I/O emulation tasks are done by the SOS CPU, and requests come from
UOS vCPUs.
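The client dispatch in step 4 can be pictured with the simplified sketch below. The request and client types are hypothetical (the real VHM structures and names differ); it only shows the range match with a fallback to the user-land Device Model.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   enum io_type { IO_PORT, IO_MMIO };

   struct io_request {
       enum io_type type;
       uint64_t     addr;      /* port number or MMIO GPA */
       int          handled;
   };

   struct io_client {
       const char *name;
       uint64_t    start, end;                   /* registered range */
       void      (*handle)(struct io_request *);
   };

   /* Give the request to the first client whose registered range covers
    * it; otherwise fall back to the default client (the Device Model).
    */
   static void dispatch_io_request(struct io_request *req,
                                   struct io_client *clients, size_t n,
                                   struct io_client *fallback)
   {
       for (size_t i = 0; i < n; i++) {
           if (req->addr >= clients[i].start && req->addr < clients[i].end) {
               clients[i].handle(req);
               req->handled = 1;
               return;
           }
       }
       fallback->handle(req);
       req->handled = 1;
   }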
Virtual interrupt
*****************
All interrupts received by the User OS come from a virtual interrupt
injected by a virtual vLAPIC, vIOAPIC, or vPIC. All device emulation is
done inside the SOS userspace device model. However, for performance
reasons, the vLAPIC, vIOAPIC, and vPIC devices are emulated inside the
ACRN hypervisor directly. From the guest's point of view, the vPIC uses
Virtual Wire Mode via the vIOAPIC.
The symmetric I/O Mode is shown in :numref:`primer-symmetric-io`:
.. figure:: images/primer-symmetric-io.png
:align: center
:name: primer-symmetric-io
Symmetric I/O Mode
**Kernel boot param with vPIC**
add "maxcpus=0" to the User OS boot parameters to use the PIC
**Kernel boot param with vIOAPIC**
add "maxcpus=1" (any value other than "0") and the User OS will use the
IOAPIC. IOAPIC pin2 is kept as the source of the PIC.
Virtual LAPIC
=============
The LAPIC (Local Advanced Programmable Interrupt Controller) is
virtualized for the SOS and UOS. The vLAPIC is currently emulated by a
Guest MMIO trap to the GPA address range 0xFEE00000 - 0xFEF00000 (1 MB).
The ACRN hypervisor will support APIC-v and posted interrupts in a
future release.
The vLAPIC provides the same features as a native LAPIC:
- Mask/Unmask vectors
- Inject virtual vectors (Level or Edge trigger mode) to vCPU
- Notify vIOAPIC of EOI processing
- Provide TSC Timer service
- vLAPIC supports CR8 to update the TPR
- INIT/STARTUP handling
Virtual IOAPIC
==============
A vIOAPIC is emulated by the hypervisor when the Guest accesses MMIO GPA
Range: 0xFEC00000 - 0xFEC01000. The vIOAPIC for the SOS will match the
same pin numbers as the native HW IOAPIC. The vIOAPIC for UOS only
provides 24 Pins. When a vIOAPIC PIN is asserted, the vIOAPIC calls
vLAPIC APIs to inject the vector to the Guest.
Virtual PIC
===========
A vPIC is required for TSC calibration. Normally the UOS boots with a
vIOAPIC, but a vPIC can also be a source of external interrupts to the
Guest. On every VM Exit, the hypervisor checks whether there are pending
external PIC interrupts.
Virtual Interrupt Injection
===========================
The source of virtual interrupts comes from either the Device Module or
from assigned devices:
**SOS assigned devices**
Because we assign all devices to the SOS directly, whenever a device's
physical interrupt arrives, we inject the corresponding virtual interrupt
to the SOS via the vLAPIC/vIOAPIC. In this case, the SOS doesn't use the
vPIC and does not have emulated devices.
**UOS assigned devices**
Only PCI devices are assigned to the UOS, and virtual interrupt injection
follows the same path as for the SOS. A virtual interrupt injection
operation is triggered when a device's physical interrupt occurs.
**UOS emulated devices**
The Device Model (user-land) is responsible for the interrupt lifecycle
management of UOS emulated devices. The Device Model knows when an
emulated device needs to assert a virtual IOAPIC/PIC pin or needs to
send a virtual MSI vector to the Guest. This logic is entirely handled
by the Device Model.
:numref:`primer-hypervisor-interrupt` shows how the hypervisor handles
interrupt processing and pending interrupts (acrn_do_intr_process):
.. figure:: images/primer-hypervisor-interrupt.png
:align: center
:name: primer-hypervisor-interrupt
Hypervisor Interrupt handler
There are many cases where the Guest RFLAGS.IF is cleared and interrupts
are disabled, so the hypervisor checks whether the Guest IRQ window is
available before injection. An NMI is injected regardless of the
existing guest IRQ window status. If the IRQ window is not currently
available, the hypervisor enables
``MSR_IA32_VMX_PROCBASED_CTLS_IRQ_WIN`` (PROCBASED_CTRL.bit[2]) and
performs a VM Entry directly. The injection is then done on the next VM
Exit, once the Guest enables interrupts with STI (Guest RFLAGS.IF=1).
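The sketch below restates that policy in simplified C. The VMCS accessor helpers are assumptions (ACRN has its own), and a real implementation also checks the guest interruptibility state, not just RFLAGS.IF.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   #define RFLAGS_IF    (1UL << 9)
   #define GUEST_RFLAGS 0x6820U      /* VMCS field encoding */

   /* Assumed VMCS helpers; ACRN uses its own accessors. */
   extern uint64_t vmcs_read(uint32_t field);
   extern void     vmcs_set_irq_window_exiting(bool enable);
   extern void     vmcs_inject_vector(uint8_t vector);

   /* Inject only when the guest can accept the interrupt; otherwise
    * request an IRQ-window VM Exit (PROCBASED_CTRL bit 2) and retry on
    * the next VM Exit.
    */
   static bool try_inject_external_irq(uint8_t vector)
   {
       bool window_open = (vmcs_read(GUEST_RFLAGS) & RFLAGS_IF) != 0UL;

       if (!window_open) {
           vmcs_set_irq_window_exiting(true);
           return false;                  /* injected later */
       }
       vmcs_inject_vector(vector);
       return true;
   }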
VT-x and VT-d
*************
Since 2006, Intel CPUs have supported hardware assist - VT-x
instructions, where the CPU itself traps specific guest instructions and
register accesses directly into the VMM without need for binary
translation (and modification) of the guest operating system. Guest
operating systems can be run natively without modification, although it
is common to still install virtualization-aware para-virtualized drivers
into the guests to improve functionality. One common example is access
to storage via emulated SCSI devices.
Intel CPUs and chipsets support various Virtualization Technology (VT)
features, such as VT-x and VT-d. Physical events on the platform
trigger CPU **VM Exits** (a trap into the VMM), for example to handle
physical device interrupts.
In the ACRN hypervisor design, VT-d is used for DMA remapping,
providing address translation and isolation.
:numref:`primer-dma-address-mapping` is an example of address
translation:
.. figure:: images/primer-dma-address-mapping.png
:align: center
:name: primer-dma-address-mapping
DMA address mapping
Hypercall
*********
The ACRN hypervisor currently supports fewer than a dozen
:ref:`hypercall_apis` and VHM upcall APIs to support the necessary VM
management, I/O request distribution, and guest memory mappings. The
hypervisor and Service OS (SOS) reserve vector 0xF4 for hypervisor
notification to the SOS. This upcall is necessary whenever device
emulation is required by the SOS. The upcall vector 0xF4 is injected to
SOS vCPU0.
Refer to the :ref:`acrn_apis` documentation for details.
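For orientation, a hypercall from the Service OS is ultimately a ``vmcall`` instruction. The sketch below assumes the calling convention used by the ACRN SOS kernel driver (hypercall ID in R8, parameters in RDI/RSI, result in RAX); treat the details as an assumption and check the ``acrn_hypercall*`` helpers in the kernel source.

.. code-block:: c

   #include <stdint.h>

   /* Two-parameter hypercall sketch: ID in R8, params in RDI/RSI,
    * return value in RAX via the vmcall instruction.
    */
   static inline long acrn_hypercall2(uint64_t hcall_id, uint64_t param1,
                                      uint64_t param2)
   {
       long result;

       __asm__ volatile("movq %[id], %%r8\n\t"
                        "vmcall"
                        : "=a"(result)
                        : [id] "r"(hcall_id), "D"(param1), "S"(param2)
                        : "r8", "memory");
       return result;
   }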
Device emulation
****************
The ACRN Device Model emulates different kinds of platform devices, such as
RTC, LPC, UART, PCI devices, and the Virtio block device. The most important
part of device emulation is handling the I/O requests from different
devices. An I/O request can be a PIO, MMIO, or PCI CFG SPACE access. For
example:
- a CMOS RTC device may access 0x70/0x71 PIO to get the CMOS time,
- a GPU PCI device may access its MMIO or PIO BAR space to complete
its framebuffer rendering, or
- the bootloader may access PCI devices' CFG
SPACE for BAR reprogramming.
The ACRN Device Model also injects interrupts/MSIs to its frontend devices
when necessary; for example, an RTC device needs to get its ALARM
interrupt, or a PCI device with MSI capability needs to get its MSI. The
Device Model also provides a PIRQ routing mechanism for platform devices.
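As an illustration of the PIO side of device emulation, the sketch below emulates the CMOS RTC index/data ports (0x70/0x71) mentioned above. The handler shape is hypothetical (the ACRN Device Model has its own inout registration mechanism), and values are returned in binary rather than BCD for brevity.

.. code-block:: c

   #include <stdint.h>
   #include <time.h>

   static uint8_t cmos_index;   /* last value written to port 0x70 */

   static void rtc_pio_write(uint16_t port, uint8_t val)
   {
       if (port == 0x70)
           cmos_index = val & 0x7FU;   /* bit 7 is the NMI-disable flag */
   }

   static uint8_t rtc_pio_read(uint16_t port)
   {
       time_t    now = time(NULL);
       struct tm tm;

       if (port == 0x70)
           return cmos_index;

       gmtime_r(&now, &tm);
       switch (cmos_index) {            /* data port 0x71 */
       case 0x00: return (uint8_t)tm.tm_sec;
       case 0x02: return (uint8_t)tm.tm_min;
       case 0x04: return (uint8_t)tm.tm_hour;
       default:   return 0xFF;          /* unimplemented register */
       }
   }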
Virtio Devices
**************
This section introduces the Virtio devices supported by ACRN. Currently
all the back-end (BE) virtio drivers are implemented using the virtio APIs,
and the front-end (FE) drivers reuse the standard Linux front-end virtio
drivers.
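The FE/BE contract in all of these devices is the virtqueue. For reference, the split-virtqueue descriptor defined by the virtio specification looks like this; it is shown only to illustrate what the FE driver fills in and the BE device in the ACRN DM consumes.

.. code-block:: c

   #include <stdint.h>

   /* Split-virtqueue descriptor as defined by the virtio specification. */
   struct vring_desc {
       uint64_t addr;    /* guest-physical address of the buffer     */
       uint32_t len;     /* length of the buffer in bytes            */
       uint16_t flags;   /* VRING_DESC_F_NEXT / _WRITE / _INDIRECT   */
       uint16_t next;    /* index of the next descriptor in a chain  */
   };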
Virtio-rnd
=================
The virtio-rnd entropy device supplies high-quality randomness for guest
use. The virtio device ID of the virtio-rnd device is 4, and it supports
one virtqueue of 64 entries (configurable in the source code). No
feature bits are defined.
When the FE driver requires random bytes, the BE device places bytes of
random data onto the virtqueue.
To launch the virtio-rnd device, you can use the following command:
.. code-block:: bash
./acrn-dm -A -m 1168M \
-s 0:0,hostbridge \
-s 1,virtio-blk,./uos.img \
-s 2,virtio-rnd \
-k bzImage \
-B "root=/dev/vda rw rootwait noxsave maxcpus=0 nohpet \
console=hvc0 no_timer_check ignore_loglevel \
log_buf_len=16M consoleblank=0 tsc=reliable" vm1
To verify the result on the user OS side, you can use the following command:
.. code-block:: bash
od /dev/random
Virtio-blk
==========
The virtio-blk device is a simple virtual block device. The FE driver
will place read, write, and other requests onto the virtqueue, so that
the BE driver can process them accordingly.
The virtio device ID of the virtio-blk is 2, and it supports one
virtqueue with 64 entries, configurable in the source code. The feature
bits supported by the BE device are as follows:
**VTBLK\_F\_SEG\_MAX(bit 2)**
Maximum number of segments in a request is in seg_max.
**VTBLK\_F\_BLK\_SIZE(bit 6)**
block size of disk is in blk\_size.
**VTBLK\_F\_FLUSH(bit 9)**
cache flush command support.
**VTBLK\_F\_TOPOLOGY(bit 10)**
device exports information on optimal I/O alignment.
To use the virtio-blk device, use the following command:
.. code-block:: bash
./acrn-dm -A -m 1168M \
-s 0:0,hostbridge \
-s 1,virtio-blk,./uos.img \
-k bzImage -B "root=/dev/vda rw rootwait noxsave maxcpus=0 \
nohpet console=hvc0 no_timer_check ignore_loglevel \
log_buf_len=16M consoleblank=0 tsc=reliable" vm1
To verify the result, you should expect the user OS to boot
successfully.
Virtio-net
==========
The virtio-net device is a virtual Ethernet device. The virtio device ID
of the virtio-net is 1. The virtio-net device supports two virtqueues,
one for transmitting packets and the other for receiving packets. The
FE driver will place empty buffers onto one virtqueue for receiving
packets, and enqueue outgoing packets onto the other virtqueue for
transmission. Currently the size of each virtqueue is 1000, configurable
in the source code.
To access the external network from the user OS, an L2 virtual switch
should be created in the service OS, and the BE driver is bound to a
tap/tun device attached to that L2 virtual switch. See
:numref:`primer-virtio-net`:
.. figure:: images/primer-virtio-net.png
:align: center
:name: primer-virtio-net
Accessing external network from User OS
Currently the feature bits supported by the BE device are:
**VIRTIO\_NET\_F\_MAC(bit 5)**
device has a given MAC address.
**VIRTIO\_NET\_F\_MRG\_RXBUF(bit 15)**
BE driver can merge receive buffers.
**VIRTIO\_NET\_F\_STATUS(bit 16)**
configuration status field is available.
**VIRTIO\_F\_NOTIFY\_ON\_EMPTY(bit 24)**
device will issue an interrupt if it runs out of available
descriptors on a virtqueue.
To enable the virtio-net device, use the following command:
.. code-block:: bash
./acrn-dm -A -m 1168M \
-s 0:0,hostbridge \
-s 1,virtio-blk,./uos.img \
-s 2,virtio-net,tap0 \
-k bzImage -B "root=/dev/vda rw rootwait noxsave maxcpus=0 \
nohpet console=hvc0 no_timer_check ignore_loglevel \
log_buf_len=16M consoleblank=0 tsc=reliable" vm1
To verify the correctness of the device, the external
network should be accessible from the user OS.
Virtio-console
==============
The virtio-console device is a simple device for data input and output.
The virtio device ID of the virtio-console device is 3. A device could
have from one to 16 ports. Each port has a pair of input and output
virtqueues used to communicate information between the FE and BE
drivers. Currently the size of each virtqueue is 64, configurable in the
source code.
Similar to the virtio-net device, the two virtqueues specific to a port
are a transmit virtqueue and a receive virtqueue. The FE driver places
empty buffers onto the receive virtqueue for incoming data, and
enqueues outgoing characters onto the transmit virtqueue.
Currently the feature bits supported by the BE device are:
**VTCON\_F\_SIZE(bit 0)**
configuration columns and rows are valid.
**VTCON\_F\_MULTIPORT(bit 1)**
device supports multiple ports, and control virtqueues will be used.
**VTCON\_F\_EMERG\_WRITE(bit 2)**
device supports emergency write.
Virtio-console supports redirecting guest output to various backend
devices, including stdio/pty/tty. Users could follow the syntax below to
specify which backend to use:
.. code-block:: none
virtio-console,[@]stdio|tty|pty:portname[=portpath][,[@]stdio|tty|pty:portname[=portpath]]
For example, to use stdio as a virtio-console backend, use the following
command:
.. code-block:: bash
./acrn-dm -A -m 1168M \
-s 0:0,hostbridge \
-s 1,virtio-blk,./uos.img \
-s 3,virtio-console,@stdio:stdio_port \
-k bzImage -B "root=/dev/vda rw rootwait noxsave maxcpus=0 \
nohpet console=hvc0 no_timer_check ignore_loglevel \
log_buf_len=16M consoleblank=0 tsc=reliable" vm1
Then the user can log into the user OS:
.. code-block:: bash
Ubuntu 17.04 xubuntu hvc0
xubuntu login: root
Password:
To use pty as a virtio-console backend, use the following command:
.. code-block:: bash
./acrn-dm -A -m 1168M \
-s 0:0,hostbridge \
-s 1,virtio-blk,./uos.img \
-s 2,virtio-net,tap0 \
-s 3,virtio-console,@pty:pty_port \
-k ./bzImage -B "root=/dev/vda rw rootwait noxsave maxcpus=0 \
nohpet console=hvc0 no_timer_check ignore_loglevel \
log_buf_len=16M consoleblank=0 tsc=reliable" vm1 &
When ACRN-DM boots the User OS successfully, a log similar to the one
below is shown:
.. code-block:: none
**************************************************************
virt-console backend redirected to /dev/pts/0
**************************************************************
You can then use the following command to log into the User OS:
.. code-block:: bash
minicom -D /dev/pts/0
or
.. code-block:: bash
screen /dev/pts/0


@@ -18,11 +18,17 @@
color: rgba(255,255,255,1);
}
/* add some space before the figure caption */
p.caption {
/* border-top: 1px solid; */
margin-top: 1em;
}
/* add a colon after the figure/table number (before the caption) */
span.caption-number::after {
content: ": ";
}
/* make .. hlist:: tables fill the page */
table.hlist {
width: 95% !important;