.. _hv-startup:

Hypervisor Startup
##################

This section is an overview of the ACRN hypervisor startup. The ACRN
hypervisor compiles to a 32-bit multiboot-compliant ELF file. The
bootloader (ABL/SBL or GRUB) loads the hypervisor according to the
addresses specified in the ELF header. After the bootloader prepares the
full configuration, including ACPI, E820, etc., the BSP starts the
hypervisor with an initial state compliant with the Multiboot 1
specification.

The HV startup has two parts: the native startup followed by the VM
startup.

Multiboot Header
****************

The ACRN hypervisor is built with a multiboot header, which presents
``MULTIBOOT_HEADER_MAGIC`` and ``MULTIBOOT_HEADER_FLAGS`` at the beginning
of the image. It sets bit 6 in ``MULTIBOOT_HEADER_FLAGS``, which requests
that the bootloader pass memory map information (such as E820 entries)
through the Multiboot Information (MBI) structure.
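
A minimal C sketch follows, covering two Multiboot 1 mechanisms involved here. The first is the header checksum: the specification requires the 32-bit sum of magic, flags and checksum to be zero. The second is a walk over the memory map exposed through the MBI. In the Multiboot 1 specification, bit 6 of the MBI ``flags`` field tells the kernel that ``mmap_addr`` and ``mmap_length`` are valid. The helper names are hypothetical. The flags value passed to the checksum helper is a placeholder, not ACRN's actual ``MULTIBOOT_HEADER_FLAGS``.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   /* Multiboot 1 header magic, fixed by the Multiboot specification. */
   #define MB1_HEADER_MAGIC 0x1BADB002u

   /* MBI "flags" bit 6: the mmap_length and mmap_addr fields are valid. */
   #define MB1_INFO_MMAP (1u << 6)

   /* One mmap entry in memory: size(4) base_addr(8) length(8) type(4). */
   #define MB1_MMAP_ENTRY_BYTES 24u

   /* Hypothetical helper: the checksum that makes magic + flags + checksum == 0 (mod 2^32). */
   static uint32_t mb1_header_checksum(uint32_t header_flags)
   {
       return 0u - (MB1_HEADER_MAGIC + header_flags);
   }

   static uint32_t rd32(const uint8_t *p) { uint32_t v; memcpy(&v, p, sizeof(v)); return v; }
   static uint64_t rd64(const uint8_t *p) { uint64_t v; memcpy(&v, p, sizeof(v)); return v; }

   /* Hypothetical helper: total bytes of type-1 (available RAM) entries.
    * An entry's "size" field does not count itself, so the next entry
    * starts at offset + size + 4. Returns 0 when MBI bit 6 is clear. */
   static uint64_t mb1_usable_bytes(uint32_t mbi_flags, const uint8_t *mmap, uint32_t mmap_length)
   {
       uint64_t total = 0;
       size_t off = 0;

       if ((mbi_flags & MB1_INFO_MMAP) == 0u) {
           return 0;
       }
       while (off + MB1_MMAP_ENTRY_BYTES <= mmap_length) {
           uint32_t size = rd32(mmap + off);
           uint64_t length = rd64(mmap + off + 12);  /* base_addr at +4 is unused here */
           uint32_t type = rd32(mmap + off + 20);

           if (type == 1u) {
               total += length;
           }
           off += (size_t)size + sizeof(uint32_t);
       }
       return total;
   }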

Native Startup
**************

.. figure:: images/hld-image107.png
   :align: center
   :name: hvstart-nativeflow

   Hypervisor Native Startup Flow

Native startup sets up a baseline environment for HV, including basic
memory and interrupt initialization, as shown in
:numref:`hvstart-nativeflow`. Here is a short description of the flow:

- **BSP Startup:** The starting point for the bootstrap processor.

- **Relocation:** Relocate the hypervisor image if it is not placed at
  the assumed base address.

- **UART Init:** Initialize a pre-configured UART device used as the
  base physical console for HV and the Service OS.

- **Memory Init:** Initialize the memory type and cache policy, and
  create the MMU page table mapping for HV.

- **Scheduler Init:** Initialize the scheduler framework. It provides
  the capability to switch between different threads (such as a vCPU
  thread vs. the idle thread) on a physical CPU and supports CPU
  sharing.

- **Interrupt Init:** Initialize interrupts and exceptions for the native
  HV, including the IDT and the ``do_IRQ`` infrastructure. A timer
  interrupt framework is then built on top of it. Native/physical
  interrupts go through this ``do_IRQ`` infrastructure and are then
  distributed to specific targets (HV or VMs).

- **Start AP:** The BSP sends the ``INIT-SIPI-SIPI`` IPI sequence to
  start the other native APs (application processors). Each AP
  initializes its own memory and interrupts, notifies the BSP on
  completion, and enters the default idle loop.

- **Shell Init:** Start a command shell for HV, accessible via the UART.
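
A minimal sketch of how the ``INIT-SIPI-SIPI`` sequence from **Start AP** is encoded follows. It assumes the x2APIC interrupt command register (ICR) layout, where the destination APIC ID occupies bits 63:32. The delivery-mode values come from the Intel SDM (INIT = 101b, Start-Up = 110b). So does the rule that the SIPI vector is the 4 KiB page number of the AP trampoline. The function names are hypothetical, and writing the ICR and waiting between the steps is not shown.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* ICR low-dword fields (Intel SDM Vol. 3, local APIC ICR). */
   #define ICR_DELMODE_INIT    (5u << 8)   /* delivery mode 101b */
   #define ICR_DELMODE_STARTUP (6u << 8)   /* delivery mode 110b */
   #define ICR_LEVEL_ASSERT    (1u << 14)

   /* Hypothetical helper: INIT IPI for one AP (x2APIC: destination in bits 63:32;
    * in xAPIC mode it would go in bits 63:56 instead). */
   static uint64_t icr_init(uint32_t dest_apic_id)
   {
       return ((uint64_t)dest_apic_id << 32) | ICR_DELMODE_INIT | ICR_LEVEL_ASSERT;
   }

   /* Hypothetical helper: Start-Up IPI. The vector is the trampoline's page
    * number, so the trampoline must be 4 KiB aligned and below 1 MiB. */
   static uint64_t icr_sipi(uint32_t dest_apic_id, uint32_t trampoline_pa)
   {
       assert((trampoline_pa & 0xFFFu) == 0u && trampoline_pa < 0x100000u);
       return ((uint64_t)dest_apic_id << 32) | ICR_DELMODE_STARTUP | ICR_LEVEL_ASSERT |
              (trampoline_pa >> 12);
   }

   /* Build INIT, SIPI, SIPI for one AP. Sending them (with delays between
    * the steps) is platform code that this sketch leaves out. */
   static void build_init_sipi_sipi(uint32_t dest_apic_id, uint32_t trampoline_pa, uint64_t seq[3])
   {
       seq[0] = icr_init(dest_apic_id);
       seq[1] = icr_sipi(dest_apic_id, trampoline_pa);
       seq[2] = seq[1];  /* the second SIPI repeats the first */
   }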

Symbols in the hypervisor are placed with an assumed base address, but
the bootloader may not place the hypervisor at that specified base. In
this case, the hypervisor relocates itself to where the bootloader
loaded it.
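
Relocation can be pictured as adding one fixed delta to every absolute address stored in the image. The delta is the actual load address minus the assumed base. This is similar in spirit to ELF ``R_X86_64_RELATIVE`` fixups. The sketch below is generic: ``struct reloc_entry`` and ``relocate_image()`` are hypothetical, and the hypervisor's own relocation code may use a different record format.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   /* Hypothetical relocation record: offset (from the image start) of a
    * 64-bit slot holding an address computed for the link-time base. */
   struct reloc_entry {
       uint64_t offset;
   };

   /* Rebase every listed slot from link_base to load_base. Unsigned
    * arithmetic wraps, so a load address below link_base also works. */
   static void relocate_image(uint8_t *image, uint64_t link_base, uint64_t load_base,
                              const struct reloc_entry *relocs, size_t n)
   {
       uint64_t delta = load_base - link_base;

       if (delta == 0u) {
           return;  /* loaded at the assumed base: nothing to patch */
       }
       for (size_t i = 0; i < n; i++) {
           uint64_t v;
           memcpy(&v, image + relocs[i].offset, sizeof(v));
           v += delta;
           memcpy(image + relocs[i].offset, &v, sizeof(v));
       }
   }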

Here is a summary of the CPU and memory initial states that are set up
after the native startup.

CPU
   The ACRN hypervisor brings all physical processors to 64-bit IA32e
   mode. It assumes that the BSP starts in protected mode, where
   segmentation and paging set up an identity mapping of the first 4G of
   addresses without permission restrictions. The control registers and
   some MSRs are set as follows:

   - cr0: The following features are enabled: paging, write protection,
     protected mode, numeric error, and co-processor monitoring.

   - cr3: Refer to the initial state of memory.

   - cr4: The following features are enabled: physical address extension,
     machine-check, FXSAVE/FXRSTOR, SMEP, VMX operation, and unmasked
     SIMD FP exceptions. The other features are disabled.

   - MSR_IA32_EFER: Only IA32e mode is enabled.

   - MSR_IA32_FS_BASE: The address of the stack canary, used for detecting
     stack smashing.

   - MSR_IA32_TSC_AUX: A unique logical ID is set for each physical
     processor.

   - stack: Each physical processor has a separate stack.

Memory
   All physical processors are in 64-bit IA32e mode after startup. The
   GDT holds four entries: one unused, one for code, one for data, and
   one for the 64-bit TSS. The code and data entries both have a base of
   all 0's and a limit of all 1's. The TSS holds only three stack
   pointers in the interrupt stack table (IST), for machine-check,
   double fault and stack fault. These pointers differ across physical
   processors. LDT is disabled.
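
The feature lists above map onto architectural bits, so the initial register values and the flat GDT descriptors can be sketched as follows. Bit positions are from the Intel SDM. Two choices are typical ones assumed for illustration: the access bytes (``0x9A`` for code, ``0x92`` for data) and the flag nibbles (``L`` set for 64-bit code, ``D/B`` for data). The 16-byte 64-bit TSS descriptor is not encoded here, and the helper names are hypothetical.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Control register and EFER bits (Intel SDM Vol. 3). */
   #define CR0_PE         (UINT64_C(1) << 0)   /* protected mode */
   #define CR0_MP         (UINT64_C(1) << 1)   /* co-processor monitoring */
   #define CR0_NE         (UINT64_C(1) << 5)   /* numeric error */
   #define CR0_WP         (UINT64_C(1) << 16)  /* write protection */
   #define CR0_PG         (UINT64_C(1) << 31)  /* paging */
   #define CR4_PAE        (UINT64_C(1) << 5)   /* physical address extension */
   #define CR4_MCE        (UINT64_C(1) << 6)   /* machine-check */
   #define CR4_OSFXSR     (UINT64_C(1) << 9)   /* FXSAVE/FXRSTOR */
   #define CR4_OSXMMEXCPT (UINT64_C(1) << 10)  /* unmasked SIMD FP exceptions */
   #define CR4_VMXE       (UINT64_C(1) << 13)  /* VMX operation */
   #define CR4_SMEP       (UINT64_C(1) << 20)  /* SMEP */
   #define EFER_LME       (UINT64_C(1) << 8)   /* IA-32e mode enable */

   static const uint64_t initial_cr0 = CR0_PG | CR0_WP | CR0_NE | CR0_MP | CR0_PE;
   static const uint64_t initial_cr4 =
       CR4_PAE | CR4_MCE | CR4_OSFXSR | CR4_OSXMMEXCPT | CR4_VMXE | CR4_SMEP;
   static const uint64_t initial_efer = EFER_LME;

   /* Hypothetical helper: encode an 8-byte code/data segment descriptor. */
   static uint64_t gdt_desc(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
   {
       return (uint64_t)(limit & 0xFFFFu)               /* limit 15:0 */
            | ((uint64_t)(base & 0xFFFFFFu) << 16)      /* base 23:0 */
            | ((uint64_t)access << 40)                  /* P, DPL, S, type */
            | ((uint64_t)((limit >> 16) & 0xFu) << 48)  /* limit 19:16 */
            | ((uint64_t)(flags & 0xFu) << 52)          /* G, D/B, L, AVL */
            | ((uint64_t)(base >> 24) << 56);           /* base 31:24 */
   }

   /* Flat segments: base all 0's, limit all 1's with 4 KiB granularity. */
   static uint64_t gdt_code64(void) { return gdt_desc(0u, 0xFFFFFu, 0x9Au, 0xAu); } /* G=1, L=1 */
   static uint64_t gdt_data(void)   { return gdt_desc(0u, 0xFFFFFu, 0x92u, 0xCu); } /* G=1, D/B=1 */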

Refer to :ref:`physical-interrupt-initialization` for a detailed
description of interrupt-related initial states, including the IDT and
physical PICs.
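
The ``do_IRQ`` idea mentioned under **Interrupt Init** can be illustrated with a per-vector dispatch table. Every IDT stub funnels into one common function, which looks up the handler registered for that vector. The handler either serves the hypervisor itself (for example, the timer) or forwards to a VM. This is a minimal sketch: ``struct irq_desc``, ``request_irq()``, ``dispatch_irq()`` and the ``0xEF`` timer vector are hypothetical, and EOI handling is omitted.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   #define NR_VECTORS 256u

   typedef void (*irq_handler_t)(uint32_t vector, void *data);

   /* Hypothetical per-vector descriptor. */
   struct irq_desc {
       irq_handler_t handler;  /* HV-internal handler, or one that forwards to a VM */
       void *data;             /* handler context */
       uint64_t count;         /* times this vector was dispatched */
   };

   static struct irq_desc irq_table[NR_VECTORS];
   static uint64_t spurious_count;

   /* Bind a handler to a vector; fails if the vector is invalid or already taken. */
   static int request_irq(uint32_t vector, irq_handler_t handler, void *data)
   {
       if (vector >= NR_VECTORS || handler == NULL || irq_table[vector].handler != NULL) {
           return -1;
       }
       irq_table[vector].handler = handler;
       irq_table[vector].data = data;
       return 0;
   }

   /* Common dispatcher that every IDT entry stub would call with its vector. */
   static void dispatch_irq(uint32_t vector)
   {
       if (vector >= NR_VECTORS || irq_table[vector].handler == NULL) {
           spurious_count++;
           return;
       }
       irq_table[vector].count++;
       irq_table[vector].handler(vector, irq_table[vector].data);
       /* A real implementation would also signal EOI to the local APIC. */
   }

   /* Example timer handler: counts ticks in the uint64_t pointed to by data. */
   static void timer_tick(uint32_t vector, void *data)
   {
       (void)vector;
       (*(uint64_t *)data)++;
   }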

After the BSP detects that all APs are up, it continues on to enter guest
mode. Similarly, after an AP completes its initialization, it also starts
entering guest mode. When the BSP and APs enter guest mode, they try to
launch the pre-defined VMs whose vBSP is associated with their physical
core. These pre-defined VMs are statically configured in ``vm config``
and can be a pre-launched Safety VM or the Service VM. The VM startup is
explained in the next section.
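
One way to picture this hand-off has three steps. Each processor sets its bit in a shared bitmap when its native init completes. The BSP waits until all expected bits are set. Each processor then launches only the VMs whose vBSP is pinned to it. The bitmap, the one-field ``struct vm_config``, the ``SAFETY_VM`` name and the helper names are hypothetical. The real ``vm config`` holds more than the vBSP placement modeled here.

.. code-block:: c

   #include <assert.h>
   #include <stdatomic.h>
   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical bitmap of physical CPUs that finished native initialization. */
   static atomic_uint_fast64_t pcpu_active_bitmap = 0;

   /* Called by each processor (BSP or AP) once its own init is complete. */
   static void mark_pcpu_up(uint16_t pcpu_id)
   {
       assert(pcpu_id < 64u);
       atomic_fetch_or(&pcpu_active_bitmap, (uint_fast64_t)1u << pcpu_id);
   }

   /* The BSP would poll this (with a pause and a timeout) before going on. */
   static bool all_pcpus_up(uint64_t expected_mask)
   {
       uint64_t up = (uint64_t)atomic_load(&pcpu_active_bitmap);
       return (up & expected_mask) == expected_mask;
   }

   /* Hypothetical static VM configuration entry: the pCPU hosting the vBSP. */
   struct vm_config {
       const char *name;
       uint16_t vbsp_pcpu;
   };

   /* Number of pre-defined VMs that a given pCPU would try to launch. */
   static int count_vms_for_pcpu(uint16_t pcpu_id, const struct vm_config *cfg, int nr_vms)
   {
       int n = 0;
       for (int i = 0; i < nr_vms; i++) {
           if (cfg[i].vbsp_pcpu == pcpu_id) {
               n++;  /* a real hypervisor would create and launch cfg[i] here */
           }
       }
       return n;
   }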

.. _vm-startup:

VM Startup
**********

The Service VM or a pre-launched VM is created and launched on the
physical CPU configured as its vBSP. Meanwhile, the physical CPUs
configured as vAPs for dedicated VMs enter the default idle loop (refer
to :ref:`VCPU_lifecycle` for details) and wait for any vCPU to be
scheduled to them.
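
A minimal sketch of the per-pCPU choice described above follows. It assumes a simple round-robin policy over the vCPU threads pinned to one physical CPU, with the idle thread as the fallback when none is runnable. ACRN's scheduler framework supports CPU sharing and may use other policies. ``struct pcpu_sched``, ``sched_add()`` and ``pick_next()`` are hypothetical.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>

   /* Hypothetical thread object shared by vCPU threads and the idle thread. */
   struct thread_obj {
       const char *name;
       bool runnable;
   };

   #define MAX_THREADS_PER_PCPU 4

   /* Hypothetical per-physical-CPU scheduler state. */
   struct pcpu_sched {
       struct thread_obj idle;                            /* always available */
       struct thread_obj *threads[MAX_THREADS_PER_PCPU];  /* vCPU threads pinned here */
       size_t nr_threads;
       size_t next;                                       /* round-robin cursor */
   };

   static int sched_add(struct pcpu_sched *s, struct thread_obj *t)
   {
       if (s->nr_threads >= MAX_THREADS_PER_PCPU) {
           return -1;
       }
       s->threads[s->nr_threads++] = t;
       return 0;
   }

   /* Pick the next runnable vCPU thread round-robin; fall back to idle. */
   static struct thread_obj *pick_next(struct pcpu_sched *s)
   {
       for (size_t i = 0; i < s->nr_threads; i++) {
           size_t idx = (s->next + i) % s->nr_threads;
           if (s->threads[idx]->runnable) {
               s->next = (idx + 1) % s->nr_threads;
               return s->threads[idx];
           }
       }
       return &s->idle;
   }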

:numref:`hvstart-vmflow` illustrates a high-level execution flow of
creating and launching a VM. It applies to pre-launched VMs, the Service
VM, and User VMs. One major difference is who creates the VM: the
hypervisor creates pre-launched VMs and the Service VM, while the DM in
the Service OS triggers the creation of User VMs. The main steps
include:

- **Create VM:** A VM structure is allocated and initialized. A unique
  VM ID is picked and EPT is initialized. The e820 table for this VM is
  prepared, the I/O bitmap is set up, and the virtual
  PIC/IOAPIC/PCI/UART is initialized. EPC for virtual SGX is prepared,
  guest PM IO is set up, IOMMU for passthrough (PT) device support is
  enabled, and virtual CPUID entries are filled. Finally, the vCPUs
  configured in this VM's ``vm config`` are prepared. For a
  post-launched User VM, the DM prepares the EPT page table and e820
  table instead of the hypervisor.

- **Prepare vCPUs:** Create the vCPUs. Each vCPU is assigned the
  physical processor it is pinned to, a unique-per-VM vCPU ID, and a
  globally unique VPID. Its virtual LAPIC and MTRR are initialized, and
  its vCPU thread object is set up for vCPU scheduling. The vCPU number
  and affinity are defined in the corresponding ``vm config`` for this
  VM.

- **Build vACPI:** For the Service VM, the hypervisor will customize a
  virtual ACPI table based on the native ACPI table (this is still a
  TODO). For a pre-launched VM, the hypervisor builds a simple ACPI table
  with necessary information such as the MADT. For a post-launched User
  VM, the DM builds its ACPI table dynamically.

- **SW Load:** Prepare each VM's SW configuration according to the guest
  OS requirements. This may include the kernel entry address, ramdisk
  address, bootargs, or zero page for launching a bzImage. The
  hypervisor does this for a pre-launched VM or the Service VM, and the
  VM starts from the standard real or protected mode, which is not
  related to the native environment. For post-launched VMs, the DM does
  the VM's SW configuration.

- **Start VM:** The vBSP of this VM is kicked to be scheduled.

- **Schedule vCPUs:** The vCPUs are scheduled to the corresponding
  physical processors for execution.

- **Init VMCS:** Initialize the vCPU's VMCS for its host state, guest
  state, execution control, entry control, and exit control. This is the
  last configuration before the vCPU runs.

- **vCPU thread:** The vCPU is kicked out to run. The vBSP starts
  running into the kernel image configured by SW Load. Each vAP waits
  for the INIT-SIPI-SIPI IPI sequence triggered by its vBSP.
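
The ID rules in **Create VM** and **Prepare vCPUs** can be sketched as follows. vCPU IDs restart at 0 in every VM; treating 0 as the vBSP is an assumption here. VPIDs come from one global counter, which starts at 1 because the Intel SDM reserves VPID 0 for VMX root operation, that is, the hypervisor itself. All type and function names are hypothetical, and the affinity arrays stand in for what ``vm config`` defines.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define MAX_VCPUS_PER_VM 8u

   /* VPID 0 belongs to VMX root operation, so guest VPIDs start at 1. */
   static uint16_t next_vpid = 1u;

   /* Hypothetical vCPU and VM records. */
   struct vcpu {
       uint16_t vcpu_id;  /* unique within its VM; 0 is assumed to be the vBSP */
       uint16_t pcpu_id;  /* physical CPU it is pinned to */
       uint16_t vpid;     /* unique across all VMs */
   };

   struct vm {
       uint16_t vm_id;
       uint16_t nr_vcpus;
       struct vcpu vcpus[MAX_VCPUS_PER_VM];
   };

   /* Hypothetical: hand out a globally unique VPID; 0 means exhausted. */
   static uint16_t alloc_vpid(void)
   {
       if (next_vpid == 0u) {
           return 0u;  /* the counter wrapped: no VPIDs left */
       }
       return next_vpid++;
   }

   /* Create the vCPUs of one VM from its (hypothetical) affinity list. */
   static int prepare_vcpus(struct vm *vm, uint16_t vm_id, const uint16_t *affinity, uint16_t n)
   {
       if (n == 0u || n > MAX_VCPUS_PER_VM) {
           return -1;
       }
       vm->vm_id = vm_id;
       vm->nr_vcpus = 0u;
       for (uint16_t i = 0u; i < n; i++) {
           uint16_t vpid = alloc_vpid();
           if (vpid == 0u) {
               return -1;
           }
           vm->vcpus[i].vcpu_id = i;
           vm->vcpus[i].pcpu_id = affinity[i];
           vm->vcpus[i].vpid = vpid;
           vm->nr_vcpus++;
       }
       return 0;
   }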

.. figure:: images/hld-image104.png
   :align: center
   :name: hvstart-vmflow

   Hypervisor VM Startup Flow

SW configuration for the Service VM (bzImage SW load as an example):

- **ACPI**: HV passes the entire ACPI table from the bootloader directly
  to the Service VM. Legacy mode is currently supported because the ACPI
  table is loaded at the F-Segment.

- **E820**: HV passes the e820 table from the bootloader through the zero
  page. Before doing so, it filters out the HV reserved memory (32M, for
  example) and the memory owned by pre-launched VMs.

- **Zero Page**: HV prepares the zero page at the high end of the Service
  VM memory, which is determined by the SOS_VM guest FIT binary build. The
  zero page includes the configuration for the ramdisk, bootargs, and e820
  entries. The zero page address is set in the vBSP RSI register before
  the vCPU runs.

- **Entry address**: HV copies the Service OS kernel image to
  ``kernel_load_addr``, which can be obtained from the "pref_addr" field in
  the bzImage header. The entry address is calculated based on
  ``kernel_load_addr`` and is set in the vBSP RIP register before the vCPU
  runs.
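
The **E820** filtering above amounts to cutting an address range out of the RAM entries of the bootloader's table. The minimal sketch below assumes the carved-out range is simply dropped from the Service VM's view; the hypervisor might instead re-type it as reserved. ``e820_carve()`` is a hypothetical helper, applied once for the HV region and once per pre-launched VM region.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   #define E820_TYPE_RAM 1u

   struct e820_entry {
       uint64_t base;
       uint64_t length;
       uint32_t type;
   };

   /* Copy 'in' to 'out', cutting [hole_base, hole_base + hole_len) out of
    * every RAM entry; an entry that straddles the hole is split in two.
    * Returns the number of output entries, or -1 if 'out' is too small. */
   static int e820_carve(const struct e820_entry *in, size_t n,
                         uint64_t hole_base, uint64_t hole_len,
                         struct e820_entry *out, size_t out_cap)
   {
       uint64_t hole_end = hole_base + hole_len;
       size_t k = 0;

       for (size_t i = 0; i < n; i++) {
           uint64_t start = in[i].base;
           uint64_t end = in[i].base + in[i].length;

           if (in[i].type != E820_TYPE_RAM || end <= hole_base || start >= hole_end) {
               if (k >= out_cap) return -1;
               out[k++] = in[i];  /* not RAM, or no overlap: keep unchanged */
               continue;
           }
           if (start < hole_base) {  /* RAM below the hole survives */
               if (k >= out_cap) return -1;
               out[k++] = (struct e820_entry){start, hole_base - start, E820_TYPE_RAM};
           }
           if (end > hole_end) {     /* RAM above the hole survives */
               if (k >= out_cap) return -1;
               out[k++] = (struct e820_entry){hole_end, end - hole_end, E820_TYPE_RAM};
           }
       }
       return (int)k;
   }

Assuming the input entries do not overlap, at most one entry can be split per carved range. So ``out`` needs at most one slot more than ``in`` for each call.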

SW configuration for post-launched User VMs (OVMF SW load as an example):

- **ACPI**: The DM builds the virtual ACPI table and puts it at the User
  VM's F-Segment. Refer to :ref:`hld-io-emulation` for details.

- **E820**: The DM builds the virtual E820 table and then passes it to
  the virtual bootloader. Refer to :ref:`hld-io-emulation` for details.

- **Entry address**: The DM copies the User OS kernel (OVMF) image to
  OVMF_NVSTORAGE_OFFSET, which is normally at (4G - 2M), and sets the
  entry address to 0xFFFFFFF0. Because the vBSP starts running the virtual
  bootloader (OVMF) in real mode, its CS base is set to 0xFFFF0000 and
  its RIP register is set to 0xFFF0.
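
The OVMF entry-address arithmetic can be checked with a small sketch. In real mode the first fetch is at CS base + RIP, so 0xFFFF0000 + 0xFFF0 = 0xFFFFFFF0, and an image that ends exactly at 4 GiB covers that address. The 2 MiB image size used in the check is an assumption matching the (4G - 2M) placement above. The struct and function names are hypothetical.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define GUEST_4G UINT64_C(0x100000000)

   /* Hypothetical real-mode entry state for the vBSP (values from the text above). */
   struct rm_entry_state {
       uint32_t cs_base;
       uint64_t rip;
   };

   /* In real mode the first instruction is fetched at CS.base + RIP. */
   static uint64_t first_fetch_addr(const struct rm_entry_state *s)
   {
       return (uint64_t)s->cs_base + s->rip;
   }

   /* Load address that makes an image of 'size' bytes end exactly at 4 GiB,
    * so that its last 16 bytes cover the reset vector at 0xFFFFFFF0. */
   static uint64_t top_of_4g_load_addr(uint64_t size)
   {
       assert(size >= 16u && size <= GUEST_4G);
       return GUEST_4G - size;
   }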

SW configuration for pre-launched VMs (raw SW load as an example):

- **ACPI**: The hypervisor builds the virtual ACPI table and puts it at
  this VM's F-Segment.

- **E820**: The hypervisor builds the virtual E820 table and then passes
  it to the VM in a way that depends on the SW loader. The raw SW load
  shown here does not use it.

- **Entry address**: The hypervisor copies the guest kernel image to the
  ``kernel_load_addr`` set in ``vm config``. It sets the entry address to
  the ``kernel_entry_addr``, which is also set in ``vm config``.
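
Some virtual ACPI tables, such as the MADT mentioned under **Build vACPI**, are built at run time. The ACPI specification requires all bytes of each table, including the checksum byte at header offset 9, to sum to zero modulo 256. The minimal sketch below shows that fix-up. The helper names are hypothetical, and the table length is read from header offset 4 assuming a little-endian host.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   #define ACPI_LENGTH_OFFSET   4u  /* 32-bit table length in the standard header */
   #define ACPI_CHECKSUM_OFFSET 9u  /* checksum byte in the standard header */

   /* Sum of all table bytes modulo 256. */
   static uint8_t acpi_sum(const uint8_t *table, size_t len)
   {
       uint8_t sum = 0;
       for (size_t i = 0; i < len; i++) {
           sum = (uint8_t)(sum + table[i]);
       }
       return sum;
   }

   /* Hypothetical helper: set the checksum so the whole table sums to zero. */
   static void acpi_fix_checksum(uint8_t *table)
   {
       uint32_t len;
       memcpy(&len, table + ACPI_LENGTH_OFFSET, sizeof(len));
       table[ACPI_CHECKSUM_OFFSET] = 0;
       table[ACPI_CHECKSUM_OFFSET] = (uint8_t)(0u - acpi_sum(table, len));
   }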

Here is the initial mode of vCPUs:

+----------------------------------+----------------------------------------------------------+
| VM and Processor Type            | Initial Mode                                             |
+=================+================+==========================================================+
| Service VM      | BSP            | Same as physical BSP, or Real Mode if Service VM boots   |
|                 |                | with OVMF                                                |
|                 +----------------+----------------------------------------------------------+
|                 | AP             | Real Mode                                                |
+-----------------+----------------+----------------------------------------------------------+
| User VM         | BSP            | Real Mode                                                |
|                 +----------------+----------------------------------------------------------+
|                 | AP             | Real Mode                                                |
+-----------------+----------------+----------------------------------------------------------+
| Pre-launched VM | BSP            | Real Mode or Protected Mode                              |
|                 +----------------+----------------------------------------------------------+
|                 | AP             | Real Mode                                                |
+-----------------+----------------+----------------------------------------------------------+