.. _rt_perf_tips_rtvm:

ACRN Real-Time VM Performance Tips
##################################

Background
**********

The ACRN real-time VM (RTVM) is a special type of ACRN post-launched VM.
This document shows how to configure an RTVM to approach bare-metal
performance by enabling certain key technologies and eliminating VM-exits
within RT tasks, thereby avoiding this common source of virtualization
overhead.

Neighbor VMs, such as Service VMs, Human-Machine-Interface (HMI) VMs, or
other real-time VMs, may negatively affect the execution of real-time
tasks on an RTVM. This document also shows the technologies used to isolate
an RTVM from potential runtime noise caused by neighbor VMs.

Here are some key technologies that can significantly improve
RTVM performance:

- LAPIC passthrough with core partitioning.
- PCIe device passthrough: Only MSI interrupt-capable PCI devices are
  supported for the RTVM.
- CAT (Cache Allocation Technology)-based cache isolation: The RTVM uses
  a dedicated CLOS (Class of Service). While other VMs may share a CLOS,
  the GPU uses a CLOS that does not overlap with the RTVM CLOS.
- PMD virtio: Both the virtio backend (BE) and frontend (FE) work in
  polling mode, so interrupts and notifications between the Service VM and
  the RTVM are not needed. All RTVM guest memory is hidden from the
  Service VM except for the virtio queue memory.

This document summarizes tips from issues encountered and
resolved during real-time development and performance tuning.

Mandatory Options for an RTVM
*****************************

An RTVM is a post-launched VM with LAPIC passthrough. Pay attention to
these options when you launch an ACRN RTVM:

Tip: Apply the acrn-dm option ``--lapic_pt``
   The LAPIC passthrough feature of ACRN is configured via the
   ``--lapic_pt`` option, but the feature takes effect only when the LAPIC
   is switched to x2APIC mode. Both conditions must be met to enable an
   RTVM. The ``--rtvm`` option is applied automatically once
   ``--lapic_pt`` is specified.

Tip: Use virtio polling mode
   Polling mode prevents the frontend from triggering a VM-exit to send a
   notification to the backend. We recommend that you pass through a
   physical peripheral device (such as a block or Ethernet device) to an
   RTVM. If no physical device is available, ACRN supports virtio devices
   and enables polling mode to avoid a VM-exit at the frontend. Enable
   virtio polling mode via the option ``--virtio_poll [polling interval]``.

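As a minimal sketch, the two options discussed above could be collected in
a launch script as follows. The function name ``rtvm_dm_options`` and the
interval value are illustrative assumptions, not part of ACRN. The
remaining ``acrn-dm`` arguments (memory size, device slots, and so on) are
omitted.

.. code-block:: bash

   #!/bin/bash
   # Sketch: collect the RTVM-related acrn-dm options named in this document.
   # The function name and the default interval are hypothetical placeholders.
   rtvm_dm_options() {
       local poll_interval="${1:-1000000}"   # placeholder; tune per platform
       # --lapic_pt enables LAPIC passthrough (and implies --rtvm);
       # --virtio_poll switches virtio devices to polling mode.
       echo "--lapic_pt --virtio_poll ${poll_interval}"
   }

   # Print the options so they can be reviewed before use in a launch script.
   rtvm_dm_options 1000000

Keeping these options in one place makes it harder to launch an RTVM with
one of them accidentally left out.
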
Avoid VM-exit Latency
*********************

VM-exits have a significant negative impact on virtualization performance.
A single VM-exit adds several microseconds or more of latency,
depending on what is done in VMX-root mode. VM-exits fall into two
types: those triggered by external CPU events and those triggered by
operations initiated by the vCPU.

ACRN eliminates almost all VM-exits triggered by external events by
using LAPIC passthrough. A few exceptions exist:

- SMI: This brings the processor into System Management Mode (SMM),
  causing a much longer delay. The SMI should be handled in the BIOS.

- NMI: ACRN uses NMI for system-level notification.

You should avoid VM-exits triggered by operations initiated by the vCPU. Refer
to the `Intel 64 and IA-32 Architectures Software Developer's Manual (SDM)
<https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html>`_
"Instructions That Cause VM Exits Unconditionally" (SDM V3, 25.1.2) and
"Instructions That Cause VM Exits Conditionally" (SDM V3, 25.1.3).

Tip: Do not use CPUID in a real-time critical section.
   The CPUID instruction causes a VM-exit unconditionally. Detect CPU
   capabilities **before** entering an RT-critical section.
   Because CPUID can be executed at any privilege level and serializes
   instruction execution, applications commonly use it as a serializing
   instruction, for example immediately before and after RDTSC. In that
   case, remove CPUID by using RDTSCP instead of RDTSC. RDTSCP waits until
   all previous instructions have been executed before reading the counter,
   and the instructions after RDTSCP normally have a data dependency
   on it, so they must wait until RDTSCP has been executed.

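Whether the guest CPU advertises RDTSCP can be checked once at startup,
outside any RT-critical section. The following is a minimal sketch that
reads the Linux ``/proc/cpuinfo`` flags line instead of executing CPUID in
the application. The function name ``cpu_has_flag`` is hypothetical, and
its optional second argument exists only so the check can be pointed at a
saved copy of the file.

.. code-block:: bash

   #!/bin/bash
   # Sketch: test whether a CPU feature flag is listed in /proc/cpuinfo.
   # The function name is hypothetical and not part of ACRN or Linux.
   cpu_has_flag() {
       local flag="$1" cpuinfo="${2:-/proc/cpuinfo}"
       # Take the first "flags" line, then match the flag as a whole word.
       grep -m1 '^flags' "$cpuinfo" | grep -qw -- "$flag"
   }

   # Run this during initialization, never inside the RT-critical section.
   if cpu_has_flag rdtscp; then
       echo "rdtscp available"
   else
       echo "rdtscp not reported"
   fi

The result can be cached by the application so that the RT loop never has
to query CPU capabilities again.
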
RDMSR and WRMSR are instructions that cause VM-exits conditionally. On the
ACRN RTVM, most MSRs are not intercepted by the HV, so they won't cause a
VM-exit. But there are exceptions for security reasons:

1) read from APICID and LDR;
2) write to TSC_ADJUST if VMX_TSC_OFFSET_FULL is zero;
   otherwise, read and write to TSC_ADJUST and TSC_DEADLINE;
3) write to ICR.

Tip: Do not use RDMSR to access APICID and LDR in an RT critical section.
   ACRN does not present a physical APICID to a guest, so APICID
   and LDR are virtualized even though the LAPIC is passed through. As a
   result, access to APICID and LDR can cause a VM-exit.

Tip: Guarantee that VMX_TSC_OFFSET_FULL is zero; otherwise, do not access TSC_ADJUST and TSC_DEADLINE in the RT critical section.
   ACRN uses VMX_TSC_OFFSET_FULL as the offset between vTSC_ADJUST and
   pTSC_ADJUST. If VMX_TSC_OFFSET_FULL is zero, intercepting
   TSC_ADJUST and TSC_DEADLINE is not necessary. Otherwise, they must be
   intercepted to guarantee functionality.

Tip: Utilize Preempt-RT Linux mechanisms to reduce the access of ICR from the RT core.
   #. Add ``domain`` to ``isolcpus`` ( ``isolcpus=nohz,domain,1`` ) to the kernel parameters.
   #. Add ``idle=poll`` to the kernel parameters.
   #. Add ``rcu_nocb_poll`` along with ``rcu_nocbs=1`` to the kernel parameters.
   #. Disable logging services such as ``journald`` or ``syslogd`` if possible.

   The parameters shown above are recommended for the guest Preempt-RT
   Linux. For a UP RTVM, ICR interception is not a problem. But for an SMP
   RTVM, IPIs may be needed between vCPUs. These tips are about reducing ICR
   access. The example above assumes a dual-core RTVM in which core 0
   is the housekeeping core and core 1 is the real-time core. The ``domain``
   flag strongly isolates the RT core from the general SMP
   balancing and scheduling algorithms. The parameters ``idle=poll`` and
   ``rcu_nocb_poll`` can prevent the RT core from sending reschedule IPIs
   to wake up tasks on core 0 in most cases. The logging service is disabled
   because an IPI may be issued to the housekeeping core to notify the
   logging service when kernel messages are output on the RT core.

   .. note::
      If an ICR access is inevitable within the RT critical section, be
      aware of the extra 3 to 4 microseconds of latency for each access.

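The kernel parameters listed above can be verified from inside the guest.
The following is a minimal sketch that reports which of them are absent
from a kernel command line. The function name ``missing_rt_params`` is
hypothetical, and the CPU number ``1`` assumes the dual-core layout of the
example above.

.. code-block:: bash

   #!/bin/bash
   # Sketch: list the recommended parameters missing from a kernel command
   # line. The function name is hypothetical; CPU 1 is assumed to be the RT
   # core, as in the dual-core example in this document.
   missing_rt_params() {
       local cmdline="$1" param
       for param in "isolcpus=nohz,domain,1" "idle=poll" \
                    "rcu_nocb_poll" "rcu_nocbs=1"; do
           # Pad with spaces so only whole, space-separated parameters match.
           case " $cmdline " in
               *" $param "*) ;;
               *) echo "missing: $param" ;;
           esac
       done
   }

   # Inspect the running kernel's command line when it is available.
   if [ -r /proc/cmdline ]; then
       missing_rt_params "$(cat /proc/cmdline)"
   fi

An empty result means all four parameters are present. The check does not
cover the logging services, which have to be disabled separately.
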
Tip: Create and initialize the RT tasks at the beginning to avoid runtime access to control registers.
   Accessing control registers is another cause of VM-exits. In ACRN,
   accesses to CR3 and CR8 do not cause a VM-exit. However, writes to CR0
   and CR4 may cause a VM-exit, and such writes happen when a new task is
   spawned and initialized.

Isolating the Impact of Neighbor VMs
************************************

ACRN makes use of several technologies and hardware features to avoid
performance impact on the RTVM by neighbor VMs:

Tip: Do not share CPUs allocated to the RTVM with other RT or non-RT VMs.
   ACRN enables CPU sharing to improve the utilization of CPU resources.
   However, for an RTVM, CPUs should be dedicated to that VM for determinism.

Tip: Use RDT such as CAT and MBA to allocate dedicated resources to the RTVM.
   ACRN enables Intel Resource Director Technology (RDT) features such as
   CAT and MBA to limit the impact that components such as the GPU have on
   the RTVM through the memory hierarchy. The availability of RDT is
   hardware-specific. Refer to the :ref:`rdt_configuration`.

Tip: Lock the GPU to a feasible lowest frequency.
   A GPU can put a heavy load on the power/memory subsystem. Locking
   the GPU frequency as low as possible can help improve RT performance
   determinism. GPU frequency can usually be locked in the BIOS, but such
   BIOS support is platform-specific.

Miscellaneous
*************

Tip: Disable timer migration on Preempt-RT Linux.
   Because most tasks are pinned to the housekeeping core, a timer
   armed by an RT task might be migrated to the nearest busy CPU to save
   power. This hurts RT determinism because the timer interrupts raised
   on the housekeeping core need to be resent to the RT core. Timer
   migration can be disabled by the command::

     echo 0 > /proc/sys/kernel/timer_migration

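Because the write fails without root privileges, and the file may be absent
on kernels built without the relevant options, it can help to read the
value back. The following is a minimal sketch under those assumptions. The
function name ``disable_timer_migration`` is hypothetical, and its optional
path argument defaults to the standard sysctl file.

.. code-block:: bash

   #!/bin/bash
   # Sketch: disable timer migration and confirm that the setting took effect.
   # The function name is hypothetical; writing the real file requires root.
   disable_timer_migration() {
       local knob="${1:-/proc/sys/kernel/timer_migration}"
       if [ ! -w "$knob" ]; then
           echo "cannot write $knob (missing file or insufficient rights)" >&2
           return 1
       fi
       echo 0 > "$knob"
       # Read the value back so a silent failure is not mistaken for success.
       [ "$(cat "$knob")" = "0" ]
   }

Calling ``disable_timer_migration`` as root from a boot-time script applies
the setting before the RT tasks start. The setting does not persist across
reboots.
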
Tip: Add ``mce=off`` to RTVM kernel parameters.
   This parameter disables the MCE periodic timer and avoids a VM-exit.

Tip: Disable the Intel processor C-state and P-state of the RTVM.
   Processor power management saves power, but it can also hurt RT
   performance because the power state keeps changing. The C-state and
   P-state PM mechanisms can be disabled by adding
   ``processor.max_cstate=0 intel_idle.max_cstate=0 intel_pstate=disable``
   to the kernel parameters.

Tip: Exercise caution when setting ``/proc/sys/kernel/sched_rt_runtime_us``.
   Setting ``/proc/sys/kernel/sched_rt_runtime_us`` to ``-1`` can be a
   problem. A value of ``-1`` allows RT tasks to monopolize a CPU, so that
   a mechanism such as ``nohz`` might get no chance to work, which can hurt
   the RT performance or even (potentially) lock up a system.

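The current setting can be inspected before RT tasks are started. The
following is a minimal sketch that prints a warning when the value is
``-1``. The function name ``check_rt_throttling`` is hypothetical, and its
optional path argument defaults to the file named above.

.. code-block:: bash

   #!/bin/bash
   # Sketch: warn when the RT runtime limit is disabled (value -1).
   # The function name is hypothetical and not part of ACRN or Linux.
   check_rt_throttling() {
       local knob="${1:-/proc/sys/kernel/sched_rt_runtime_us}" value
       value="$(cat "$knob")" || return 1
       if [ "$value" = "-1" ]; then
           echo "warning: RT runtime limit disabled; RT tasks can monopolize a CPU"
       else
           echo "RT runtime limit: ${value} us"
       fi
   }

   # Reading the value needs no special privileges.
   if [ -r /proc/sys/kernel/sched_rt_runtime_us ]; then
       check_rt_throttling
   fi

The sketch only reports the setting. Whether a limit is acceptable depends
on the workload, so the value itself is left for the integrator to choose.
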
Tip: Disable the software workaround for Machine Check Error on Page Size Change.
   By default, the software workaround for Machine Check Error on Page Size
   Change is conditionally applied to the models that may be affected by the
   issue. However, the software workaround has a negative impact on
   performance. If all guest OS kernels are trusted, the
   :option:`hv.FEATURES.MCE_ON_PSC_DISABLED` option could be set for performance.

.. note::
   The tips for Preempt-RT Linux are mostly applicable to other Linux-based
   RTOSes as well, such as Xenomai.