286 lines
12 KiB
ReStructuredText
286 lines
12 KiB
ReStructuredText
|
.. _l1tf:
|
||
|
|
||
|
L1TF Overview
|
||
|
#############
|
||
|
|
||
|
L1 Terminal Fault is a speculative side channel which allows unprivileged
|
||
|
speculative access to data which is available in the Level 1 Data Cache
|
||
|
when the page table entry controlling the virtual address, which is used
|
||
|
for the access, has the Present bit cleared or reserved bits set.
|
||
|
|
||
|
When the processor accesses a linear address, it first looks for a
|
||
|
translation to a physical address in the translation lookaside buffer (TLB).
|
||
|
For an unmapped address this will not provide a physical address, so the
|
||
|
processor performs a table walk of a hierarchical paging structure in
|
||
|
memory that provides translations from linear to physical addresses. A page
|
||
|
fault is signaled if this table walk fails.
|
||
|
|
||
|
During the process of a terminal fault, the processor speculatively computes
|
||
|
a physical address from the paging structure entry and the address of the
|
||
|
fault. This physical address is composed of the address of the page frame
|
||
|
and low order bits from the linear address. If data with this physical
|
||
|
address is present in the L1D, that data may be loaded and forwarded to
|
||
|
dependent instructions. These dependent instructions may create a side
|
||
|
channel.
|
||
|
|
||
|
Because the resulting probed physical address is not a true translation of
|
||
|
the virtual address, the resulting address is not constrained by various
|
||
|
memory range checks or nested translations. Specifically:
|
||
|
|
||
|
* Intel SGX protected memory checks are not applied.
|
||
|
* Extended Page Table (EPT) guest physical to host physical address
|
||
|
translation is not applied.
|
||
|
* SMM protected memory checks are not applied.
|
||
|
|
||
|
The following CVE entries are related to the L1TF:
|
||
|
|
||
|
============= ================= ==============================
|
||
|
CVE-2018-3615 L1 Terminal Fault SGX related aspects
|
||
|
CVE-2018-3620 L1 Terminal Fault OS, SMM related aspects
|
||
|
CVE-2018-3646 L1 Terminal Fault Virtualization related aspects
|
||
|
============= ================= ==============================
|
||
|
|
||
|
Please refer to `Intel Analysis of L1TF`_ and `Linux L1TF document`_ for
|
||
|
more details.
|
||
|
|
||
|
.. _Intel Analysis of L1TF:
|
||
|
https://software.intel.com/security-software-guidance/insights/deep-dive-intel-analysis-l1-terminal-fault
|
||
|
|
||
|
.. _Linux L1TF document:
|
||
|
https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/l1tf.rst
|
||
|
|
||
|
L1TF Problem in ACRN
|
||
|
####################
|
||
|
|
||
|
There are mainly three attack scenarios considered in ACRN:
|
||
|
|
||
|
- Guest->hypervisor attack
|
||
|
- Guest->guest attack
|
||
|
- Normal_world->secure_world attack (Android specific)
|
||
|
|
||
|
Malicious user space is not a concern to ACRN hypervisor, because
|
||
|
every guest runs in VMX non-root. It is reponsibility of guest kernel
|
||
|
to protect itself from malicious user space attack.
|
||
|
|
||
|
SGX/SMM related attacks are mitigated by using latest ucode. There is
|
||
|
no additional action in ACRN hypervisor.
|
||
|
|
||
|
Guest->hypervisor Attack
|
||
|
******************************
|
||
|
|
||
|
ACRN always enables EPT for all guests (SOS and UOS), thus a malicious
|
||
|
guest can directly control guest PTEs to construct L1TF-based attack
|
||
|
to hypervisor. Alternatively if ACRN EPT are not sanitized with some
|
||
|
PTEs (with present bit cleared, or reserved bit set) pointing to valid
|
||
|
host PFNs, a malicious guest may use those EPT PTEs to construct attack.
|
||
|
|
||
|
A special aspect of L1TF in the context of virtualization is symmetric
|
||
|
multi threading (SMT), e.g. Intel(R) Hyper-Threading Technology.
|
||
|
Logical processors on the affected physical cores share the L1 Data Cache
|
||
|
(L1D). This fact could make more variants of L1TF-based attack, e.g.
|
||
|
a malicious guest running on one logical processor can attack the data which
|
||
|
is brought into L1D by the context which runs on the sibling thread of
|
||
|
the same physical core. This context can be any code in hypervisor.
|
||
|
|
||
|
--secure data in ACRN hypervisor--
|
||
|
|
||
|
It is hard to decide which data in ACRN hypervisor is secret or valuable
|
||
|
data. The amount of valuable data from ACRN contexts cannot be declared as
|
||
|
non-interesting for an attacker without deep inspection of the code.
|
||
|
|
||
|
But obviously, the most import secret data in ACRN is the physical platform
|
||
|
seed generated from CSME and virtual seeds which are derived from that
|
||
|
platform seed. They are critical secrets to serve for guest keystore or
|
||
|
other security usage, e.g. disk encryption, secure storage.
|
||
|
|
||
|
Guest->guest Attack
|
||
|
******************************
|
||
|
|
||
|
A malicious guest may use the same side channel as introduced in
|
||
|
last section to attack other guests. The possibility of guest->guest
|
||
|
attack varies on specific configuration, e.g. whether CPU partitioning
|
||
|
is used, whether Hyper-Threading is on, etc.
|
||
|
|
||
|
If CPU partitioning is enabled (default policy in ACRN), there is
|
||
|
1:1 mapping between vCPUs and pCPUs i.e. no sharing on pCPU. Then
|
||
|
the only attack possibility is when Hyper-Threading is on, where
|
||
|
logical processors of same physical core may be allocated to two
|
||
|
different guests. Then one guest may be able to attack the other guest
|
||
|
on sibling thread due to shared L1D.
|
||
|
|
||
|
If CPU sharing is enabled (not supported now), two VMs may share
|
||
|
same pCPU thus next VM may steal information in L1D which comes
|
||
|
from activity of previous VM on the same pCPU.
|
||
|
|
||
|
Normal_world->Secure_world Attack
|
||
|
******************************
|
||
|
|
||
|
ACRN supports Android guest, which requires two running worlds
|
||
|
(normal world vs. secure world). Two worlds run on the same CPU,
|
||
|
and world switch is conducted on demand. Then it is possible for
|
||
|
normal world to construct L1TF-based stack to secure world, thus
|
||
|
break the security model as expected by Android guest.
|
||
|
|
||
|
Affected Processors
|
||
|
******************************
|
||
|
|
||
|
L1TF affects a range of Intel processors, but Intel ATOM processors
|
||
|
(including Apollo Lake) are immune to it. Currently ACRN hypervisor
|
||
|
supports only Apollo Lake, but other core-based platforms may be also
|
||
|
supported in the future so we still need a mitigation plan in ACRN.
|
||
|
|
||
|
Processors that have the RDCL_NO bit set to one (1) in the
|
||
|
IA32_ARCH_CAPABILITIES MSR are not susceptible to the L1TF
|
||
|
speculative exectuion side channel.
|
||
|
|
||
|
Please refer to `Intel Analysis of L1TF`_ for more details.
|
||
|
|
||
|
L1TF Mitigation in ACRN
|
||
|
#######################
|
||
|
|
||
|
The basic assumption is to use latest ucode, which mitigates SMM
|
||
|
and SGX cases while also providing necessary capability for VMM
|
||
|
to use for further mitigation.
|
||
|
|
||
|
ACRN will check the platform capability based on `CPUID enumeration
|
||
|
and architectural MSR`_. For L1TF affected platform (CPUID.07H.EDX.29
|
||
|
with MSR_IA32_ARCH_CAPABILITIES), L1D_FLUSH capability(CPUID.07H.EDX.28)
|
||
|
must be supported.
|
||
|
|
||
|
.. _CPUID enumeration and architectural MSR:
|
||
|
https://software.intel.com/security-software-guidance/insights/deep-dive-cpuid-enumeration-and-architectural-msrs
|
||
|
|
||
|
Not all of below mitigations will be implemented. Even for
|
||
|
implemented mitigations not all of them apply in a given ACRN
|
||
|
deployment. Please always check *status* section and
|
||
|
*recommendation* section for detail guidance.
|
||
|
|
||
|
EPT Sanitization
|
||
|
****************
|
||
|
|
||
|
EPT is sanitized to avoid pointing to valid host memory in PTEs
|
||
|
which has present bit cleared or reserved bits set.
|
||
|
|
||
|
For non-present PTEs, ACRN currently set pfn bits to ZERO, which
|
||
|
means page ZERO might fall into risk if containing security info.
|
||
|
ACRN reserves page ZERO (0~4K) from page allocator thus page ZERO
|
||
|
won't be used by anybody for valid usage. This sanitization logic
|
||
|
is always enabled on all platforms.
|
||
|
|
||
|
ACRN hypervisor doesn't set reserved bits in any EPT entry.
|
||
|
|
||
|
L1D flush on VMENTRY
|
||
|
**************************
|
||
|
|
||
|
ACRN may optionally flush L1D at VMENTRY, which ensures no
|
||
|
sensitive information from hypervisor or previous VM revealed
|
||
|
to current VM (in case of CPU sharing).
|
||
|
|
||
|
Flushing the L1D evicts not only the data which should not be
|
||
|
accessed by a potentially malicious guest, it also flushes the
|
||
|
guest data. Flushing the L1D has a performance impact as the
|
||
|
processor has to bring the flushed guest data back into the L1D,
|
||
|
and actual overhead is proportional to the frequency of vmentry.
|
||
|
|
||
|
Due to such performance reason, ACRN provides a config option
|
||
|
(L1D_FLUSH_VMENTRY) to enable/disable L1D flush during
|
||
|
VMENTRY. By default this option is disabled.
|
||
|
|
||
|
Put Secret Data into Uncached Memory
|
||
|
************************************
|
||
|
|
||
|
If the critical secret data in ACRN is identified, then such
|
||
|
data can be put into un-cached memory. As the content will
|
||
|
never go to L1D, it is immune to L1TF attack
|
||
|
|
||
|
For example, after getting the physical seed from CSME, before any guest
|
||
|
starts, ACRN can pre-derive all the virtual seeds for all the
|
||
|
guests and then put these virtual seeds into uncached memory,
|
||
|
at the same time flush & erase physical seed.
|
||
|
|
||
|
If all security data are identified and put in uncached
|
||
|
meomry in a specific deployment, then it is not necessary to
|
||
|
prevent guest->hypervisor attack, since there is nothing
|
||
|
useful to be attacked.
|
||
|
|
||
|
However if such 100% identification is not possible, user should
|
||
|
consider other mitigation options to protect hypervisor.
|
||
|
|
||
|
L1D flush on World Switch
|
||
|
**************************
|
||
|
|
||
|
For L1D-affected platforms, ACRN writes to aforementioned MSR
|
||
|
to flush L1D when switching from secure world to normal world.
|
||
|
Doing so guarantees no sensitive information from secure world
|
||
|
leaked in L1D. Performance impact is expected to small since world
|
||
|
switch frequency is not expected high.
|
||
|
|
||
|
It's not necessary to flush L1D in the other direction, since
|
||
|
normal world is less privileged entity to secure world.
|
||
|
|
||
|
This mitigation is always enabled.
|
||
|
|
||
|
Core-based scheduling
|
||
|
***********************
|
||
|
|
||
|
If Hyper-Threading is enabled, there is no easy method to mitigate
|
||
|
L1TF attack from a sibling processor on the same physical core.
|
||
|
|
||
|
A basic idea is to avoid running sensitive context (if containing
|
||
|
security data which a given VM has no premission to access) on
|
||
|
the same physical core that runs said VM. It requires scheduler
|
||
|
enhancement to enable core-based scheduling policy, so all threads
|
||
|
on the same core are always scheduled to the same VM. Also there
|
||
|
are some further actions required to protect hypervisor and
|
||
|
secure world from sibling attacks in core-based scheduler.
|
||
|
|
||
|
Please note there is no commitment of implementation it so far.
|
||
|
ACRN community will keep evaluating this part based on usage
|
||
|
requirements and hardware platform status.
|
||
|
|
||
|
Mitigation Recommendations
|
||
|
##########################
|
||
|
|
||
|
There is no mitigation required on Apollo Lake based platforms.
|
||
|
|
||
|
For other affected platforms:
|
||
|
|
||
|
The majority use case for ACRN is in pre-configured environment,
|
||
|
where the whole software stack (from ACRN hypervisor to guest
|
||
|
kernel to SOS root) is tightly controlled by solution provider
|
||
|
and not allowed for run-time change after sale (guest kernel is
|
||
|
sort of trusted). In that case solution provider will make sure
|
||
|
that guest kernel is up-to-date including necessary page table
|
||
|
sanitization, thus there is no attack interface exposed within
|
||
|
guest. Then a minimal mitigation configuration is sufficient
|
||
|
with negligible performance impact, as explained below:
|
||
|
|
||
|
1) Use latest ucode
|
||
|
2) Guest kernel is up-to-date with page table sanitization
|
||
|
3) EPT sanitization (always enabled)
|
||
|
4) Flush L1D at world switch (Android specific, always enabled)
|
||
|
|
||
|
In case that someone wants to deploy ACRN into an open environment
|
||
|
where guest kernel is considered untrusted. There are more
|
||
|
mitigation options required according to the specific usage
|
||
|
requirements:
|
||
|
|
||
|
5) Put hypervisor security data in UC memory if possible
|
||
|
6) Enable L1D_FLUSH_VMENTRY option, if
|
||
|
- Doing 5) is not feasible, or
|
||
|
- CPU sharing is enabled (in the future)
|
||
|
|
||
|
If Hyper-Threading is enabled, there is no available option
|
||
|
before core scheduling is planned. User should understand
|
||
|
the security implication and only turn on Hyper-Threading
|
||
|
when the potential risk is acceptable to their usage.
|
||
|
|
||
|
Status
|
||
|
######
|
||
|
|
||
|
EPT sanitization: supported
|
||
|
L1D flush on VMENTRY: supported
|
||
|
L1D flush on world switch: supported
|
||
|
Uncached security data: n/a
|
||
|
Core scheduling: n/a
|