.. _float_v2:

Floating Point Services
#######################

The kernel allows threads to use floating point registers on board
configurations that support these registers.

.. note::

   Floating point services are currently available only for boards
   based on ARM Cortex-M SoCs supporting the Floating Point Extension,
   the ARM64 (AArch64) architecture, the Intel x86 architecture, the
   RISC-V architecture, the SPARC architecture and ARCv2 SoCs
   supporting the Floating Point Extension. The services provided
   are architecture specific.

With the exception of ARM64 and RISC-V, whose behavior is described below,
the kernel does not support the use of floating point registers by ISRs.

.. contents::
    :local:
    :depth: 2

Concepts
********

The kernel can be configured to provide only the floating point services
required by an application. Three modes of operation are supported,
which are described below. In addition, on x86 the kernel's support for
the SSE registers can be included or omitted, as desired.

No FP registers mode
====================

This mode is used when the application has no threads that use floating point
registers. It is the kernel's default floating point services mode.

If a thread uses any floating point register, the kernel generates a fatal
error condition and aborts the thread.

Unshared FP registers mode
==========================

This mode is used when the application has only a single thread
that uses floating point registers.

On x86 platforms, the kernel initializes the floating point registers so they can
be used by any thread (initialization is skipped on ARM Cortex-M platforms and
ARCv2 platforms). The floating point registers are left unchanged whenever a
context switch occurs.

.. note::

   The behavior is undefined if two or more threads attempt to use
   the floating point registers, as the kernel does not attempt to detect
   (or prevent) multiple threads from using these registers.

Shared FP registers mode
========================

This mode is used when the application has two or more threads that use
floating point registers. Depending upon the underlying CPU architecture,
the kernel supports one or more of the following thread sub-classes:

* non-user: A thread that cannot use any floating point registers

* FPU user: A thread that can use the standard floating point registers

* SSE user: A thread that can use both the standard floating point registers
  and SSE registers

The kernel initializes and enables access to the floating point registers
so they can be used by any thread. It then saves and restores these registers
during context switches so that the computations performed by each FPU user
or SSE user are not affected by the computations performed by the other users.

ARM Cortex-M architecture (with the Floating Point Extension)
-------------------------------------------------------------

.. note::

   The Shared FP registers mode is the default Floating Point
   Services mode on ARM Cortex-M.

On the ARM Cortex-M architecture with the Floating Point Extension, the kernel
treats *all* threads as FPU users when shared FP registers mode is enabled.
This means that any thread is allowed to access the floating point registers.
The ARM kernel automatically detects that a given thread is using the floating
point registers the first time the thread accesses them.

A thread that intends to use the FP registers can be pretagged using one of
the techniques listed below.

* A statically-created ARM thread can be pretagged by passing the
  :c:macro:`K_FP_REGS` option to :c:macro:`K_THREAD_DEFINE`.

* A dynamically-created ARM thread can be pretagged by passing the
  :c:macro:`K_FP_REGS` option to :c:func:`k_thread_create`.

Pretagging a thread with the :c:macro:`K_FP_REGS` option has two effects: it
instructs the MPU-based stack protection mechanism to size the thread's guard
region so that stack overflow detection is always guaranteed, and it enables
lazy stacking for the given thread upon thread creation.

During thread context switching the ARM kernel saves the *callee-saved*
floating point registers, if the switched-out thread has been using them.
Additionally, the *caller-saved* floating point registers are saved on
the thread's stack. If the switched-in thread has been using the floating
point registers, the kernel restores the *callee-saved* FP registers of
the switched-in thread and the *caller-saved* FP context is restored from
the thread's stack. Thus, the kernel does not save or restore the FP
context of threads that are not using the FP registers.

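
To make the save/restore rule above concrete, here is a simplified host-side
C model of it. It is not Zephyr's implementation: the names ``cm_thread``,
``cm_switch``, ``cm_fp_write`` and ``cm_fp_read`` and the register counts are
hypothetical, the FPU is modelled as two plain arrays, and hardware details
such as the automatic stacking of caller-saved registers on exception entry
are ignored. It only shows that the FP context is saved and restored
exclusively for threads that have used the FP registers.

.. code-block:: c

   #include <stdbool.h>
   #include <string.h>

   /* Hypothetical register counts; the real split is architecture specific. */
   #define CM_CALLER_SAVED 16
   #define CM_CALLEE_SAVED 16

   /* Hypothetical model of the live FP register file. */
   static float cm_caller_regs[CM_CALLER_SAVED];
   static float cm_callee_regs[CM_CALLEE_SAVED];

   struct cm_thread {
       bool uses_fp;                          /* set once the thread touches FP */
       float stacked_caller[CM_CALLER_SAVED]; /* models the copy on the stack */
       float saved_callee[CM_CALLEE_SAVED];   /* models the copy in the thread */
   };

   static unsigned int cm_saves;    /* FP context saves performed */
   static unsigned int cm_restores; /* FP context restores performed */

   /* An FP register write marks the writing thread as an FP user. */
   void cm_fp_write(struct cm_thread *t, int reg, float value)
   {
       t->uses_fp = true;
       cm_callee_regs[reg] = value;
   }

   float cm_fp_read(int reg)
   {
       return cm_callee_regs[reg];
   }

   /* Save/restore the FP context only for threads that have used the FPU. */
   void cm_switch(struct cm_thread *out, struct cm_thread *in)
   {
       if (out->uses_fp) {
           memcpy(out->stacked_caller, cm_caller_regs, sizeof(cm_caller_regs));
           memcpy(out->saved_callee, cm_callee_regs, sizeof(cm_callee_regs));
           cm_saves++;
       }
       if (in->uses_fp) {
           memcpy(cm_caller_regs, in->stacked_caller, sizeof(cm_caller_regs));
           memcpy(cm_callee_regs, in->saved_callee, sizeof(cm_callee_regs));
           cm_restores++;
       }
   }

In this sketch, switching from an FP user to a thread that never touches the
FPU costs one save, switching back costs one restore, and a switch between two
threads that never use the FPU costs nothing.
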
Each thread that intends to use the floating point registers must provide
an extra 72 bytes of stack space where the callee-saved FP context can
be saved.

`Lazy Stacking <https://developer.arm.com/documentation/dai0298/a>`_
is currently enabled in Zephyr applications on the ARM Cortex-M
architecture, which minimizes interrupt latency when the floating
point context is active.

When the MPU-based stack protection mechanism is not enabled, lazy stacking
is always active in the Zephyr application. When the MPU-based stack protection
is enabled, the following rules apply with respect to lazy stacking:

* Lazy stacking is activated by default on threads that are pretagged with
  :c:macro:`K_FP_REGS`.
* Lazy stacking is activated dynamically on threads that are not pretagged with
  :c:macro:`K_FP_REGS`, as soon as the kernel detects that they are using the
  floating point registers.

If an ARM thread no longer requires the floating point registers, it can call
:c:func:`k_float_disable`. This instructs the kernel not to save or restore
its FP context during thread context switching.

ARM64 architecture
------------------

.. note::

   The Shared FP registers mode is the default Floating Point
   Services mode on ARM64. The compiler is free to optimize code
   using FP/SIMD registers, and library functions such as ``memcpy``
   are known to make use of them.

On the ARM64 (AArch64) architecture the kernel treats each thread as an FPU
user on a case-by-case basis. A "lazy save" algorithm is used during context
switching which updates the floating point registers only when it is absolutely
necessary. For example, the registers are *not* saved when switching from an
FPU user to a non-user thread, and then back to the original FPU user.

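
The lazy save policy can be illustrated with the following simplified model,
written as plain host-side C rather than kernel code. Everything in it
(``lazy_thread``, ``lazy_switch_to``, ``lazy_fp_write``, ``lazy_fp_read`` and
the single-register FP context) is hypothetical, and ``lazy_claim_fpu`` stands
in for the access trap that real hardware would raise. A context switch never
touches the FPU; the previous owner's registers are saved only when a
different thread actually accesses them.

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical thread object holding a saved FP context (one register). */
   struct lazy_thread {
       double fp_ctx;
   };

   static double lazy_fpu_reg;              /* the live FP register */
   static struct lazy_thread *lazy_owner;   /* thread whose context is loaded */
   static struct lazy_thread *lazy_current; /* thread currently running */
   static unsigned int lazy_saves;          /* FP context saves performed */

   /* A context switch leaves the FP registers untouched. */
   void lazy_switch_to(struct lazy_thread *next)
   {
       lazy_current = next;
   }

   /*
    * Runs on every FP access; models the trap taken when a thread that
    * does not own the FPU touches it. Call only after lazy_switch_to().
    */
   static void lazy_claim_fpu(void)
   {
       if (lazy_owner == lazy_current) {
           return; /* registers already hold this thread's context */
       }
       if (lazy_owner != NULL) {
           lazy_owner->fp_ctx = lazy_fpu_reg; /* save the previous owner */
           lazy_saves++;
       }
       lazy_fpu_reg = lazy_current->fp_ctx;   /* load the new owner */
       lazy_owner = lazy_current;
   }

   void lazy_fp_write(double value)
   {
       lazy_claim_fpu();
       lazy_fpu_reg = value;
   }

   double lazy_fp_read(void)
   {
       lazy_claim_fpu();
       return lazy_fpu_reg;
   }

With this policy, switching from an FPU user to a non-user and back performs
no save at all; a save happens only when a second FPU user claims the FPU.
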
FPU register usage by ISRs is supported although not recommended. When an
ISR uses floating point or SIMD registers, the access is trapped, the
current FPU user context is saved in the thread object, and the ISR is resumed
with interrupts disabled so as to prevent another IRQ from interrupting the ISR
and potentially requesting FPU usage. Because ISRs don't have a persistent
register context, there is no provision for saving an ISR's FPU context
either, hence the IRQ disabling.

Each thread object becomes 512 bytes larger when Shared FP registers mode
is enabled.

ARCv2 architecture
------------------

On the ARCv2 architecture, the kernel treats each thread as a non-user
or FPU user and the thread must be tagged by one of the
following techniques:

* A statically-created ARC thread can be tagged by passing the
  :c:macro:`K_FP_REGS` option to :c:macro:`K_THREAD_DEFINE`.

* A dynamically-created ARC thread can be tagged by passing the
  :c:macro:`K_FP_REGS` option to :c:func:`k_thread_create`.

If an ARC thread no longer requires the floating point registers, it can call
:c:func:`k_float_disable`. This instructs the kernel not to save or restore
its FP context during thread context switching.

During thread context switching the ARC kernel saves the *callee-saved*
floating point registers, if the switched-out thread has been using them.
Additionally, the *caller-saved* floating point registers are saved on
the thread's stack. If the switched-in thread has been using the floating
point registers, the kernel restores the *callee-saved* FP registers of
the switched-in thread and the *caller-saved* FP context is restored from
the thread's stack. Thus, the kernel does not save or restore the FP
context of threads that are not using the FP registers. An extra 16 bytes
(single-precision floating point hardware) or 32 bytes (double-precision
floating point hardware) of stack space is required to load and store
floating point registers.

RISC-V architecture
-------------------

On the RISC-V architecture the kernel treats each thread as an FPU
user on a case-by-case basis with the FPU access allocated on demand.
A "lazy save" algorithm is used during context switching which updates
the floating point registers only when it is absolutely necessary.
For example, the FPU registers are *not* saved when switching from an
FPU user to a non-user thread (or an FPU user that doesn't touch the FPU
during its scheduling slot), and then back to the original FPU user.

FPU register usage by ISRs is supported although not recommended. When an
ISR uses floating point or SIMD registers, the access is trapped, the
current FPU user context is saved in the thread object, and the ISR is resumed
with interrupts disabled so as to prevent another IRQ from interrupting the ISR
and potentially requesting FPU usage. Because ISRs don't have a persistent
register context, there is no provision for saving an ISR's FPU context
either, hence the IRQ disabling.

As an optimization, the FPU context is preemptively restored upon scheduling
back an "active FPU user" thread that had its FPU context saved away due to
FPU usage by another thread. Active FPU users are so designated when they
make the FPU state "dirty" during their most recent scheduling slot before
being scheduled out. So if a thread doesn't modify the FPU state within its
scheduling slot and another thread claims the FPU for itself afterwards, then
that first thread will be subjected to the on-demand regime and won't have
its FPU context restored until it attempts to access it again. But if that
thread does modify the FPU before being scheduled out, then it is likely to
continue using it when scheduled back in, and preemptively restoring its FPU
context avoids the exception trap overhead that would otherwise occur.

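
The sketch below combines a simple lazy-save model with the "active FPU user"
heuristic described above. It is an illustrative host-side C model, not the
RISC-V port's code: ``rv_thread``, ``rv_switch_to``, ``rv_fp_write``,
``rv_fp_read`` and the ``dirty``/``active`` flags are hypothetical stand-ins
for the real FPU state tracking, and the ``rv_traps`` counter represents the
exception traps taken on the on-demand path. In this model a write marks the
FPU as dirty and a read does not.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>

   /* Hypothetical thread object with a one-register FP context. */
   struct rv_thread {
       double fp_ctx; /* saved FP context */
       bool dirty;    /* FPU written during the current scheduling slot */
       bool active;   /* FPU was dirtied in the most recent slot */
   };

   static double rv_fpu_reg;            /* the live FP register */
   static struct rv_thread *rv_owner;   /* thread whose context is loaded */
   static struct rv_thread *rv_current; /* thread currently running */
   static unsigned int rv_traps;        /* on-demand FPU access traps taken */

   /* Hand the FPU to rv_current, saving the previous owner's context. */
   static void rv_take_fpu(void)
   {
       if (rv_owner == rv_current) {
           return;
       }
       if (rv_owner != NULL) {
           rv_owner->fp_ctx = rv_fpu_reg;
       }
       rv_fpu_reg = rv_current->fp_ctx;
       rv_owner = rv_current;
   }

   void rv_switch_to(struct rv_thread *next)
   {
       if (rv_current != NULL) {
           /* The outgoing thread is "active" if it dirtied the FPU. */
           rv_current->active = rv_current->dirty;
           rv_current->dirty = false;
       }
       rv_current = next;
       if (next->active) {
           rv_take_fpu(); /* preemptive restore: no trap on next access */
       }
   }

   /* An FP access by a thread that does not own the FPU takes a trap. */
   static void rv_fp_access(void)
   {
       if (rv_owner != rv_current) {
           rv_traps++;
           rv_take_fpu();
       }
   }

   void rv_fp_write(double value)
   {
       rv_fp_access();
       rv_fpu_reg = value;
       rv_current->dirty = true;
   }

   double rv_fp_read(void)
   {
       rv_fp_access();
       return rv_fpu_reg;
   }

A thread that wrote to the FPU in its last slot gets its context back when it
is scheduled in, so its next access is trap-free; a thread that only read the
FPU falls back to the on-demand path and takes a trap on its next access.
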
Each thread object becomes 136 bytes (single-precision floating point
hardware) or 264 bytes (double-precision floating point hardware) larger
when Shared FP registers mode is enabled.

SPARC architecture
------------------

On the SPARC architecture, the kernel treats each thread as a non-user
or FPU user and the thread must be tagged by one of the
following techniques:

* A statically-created thread can be tagged by passing the
  :c:macro:`K_FP_REGS` option to :c:macro:`K_THREAD_DEFINE`.

* A dynamically-created thread can be tagged by passing the
  :c:macro:`K_FP_REGS` option to :c:func:`k_thread_create`.

During a thread context switch on exit from an interrupt handler, the SPARC
kernel saves *all* floating point registers if the FPU was enabled in
the switched-out thread. Floating point registers are saved on the thread's
stack, and they are restored when a thread context is restored only if they
were saved at the context save. Saving and restoring of the floating
point registers is synchronous and thus not lazy. The FPU is always disabled
when an ISR is called (independent of :kconfig:option:`CONFIG_FPU_SHARING`).

Floating point disabling with :c:func:`k_float_disable` is not implemented.

When :kconfig:option:`CONFIG_FPU_SHARING` is used, 136 bytes of stack space
are required for each FPU user thread to load and store floating point
registers. No extra stack is required if :kconfig:option:`CONFIG_FPU_SHARING` is
not used.

x86 architecture
----------------

On the x86 architecture the kernel treats each thread as a non-user,
FPU user or SSE user on a case-by-case basis. A "lazy save" algorithm is used
during context switching which updates the floating point registers only when
it is absolutely necessary. For example, the registers are *not* saved when
switching from an FPU user to a non-user thread, and then back to the original
FPU user. The following table indicates the amount of additional stack space a
thread must provide so the registers can be saved properly.

=========== =============== ==========================
Thread type FP register use Extra stack space required
=========== =============== ==========================
cooperative any             0 bytes
preemptive  none            0 bytes
preemptive  FPU             108 bytes
preemptive  SSE             464 bytes
=========== =============== ==========================

The x86 kernel automatically detects that a given thread is using
the floating point registers the first time the thread accesses them.
The thread is tagged as an SSE user if the kernel has been configured
to support the SSE registers, or as an FPU user if the SSE registers are
not supported. If this would result in a thread that is an FPU user being
tagged as an SSE user, or if the application wants to avoid the exception
handling overhead involved in auto-tagging threads, it is possible to
pretag a thread using one of the techniques listed below.

* A statically-created x86 thread can be pretagged by passing the
  :c:macro:`K_FP_REGS` or :c:macro:`K_SSE_REGS` option to
  :c:macro:`K_THREAD_DEFINE`.

* A dynamically-created x86 thread can be pretagged by passing the
  :c:macro:`K_FP_REGS` or :c:macro:`K_SSE_REGS` option to
  :c:func:`k_thread_create`.

* An already-created x86 thread can pretag itself once it has started
  by passing the :c:macro:`K_FP_REGS` or :c:macro:`K_SSE_REGS` option to
  :c:func:`k_float_enable`.

If an x86 thread uses the floating point registers infrequently, it can call
:c:func:`k_float_disable` to remove its tagging as an FPU user or SSE user.
This eliminates the need for the kernel to take steps to preserve
the contents of the floating point registers during context switches
when there is no need to do so.
When the thread again needs to use the floating point registers it can re-tag
itself as an FPU user or SSE user by calling :c:func:`k_float_enable`.

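
The tagging rules above can be summarised as a small state model. This is a
hypothetical host-side sketch, not the x86 port's code and not the real
:c:func:`k_float_enable`/:c:func:`k_float_disable` API: the ``x86_model_*``
functions only mimic the effect of those calls on a thread's tag, and
``model_sse_configured`` stands in for :kconfig:option:`CONFIG_X86_SSE`.

.. code-block:: c

   #include <stdbool.h>

   /* Hypothetical tags mirroring the non-user / FPU user / SSE user classes. */
   enum x86_fp_tag { X86_TAG_NONE, X86_TAG_FPU, X86_TAG_SSE };

   struct x86_model_thread {
       enum x86_fp_tag tag;
   };

   /* Hypothetical stand-in for CONFIG_X86_SSE. */
   static bool model_sse_configured = true;

   /* First FP access by an untagged thread: the kernel auto-tags it. */
   void x86_model_fp_access(struct x86_model_thread *t)
   {
       if (t->tag == X86_TAG_NONE) {
           t->tag = model_sse_configured ? X86_TAG_SSE : X86_TAG_FPU;
       }
   }

   /* Mimics the effect of k_float_enable() with K_FP_REGS or K_SSE_REGS. */
   void x86_model_float_enable(struct x86_model_thread *t, enum x86_fp_tag tag)
   {
       t->tag = tag;
   }

   /* Mimics the effect of k_float_disable(): the thread becomes a non-user. */
   void x86_model_float_disable(struct x86_model_thread *t)
   {
       t->tag = X86_TAG_NONE;
   }

   /* Whether a context switch must preserve this thread's FP registers. */
   bool x86_model_needs_fp_save(const struct x86_model_thread *t)
   {
       return t->tag != X86_TAG_NONE;
   }

Pretagging with the FPU tag keeps a thread from being auto-tagged as an SSE
user, which, per the table above, lowers the extra stack a preemptive thread
needs from 464 to 108 bytes.
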
Implementation
**************

Performing Floating Point Arithmetic
====================================

No special coding is required for a thread to use floating point arithmetic
if the kernel is properly configured.

The following code shows how a routine can use floating point arithmetic
to avoid overflow issues when computing the average of a series of integer
values.

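.. code-block:: c

   /*
    * Return the average of num_values integers, rounded to the nearest
    * integer. Accumulating in a double avoids the overflow that an int
    * accumulator could hit when summing many large values.
    */
   int average(const int *values, int num_values)
   {
       double sum = 0.0;
       double avg;
       int i;

       if (num_values <= 0) {
           return 0; /* no values: avoid dividing by zero */
       }

       for (i = 0; i < num_values; i++) {
           sum += values[i];
       }

       avg = sum / num_values;

       /* Round half away from zero so negative averages round correctly. */
       return (int)(avg >= 0.0 ? avg + 0.5 : avg - 0.5);
   }
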
Suggested Uses
**************

Use the kernel floating point services when an application needs to
perform floating point operations.

Configuration Options
*********************

To configure unshared FP registers mode, enable the :kconfig:option:`CONFIG_FPU`
configuration option and leave the :kconfig:option:`CONFIG_FPU_SHARING` configuration
option disabled.

To configure shared FP registers mode, enable both the :kconfig:option:`CONFIG_FPU`
configuration option and the :kconfig:option:`CONFIG_FPU_SHARING` configuration option.
Also, ensure that any thread that uses the floating point registers has
sufficient added stack space for saving floating point register values
during context switches, as described above.

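
As a quick reference, the per-thread stack figures given in the architecture
sections above can be collected into a lookup helper. This is an illustrative
sketch; the ``fp_stack_case`` enumeration and ``fp_extra_stack_bytes`` are
hypothetical names. ARM64 and RISC-V are omitted because on those
architectures the cost is an increase in thread object size rather than in
stack space, and x86 cooperative threads and non-FP preemptive threads need
no extra stack.

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical labels for the configurations documented above. */
   enum fp_stack_case {
       FP_CORTEX_M,     /* ARM Cortex-M with Floating Point Extension */
       FP_ARCV2_SINGLE, /* ARCv2, single-precision FP hardware */
       FP_ARCV2_DOUBLE, /* ARCv2, double-precision FP hardware */
       FP_SPARC,        /* SPARC with CONFIG_FPU_SHARING */
       FP_X86_FPU,      /* x86 preemptive thread, FPU user */
       FP_X86_SSE,      /* x86 preemptive thread, SSE user */
   };

   /* Extra stack bytes an FP-using thread needs, per this document. */
   size_t fp_extra_stack_bytes(enum fp_stack_case c)
   {
       switch (c) {
       case FP_CORTEX_M:     return 72;
       case FP_ARCV2_SINGLE: return 16;
       case FP_ARCV2_DOUBLE: return 32;
       case FP_SPARC:        return 136;
       case FP_X86_FPU:      return 108;
       case FP_X86_SSE:      return 464;
       }
       return 0; /* not reached for valid enumerators */
   }

For example, a preemptive x86 FPU user that would otherwise get a 1024-byte
stack needs ``1024 + fp_extra_stack_bytes(FP_X86_FPU)``, i.e. 1132 bytes.
Because :c:macro:`K_THREAD_STACK_DEFINE` needs a compile-time constant size,
an application would normally encode the chosen figure in a macro rather
than call a function.
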
For x86, use the :kconfig:option:`CONFIG_X86_SSE` configuration option to enable
support for SSEx instructions.

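
In a Zephyr build, enabled Kconfig options are visible to C code as
``CONFIG_`` preprocessor macros, so application code can check at build time
which of the modes above it was built for. The sketch below assumes only
that behavior; ``fp_mode``, ``configured_fp_mode`` and
``sse_support_configured`` are hypothetical names, not part of the kernel API.

.. code-block:: c

   #include <stdbool.h>

   /* Hypothetical names; not part of the kernel API. */
   enum fp_mode {
       FP_MODE_NONE,     /* No FP registers mode */
       FP_MODE_UNSHARED, /* Unshared FP registers mode */
       FP_MODE_SHARED,   /* Shared FP registers mode */
   };

   /* Report the FP services mode selected by the build's Kconfig options. */
   enum fp_mode configured_fp_mode(void)
   {
   #if defined(CONFIG_FPU) && defined(CONFIG_FPU_SHARING)
       return FP_MODE_SHARED;
   #elif defined(CONFIG_FPU)
       return FP_MODE_UNSHARED;
   #else
       return FP_MODE_NONE;
   #endif
   }

   /* True when x86 SSE register support was enabled with CONFIG_X86_SSE. */
   bool sse_support_configured(void)
   {
   #if defined(CONFIG_X86_SSE)
       return true;
   #else
       return false;
   #endif
   }

Compiled with none of these macros defined, ``configured_fp_mode()`` returns
``FP_MODE_NONE``, matching the kernel's default mode.
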
API Reference
*************

.. doxygengroup:: float_apis