.. _architecture_porting_guide:

Architecture Porting Guide
##########################

An architecture port is needed to enable Zephyr to run on an :abbr:`ISA
(Instruction Set Architecture)` or an :abbr:`ABI (Application Binary
Interface)` that is not currently supported.

The following are examples of ISAs and ABIs that Zephyr supports:

* x86_32 ISA with System V ABI
* x86_32 ISA with IAMCU ABI
* ARMv7-M ISA with Thumb2 instruction set and ARM Embedded ABI (aeabi)
* ARCv2 ISA

An architecture port can be divided into several parts; most are required and
some are optional:

* **The early boot sequence**: each architecture has different steps it must
  take when the CPU comes out of reset (required).

* **Interrupt and exception handling**: each architecture handles asynchronous
  and unrequested events in a specific manner (required).

* **Thread context switching**: the Zephyr context switch is dependent on the
  ABI and each ISA has a different set of registers to save (required).

* **Thread creation and termination**: a thread's initial stack frame is ABI
  and architecture-dependent, and thread abortion possibly as well (required).

* **Device drivers**: most often, the system clock timer and the interrupt
  controller are tied to the architecture (some required, some optional).

* **Utility libraries**: some common kernel APIs rely on an
  architecture-specific implementation for performance reasons (required).

* **CPU idling/power management**: most architectures implement instructions
  for putting the CPU to sleep (partly optional, most likely very desired).

* **Fault management**: for implementing architecture-specific debug help and
  handling of fatal errors in threads (partly optional).

* **Linker scripts and toolchains**: architecture-specific details will most
  likely be needed in the build system and when linking the image (required).

Early Boot Sequence
*******************

The goal of the early boot sequence is to take the system from the state it is
in after reset to a state where it can run C code and thus the common kernel
initialization sequence. Most of the time, very few steps are needed, while
some architectures require a bit more work to be performed.

Common steps for all architectures:

* Set up an initial stack.
* If running an :abbr:`XIP (eXecute-In-Place)` kernel, copy initialized data
  from ROM to RAM.
* If not using an ELF loader, zero the BSS section.
* Jump to :code:`_Cstart()`, the early kernel initialization.

  * :code:`_Cstart()` is responsible for context switching out of the fake
    context running at startup into the main thread.

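The data copy and BSS steps listed above can be sketched in C as follows. This
is only an illustration: the :code:`boot_regions` structure and
:code:`prepare_c_runtime()` are hypothetical names that do not exist in the
kernel. A real port would take the region boundaries from linker-script
symbols, and might have to do this work in assembly.

.. code-block:: C

   #include <stddef.h>
   #include <string.h>

   /* Hypothetical description of the memory regions to prepare. In a real
    * port, these boundaries would come from linker-script symbols.
    */
   struct boot_regions {
       const unsigned char *data_rom; /* load address of initialized data */
       unsigned char *data_ram;       /* run-time address of initialized data */
       size_t data_size;
       unsigned char *bss_start;
       size_t bss_size;
   };

   /* Prepare RAM so that C code can rely on its static storage. */
   static void prepare_c_runtime(const struct boot_regions *r, int is_xip)
   {
       if (is_xip) {
           /* XIP kernel: initialized data still lives in ROM. */
           memcpy(r->data_ram, r->data_rom, r->data_size);
       }

       /* Assumes no ELF loader cleared the BSS on our behalf. */
       memset(r->bss_start, 0, r->bss_size);
   }

Once these steps are done, the boot code can jump to :code:`_Cstart()`.
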
Some examples of architecture-specific steps that have to be taken:

* If given control in real mode on x86_32, switch to 32-bit protected mode.
* Set up the segment registers on x86_32 to handle boot loaders that leave them
  in an unknown or broken state.
* Initialize a board-specific watchdog on Cortex-M3/4.
* Switch stacks from MSP to PSP on Cortex-M.
* Use an approach other than calling into :code:`_Swap()` on Cortex-M to
  prevent race conditions.
* Set up FIRQ and regular IRQ handling on ARCv2.

Interrupt and Exception Handling
********************************

Each architecture defines interrupt and exception handling differently.

When a device wants to signal the processor that there is some work to be done
on its behalf, it raises an interrupt. When a thread does an operation that is
not handled by the serial flow of the software itself, it raises an exception.
Both interrupts and exceptions pass control to a handler. The handler is
known as an :abbr:`ISR (Interrupt Service Routine)` in the case of
interrupts. The handler performs the work required by the exception or the
interrupt. For interrupts, that work is device-specific. For exceptions, it
depends on the exception, but most often the core kernel itself is responsible
for providing the handler.

The kernel has to perform some work in addition to the work the handler itself
performs. For example:

* Prior to handing control to the handler:

  * Save the currently executing context.
  * Possibly get out of power saving mode, which includes waking up
    devices.
  * Update the kernel uptime if getting out of tickless idle mode.

* After getting control back from the handler:

  * Decide whether to perform a context switch.
  * When performing a context switch, restore the context being context
    switched in.

This work is conceptually the same across architectures, but the details are
completely different, for example:

* The registers to save and restore.
* The processor instructions to perform the work.
* The numbering of the exceptions.

It thus needs an architecture-specific implementation, called the
interrupt/exception stub.

Another issue is that the kernel defines the signature of ISRs as:

.. code-block:: C

   void (*isr)(void *parameter);

Architectures do not have a consistent or native way of handling parameters to
an ISR. As such, there are two commonly used methods for handling the
parameter:

* Using some architecture-defined mechanism, the parameter value is forced in
  the stub. This is commonly found in x86-based architectures.

* The parameters to the ISR are inserted and tracked via a separate table,
  requiring the architecture to discover at runtime which interrupt is
  executing. A common interrupt handler demuxer is installed for all entries of
  the real interrupt vector table, which then fetches the device's ISR and
  parameter from the separate table. This approach is commonly used in the ARC
  and ARM architectures via the :option:`CONFIG_GEN_ISR_TABLES` implementation.
  You can find examples of the stubs by looking at :code:`_interrupt_enter()` in
  x86, :code:`_IntExit()` in ARM, :code:`_isr_wrapper()` in ARM, or the full
  implementation description for ARC in :file:`arch/arc/core/isr_wrapper.S`.

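The second method, a separate table plus a common demuxer, can be sketched as
follows. All names here (:code:`sw_isr_table`, :code:`isr_table_connect()`,
:code:`common_isr_demux()`, :code:`NUM_IRQS`) are hypothetical and chosen for
illustration only. The sketch also fills the table at runtime for simplicity,
whereas a real port would typically generate it at build time.

.. code-block:: C

   #include <stddef.h>

   typedef void (*isr_t)(void *parameter);

   /* One entry per interrupt line: the ISR and the parameter to pass to it. */
   struct isr_table_entry {
       void *parameter;
       isr_t isr;
   };

   #define NUM_IRQS 4 /* hypothetical number of interrupt lines */

   /* Separate table; entries with a NULL isr are treated as unconnected. */
   static struct isr_table_entry sw_isr_table[NUM_IRQS];

   static void isr_table_connect(unsigned int irq, isr_t isr, void *parameter)
   {
       if (irq < NUM_IRQS) {
           sw_isr_table[irq].isr = isr;
           sw_isr_table[irq].parameter = parameter;
       }
   }

   /* Common handler installed in every hardware vector. The irq argument
    * stands for the line number that the architecture discovers at runtime.
    * Returns 0 when an ISR was dispatched, -1 for a spurious interrupt.
    */
   static int common_isr_demux(unsigned int irq)
   {
       if (irq >= NUM_IRQS || sw_isr_table[irq].isr == NULL) {
           return -1;
       }
       sw_isr_table[irq].isr(sw_isr_table[irq].parameter);
       return 0;
   }

   /* Example device ISR: counts how many times it ran. */
   static void example_device_isr(void *parameter)
   {
       int *hits = parameter;

       (*hits)++;
   }

The device ISR never needs to know which vector fired; it only sees the
parameter that was registered for it.
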
Each architecture also has to implement primitives for interrupt control:

* locking interrupts: :c:func:`irq_lock`, :c:func:`irq_unlock`.
* registering interrupts: :c:func:`IRQ_CONNECT`.
* programming the priority, if possible: :c:func:`irq_priority_set`.
* enabling/disabling interrupts: :c:func:`irq_enable`, :c:func:`irq_disable`.

.. note::

  :c:macro:`IRQ_CONNECT` is a macro that uses assembler and/or linker script
  tricks to connect interrupts at build time, saving boot time and text size.

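The locking primitives work with a key: :c:func:`irq_lock` returns the previous
interrupt state, and :c:func:`irq_unlock` restores it, which is what makes
nested locking safe. The following sketch models that behavior with a plain
variable instead of a CPU register. The :code:`sim_` names are hypothetical and
only stand in for what an architecture would implement with its own
instructions.

.. code-block:: C

   /* Simulated CPU state: 1 when interrupts are enabled, 0 when locked. */
   static unsigned int sim_irqs_enabled = 1;

   /* Lock interrupts and return the previous state as the key. */
   static unsigned int sim_irq_lock(void)
   {
       unsigned int key = sim_irqs_enabled;

       sim_irqs_enabled = 0;
       return key;
   }

   /* Restore the state captured by the matching sim_irq_lock() call. */
   static void sim_irq_unlock(unsigned int key)
   {
       sim_irqs_enabled = key;
   }

Unlocking with the inner key of a nested pair leaves interrupts locked; only
the outermost unlock re-enables them.
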
The vector table should contain a handler for each interrupt and exception that
can possibly occur. The handler can be as simple as a spinning loop. However,
we strongly suggest that handlers at least print some debug information. The
information helps figure out what went wrong when hitting an exception that
is a fault, like divide-by-zero or invalid memory access, or an interrupt that
is not expected (:dfn:`spurious interrupt`). See the ARM implementation in
:file:`arch/arm/core/fault.c` for an example.

Thread Context Switching
************************

Multi-threading is the basic reason to have a kernel at all. Zephyr supports
two types of threads: preemptible and cooperative.

Two crucial concepts when writing an architecture port are the following:

* Cooperative threads run at a higher priority than preemptible ones, and
  always preempt them.

* After handling an interrupt, if a cooperative thread was interrupted, the
  kernel always goes back to running that thread, since it is not preemptible.

A context switch can happen in several circumstances:

* When a thread executes a blocking operation, such as taking a semaphore that
  is currently unavailable.

* When a preemptible thread unblocks a thread of higher priority by releasing
  the object on which it was blocked.

* When an interrupt unblocks a thread of higher priority than the one currently
  executing, if the currently executing thread is preemptible.

* When a thread runs to completion.

* When a thread causes a fatal exception and is removed from the running
  threads, for example by referencing invalid memory.

The context switching code must thus be able to handle all these cases.

The kernel keeps the next thread to run in a "cache", and thus the context
switching code only has to fetch from that cache to select which thread to run.

There are two types of context switches: :dfn:`cooperative` and :dfn:`preemptive`.

* A *cooperative* context switch happens when a thread willfully gives
  control to another thread. There are two cases where this happens:

  * When a thread explicitly yields.
  * When a thread tries to take an object that is currently unavailable and is
    willing to wait until the object becomes available.

* A *preemptive* context switch happens when an ISR or a thread causes an
  operation that schedules a thread of higher priority than the one currently
  running, if the currently running thread is preemptible. An example of such
  an operation is releasing an object on which the thread of higher priority
  was waiting.

.. note::

  Control is never taken from a cooperative thread while it is the running
  thread.

A cooperative context switch is always done by having a thread call the
:code:`_Swap()` kernel internal symbol. When :code:`_Swap` is called, the
kernel logic knows that a context switch has to happen: :code:`_Swap` does not
check to see if a context switch must happen. Rather, :code:`_Swap` decides
what thread to context switch in. :code:`_Swap` is called by the kernel logic
when an object being operated on is unavailable, and by some thread
yielding/sleeping primitives.

.. note::

  On x86 and Nios2, :code:`_Swap` is generic enough and the architecture
  flexible enough that :code:`_Swap` can be called when exiting an interrupt
  to provoke the context switch. This should not be taken as a rule, since
  neither the ARM Cortex-M nor the ARCv2 port does this.

Since :code:`_Swap` is cooperative, the caller-saved registers from the ABI are
already on the stack. There is no need to save them in the k_thread structure.

A context switch can also be performed preemptively. This happens upon exiting
an ISR, in the kernel interrupt exit stub:

* :code:`_interrupt_enter` on x86 after the handler is called.
* :code:`_IntExit` on ARM.
* :code:`_firq_exit` and :code:`_rirq_exit` on ARCv2.

In this case, the context switch must only be invoked when the interrupted
thread was preemptible, not when it was a cooperative one, and only when the
current interrupt is not nested.

The kernel also has the concept of "locking the scheduler". This is a concept
similar to locking the interrupts, but lighter-weight since interrupts can
still occur. If a thread has locked the scheduler, it is temporarily
non-preemptible.

So, the decision logic to invoke the context switch when exiting an interrupt
is simple:

* If the interrupted thread is not preemptible, do not invoke it.
* Otherwise, fetch the cached thread from the ready queue, and:

  * If the cached thread is not the current thread, invoke the context switch.
  * Otherwise, do not invoke it.

This is simple, but crucial: if this is not implemented correctly, the kernel
will not function as intended and will experience bizarre crashes, mostly due
to stack corruption.

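The decision logic above, together with the requirement that the interrupt is
not nested, can be expressed as a small predicate. This is a sketch only:
:code:`sketch_thread` and :code:`must_switch_on_irq_exit()` are hypothetical
names, and the :code:`preemptible` flag is assumed to already account for a
locked scheduler. Real ports usually implement this check in assembly inside
the interrupt exit stub.

.. code-block:: C

   #include <stdbool.h>

   /* Hypothetical, minimal view of a thread for this sketch. */
   struct sketch_thread {
       bool preemptible; /* false for cooperative or scheduler-locked threads */
   };

   /* Decide, on interrupt exit, whether a context switch must be invoked.
    * current: the thread that was interrupted
    * cached:  the next thread to run, as cached by the kernel
    * nested:  true when this interrupt interrupted another interrupt
    */
   static bool must_switch_on_irq_exit(const struct sketch_thread *current,
                                       const struct sketch_thread *cached,
                                       bool nested)
   {
       if (nested) {
           return false; /* only the outermost interrupt may switch */
       }
       if (!current->preemptible) {
           return false; /* always go back to the interrupted thread */
       }
       return cached != current;
   }
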
.. note::

  If running a coop-only system, i.e. if :option:`CONFIG_NUM_PREEMPT_PRIORITIES`
  is 0, no preemptive context switch ever happens. The interrupt code can be
  optimized to not take any scheduling decision when this is the case.

Thread Creation and Termination
*******************************

To start a new thread, a stack frame must be constructed so that the context
switch can pop it the same way it would pop one from a thread that had been
context switched out. This is to be implemented in an architecture-specific
:code:`_new_thread` internal routine.

The thread entry point must not be called directly, i.e. it should not be
set as the :abbr:`PC (program counter)` for the new thread. Rather, it must be
wrapped in :code:`_thread_entry`. This means that the PC in the stack
frame shall be set to :code:`_thread_entry`, and the thread entry point shall
be passed as the first parameter to :code:`_thread_entry`. The specifics of
this depend on the ABI.

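As an illustration of the two paragraphs above, the following sketch builds an
initial frame at the top of a descending stack. The frame layout
(:code:`init_frame`), :code:`build_initial_frame()` and
:code:`sketch_thread_entry()` are hypothetical; the actual layout is dictated
by the ABI and by what the context switch code pops, and
:code:`sketch_thread_entry()` merely stands in for :code:`_thread_entry`.

.. code-block:: C

   #include <stddef.h>
   #include <stdint.h>

   typedef void (*thread_entry_t)(void *p1, void *p2, void *p3);

   /* Hypothetical initial frame: four argument slots and the PC to resume at. */
   struct init_frame {
       uintptr_t arg0; /* thread entry point, first parameter of the wrapper */
       uintptr_t arg1;
       uintptr_t arg2;
       uintptr_t arg3;
       uintptr_t pc;   /* address of the wrapper, not of the entry point */
   };

   /* Stand-in for the wrapper around the thread entry point. */
   static void sketch_thread_entry(thread_entry_t entry,
                                   void *p1, void *p2, void *p3)
   {
       entry(p1, p2, p3);
       /* A real wrapper would terminate the thread here, not return. */
   }

   /* Place the frame at the word-aligned top of the stack and fill it in. */
   static struct init_frame *build_initial_frame(unsigned char *stack,
                                                 size_t stack_size,
                                                 thread_entry_t entry,
                                                 void *p1, void *p2, void *p3)
   {
       uintptr_t top = (uintptr_t)stack + stack_size;
       struct init_frame *frame;

       top &= ~(uintptr_t)(sizeof(uintptr_t) - 1); /* align down */
       frame = (struct init_frame *)(top - sizeof(*frame));

       frame->arg0 = (uintptr_t)entry;
       frame->arg1 = (uintptr_t)p1;
       frame->arg2 = (uintptr_t)p2;
       frame->arg3 = (uintptr_t)p3;
       frame->pc = (uintptr_t)sketch_thread_entry;
       return frame;
   }

   /* Example thread entry point that does nothing. */
   static void example_entry(void *p1, void *p2, void *p3)
   {
       (void)p1;
       (void)p2;
       (void)p3;
   }

The returned frame address would become the new thread's saved stack pointer,
so that the first context switch into the thread pops the frame and lands in
the wrapper.
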
The need for an architecture-specific thread termination implementation depends
on the architecture. There is a generic implementation, but it might not work
for a given architecture.

One reason that has been encountered for having an architecture-specific
implementation of thread termination is that aborting a thread might differ
depending on whether it is aborted because of a graceful exit or because of an
exception. This is the case for ARM Cortex-M, where the CPU has to be taken out
of handler mode if the thread triggered a fatal exception, but not if the
thread gracefully exits its entry point function.

This means implementing an architecture-specific version of
:c:func:`k_thread_abort`, and setting the Kconfig option
:option:`CONFIG_ARCH_HAS_THREAD_ABORT` as needed for the architecture (e.g. see
:file:`arch/arm/core/cortex_m/Kconfig`).

Device Drivers
**************

The kernel requires very few hardware devices to function. In theory, the only
required device is the interrupt controller, since the kernel can run without a
system clock. In practice, to get access to most, if not all, of the sanity
check test suite, a system clock is needed as well. Since these two are usually
tied to the architecture, they are part of the architecture port.

Interrupt Controllers
=====================

There can be significant differences between the interrupt controllers and the
interrupt concepts across architectures.

For example, x86 has the concept of an :abbr:`IDT (Interrupt Descriptor Table)`
and different interrupt controllers. Although modern systems have mostly
standardized on the :abbr:`APIC (Advanced Programmable Interrupt Controller)`,
some small Quark-based systems use the :abbr:`MVIC (Micro-controller Vectored
Interrupt Controller)`. Also, the position of an interrupt in the IDT
determines its priority.

On the other hand, the ARM Cortex-M has the :abbr:`NVIC (Nested Vectored
Interrupt Controller)` as part of the architecture definition. There is no need
for an IDT-like table that is separate from the NVIC vector table. The position
in the table has nothing to do with the priority of an IRQ: priorities are
programmable per-entry.

The ARCv2 has its interrupt unit as part of the architecture definition, which
is somewhat similar to the NVIC. However, where ARC defines interrupts as
having a one-to-one mapping between exception and interrupt numbers (i.e.
exception 1 is IRQ1, and device IRQs start at 16), ARM has IRQ0 being
equivalent to exception 16 (and oddly enough, exception 1 can be seen as
IRQ-15).

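The numbering difference described above boils down to a fixed offset on ARM
and an identity mapping on ARC. The helper names below are hypothetical and
exist only to make the arithmetic explicit.

.. code-block:: C

   /* On ARM Cortex-M, IRQ0 is exception 16. */
   #define ARM_EXCEPTION_OFFSET 16

   static int arm_irq_to_exception(int irq)
   {
       return irq + ARM_EXCEPTION_OFFSET;
   }

   static int arm_exception_to_irq(int exception)
   {
       return exception - ARM_EXCEPTION_OFFSET;
   }

   /* On ARCv2, exception and interrupt numbers map one-to-one. */
   static int arc_irq_to_exception(int irq)
   {
       return irq;
   }

With these definitions, ARM exception 1 maps to IRQ -15, while the first ARC
device IRQ, number 16, is also exception 16.
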
All these differences mean that very little, if anything, can be shared between
architectures with regard to interrupt controllers.

System Clock
============

x86 has APIC timers and the HPET as part of its architecture definition. ARM
Cortex-M has the SYSTICK exception. Finally, ARCv2 has the timer0/1 device.

Kernel timeouts are handled in the context of the system clock timer driver's
interrupt handler.

Tickless Idle
-------------

The kernel has support for tickless idle. Tickless idle is the concept where no
system clock timer interrupt is to be delivered to the CPU when the kernel is
about to go idle and the closest timeout expiry is past a certain threshold.
When this condition happens, the system clock is reprogrammed far in the future
instead of for a periodic tick. For this to work, the system clock timer driver
must support it.

Tickless idle is optional but strongly recommended to achieve low power
consumption.

The kernel has built-in support for going into tickless idle, but the system
clock timer driver must implement some hooks to support it. See existing
drivers for examples.

The interrupt entry stub (:code:`_interrupt_enter`, :code:`_isr_wrapper`) needs
to be adapted to handle exiting tickless idle. See examples in the code for
existing architectures.

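The decision taken when entering idle can be sketched as follows. The
:code:`sketch_timer` structure and :code:`sketch_timer_idle_enter()` are
hypothetical and do not correspond to the actual driver hooks; the exact
comparison against the threshold is also an assumption. A real driver would
program hardware registers instead of updating a structure.

.. code-block:: C

   #include <stdint.h>

   /* Hypothetical model of the system clock timer state. */
   struct sketch_timer {
       uint32_t programmed_ticks; /* ticks until the next timer interrupt */
       int periodic;              /* 1 when delivering a periodic tick */
   };

   /* Called when the kernel is about to go idle.
    * ticks_to_next_timeout: ticks until the closest timeout expires
    * threshold: minimum idle duration that makes tickless idle worthwhile
    */
   static void sketch_timer_idle_enter(struct sketch_timer *timer,
                                       uint32_t ticks_to_next_timeout,
                                       uint32_t threshold)
   {
       if (ticks_to_next_timeout >= threshold) {
           /* Far enough away: one interrupt at the timeout expiry. */
           timer->periodic = 0;
           timer->programmed_ticks = ticks_to_next_timeout;
       } else {
           /* Too close: keep the regular periodic tick. */
           timer->periodic = 1;
           timer->programmed_ticks = 1;
       }
   }
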
Console Over Serial Line
========================

There is one other device that is almost a requirement for an architecture
port, since it is so useful for debugging. It is a simple polling, output-only,
serial port driver on which to send the console (:code:`printk`,
:code:`printf`) output.

It is not required, and a RAM console (:option:`CONFIG_RAM_CONSOLE`)
can be used to send all output to a circular buffer that can be read
by a debugger instead.

Utility Libraries
*****************

The kernel depends on a few functions that can be implemented with very few
instructions or in a lock-less manner in modern processors. Those are thus
expected to be implemented as part of an architecture port.

* Atomic operators.

  * If instructions do not exist for a given architecture,
    a generic version that wraps :c:func:`irq_lock` and :c:func:`irq_unlock`
    around non-atomic operations exists. It is configured using the
    :option:`CONFIG_ATOMIC_OPERATIONS_C` Kconfig option.

* Find-least-significant-bit-set and find-most-significant-bit-set.

  * If instructions do not exist for a given architecture, it is always
    possible to implement these functions as generic C functions.

It is possible to use compiler built-ins to implement these, but take care that
they use the required compiler barriers.

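A generic C fallback for the two bit-search functions might look like the
sketch below. The function names are hypothetical, and the convention used
(bit positions numbered from 1, with 0 returned when no bit is set) is an
assumption made for this example.

.. code-block:: C

   #include <stdint.h>

   /* Position of the least significant set bit, counting from 1.
    * Returns 0 when op is 0.
    */
   static unsigned int generic_find_lsb_set(uint32_t op)
   {
       unsigned int pos = 1;

       if (op == 0) {
           return 0;
       }
       while ((op & 1u) == 0) {
           op >>= 1;
           pos++;
       }
       return pos;
   }

   /* Position of the most significant set bit, counting from 1.
    * Returns 0 when op is 0.
    */
   static unsigned int generic_find_msb_set(uint32_t op)
   {
       unsigned int pos = 0;

       while (op != 0) {
           op >>= 1;
           pos++;
       }
       return pos;
   }

These loops take up to 32 iterations, which is why an architecture with a
dedicated instruction should prefer it.
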
CPU Idling/Power Management
***************************

The kernel provides support for CPU power management with two functions:
:c:func:`k_cpu_idle` and :c:func:`k_cpu_atomic_idle`.

:c:func:`k_cpu_idle` can be as simple as calling the power saving instruction
for the architecture with interrupts unlocked, for example :code:`hlt` on x86,
:code:`wfi` or :code:`wfe` on ARM, :code:`sleep` on ARC. This function can be
called in a loop within a context that does not care whether it gets
interrupted before going to sleep. There are basically two scenarios
when it is correct to use this function:

* In a single-threaded system, in the only thread when the thread is not used
  for doing real work after initialization, i.e. it is sitting in a loop doing
  nothing for the duration of the application.

* In the idle thread.

:c:func:`k_cpu_atomic_idle`, on the other hand, must be able to atomically
re-enable interrupts and invoke the power saving instruction. It can thus be
used in real application code, again in single-threaded systems.

Normally, idling the CPU should be left to the idle thread, but in some very
special scenarios, these APIs can be used by applications.

Both functions must exist for a given architecture. However, the implementation
can be simply the following steps, if desired:

#. unlock interrupts
#. NOP

However, a real implementation is strongly recommended.

Fault Management
****************

Each architecture provides two fatal error handlers:

* :code:`_NanoFatalErrorHandler`, called by software for unrecoverable errors.
* :code:`_SysFatalErrorHandler`, which makes the decision on how to handle
  the thread where the error is generated, most likely by terminating it.

See the current architecture implementations for examples.

Toolchain and Linking
*********************

Toolchain support has to be added to the build system.

Some architecture-specific definitions are needed in :file:`toolchain/gcc.h`.
See what exists in that file for currently supported architectures.

Each architecture also needs its own linker script, even if most sections can
be derived from the linker scripts of other architectures. Some sections might
be specific to the new architecture, for example the SCB section on ARM and the
IDT section on x86.