xtensa: New asm layer to support SMP

SMP needs a new context switch primitive (to disentangle _Swap() from
the scheduler) and new interrupt entry behavior (to be able to take a
global spinlock on behalf of legacy drivers). The existing code is
very obtuse, and working with it led me down a long path of "this
would be so much better if..." So this is a new context and entry
framework, intended to replace the code that exists now, at least on
SMP platforms.

New features:
* The new context switch primitive is xtensa_switch(), which takes a
"new" context handle as an argument instead of getting it from the
scheduler, returns an "old" context handle through a pointer
(e.g. to save it to the old thread context), and restores the lock
state (PS register) exactly as it was at entry instead of taking it
as an argument. (A minimal sketch of its shape follows this list.)
* The register spill code understands wrap-around register windows and
can avoid spilling A4-A15 registers when they are unused by the
interrupted function, saving as much as 48 bytes of stack space on
the interrupted stacks.
* The "spill register windows" routine is entirely different, using a
different mechanism, and is MUCH FASTER (to the tune of almost 200
cycles). See notes in comments.
* Even better, interrupt entry can be done via a clever "cross stack
call" I worked up, meaning that the interrupted thread's registers
do not need to be spilled at all until they are naturally pushed out
by the interrupt handler or until we return from the interrupt into
a different thread. This is a big efficiency win for tiny
interrupts (e.g. timers), and a big latency win for all interrupts.
* Interrupt entry is 100% symmetric with respect to medium/high
interrupts, avoiding the problems seen with hooking high priority
interrupts with the current code (e.g. ESP-32's watchdog driver).
* Much smaller code size. No cut and paste assembly. No use of HAL
calls.
* Assumes "XEA2" interrupt architecture, the register window extension
(i.e. no CALL0 ABI), and the "high priority interrupts" extension.
Does not support the legacy processor variants for which we have no
targets. The old code has some stuff in there to support this, but
it seems bitrotten, untestable, and I'm all but certain it doesn't
work.
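
As a sketch, the prototype implied by the description above would be
roughly the following (illustrative only, not the authoritative
declaration):

    void xtensa_switch(void *switch_to, void **switched_from);

The incoming thread's saved handle is passed in switch_to, and the
outgoing context's handle is written through switched_from (e.g. into
the old thread's context).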

Note that this simply adds the primitives to the existing tree in a
form where they can be unit tested. It does not replace the existing
interrupt/exception handling or _Swap() implementation.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>

/*
 * Copyright (c) 2017, Intel Corporation
 *
 * SPDX-License-Identifier: Apache-2.0
 */

#ifndef ZEPHYR_ARCH_XTENSA_INCLUDE_XTENSA_ASM2_CONTEXT_H_
#define ZEPHYR_ARCH_XTENSA_INCLUDE_XTENSA_ASM2_CONTEXT_H_

#include <xtensa/corebits.h>
#include <xtensa/config/core-isa.h>

/*
 * Stack frame layout for a saved processor context, in memory order,
 * high to low address:
 *
 * SP-0 <-- Interrupted stack pointer points here
 *
 * SP-4   Caller A3 spill slot \
 * SP-8   Caller A2 spill slot |
 * SP-12  Caller A1 spill slot + (Part of ABI standard)
 * SP-16  Caller A0 spill slot /
 *
 * SP-20  Saved A3
 * SP-24  Saved A2
 * SP-28  Unused (not "Saved A1" because the SP is saved externally as a handle)
 * SP-32  Saved A0
 *
 * SP-36  Saved PC (address to jump to following restore)
 * SP-40  Saved/interrupted PS special register
 *
 * SP-44  Saved SAR special register
 *
 * SP-48  Saved LBEG special register (if loops enabled)
 * SP-52  Saved LEND special register (if loops enabled)
 * SP-56  Saved LCOUNT special register (if loops enabled)
 *
 * (The above fixed-size region is known as the "base save area" in the
 * code below)
 *
 * - Saved A7  \
 * - Saved A6  |
 * - Saved A5  +- If not in use by another frame
 * - Saved A4  /
 *
 * - Saved A11 \
 * - Saved A10 |
 * - Saved A9  +- If not in use by another frame
 * - Saved A8  /
 *
 * - Saved A15 \
 * - Saved A14 |
 * - Saved A13 +- If not in use by another frame
 * - Saved A12 /
 *
 * - Saved intermediate stack pointer (points to low word of base save
 *   area, i.e. the saved LCOUNT or SAR). The pointer to this value
 *   (i.e. the final stack pointer) is stored externally as the
 *   "restore handle" in the thread context.
 *
 * Essentially, you can recover a pointer to the BSA by loading *SP.
 * Adding the fixed BSA size to that gets you back to the
 * original/interrupted stack pointer.
 */
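
/*
 * Illustration only: a hypothetical C view of the base save area
 * described above, with fields in low-to-high address order. The
 * real code uses the byte offsets defined below, not this struct,
 * and the uint32_t type assumes <stdint.h> (or equivalent) is
 * available. With loops enabled sizeof(struct bsa_sketch) is 56
 * bytes, otherwise 44, matching BASE_SAVE_AREA_SIZE below.
 */
struct bsa_sketch {
#if XCHAL_HAVE_LOOPS
	uint32_t lcount;     /* SP-56 */
	uint32_t lend;       /* SP-52 */
	uint32_t lbeg;       /* SP-48 */
#endif
	uint32_t sar;        /* SP-44 */
	uint32_t ps;         /* SP-40 */
	uint32_t pc;         /* SP-36 */
	uint32_t a0;         /* SP-32 */
	uint32_t scratch;    /* SP-28 (unused slot) */
	uint32_t a2;         /* SP-24 */
	uint32_t a3;         /* SP-20 */
	uint32_t caller_a0;  /* SP-16, ABI spill slot */
	uint32_t caller_a1;  /* SP-12, ABI spill slot */
	uint32_t caller_a2;  /* SP-8, ABI spill slot */
	uint32_t caller_a3;  /* SP-4, ABI spill slot */
};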

#if XCHAL_HAVE_LOOPS
#define BASE_SAVE_AREA_SIZE 56
#else
#define BASE_SAVE_AREA_SIZE 44
#endif

#define BSA_A3_OFF (BASE_SAVE_AREA_SIZE - 20)
#define BSA_A2_OFF (BASE_SAVE_AREA_SIZE - 24)
#define BSA_SCRATCH_OFF (BASE_SAVE_AREA_SIZE - 28)
#define BSA_A0_OFF (BASE_SAVE_AREA_SIZE - 32)
#define BSA_PC_OFF (BASE_SAVE_AREA_SIZE - 36)
#define BSA_PS_OFF (BASE_SAVE_AREA_SIZE - 40)
#define BSA_SAR_OFF (BASE_SAVE_AREA_SIZE - 44)
#define BSA_LBEG_OFF (BASE_SAVE_AREA_SIZE - 48)
#define BSA_LEND_OFF (BASE_SAVE_AREA_SIZE - 52)
#define BSA_LCOUNT_OFF (BASE_SAVE_AREA_SIZE - 56)
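
/*
 * Hypothetical helpers (an illustration, not part of the real API)
 * showing how the offsets compose, per the layout comment above: the
 * "restore handle" is the final stack pointer, *handle is the saved
 * intermediate SP pointing at the low word of the BSA, and adding
 * BASE_SAVE_AREA_SIZE to that recovers the interrupted SP. Assumes
 * <stdint.h> types.
 */
static inline char *bsa_of(void **handle)
{
	return *(char **)handle;  /* load *SP to find the BSA */
}

static inline uint32_t saved_pc_of(void **handle)
{
	/* Saved PC lives BSA_PC_OFF bytes above the BSA low word */
	return *(uint32_t *)(bsa_of(handle) + BSA_PC_OFF);
}

static inline char *interrupted_sp_of(void **handle)
{
	/* The fixed BSA size spans back to the original SP */
	return bsa_of(handle) + BASE_SAVE_AREA_SIZE;
}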
#endif /* ZEPHYR_ARCH_XTENSA_INCLUDE_XTENSA_ASM2_CONTEXT_H_ */