109 lines
5.3 KiB
ReStructuredText
109 lines
5.3 KiB
ReStructuredText
# How Xtensa register windows work
|
|
|
|
There is a paucity of introductory material on this subject, and
|
|
Zephyr plays some tricks here that require understanding the base
|
|
layer.
|
|
|
|
## Hardware
|
|
|
|
When register windows are configured in the CPU, there are either 32
|
|
or 64 "real" registers in hardware, with 16 visible at one time.
|
|
Registers are grouped and rotated in units of 4, so there are 8 or 16
|
|
such "quads" (my term, not Tensilica's) in hardware of which 4 are
|
|
visible as A0-A15.
|
|
|
|
The first quad (A0-A3) is pointed to by a special register called
|
|
WINDOWBASE. The register file is cyclic, so for example if NREGS==64
|
|
and WINDOWBASE is 15, quads 15, 0, 1, and 2 will be visible as
|
|
(respectively) A0-A3, A4-A7, A8-A11, and A12-A15.
|
|
|
|
There is a ROTW instruction that can be used to manually rotate the
|
|
window by a immediate number of quads that are added to WINDOWBASE.
|
|
Positive rotations "move" high registers into low registers
|
|
(i.e. after "ROTW 1" the register that used to be called A4 is now
|
|
A0).
|
|
|
|
There are CALL4/CALL8/CALL12 instructions to effect rotated calls
|
|
which rotate registers upward (i.e. "hiding" low registers from the
|
|
callee) by 1, 2 or 3 quads. These do not rotate the window
|
|
themselves. Instead they place the rotation amount in two places
|
|
(yes, two; see below): the 2-bit CALLINC field of the PS register, and
|
|
the top two bits of the return address placed in A0.
|
|
|
|
There is an ENTRY instruction that does the rotation. It adds CALLINC
|
|
to WINDOWBASE, at the same time copying the old (now hidden) stack
|
|
pointer in A1 into the "new" A1 in the rotated frame, subtracting an
|
|
immediate offset from it to make space for the new frame.
|
|
|
|
There is a RETW instruction that undoes the rotation. It reads the
|
|
top two bits from the return address in A0 and subtracts that value
|
|
from WINDOWBASE before returning. This is why the CALLINC bits went
|
|
in two places. They have to be stored on the stack across potentially
|
|
many calls, so they need to be GPR data that lives in registers and
|
|
can be spilled. But ENTRY isn't specified to assume a particular
|
|
return value format and is used immediately, so it makes more sense
|
|
for it to use processor state instead.
|
|
|
|
Note that we still don't know how to detect when the register file has
|
|
wrapped around and needs to be spilled or filled. To do this there is
|
|
a WINDOWSTART register used to detect which register quads are in use.
|
|
The name "start" is somewhat confusing, this is not a pointer.
|
|
WINDOWSTART stores a bitmask with one bit per hardware quad (so it's 8
|
|
or 16 bits wide). The bit in windowstart corresponding to WINDOWBASE
|
|
will be set by the ENTRY instruction, and remain set after rotations
|
|
until cleared by a function return (by RETW, see below). Other bits
|
|
stay zero. So there is one set bit in WINDOWSTART corresponding to
|
|
each call frame that is live in hardware registers, and it will be
|
|
followed by 0, 1 or 2 zero bits that tell you how "big" (how many
|
|
quads of registers) that frame is.
|
|
|
|
So the CPU executing RETW checks to make sure that the register quad
|
|
being brought into A0-A3 (i.e. the new WINDOWBASE) has a set bit
|
|
indicating it's valid. If it does not, the registers must have been
|
|
spilled and the CPU traps to an exception handler to fill them.
|
|
|
|
Likewise, the processor can tell if a high register is "owned" by
|
|
another call by seeing if there is a one in WINDOWSTART between that
|
|
register's quad and WINDOWBASE. If there is, the CPU traps to a spill
|
|
handler to spill one frame. Note that a frame might be only four
|
|
registers, but it's possible to hit registers 12 out from WINDOWBASE,
|
|
so it's actually possible to trap again when the instruction restarts
|
|
to spill a second quad, and even a third time at maximum.
|
|
|
|
Finally: note that hardware checks the two bits of WINDOWSTART after
|
|
the frame bit to detect how many quads are represented by the one
|
|
frame. So there are six separate exception handlers to spill/fill
|
|
1/2/3 quads of registers.
|
|
|
|
## Software & ABI
|
|
|
|
The advantage of the scheme above is that it allows the registers to
|
|
be spilled naturally into the stack by using the stack pointers
|
|
embedded in the register file. But the hardware design assumes and to
|
|
some extent enforces a fairly complicated stack layout to make that
|
|
work:
|
|
|
|
The spill area for a single frame's A0-A3 registers is not in its own
|
|
stack frame. It lies in the 16 bytes below its CALLEE's stack
|
|
pointer. This is so that the callee (and exception handlers invoked
|
|
on its behalf) can see its caller's potentially-spilled stack pointer
|
|
register (A1) on the stack and be able to walk back up on return.
|
|
Other architectures do this too by e.g. pushing the incoming stack
|
|
pointer onto the stack as a standard "frame pointer" defined in the
|
|
platform ABI. Xtensa wraps this together with the natural spill area
|
|
for register windows.
|
|
|
|
By convention spill regions always store the lowest numbered register
|
|
in the lowest address.
|
|
|
|
The spill area for a frame's A4-A11 registers may or may not exist
|
|
depending on whether the call was made with CALL8/CALL12. It is legal
|
|
to write a function using only A0-A3 and CALL4 calls and ignore higher
|
|
registers. But if those 0-2 register quads are in use, they appear at
|
|
the top of the stack frame, immediately below the parent call's A0-A3
|
|
spill area.
|
|
|
|
There is no spill area for A12-A15. Those registers are always
|
|
caller-save. When using CALLn, you always need to overlap 4 registers
|
|
to provide arguments and take a return value.
|