Doc: Migrate ARMv7-M run time stack checking doc

Migrate the ARMv7-M run time stack checking documentation page from the Confluence wiki to the official documentation. Signed-off-by: Ludovic Vanasse <ludovicvanasse@gmail.com>
2024-10-12 00:22:24 -04:00 · 2024-10-12 00:22:24 -04:00 · 14c115b20f
parent 2a65a34bee
commit 14c115b20f
2 changed files with 180 additions and 0 deletions
--- a/Documentation/guides/armv7m_runtimestackcheck.rst
+++ b/Documentation/guides/armv7m_runtimestackcheck.rst
@ -0,0 +1,179 @@
+ARMv7-M Run Time Stack Checking
+===============================
+
+.. warning:: 
+    Migrated from: 
+    https://cwiki.apache.org/confluence/display/NUTTX/ARMv7-M+Run+Time+Stack+Checking
+
+Overview
+--------
+
+Nuttx supports facilities to verify the dynamically allocated stacks and fixed 
+stacks used by the tasks and interrupt context running under Nuttx. There are 2
+types of stack checking that can be used together or separately.
+
+1. The Stack Monitor
+2. Per function Call (ARMV7 Only)
+
+The Stack Monitor
+-----------------
+
+The use of the Stack Monitor application requires that 
+``CONFIG_STACK_COLORATION`` be enabled. This compile time option enables the 
+writing of a know pattern ``STACK_COLOR`` to the stack memory at creation time. 
+In the case of the idle task and interrupt, this is done in the code that runs 
+just after reset at startup. This is known as stack coloring.
+
+Once the pattern has been established, the functions ``up_check_stack`` and 
+friends are used to perform the stack checking by finding the lowest word of 
+the pattern in the allocated stack.
+
+The Stack Monitor is enabled with ``CONFIG_SYSTEM_STACKMONITOR`` which will 
+enable a daemon that will periodically run and check the stack penetration of 
+the tasks running on the system.
+
+The stack monitor is good to help size and check the usage of tasks. However it 
+is not really useful to detect, certain kinds, nor the cause of a stack overrun.
+
+The reason for this is because a corrupted stack may not be eaten away at. It 
+may have overruns where the stack pointer is set way below the stack bottom in 
+a function call as it allocates local variables on the stack. The code in the 
+function call can then corrupt the memory below the stack bottom, restores the 
+stack pointer and returns to the caller without actually overwriting the 
+coloring at the base of the stack.
+
+This brings us the the next method of stack checking.
+
+
+Per function Call
+-----------------
+This method of stack checking leverages the profiler hook mechanism supported 
+by the compiler. Once enabled using ``CONFIG_ARMV7M_STACKCHECK``, one register 
+is set aside (``R10`` is the default) and the value of the base of stack is 
+saved there (``rBS``). Then every function call will have a preamble and a
+postamble code added to it. The preamble (``__cyg_profile_func_enter``) checks 
+the current stack pointer, minus a margin of 64 bytes (with an additional 136 
+bytes for the ``FP`` registers) against the value in the reserved register 
+``rBS``. If the computed value lies below the value in ``rBS`` a hard fault is 
+generated. The postamble code (``__cyg_profile_func_exit``)  just returns to 
+the caller.
+
+The rationale for subtracting the margin can be viewed two ways. If the 
+configuration is not using a separate ISR stack, then the space reserved will 
+accommodate the context save of the CPU and optionally the FPU registers to 
+service an interrupt on the users stack. If the configuration is using a 
+separate ISR stack, some the 64 bytes will accommodate the transition to the 
+interrupt stack and the remaining 60-200 bytes are just margin. Either way 
+stacks should always be allocated with at least 200 bytes of margin.
+
+Because of the reserved register rBS contains the current context's stack base, 
+and rBS is not updated on the entry to an ISR, it is not possible to check the 
+stack penetration for an interrupt with Per function Call stack checking.
+
+One thing to consider is the impact on code size and speed this method of stack 
+checking will have. Each function will have two additional call and return 
+instructions added to it. In the execution path of each function, there will be 
+an added set of instructions to perform the preamble and postamble 
+functionality. In a call tree that is nested several layers deep, this can add 
+up. In one particular use, we saw an increase of 30% to 35% additional CPU 
+utilization required to support per function call stack checking.
+
+Is this just a debugging tool? One could imagine that in a mission critical 
+application, this might be part of a release build if the code size and speed 
+impact can be tolerated.
+
+As of commit 4942867 Register R11 will contain the value that the stack pointer 
+would hit that caused the fault. This can be used to calculate the stack size 
+needed for that task that faulted. To do so, take the difference of R10-R11 and 
+round it up by 12 bytes (The round up is to make up for the 8 byte stack 
+alignment and 4 byte decrease that may happen in the stack allocation) Then add 
+that amount to the failing tasks stack size.
+
+Details for Support of Per Call Stack Checking
+----------------------------------------------
+
+Currently only ARMV7 derivatives support Per Call Stack Checking. Support 
+requires the following components:
+
+The start function must establish the value in ``rBS`` (``R10`` by default see 
+below). Yet to do this the start function must NOT have the preamble and 
+postamble code added to it. This is accomplished with the use the following 
+gcc attribute:
+
+.. code-block:: c
+
+    #ifdef CONFIG_ARMV7M_STACKCHECK
+    /* we need to get r10 set before we can allow instrumentation calls */
+    
+    void __start(void) __attribute__ ((no_instrument_function));
+    #endif
+
+...
+
+.. code-block:: c
+
+    void __start(void)
+    {
+      const uint32_t *src;
+      uint32_t *dest;
+    
+    #ifdef CONFIG_ARMV7M_STACKCHECK
+    
+      /* Set the stack limit before we attempt to call any functions */
+    
+      __asm__ volatile ("sub r10, sp, %0" : : "r" (CONFIG_IDLETHREAD_STACKSIZE -64) : );
+    #endif
+
+The minus 64 is setting the limit 64 bytes above the bottom of the stack. Note: 
+This may be adding another 64 bytes of margin
+
+For the creation of a task's context the following code is needed to set up 
+``rBS``
+
+.. code-block:: c
+
+    void up_initial_state(struct tcb_s *tcb)
+    {
+      struct xcptcontext *xcp = &tcb->xcp;
+    
+      /* Initialize the initial exception register context structure */
+    
+      memset(xcp, 0, sizeof(struct xcptcontext));
+
+      /* Save the initial stack pointer */
+
+      xcp->regs[REG_SP]      = (uint32_t)tcb->adj_stack_ptr;
+
+    #ifdef CONFIG_ARMV7M_STACKCHECK
+      /* Set the stack limit value */
+
+      xcp->regs[REG_R10]     = (uint32_t)tcb->stack_alloc_ptr + 64;
+    #endif
+
+And finally up_stackcheck.c needs to be included in the build and the compiler 
+flags set to reserve ``R10`` and enable the instrumentation.
+
+This is done for a given architecture in nuttx/arch/arm/src/<arch>/Make.defs:
+
+.. code-block:: makefile
+
+    ifeq ($(CONFIG_ARMV7M_STACKCHECK),y)
+    CMN_CSRCS += up_stackcheck.c
+    endif
+
+The compiler flags are added in the nuttx/arch/arm/src/armv7-m/Toolchain.defs
+
+.. code-block:: makefile
+
+    # enable precise stack overflow tracking
+    ifeq ($(CONFIG_ARMV7M_STACKCHECK),y)
+    INSTRUMENTATIONDEFINES   = -finstrument-functions -ffixed-r10
+    endif
+
+Other Considerations
+--------------------
+
+If using the export build feature of Nuttx: For the runtime stack checking both 
+the Application and Nuttx need to be built with the 
+``CONFIG_ARMV7M_STACKCHECK`` option set the same state, enabled or disabled. 
+Any mismatch will created either compile time or runtime issues.
--- a/Documentation/guides/index.rst
+++ b/Documentation/guides/index.rst
@ -38,3 +38,4 @@ Guides
  debuggingflash_nuttxonarm.rst
  changing_systemclockconfig.rst
  usingkernelthreads.rst
+  armv7m_runtimestackcheck.rst