ZEPHYR TASK PROFILER
####################
This project permits to enable profiling in zephyr application
Profiling trace can be retrived over JTAG or UART
- JTAG: slow but no impact on CPU execution
- UART: much faster but impacting CPU
On QEMU, profiler trace can be retrieved using gdb (following JTAG guidelines)
or on virtual serial port (when enabling UART flush)
Profiler consists in:
1) Enabling KERNEL_EVENT_LOGGER in zephyr
- Various config switches allows selecting events to capture
2) Applying a set of patches on top of zephyr kernel (provided in this package)
3) Extracting profiler events via JTAG (requires debugger) or UART (requires
profiler background task)
- for UART: profiler background task must be added to the application
4) Post-processing tools for getting profiler binary data and get ASCII/graphic
output
---
NOTE: KERNEL_TASK_MONITOR logs have been merged in KERNEL_EVENT_LOGGER ring
buffer since Zephyr 1.1
If using Zephyr version inferior to 1.1, monitor buffer can be dumped using
JTAG and post-processed using profile_monitor.py
The command to dump the buffer is:
(gdb) dump binary value <monitor_file> k_monitor_buff
Then post-processing:
./profile_monitor.py (--qemu) <monitor_file> <zephyr_map_file>
Project organisation
====================
This project includes:
|_ profiler: profiler code (kernel event logger flush) and post-processing
| | scripts
| |_ src: profiler code to integrate into zephyr application to enable
| | flushing of kernel events
| |_ scripts: post-processing scripts
| | |_ term: terminal to use to get zephyr console logs and retrieve profiler
| | data into a separate file for post-processing (linux only)
|_ sample_microkernel_app: sample code to exercise profiler in microkernel
| mode
|_ sample_nanokernel_app: sample code to exercise profiler in nanokernel mode
Following guide explains how to enable profiling in your application and
how to use profiling data
Requirements
============
To be installed on host PC:
- python and matplotlib
For JTAG only
- gdb
- For galileo
- JTAG dongle (e.g. OLIMEX ARM-USB-OCD-H)
- openocd
How to enable profiler in your application
==========================================
Conventions:
$ZEPHYR_BASE = Zephyr base folder
$APP_BASE = Application base folder
$PROFILER_BASE = Profiler base folder
prof.log = profiler output file generated from Zephyr target (binary)
1) Enable KERNEL_EVENT_LOGGER
-----------------------------
Edit project configuration file (e.g. $APP_BASE/prj.conf) and add the following
flags:
<-- snippet
CONFIG_RING_BUFFER=y
CONFIG_KERNEL_EVENT_LOGGER=y
CONFIG_NANO_TIMEOUTS=y
CONFIG_KERNEL_EVENT_LOGGER_BUFFER_SIZE=<BUFFER_SIZE>
-->
<BUFFER_SIZE> may depend on use case. Note that
- on JTAG, buffer is no more populated when full
- on UART, buffer is flushed at UART speed every 5s (can be modified in
src/profiler.c)
Events have to be enabled on top of previous flags:
- Task/fiber switches
CONFIG_KERNEL_EVENT_LOGGER_CONTEXT_SWITCH=y
- Interrupts logging
CONFIG_KERNEL_EVENT_LOGGER_INTERRUPT=y
- Sleep logging
CONFIG_KERNEL_EVENT_LOGGER_SLEEP=y
- Task monitor (microkernel only)
CONFIG_TASK_MONITOR=y
CONFIG_TASK_MONITOR_MASK=<MASK>
<MASK> bits are defined in kernel/microkernel/include/micro_private.h
For information:
- Task swap event: 1
- Task state bit change: 2
- Kserver commands: 4
- Kevents: 8
If wanting to understand microkernel object usage and task bit handling,
recommended value is 6
Otherwise, task monitor can be disabled (to decrease logger ring buffer usage)
2) Enable UART flush [OPTIONAL]
-------------------------------
2.1) Enable profiler UART flush function
----------------------------------------
2.1.1) MICROKERNEL mode
-----------------------
Add to $APP_BASE/prj.mdef
% TASK NAME PRIO ENTRY STACK GROUPS
% ==================================
<--snippet
TASK PROF 10 prof 512 [EXE]
-->
Note:
Task priority must be the lowest (depending on application code and zephyr
lowest supported priority). Anyway, an application never letting lower priority
task execute (so never going to background task) may avoid the profiler task
to execute. In this case, NANOKERNEL below method can be used (calling
prof_flush in the main loop, where processing will be the least impacted)
2.1.2) NANOKERNEL mode
----------------------
prof_flush function must be called from main task before going in idle mode
<--snippet
extern void prof_flush()
...
void main(void)
{
...
while(1) {
...
prof_flush();
// Going to idle e.g. calling nano_cpu_idle()
-->
2.2) Add path to $PROFILER_BASE folder in application Makefile
--------------------------------------------------------------
For example in $APP_BASE/src/Makefile:
<--snippet
obj-y += $PROFILER_BASE/
-->
2.3) Increase console UART baud rate (depending on your board)
------------------------------------
On Quark X1000, maximum standard baud rate is 921600
This is done by adding following line in $APP_BASE/prj.conf:
<--snippet
CONFIG_UART_NS16550_PORT_1_BAUD_RATE=921600
-->
Profiler has been tested at 2 Mbauds on Quark SE.
3) Enable profiler console for dynamic control
----------------------------------------------
For having the capability to dynamically enable/disable/configure the profiler
at runtime, shell command is supported.
For that purpose, following flags must be enabled in project configuration file
(e.g. $APP_BASE/prj.conf):
<-- snippet
CONFIG_KERNEL_EVENT_LOGGER_DYNAMIC=y
CONFIG_CONSOLE_HANDLER=y
CONFIG_CONSOLE_HANDLER_SHELL=y
CONFIG_MINIMAL_LIBC_EXTENDED=y
-->
CONFIG_MINIMAL_LIBC_EXTENDED is required for atoi() function used in profiler shell
command implementation.
If done, the profiler will automatically enable a shell over UART allowing to
use commands to start/stop/configure the profiler:
- Start profiling:
prof start
- Stop profiling:
prof stop
- Configure profiler:
prof cfg <cfg1> <cfg2>
This sets the kernel event logger configuration (cfg1) and kernel monitor
configuration (cfg2)
By default cfg1 is set to log all events enabled at build time and cfg2 is set
to CONFIG_TASK_MONITOR_MASK
The new configuration is applied for the following start of the profiler.
Any on-going profiling session isn't altered
- Read profiler configuration
prof cfg
Get return like X(Y) A(B)
X being currently used kernel event logger configuration / Y being the next
value (Set by cfg command).
A being the currently used kernel monitor configuration / B being the next
value.
WARNING: if the application has already defined its shell, then the profiler
command can be added to the list of application commands this way (apps_cmd
being the array of shell commands defined by the app)
<-- snippet
#include "profiler.h"
const struct shell_cmd apps_cmd[] = {
...
<apps commands defined here>
...
PROF_CMD,
{ NULL, NULL}
}
-->
Additionally, the profiler must not register its console so
PROFILER_NO_SHELL_REGISTER must be set to 1 while building:
export PROFILER_NO_SHELL_REGISTER=1
or
make BOARD=... PROFILER_NO_SHELL_REGISTER=1
4) Build and flash project
--------------------------
5) Get prepared for collecting profiler data
--------------------------------------------
5.1) for using JTAG on galileo with OpenOCD
-------------------------------------------
Connect OpenOCD JTAG to PC
<--snippet
cd $OPENOCD_FOLDER
openocd -f interface/ftdi/olimex-arm-usb-ocd-h.cfg -f board/quark_x10xx_board.cfg
cd $APP_BASE
gdb outdir/zephyr.elf
(gdb) target remote localhost:3333
(gdb) c
-->
5.2) for using UART on any board
--------------------------------
- connect console UART to host PC
- start "profterm"
<--snippet
cd $PROFILER_BASE/scripts/term/
make
cd $PROFILER_BASE/scripts
./profterm /dev/<TTY_USB> prof.log console.log
-->
<TTY_USB> being the board UART console device
Note: profterm UART rate can be set using -s option. Otherwise default speed is
921600.
5.3) for using with QEMU (UART flushing mode)
---------------------------------------------
Two terminals are required
- one for running QEMU, basically redirecting serial to a virtual device
- one for running profterm (as it would be done with a board)
<--snippet
cd $PROFILER_BASE/scripts/term/
make
-->
Term1: QEMU must be launched the following way
<--snippet
cd $APP_BASE
make BOARD=... QEMU_PTY=1 qemu
-->
QEMU will display virtually created TTY
"char device redicected to /dev/pts/XY"
Term2:
<--snippet
./profterm /dev/pts/XY prof.log console.log
-->
6) Run program
--------------
7) Get log (JTAG only)
----------------------
Break execution and dump buffer
<--snippet
(gdb) ** break <CTRL-C> **
(gdb) dump binary value prof.log *(int[<BUFFER_SIZE>] *)sys_k_event_logger.ring_buf.buf
-->
<BUFFER_SIZE> being the buffer size defined in $APP_BASE/prj.conf file
8) Post-processing
------------------
WARNING: zephyr ELF file is required for post-processing. This file must be
kept together with dump files in order to be able to run the post-processing.
profile script under $PROFILER_BASE/scripts must be called in its folder for the
moment
Run "./profile.sh -h" for usage
Note that if using JTAG, HW cycles per sec is not part of the dump and so HW
cycles per sec must be provided to "profile" script using -c option (see
profile.sh help)
Notes:
- python is required for the post processing. matplotlib shall be installed for
graphical output