Data Structures
###############

Zephyr provides a library of common general purpose data structures
used within the kernel, but useful by application code in general.
These include list and balanced tree structures for storing ordered
data, and a ring buffer for managing "byte stream" data in a clean
way.

Note that in general, the collections are implemented as "intrusive"
data structures. The "node" data is the only struct used by the
library code, and it does not store a pointer or other metadata to
indicate what user data is "owned" by that node. Instead, the
expectation is that the node will be itself embedded within a
user-defined struct. Macros are provided to retrieve a user struct
address from the embedded node pointer in a clean way. The purpose
behind this design is to allow the collections to be used in contexts
where dynamic allocation is disallowed (i.e. there is no need to
allocate node objects because the memory is provided by the user).

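As a minimal illustration of this pattern (using a hypothetical
``my_event`` struct and a hand-rolled container macro, not the actual
Zephyr headers):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node and user structs; the real Zephyr types live in
 * the kernel headers. The node stores no pointer back to the data. */
struct my_node {
    struct my_node *next;
};

struct my_event {
    int id;
    struct my_node node;    /* node embedded inside the user struct */
};

/* Recover the containing struct from an embedded node pointer: the
 * same offsetof() trick the container-retrieval macros use. */
#define MY_CONTAINER(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))
```

Because the node carries no metadata, the library never allocates
anything: the memory for each node arrives embedded in user data.
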
Note also that these libraries are uniformly unsynchronized; access to
them is not threadsafe by default. These are data structures, not
synchronization primitives. The expectation is that any locking
needed will be provided by the user.

Single-linked List
==================

Zephyr provides a ``sys_slist_t`` type for storing simple
singly-linked list data (i.e. data where each list element stores a
pointer to the next element, but not the previous one). This supports
constant-time access to the first (head) and last (tail) elements of
the list, insertion before the head and after the tail of the list and
constant time removal of the head. Removal of subsequent nodes
requires access to the "previous" pointer and thus can only be
performed in linear time by searching the list.

The ``sys_slist_t`` struct may be instantiated by the user in any
accessible memory. It should be initialized with either
``sys_slist_init()`` or by static assignment from
``SYS_SLIST_STATIC_INIT()`` before use. Its interior fields are
opaque and should not be accessed by user code.

The end nodes of a list may be retrieved with
``sys_slist_peek_head()`` and ``sys_slist_peek_tail()``, which will
return NULL if the list is empty, otherwise a pointer to a
``sys_snode_t`` struct.

The ``sys_snode_t`` struct represents the data to be inserted. In
general, it is expected to be allocated/controlled by the user,
usually embedded within a struct which is to be added to the list.
The container struct pointer may be retrieved from a list node using
``SYS_SLIST_CONTAINER()``, passing it the struct name of the
containing struct and the field name of the node. Internally, the
``sys_snode_t`` struct contains only a next pointer, which may be
accessed with ``sys_slist_peek_next()``.

Lists may be modified by adding a single node at the head or tail with
``sys_slist_prepend()`` and ``sys_slist_append()``. They may also
have a node added to an interior point with ``sys_slist_insert()``,
which inserts a new node after an existing one. Similarly
``sys_slist_remove()`` will remove a node given a pointer to its
predecessor. These operations are all constant time.

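The head/tail bookkeeping behind these cost guarantees can be sketched
with a minimal, self-contained list (illustrative names; not the
actual Zephyr implementation):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal sketch of a head/tail singly-linked list, mirroring the
 * shape of sys_slist_t; not the actual Zephyr implementation. */
struct s_node {
    struct s_node *next;
};

struct s_list {
    struct s_node *head;
    struct s_node *tail;
};

static void s_init(struct s_list *l)
{
    l->head = l->tail = NULL;
}

/* O(1): push at the front */
static void s_prepend(struct s_list *l, struct s_node *n)
{
    n->next = l->head;
    l->head = n;
    if (l->tail == NULL) {
        l->tail = n;
    }
}

/* O(1): push at the back, via the tail pointer */
static void s_append(struct s_list *l, struct s_node *n)
{
    n->next = NULL;
    if (l->tail == NULL) {
        l->head = l->tail = n;
    } else {
        l->tail->next = n;
        l->tail = n;
    }
}

/* O(1): pop the head; removing any other node would need a linear
 * search for its predecessor, as described above. */
static struct s_node *s_get_head(struct s_list *l)
{
    struct s_node *n = l->head;

    if (n != NULL) {
        l->head = n->next;
        if (l->head == NULL) {
            l->tail = NULL;
        }
    }
    return n;
}
```
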
Convenience routines exist for more complicated modifications to a
list. ``sys_slist_merge_slist()`` will append an entire list to an
existing one. ``sys_slist_append_list()`` will append a bounded
subset of an existing list in constant time. And
``sys_slist_find_and_remove()`` will search a list (in linear time)
for a given node and remove it if present.

Finally the slist implementation provides a set of "for each" macros
that allows for iterating over a list in a natural way without needing
to manually traverse the next pointers. ``SYS_SLIST_FOR_EACH_NODE()``
will enumerate every node in a list given a local variable to store
the node pointer. ``SYS_SLIST_FOR_EACH_NODE_SAFE()`` behaves
similarly, but has a more complicated implementation that requires an
extra scratch variable for storage and allows the user to delete the
iterated node during the iteration. Each of those macros also exists
in a "container" variant (``SYS_SLIST_FOR_EACH_CONTAINER()`` and
``SYS_SLIST_FOR_EACH_CONTAINER_SAFE()``) which assigns a local
variable of a type that matches the user's container struct and not
the node struct, performing the required offsets internally. And
``SYS_SLIST_ITERATE_FROM_NODE()`` exists to allow for enumerating a
node and all its successors only, without inspecting the earlier part
of the list.

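The shape of such iteration macros can be sketched as follows
(hypothetical ``MY_`` names over a bare next pointer; the real Zephyr
macros take the list struct and differ in detail):

```c
#include <assert.h>
#include <stddef.h>

struct s_node {
    struct s_node *next;
};

/* Sketch of a "for each" macro walking the chain of next pointers. */
#define MY_FOR_EACH_NODE(head, n) \
    for ((n) = (head); (n) != NULL; (n) = (n)->next)

/* "Safe" variant: latch the next pointer in a scratch variable before
 * the body runs, so the body may unlink or free the current node. */
#define MY_FOR_EACH_NODE_SAFE(head, n, s)                  \
    for ((n) = (head), (s) = (n) ? (n)->next : NULL;       \
         (n) != NULL;                                      \
         (n) = (s), (s) = (n) ? (n)->next : NULL)
```
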
Slist Internals
---------------

The slist code is designed to be minimal and conventional.
Internally, a ``sys_slist_t`` struct is nothing more than a pair of
"head" and "tail" pointer fields. And a ``sys_snode_t`` stores only a
single "next" pointer.

.. figure:: slist.png
   :align: center
   :alt: slist example
   :figclass: align-center

   An slist containing three elements.

.. figure:: slist-empty.png
   :align: center
   :alt: empty slist example
   :figclass: align-center

   An empty slist.

The specific implementation of the list code, however, is done with an
internal "Z_GENLIST" template API which allows for extracting those
fields from arbitrary structures and emits an arbitrarily named set of
functions. This allows for implementing more complicated
singly-linked list variants using the same basic primitives. The
genlist implementor is responsible for a custom implementation of the
primitive operations only: an "init" step for each struct, and "get"
and "set" primitives for each of the head, tail and next pointers on
their relevant structs. These inline functions are passed as
parameters to the genlist macro expansion.

Only one such variant, sflist, exists in Zephyr at the moment.

Flagged List
------------

The ``sys_sflist_t`` is implemented using the described genlist
template API. With the exception of symbol naming ("sflist" instead
of "slist"), it operates in all ways identically to the slist API.

It includes the ability to associate exactly two bits of user-defined
"flags" with each list node. These can be accessed and modified with
``sys_sflist_flags_get()`` and ``sys_sflist_flags_set()``.
Internally, the flags are stored unioned with the bottom bits of the
next pointer and incur no SRAM storage overhead when compared with the
simpler slist code.

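The underlying trick can be sketched in isolation: pointers to
4-byte-aligned objects always have zero low bits, so two flag bits can
live there (illustrative helper names; the real sflist accessors
differ):

```c
#include <assert.h>
#include <stdint.h>

/* Two flag bits packed into the low bits of an aligned pointer. */
#define FLAG_MASK ((uintptr_t)3)

static void *pack_next(void *next, unsigned int flags)
{
    return (void *)(((uintptr_t)next & ~FLAG_MASK)
                    | ((uintptr_t)flags & FLAG_MASK));
}

static unsigned int unpack_flags(void *packed)
{
    return (unsigned int)((uintptr_t)packed & FLAG_MASK);
}

static void *unpack_next(void *packed)
{
    return (void *)((uintptr_t)packed & ~FLAG_MASK);
}
```

Since the flags ride inside a pointer field that must exist anyway,
they cost no additional SRAM per node.
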
Double-linked List
==================

Similar to the single-linked list in many respects, Zephyr includes a
double-linked implementation. This provides the same algorithmic
behavior for all the existing slist operations, but also allows for
constant-time removal and insertion (at all points: before or after
the head, tail or any internal node). To do this, the list stores two
pointers per node, and thus has somewhat higher runtime code and
memory space needs.

A ``sys_dlist_t`` struct may be instantiated by the user in any
accessible memory. It must be initialized with ``sys_dlist_init()``
or ``SYS_DLIST_STATIC_INIT()`` before use. The ``sys_dnode_t`` struct
is expected to be provided by the user for any nodes added to the
list (typically embedded within the struct to be tracked, as described
above). It must be initialized in zeroed/bss memory or with
``sys_dnode_init()`` before use.

Primitive operations may retrieve the head/tail of a list and the
next/prev pointers of a node with ``sys_dlist_peek_head()``,
``sys_dlist_peek_tail()``, ``sys_dlist_peek_next()`` and
``sys_dlist_peek_prev()``. These can all return NULL where
appropriate (i.e. for empty lists, or nodes at the endpoints of the
list).

A dlist can be modified in constant time by removing a node with
``sys_dlist_remove()``, by adding a node to the head or tail of a list
with ``sys_dlist_prepend()`` and ``sys_dlist_append()``, or by
inserting a node before an existing node with ``sys_dlist_insert()``.

As for slist, each node in a dlist can be processed in a natural code
block style using ``SYS_DLIST_FOR_EACH_NODE()``. This macro also
exists in a "FROM_NODE" form which allows for iterating from a known
starting point, a "SAFE" variant that allows for removing the node
being inspected within the code block, a "CONTAINER" style that
provides the pointer to a containing struct instead of the raw node,
and a "CONTAINER_SAFE" variant that provides both properties.

Convenience utilities provided by dlist include
``sys_dlist_insert_at()``, which linearly searches a list to find the
insertion point identified by a user-provided C callback function
pointer, and ``sys_dlist_is_linked()``, which returns whether or not a
node is currently linked into a dlist (via an implementation that has
zero overhead vs. the normal list processing).

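A predicate-driven insert of this kind can be sketched with assumed
illustrative types (not the Zephyr internals): walk forward until the
user callback accepts a position, then splice in before it.

```c
#include <assert.h>
#include <stddef.h>

struct d_node {
    struct d_node *next, *prev;
};

struct item {
    int key;
    struct d_node node;
};

static void d_init(struct d_node *list)
{
    list->next = list->prev = list;
}

static void d_insert_before(struct d_node *pos, struct d_node *n)
{
    n->prev = pos->prev;
    n->next = pos;
    pos->prev->next = n;
    pos->prev = n;
}

typedef int (*d_cond_t)(struct d_node *n, void *arg);

/* Linear search for the first node the predicate accepts, then a
 * constant-time splice; pos == list means "append at the tail". */
static void d_insert_at(struct d_node *list, struct d_node *n,
                        d_cond_t cond, void *arg)
{
    struct d_node *pos = list->next;

    while (pos != list && !cond(pos, arg)) {
        pos = pos->next;
    }
    d_insert_before(pos, n);
}

/* Example predicate: stop at the first node whose key exceeds *arg,
 * producing an ascending-key ordering. */
static int key_exceeds(struct d_node *n, void *arg)
{
    struct item *it =
        (struct item *)((char *)n - offsetof(struct item, node));

    return it->key > *(int *)arg;
}
```
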
Dlist Internals
|
|
---------------
|
|
|
|
Internally, the dlist implementation is minimal: the ``sys_dlist_t``
|
|
struct contains "head" and "tail" pointer fields, the ``sys_dnode_t``
|
|
contains "prev" and "next" pointers, and no other data is stored. But
|
|
in practice the two structs are internally identical, and the list
|
|
struct is inserted as a node into the list itself. This allows for a
|
|
very clean symmetry of operations:
|
|
|
|
* An empty list has backpointers to itself in the list struct, which
  can be trivially detected.

* The head and tail of the list can be detected by comparing the
  prev/next pointers of a node vs. the list struct address.

* An insertion or deletion never needs to check for the special case
  of inserting at the head or tail. There are never any NULL pointers
  within the list to be avoided. Exactly the same operations are run,
  without tests or branches, for all list modification primitives.

Effectively, a dlist of N nodes can be thought of as a "ring" of "N+1"
nodes, where one node represents the list tracking struct.

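That symmetry can be demonstrated with a self-contained sketch
(illustrative names; the real implementation differs in detail). Note
how the splice and unsplice code contains no conditionals at all:

```c
#include <assert.h>

/* Sketch of the "ring of N+1 nodes" trick: the list struct is laid
 * out like a node and linked into its own list, so insert and remove
 * never branch on head/tail special cases. */
struct d_node {
    struct d_node *next, *prev;
};

static void d_init(struct d_node *list)
{
    /* an empty list points at itself in both directions */
    list->next = list->prev = list;
}

static void d_append(struct d_node *list, struct d_node *n)
{
    /* identical splice code regardless of list contents */
    n->prev = list->prev;
    n->next = list;
    list->prev->next = n;
    list->prev = n;
}

static void d_remove(struct d_node *n)
{
    /* no NULL checks needed: both neighbors always exist */
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

static int d_is_empty(struct d_node *list)
{
    return list->next == list;
}
```
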
.. figure:: dlist.png
   :align: center
   :alt: dlist example
   :figclass: align-center

   A dlist containing three elements. Note that the list struct
   appears as a fourth "element" in the list.

.. figure:: dlist-single.png
   :align: center
   :alt: single-element dlist example
   :figclass: align-center

   A dlist containing just one element.

.. figure:: dlist-empty.png
   :align: center
   :alt: empty dlist example
   :figclass: align-center

   An empty dlist.

Balanced Red/Black Tree
=======================

For circumstances where sorted containers may become large at runtime,
a list becomes problematic due to algorithmic costs of searching it.
For these situations, Zephyr provides a balanced tree implementation
which has runtimes on search and removal operations bounded at
O(log2(N)) for a tree of size N. This is implemented using a
conventional red/black tree as described by multiple academic sources.

The ``struct rbtree`` tracking struct for a rbtree may be initialized
anywhere in user accessible memory. It should contain only zero bits
before first use. No specific initialization API is needed or
required.

Unlike a list, where position is explicit, the ordering of nodes
within an rbtree must be provided as a predicate function by the user.
A function of type ``rb_lessthan_t()`` should be assigned to the
``lessthan_fn`` field of the ``struct rbtree`` before any tree
operations are attempted. This function should, as its name suggests,
return a boolean True value if the first node argument is "less than"
the second in the ordering desired by the tree. Note that "equal" is
not allowed; nodes within a tree must have a single fixed order for
the algorithm to work correctly.

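A typical predicate orders nodes by a key embedded in the container
struct. The following sketch uses stand-in types rather than the real
opaque ``struct rbnode``; only the comparison pattern is the point:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the opaque Zephyr rbnode. */
struct my_rbnode {
    void *opaque[2];
};

struct my_item {
    int key;
    struct my_rbnode node;
};

#define ITEM_OF(n) \
    ((struct my_item *)((char *)(n) - offsetof(struct my_item, node)))

/* A lessthan_fn-style predicate: strict ordering by embedded key.
 * It never reports "equal": for equal keys it returns false both
 * ways, which a real predicate must avoid by breaking ties (e.g. by
 * node address). */
static bool my_lessthan(struct my_rbnode *a, struct my_rbnode *b)
{
    return ITEM_OF(a)->key < ITEM_OF(b)->key;
}
```
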
As with the slist and dlist containers, nodes within an rbtree are
represented as a ``struct rbnode`` structure which exists in
user-managed memory, typically embedded within the data structure
being tracked in the tree. Unlike the list code, the data within an
rbnode is entirely opaque. It is not possible for the user to extract
the binary tree topology and "manually" traverse the tree as it is for
a list.

Nodes can be inserted into a tree with ``rb_insert()`` and removed
with ``rb_remove()``. Access to the "first" and "last" nodes within a
tree (in the sense of the order defined by the comparison function) is
provided by ``rb_get_min()`` and ``rb_get_max()``. There is also a
predicate, ``rb_contains()``, which returns a boolean True if the
provided node pointer exists as an element within the tree. As
described above, all of these routines are guaranteed to have at most
log time complexity in the size of the tree.

There are two mechanisms provided for enumerating all elements in an
rbtree. The first, ``rb_walk()``, is a simple callback implementation
where the caller specifies a C function pointer and an untyped
argument to be passed to it, and the tree code calls that function for
each node in order. This has the advantage of a very simple
implementation, at the cost of a somewhat more cumbersome API for the
user (not unlike ISO C's ``bsearch()`` routine). It is a recursive
implementation, however, and is thus not always available in
environments that forbid the use of unbounded stack techniques like
recursion.

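The callback style can be sketched with a toy binary tree
(illustrative types; the real ``rb_walk()`` operates on opaque
rbnodes): a recursive in-order traversal invoking a user function
with an untyped cookie.

```c
#include <assert.h>
#include <stddef.h>

struct t_node {
    int key;
    struct t_node *left, *right;
};

typedef void (*t_visit_t)(struct t_node *n, void *cookie);

/* Recursive in-order traversal: smaller keys are visited first. */
static void t_walk(struct t_node *n, t_visit_t visit, void *cookie)
{
    if (n == NULL) {
        return;
    }
    t_walk(n->left, visit, cookie);
    visit(n, cookie);
    t_walk(n->right, visit, cookie);
}

/* Example callback: sum the visited keys into the cookie. */
static void sum_keys(struct t_node *n, void *cookie)
{
    *(int *)cookie += n->key;
}
```

The recursion here is exactly what makes this style unavailable where
unbounded stack usage is forbidden, as noted above.
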
There is also a ``RB_FOR_EACH()`` iterator provided, which, like the
similar APIs for the lists, works to iterate over the tree in a more
natural way, using a nested code block instead of a callback. It is
also nonrecursive, though it requires log-sized space on the stack by
default (however, this can be configured to use a fixed maximum size
buffer instead where needed to avoid the dynamic stack allocation).
As with the lists, this is also available in a
``RB_FOR_EACH_CONTAINER()`` variant which enumerates using a pointer
to a container field and not the raw node pointer.

Tree Internals
--------------

As described, the Zephyr rbtree implementation is a conventional
red/black tree as described pervasively in academic sources. Low
level details about the algorithm are out of scope for this document,
as they match existing conventions. This discussion will be limited
to details notable or specific to the Zephyr implementation.

The core invariant guaranteed by the tree is that the path from the
root of the tree to any leaf is no more than twice as long as the path
to any other leaf. This is achieved by associating one bit of "color"
with each node, either red or black, and enforcing a rule that no red
node may be the child of another red node (i.e. that the number of
black nodes on any path to the root must be the same, and that no more
than that number of "extra" red nodes may be present). This rule is
enforced by a set of rotation rules used to "fix" trees following
modification.

.. figure:: rbtree.png
   :align: center
   :alt: rbtree example
   :figclass: align-center

   A maximally unbalanced rbtree with a black height of two. No more
   nodes can be added underneath the rightmost node without
   rebalancing.

These rotations are conceptually implemented on top of a primitive
that "swaps" the position of one node with another in the tree.
Typical implementations effect this by simply swapping the nodes'
internal "data" pointers, but because the Zephyr ``struct rbnode`` is
intrusive, that cannot work. Zephyr must include somewhat more
elaborate code to handle the edge cases (for example, one swapped node
can be the root, or the two may already be parent/child).

The ``struct rbnode`` struct for a Zephyr rbtree contains only two
pointers, representing the "left" and "right" children of a node
within the binary tree. Traversal of a tree for rebalancing following
modification, however, routinely requires the ability to iterate
"upwards" from a node as well. It is very common for red/black trees
in the industry to store a third "parent" pointer for this purpose.
Zephyr avoids this requirement by building a "stack" of node pointers
locally as it traverses downward through the tree and updating it
appropriately as modifications are made. So a Zephyr rbtree can be
implemented with no more runtime storage overhead than a dlist.

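The descent-with-stack technique can be sketched on a toy tree
(illustrative types and names, not the Zephyr internals): descend from
the root recording each node visited, so any node's ancestors are
available without per-node parent pointers.

```c
#include <assert.h>
#include <stddef.h>

struct t_node {
    int key;
    struct t_node *left, *right;
};

/* A balanced red/black tree of N nodes is no deeper than about
 * 2*log2(N), so a small fixed stack suffices. */
#define MAX_DEPTH 32

/* Descend toward 'key', pushing each visited node onto 'stack'.
 * Returns the number of nodes pushed; the last entry is the matching
 * node (or the node where the search stopped). */
static int t_path_to(struct t_node *root, int key,
                     struct t_node **stack)
{
    int depth = 0;
    struct t_node *n = root;

    while (n != NULL && depth < MAX_DEPTH) {
        stack[depth++] = n;
        if (key == n->key) {
            break;
        }
        n = (key < n->key) ? n->left : n->right;
    }
    return depth;
}
```

After the descent, ``stack[depth - 2]`` plays the role a "parent"
pointer would, with no storage cost in the node itself.
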
These properties, of a balanced tree data structure that works with
only two pointers of data per node and that works without any need for
a memory allocation API, are quite rare in the industry and are
somewhat unique to Zephyr.

Ring Buffer
===========

For circumstances where an application needs to implement asynchronous
"streaming" copying of data, Zephyr provides a ``struct ring_buf``
abstraction to manage copies of such data in and out of a shared
buffer of memory. Ring buffers may be used in either "bytes" mode,
where the data to be streamed is an uninterpreted array of bytes, or
"items" mode where the data must be an integral number of 32 bit
words. While the underlying data structure is the same, it is not
legal to mix these two modes on a single ring buffer instance. A ring
buffer initialized with a byte count must be used only with the
"bytes" API, and one initialized with a word count must use the
"items" calls.

A ``struct ring_buf`` may be placed anywhere in user-accessible
memory, and must be initialized with ``ring_buf_init()`` before use.
This must be provided a region of user-controlled memory for use as
the buffer itself. Note carefully that the units of the buffer size
passed (either bytes or words) depend on how the ring buffer will be
used later. Macros for combining these steps in a single static
declaration exist for convenience.
``RING_BUF_DECLARE()`` will declare and statically initialize a ring
buffer with a specified byte count, while
``RING_BUF_ITEM_DECLARE_SIZE()`` will declare and statically
initialize a buffer with a given count of 32 bit words.
``RING_BUF_ITEM_DECLARE_POW2()`` can be used to initialize an
items-mode buffer with a memory region whose size is guaranteed to be
a power of two, which enables various optimizations internal to the
implementation. No power-of-two initialization is available for
bytes-mode ring buffers.

|
|
|
|
"Bytes" data may be copied into the ring buffer using
``ring_buf_put()``, passing a data pointer and byte count. These
bytes will be copied into the buffer in order, as many as will fit in
the allocated buffer. The total number of bytes copied (which may be
fewer than provided) will be returned. Likewise ``ring_buf_get()``
will copy bytes out of the ring buffer in the order that they were
written, into a user-provided buffer, returning the number of bytes
that were transferred.

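The partial-copy semantics can be sketched with a toy bytes-mode
buffer (illustrative layout and names, not the real ``struct
ring_buf``): a put copies as many bytes as fit and reports the count.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RB_SIZE 8

struct toy_rb {
    uint8_t buf[RB_SIZE];
    uint32_t head;    /* next read position */
    uint32_t tail;    /* next write position */
    uint32_t used;
};

/* Copy in as many bytes as fit; the return value may be fewer than
 * requested, exactly as with ring_buf_put(). */
static uint32_t toy_put(struct toy_rb *rb, const uint8_t *data,
                        uint32_t len)
{
    uint32_t copied = 0;

    while (copied < len && rb->used < RB_SIZE) {
        rb->buf[rb->tail] = data[copied++];
        rb->tail = (rb->tail + 1) % RB_SIZE;
        rb->used++;
    }
    return copied;
}

/* Copy out in FIFO order, returning the number of bytes transferred. */
static uint32_t toy_get(struct toy_rb *rb, uint8_t *data, uint32_t len)
{
    uint32_t copied = 0;

    while (copied < len && rb->used > 0) {
        data[copied++] = rb->buf[rb->head];
        rb->head = (rb->head + 1) % RB_SIZE;
        rb->used--;
    }
    return copied;
}
```
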
To avoid multiply-copied-data situations, a "claim" API exists for
byte mode. ``ring_buf_put_claim()`` takes a byte size value from the
user and returns a pointer to memory internal to the ring buffer that
can be used to receive those bytes, along with a size of the
contiguous internal region (which may be smaller than requested). The
user can then copy data into that region at a later time without
assembling all the bytes in a single region first. When complete,
``ring_buf_put_finish()`` can be used to signal the buffer that the
transfer is complete, passing the number of bytes actually
transferred. At this point a new transfer can be initiated.
Similarly, ``ring_buf_get_claim()`` returns a pointer to internal ring
buffer data from which the user can read without making a verbatim
copy, and ``ring_buf_get_finish()`` signals the buffer with how many
bytes have been consumed and allows for a new transfer to begin.

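Why the claimed region may be smaller than requested can be sketched
as a small size computation (illustrative, not the actual internal
code): a claim can only hand out memory up to the end of the
underlying array, never a region that wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Size of the contiguous region a claim at index 'idx' could return,
 * given the total buffer size, the free space, and the requested
 * byte count. */
static uint32_t claim_size(uint32_t idx, uint32_t buf_size,
                           uint32_t free_space, uint32_t request)
{
    uint32_t contig = buf_size - idx;   /* room before the wrap point */

    if (request > free_space) {
        request = free_space;
    }
    return (request < contig) ? request : contig;
}
```
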
"Items" mode works similarly to bytes mode, except that all transfers
are in units of 32 bit words and all memory is assumed to be aligned
on 32 bit boundaries. The write and read operations are
``ring_buf_item_put()`` and ``ring_buf_item_get()``, and work
otherwise identically to the bytes mode APIs. There is no "claim" API
provided for items mode. One important difference is that unlike
``ring_buf_put()``, ``ring_buf_item_put()`` will not do a partial
transfer; it will return an error in the case where the provided data
does not fit in its entirety.

The user can query the available capacity of a ring buffer without
modifying it using the ``ring_buf_space_get()`` call (which returns a
value in either bytes or items depending on how the ring buffer has
been used), or by testing the ``ring_buf_is_empty()`` predicate.

Finally, a ``ring_buf_reset()`` call exists to immediately empty a
ring buffer, discarding the tracking of any bytes or items already
written to the buffer. It does not modify the memory contents of the
buffer itself, however.

Ring Buffer Internals
---------------------

Data streamed through a ring buffer is always written to the next byte
within the buffer, wrapping around to the first element after reaching
the end, thus the "ring" structure. Internally, the ``struct
ring_buf`` contains its own buffer pointer and its size, and also a
"head" and "tail" index representing where the next read and write
operations will occur.

This boundary is invisible to the user using the normal put/get APIs,
but becomes a barrier to the "claim" API, because obviously no
contiguous region can be returned that crosses the end of the buffer.
This can be surprising to application code, and can produce
performance artifacts when transfer sizes approach the size of the
buffer, as the number of calls to claim/finish needs to double for
such transfers.

When running in items mode (only), the ring buffer contains two
implementations of the modular arithmetic required to compute "next
element" offsets. One is used for arbitrarily sized buffers, while
the other is optimized for power of two sizes and can replace the
compare and subtract steps with a simple bitmask in several places, at
the cost of testing the "mask" value for each call.
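
The two strategies can be compared directly (illustrative helpers,
not the actual internal code):

```c
#include <assert.h>
#include <stdint.h>

/* Generic "next index": compare and subtract to stay within 'size'. */
static uint32_t next_generic(uint32_t idx, uint32_t size)
{
    uint32_t n = idx + 1;

    return (n >= size) ? (n - size) : n;
}

/* Power-of-two "next index": a single bitmask, with mask == size - 1. */
static uint32_t next_pow2(uint32_t idx, uint32_t mask)
{
    return (idx + 1) & mask;
}
```

For a power-of-two size the two functions agree at every index, which
is what lets the implementation swap in the cheaper mask form.
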