.. _safety_overview:

Zephyr Safety Overview
########################

Introduction
************

This document gives an overview of the safety-relevant activities in the Zephyr Project and of what
the project and the Zephyr Safety Working Group / Committee are trying to achieve.

It is intended for readers who are interested in the functional safety development of the Zephyr
RTOS and for project members who want to contribute to the safety aspects of the project.

Overview
********

This section outlines the general goal of the safety certification, the standard the project aims
to comply with, and the quality standards and processes that need to be implemented to reach such a
certification.

Safety Document update
**********************

This document is a living document and may evolve over time as new requirements, guidelines, or
processes are introduced.

#. Interested parties submit changes via pull requests to the Zephyr documentation repository.

#. The Zephyr Safety Committee reviews these changes and either provides feedback or accepts them.

#. Once accepted, the changes become part of the document.

General safety scope
********************

The general goal of the Safety Committee is to achieve certification according to the `IEC 61508
<https://en.wikipedia.org/wiki/IEC_61508>`__ standard at Safety Integrity Level (SIL) 3 /
Systematic Capability (SC) 3 for a limited source scope (see certification scope TBD). Since the
code base is pre-existing, we use the route 3s/1s approach defined by the IEC 61508 standard.

Route 3s
   *Assessment of non-compliant development, which is basically route 1s applied to existing
   sources.*

Route 1s
   *Compliant development. Compliance with the requirements of this standard for the avoidance and
   control of systematic faults in software.*

Summary of the IEC 61508 standard
=================================

IEC 61508 is a widely recognized international standard for the functional safety of electrical,
electronic, and programmable electronic safety-related systems. Some of its key safety aspects are:

#. **Hazard and Risk Analysis**: The IEC 61508 standard requires a thorough analysis of potential
   hazards and risks associated with a system in order to determine the appropriate level of safety
   measures needed to reduce those risks to acceptable levels.

#. **Safety Integrity Level (SIL)**: The standard introduces the concept of Safety Integrity Level
   (SIL) to classify the level of risk reduction required for each safety function. The higher the
   SIL, the greater the level of risk reduction required.

#. **System Design**: The IEC 61508 standard requires a systematic approach to system design that
   includes the identification of safety requirements, the development of a safety plan, and the
   use of appropriate safety techniques and measures to ensure that the system meets the required
   SIL.

#. **Verification and Validation**: The standard requires rigorous testing and evaluation of the
   safety-related system to ensure that it meets the specified SIL and other safety requirements.
   This includes verification of the system design, validation of the system's functionality, and
   ongoing monitoring and maintenance of the system.

#. **Documentation and Traceability**: The IEC 61508 standard requires a comprehensive
   documentation process to ensure that all aspects of the safety-related system are fully
   documented and that there is full traceability from the safety requirements to the final system
   design and implementation.

Overall, the IEC 61508 standard provides a framework for the design, development, and
implementation of safety-related systems that aims to reduce the risk of accidents and improve
overall safety. By following the standard, organizations can show that their safety-related
systems are designed and implemented to the safety integrity level required for their application.

Quality
*******

Quality is a mandatory expectation for software across the industry. The code base of the project
must achieve various software quality goals in order to be considered an auditable code base from a
safety perspective and to be usable for certification purposes. However, software quality is not an
additional requirement introduced by functional safety standards. Functional safety treats quality
as an existing pre-condition, and therefore the "quality managed" status should be pursued for any
project regardless of its functional safety goals. The following list describes the quality goals
which need to be reached to achieve an auditable code base:

1. Basic software quality standards

   a. :ref:`coding_guidelines` (including: static code analysis, coding style, etc.)
   b. Requirements and requirements tracing
   c. Test coverage

2. Software architecture design principles

   a. Layered architecture model
   b. Encapsulated components
   c. Encapsulated single functionality (if not reasonable and manageable in safety)

Basic software quality standards - Safety view
==============================================

In this chapter the Safety Committee describes why the quality goals listed above are needed as a
pre-condition and what needs to be done to achieve an auditable code base from the safety
perspective. Generally speaking, all of these quality measures serve to minimize the error rate
during code development.

Coding Guidelines
-----------------

The coding guidelines are the basis for a common understanding, a unified ruleset and a unified
development style for industrial software products. For safety, the coding guidelines are essential
and serve a purpose beyond providing a unified ruleset: it must be possible to prove that the
developers follow a unified development style in order to prevent **systematic errors** in the
process of developing software and thus to minimize the overall **error rate** of the complete
software system.

The **IEC 61508 standard** also recommends, as a pre-condition, the use of coding standards /
guidelines to reduce the likelihood of errors.

Requirements and requirements tracing
-------------------------------------

Requirements and requirements management are important not only for software development in
general, but also for safety. On the one hand, requirements specify in detail and on a technical
level what the software should do; on the other hand, they are a necessary tool for verifying
whether the described functionality is implemented as expected. For this purpose, the requirements
are traced down to the code level. With requirements management and tracing in place, it can be
verified whether the functionality has been implemented and tested correctly, thus minimizing the
systematic error rate.

The IEC 61508 standard also highly recommends requirements and requirements tracing, which in
practice makes them a must-have for certification.

Test coverage
-------------

High test coverage, in turn, is evidence for safety that the code does precisely what it was
developed for and does not execute any unforeseen instructions. If the entire code is tested and
has a high (ideally 100%) test coverage, faulty changes are also detected quickly, which further
minimizes the error rate. Note, however, that safety imposes specific requirements on test
coverage, and that various metrics prescribed by the IEC 61508 standard for the SIL 3 / SC3 target
must be considered. Among other things, the following must be fulfilled:

* Structural test coverage (entry points) 100%
* Structural test coverage (statements) 100%
* Structural test coverage (branches) 100%

If 100% cannot be reached (e.g. statement coverage of defensive code), the uncovered part needs to
be described and justified in the documentation.
 
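The difference between the statement and branch metrics can be illustrated with a minimal sketch in
plain C. The functions ``saturating_inc()`` and ``mode_timeout_ms()`` below are hypothetical and
not part of Zephyr; they only show why 100% statement coverage does not imply 100% branch coverage,
and what kind of defensive code typically needs a written justification.

.. code-block:: c

   #include <assert.h>

   /* Hypothetical example types and functions, not taken from the Zephyr code base. */
   enum mode { MODE_IDLE, MODE_RUN };

   /* Increment value, but never beyond max. */
   int saturating_inc(int value, int max)
   {
       if (value < max) {
           value++;
       }
       return value;
   }

   /* Map an operating mode to a timeout in milliseconds. */
   int mode_timeout_ms(enum mode m)
   {
       switch (m) {
       case MODE_IDLE:
           return 0;
       case MODE_RUN:
           return 100;
       default:
           /* Defensive code: not reachable with a valid enum mode value. */
           return -1;
       }
   }

   int main(void)
   {
       /* This call alone executes every statement of saturating_inc(). */
       assert(saturating_inc(1, 5) == 2);
       /* Only this second case exercises the "condition false" branch outcome. */
       assert(saturating_inc(5, 5) == 5);

       /* Both valid modes are tested; the default label stays uncovered. */
       assert(mode_timeout_ms(MODE_IDLE) == 0);
       assert(mode_timeout_ms(MODE_RUN) == 100);
       return 0;
   }

In this sketch, the ``default`` label is the part that would have to be described and justified,
unless a test deliberately passes an out-of-range value.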

Software architecture design principles
=======================================

To create and maintain a structured software product, it is also necessary to consider individual
software architecture designs and implement them in accordance with safety standards, because some
designs and implementations are not reasonable in safety. Only then can the overall software and
code base be used as auditable code. However, most of these software architecture designs have
already been implemented in the Zephyr project and need to be verified by the Safety Committee /
Safety Working Group and the safety architect.

Layered architecture model
--------------------------

The **IEC 61508 standard** strongly recommends a modular approach to software architecture. This
approach has been pursued in the Zephyr project from the beginning with its layered architecture.
The idea behind this architecture is to organize modules or components with similar functionality
into layers. As a result, each layer can be assigned a specific role in the system. For safety,
this model has the advantage that the interfaces between components and layers can be shown at a
very high level, which makes it possible to determine which functionalities are safety-relevant and
to limit them accordingly. Furthermore, various analyses and documents that are important for
certification and for the responsible certification body can be built on top of this architecture.

Encapsulated components
-----------------------

Encapsulated components are an essential part of the architecture design for safety. The most
important aspect is the separation of safety-relevant components from non-safety-relevant
components, including their associated interfaces. This ensures that the components have no
**repercussions** on other components.

Encapsulated single functionality (if not reasonable and manageable in safety)
------------------------------------------------------------------------------

Another requirement for the overall system and software environment is that individual
functionalities can be disabled within components. If a functionality is absolutely unacceptable
for safety (e.g. complete dynamic memory management), it should be possible to turn it off. The
Zephyr Project already offers this possibility through Kconfig and its flexible configurability.
 
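How such a switch reaches the code can be sketched in plain C. Kconfig options surface in C as
``CONFIG_*`` preprocessor macros, so a disabled functionality is not compiled into the binary at
all. The option name ``CONFIG_APP_DYNAMIC_ALLOC`` and the function ``app_buf_get()`` below are
hypothetical and used for illustration only; the macro is left undefined here to model a safety
configuration with dynamic allocation turned off.

.. code-block:: c

   #include <stddef.h>
   #include <stdio.h>
   #include <stdlib.h>

   /*
    * Hypothetical option, for illustration only. In a real build the macro
    * would come from the generated configuration header, not from this file.
    */
   /* #define CONFIG_APP_DYNAMIC_ALLOC 1 */

   #define STATIC_POOL_SIZE 64

   /* Return a buffer of at least size bytes, or NULL if none is available. */
   void *app_buf_get(size_t size)
   {
   #ifdef CONFIG_APP_DYNAMIC_ALLOC
       /* Functionality enabled: dynamic allocation is compiled in. */
       return malloc(size);
   #else
       /* Functionality disabled: only a fixed, statically sized pool exists. */
       static unsigned char static_pool[STATIC_POOL_SIZE];

       return (size <= STATIC_POOL_SIZE) ? static_pool : NULL;
   #endif
   }

   int main(void)
   {
       void *small = app_buf_get(16);
       void *large = app_buf_get(1024);

       printf("small: %s, large: %s\n",
              small != NULL ? "ok" : "unavailable",
              large != NULL ? "ok" : "unavailable");
       return 0;
   }

With the macro undefined, the call to ``malloc()`` is removed by the preprocessor, so the
dynamic path does not have to be analyzed or tested for that configuration.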

Processes and workflow
**********************

.. figure:: images/zephyr-safety-process.svg
   :align: center
   :alt: Safety process and workflow overview
   :figclass: align-center

   Safety process and workflow overview

The diagram describes the rough process defined by the Safety Committee to ensure safety in the
development of the Zephyr project. To aid understanding, a few points need to be highlighted and
some details explained regarding the roles of the safety architect and the Safety Committee in the
overall process. The diagram only describes the paths that are possible when a change is related to
safety.

#. On the main branch, the safety scope of the project should be identified, which typically
   represents a small subset of the entire code base. This subset should then be made auditable
   during normal development on “main”, which means that special attention is paid to quality goals
   (`Quality`_) and safety processes within this scope. The safety architect works alongside the
   Technical Steering Committee (TSC) in this area, monitoring the development process to ensure
   that the architecture meets the safety requirements.

#. At this point, the safety architect plays an increasingly important role. For PRs/issues that
   fall within the safety scope, the safety architect should ideally be involved in the discussions
   and decisions, even for minor changes, in order to be able to react to safety-relevant changes
   that are not conformant. If a pull request or issue introduces a significant and influential
   change or improvement that requires extended discussion or decision-making, the safety architect
   should bring it to the attention of the Safety Committee or the Technical Steering Committee
   (TSC) as appropriate, so that they can decide on the best course of action.

#. This section describes the certification side. At this point, the code base has to be in an
   "auditable" state, and ideally no further changes should be necessary or made to the code base.
   There is still a path from the main branch to this area. It is needed in case a serious bug is
   found, or an important change is implemented, on the main branch within the safety scope after
   the LTS and the auditable branch were created. In this case, the Safety Committee, together with
   the safety architect, must decide whether this bug fix or change should be integrated into the
   LTS so that it can also be integrated into the auditable branch. This integration can take three
   forms: a code change only, an update to the safety documentation only, or both.

#. This describes the safety process required for the certification itself. Here, the final
   analyses and tests are conducted and the final documents are created, as prescribed by the
   certifying authority and the standard being certified against. If the certification body
   approves everything at this stage and the safety process is completed, a safety release can be
   created and published.

#. This transition from the auditable branch to the main branch should only occur in exceptional
   circumstances, specifically when something has been identified during the certification process
   that needs to be quickly adapted on the “auditable” branch in order to obtain certification. To
   prevent the same issue from arising again during the next certification, there needs to be a
   path to merge these changes back into the main branch so that they are not lost and are
   available for the next certification if necessary.

.. important::
   Safety should neither block the project nor limit its room to grow in any way.

.. important::
   **TODO:** Find and define ways, guidelines and processes that have minimal impact on the daily
   work of the maintainers, reviewers, contributors and the safety architect, while still being
   suitable for safety.