.. _sw_design_guidelines:

Software Design Guidelines
##########################

Error Detection and Error Handling
**********************************

Workflow
========

Error detection and error handling workflow in the ACRN hypervisor is shown in
:numref:`work_flow_of_error_detection_and_error_handling`.

.. figure:: images/work_flow_of_error_detection_and_error_handling.png
   :align: center
   :name: work_flow_of_error_detection_and_error_handling

   Error Detection and Error Handling Workflow


Design Assumption
=================

There are three types of design assumptions in the ACRN hypervisor, as shown
below:

**Pre-condition**
  Pre-conditions shall be defined right before the definition/declaration of
  the corresponding function in the C source file or header file.
  All pre-conditions shall be guaranteed by the caller of the function.
  Error checking of the pre-conditions is not needed in the release version of
  the function. Developers can use ASSERT to catch design errors in a debug
  version in some cases. Verification of the hypervisor shall check whether
  each caller guarantees all pre-conditions of the callee.

  This design assumption applies to the following cases:

  - Input parameters of the function.
  - Global state, such as hypervisor operation mode.

**Post-condition**
  Post-conditions shall be defined right before the definition/declaration of
  the corresponding function in the C source file or header file.
  All post-conditions shall be guaranteed by the function. All callers of the
  function should trust that these post-conditions are met.
  Error checking of the post-conditions is not needed in the release version of
  each caller. Developers can use ASSERT to catch design errors in a debug
  version in some cases. Verification of the hypervisor shall check whether the
  function guarantees all post-conditions.

  This design assumption applies to the following case:

  - Return value of the function

    It is used to guarantee that the return value is valid: for example, that
    a returned pointer is not NULL, that the return value is within a valid
    range, or that the members of a returned structure are valid.


**Application Constraints**
  Application constraints of the hypervisor shall be defined in the design
  document and the safety manual.
  All application constraints shall be guaranteed by external safety
  applications, such as the Board Support Package, firmware, safety VM, or
  hardware.
  The verification of application integration shall check whether the safety
  application meets all application constraints.

  This design assumption applies to the following cases:

  - Configuration data defined by the external safety application, such as
    physical PCI device information specific to each board design.

  - Input data that is only specified by the external safety application.


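
As a minimal sketch of how the pre-condition and post-condition assumptions
might look in code, the hypothetical example below annotates a function with
one of each and checks the pre-condition only in debug builds. All ``sketch_``
names and the ``RELEASE_BUILD`` switch are illustrative assumptions, not ACRN
identifiers, and the real ``ASSERT`` macro in ACRN may behave differently.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical debug-only check: compiled out when RELEASE_BUILD is defined. */
   #ifdef RELEASE_BUILD
   #define SKETCH_ASSERT(cond) ((void)0)
   #else
   #define SKETCH_ASSERT(cond) assert(cond)
   #endif

   #define SKETCH_MAX_ENTRIES 4U   /* hypothetical table size */

   struct sketch_entry {
       uint32_t value;
   };

   static struct sketch_entry sketch_table[SKETCH_MAX_ENTRIES];

   /**
    * @pre idx < SKETCH_MAX_ENTRIES
    * @post retval != NULL
    */
   static struct sketch_entry *sketch_get_entry(uint16_t idx)
   {
       /* Debug builds only: catch a caller that violates the pre-condition. */
       SKETCH_ASSERT(idx < SKETCH_MAX_ENTRIES);
       return &sketch_table[idx];
   }

   /*
    * Internal caller: the loop bound guarantees the pre-condition, and the
    * post-condition is trusted, so the result is used without a NULL check.
    */
   static uint32_t sketch_sum_entries(void)
   {
       uint32_t sum = 0U;
       uint16_t i;

       for (i = 0U; i < SKETCH_MAX_ENTRIES; i++) {
           sum += sketch_get_entry(i)->value;
       }
       return sum;
   }

In this sketch the release build carries no run-time check at all; the
guarantee comes from the caller's loop bound, which is the kind of property
that verification is expected to confirm.
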
Architecture Level
==================

Functional Safety Consideration
-------------------------------

The hypervisor performs range checks in hypercalls and hardware capability
checks according to Table A.2 of FuSa Standards [IEC_61508-3_2010]_.

Error Handling Methods
----------------------

The error handling methods used in the ACRN hypervisor at the architecture
level are shown below.

**Invoke default fatal error handler**
  The hypervisor shall invoke the default fatal error handler when the cases
  below occur. Customers can define platform-specific handlers, allowing them to
  implement additional error reporting (mostly to hardware) if required. The
  default fatal error handler first invokes the platform-specific handlers
  defined by users, then panics the system.

  This method applies to the following cases:

  - Related hardware resources are unavailable.
  - Boot information is invalid during platform initialization.
  - Unexpected exception occurs in root mode due to hardware failures.
  - Failures occur in the VM dedicated to error handling.

**Return error code**
  The hypervisor shall return an error code to the VM when the case below
  occurs. The error code shall indicate the error type detected (e.g., invalid
  parameter, device not found, device busy, or resource unavailable).

  This method applies to the following case:

  - The hypercall parameter from the VM is invalid.

**Inform the safety VM through specific register or memory area**
  The hypervisor shall inform the safety VM through a specific register or
  memory area when the cases below occur. The VM will decide how to handle the
  related error. This shall only be done after the VM (Safety OS or Service OS)
  dedicated to error handling has started.

  This method applies to the following cases:

  - Machine check errors occur due to hardware failures.

  - Unexpected VM entry failures occur, where the VM is not the one dedicated
    to error handling.

**Panic the system via ASSERT**
  The hypervisor can panic the system when the case below occurs. This method
  shall be used only for debugging, to check pre-conditions and post-conditions
  and catch design errors.

  This method applies to the following case:

  - Software design errors occur.


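
The ordering described for the default fatal error handler (platform-specific
handler first, then panic) could be sketched roughly as follows. This is an
illustration only: every ``sketch_`` name is hypothetical, the registration
interface is an assumption, and the panic is replaced by a flag so that the
example can return, whereas a real hypervisor would halt at that point.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical callback type for a platform-specific handler. */
   typedef void (*sketch_platform_handler_t)(uint32_t error_code);

   static sketch_platform_handler_t sketch_platform_handler; /* NULL if none */
   static bool sketch_panicked;      /* stands in for an actual system panic */
   static uint32_t sketch_reported;  /* last error seen by the platform hook */

   static void sketch_register_platform_handler(sketch_platform_handler_t h)
   {
       sketch_platform_handler = h;
   }

   /* Stand-in for the panic; a real implementation would not return. */
   static void sketch_panic(void)
   {
       sketch_panicked = true;
   }

   /* Default fatal error handler: run the platform hook first, then panic. */
   static void sketch_default_fatal_error_handler(uint32_t error_code)
   {
       if (sketch_platform_handler != NULL) {
           sketch_platform_handler(error_code);
       }
       sketch_panic();
   }

   /* Example platform hook: records the error, e.g. for reporting to hardware. */
   static void sketch_report_to_hw(uint32_t error_code)
   {
       sketch_reported = error_code;
   }

Running the platform hook before the panic gives the integrator a chance to
report the error while the system is still executing code.
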
Rules of Error Detection and Error Handling
-------------------------------------------

The rules of error detection and error handling at the architecture level are
shown in :numref:`rules_arch_level` below.

.. table:: Rules of Error Detection and Error Handling on Architecture Level
   :align: center
   :widths: auto
   :name: rules_arch_level

   +--------------------+-------------------------+--------------+---------------------------+-------------------------+
   | Resource Class     | Failure Mode            | Error        | Error Handling Policy     | Example                 |
   |                    |                         | Detection    |                           |                         |
   |                    |                         | via          |                           |                         |
   |                    |                         | Hypervisor   |                           |                         |
   +====================+=========================+==============+===========================+=========================+
   | External resource  | Invalid register/memory | Yes          | Follow SDM strictly, or   | Unsupported MSR         |
   | provided by VM     | state on VM exit        |              | explicitly document any   | or invalid CPU ID       |
   |                    |                         |              | deviation                 |                         |
   |                    +-------------------------+--------------+---------------------------+-------------------------+
   |                    | Invalid hypercall       | Yes          | The hypervisor shall      | Invalid hypercall       |
   |                    | parameter               |              | return related error code | parameter provided by   |
   |                    |                         |              | to the VM                 | any VM                  |
   |                    +-------------------------+--------------+---------------------------+-------------------------+
   |                    | Invalid data in the     | Yes          | Case by case depending    | Invalid data in memory  |
   |                    | shared memory area      |              | on the data               | shared with all VMs,    |
   |                    |                         |              |                           | such as IO request      |
   |                    |                         |              |                           | buffers and sbuf for    |
   |                    |                         |              |                           | debug                   |
   +--------------------+-------------------------+--------------+---------------------------+-------------------------+
   | External resource  | Invalid E820 table or   | Yes          | The hypervisor shall      | Invalid E820 table or   |
   | provided by        | invalid boot information|              | panic during platform     | invalid boot information|
   | bootloader         |                         |              | initialization            |                         |
   | (UEFI or SBL)      |                         |              |                           |                         |
   +--------------------+-------------------------+--------------+---------------------------+-------------------------+
   | Physical resource  | 1GB page is not         | Yes          | The hypervisor shall      | 1GB page is not         |
   | used by the        | available on the        |              | panic during platform     | available on the        |
   | hypervisor         | platform or invalid     |              | initialization            | platform or invalid     |
   |                    | physical CPU ID         |              |                           | physical CPU ID         |
   +--------------------+-------------------------+--------------+---------------------------+-------------------------+


Examples
--------

Here is an example to illustrate when error handling code is required at
the architecture level.

``vcpu_from_vid`` has two pre-condition statements, which indicate that
it is the caller's responsibility to guarantee these pre-conditions.

.. code-block:: c

   /**
    * @pre vcpu_id < CONFIG_MAX_VCPUS_PER_VM
    * @pre (&(vm->hw.vcpu_array[vcpu_id]))->state != VCPU_OFFLINE
    */
   static inline struct acrn_vcpu *vcpu_from_vid(struct acrn_vm *vm, uint16_t vcpu_id)
   {
       return &(vm->hw.vcpu_array[vcpu_id]);
   }

``vcpu_from_vid`` is called by ``hcall_set_vcpu_regs``, which is a hypercall.
``hcall_set_vcpu_regs`` is an external interface and ``vcpu_id`` is provided by
the VM. In this case, error checking code shall be added before calling
``vcpu_from_vid`` to make sure that the passed parameters are valid and the
pre-conditions are guaranteed.

Here is sample code for error checking before calling ``vcpu_from_vid``:

.. code-block:: c

   status = 0;

   /* Check both pre-conditions of vcpu_from_vid before calling it. */
   if (vcpu_id >= CONFIG_MAX_VCPUS_PER_VM) {
       pr_err("vcpu id is out of range \r\n");
       status = -EINVAL;
   } else if ((&(vm->hw.vcpu_array[vcpu_id]))->state == VCPU_OFFLINE) {
       pr_err("vcpu is offline \r\n");
       status = -EINVAL;
   }

   if (status == 0) {
       vcpu = vcpu_from_vid(vm, vcpu_id);
       ...
   }


Module Level
============

Functional Safety Consideration
-------------------------------

Data verification and explicit specification of pre-conditions and
post-conditions are applied to internal functions of the hypervisor according
to Table A.4 of FuSa Standards [IEC_61508-3_2010]_.

Error Handling Methods
----------------------

The error handling methods used in the ACRN hypervisor at the module level are
shown below.

**Panic the system via ASSERT**
  The hypervisor can panic the system when the case below occurs. This method
  shall be used only for debugging, to check pre-conditions and post-conditions
  and catch design errors.

  This method applies to the following case:

  - Software design errors occur.


Rules of Error Detection and Error Handling
-------------------------------------------

The rules of error detection and error handling at the module level are shown in
:numref:`rules_module_level` below.

.. table:: Rules of Error Detection and Error Handling on Module Level
   :align: center
   :widths: auto
   :name: rules_module_level

   +--------------------+-----------+----------------------------+---------------------------+-------------------------+
   | Resource Class     | Failure   | Error Detection via        | Error Handling Policy     | Example                 |
   |                    | Mode      | Hypervisor                 |                           |                         |
   +====================+===========+============================+===========================+=========================+
   | Internal data of   | N/A       | Partial.                   | The hypervisor shall use  | Virtual PCI device      |
   | the hypervisor     |           | The related pre-conditions | the internal resource/data| information, defined    |
   |                    |           | are required.              | directly.                 | with array 'pci_vdevs[]'|
   |                    |           | The design will guarantee  |                           | through static          |
   |                    |           | the correctness and the    |                           | allocation.             |
   |                    |           | test cases will verify the |                           |                         |
   |                    |           | related pre-conditions.    |                           |                         |
   |                    |           | If the design cannot       |                           |                         |
   |                    |           | guarantee the correctness, |                           |                         |
   |                    |           | the related error handling |                           |                         |
   |                    |           | code needs to be added.    |                           |                         |
   |                    |           | Note: Examples of          |                           |                         |
   |                    |           | pre-conditions include a   |                           |                         |
   |                    |           | non-empty array, a valid   |                           |                         |
   |                    |           | array size, and a non-null |                           |                         |
   |                    |           | pointer.                   |                           |                         |
   +--------------------+-----------+----------------------------+---------------------------+-------------------------+
   | Configuration data | Corrupted | No.                        | The bootloader initializes| 'vm_config->pci_ptdevs' |
   | of the VM          | VM config | The related pre-conditions | the hypervisor (including | is configured           |
   |                    |           | are required.              | code, data, and bss) and  | statically.             |
   |                    |           | Note: VM configuration data| verifies the integrity of |                         |
   |                    |           | is auto-generated based on | the hypervisor image,     |                         |
   |                    |           | different board configs    | which contains the VM     |                         |
   |                    |           | and defined as static      | configurations. Thus, the |                         |
   |                    |           | structures.                | hypervisor does not need  |                         |
   |                    |           |                            | any additional mechanism. |                         |
   +--------------------+-----------+----------------------------+---------------------------+-------------------------+
   | Configuration data | N/A       | No.                        | The hypervisor shall use  | The maximum number of   |
   | of the hypervisor  |           | The related pre-conditions | the internal resource/data| PCI devices in the VM,  |
   |                    |           | are required.              | directly.                 | defined with            |
   |                    |           | The design will guarantee  |                           | CONFIG_MAX_PCI_DEV_NUM  |
   |                    |           | the correctness and this   |                           | through configuration.  |
   |                    |           | shall be verified manually.|                           |                         |
   +--------------------+-----------+----------------------------+---------------------------+-------------------------+


Examples
--------

Here are some examples to illustrate when error handling code is required at
the module level.

**Example_1: Analyze the function 'partition_mode_vpci_init'**

.. code-block:: c

   /**
    * @pre vm != NULL
    * @pre vm->vpci.pci_vdev_cnt <= CONFIG_MAX_PCI_DEV_NUM
    */
   static int32_t partition_mode_vpci_init(const struct acrn_vm *vm)
   {
       struct acrn_vpci *vpci = (struct acrn_vpci *)&(vm->vpci);
       struct pci_vdev *vdev;
       struct acrn_vm_config *vm_config = get_vm_config(vm->vm_id);
       struct acrn_vm_pci_ptdev_config *ptdev_config;
       uint32_t i;

       /* The device count comes from the static VM configuration. */
       vpci->pci_vdev_cnt = vm_config->pci_ptdev_num;

       for (i = 0U; i < vpci->pci_vdev_cnt; i++) {
           vdev = &vpci->pci_vdevs[i];
           vdev->vpci = vpci;
           ptdev_config = &vm_config->pci_ptdevs[i];
           vdev->vbdf.value = ptdev_config->vbdf.value;

           /* A zero virtual BDF selects the host bridge operations. */
           if (vdev->vbdf.value != 0U) {
               partition_mode_pdev_init(vdev, ptdev_config->pbdf);
               vdev->ops = &pci_ops_vdev_pt;
           } else {
               vdev->ops = &pci_ops_vdev_hostbridge;
           }

           if (vdev->ops->init != NULL) {
               if (vdev->ops->init(vdev) != 0) {
                   pr_err("%s() failed at PCI device (vbdf %x)!",
                       __func__, vdev->vbdf.value);
               }
           }
       }

       return 0;
   }

``get_vm_config`` is called by ``partition_mode_vpci_init``.
``get_vm_config`` has one pre-condition and two post-conditions.
This indicates that the caller of ``get_vm_config`` shall guarantee the
pre-condition and ``get_vm_config`` itself shall guarantee the post-conditions.

.. code-block:: c

   /**
    * @pre vm_id < CONFIG_MAX_VM_NUM
    * @post retval != NULL
    * @post retval->pci_ptdev_num <= CONFIG_MAX_PCI_DEV_NUM
    */
   struct acrn_vm_config *get_vm_config(uint16_t vm_id)
   {
       return &vm_configs[vm_id];
   }

**Question_1: Is error checking required for 'vm_config'?**

No. 'vm_config' gets its value from ``get_vm_config``, and the post-condition
of ``get_vm_config`` guarantees that the return value is not NULL.


**Question_2: Is error checking required for 'vdev'?**

No. Here are the reasons:

a) The pre-condition of ``partition_mode_vpci_init`` guarantees that 'vm' is not
   NULL. It indicates that 'vpci' is not NULL. Since 'vdev' is obtained from
   the array 'pci_vdevs[]' via indexing, 'vdev' is not NULL as long as the index
   is valid.

b) 'vpci->pci_vdev_cnt' is assigned from 'vm_config->pci_ptdev_num', and the
   post-condition of ``get_vm_config`` guarantees that this value is less than
   or equal to 'CONFIG_MAX_PCI_DEV_NUM', which is the array size of
   'pci_vdevs[]'. It indicates that the index used to get 'vdev' is always
   valid.

Given the two reasons above, 'vdev' is never NULL. So, error checking
code is not required for 'vdev'.


**Question_3: Is error checking required for 'ptdev_config'?**

No. 'ptdev_config' is obtained from the array 'pci_ptdevs[]', which holds the
physical PCI device information coming from the Board Support Package and
firmware. For physical PCI device information, the related application
constraints shall be defined in the design document or safety manual. For
debugging purposes, developers can use ASSERT here to catch Board Support
Package or firmware failures that violate these application constraints.


**Question_4: Is error checking required for 'vdev->ops->init'?**

No. Here are the reasons:

a) Question_2 proves that 'vdev' is never NULL.

b) 'vdev->ops' is fully initialized before 'vdev->ops->init' is called.

Given the two reasons above, 'vdev->ops->init' is never NULL. So, error
checking code is not required for 'vdev->ops->init'.


**Question_5: How to handle the case when 'vdev->ops->init(vdev)' returns non-zero?**

This case indicates that the initialization of a specific virtual device has
failed. The root cause has to be investigated. The default fatal error
handler shall be invoked here if the failure is caused by a hardware failure
or invalid boot information.


**Example_2: Analyze the function 'partition_mode_vpci_deinit'**

.. code-block:: c

   /**
    * @pre vm != NULL
    * @pre vm->vpci.pci_vdev_cnt <= CONFIG_MAX_PCI_DEV_NUM
    */
   static void partition_mode_vpci_deinit(const struct acrn_vm *vm)
   {
       struct pci_vdev *vdev;
       uint32_t i;

       for (i = 0U; i < vm->vpci.pci_vdev_cnt; i++) {
           vdev = (struct pci_vdev *) &(vm->vpci.pci_vdevs[i]);
           /* Neither pointer is guaranteed to be non-NULL here. */
           if ((vdev->ops != NULL) && (vdev->ops->deinit != NULL)) {
               if (vdev->ops->deinit(vdev) != 0) {
                   pr_err("vdev->ops->deinit failed!");
               }
           }
           /* TODO: implement the deinit of 'vdev->ops' */
       }
   }


**Question_6: Is error checking required for 'vdev->ops' and 'vdev->ops->deinit'?**

Yes. 'vdev->ops' and 'vdev->ops->deinit' cannot be guaranteed to be
non-NULL. If ``partition_mode_vpci_deinit`` is called twice for the same VM,
they may be NULL.

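
To picture why the check in Question_6 matters, the sketch below assumes that
a deinit routine clears the ``ops`` pointer once it has run, so a second call
finds NULL. That clearing step is an assumption made for illustration (the
TODO in the example above leaves the actual behavior open), and all
``sketch_`` names are hypothetical rather than ACRN identifiers.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   struct sketch_vdev;

   struct sketch_vdev_ops {
       int32_t (*deinit)(struct sketch_vdev *vdev);
   };

   struct sketch_vdev {
       const struct sketch_vdev_ops *ops;
       uint32_t deinit_calls;   /* counts how often the callback really ran */
   };

   static int32_t sketch_dev_deinit(struct sketch_vdev *vdev)
   {
       vdev->deinit_calls++;
       return 0;
   }

   static const struct sketch_vdev_ops sketch_ops = {
       .deinit = sketch_dev_deinit,
   };

   /* Returns 0 on success or when there is nothing left to deinitialize. */
   static int32_t sketch_vdev_deinit(struct sketch_vdev *vdev)
   {
       int32_t ret = 0;

       /* 'ops' may already be NULL from an earlier call, so check it first. */
       if ((vdev->ops != NULL) && (vdev->ops->deinit != NULL)) {
           ret = vdev->ops->deinit(vdev);
       }
       /* Assumption: deinit clears 'ops', which makes a second call see NULL. */
       vdev->ops = NULL;
       return ret;
   }

With the NULL checks in place, calling ``sketch_vdev_deinit`` twice on the
same device runs the callback once and leaves the second call as a no-op.
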
References
**********

.. [IEC_61508-3_2010] IEC 61508-3:2010, Functional safety of electrical/electronic/programmable electronic safety-related systems - Part 3: Software requirements