acrn-hypervisor/tools/acrn-crashlog/acrnprobe
Liu, Xinwu b30ba3db15 tools:acrn-crashlog: Detect and classify the crash in ACRN and kernel
Since ACRN has the capability to reboot and reboot reason is available
in SOS, acrnprobe could detect the crash of acrn and SOS kernel.

List of added crash types:

1. ACRNCRASH            - crashed in hypervisor, this detection depends on
                          files in /tmp/acrnlog_last(provided by acrnlog).
2. IPANIC               - crashed in SOS kernel, this detection depends on
                          pstore.
3. SWWDT_IPANIC         - crashed in SOS kernel and reboot reason is wdt.
4. HWWDT_UNHANDLE       - only recognize reboot reason is global, there is no
                          further clues that it's a SOS kernel crash or a
                          hypervisor crash.
5. SWWDT_UNHANDLE       - only recognize reboot reason is wdt, there is no
                          further clues that it's a SOS kernel crash or a
                          hypervisor crash.
6. UNKNOWN              - only recognize reboot reason is warm, there is no
                          further clues that it's a SOS kernel crash or a
                          hypervisor crash.

Signed-off-by: Liu, Xinwu <xinwu.liu@intel.com>
Acked-by: Chen Gang <gang.c.chen@intel.com>
2018-07-12 17:29:51 +08:00
include tools:acrn-crashlog: Detect and classify the crash in ACRN and kernel 2018-07-12 17:29:51 +08:00
Makefile build: Using id tool to get builder username 2018-06-29 11:55:03 +08:00
README.rst Documentation: fix incorrect link in acrn-probe documentation 2018-06-12 14:08:26 -07:00
android_events.c tools:acrn-crashlog: Improve the process of crash reclassify 2018-07-12 17:29:51 +08:00
channels.c tools:acrn-crashlog: Detect and classify the crash in ACRN and kernel 2018-07-12 17:29:51 +08:00
crash_reclassify.c tools:acrn-crashlog: Improve the process of crash reclassify 2018-07-12 17:29:51 +08:00
event_handler.c tools: acrn-crashlog: compile without telemetrics client 2018-05-23 21:21:51 +08:00
event_queue.c tools: acrn-crashlog: event queue operations for acrnprobe 2018-05-23 17:10:51 +08:00
history.c tools:acrn-crashlog: Improve the process of crash reclassify 2018-07-12 17:29:51 +08:00
load_conf.c tools: acrn-crashlog: Defer the vm events processing when failed 2018-06-29 15:23:18 +08:00
main.c tools: acrn-crashlog: Defer the vm events processing when failed 2018-06-29 15:23:18 +08:00
probeutils.c tools:acrn-crashlog: Detect and classify the crash in ACRN and kernel 2018-07-12 17:29:51 +08:00
property.c tools: acrn-crashlog: Defer the vm events processing when failed 2018-06-29 15:23:18 +08:00
sender.c tools:acrn-crashlog: Improve the process of crash reclassify 2018-07-12 17:29:51 +08:00
startupreason.c tools:acrn-crashlog: Get reboot reason in acrnprobe 2018-07-12 17:29:51 +08:00

README.rst

.. _acrnprobe_doc:

Acrnprobe
#########

Description
***********

``acrnprobe`` is a tool that detects critical events on the platform and
collects specific information about them. The collected information is saved
as logs. If `telemetrics-client`_ is present on the system, the log path is
delivered to it as a record, and ``acrnprobe`` works as a *probe* of
telemetrics-client. If telemetrics-client is not present, ``acrnprobe``
provides ``history_event`` (under ``/var/log/crashlog/`` by default) to manage
the crash and event records on the platform in place of ``telem_journal``. In
that case, the records cannot be delivered to the backend.

Usage
*****

``acrnprobe`` is launched as a service at boot. It also provides some basic
options.

Use ``-c`` to specify a configuration file for ``acrnprobe``. If this option is
not used, ``acrnprobe`` uses the configuration file located in the custom
configuration path or the installation path (see `CONFIGURATION FILES`_).

.. code-block:: console

   $ acrnprobe -c [configuration_path]

Use ``-V`` to display the version of ``acrnprobe``:

.. code-block:: console

   $ acrnprobe -V

Architecture
************

Syntax
======

- channel :
  A channel represents a way of detecting system events. There are three
  channels:

  + oneshot: detects once when ``acrnprobe`` starts up.
  + polling: runs a detection job at a fixed time interval.
  + inotify: monitors changes to a file or directory.

- event queue :
  A single global queue receives all detected events.
  Generally, events are enqueued by a channel and dequeued by the event handler.

- event handler :
  The event handler is a thread that handles the events detected by channels.
  It is awakened when an event is enqueued.

- sender :
  A sender is an output path for an event.
  There are two senders:

  + Crashlog is responsible for collecting logs and saving them locally.
  + Telemd is responsible for sending log records to telemetrics client.
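
The relationship between channel, event queue and event handler can be sketched
as a small producer/consumer queue. This is only an illustration of the idea
described above: ``struct event``, ``event_enqueue`` and ``event_dequeue`` are
hypothetical names chosen for this sketch and are not taken from
``event_queue.c``.

.. code-block:: c

   #include <assert.h>
   #include <pthread.h>
   #include <stddef.h>

   /* Hypothetical event record; the real acrnprobe structure may differ. */
   struct event {
       int type;            /* illustrative event type code */
       struct event *next;  /* link used by the queue */
   };

   /* One global FIFO shared by all channels and the event handler. */
   static struct event *head, *tail;
   static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
   static pthread_cond_t q_nonempty = PTHREAD_COND_INITIALIZER;

   /* Called by a channel: append the event and wake the handler. */
   void event_enqueue(struct event *e)
   {
       assert(e != NULL);
       e->next = NULL;
       pthread_mutex_lock(&q_lock);
       if (tail)
           tail->next = e;
       else
           head = e;
       tail = e;
       pthread_cond_signal(&q_nonempty);
       pthread_mutex_unlock(&q_lock);
   }

   /* Called by the event handler thread: sleep until an event is queued. */
   struct event *event_dequeue(void)
   {
       struct event *e;

       pthread_mutex_lock(&q_lock);
       while (head == NULL)
           pthread_cond_wait(&q_nonempty, &q_lock);
       e = head;
       head = e->next;
       if (head == NULL)
           tail = NULL;
       pthread_mutex_unlock(&q_lock);
       return e;
   }

In this sketch the handler thread would call ``event_dequeue()`` in a loop and
pass each returned event to the senders. Events come out in the order in which
the channels enqueued them.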

Description
===========

As a log collection mechanism to record critical events on the platform,
``acrnprobe`` provides these functions:

1. detect event

   From experience, the occurrence of a system event is usually accompanied
   by some effects. An effect could be a generated file, an error message in
   the kernel log, or a system reboot. Some effects can be caught by monitoring
   a directory; others require a periodic check in a timed loop.
   *The channel is implemented to represent these common methods of detection.*

2. analyze event and determine the event type

   Generally, a specific effect corresponds to a particular type of event.
   As a refinement, a more detailed event type can be determined by analyzing
   additional symptoms. *Crash reclassify is implemented for this purpose.*

3. collect information for detected events

   This is for debugging purposes. Events without information are meaningless,
   and developers need this information to improve their system. *Sender
   crashlog is implemented for this purpose.*

4. archive this information as logs, and generate records

   There must be a central place to tell the user what happened in the system.
   *Sender telemd is implemented for this purpose.*
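
As an illustration of step 2, the sketch below maps a reboot reason and two
pieces of evidence to the crash types introduced with ACRN and kernel crash
detection (``ACRNCRASH``, ``IPANIC``, ``SWWDT_IPANIC``, ``HWWDT_UNHANDLE``,
``SWWDT_UNHANDLE``, ``UNKNOWN``). The function name, the ``CT_`` prefix, the
exact reason strings and the order of the checks are assumptions made for this
sketch; they are not taken from ``crash_reclassify.c``.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <string.h>

   /* Crash types from the list above, prefixed to avoid name clashes. */
   enum crash_type {
       CT_NONE,
       CT_ACRNCRASH,
       CT_IPANIC,
       CT_SWWDT_IPANIC,
       CT_HWWDT_UNHANDLE,
       CT_SWWDT_UNHANDLE,
       CT_UNKNOWN
   };

   /*
    * reason:  reboot reason string; "wdt", "global" and "warm" are assumed
    *          spellings for this sketch
    * acrnlog: true if /tmp/acrnlog_last holds hypervisor crash files
    * pstore:  true if pstore holds a kernel panic record
    * The order of the checks is an assumption, not documented behavior.
    */
   enum crash_type classify_crash(const char *reason, bool acrnlog, bool pstore)
   {
       assert(reason != NULL);

       if (acrnlog)
           return CT_ACRNCRASH;
       if (pstore)
           return strcmp(reason, "wdt") == 0 ? CT_SWWDT_IPANIC : CT_IPANIC;
       /* No further clue: fall back to the reboot reason alone. */
       if (strcmp(reason, "global") == 0)
           return CT_HWWDT_UNHANDLE;
       if (strcmp(reason, "wdt") == 0)
           return CT_SWWDT_UNHANDLE;
       if (strcmp(reason, "warm") == 0)
           return CT_UNKNOWN;
       return CT_NONE;
   }

The last three branches cover the cases where only the reboot reason is
recognized and nothing indicates whether the SOS kernel or the hypervisor
crashed.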

Diagram
=======
::

 +---------------------------------------------+
 | channel:   |oneshot|  |polling|   |inotify| |
 +--------------------------------------+------+
                                        |
 +---------------------+    +-----+     |
 | event queue         +<---+event+<----+
 +-+-------------------+    +-----+
   |
   v
 +-+---------------------------------------------------------------------------+
 |  event handler:                                                             |
 |                                                                             |
 |  event handler will handle internal event                                   |
 |    +----------+    +------------+                                           |
 |    |heart beat+--->+fed watchdog|                                           |
 |    +----------+    +------------+                                           |
 |                                                                             |
 |  call sender for other types                                                |
 |    +--------+   +----------------+   +------------+   +------------------+  |
 |    |crashlog+-->+crash reclassify+-->+collect logs+-->+generate crashfile|  |
 |    +--------+   +----------------+   +------------+   +------------------+  |
 |                                                                             |
 |    +------+    +------------------+                                         |
 |    |telemd+--->+telemetrics client|                                         |
 |    +------+    +------------------+                                         |
 +-----------------------------------------------------------------------------+


Source files
************

- main.c
  Entry of ``acrnprobe``.
- channels.c
  The implementation of *channel* (see `Syntax`_).
- crash_reclassify.c
  Analyzes the detailed type of a crash event.
- probeutils.c
  Provides utilities that ``acrnprobe`` needs.
- event_queue.c
  The implementation of *event queue* (see `Syntax`_).
- event_handler.c
  The implementation of *event handler* (see `Syntax`_).
- history.c
  Provides the interfaces to modify the ``history_event`` file, which manages
  all logs that ``acrnprobe`` has archived, in a fixed format.
- load_conf.c
  Parse and load the configuration file.
- property.c
  The ``acrnprobe`` needs to know some HW/SW properties, such as board version,
  build version. These properties are managed centrally in this file.
- sender.c
  The implementation of *sender* (see `Syntax`_).
- startupreason.c
  This file provides the function to get system reboot reason from kernel
  command line.
- android_events.c
  Sync events detected by android crashlog.
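
Reading a value such as the reboot reason from the kernel command line, as
``startupreason.c`` is described to do, could look roughly like the sketch
below. The function name ``cmdline_get_value`` and the example key are
hypothetical; the actual key used by ``acrnprobe`` is not stated in this
document.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <string.h>

   /*
    * Copy the value of "key" from a kernel command line into out.
    * "key" includes the trailing '=', e.g. "example.reset=" (hypothetical).
    * Returns 0 on success, -1 if the key is absent or out is too small.
    */
   int cmdline_get_value(const char *cmdline, const char *key,
                         char *out, size_t outlen)
   {
       const char *p;
       size_t vlen;

       assert(cmdline && key && out);
       p = cmdline;

       while ((p = strstr(p, key)) != NULL) {
           /* Accept the match only at the start of a parameter. */
           if (p == cmdline || p[-1] == ' ') {
               p += strlen(key);
               /* The value ends at the next space or newline. */
               vlen = strcspn(p, " \n");
               if (vlen >= outlen)
                   return -1;
               memcpy(out, p, vlen);
               out[vlen] = '\0';
               return 0;
           }
           p++;
       }
       return -1;
   }

A caller would typically read ``/proc/cmdline`` into a buffer first and then
pass that buffer as ``cmdline``.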

Configuration files
*******************

* ``/usr/share/defaults/telemetrics/acrnprobe.xml``

  If no custom configuration file is found, ``acrnprobe`` uses the settings in
  this file.

* ``/etc/acrnprobe.xml``

  Custom configuration file that ``acrnprobe`` reads.
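
The lookup order described here and in the Usage section (the ``-c`` argument
first, then the custom file, then the installed default) can be sketched as
follows. ``select_conf`` and ``find_conf`` are hypothetical helper names for
this sketch, and treating an empty ``-c`` argument as "not given" is an
assumption.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <unistd.h>

   static const char custom_conf[] = "/etc/acrnprobe.xml";
   static const char default_conf[] =
       "/usr/share/defaults/telemetrics/acrnprobe.xml";

   /*
    * cli_path:      argument of -c, or NULL when the option is not used
    * custom_exists: whether the custom configuration file was found
    */
   const char *select_conf(const char *cli_path, bool custom_exists)
   {
       if (cli_path && cli_path[0] != '\0')
           return cli_path;
       return custom_exists ? custom_conf : default_conf;
   }

   /* Probe the file system, then apply the precedence above. */
   const char *find_conf(const char *cli_path)
   {
       const char *path = select_conf(cli_path,
                                      access(custom_conf, R_OK) == 0);

       assert(path != NULL);
       return path;
   }

Keeping the file-system probe in ``find_conf`` leaves ``select_conf`` free of
side effects, so the precedence can be checked without touching real files.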

.. _`telemetrics-client`: https://github.com/clearlinux/telemetrics-client