acrn-hypervisor/tools/acrn-crashlog/acrnprobe
Liu, Xinwu ae11bd403d tools:acrn-crashlog: check the folder size by calculating the increment
This patch refines the process of log space check.
Check a folder's size need to traverse all nodes of files and subdirs,
in the case of extreme memory strain, accessing these log file nodes
frequently will increase the IO pressure on the eMMC.
Previously, all log files will be accessed and counted in each space
check, this patch refines it by only calculating the increment of new logs.

Tracked-On: #1024
Signed-off-by: Liu, Xinwu <xinwu.liu@intel.com>
Reviewed-by: Liu, Xiaojing <xiaojing.liu@intel.com>
Acked-by: Chen, Gang <gang.c.chen@intel.com>
2019-04-29 17:04:27 +08:00
..
images doc: fix graphviz scanning and processing 2018-08-30 14:05:50 -07:00
include tools:acrn-crashlog: check the folder size by calculating the increment 2019-04-29 17:04:27 +08:00
Makefile tools:acrn-crashlog: update operations about vmrecord 2019-04-29 17:04:27 +08:00
README.rst tools:acrn-crashlog: Document of configuration file 2018-08-29 09:16:44 -07:00
android_events.c tools:acrn-crashlog: unify the event analyzing process in crashlog 2019-04-29 17:04:27 +08:00
channels.c tools:acrn-crashlog: separate the detection and collection of vm events 2019-04-29 17:04:27 +08:00
conf.rst doc: fix graphviz scanning and processing 2018-08-30 14:05:50 -07:00
crash_reclassify.c tools:acrn-crashlog: unify the event analyzing process in crashlog 2019-04-29 17:04:27 +08:00
event_handler.c tools:acrn-crashlog: separate the detection and collection of vm events 2019-04-29 17:04:27 +08:00
event_queue.c tools:acrn-crashlog: request resources of SOS in crashlog 2019-04-29 17:04:27 +08:00
history.c tools: acrn-crashlog: new file to count all events happened in system 2019-02-28 13:18:57 +08:00
load_conf.c tools:acrn-crashlog: simplify the logic in sender 2019-04-29 17:04:27 +08:00
loop.c tools: acrn-crashlog: New apis to replace debugfs 2018-07-27 16:38:21 +08:00
main.c tools:acrn-crashlog: fix the compiling error on gcc version 9.0.1 2019-04-25 14:50:58 +08:00
probeutils.c tools:acrn-crashlog: request resources of SOS in crashlog 2019-04-29 17:04:27 +08:00
property.c tools: acrn-crashlog: remove unsafe strlen in common 2018-10-19 22:49:04 +08:00
sender.c tools:acrn-crashlog: check the folder size by calculating the increment 2019-04-29 17:04:27 +08:00
startupreason.c tools: acrn-crashlog: update string operation in acrnprobe 2018-10-19 22:49:04 +08:00
vmrecord.c tools:acrn-crashlog: request resources of SOS in crashlog 2019-04-29 17:04:27 +08:00

README.rst

.. _acrnprobe_doc:

acrnprobe
#########

Description
***********

The ``acrnprobe`` is a tool to detect all critical events on the platform and
collect specific information for them. The collected information would be saved
as logs. The log path would be delivered to `telemetrics-client`_ as a record if
telemetrics-client exists on the system. In this case ``acrnprobe`` works as a
*probe* of telemetrics-client. If telemetrics-client doesn't exist on the
system, ``acrnprobe`` provides ``history_event`` (under ``/var/log/crashlog/``
by default) to manage the crash and events records on the platform instead of
``telem_journal``. But in this case, the records can't be delivered to the
backend.

Usage
*****

The ``acrnprobe`` is launched as a service at boot. Also, it provides some basic
options:

Specify a configuration file for ``acrnprobe``. If this option is unused,
``acrnprobe`` will use the configuration file located in CUSTOM CONFIGURATION
PATH or INSTALLATION PATH (see `CONFIGURATION FILES`_).

.. code-block:: none

   $ acrnprobe -c [configuration_path]

To see the version of ``acrnprobe``.

.. code-block:: none

   $ acrnprobe -V

Architecture
************

Terms
=====

- channel :
  Channel represents a way of detecting the system's events. There are 3
  channels:

  + oneshot: detect once while ``acrnprobe`` startup.
  + polling: run a detecting job with fixed time interval.
  + inotify: monitor the change of file or dir.

- trigger :
  Essentially, trigger represents one section of content. It could be
  a file's content, a directory's content, or a memory's content which can be
  obtained. By monitoring it ``acrnprobe`` could detect certain events which
  happened in the system.

- crash :
  A subtype of event. It often corresponds to a crash of programs, system, or
  hypervisor. ``acrnprobe`` detects it and reports it as ``CRASH``.

- info :
  A subtype of event. ``acrnprobe`` detects it and reports it as ``INFO``.

- event queue :
  There is a global queue to receive all events detected.
  Generally, events are enqueued in channel, and dequeued in event handler.

- event handler :
  Event handler is a thread to handle events detected by channel.
  It's awakened by an enqueued event.

- sender :
  The sender corresponds to an exit of event.
  There are two senders:

  + Crashlog is responsible for collecting logs and saving it locally.
  + Telemd is responsible for sending log records to telemetrics client.

Description
===========

As a log collection mechanism to record critical events on the platform,
``acrnprobe`` provides these functions:

1. detect event

   From experience, the occurrence of an system event is usually accompanied
   by some effects. The effects could be a generated file, an error message in
   kernel's log, or a system reboot. To get these effects, for some of them we
   can monitor a directory, for other of them we might need to do a detection
   in a time loop.
   *So we implement the channel, which represents a common method of detection.*

2. analyze event and determine the event type

   Generally, a specific effect correspond to a particular type of events.
   However, it is the icing on the cake for analyzing the detailed event types
   according to some phenomena. *Crash reclassify is implemented for this
   purpose.*

3. collect information for detected events

   This is for debug purpose. Events without information are meaningless,
   and developers need to use this information to improve their system. *Sender
   crashlog is implemented for this purpose.*

4. archive these information as logs, and generate records

   There must be a central place to tell user what happened in system.
   *Sender telemd is implemented for this purpose.*

Diagram
=======
::

 +---------------------------------------------+
 | channel:   |oneshot|  |polling|   |inotify| |
 +--------------------------------------+------+
                                        |
 +---------------------+    +-----+     |
 | event queue         +<---+event+<----+
 +-+-------------------+    +-----+
   |
   v
 +-+---------------------------------------------------------------------------+
 |  event handler:                                                             |
 |                                                                             |
 |  event handler will handle internal event                                   |
 |    +----------+    +------------+                                           |
 |    |heart beat+--->+fed watchdog|                                           |
 |    +----------+    +------------+                                           |
 |                                                                             |
 |  call sender for other types                                                |
 |    +--------+   +----------------+   +------------+   +------------------+  |
 |    |crashlog+-->+crash reclassify+-->+collect logs+-->+generate crashfile|  |
 |    +--------+   +----------------+   +------------+   +------------------+  |
 |                                                                             |
 |    +------+    +------------------+                                         |
 |    |telemd+--->+telemetrics client|                                         |
 |    +------+    +------------------+                                         |
 +-----------------------------------------------------------------------------+


Source files
************

- main.c
  Entry of ``acrnprobe``.
- channel.c
  The implementation of *channel* (see `Terms`_).
- crash_reclassify.c
  Analyzing the detailed types for crash event.
- probeutils.c
  Provide some utils ``acrnprobe`` needs.
- event_queue.c
  The implementation of *event queue* (see `Terms`_).
- event_handler.c
  The implementation of *event handler* (see `Terms`_).
- history.c
  There is a history_event file to manage all logs that ``acrnprobe`` archived.
  "history.c" provides the interfaces to modify the file in fixed format.
- load_conf.c
  Parse and load the configuration file.
- property.c
  The ``acrnprobe`` needs to know some HW/SW properties, such as board version,
  build version. These properties are managed centrally in this file.
- sender.c
  The implementation of *sender* (see `Terms`_).
- startupreason.c
  This file provides the function to get system reboot reason from kernel
  command line.
- android_events.c
  Sync events detected by android crashlog.
- loop.c
  This file provides interfaces to read from image.

Configuration files
*******************

* ``/usr/share/defaults/telemetrics/acrnprobe.xml``

  If no custom configuration file is found, ``acrnprobe`` uses the settings in
  this file.

* ``/etc/acrnprobe.xml``

  Custom configuration file that ``acrnprobe`` reads.

For details about configuration file, please refer to :ref:`acrnprobe-conf`.

.. _`telemetrics-client`: https://github.com/clearlinux/telemetrics-client