acrn-hypervisor/tools/acrn-crashlog/acrnprobe
Liu, Xinwu 41b39c5e1f tools: acrn-crashlog: Defer the vm events processing when failed
In the original design, acrnprobe marked all handled VMs'events as "synced"
in file vmrecordid(this patch changes the name to VM_eventsID.log).
Currently, the Android log events are not logged if the first attempt at
reading collecting them from the VM fails. This patch changes the logic
so that the acrn-crashlog tool will retry continuously.

This patch defines different tags for handled VMs'events, and only marks
VMs'events "synced" after it returns successfully.

Signed-off-by: Liu, Xinwu <xinwu.liu@intel.com>
Reviewed-by: xiaojin2 <xiaojing.liu@intel.com>
Reviewed-by: Jin Zhi <zhi.jin@intel.com>
Acked-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
Acked-by: Chen gang <gang.c.chen@intel.com>
2018-06-29 15:23:18 +08:00
..
include tools: acrn-crashlog: Defer the vm events processing when failed 2018-06-29 15:23:18 +08:00
Makefile build: Using id tool to get builder username 2018-06-29 11:55:03 +08:00
README.rst Documentation: fix incorrect link in acrn-probe documentation 2018-06-12 14:08:26 -07:00
android_events.c tools: acrn-crashlog: Defer the vm events processing when failed 2018-06-29 15:23:18 +08:00
channels.c tools: acrn-crashlog: Fix potential issues under acrnprobe 2018-06-20 11:27:55 +08:00
crash_reclassify.c tools: acrn-crashlog: Fix potential issues under acrnprobe 2018-06-20 11:27:55 +08:00
event_handler.c tools: acrn-crashlog: compile without telemetrics client 2018-05-23 21:21:51 +08:00
event_queue.c tools: acrn-crashlog: event queue operations for acrnprobe 2018-05-23 17:10:51 +08:00
history.c tools: acrn-crashlog: Fix potential issues under acrnprobe 2018-06-20 11:27:55 +08:00
load_conf.c tools: acrn-crashlog: Defer the vm events processing when failed 2018-06-29 15:23:18 +08:00
main.c tools: acrn-crashlog: Defer the vm events processing when failed 2018-06-29 15:23:18 +08:00
probeutils.c tools: acrn-crashlog: Fix potential issues under acrnprobe 2018-06-20 11:27:55 +08:00
property.c tools: acrn-crashlog: Defer the vm events processing when failed 2018-06-29 15:23:18 +08:00
sender.c tools: acrn-crashlog: Defer the vm events processing when failed 2018-06-29 15:23:18 +08:00
startupreason.c tools: acrn-crashlog: get startup reason of system for acrnprobe 2018-05-23 17:10:51 +08:00

README.rst

.. _acrnprobe_doc:

Acrnprobe
#########

Description
***********

The ``acrnprobe`` is a tool to detect all critical events on the platform and
collect specific information for them. The collected information would be saved
as logs. The log path would be delivered to `telemetrics-client`_ as a record if
telemetrics-client exists on the system. In this case ``acrnprobe`` works as a
*probe* of telemetrics-client. If telemetrics-client doesn't exist on the
system, ``acrnprobe`` provides ``history_event`` (under ``/var/log/crashlog/``
by default) to manage the crash and events records on the platform instead of
``telem_journal``. But in this case, the records can't be delivered to the
backend.

Usage
*****

The ``acrnprobe`` is launched as a service at boot. Also, it provides some basic
options:

Specify a configuration file for ``acrnprobe``. If this option is unused,
``acrnprobe`` will use the configuration file located in CUSTOM CONFIGURATION
PATH or INSTALLATION PATH (see `CONFIGURATION FILES`_).

.. code-block:: console

   $ acrnprobe -c [configuration_path]

To see the version of ``acrnprobe``.

.. code-block:: console

   $ acrnprobe -V

Architecture
************

Syntax
======

- channel :
  Channel represents a way of detecting the system's events. There are 3
  channels:

  + oneshot: detect once while ``acrnprobe`` startup.
  + polling: run a detecting job with fixed time interval.
  + inotify: monitor the change of file or dir.

- event queue :
  There is a global queue to receive all events detected.
  Generally, events are enqueued in channel, and dequeued in event handler.

- event handler :
  Event handler is a thread to handle events detected by channel.
  It's awakened by an enqueued event.

- sender :
  The sender corresponds to an exit of event.
  There are two senders:

  + Crashlog is responsible for collecting logs and saving it locally.
  + Telemd is responsible for sending log records to telemetrics client.

Description
===========

As a log collection mechanism to record critical events on the platform,
``acrnprobe`` provides these functions:

1. detect event

   From experience, the occurrence of an system event is usually accompanied
   by some effects. The effects could be a generated file, an error message in
   kernel's log, or a system reboot. To get these effects, for some of them we
   can monitor a directory, for other of them we might need to do a detection
   in a time loop.
   *So we implement the channel, which represents a common method of detection.*

2. analyze event and determine the event type

   Generally, a specific effect correspond to a particular type of events.
   However, it is the icing on the cake for analyzing the detailed event types
   according to some phenomena. *Crash reclassify is implemented for this
   purpose.*

3. collect information for detected events

   This is for debug purpose. Events without information are meaningless,
   and developers need to use this information to improve their system. *Sender
   crashlog is implemented for this purpose.*

4. archive these information as logs, and generate records

   There must be a central place to tell user what happened in system.
   *Sender telemd is implemented for this purpose.*

Diagram
=======
::

 +---------------------------------------------+
 | channel:   |oneshot|  |polling|   |inotify| |
 +--------------------------------------+------+
                                        |
 +---------------------+    +-----+     |
 | event queue         +<---+event+<----+
 +-+-------------------+    +-----+
   |
   v
 +-+---------------------------------------------------------------------------+
 |  event handler:                                                             |
 |                                                                             |
 |  event handler will handle internal event                                   |
 |    +----------+    +------------+                                           |
 |    |heart beat+--->+fed watchdog|                                           |
 |    +----------+    +------------+                                           |
 |                                                                             |
 |  call sender for other types                                                |
 |    +--------+   +----------------+   +------------+   +------------------+  |
 |    |crashlog+-->+crash reclassify+-->+collect logs+-->+generate crashfile|  |
 |    +--------+   +----------------+   +------------+   +------------------+  |
 |                                                                             |
 |    +------+    +------------------+                                         |
 |    |telemd+--->+telemetrics client|                                         |
 |    +------+    +------------------+                                         |
 +-----------------------------------------------------------------------------+


Source files
************

- main.c
  Entry of ``acrnprobe``.
- channel.c
  The implementation of *channel* (see `Syntax`_).
- crash_reclassify.c
  Analyzing the detailed types for crash event.
- probeutils.c
  Provide some utils ``acrnprobe`` needs.
- event_queue.c
  The implementation of *event queue* (see `Syntax`_).
- event_handler.c
  The implementation of *event handler* (see `Syntax`_).
- history.c
  There is a history_event file to manage all logs that ``acrnprobe`` archived.
  "history.c" provides the interfaces to modify the file in fixed format.
- load_conf.c
  Parse and load the configuration file.
- property.c
  The ``acrnprobe`` needs to know some HW/SW properties, such as board version,
  build version. These properties are managed centrally in this file.
- sender.c
  The implementation of *sender* (see `Syntax`_).
- startupreason.c
  This file provides the function to get system reboot reason from kernel
  command line.
- android_events.c
  Sync events detected by android crashlog.

Configuration files
*******************

* ``/usr/share/defaults/telemetrics/acrnprobe.xml``

  If no custom configuration file is found, ``acrnprobe`` uses the settings in
  this file.

* ``/etc/acrnprobe.xml``

  Custom configuration file that ``acrnprobe`` reads.

.. _`telemetrics-client`: https://github.com/clearlinux/telemetrics-client