tools: acrn-crashlog: add documents for acrn-crashlog

This patch adds the documents for acrn-crashlog:
README.rst: General introduction for acrn-crashlog.
acrnprobe/README.rst: Introduction for acrnprobe in detail.
usercrash/README.rst: Introduction for usercrash in detail.

Signed-off-by: xiaojin2 <xiaojing.liu@intel.com>
Signed-off-by: Liu Xinwu <xinwu.liu@intel.com>
Signed-off-by: CHEN Gang <gang.c.chen@intel.com>
Acked-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
CHEN Gang 2018-05-31 10:00:30 +00:00 committed by David Kinder
parent 756083fefc
commit 7d13e5be1b
3 changed files with 420 additions and 0 deletions


@ -0,0 +1,153 @@
ACRN-Crashlog
#############
Introduction
************
``ACRN-Crashlog`` is the collective name for a set of tools (``acrnprobe``,
``usercrash_s``, ``usercrash_c``, ``debugger``, etc.) that collect logs and
information after each crash or event on the ACRN platform, including the
hypervisor, the Service OS (SOS), and Android as a Guest (AaaG).
``ACRN-Crashlog`` uses an XML configuration file to let you choose which
events are of interest.
Building
********
Build dependencies
==================
The ``ACRN-Crashlog`` tool depends on the following libraries
(build and runtime):
- libevent
- OpenSSL
- libxml2
- systemd
- telemetrics-client-dev (optional, detected at build time)
Refer to :ref:`getting_started` for instructions on how to set up your
build environment, then follow the instructions below to build and configure
the ``ACRN-Crashlog`` tool.
Build
=====
To build ``ACRN-Crashlog``, run the following command under ``acrn-crashlog/``:
.. code-block:: console
$ make
To remove all generated files and return the folder to its clean state, run
the following command under ``acrn-crashlog/``:
.. code-block:: console
$ make clean
Installing
**********
To install the build, run:
.. code-block:: console
$ sudo make install
Usage
*****
``acrnprobe`` works in one of two ways, depending on whether
telemetrics-client exists on the system:

1. If telemetrics-client doesn't exist on the system, ``acrnprobe`` provides
   ``history_event`` (under ``/var/log/crashlog/history_event``) to manage the
   crash and event records on the platform. In this case, the records can't
   be delivered to the backend.
2. If telemetrics-client exists on the system, ``acrnprobe`` works as a probe
   of the telemetrics-client: it runs as a daemon that is autostarted when the
   system boots, and sends the crashlog path to the telemetrics-client, which
   records events of interest and reports them to the backend using
   ``telemd``, the telemetrics daemon. The workflow of ``acrnprobe`` and
   telemetrics-client is:
::
+------------------------------------------------------------------+
| crashlog path log content |
| acrnprobe------------->telemetrics-client----------->backend |
+------------------------------------------------------------------+
The crashlog can be retrieved with the ``telem_journal`` command:
.. code-block:: console
$ telem_journal -i
.. note::
For more details on telemetrics, refer to the `telemetrics-client`_ and
`telemetrics-backend`_ websites.
``ACRN-Crashlog`` also provides a ``debugger`` tool that dumps information
about a specific process:
.. code-block:: console
$ debugger <pid>
.. note::
You need to be ``root`` to use the ``debugger``.
Source Code
***********
The source code structure:
.. code-block:: console
acrn-crashlog/
├── acrnprobe
│ └── include
├── common
│ └── include
├── data
└── usercrash
└── include
- ``acrnprobe``: gathers all the crash and event logs on the platform, and
  acts as a probe for telemetrics-client. Hypervisor logs are collected with
  acrnlog. On the SOS, the userspace crash log is collected with usercrash,
  and the kernel crash log is collected with built-in mechanisms such as
  ``ipanic`` and ``pstore``. AaaG logs are collected by monitoring changes to
  the related folders on the SOS image, such as ``/data/logs/``.
  ``acrnprobe`` also lets users configure, through the XML file, which
  crashes or events they want to collect.
- ``common``: utility functions for logs, commands, and strings.
- ``data``: configuration file, service files, and shell script.
- ``usercrash``: implements the tool that gets the crash information for a
  crashing process in userspace.
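The idea of detecting new logs by watching a folder for changes can be
illustrated with a short sketch. It is written in Python for brevity and is not
the ``acrnprobe`` implementation: it compares two directory snapshots instead
of using a kernel notification mechanism, and the helper names ``snapshot``
and ``diff_snapshots`` are hypothetical.

.. code-block:: python

   import os


   def snapshot(directory):
       """Return {entry name: modification time in ns} for a directory."""
       with os.scandir(directory) as entries:
           return {entry.name: entry.stat().st_mtime_ns for entry in entries}


   def diff_snapshots(old, new):
       """Return (created, modified, deleted) name lists between two snapshots."""
       created = sorted(name for name in new if name not in old)
       deleted = sorted(name for name in old if name not in new)
       modified = sorted(
           name for name in new if name in old and new[name] != old[name]
       )
       return created, modified, deleted

Taking a snapshot before and after a file appears in the watched folder and
passing both to ``diff_snapshots`` reports that file in the ``created`` list.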
acrnprobe
=========
``acrnprobe`` detects all critical events on the platform and collects
specific information for debugging purposes. This information is saved as
logs, and the log path is delivered to telemetrics-client as a record if
telemetrics-client exists on the system.
For more details on ``acrnprobe``, refer to :ref:`acrnprobe_doc`.
usercrash
=========
``usercrash`` is a tool that gets the crash information of a crashing process
in userspace. It uses a client/server model. The server is autostarted, and the
client is configured in ``core_pattern``, so it is triggered whenever a crash
occurs in userspace.
For more details on ``usercrash``, refer to :ref:`usercrash_doc`.
.. _`telemetrics-client`: https://github.com/clearlinux/telemetrics-client
.. _`telemetrics-backend`: https://github.com/clearlinux/telemetrics-backend


@ -0,0 +1,176 @@
.. _acrnprobe_doc:
Acrnprobe
#########
Description
***********
``acrnprobe`` is a tool that detects all critical events on the platform and
collects specific information for them. The collected information is saved
as logs. The log path is delivered to `telemetrics-client`_ as a record if
telemetrics-client exists on the system. In this case ``acrnprobe`` works as a
*probe* of telemetrics-client. If telemetrics-client doesn't exist on the
system, ``acrnprobe`` provides ``history_event`` (under ``/var/log/crashlog/``
by default) to manage the crash and event records on the platform instead of
``telem_journal``. In this case, the records can't be delivered to the
backend.
Usage
*****
``acrnprobe`` is launched as a service at boot. It also provides some basic
options.
To specify a configuration file for ``acrnprobe``, use the ``-c`` option. If
this option is not used, ``acrnprobe`` uses the configuration file located in
the custom configuration path or the installation path (see
`Configuration files`_).
.. code-block:: console
$ acrnprobe -c [configuration_path]
To see the version of ``acrnprobe``, use the ``-V`` option:
.. code-block:: console
$ acrnprobe -V
Architecture
************
Syntax
======
- channel :
  A channel represents a way of detecting the system's events. There are 3
  channels:

  + oneshot: detect once when ``acrnprobe`` starts up.
  + polling: run a detection job at a fixed time interval.
  + inotify: monitor changes to a file or directory.

- event queue :
  There is a global queue that receives all detected events.
  Generally, events are enqueued in a channel, and dequeued in the event
  handler.
- event handler :
  The event handler is a thread that handles the events detected by the
  channels. It is awakened by an enqueued event.
- sender :
  A sender is an output path for an event.
  There are two senders:

  + Crashlog is responsible for collecting logs and saving them locally.
  + Telemd is responsible for sending log records to the telemetrics client.
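The relationship between channel, event queue, event handler, and sender can be
sketched as follows. This is a minimal Python illustration, not the
``acrnprobe`` code: the dictionary-shaped events, the ``BOOT``, ``CRASH``, and
``HEART_BEAT`` type names, the channel each event is attributed to, and the
functions ``run_handler`` and ``demo`` are all assumptions made for the example.

.. code-block:: python

   import queue
   import threading


   def run_handler(events, senders, heartbeats):
       """Event handler thread: blocks in get() until an event is enqueued."""
       while True:
           event = events.get()
           if event is None:  # sentinel used only by this sketch to stop the thread
               break
           if event["type"] == "HEART_BEAT":
               # Internal event: stands in for feeding the watchdog.
               heartbeats.append(event)
               continue
           for send in senders:  # every other event is passed to each sender
               send(event)


   def demo():
       events = queue.Queue()  # the single global event queue
       logs, records, heartbeats = [], [], []
       senders = [
           lambda e: logs.append(("crashlog", e["type"])),   # would save logs locally
           lambda e: records.append(("telemd", e["type"])),  # would forward a record
       ]
       handler = threading.Thread(
           target=run_handler, args=(events, senders, heartbeats)
       )
       handler.start()
       # Channels enqueue whatever they detect (channel names are illustrative).
       events.put({"channel": "oneshot", "type": "BOOT"})
       events.put({"channel": "polling", "type": "HEART_BEAT"})
       events.put({"channel": "inotify", "type": "CRASH"})
       events.put(None)
       handler.join()
       return logs, records, len(heartbeats)

Calling ``demo()`` shows that both senders see the ``BOOT`` and ``CRASH``
events in order, while the heartbeat is consumed by the handler itself.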
Description
===========
As a log collection mechanism that records critical events on the platform,
``acrnprobe`` provides these functions:

1. detect event
   From experience, the occurrence of a system event is usually accompanied
   by some effects. The effects could be a generated file, an error message in
   the kernel's log, or a system reboot. To catch these effects, for some of
   them we can monitor a directory, and for others we might need to run a
   detection in a time loop.
   *So we implement the channel, which represents a common method of detection.*
2. analyze event and determine the event type
   Generally, a specific effect corresponds to a particular type of event.
   Beyond that, it is a useful extra to work out a more detailed event type
   from the observed symptoms. *Crash reclassify is implemented for this
   purpose.*
3. collect information for detected events
   This is for debugging purposes. Events without information are meaningless,
   and developers need this information to improve their system. *Sender
   crashlog is implemented for this purpose.*
4. archive this information as logs, and generate records
   There must be a central place to tell the user what happened in the system.
   *Sender telemd is implemented for this purpose.*
Diagram
=======
::
+---------------------------------------------+
| channel: |oneshot| |polling| |inotify| |
+--------------------------------------+------+
|
+---------------------+ +-----+ |
| event queue +<---+event+<----+
+-+-------------------+ +-----+
|
v
+-+---------------------------------------------------------------------------+
| event handler: |
| |
| event handler will handle internal event |
| +----------+ +------------+ |
| |heart beat+--->+fed watchdog| |
| +----------+ +------------+ |
| |
| call sender for other types |
| +--------+ +----------------+ +------------+ +------------------+ |
| |crashlog+-->+crash reclassify+-->+collect logs+-->+generate crashfile| |
| +--------+ +----------------+ +------------+ +------------------+ |
| |
| +------+ +------------------+ |
| |telemd+--->+telemetrics client| |
| +------+ +------------------+ |
+-----------------------------------------------------------------------------+
Source files
************
- main.c
Entry of ``acrnprobe``.
- channel.c
The implementation of *channel* (see `Syntax`_).
- crash_reclassify.c
  Determines the detailed type of a crash event.
- probeutils.c
  Provides utility functions that ``acrnprobe`` needs.
- event_queue.c
The implementation of *event queue* (see `Syntax`_).
- event_handler.c
The implementation of *event handler* (see `Syntax`_).
- history.c
  A ``history_event`` file manages all the logs that ``acrnprobe`` has
  archived. ``history.c`` provides the interfaces to modify this file in a
  fixed format.
- load_conf.c
Parse and load the configuration file.
- property.c
The ``acrnprobe`` needs to know some HW/SW properties, such as board version,
build version. These properties are managed centrally in this file.
- sender.c
The implementation of *sender* (see `Syntax`_).
- startupreason.c
This file provides the function to get system reboot reason from kernel
command line.
- android_events.c
  Syncs the events detected by the Android crashlog.
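As an illustration of reading a value from the kernel command line, which is
what the ``startupreason.c`` entry above describes, here is a small Python
sketch. It is not taken from the source file: the function name
``cmdline_value`` is hypothetical, the key is left as a parameter because this
document does not name it, and quoted values containing spaces are not handled.

.. code-block:: python

   def cmdline_value(cmdline, key):
       """Return the value of the first ``key=value`` token, or None if absent."""
       for token in cmdline.split():
           name, separator, value = token.partition("=")
           if separator and name == key:
               return value
       return None

On a running Linux system the input string would typically be the content of
``/proc/cmdline``.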
Configuration files
*******************
* ``/usr/share/defaults/telemetrics/acrnprobe.xml``
If no custom configuration file is found, ``acrnprobe`` uses the settings in
this file.
* ``/etc/acrnprobe.xml``
Custom configuration file that ``acrnprobe`` reads.
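The lookup order described here and under `Usage`_ can be summarized in a few
lines. This Python sketch only restates the documented behavior; the function
name ``pick_config`` and the ``exists`` parameter are hypothetical, and the
actual logic lives in the C sources.

.. code-block:: python

   import os

   CUSTOM_CONFIG = "/etc/acrnprobe.xml"
   DEFAULT_CONFIG = "/usr/share/defaults/telemetrics/acrnprobe.xml"


   def pick_config(cli_path=None, exists=os.path.exists):
       """Choose the configuration file: -c path, then custom, then default."""
       if cli_path:
           return cli_path  # path given with the -c option
       if exists(CUSTOM_CONFIG):
           return CUSTOM_CONFIG  # custom configuration file
       return DEFAULT_CONFIG  # installed default settings

The ``exists`` parameter is there only so the function can be exercised without
touching the real file system.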
.. _`telemetrics-client`: https://github.com/clearlinux/telemetrics-client


@ -0,0 +1,91 @@
.. _usercrash_doc:
Usercrash
#########
Description
***********
``usercrash`` collects the crash information for a crashing process in
userspace. The collected information is saved as ``usercrash_xx`` under
``/var/log/usercrashes/``.
Design
******
``usercrash`` is designed as a client/server model. The server is autostarted
at boot, and the client is configured in ``core_pattern``, so it is triggered
whenever a crash occurs in userspace. The client then sends the crash event to
the server. The server checks the files under ``/var/log/usercrashes/``,
creates a new file ``usercrash_xx`` (where ``xx`` is the index of the crash
file), and sends its file descriptor to the client. The client collects the
crash information and saves it in the crashlog file. Once saving is done, the
client notifies the server, and the server cleans up.
The workflow diagram:
::
+--------------------------------------------------+
| |
| Server Client |
| + + |
| | Send crash event | |
| | <-----------------------+ |
| | | |
| Create usercrash_xx | |
| | | |
| | Send usercrash_xx fd | |
| +-----------------------> | |
| | | |
| | Fill usercrash_xx |
| | | |
| | Notify completion | |
| | <-----------------------+ |
| | | |
| Clean up | |
| | | |
| v v |
| |
+--------------------------------------------------+
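The exchange in the diagram can be tried out with a small Python sketch that
passes a file descriptor over a Unix domain socket. It is an illustration under
stated assumptions, not the ``usercrash`` implementation: both ends run in one
process over a ``socketpair``, the message contents are made up, the index is
chosen as the first unused number, and ``next_crash_path`` and ``demo`` are
hypothetical names. It needs Python 3.9 or later on a Unix-like system.

.. code-block:: python

   import os
   import socket


   def next_crash_path(directory):
       """Return the first unused usercrash_<index> path (assumed index policy)."""
       index = 0
       while os.path.exists(os.path.join(directory, f"usercrash_{index}")):
           index += 1
       return os.path.join(directory, f"usercrash_{index}")


   def demo(directory):
       server, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
       with server, client:
           # Client: send the crash event.
           client.sendall(b"crash")
           server.recv(16)
           # Server: create usercrash_xx and send its descriptor to the client.
           path = next_crash_path(directory)
           fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
           socket.send_fds(server, [b"fd"], [fd])
           os.close(fd)  # the client receives its own duplicate of the descriptor
           # Client: receive the descriptor and fill the file.
           _, fds, _, _ = socket.recv_fds(client, 16, 1)
           with os.fdopen(fds[0], "w") as crashfile:
               crashfile.write("backtrace: (placeholder)\n")
           # Client: notify completion, after which the server can clean up.
           client.sendall(b"done")
           server.recv(16)
       return path

Because the server opens the file and hands over only the descriptor, the
client never needs to know where the crash file is stored.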
Usage
*****
- The server is launched automatically at boot, and the client is configured in
  ``core_pattern``. During boot, ``core_pattern`` is set to invoke
  ``usercrash_c``:
.. code-block:: console
$ echo "|/usr/bin/usercrash_c %p %e %s" > /proc/sys/kernel/core_pattern
  This means the client is triggered whenever a userspace crash occurs. The
  client then sends the event to the server.
- The ``debugger`` is an independent tool that dumps the debug information of a
  specific process, including the backtrace, stack, opened files, register
  values, and memory content around the registers.
.. code-block:: console
$ debugger <pid>
.. note::
You need to be ``root`` to use the ``debugger``.
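With a ``core_pattern`` of the form ``|program %p %e %s``, the kernel starts the
program with the crashing process ID, its executable name, and the signal
number as arguments, and pipes the core dump to the program's standard input.
The Python sketch below shows how such arguments could be interpreted. It is
not the ``usercrash_c`` code, and the function name ``parse_core_args`` is
hypothetical.

.. code-block:: python

   import signal


   def parse_core_args(argv):
       """Interpret [program, %p, %e, %s] as passed by the kernel."""
       pid = int(argv[1])      # %p: PID of the crashing process
       comm = argv[2]          # %e: executable name of the crashing process
       signum = int(argv[3])   # %s: number of the signal that caused the dump
       try:
           signame = signal.Signals(signum).name
       except ValueError:      # not a signal number known to this platform
           signame = f"signal {signum}"
       return {"pid": pid, "comm": comm, "signal": signame}

A handler built this way would usually be called with ``sys.argv`` and would
read the core dump itself from ``sys.stdin.buffer``.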
Source Code
***********
- client.c : Implements the client of ``usercrash``, which is responsible for
  delivering the ``usercrash`` event to the server, collecting the crash
  information, and saving it to the crashfile.
- crash_dump.c : Implements dumping of the crash information, including the
  backtrace, stack, opened files, register values, and memory content around
  the registers.
- debugger.c : Implements a command-line tool that dumps the process
  information listed above.
- protocol.c : Implements the socket protocol.
- server.c : Implements the server of ``usercrash``, which is responsible for
  creating the crashfile and handling the events from the client.