tools: acrn-crashlog: add documents for acrn-crashlog
This patch adds the documents for acrn-crashlog:

  README.rst: General introduction for acrn-crashlog.
  acrnprobe/README.rst: Introduction for acrnprobe in detail.
  usercrash/README.rst: Introduction for usercrash in detail.

Signed-off-by: xiaojin2 <xiaojing.liu@intel.com>
Signed-off-by: Liu Xinwu <xinwu.liu@intel.com>
Signed-off-by: CHEN Gang <gang.c.chen@intel.com>
Acked-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
parent 756083fefc
commit 7d13e5be1b
@ -0,0 +1,153 @@
|
|||
ACRN-Crashlog
|
||||
#############
|
||||
|
||||
Introduction
|
||||
************
|
||||
|
||||
``ACRN-Crashlog`` is the collective name for a set of tools (``acrnprobe``,
``usercrash_s``, ``usercrash_c``, ``debugger``, and others) that collect logs
and information after each crash or event on the ACRN platform, including the
hypervisor, the Service OS (SOS), and Android as a Guest (AaaG).
``ACRN-Crashlog`` lets you configure which events are of interest through an
XML configuration file.
|
||||
|
||||
Building
|
||||
********
|
||||
|
||||
Build dependencies
|
||||
==================
|
||||
|
||||
The ``ACRN-Crashlog`` tool depends on the following libraries
|
||||
(build and runtime):
|
||||
|
||||
- libevent
|
||||
- OpenSSL
|
||||
- libxml2
|
||||
- systemd
|
||||
- telemetrics-client-dev (optional, detected at build time)
|
||||
|
||||
Refer to :ref:`getting_started` for instructions on how to set up your
build environment, then follow the instructions below to build and configure
the ``ACRN-Crashlog`` tool.
|
||||
|
||||
Build
|
||||
=====
|
||||
|
||||
To build ``ACRN-Crashlog``, run the following command under ``acrn-crashlog/``:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ make
|
||||
|
||||
To remove all generated files and return the folder to its clean state, run
the following command under ``acrn-crashlog/``:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ make clean
|
||||
|
||||
Installing
|
||||
**********
|
||||
|
||||
To install the build, run:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ sudo make install
|
||||
|
||||
Usage
|
||||
*****
|
||||
|
||||
``acrnprobe`` works in one of two ways, depending on whether telemetrics-client
is present on the system:

1. If telemetrics-client doesn't exist on the system, ``acrnprobe`` provides
   ``history_event`` (under ``/var/log/crashlog/history_event``) to manage the
   crash and event records on the platform. In this case, the records
   can't be delivered to the backend.

2. If telemetrics-client exists on the system, ``acrnprobe`` works as a probe
   of the telemetrics-client: it runs as a daemon that is started automatically
   when the system boots, and sends the crashlog path to the telemetrics-client,
   which records events of interest and reports them to the backend using
   ``telemd``, the telemetrics daemon. The workflow of ``acrnprobe`` and
   telemetrics-client is:
|
||||
|
||||
::
|
||||
|
||||
      +------------------------------------------------------------------+
      |          crashlog path                   log content             |
      | acrnprobe------------->telemetrics-client----------->backend     |
      +------------------------------------------------------------------+
|
||||
|
||||
The crashlog can be retrieved with the ``telem_journal`` command:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ telem_journal -i
|
||||
|
||||
.. note::
|
||||
|
||||
   For more details on telemetrics, refer to the `telemetrics-client`_ and
   `telemetrics-backend`_ websites.
|
||||
|
||||
``ACRN-Crashlog`` also provides a tool, ``debugger``, that dumps information
about a specific process:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ debugger <pid>
|
||||
|
||||
.. note::
|
||||
|
||||
You need to be ``root`` to use the ``debugger``.
|
||||
|
||||
Source Code
|
||||
***********
|
||||
|
||||
The source code structure:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
   acrn-crashlog/
   ├── acrnprobe
   │   └── include
   ├── common
   │   └── include
   ├── data
   └── usercrash
       └── include
|
||||
|
||||
- ``acrnprobe``: gathers all the crash and event logs on the platform and acts
  as a probe of telemetrics-client. Hypervisor logs are collected with
  acrnlog. On the SOS, the userspace crash log is collected with
  usercrash, and the kernel crash log is collected with built-in mechanisms
  such as ``ipanic`` and ``pstore``. AaaG logs are collected by
  monitoring changes to related folders on the SOS image, such as
  ``/data/logs/``. ``acrnprobe`` also lets users configure which crashes or
  events to collect through the XML file.
- ``common``: utilities for logs, commands, and strings.
- ``data``: the configuration file, service files, and shell script.
- ``usercrash``: implements the tool that gets the crash information for the
  crashing process in userspace.
|
||||
|
||||
acrnprobe
|
||||
=========
|
||||
|
||||
``acrnprobe`` detects all critical events on the platform and collects
specific information for debugging purposes. This information is saved as
logs, and the log path is delivered to telemetrics-client as a record if
telemetrics-client exists on the system.
For more details on ``acrnprobe``, refer to :ref:`acrnprobe_doc`.
|
||||
|
||||
usercrash
|
||||
=========
|
||||
|
||||
``usercrash`` is a tool that gets the crash information of the crashing process
in userspace. It works in a client/server model. The server is started
automatically, and the client is configured in ``core_pattern``, so it is
triggered when a crash occurs in userspace.
For more details on ``usercrash``, refer to :ref:`usercrash_doc`.
|
||||
|
||||
.. _`telemetrics-client`: https://github.com/clearlinux/telemetrics-client
|
||||
.. _`telemetrics-backend`: https://github.com/clearlinux/telemetrics-backend
|
|
@ -0,0 +1,176 @@
|
|||
.. _acrnprobe_doc:
|
||||
|
||||
Acrnprobe
|
||||
#########
|
||||
|
||||
Description
|
||||
***********
|
||||
|
||||
``acrnprobe`` is a tool that detects all critical events on the platform and
collects specific information for them. The collected information is saved
as logs. The log path is delivered to `telemetrics-client`_ as a record if
telemetrics-client exists on the system. In this case ``acrnprobe`` works as a
*probe* of telemetrics-client. If telemetrics-client doesn't exist on the
system, ``acrnprobe`` provides ``history_event`` (under ``/var/log/crashlog/``
by default) to manage the crash and event records on the platform instead of
``telem_journal``. In this case, the records can't be delivered to the
backend.
|
||||
|
||||
Usage
|
||||
*****
|
||||
|
||||
``acrnprobe`` is launched as a service at boot. It also provides some basic
options:

Use ``-c`` to specify a configuration file for ``acrnprobe``. If this option is
not used, ``acrnprobe`` uses the configuration file located in the custom
configuration path or the installation path (see `CONFIGURATION FILES`_).
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ acrnprobe -c [configuration_path]
|
||||
|
||||
Use ``-V`` to see the version of ``acrnprobe``:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ acrnprobe -V
|
||||
|
||||
Architecture
|
||||
************
|
||||
|
||||
Syntax
|
||||
======
|
||||
|
||||
- channel :
  A channel represents a way of detecting the system's events. There are three
  channels:

  + oneshot: detects once when ``acrnprobe`` starts up.
  + polling: runs a detection job at a fixed time interval.
  + inotify: monitors changes to a file or directory.

- event queue :
  A global queue receives all detected events.
  Generally, events are enqueued by a channel and dequeued by the event handler.

- event handler :
  The event handler is a thread that handles events detected by the channels.
  It is awakened by an enqueued event.

- sender :
  A sender corresponds to an exit of an event.
  There are two senders:

  + Crashlog is responsible for collecting logs and saving them locally.
  + Telemd is responsible for sending log records to telemetrics-client.
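
The relationship between channels, the event queue, the event handler, and the
senders can be sketched as follows. This is a minimal Python illustration under
stated assumptions, not the ``acrnprobe`` C implementation: ``run_pipeline``,
the ``None`` stop marker, and the sender callables are hypothetical names chosen
for this example.

.. code-block:: python

   import queue
   import threading


   def run_pipeline(detected_events, senders):
       """Push events through a queue to a handler thread that calls each sender."""
       events = queue.Queue()   # the single global event queue
       results = []

       def event_handler():
           while True:
               event = events.get()   # sleeps until a channel enqueues an event
               if event is None:      # stop marker (an assumption of this sketch)
                   break
               for send in senders:   # e.g. a crashlog-like and a telemd-like sender
                   results.append(send(event))

       worker = threading.Thread(target=event_handler)
       worker.start()
       for event in detected_events:  # the channel side: enqueue what was detected
           events.put(event)
       events.put(None)
       worker.join()
       return results

Calling ``run_pipeline(["CRASH"], [save_logs, send_record])`` with two
user-supplied callables would invoke both of them once for the ``CRASH`` event,
in the order given.
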
|
||||
|
||||
Description
|
||||
===========
|
||||
|
||||
As a log collection mechanism that records critical events on the platform,
``acrnprobe`` provides these functions:

1. Detect events

   From experience, the occurrence of a system event is usually accompanied
   by some effects. The effects could be a generated file, an error message in
   the kernel log, or a system reboot. Some of these effects can be caught by
   monitoring a directory; others require running a detection in a timed loop.
   *The channel is implemented to represent a common method of detection.*

2. Analyze events and determine the event type

   Generally, a specific effect corresponds to a particular type of event.
   Going further and determining the detailed event type from the observed
   symptoms is a useful refinement. *Crash reclassify is implemented for this
   purpose.*

3. Collect information for detected events

   This is for debugging purposes. Events without information are meaningless,
   and developers need this information to improve their system. *Sender
   crashlog is implemented for this purpose.*

4. Archive this information as logs, and generate records

   There must be a central place that tells the user what happened in the
   system. *Sender telemd is implemented for this purpose.*
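
To make the "detection in a time loop" idea from step 1 concrete, here is a
minimal Python sketch of one polling pass over a directory. It is only an
illustration: ``poll_directory`` is a hypothetical helper, and comparing
directory listings is an assumed detection strategy, not necessarily what the
``acrnprobe`` polling channel does.

.. code-block:: python

   import os


   def poll_directory(directory, seen):
       """Run one polling pass and report entries that appeared since the last pass.

       ``seen`` is the set of names observed by the previous pass.
       Returns ``(new_entries, current_entries)``.
       """
       current = set(os.listdir(directory))
       new_entries = sorted(current - seen)
       return new_entries, current

A caller would invoke ``poll_directory`` at a fixed interval, feed the returned
set back in as ``seen``, and enqueue one event per new entry.
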
|
||||
|
||||
Diagram
|
||||
=======
|
||||
::
|
||||
|
||||
   +---------------------------------------------+
   | channel:   |oneshot|  |polling|  |inotify|  |
   +--------------------------------------+------+
                                          |
   +---------------------+    +-----+     |
   |     event queue     +<---+event+<----+
   +-+-------------------+    +-----+
     |
     v
   +-+---------------------------------------------------------------------------+
   | event handler:                                                              |
   |                                                                             |
   |  event handler will handle internal event                                   |
   |  +----------+    +------------+                                             |
   |  |heart beat+--->+fed watchdog|                                             |
   |  +----------+    +------------+                                             |
   |                                                                             |
   |  call sender for other types                                                |
   |  +--------+   +----------------+   +------------+   +------------------+    |
   |  |crashlog+-->+crash reclassify+-->+collect logs+-->+generate crashfile|    |
   |  +--------+   +----------------+   +------------+   +------------------+    |
   |                                                                             |
   |  +------+    +------------------+                                           |
   |  |telemd+--->+telemetrics client|                                           |
   |  +------+    +------------------+                                           |
   +-----------------------------------------------------------------------------+
|
||||
|
||||
|
||||
Source files
|
||||
************
|
||||
|
||||
- main.c
  Entry of ``acrnprobe``.
- channel.c
  The implementation of *channel* (see `Syntax`_).
- crash_reclassify.c
  Analyzes the detailed types of crash events.
- probeutils.c
  Provides utilities that ``acrnprobe`` needs.
- event_queue.c
  The implementation of *event queue* (see `Syntax`_).
- event_handler.c
  The implementation of *event handler* (see `Syntax`_).
- history.c
  A ``history_event`` file manages all logs that ``acrnprobe`` has archived.
  ``history.c`` provides the interfaces to modify the file in a fixed format.
- load_conf.c
  Parses and loads the configuration file.
- property.c
  ``acrnprobe`` needs to know some HW/SW properties, such as the board version
  and build version. These properties are managed centrally in this file.
- sender.c
  The implementation of *sender* (see `Syntax`_).
- startupreason.c
  Provides the function that gets the system reboot reason from the kernel
  command line.
- android_events.c
  Syncs events detected by the Android crashlog.
|
||||
|
||||
Configuration files
|
||||
*******************
|
||||
|
||||
* ``/usr/share/defaults/telemetrics/acrnprobe.xml``
|
||||
|
||||
If no custom configuration file is found, ``acrnprobe`` uses the settings in
|
||||
this file.
|
||||
|
||||
* ``/etc/acrnprobe.xml``
|
||||
|
||||
Custom configuration file that ``acrnprobe`` reads.
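
The lookup order described in this document (the ``-c`` option first, then the
custom file, then the installed default) can be sketched in Python as follows.
This is an illustration only: ``resolve_config`` and its ``exists`` parameter
are hypothetical names, and the precedence shown is inferred from the text
above rather than taken from ``load_conf.c``.

.. code-block:: python

   import os

   DEFAULT_CONFIG = "/usr/share/defaults/telemetrics/acrnprobe.xml"
   CUSTOM_CONFIG = "/etc/acrnprobe.xml"


   def resolve_config(cli_path=None, exists=os.path.exists):
       """Pick the configuration file: -c argument, then custom, then default."""
       if cli_path is not None:      # a path given with -c wins
           return cli_path
       if exists(CUSTOM_CONFIG):     # custom file present on this system
           return CUSTOM_CONFIG
       return DEFAULT_CONFIG         # fall back to the installed default

The ``exists`` parameter is injectable only so the logic can be exercised
without touching the real filesystem.
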
|
||||
|
||||
.. _`telemetrics-client`: https://github.com/clearlinux/telemetrics-client
|
|
@ -0,0 +1,91 @@
|
|||
.. _usercrash_doc:
|
||||
|
||||
Usercrash
|
||||
#########
|
||||
|
||||
Description
|
||||
***********
|
||||
|
||||
``usercrash`` gets the crash information for the crashing process in
userspace. The collected information is saved as ``usercrash_xx`` under
``/var/log/usercrashes/``.
|
||||
|
||||
Design
|
||||
******
|
||||
|
||||
``usercrash`` is designed as a client/server model. The server is started
automatically at boot, and the client is configured in ``core_pattern``, so it
is triggered when a crash occurs in userspace. The client then sends the crash
event to the server. The server checks the files under
``/var/log/usercrashes/``, creates a new file ``usercrash_xx`` (where xx is the
index of the crash file), and sends the file descriptor (fd) to the client. The
client is responsible for collecting crash information and saving it in the
crashlog file. After the saving work is done, the client notifies the server,
and the server cleans up.
|
||||
|
||||
The work flow diagram:
|
||||
|
||||
::
|
||||
|
||||
   +--------------------------------------------------+
   |                                                  |
   |       Server                      Client         |
   |          +                           +           |
   |          |     Send crash event      |           |
   |          | <-------------------------+           |
   |          |                           |           |
   | Create usercrash_xx                  |           |
   |          |                           |           |
   |          |   Send usercrash_xx fd    |           |
   |          +-------------------------> |           |
   |          |                           |           |
   |          |                   Fill usercrash_xx   |
   |          |                           |           |
   |          |     Notify completion     |           |
   |          | <-------------------------+           |
   |          |                           |           |
   |      Clean up                        |           |
   |          |                           |           |
   |          v                           v           |
   |                                                  |
   +--------------------------------------------------+
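
The exchange in the diagram above can be sketched in Python as a single-process
round trip over a Unix socket pair. This is a hedged illustration, not the
protocol implemented in ``protocol.c``: the message contents, the
``run_exchange`` and ``next_crash_name`` helpers, and the "lowest free index"
naming rule are all assumptions made for this example. It requires Python 3.9
or later on a Unix system for ``socket.send_fds`` and ``socket.recv_fds``.

.. code-block:: python

   import os
   import socket


   def next_crash_name(existing):
       """Return the lowest unused usercrash_<n> name (assumed naming rule)."""
       used = set()
       for name in existing:
           prefix, _, suffix = name.partition("_")
           if prefix == "usercrash" and suffix.isdigit():
               used.add(int(suffix))
       index = 0
       while index in used:
           index += 1
       return "usercrash_%d" % index


   def run_exchange(log_dir, crash_text):
       """Play both roles of one crash report and return the crash file path."""
       server, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
       try:
           # Client: send the crash event.
           client.sendall(b"crash")
           if server.recv(16) != b"crash":
               raise RuntimeError("unexpected event")

           # Server: create the crash file and pass its descriptor to the client.
           path = os.path.join(log_dir, next_crash_name(os.listdir(log_dir)))
           server_fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
           socket.send_fds(server, [b"fd"], [server_fd])
           os.close(server_fd)  # the receiver gets its own duplicate

           # Client: receive the descriptor, fill the file, notify completion.
           _, fds, _, _ = socket.recv_fds(client, 16, 1)
           with os.fdopen(fds[0], "w") as crash_file:
               crash_file.write(crash_text)
           client.sendall(b"done")

           # Server: wait for the completion notice before cleaning up.
           if server.recv(16) != b"done":
               raise RuntimeError("unexpected notification")
           return path
       finally:
           server.close()
           client.close()

Passing the descriptor instead of a path means only the server needs to decide
where the file lives; the client just writes to what it is handed.
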
|
||||
|
||||
Usage
|
||||
*****
|
||||
|
||||
- The server is launched automatically at boot, and the client is configured in
  ``core_pattern``. While booting, ``core_pattern`` is set to
  ``usercrash_c``:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ echo "|/usr/bin/usercrash_c %p %e %s" > /proc/sys/kernel/core_pattern
|
||||
|
||||
  This means the client is triggered whenever a userspace crash occurs. The
  client then sends the event to the server.
|
||||
|
||||
- ``debugger`` is an independent tool that dumps the debug information of a
  specific process, including the backtrace, stack, opened files, register
  values, memory content around registers, and so on.
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ debugger <pid>
|
||||
|
||||
.. note::
|
||||
|
||||
You need to be ``root`` to use the ``debugger``.
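
In the ``core_pattern`` entry shown above, the leading ``|`` tells the kernel
to pipe the core dump to a program, and ``%p``, ``%e``, and ``%s`` are standard
specifiers for the PID, the executable name, and the signal number of the
crashing process. The Python sketch below shows roughly how such a pattern
turns into the client's argument list. ``expand_core_pattern`` is a
hypothetical helper written for this example, and it handles only the
specifiers used here plus ``%%``.

.. code-block:: python

   def expand_core_pattern(pattern, pid, exe, sig):
       """Expand a piped core_pattern into the argv the helper program receives."""
       if not pattern.startswith("|"):
           raise ValueError("not a pipe pattern")
       values = {"p": str(pid), "e": exe, "s": str(sig), "%": "%"}
       argv = []
       for word in pattern[1:].split():   # split into arguments first
           out = []
           i = 0
           while i < len(word):
               # Replace a known "%x" specifier; copy everything else verbatim.
               if word[i] == "%" and i + 1 < len(word) and word[i + 1] in values:
                   out.append(values[word[i + 1]])
                   i += 2
               else:
                   out.append(word[i])
                   i += 1
           argv.append("".join(out))
       return argv

For a process ``myapp`` with PID 1234 killed by signal 11, the pattern from
this section expands to the program path followed by the three values as
separate arguments.
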
|
||||
|
||||
Source Code
***********
|
||||
|
||||
- client.c : Implements the client of ``usercrash``, which
  is responsible for delivering the ``usercrash`` event to the server,
  collecting crash information, and saving it to the crashfile.
- crash_dump.c : Implements dumping of the crash
  information, including the backtrace, stack, opened files, register values,
  memory content around registers, and so on.
- debugger.c : Implements a command-line tool that
  dumps the process information listed above.
- protocol.c : Implements the socket protocol.
- server.c : Implements the server of ``usercrash``, which
  is responsible for creating the crashfile and handling the events from the
  client.
|