From 7d13e5be1bedda4535afa22c1ea053012e5b7713 Mon Sep 17 00:00:00 2001
From: CHEN Gang
Date: Thu, 31 May 2018 10:00:30 +0000
Subject: [PATCH] tools: acrn-crashlog: add documents for acrn-crashlog

This patch adds the documents for acrn-crashlog:
 README.rst: General introduction for acrn-crashlog.
 acrnprobe/README.rst: Introduction for acrnprobe in detail.
 usercrash/README.rst: Introduction for usercrash in detail.

Signed-off-by: xiaojin2
Signed-off-by: Liu Xinwu
Signed-off-by: CHEN Gang
Acked-by: Geoffroy Van Cutsem
---
 tools/acrn-crashlog/README.rst           | 153 ++++++++++++++++++++
 tools/acrn-crashlog/acrnprobe/README.rst | 176 +++++++++++++++++++++++
 tools/acrn-crashlog/usercrash/README.rst |  91 ++++++++++++
 3 files changed, 420 insertions(+)
 create mode 100644 tools/acrn-crashlog/README.rst
 create mode 100644 tools/acrn-crashlog/acrnprobe/README.rst
 create mode 100644 tools/acrn-crashlog/usercrash/README.rst

diff --git a/tools/acrn-crashlog/README.rst b/tools/acrn-crashlog/README.rst
new file mode 100644
index 000000000..7302ce548
--- /dev/null
+++ b/tools/acrn-crashlog/README.rst
@@ -0,0 +1,153 @@
+ACRN-Crashlog
+#############
+
+Introduction
+************
+
+``ACRN-Crashlog`` is a joint name for the tools (``acrnprobe``,
+``usercrash_s``, ``usercrash_c``, ``debugger``, etc.) that collect logs
+and information after each crash or event on the ACRN platform, including the
+hypervisor, Service OS (SOS), and Android as a Guest (AaaG).
+``ACRN-Crashlog`` provides a flexible way to configure which events are of
+interest, by using an XML configuration file.
+
+Building
+********
+
+Build dependencies
+==================
+
+The ``ACRN-Crashlog`` tool depends on the following libraries
+(build and runtime):
+
+- libevent
+- OpenSSL
+- libxml2
+- systemd
+- telemetrics-client-dev (optional, detected at build time)
+
+Refer to :ref:`getting_started` for instructions on how to set up your
+build environment, and follow the instructions below to build and configure the
+``ACRN-Crashlog`` tool.
+
+Build
+=====
+
+To build ``ACRN-Crashlog``, run the command below under ``acrn-crashlog/``:
+
+.. code-block:: console
+
+   $ make
+
+To remove all generated files and return the folder to its clean state, run
+the command below under ``acrn-crashlog/``:
+
+.. code-block:: console
+
+   $ make clean
+
+Installing
+**********
+
+To install the build:
+
+.. code-block:: console
+
+   $ sudo make install
+
+Usage
+*****
+
+``acrnprobe`` can work in two ways, depending on whether
+telemetrics-client exists on the system:
+
+1. If telemetrics-client doesn't exist on the system, ``acrnprobe`` provides
+   ``history_event`` (under ``/var/log/crashlog/history_event``) to manage the
+   crash and event records on the platform. In this case, however, the records
+   can't be delivered to the backend.
+
+2. If telemetrics-client exists on the system, ``acrnprobe`` works as a probe
+   of the telemetrics-client: it runs as a daemon autostarted when the system
+   boots, and sends the crashlog path to the telemetrics-client, which records
+   events of interest and reports them to the backend using ``telemd``, the
+   telemetrics daemon. The workflow of ``acrnprobe`` and telemetrics-client is:
+
+::
+
+   +------------------------------------------------------------------+
+   |           crashlog path                  log content             |
+   | acrnprobe------------->telemetrics-client----------->backend     |
+   +------------------------------------------------------------------+
+
+Crashlog can be retrieved with the ``telem_journal`` command:
+
+.. code-block:: console
+
+   $ telem_journal -i
+
+.. note::
+
+   For more details on telemetrics, please refer to the `telemetrics-client`_ and
+   `telemetrics-backend`_ websites.
+
+``ACRN-Crashlog`` also provides a tool, ``debugger``, to dump information about
+a specific process:
+
+.. code-block:: console
+
+   $ debugger
+
+.. note::
+
+   You need to be ``root`` to use the ``debugger``.
+
+Source Code
+***********
+
+The source code structure:
+
+.. code-block:: console
+
+   acrn-crashlog/
+   ├── acrnprobe
+   │   └── include
+   ├── common
+   │   └── include
+   ├── data
+   └── usercrash
+       └── include
+
+- ``acrnprobe``: gathers all the crash and event logs on the platform, and acts
+  as a probe of telemetrics-client. Hypervisor logs are collected with
+  acrnlog. On the SOS, the userspace crash log is collected with
+  usercrash, and the kernel crash log is collected with inherent mechanisms
+  such as ``ipanic`` and ``pstore``. AaaG logs are collected by
+  monitoring changes to related folders on the SOS image, such as
+  ``/data/logs/``. ``acrnprobe`` also provides a flexible way for users to
+  configure which crashes or events they want to collect through an XML
+  file.
+- ``common``: utilities for logs, commands, and strings.
+- ``data``: configuration file, service files, and shell script.
+- ``usercrash``: implements the tool that gets the crash information for the
+  crashing process in userspace.
+
+acrnprobe
+=========
+
+``acrnprobe`` detects all critical events on the platform and collects
+specific information for debugging purposes. This information is saved as
+logs, and the log path is delivered to telemetrics-client as a record if
+telemetrics-client exists on the system.
+For more details on ``acrnprobe``, please refer to :ref:`acrnprobe_doc`.
+
+usercrash
+=========
+
+``usercrash`` is a tool to get the crash info of the crashing process in
+userspace. It works in a client/server model. The server is autostarted, and the client is
+configured in ``core_pattern``, so it is triggered once a crash occurs in
+userspace.
+For more details on ``usercrash``, please refer to :ref:`usercrash_doc`.
+
+.. _`telemetrics-client`: https://github.com/clearlinux/telemetrics-client
+.. _`telemetrics-backend`: https://github.com/clearlinux/telemetrics-backend
diff --git a/tools/acrn-crashlog/acrnprobe/README.rst b/tools/acrn-crashlog/acrnprobe/README.rst
new file mode 100644
index 000000000..24e3af038
--- /dev/null
+++ b/tools/acrn-crashlog/acrnprobe/README.rst
@@ -0,0 +1,176 @@
+.. _acrnprobe_doc:
+
+Acrnprobe
+#########
+
+Description
+***********
+
+``acrnprobe`` is a tool to detect all critical events on the platform and
+collect specific information for them. The collected information is saved
+as logs. The log path is delivered to `telemetrics-client`_ as a record if
+telemetrics-client exists on the system. In this case ``acrnprobe`` works as a
+*probe* of telemetrics-client. If telemetrics-client doesn't exist on the
+system, ``acrnprobe`` provides ``history_event`` (under ``/var/log/crashlog/``
+by default) to manage the crash and event records on the platform instead of
+``telem_journal``. In this case, however, the records can't be delivered to the
+backend.
+
+Usage
+*****
+
+``acrnprobe`` is launched as a service at boot. It also provides some basic
+options:
+
+Specify a configuration file for ``acrnprobe``. If this option is unused,
+``acrnprobe`` uses the configuration file located in the custom configuration
+path or the installation path (see `Configuration files`_).
+
+.. code-block:: console
+
+   $ acrnprobe -c [configuration_path]
+
+To see the version of ``acrnprobe``:
+
+.. code-block:: console
+
+   $ acrnprobe -V
+
+Architecture
+************
+
+Syntax
+======
+
+- channel :
+  A channel represents a way of detecting the system's events. There are 3
+  channels:
+
+  + oneshot: detect once when ``acrnprobe`` starts up.
+  + polling: run a detection job at a fixed time interval.
+  + inotify: monitor changes to a file or directory.
+
+- event queue :
+  There is a global queue to receive all detected events.
+  Generally, events are enqueued by a channel and dequeued by the event handler.
+
+- event handler :
+  The event handler is a thread that handles events detected by channels.
+  It's awakened by an enqueued event.
+
+- sender :
+  A sender corresponds to an exit of an event.
+  There are two senders:
+
+  + Crashlog is responsible for collecting logs and saving them locally.
+  + Telemd is responsible for sending log records to telemetrics-client.
+
+Description
+===========
+
+As a log collection mechanism to record critical events on the platform,
+``acrnprobe`` provides these functions:
+
+1. detect event
+
+   From experience, the occurrence of a system event is usually accompanied
+   by some effects. The effects could be a generated file, an error message in
+   the kernel's log, or a system reboot. To catch these effects, for some of them
+   we can monitor a directory; for others we might need to run a detection
+   in a time loop.
+   *So we implement the channel, which represents a common method of detection.*
+
+2. analyze event and determine the event type
+
+   Generally, a specific effect corresponds to a particular type of event.
+   However, determining the detailed event type from the observed phenomena
+   is a valuable refinement. *Crash reclassify is implemented for this
+   purpose.*
+
+3. collect information for detected events
+
+   This is for debugging purposes. Events without information are meaningless,
+   and developers need this information to improve their system. *Sender
+   crashlog is implemented for this purpose.*
+
+4. archive this information as logs, and generate records
+
+   There must be a central place to tell the user what happened in the system.
+   *Sender telemd is implemented for this purpose.*
+
+Diagram
+=======
+::
+
+ +---------------------------------------------+
+ | channel:   |oneshot|  |polling|  |inotify|  |
+ +--------------------------------------+------+
+                                        |
+ +---------------------+    +-----+     |
+ |   event queue       +<---+event+<----+
+ +-+-------------------+    +-----+
+   |
+   v
+ +-+---------------------------------------------------------------------------+
+ | event handler:                                                              |
+ |                                                                             |
+ |   event handler will handle internal event                                  |
+ |    +----------+    +------------+                                           |
+ |    |heart beat+--->+fed watchdog|                                           |
+ |    +----------+    +------------+                                           |
+ |                                                                             |
+ |   call sender for other types                                               |
+ |    +--------+   +----------------+   +------------+   +------------------+  |
+ |    |crashlog+-->+crash reclassify+-->+collect logs+-->+generate crashfile|  |
+ |    +--------+   +----------------+   +------------+   +------------------+  |
+ |                                                                             |
+ |    +------+    +------------------+                                         |
+ |    |telemd+--->+telemetrics client|                                         |
+ |    +------+    +------------------+                                         |
+ +-----------------------------------------------------------------------------+
+
+
+Source files
+************
+
+- main.c
+  Entry point of ``acrnprobe``.
+- channel.c
+  The implementation of *channel* (see `Syntax`_).
+- crash_reclassify.c
+  Analyzes the detailed type of a crash event.
+- probeutils.c
+  Provides utilities that ``acrnprobe`` needs.
+- event_queue.c
+  The implementation of *event queue* (see `Syntax`_).
+- event_handler.c
+  The implementation of *event handler* (see `Syntax`_).
+- history.c
+  A history_event file manages all logs that ``acrnprobe`` has archived.
+  "history.c" provides the interfaces to modify the file in a fixed format.
+- load_conf.c
+  Parses and loads the configuration file.
+- property.c
+  ``acrnprobe`` needs to know some HW/SW properties, such as board version and
+  build version. These properties are managed centrally in this file.
+- sender.c
+  The implementation of *sender* (see `Syntax`_).
+- startupreason.c
+  This file provides the function to get the system reboot reason from the
+  kernel command line.
+- android_events.c
+  Syncs events detected by the Android crashlog.
+
+Configuration files
+*******************
+
+* ``/usr/share/defaults/telemetrics/acrnprobe.xml``
+
+  If no custom configuration file is found, ``acrnprobe`` uses the settings in
+  this file.
+
+* ``/etc/acrnprobe.xml``
+
+  Custom configuration file that ``acrnprobe`` reads.
+
+.. _`telemetrics-client`: https://github.com/clearlinux/telemetrics-client
diff --git a/tools/acrn-crashlog/usercrash/README.rst b/tools/acrn-crashlog/usercrash/README.rst
new file mode 100644
index 000000000..76fc83cd6
--- /dev/null
+++ b/tools/acrn-crashlog/usercrash/README.rst
@@ -0,0 +1,91 @@
+.. _usercrash_doc:
+
+Usercrash
+#########
+
+Description
+***********
+
+``usercrash`` gets the crash info for the crashing process in
+userspace. The collected information is saved as usercrash_xx under
+``/var/log/usercrashes/``.
+
+Design
+******
+
+``usercrash`` is designed as a client/server model. The server is autostarted
+at boot, and the client is configured in ``core_pattern``, so it is
+triggered once a crash occurs in userspace. The client then sends the crash
+event to the server. The server checks the files under ``/var/log/usercrashes/``
+and creates a new file usercrash_xx (xx is the index of the crash file), then
+sends the fd to the client. The client is responsible for collecting
+crash information and saving it in the crashlog file. After saving is done,
+the client notifies the server, and the server cleans up.
+
+The workflow diagram:
+
+::
+
+   +--------------------------------------------------+
+   |                                                  |
+   |        Server                   Client           |
+   |          +                         +             |
+   |          |    Send crash event     |             |
+   |          | <-----------------------+             |
+   |          |                         |             |
+   | Create usercrash_xx                |             |
+   |          |                         |             |
+   |          |  Send usercrash_xx fd   |             |
+   |          +-----------------------> |             |
+   |          |                         |             |
+   |          |                 Fill usercrash_xx     |
+   |          |                         |             |
+   |          |    Notify completion    |             |
+   |          | <-----------------------+             |
+   |          |                         |             |
+   |      Clean up                      |             |
+   |          |                         |             |
+   |          v                         v             |
+   |                                                  |
+   +--------------------------------------------------+
+
+Usage
+*****
+
+- The server is launched automatically at boot, and the client is configured in
+  ``core_pattern``. The content of ``core_pattern`` is set to
+  ``usercrash_c`` while booting up:
+
+.. code-block:: console
+
+   $ echo "|/usr/bin/usercrash_c %p %e %s" > /proc/sys/kernel/core_pattern
+
+That means the client is triggered once a userspace crash occurs. The
+event is then sent from the client to the server.
+
+- The ``debugger`` is an independent tool to dump the debug information of a
+  specific process, including backtrace, stack, opened files, register values,
+  and memory content around registers.
+
+.. code-block:: console
+
+   $ debugger
+
+.. note::
+
+   You need to be ``root`` to use the ``debugger``.
+
+Source Code
+***********
+
+- client.c : This file implements the client of ``usercrash``, which
+  is responsible for delivering the ``usercrash`` event to the server, and
+  collecting crash information and saving it to the crashfile.
+- crash_dump.c : This file implements dumping the crash
+  information, including backtrace stack, opened files, register values, and
+  memory content around registers.
+- debugger.c : This file implements a tool that runs on the command line to
+  dump the process information listed above.
+- protocol.c : This file implements the socket protocol.
+- server.c : This file implements the server of ``usercrash``, which
+  is responsible for creating the crashfile and handling the events from the client.