xfs: document the motivation for online fsck design

Start the first chapter of the online fsck design documentation. This covers
the motivations for creating this in the first place.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
parent 09a9639e56
commit a8f6c2e54d
@@ -123,4 +123,5 @@ Documentation for filesystem implementations.
    vfat
    xfs-delayed-logging-design
    xfs-self-describing-metadata
+   xfs-online-fsck-design
    zonefs
@@ -0,0 +1,212 @@
.. SPDX-License-Identifier: GPL-2.0
.. _xfs_online_fsck_design:

..
        Mapping of heading styles within this document:
        Heading 1 uses "====" above and below
        Heading 2 uses "===="
        Heading 3 uses "----"
        Heading 4 uses "````"
        Heading 5 uses "^^^^"
        Heading 6 uses "~~~~"
        Heading 7 uses "...."

        Sections are manually numbered because apparently that's what everyone
        does in the kernel.

======================
XFS Online Fsck Design
======================

This document captures the design of the online filesystem check feature for
XFS.
The purpose of this document is threefold:

- To help kernel distributors understand exactly what the XFS online fsck
  feature is, and issues about which they should be aware.

- To help people reading the code to familiarize themselves with the relevant
  concepts and design points before they start digging into the code.

- To help developers maintaining the system by capturing the reasons
  supporting higher level decision making.

As the online fsck code is merged, the links in this document to topic branches
will be replaced with links to code.

This document is licensed under the terms of the GNU General Public License,
v2.
The primary author is Darrick J. Wong.

This design document is split into seven parts.
Part 1 defines what fsck tools are and the motivations for writing a new one.
Parts 2 and 3 present a high level overview of how the online fsck process
works and how it is tested to ensure correct functionality.
Part 4 discusses the user interface and the intended usage modes of the new
program.
Parts 5 and 6 show off the high level components and how they fit together, and
then present case studies of how each repair function actually works.
Part 7 sums up what has been discussed so far and speculates about what else
might be built atop online fsck.

.. contents:: Table of Contents
   :local:

1. What is a Filesystem Check?
==============================

A Unix filesystem has four main responsibilities:

- Provide a hierarchy of names through which application programs can associate
  arbitrary blobs of data for any length of time,

- Virtualize physical storage media across those names,

- Retrieve the named data blobs at any time, and

- Examine resource usage.

Metadata directly supporting these functions (e.g. files, directories, space
mappings) are sometimes called primary metadata.
Secondary metadata (e.g. reverse mapping and directory parent pointers) support
operations internal to the filesystem, such as internal consistency checking
and reorganization.
Summary metadata, as the name implies, condense information contained in
primary metadata for performance reasons.

The filesystem check (fsck) tool examines all the metadata in a filesystem
to look for errors.
In addition to looking for obvious metadata corruptions, fsck also
cross-references different types of metadata records with each other to look
for inconsistencies.
People do not like losing data, so most fsck tools also contain some ability
to correct any problems found.
A word of caution: the primary goal of most Linux fsck tools is to restore
the filesystem metadata to a consistent state, not to maximize the data
recovered.
That precedent will not be challenged here.
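
The cross-referencing described above can be illustrated with a toy model.
The following is a minimal sketch, not XFS code: the ``Extent`` record, the
``cross_check`` function, and the flat list of free space records are all
assumptions made for illustration.
It compares a summary counter (a free block count) against the primary records
it is supposed to condense, and also looks for records that contradict each
other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical free space record: a run of contiguous free blocks."""
    start: int   # first block of the run
    length: int  # number of blocks in the run


def cross_check(free_extents, summary_free_blocks, total_blocks):
    """Return a list of problems found; an empty list means consistent."""
    problems = []
    prev_end = 0
    counted = 0
    for ext in sorted(free_extents, key=lambda e: e.start):
        # Obvious corruption: the record points outside the device.
        if ext.length <= 0 or ext.start < 0 or ext.start + ext.length > total_blocks:
            problems.append(f"{ext} is out of bounds")
            continue
        # Inconsistency between records: two records claim the same block.
        if ext.start < prev_end:
            problems.append(f"{ext} overlaps an earlier record")
        prev_end = max(prev_end, ext.start + ext.length)
        counted += ext.length
    # Inconsistency between record types: summary versus primary records.
    if counted != summary_free_blocks:
        problems.append(
            f"summary says {summary_free_blocks} free blocks, records say {counted}"
        )
    return problems
```

Calling ``cross_check([Extent(0, 10), Extent(20, 5)], 15, 100)`` returns an
empty list, whereas a summary count that disagrees with the records produces a
single reported problem.
A real fsck tool performs many such checks across many record types; this
sketch only shows the shape of one.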

Filesystems of the 20th century generally lacked any redundancy in the ondisk
format, which meant that fsck could only respond to errors by erasing files
until errors were no longer detected.
More recent filesystem designs contain enough redundancy in their metadata that
it is now possible to regenerate data structures when non-catastrophic errors
occur; this capability helps both to restore consistency and to preserve data.

+--------------------------------------------------------------------------+
| **Note**:                                                                |
+--------------------------------------------------------------------------+
| System administrators avoid data loss by increasing the number of        |
| separate storage systems through the creation of backups; and they avoid |
| downtime by increasing the redundancy of each storage system through the |
| creation of RAID arrays.                                                 |
| fsck tools address only the first problem.                               |
+--------------------------------------------------------------------------+

TLDR; Show Me the Code!
-----------------------

Code is posted to the kernel.org git trees as follows:
`kernel changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-symlink>`_,
`userspace changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_, and
`QA test changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=repair-dirs>`_.
Each kernel patchset adding an online repair function will use the same branch
name across the kernel, xfsprogs, and fstests git repos.

Existing Tools
--------------

The online fsck tool described here will be the third tool in the history of
XFS (on Linux) to check and repair filesystems.
Two programs precede it:

The first program, ``xfs_check``, was created as part of the XFS debugger
(``xfs_db``) and can only be used with unmounted filesystems.
It walks all metadata in the filesystem looking for inconsistencies, though it
lacks any ability to repair what it finds.
Due to its high memory requirements and inability to repair things, this
program is now deprecated and will not be discussed further.

The second program, ``xfs_repair``, was created to be faster and more robust
than the first program.
Like its predecessor, it can only be used with unmounted filesystems.
It uses extent-based in-memory data structures to reduce memory consumption,
and tries to schedule readahead IO appropriately to reduce IO waiting time
while it scans the metadata of the entire filesystem.
The most important feature of this tool is its ability to respond to
inconsistencies in file metadata and the directory tree by erasing things as
needed to eliminate problems.
Space usage metadata are rebuilt from the observed file metadata.

Problem Statement
-----------------

The current XFS tools leave several problems unsolved:

1. **User programs** suddenly **lose access** to the filesystem when unexpected
   shutdowns occur as a result of silent corruptions in the metadata.
   These occur **unpredictably** and often without warning.

2. **Users** experience a **total loss of service** during the recovery period
   after an **unexpected shutdown** occurs.

3. **Users** experience a **total loss of service** if the filesystem is taken
   offline to **look for problems** proactively.

4. **Data owners** cannot **check the integrity** of their stored data without
   reading all of it.
   This may expose them to substantial billing costs when a linear media scan
   performed by the storage system administrator might suffice.

5. **System administrators** cannot **schedule** a maintenance window to deal
   with corruptions if they **lack the means** to assess filesystem health
   while the filesystem is online.

6. **Fleet monitoring tools** cannot **automate periodic checks** of filesystem
   health when doing so requires **manual intervention** and downtime.

7. **Users** can be tricked into **doing things they do not desire** when
   malicious actors **exploit quirks of Unicode** to place misleading names
   in directories.
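
Problem 7 in the list above can be made concrete with a small example.
This section does not say how the new tool detects misleading names, so the
following is only a minimal sketch of one possible check, under the assumption
that two names are suspicious if they differ as byte strings but become
identical after Unicode NFKC normalization.
The function name ``find_colliding_names`` is hypothetical.

```python
import unicodedata
from collections import defaultdict


def find_colliding_names(names):
    """Group directory entry names that collide after NFKC normalization.

    Assumption for illustration: a collision under NFKC is treated as a
    sign that one name may be impersonating another.
    """
    groups = defaultdict(list)
    for name in names:
        key = unicodedata.normalize("NFKC", name)
        groups[key].append(name)
    # Only groups with more than one distinct spelling are interesting.
    return [group for group in groups.values() if len(group) > 1]
```

For example, a directory containing both ``file.txt`` and a second name that
spells the first two letters with the single ligature character U+FB01 would
be reported as one colliding group, because both names normalize to the same
string.
Names that merely look alike without normalizing to the same string (such as
Latin and Cyrillic lookalike letters) are not caught by this simple check.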

Given this definition of the problems to be solved and the actors who would
benefit, the proposed solution is a third fsck tool that acts on a running
filesystem.

This new third program has three components: an in-kernel facility to check
metadata, an in-kernel facility to repair metadata, and a userspace driver
program to drive fsck activity on a live filesystem.
``xfs_scrub`` is the name of the driver program.
The rest of this document presents the goals and use cases of the new fsck
tool, describes its major design points in connection to those goals, and
discusses the similarities and differences with existing tools.

+--------------------------------------------------------------------------+
| **Note**:                                                                |
+--------------------------------------------------------------------------+
| Throughout this document, the existing offline fsck tool can also be     |
| referred to by its current name "``xfs_repair``".                        |
| The userspace driver program for the new online fsck tool can be         |
| referred to as "``xfs_scrub``".                                          |
| The kernel portion of online fsck that validates metadata is called      |
| "online scrub", and the portion of the kernel that fixes metadata is     |
| called "online repair".                                                  |
+--------------------------------------------------------------------------+

The naming hierarchy is broken up into objects known as directories and files,
and the physical space is split into pieces known as allocation groups.
Sharding enables better performance on highly parallel systems and helps to
contain the damage when corruptions occur.
The division of the filesystem into principal objects (allocation groups and
inodes) means that there are ample opportunities to perform targeted checks and
repairs on a subset of the filesystem.

While a targeted check or repair is under way, other parts of the filesystem
continue processing IO requests.
Even if a piece of filesystem metadata can only be regenerated by scanning the
entire system, the scan can still be done in the background while other file
operations continue.
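
The effect of sharding on checking and repair can be sketched with a toy
model.
Everything below is an assumption made for illustration (the
``AllocationGroup`` class, its fields, and the one-lock-per-group scheme); the
locking in the real filesystem is considerably more involved.
The point is only that a check or repair needs to hold the resources of the
shard it is examining, so the other shards stay available.

```python
import threading


class AllocationGroup:
    """Toy stand-in for one shard of the filesystem (hypothetical layout)."""

    def __init__(self, agno, free_extents, free_count):
        self.agno = agno
        self.free_extents = free_extents  # primary records: (start, length)
        self.free_count = free_count      # summary counter for this group
        self.lock = threading.Lock()      # protects only this group


def check_group(ag):
    """Cross-check one group's summary counter, holding only its own lock."""
    with ag.lock:
        return sum(length for _, length in ag.free_extents) == ag.free_count


def repair_group(ag):
    """Regenerate the summary counter from the primary records."""
    with ag.lock:
        ag.free_count = sum(length for _, length in ag.free_extents)


def scrub(groups):
    """Check each group in turn and repair the ones that fail.

    Returns the numbers of the groups that needed repair.
    """
    repaired = []
    for ag in groups:
        if not check_group(ag):
            repair_group(ag)
            repaired.append(ag.agno)
    return repaired
```

In this model, holding the lock of one group never prevents another thread
from taking the lock of a different group, which is the property that lets
file operations elsewhere proceed while one shard is being examined.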

In summary, online fsck takes advantage of resource sharding and redundant
metadata to enable targeted checking and repair operations while the system
is running.
This capability will be coupled to automatic system management so that
autonomous self-healing of XFS maximizes service availability.