incubator-nuttx/fs/spiffs/docs/TECH_SPEC

* USING SPIFFS

TODO

* SPIFFS DESIGN

Spiffs is inspired by YAFFS. However, YAFFS is designed for NAND flashes, and
for bigger targets with much more ram. Nevertheless, many wise thoughts have
been borrowed from YAFFS when writing spiffs. Kudos!

The main complication writing spiffs was that it cannot be assumed the target
has a heap. Spiffs must go along only with the work ram buffer given to it.
This forces extra implementation on many areas of spiffs.

** SPI flash devices using NOR technology

Below is a small description of how SPI flashes work internally. This is to
give an understanding of the design choices made in spiffs.

SPI flash devices are physically divided in blocks. On some SPI flash devices,
blocks are further divided into sectors. Datasheets sometimes name blocks as
sectors and vice versa.

Common memory capacaties for SPI flashes are 512kB up to 8MB of data, where
blocks may be 64kB. Sectors can be e.g. 4kB, if supported. Many SPI flashes
have uniform block sizes, whereas others have non-uniform - the latter meaning
that e.g. the first 16 blocks are 4kB big, and the rest are 64kB.

The entire memory is linear and can be read and written in random access.
Erasing can only be done block- or sectorwise; or by mass erase.

SPI flashes can normally be erased from 100.000 up to 1.000.000 cycles before
they fail.

A clean SPI flash from factory have all bits in entire memory set to one. A
mass erase will reset the device to this state. Block or sector erasing will
put the all bits in the area given by the sector or block to ones. Writing to a
NOR flash pulls ones to zeroes. Writing 0xFF to an address is simply a no-op.

Writing 0b10101010 to a flash address holding 0b00001111 will yield 0b00001010.

This way of "write by nand" is used considerably in spiffs.

Common characteristics of NOR flashes are quick reads, but slow writes.

And finally, unlike NAND flashes, NOR flashes seem to not need any error
correction. They always write correctly I gather.

** Spiffs logical structure

Some terminology before proceeding. Physical blocks/sectors means sizes stated
in the datasheet. Logical blocks and pages is something the integrator choose.

** Blocks and pages

Spiffs is allocated to a part or all of the memory of the SPI flash device.
This area is divided into logical blocks, which in turn are divided into
logical pages. The boundary of a logical block must coincide with one or more
physical blocks. The sizes for logical blocks and logical pages always remain
the same, they are uniform.

Example: non-uniform flash mapped to spiffs with 128kB logical blocks

PHYSICAL FLASH BLOCKS               SPIFFS LOGICAL BLOCKS: 128kB

+-----------------------+   - - -   +-----------------------+
| Block 1 : 16kB        |           | Block 1 : 128kB       |
+-----------------------+           |                       |
| Block 2 : 16kB        |           |                       |
+-----------------------+           |                       |
| Block 3 : 16kB        |           |                       |
+-----------------------+           |                       |
| Block 4 : 16kB        |           |                       |
+-----------------------+           |                       |
| Block 5 : 64kB        |           |                       |
+-----------------------+   - - -   +-----------------------+
| Block 6 : 64kB        |           | Block 2 : 128kB       |
+-----------------------+           |                       |
| Block 7 : 64kB        |           |                       |
+-----------------------+   - - -   +-----------------------+
| Block 8 : 64kB        |           | Block 3 : 128kB       |
+-----------------------+           |                       |
| Block 9 : 64kB        |           |                       |
+-----------------------+   - - -   +-----------------------+
| ...                   |           | ...                   |

A logical block is divided further into a number of logical pages. A page
defines the smallest data holding element known to spiffs. Hence, if a file
is created being one byte big, it will occupy one page for index and one page
for data - it will occupy 2 x size of a logical page on flash.
So it seems it is good to select a small page size.

Each page has a metadata header being normally 5 to 9 bytes. This said, a very
small page size will make metadata occupy a lot of the memory on the flash. A
page size of 64 bytes will waste 8-14% on metadata, while 256 bytes 2-4%.
So it seems it is good to select a big page size.

Also, spiffs uses a ram buffer being two times the page size. This ram buffer
is used for loading and manipulating pages, but it is also used for algorithms
to find free file ids, scanning the file system, etc. Having too small a page
size means less work buffer for spiffs, ending up in more reads operations and
eventually gives a slower file system.

Choosing the page size for the system involves many factors:
 - How big is the logical block size
 - What is the normal size of most files
 - How much ram can be spent
 - How much data (vs metadata) must be crammed into the file system
 - How fast must spiffs be
 - Other things impossible to find out

So, choosing the Optimal Page Size (tm) seems tricky, to say the least. Don't
fret - there is no optimal page size. This varies from how the target will use
spiffs. Use the golden rule:

        ~~~   Logical Page Size = Logical Block Size / 256   ~~~

This is a good starting point. The final page size can then be derived through
heuristical experimenting for us non-analytical minds.

** Objects, indices and look-ups

A file, or an object as called in spiffs, is identified by an object id.
Another YAFFS rip-off. This object id is a part of the page header. So, all
pages know to which object/file they belong - not counting the free pages.

An object is made up of two types of pages: object index pages and data pages.
Data pages contain the data written by user. Index pages contain metadata about
the object, more specifically what data pages are part of the object.

The page header also includes something called a span index. Let's say a file
is written covering three data pages. The first data page will then have span
index 0, the second span index 1, and the last data page will have span index
2. Simple as that.

Finally, each page header contain flags, telling if the page is used,
deleted, finalized, holds index or data, and more.

Object indices also have span indices, where an object index with span index 0
is referred to as the object index header. This page does not only contain
references to data pages, but also extra info such as object name, object size
in bytes, flags for file or directory, etc.

If one were to create a file covering three data pages, named e.g.
"spandex-joke.txt", given object id 12, it could look like this:

PAGE 0  <things to be unveiled soon>

PAGE 1  page header:   [obj_id:12  span_ix:0  flags:USED|DATA]
        <first data page of joke>

PAGE 2  page header:   [obj_id:12  span_ix:1  flags:USED|DATA]
        <second data page of joke>

PAGE 3  page header:   [obj_id:545 span_ix:13 flags:USED|DATA]
        <some data belonging to object 545, probably not very amusing>

PAGE 4  page header:   [obj_id:12  span_ix:2  flags:USED|DATA]
        <third data page of joke>

PAGE 5  page header:   [obj_id:12  span_ix:0  flags:USED|INDEX]
        obj ix header: [name:spandex-joke.txt  size:600 bytes  flags:FILE]
        obj ix:        [1 2 4]

Looking in detail at page 5, the object index header page, the object index
array refers to each data page in order, as mentioned before. The index of the
object index array correlates with the data page span index.

                            entry ix:  0 1 2
                              obj ix: [1 2 4]
                                       | | |
    PAGE 1, DATA, SPAN_IX 0    --------/ | |
      PAGE 2, DATA, SPAN_IX 1    --------/ |
        PAGE 4, DATA, SPAN_IX 2    --------/

Things to be unveiled in page 0 - well.. Spiffs is designed for systems low on
ram. We cannot keep a dynamic list on the whereabouts of each object index
header so we can find a file fast. There might not even be a heap! But, we do
not want to scan all page headers on the flash to find the object index header.

The first page(s) of each block contains the so called object look-up. These
are not normal pages, they do not have a header. Instead, they are arrays
pointing out what object-id the rest of all pages in the block belongs to.

By this look-up, only the first page(s) in each block must to scanned to find
the actual page which contains the object index header of the desired object.

The object lookup is redundant metadata. The assumption is that it presents
less overhead reading a full page of data to memory from each block and search
that, instead of reading a small amount of data from each page (i.e. the page
header) in all blocks. Each read operation from SPI flash normally contains
extra data as the read command itself and the flash address. Also, depending on
the underlying implementation, other criterions may need to be passed for each
read transaction, like mutexes and such.

The veiled example unveiled would look like this, with some extra pages:

PAGE 0  [  12   12  545   12   12   34   34    4    0    0    0    0 ...]
PAGE 1  page header:   [obj_id:12  span_ix:0  flags:USED|DATA] ...
PAGE 2  page header:   [obj_id:12  span_ix:1  flags:USED|DATA] ...
PAGE 3  page header:   [obj_id:545 span_ix:13 flags:USED|DATA] ...
PAGE 4  page header:   [obj_id:12  span_ix:2  flags:USED|DATA] ...
PAGE 5  page header:   [obj_id:12  span_ix:0  flags:USED|INDEX] ...
PAGE 6  page header:   [obj_id:34  span_ix:0  flags:USED|DATA] ...
PAGE 7  page header:   [obj_id:34  span_ix:1  flags:USED|DATA] ...
PAGE 8  page header:   [obj_id:4   span_ix:1  flags:USED|INDEX] ...
PAGE 9  page header:   [obj_id:23  span_ix:0  flags:DELETED|INDEX] ...
PAGE 10 page header:   [obj_id:23  span_ix:0  flags:DELETED|DATA] ...
PAGE 11 page header:   [obj_id:23  span_ix:1  flags:DELETED|DATA] ...
PAGE 12 page header:   [obj_id:23  span_ix:2  flags:DELETED|DATA] ...
...

Ok, so why are page 9 to 12 marked as 0 when they belong to object id 23? These
pages are deleted, so this is marked both in page header flags and in the look
up. This is an example where spiffs uses NOR flashes "nand-way" of writing.

As a matter of fact, there are two object id's which are special:

obj id 0 (all bits zeroes) - indicates a deleted page in object look up
obj id 0xff.. (all bits ones) - indicates a free page in object look up

Actually, the object id's have another quirk: if the most significant bit is
set, this indicates an object index page. If the most significant bit is zero,
this indicates a data page. So to be fully correct, page 0 in above example
would look like this:

PAGE 0  [  12   12  545   12  *12   34   34   *4    0    0    0    0 ...]

where the asterisk means the msb of the object id is set.

This is another way to speed up the searches when looking for object indices.
By looking on the object id's msb in the object lookup, it is also possible
to find out whether the page is an object index page or a data page.