
Commit cd7d028

[nrf fromlist] doc: zms: add documentation for ZMS

This adds the documentation for the Zephyr Memory Storage system.

Upstream PR: zephyrproject-rtos/zephyr#77930

Signed-off-by: Riadh Ghaddab <[email protected]>

(cherry picked from commit 9bb12ff7d9dbf2522bef6d7008c4e41a60a347be)

1 parent e82cea5 commit cd7d028

File tree

2 files changed: +361 -0 lines changed

doc/services/storage/index.rst

Lines changed: 1 addition & 0 deletions

@@ -7,6 +7,7 @@ Storage
    :maxdepth: 1

    nvs/nvs.rst
+   zms/zms.rst
    disk/access.rst
    flash_map/flash_map.rst
    fcb/fcb.rst

doc/services/storage/zms/zms.rst

Lines changed: 360 additions & 0 deletions
@@ -0,0 +1,360 @@

.. _zms_api:

Zephyr Memory Storage (ZMS)
###########################

Zephyr Memory Storage is a new key/value storage system that is designed to work with all types
of non-volatile technologies. It supports classical on-chip NOR flash as well as new technologies
like RRAM and MRAM that do not require a separate erase operation at all.
Data on these devices can be overwritten directly at any time.

General behavior
****************

ZMS divides the memory space into sectors (minimum 2), and each sector is filled with key/value
pairs until it is full.

Header entries and ID entries are stored at the bottom of the sectors and are called ATEs
(Allocation Table Entries).

When a sector is full, ZMS first verifies that the following sector is empty, garbage collects
sector N+2 (where N is the current sector number) by moving its valid ATEs to the empty sector
N+1, erases the garbage-collected sector, and then closes the current sector by writing a
garbage_collect_done ATE and the close ATE (one of the header entries).
Afterwards, writing continues in the next sector.

This behavior is repeated until the end of the partition is reached. ZMS then starts again from
the first sector after garbage collecting it and erasing its content.

Composition of a sector
=======================

A sector is organized as follows (example with 3 sectors):

.. list-table::
   :widths: 25 25 25
   :header-rows: 1

   * - Sector 0 (closed)
     - Sector 1 (open)
     - Sector 2 (empty)
   * - Data_a0
     - Data_b0
     - Data_c0
   * - Data_a1
     - Data_b1
     - Data_c1
   * - Data_a2
     - Data_b2
     - Data_c2
   * - GC_done
     - .
     - .
   * - .
     - .
     - .
   * - .
     - .
     - .
   * - .
     - ATE_b2
     - ATE_c2
   * - ATE_a2
     - ATE_b1
     - ATE_c1
   * - ATE_a1
     - ATE_b0
     - ATE_c0
   * - ATE_a0
     - GC_done
     - GC_done
   * - Close (cyc=1)
     - Close (cyc=1)
     - Close (cyc=1)
   * - Empty (cyc=1)
     - Empty (cyc=2)
     - Empty (cyc=2)

Definition of each element in the sector
========================================

``Empty ATE:`` is written when erasing a sector (last position of the sector).

``Close ATE:`` is written when closing a sector (second to last position of the sector).

``GC_done ATE:`` is written to indicate that the next sector has already been garbage
collected. This ATE could be at any position of the sector.

``ATE:`` entries that contain an ID and describe where the data is stored, its size, and
its CRC32.

``Data:`` the actual value associated with the ATE ID.

How does ZMS work?
******************

Mounting the Storage system
===========================

Mounting the storage starts by getting the flash parameters, checking that the file system
properties are correct (``sector_size``, ``sector_count``, ...), then calling the ``zms_init``
function to make the storage ready.

To mount the ZMS file system, some elements of the ``zms_fs`` structure must be initialized.

.. code-block:: c

   struct zms_fs {
           /** File system offset in flash */
           off_t offset;

           /** Storage system is split into sectors. Each sector size must be a multiple of
            * the erase-block size if the device has erase capabilities.
            */
           uint32_t sector_size;

           /** Number of sectors in the file system */
           uint32_t sector_count;

           /** Flash device runtime structure */
           const struct device *flash_device;
   };
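
Below is a minimal mounting sketch. It assumes a devicetree fixed partition named
``storage_partition`` and a 1024-byte sector size; the partition name and geometry are
illustrative assumptions, not requirements of ZMS.

.. code-block:: c

   #include <errno.h>
   #include <zephyr/device.h>
   #include <zephyr/fs/zms.h>
   #include <zephyr/storage/flash_map.h>

   /* Illustrative assumption: the devicetree defines a fixed partition
    * named storage_partition.
    */
   #define ZMS_PARTITION        storage_partition
   #define ZMS_PARTITION_DEVICE FIXED_PARTITION_DEVICE(ZMS_PARTITION)
   #define ZMS_PARTITION_OFFSET FIXED_PARTITION_OFFSET(ZMS_PARTITION)

   static struct zms_fs fs;

   int mount_storage(void)
   {
           fs.flash_device = ZMS_PARTITION_DEVICE;
           if (!device_is_ready(fs.flash_device)) {
                   return -ENODEV;
           }
           fs.offset = ZMS_PARTITION_OFFSET;
           fs.sector_size = 1024; /* multiple of the erase-block size */
           fs.sector_count = 4;   /* ZMS needs at least 2 sectors */

           /* Checks the flash parameters and initializes the file system. */
           return zms_mount(&fs);
   }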

Initialization of ZMS
=====================

Because ZMS uses a fast-forward write mechanism, initialization must find the last sector and
the last entry where writing stopped the previous time.
It looks for a closed sector followed by an open one, then within the open sector it recovers
the last written ATE (Allocation Table Entry).
After that, it checks that the sector following this one is empty, and erases it if it is not.

ZMS ID/data write
=================

To avoid rewriting the same data with the same ID again, ZMS looks in all sectors for an entry
with the same ID and compares its data; if the data is identical, no write is performed.
If a write must be performed, an ATE and the data (if the operation is not a delete) are
written in the sector.
If the sector is full (cannot hold the current data + ATE), ZMS moves to the next sector,
garbage collects the sector after the newly opened one, then erases it.
Data whose size is less than or equal to 8 bytes is written within the ATE itself.
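
A short write sketch follows, reusing the ``fs`` instance from the mounting example above; the
ID value is an arbitrary illustration.

.. code-block:: c

   #define BOOT_COUNT_ID 1UL /* arbitrary example ID */

   int save_boot_count(uint32_t boot_count)
   {
           /* Per the behavior described above, an identical value is not
            * rewritten; a negative return value indicates an error.
            */
           ssize_t rc = zms_write(&fs, BOOT_COUNT_ID, &boot_count,
                                  sizeof(boot_count));

           return (rc < 0) ? (int)rc : 0;
   }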

ZMS ID/data read (with history)
===============================

By default, ZMS looks for the last data written with the given ID and retrieves it. It also
returns the number of bytes that were read.
If a history counter different from 0 is provided, older data with the same ID is retrieved.
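
A matching read sketch, under the same assumptions; the history counter passed to
``zms_read_hist()`` selects how many writes to look back.

.. code-block:: c

   int load_boot_counts(uint32_t *latest, uint32_t *previous)
   {
           /* Latest value written for this ID. */
           ssize_t rc = zms_read(&fs, BOOT_COUNT_ID, latest, sizeof(*latest));

           if (rc < 0) {
                   return (int)rc;
           }

           /* The value written just before the latest one (history = 1). */
           rc = zms_read_hist(&fs, BOOT_COUNT_ID, previous, sizeof(*previous), 1);

           return (rc < 0) ? (int)rc : 0;
   }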

ZMS get data length
===================

Given an ID, ZMS will return the size of the last data that was written with that ID.
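
For instance, a one-line sketch based on the helper this section describes (assumed here to be
``zms_get_data_length()`` in ``zms.h``):

.. code-block:: c

   /* Size in bytes of the latest data written with BOOT_COUNT_ID,
    * or a negative error code.
    */
   ssize_t len = zms_get_data_length(&fs, BOOT_COUNT_ID);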

ZMS free space calculation
==========================

ZMS can also return the free space remaining in the partition.
However, this operation is very time consuming: it needs to browse all valid ATEs in all
sectors of the partition and, for each valid ATE, try to find whether an older one exists.
Applications are advised not to use this function often at runtime, as it could significantly
slow down the calling thread.
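
A usage sketch, assuming the calculation is exposed as ``zms_calc_free_space()``:

.. code-block:: c

   /* Expensive: browses every valid ATE in every sector, so call it
    * sparingly (e.g. at boot or from a maintenance task).
    */
   ssize_t free_space = zms_calc_free_space(&fs);

   if (free_space >= 0) {
           printk("ZMS free space: %zd bytes\n", free_space);
   }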

How does the cycle counter work?
================================

Each sector has a lead cycle counter, a ``uint8_t`` that is used to validate all the other
ATEs.
The lead cycle counter is stored in the empty ATE.
To be valid, an ATE must have the same cycle counter as the one stored in the empty ATE.
Each time an ATE is moved from one sector to another, it gets the cycle counter of the
destination sector.
To erase a sector, the cycle counter of the empty ATE is incremented and a single write of the
empty ATE is done.
All the ATEs in that sector then become invalid.

How to close a sector
=====================

To close a sector, a close ATE is added at the end of the sector; it must have the same cycle
counter as the empty ATE.
When closing a sector, all the remaining space that has not been used is filled with garbage
data to avoid having old ATEs with a valid cycle counter.

Triggering the garbage collector
================================

Some applications need to make sure that storage writes have a maximum defined latency.
When calling a ZMS write, the current sector could be almost full, forcing a garbage collection
to switch to the next sector.
This operation is time consuming and could cause some applications to miss their real-time
constraints.
ZMS provides an API for the application to get the remaining free space in the current sector.
The application can then decide, when the current sector is almost full, to switch to the next
sector at a convenient time, which triggers the garbage collection on the next sector.
This guarantees the application that the following write won't trigger the garbage collection.
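
A latency-bounding sketch follows. It assumes the sector-level helpers
``zms_active_sector_free_space()`` and ``zms_sector_use_next()`` implement the API this section
describes; check ``zms.h`` for the exact names and signatures.

.. code-block:: c

   int write_with_bounded_latency(uint32_t id, const void *data, size_t len)
   {
           /* One 16-byte ATE, plus the data itself when it does not fit
            * inside the ATE (i.e. when len > 8).
            */
           size_t needed = 16 + ((len > 8) ? len : 0);

           if (zms_active_sector_free_space(&fs) < needed) {
                   /* Switch sectors (and garbage collect the next one) now,
                    * while the application can afford the delay.
                    */
                   int rc = zms_sector_use_next(&fs);

                   if (rc < 0) {
                           return rc;
                   }
           }

           return (int)zms_write(&fs, id, data, len);
   }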

ZMS structure of ATE (Allocation Table Entries)
===============================================

An entry has 16 bytes divided between these variables:

.. code-block:: c

   struct zms_ate {
           uint8_t crc8;      /* crc8 check of the entry */
           uint8_t cycle_cnt; /* cycle counter for non-erasable devices */
           uint32_t id;       /* data id */
           uint16_t len;      /* data len within sector */
           union {
                   uint8_t data[8]; /* used to store small size data */
                   struct {
                           uint32_t offset; /* data offset within sector */
                           union {
                                   uint32_t data_crc; /* crc for data */
                                   uint32_t metadata; /* used to store metadata
                                                       * information such as
                                                       * storage version
                                                       */
                           };
                   };
           };
   } __packed;
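
As a sanity check, the packed layout adds up to 16 bytes (1 + 1 + 4 + 2 for the fixed fields
plus the 8-byte union), which can be asserted at build time:

.. code-block:: c

   BUILD_ASSERT(sizeof(struct zms_ate) == 16, "ATE must be 16 bytes");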

.. note:: The data CRC is checked only when the whole data of the element is read.
   The data CRC is not checked for a partial read, as it is computed for the complete
   set of data.

.. note:: Enabling the data CRC feature on previously existing ZMS content without
   data CRC will make all existing data invalid.

.. _free-space:

How much space is available for key/value pairs?
************************************************

In both scenarios below, ZMS must always keep one empty sector in order to perform garbage
collection.
So if we suppose that a partition has 4 sectors, ZMS will only use 3 sectors to store
key/value pairs and will always keep one (rotating) sector empty so that the GC can be
launched.

.. note:: The maximum single data length that could be written at once in a sector is 64K
   (this could change in future versions of ZMS).

Data <= 8 bytes
===============

For small-sized values (<= 8 bytes), the data is stored within the entry (ATE) itself and no
data is written at the top of the sector.
ZMS has an entry size of 16 bytes, which means that the free space in a partition to store
data is computed in this scenario as::

   (NUM_SECTORS - 1) * (SECTOR_SIZE - (5 * ATE_SIZE)) / 2

Where:

``NUM_SECTORS:`` Total number of sectors

``SECTOR_SIZE:`` Size of the sector

``ATE_SIZE:`` 16 bytes

``(5 * ATE_SIZE):`` Reserved ATEs for the header and delete items

For example, with 4 sectors of 1024 bytes, the free space for data is
``3 * (944 / 2) = 1416`` bytes.

Data > 8 bytes
==============

Data is stored separately at the top of the sector.
In this case it is hard to estimate the free available space, as this depends on the size of
the data. But we can take into account that for N bytes of data (N > 8 bytes), an additional
16 bytes of ATE must be added at the bottom of the sector.

Let's take an example:

For a partition that has 4 sectors of 1024 bytes and a data size of 64 bytes, only 3 sectors
are available for writes, with a capacity of 944 bytes each.
Each key/value pair needs an extra 16 bytes for its ATE, which makes it possible to store 11
pairs in each sector (944 / 80).
The total data that could be stored in this partition in this case is
``11 * 3 * 64 = 2112`` bytes.

ZMS wear leveling feature
*************************

This storage system is optimized for devices that do not require an erase operation.
Storage systems that rely on an erase value (NVS as an example) need to emulate the erase with
write operations on such devices. This causes a significant decrease in the devices' life
expectancy and adds delays to write operations and initialization.
ZMS introduces a cycle count mechanism that avoids emulating the erase operation for these
devices.
It also guarantees that every memory location is written only once for each cycle of sector
write.

As an example, to erase a 4096-byte sector on a non-erasable device using NVS, 256 flash
writes must be performed (supposing that write-block-size = 16 bytes), while using ZMS only
1 write of 16 bytes is needed. This operation is 256 times faster in this case.

The garbage collection operation also costs some writes in terms of memory cell life
expectancy, as it moves some blocks from one sector to another.
To keep the garbage collector from affecting the life expectancy of the device, it is
recommended to dimension the partition correctly. Its size should be double the maximum size
of data (including extra headers) that could be written in the storage.

See :ref:`free-space`.

How to compute device lifetime
==============================

Storage devices, whether they are classical flash or new technologies like RRAM/MRAM, have a
limited life expectancy which is determined by the number of times memory cells can be
erased/written.
Flash devices are erased one page at a time as part of their functional behavior (otherwise
memory cells cannot be overwritten), while on non-erasable storage devices memory cells can
be overwritten directly.

A typical scenario is shown here to calculate the life expectancy of a device.
Let's suppose that we store an 8-byte variable using the same ID, but its content changes
every minute. The partition has 4 sectors of 1024 bytes each.
Each write of the variable requires 16 bytes of storage.
As we have 944 bytes available for ATEs in each sector, and because ZMS is a fast-forward
storage system, we are going to rewrite the first location of the first sector after
(944 * 4) / 16 = 236 minutes.

In addition to the normal writes, the garbage collector moves the still-valid data from old
sectors to new ones.
As we are using the same ID and a big partition size, no data will be moved by the garbage
collector in this case.
For storage devices that could be written 20,000 times, the storage will last about
4,720,000 minutes (~9 years).

To derive a more general formula, we must first compute the effective size used in ZMS by our
typical set of data:

* For an ID/data pair with data <= 8 bytes, effective_size is 16 bytes
* For an ID/data pair with data > 8 bytes, effective_size is 16 bytes + sizeof(data)

Let's suppose that total_effective_size is the total size of the set of data that is written
in the storage, and that the partition is well dimensioned (double the effective size) to
avoid having the garbage collector moving blocks all the time.

The expected life of the device in minutes is computed as::

   (SECTOR_EFFECTIVE_SIZE * SECTOR_NUMBER * MAX_NUM_WRITES) / (TOTAL_EFFECTIVE_SIZE * WR_MIN)

Where:

``SECTOR_EFFECTIVE_SIZE``: the sector size minus the header size (80 bytes)

``SECTOR_NUMBER``: the number of sectors

``MAX_NUM_WRITES``: the life expectancy of the storage device in number of writes

``TOTAL_EFFECTIVE_SIZE``: the total effective size of the set of written data

``WR_MIN``: the number of writes of the set of data per minute
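
Plugging in the numbers from the scenario above (SECTOR_EFFECTIVE_SIZE = 1024 - 80 = 944,
SECTOR_NUMBER = 4, MAX_NUM_WRITES = 20000, TOTAL_EFFECTIVE_SIZE = 16, WR_MIN = 1) gives
(944 * 4 * 20000) / (16 * 1) = 4,720,000 minutes, matching the ~9 years computed earlier.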

Sample
******

A sample of how ZMS can be used is supplied in ``samples/subsys/fs/zms``.

API Reference
*************

The ZMS subsystem APIs are provided by ``zms.h``:

.. doxygengroup:: zms_data_structures

.. doxygengroup:: zms_high_level_api

.. comment
   not documenting .. doxygengroup:: zms
