Commit f19765e

Merge pull request ceph#56173 from Matan-B/wip-matanb-clone-overlap-doc
doc/dev: osd_internals/snaps.rst: add clone_overlap doc

Reviewed-by: Ilya Dryomov <[email protected]>
Reviewed-by: Radosław Zarzyński <[email protected]>
Reviewed-by: Anthony D'Atri <[email protected]>
Reviewed-by: Greg Farnum <[email protected]>
Reviewed-by: Zac Dover <[email protected]>
2 parents d3ec86a + 6a7f2b0 commit f19765e

2 files changed, 118 additions and 5 deletions

doc/dev/osd_internals/manifest.rst

Lines changed: 2 additions & 0 deletions
@@ -218,6 +218,8 @@ we may want to exploit.
 The dedup-tool needs to be updated to use ``LIST_SNAPS`` to discover
 clones as part of leak detection.

+.. _osd-make-writeable:
+
 An important question is how we deal with the fact that many clones
 will frequently have references to the same backing chunks at the same
 offset. In particular, ``make_writeable`` will generally create a clone

doc/dev/osd_internals/snaps.rst

Lines changed: 116 additions & 5 deletions
@@ -23,12 +23,11 @@ The difference between *pool snaps* and *self managed snaps* from the
 OSD's point of view lies in whether the *SnapContext* comes to the OSD
 via the client's MOSDOp or via the most recent OSDMap.

-See OSD::make_writeable
+See :ref:`manifest.rst <osd-make-writeable>` for more information.

 Ondisk Structures
 -----------------
-Each object has in the PG collection a *head* object (or *snapdir*, which we
-will come to shortly) and possibly a set of *clone* objects.
+Each object has in the PG collection a *head* object and possibly a set of *clone* objects.
 Each hobject_t has a snap field. For the *head* (the only writeable version
 of an object), the snap field is set to CEPH_NOSNAP. For the *clones*, the
 snap field is set to the *seq* of the *SnapContext* at their creation.
@@ -47,8 +46,12 @@ The *head* object contains a *SnapSet* encoded in an attribute, which tracks
 3. Overlapping intervals between clones for tracking space usage
 4. Clone size

-If the *head* is deleted while there are still clones, a *snapdir* object
-is created instead to house the *SnapSet*.
+The *head* can't be deleted while there are still clones. Instead, it is
+marked as whiteout (``object_info_t::FLAG_WHITEOUT``) in order to house the
+*SnapSet* contained in it.
+In that case, the *head* object no longer logically exists.
+
+See: should_whiteout()

 Additionally, the *object_info_t* on each clone includes a vector of snaps
 for which clone is defined.
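
The whiteout rule added in this hunk can be pictured with a small toy model. This is only a sketch of the behaviour the text describes, not the OSD's ``should_whiteout()`` logic, which may take further conditions into account. ``ToyHead``, ``delete_head`` and the bit value chosen for ``FLAG_WHITEOUT`` are hypothetical names and values introduced here for illustration.

```python
from dataclasses import dataclass, field

# Illustrative bit only; not copied from the Ceph headers.
FLAG_WHITEOUT = 1 << 1


@dataclass
class ToyHead:
    """Toy stand-in for a head object carrying the SnapSet."""
    clones: list = field(default_factory=list)  # clone IDs listed in the SnapSet
    flags: int = 0
    on_disk: bool = True  # whether the head object is still stored

    def logically_exists(self) -> bool:
        # A whiteout head is kept on disk but is not visible as an object.
        return self.on_disk and not (self.flags & FLAG_WHITEOUT)


def delete_head(head: ToyHead) -> None:
    """Delete the head, or whiteout it if clones still need its SnapSet."""
    if head.clones:
        head.flags |= FLAG_WHITEOUT  # keep the object to house the SnapSet
    else:
        head.on_disk = False  # nothing depends on it, remove it


with_clones = ToyHead(clones=[1])
delete_head(with_clones)
print(with_clones.on_disk, with_clones.logically_exists())  # True False
```

A head without clones is simply removed in this model, which is the case where no *SnapSet* has to be preserved.
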
@@ -126,3 +129,111 @@ up to 8 prefixes need to be checked to determine all hobjects in a particular
 snap for a particular PG. Upon split, the prefixes to check on the parent
 are adjusted such that only the objects remaining in the PG will be visible.
 The children will immediately have the correct mapping.
+
+clone_overlap
+-------------
+Each SnapSet attached to the *head* object contains the overlapping intervals
+between clone objects for optimizing space.
+The overlapping intervals are stored within the ``clone_overlap`` map, each element in the
+map stores the snap ID and the corresponding overlap with the next newest clone.
+
+See the following example using a 4 byte object:
+
++--------+---------+
+| object | content |
++========+=========+
+| head   | [AAAA]  |
++--------+---------+
+
+listsnaps output is as follows:
+
++---------+-------+------+---------+
+| cloneid | snaps | size | overlap |
++=========+=======+======+=========+
+| head    | -     | 4    |         |
++---------+-------+------+---------+
+
+After taking a snapshot (ID 1) and re-writing the first 2 bytes of the object,
+the clone created will overlap with the new *head* object in its last 2 bytes.
+
++------------+---------+
+| object     | content |
++============+=========+
+| head       | [BBAA]  |
++------------+---------+
+| clone ID 1 | [AAAA]  |
++------------+---------+
+
++---------+-------+------+---------+
+| cloneid | snaps | size | overlap |
++=========+=======+======+=========+
+| 1       | 1     | 4    | [2~2]   |
++---------+-------+------+---------+
+| head    | -     | 4    |         |
++---------+-------+------+---------+
+
+By taking another snapshot (ID 2) and this time re-writing only the first 1 byte of the object,
+the clone created (ID 2) will overlap with the new *head* object in its last 3 bytes.
+While the oldest clone (ID 1) will overlap with the newest clone in its last 2 bytes.
+
++------------+---------+
+| object     | content |
++============+=========+
+| head       | [CBAA]  |
++------------+---------+
+| clone ID 2 | [BBAA]  |
++------------+---------+
+| clone ID 1 | [AAAA]  |
++------------+---------+
+
++---------+-------+------+---------+
+| cloneid | snaps | size | overlap |
++=========+=======+======+=========+
+| 1       | 1     | 4    | [2~2]   |
++---------+-------+------+---------+
+| 2       | 2     | 4    | [1~3]   |
++---------+-------+------+---------+
+| head    | -     | 4    |         |
++---------+-------+------+---------+
+
+If the *head* object will be completely re-written by re-writing 4 bytes,
+the only existing overlap that will remain will be between the two clones.
+
++------------+---------+
+| object     | content |
++============+=========+
+| head       | [DDDD]  |
++------------+---------+
+| clone ID 2 | [BBAA]  |
++------------+---------+
+| clone ID 1 | [AAAA]  |
++------------+---------+
+
++---------+-------+------+---------+
+| cloneid | snaps | size | overlap |
++=========+=======+======+=========+
+| 1       | 1     | 4    | [2~2]   |
++---------+-------+------+---------+
+| 2       | 2     | 4    |         |
++---------+-------+------+---------+
+| head    | -     | 4    |         |
++---------+-------+------+---------+
+
+Lastly, after the last snap (ID 2) is removed and snaptrim kicks in,
+no overlapping intervals will remain:
+
++------------+---------+
+| object     | content |
++============+=========+
+| head       | [DDDD]  |
++------------+---------+
+| clone ID 1 | [AAAA]  |
++------------+---------+
+
++---------+-------+------+---------+
+| cloneid | snaps | size | overlap |
++=========+=======+======+=========+
+| 1       | 1     | 4    |         |
++---------+-------+------+---------+
+| head    | -     | 4    |         |
++---------+-------+------+---------+
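
The walkthrough in the added ``clone_overlap`` section can be replayed with a small toy model. This is a sketch under stated assumptions rather than the OSD's code: ``ToyObject`` and ``to_extents`` are hypothetical names, and overlaps are kept as plain sets of byte offsets instead of interval sets. The model assumes three rules that reproduce the tables above: a new clone starts out fully overlapping the *head*, every write removes the written range from the newest clone's overlap, and trimming a clone intersects its overlap into the next older clone.

```python
def to_extents(offsets):
    """Render a set of byte offsets as 'offset~length' extents, e.g. [2~2]."""
    extents = []
    start = prev = None
    for off in sorted(offsets):
        if start is None:
            start = prev = off
        elif off == prev + 1:
            prev = off
        else:
            extents.append(f"{start}~{prev - start + 1}")
            start = prev = off
    if start is not None:
        extents.append(f"{start}~{prev - start + 1}")
    return "[" + ",".join(extents) + "]" if extents else ""


class ToyObject:
    """Toy head object with clones and a clone_overlap map (illustrative only)."""

    def __init__(self, content):
        self.head = list(content)
        self.clones = {}         # clone ID -> content at clone time
        self.clone_overlap = {}  # clone ID -> offsets shared with the next newer version
        self.pending_snap = None

    def snapshot(self, snap_id):
        # The clone itself is only created by the next write (copy-on-write).
        self.pending_snap = snap_id

    def write(self, offset, data):
        if self.pending_snap is not None:
            self.clones[self.pending_snap] = list(self.head)
            self.clone_overlap[self.pending_snap] = set(range(len(self.head)))
            self.pending_snap = None
        if self.clones:
            newest = max(self.clones)
            self.clone_overlap[newest] -= set(range(offset, offset + len(data)))
        self.head[offset:offset + len(data)] = data

    def trim(self, snap_id):
        # The next older clone now neighbours whatever came after the trimmed one.
        older = [s for s in self.clones if s < snap_id]
        if older:
            self.clone_overlap[max(older)] &= self.clone_overlap[snap_id]
        del self.clones[snap_id]
        del self.clone_overlap[snap_id]

    def overlap(self, snap_id):
        return to_extents(self.clone_overlap[snap_id])


obj = ToyObject("AAAA")
obj.snapshot(1)
obj.write(0, "BB")
print(obj.overlap(1))                  # [2~2]
obj.snapshot(2)
obj.write(0, "C")
print(obj.overlap(1), obj.overlap(2))  # [2~2] [1~3]
obj.write(0, "DDDD")
obj.trim(2)
print(repr(obj.overlap(1)))            # ''
```

Per-byte sets keep the sketch short; for real object sizes an interval representation would be the natural choice.
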
