@@ -23,12 +23,11 @@ The difference between *pool snaps* and *self managed snaps* from the
2323OSD's point of view lies in whether the *SnapContext* comes to the OSD
2424via the client's MOSDOp or via the most recent OSDMap.
2525
26- See OSD::make_writeable
26+ See :ref:`manifest.rst <osd-make-writeable>` for more information.
2727
2828Ondisk Structures
2929-----------------
30- Each object has in the PG collection a *head* object (or *snapdir*, which we
31- will come to shortly) and possibly a set of *clone* objects.
30+ Each object has in the PG collection a *head* object and possibly a set of *clone* objects.
3231Each hobject_t has a snap field. For the *head* (the only writeable version
3332of an object), the snap field is set to CEPH_NOSNAP. For the *clones*, the
3433snap field is set to the *seq* of the *SnapContext* at their creation.
@@ -47,8 +46,12 @@ The *head* object contains a *SnapSet* encoded in an attribute, which tracks
4746 3. Overlapping intervals between clones for tracking space usage
4847 4. Clone size
4948
50- If the *head* is deleted while there are still clones, a *snapdir* object
51- is created instead to house the *SnapSet*.
49+ The *head* can't be deleted while there are still clones. Instead, it is
50+ marked as whiteout (``object_info_t::FLAG_WHITEOUT``) in order to house the
51+ *SnapSet* contained in it.
52+ In that case, the *head* object no longer logically exists.
53+
54+ See: should_whiteout()
5255
5356Additionally, the *object_info_t* on each clone includes a vector of snaps
5457for which clone is defined.
@@ -126,3 +129,111 @@ up to 8 prefixes need to be checked to determine all hobjects in a particular
126129snap for a particular PG. Upon split, the prefixes to check on the parent
127130are adjusted such that only the objects remaining in the PG will be visible.
128131The children will immediately have the correct mapping.
132+
133+ clone_overlap
134+ -------------
135+ Each SnapSet attached to the *head* object contains the overlapping intervals
136+ between clone objects, which are used to optimize space usage.
137+ The overlapping intervals are stored in the ``clone_overlap`` map. Each element in the
138+ map stores a snap ID and the corresponding overlap with the next newest clone.
139+
140+ Consider the following example, which uses a 4-byte object:
141+
142+ +--------+---------+
143+ | object | content |
144+ +========+=========+
145+ | head | [AAAA] |
146+ +--------+---------+
147+
148+ listsnaps output is as follows:
149+
150+ +---------+-------+------+---------+
151+ | cloneid | snaps | size | overlap |
152+ +=========+=======+======+=========+
153+ | head | - | 4 | |
154+ +---------+-------+------+---------+
155+
156+ After taking a snapshot (ID 1) and re-writing the first 2 bytes of the object,
157+ the clone created will overlap with the new *head* object in its last 2 bytes.
158+
159+ +------------+---------+
160+ | object | content |
161+ +============+=========+
162+ | head | [BBAA] |
163+ +------------+---------+
164+ | clone ID 1 | [AAAA] |
165+ +------------+---------+
166+
167+ +---------+-------+------+---------+
168+ | cloneid | snaps | size | overlap |
169+ +=========+=======+======+=========+
170+ | 1 | 1 | 4 | [2~2] |
171+ +---------+-------+------+---------+
172+ | head | - | 4 | |
173+ +---------+-------+------+---------+
174+
175+ After taking another snapshot (ID 2) and this time re-writing only the first byte of the object,
176+ the clone created (ID 2) will overlap with the new *head* object in its last 3 bytes.
177+ The oldest clone (ID 1) will still overlap with the newest clone (ID 2) in its last 2 bytes.
178+
179+ +------------+---------+
180+ | object | content |
181+ +============+=========+
182+ | head | [CBAA] |
183+ +------------+---------+
184+ | clone ID 2 | [BBAA] |
185+ +------------+---------+
186+ | clone ID 1 | [AAAA] |
187+ +------------+---------+
188+
189+ +---------+-------+------+---------+
190+ | cloneid | snaps | size | overlap |
191+ +=========+=======+======+=========+
192+ | 1 | 1 | 4 | [2~2] |
193+ +---------+-------+------+---------+
194+ | 2 | 2 | 4 | [1~3] |
195+ +---------+-------+------+---------+
196+ | head | - | 4 | |
197+ +---------+-------+------+---------+
198+
199+ If the *head* object is then completely re-written (all 4 bytes),
200+ the only overlap that remains is the one between the two clones.
201+
202+ +------------+---------+
203+ | object | content |
204+ +============+=========+
205+ | head | [DDDD] |
206+ +------------+---------+
207+ | clone ID 2 | [BBAA] |
208+ +------------+---------+
209+ | clone ID 1 | [AAAA] |
210+ +------------+---------+
211+
212+ +---------+-------+------+---------+
213+ | cloneid | snaps | size | overlap |
214+ +=========+=======+======+=========+
215+ | 1 | 1 | 4 | [2~2] |
216+ +---------+-------+------+---------+
217+ | 2 | 2 | 4 | |
218+ +---------+-------+------+---------+
219+ | head | - | 4 | |
220+ +---------+-------+------+---------+
221+
222+ Lastly, after the last snap (ID 2) is removed and snaptrim kicks in,
223+ no overlapping intervals will remain:
224+
225+ +------------+---------+
226+ | object | content |
227+ +============+=========+
228+ | head | [DDDD] |
229+ +------------+---------+
230+ | clone ID 1 | [AAAA] |
231+ +------------+---------+
232+
233+ +---------+-------+------+---------+
234+ | cloneid | snaps | size | overlap |
235+ +=========+=======+======+=========+
236+ | 1 | 1 | 4 | |
237+ +---------+-------+------+---------+
238+ | head | - | 4 | |
239+ +---------+-------+------+---------+
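
The walkthrough in the added ``clone_overlap`` section can be replayed with a small toy model. The sketch below is not Ceph code: ``ToySnapSet``, ``subtract``, ``intersect`` and ``fmt`` are hypothetical names introduced only for illustration. It assumes that an interval printed as ``[2~2]`` means offset 2, length 2, that a new clone starts out overlapping the whole object and loses every range subsequently written to the *head*, and that trimming a clone intersects its overlap into the next older clone. The last assumption is consistent with the final table of the example, but it is not spelled out in the text.

```python
# Toy model of the clone_overlap bookkeeping from the example.
# All names here are hypothetical; this is a sketch, not the OSD implementation.

def subtract(intervals, off, length):
    """Remove the byte range [off, off+length) from (offset, length) intervals."""
    end = off + length
    out = []
    for o, l in intervals:
        e = o + l
        if o < off:                      # part left of the written range survives
            out.append((o, min(e, off) - o))
        if e > end:                      # part right of the written range survives
            s = max(o, end)
            out.append((s, e - s))
    return out

def intersect(a, b):
    """Intersection of two lists of (offset, length) intervals."""
    out = []
    for ao, al in a:
        for bo, bl in b:
            s, e = max(ao, bo), min(ao + al, bo + bl)
            if s < e:
                out.append((s, e - s))
    return sorted(out)

def fmt(intervals):
    """Render intervals in the offset~length notation used by the tables."""
    return "[" + ",".join(f"{o}~{l}" for o, l in intervals) + "]"

class ToySnapSet:
    def __init__(self, content):
        self.head = bytearray(content)
        self.clones = {}         # snap id -> frozen content of the clone
        self.clone_overlap = {}  # snap id -> overlap with the next newest clone (or head)
        self.pending_snap = None

    def snapshot(self, snap_id):
        # The clone is only created by the first write after the snapshot.
        self.pending_snap = snap_id

    def write(self, off, data):
        if self.pending_snap is not None:
            snap = self.pending_snap
            self.clones[snap] = bytes(self.head)
            self.clone_overlap[snap] = [(0, len(self.head))]  # full overlap at creation
            self.pending_snap = None
        if self.clones:
            newest = max(self.clones)
            self.clone_overlap[newest] = subtract(
                self.clone_overlap[newest], off, len(data))
        self.head[off:off + len(data)] = data

    def trim(self, snap_id):
        # Assumption: the next older clone keeps only what it shared with the
        # trimmed clone AND what the trimmed clone shared with its successor.
        older = [s for s in self.clones if s < snap_id]
        if older:
            prev = max(older)
            self.clone_overlap[prev] = intersect(
                self.clone_overlap[prev], self.clone_overlap[snap_id])
        del self.clones[snap_id]
        del self.clone_overlap[snap_id]

s = ToySnapSet(b"AAAA")
s.snapshot(1); s.write(0, b"BB")    # head [BBAA], clone 1 [AAAA]
s.snapshot(2); s.write(0, b"C")     # head [CBAA], clone 2 [BBAA]
print({k: fmt(v) for k, v in s.clone_overlap.items()})
s.write(0, b"DDDD")                 # head fully re-written
s.trim(2)                           # snap 2 removed, clone 2 trimmed
print({k: fmt(v) for k, v in s.clone_overlap.items()})
```

An empty interval list is printed here as ``[]``, whereas the listsnaps tables leave the overlap column blank. Writes that change the object size are not modelled.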