33===========================
44
55Although the source code is the ultimate guide, this document helps
6- new developers to get up to speed with the implementation details.
6+ users and
7+ new developers get up to speed with the implementation details.
78
89Introduction
910------------
@@ -12,7 +13,7 @@ Swift offers something called a *container*, which we use interchangeably with
1213the term *bucket*, so we say that RGW's buckets implement Swift containers.
1314
1415This document does not consider how RGW operates on these structures,
15- e.g. the use of encode() and decode() methods for serialization and so on.
16+ e.g. the use of ``encode()`` and ``decode()`` methods for serialization and so on.
1617
1718Conceptual View
1819---------------
@@ -24,8 +25,8 @@ metadata, bucket index, and data.
2425Metadata
2526^^^^^^^^
2627
27- We have 3 'sections' of metadata: 'user', 'bucket', and 'bucket.instance'.
28- You can use the following commands to introspect metadata entries: ::
28+ We have three 'sections' of metadata: 'user', 'bucket', and 'bucket.instance'.
29+ You can use the following commands to inspect metadata entries: ::
2930
3031 $ radosgw-admin metadata list
3132 $ radosgw-admin metadata list bucket
@@ -38,40 +39,40 @@ You can use the following commands to introspect metadata entries: ::
3839
3940Some variables have been used in the above commands. They are:
4041
41- - user : Holds user information
42- - bucket : Holds a mapping between bucket name and bucket instance id
43- - bucket.instance : Holds bucket instance information[2]
42+ - *user* : Holds user information
43+ - *bucket* : Holds a mapping between bucket name and bucket instance ID
44+ - *bucket.instance* : Holds bucket instance information[2]
4445
45- Every metadata entry is kept on a single RADOS object. See below for implementation details.
46+ Each metadata entry is kept on a single RADOS object. See below for implementation details.
4647
4748Note that the metadata is not indexed. When listing a metadata section we do a
4849RADOS ``pgls`` operation on the containing pool.
4950
5051Bucket Index
5152^^^^^^^^^^^^
5253
53- It's a different kind of metadata, and kept separately. The bucket index holds
54- a key-value map in RADOS objects. By default it is a single RADOS object per
54+ The bucket index is a different kind of metadata, and is kept separately. The bucket index holds
55+ a key-value map attached to RADOS objects. By default it is a single RADOS object per
5556bucket, but it is possible since Hammer to shard that map over multiple RADOS
5657objects. The map itself is kept in omap, associated with each RADOS object.
57- The key of each omap is the name of the objects , and the value holds some basic
58+ The key of each omap is the name of the object, and the value holds some basic
5859metadata of that object -- metadata that shows up when listing the bucket.
5960Also, each omap holds a header, and we keep some bucket accounting metadata
6061in that header (number of objects, total size, etc.).
6162
62- Note that we also hold other information in the bucket index, and it's kept in
63+ Note that we also hold other information in the bucket index, which is kept in
6364other key namespaces. We can hold the bucket index log there, and for versioned
6465objects there is more information that we keep on other keys.
6566
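The bucket index description above can be modeled with a small toy structure. This is a minimal sketch only: the class ``IndexObject``, its field names, and the entry layout are hypothetical and do not reflect RGW's actual encoding. It shows one omap with per-object entries plus a header that carries the accounting totals.

```python
from dataclasses import dataclass, field


@dataclass
class IndexObject:
    """Toy stand-in for one bucket index RADOS object (not RGW's real format)."""

    # Header: bucket accounting metadata (hypothetical field names).
    header: dict = field(default_factory=lambda: {"num_objects": 0, "total_size": 0})
    # Omap: object name -> basic metadata shown when listing the bucket.
    omap: dict = field(default_factory=dict)

    def add_entry(self, key: str, size: int, etag: str) -> None:
        # Undo the accounting of an existing key so overwrites are not double-counted.
        old = self.omap.get(key)
        if old is not None:
            self.header["num_objects"] -= 1
            self.header["total_size"] -= old["size"]
        self.omap[key] = {"size": size, "etag": etag}
        self.header["num_objects"] += 1
        self.header["total_size"] += size

    def list_keys(self) -> list:
        # This model returns keys in sorted order.
        return sorted(self.omap)


index = IndexObject()
index.add_entry("image.png", 2048, "etag-1")
index.add_entry("docs/readme.txt", 512, "etag-2")
print(index.list_keys(), index.header)
```

A sharded index would, under the same assumptions, simply be several such objects, each holding part of the key space.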
6667Data
6768^^^^
6869
69- Objects data is kept in one or more RADOS objects for each rgw object.
70+ Object data is kept in one or more RADOS objects for each RGW object.
7071
7172Object Lookup Path
7273------------------
7374
74- When accessing objects, REST APIs come to RGW with three parameters:
75+ When accessing S3/Swift objects, REST APIs come to RGW with three parameters:
7576account information (access key in S3 or account name in Swift),
7677bucket or container name, and object name (or key). At present, RGW only
7778uses account information to find out the user ID and for access control.
@@ -81,57 +82,64 @@ The user ID in RGW is a string, typically the actual user name from the user
8182credentials and not a hashed or mapped identifier.
8283
8384When accessing a user's data, the user record is loaded from an object
84- " <user_id>" in pool " default.rgw.meta" with namespace " users.uid" .
85+ named ``<user_id>`` in pool ``default.rgw.meta`` with namespace ``users.uid``.
8586
86- Bucket names are represented in the pool " default.rgw.meta" with namespace
87- " root". Bucket record is
88- loaded in order to obtain so-called marker, which serves as a bucket ID.
87+ Bucket names are represented in the pool ``default.rgw.meta`` with namespace
88+ ``root``. The bucket record is
89+ loaded in order to obtain the so-called marker, which serves as a bucket ID.
8990
90- The object is located in pool " default.rgw.buckets.data" .
91- Object name is " <marker>_<key>" ,
92- for example " default.7593.4_image.png" , where the marker is " default.7593.4"
93- and the key is " image.png" . Since these concatenated names are not parsed,
91+ S3/Swift objects are located in a pool with a name like ``default.rgw.buckets.data``.
92+ RADOS object names have the form ``<marker>_<key>``,
93+ for example ``default.7593.4_image.png``, where the marker is ``default.7593.4``
94+ and the key is ``image.png``. Since these concatenated names are not parsed,
9495only passed down to RADOS, the choice of the separator is not important and
9596causes no ambiguity. For the same reason, slashes are permitted in object
9697names (keys).
9798
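As a minimal sketch of the naming rule described above, the following Python snippet builds a RADOS object name from a marker and a key. The function name ``rados_object_name`` is hypothetical and is not part of RGW. It only illustrates that the name is a plain concatenation that is never parsed back, which is why slashes in keys are harmless.

```python
def rados_object_name(marker: str, key: str) -> str:
    """Illustrative only: join a bucket marker and an object key with '_'."""
    return f"{marker}_{key}"


# Values taken from the example in the text.
print(rados_object_name("default.7593.4", "image.png"))
# A key containing slashes is passed through unchanged.
print(rados_object_name("default.7593.4", "photos/2024/image.png"))
```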
98- It is also possible to create multiple data pools and make it so that
99+ It is possible to create multiple data pools and make it so that
99100different users' buckets will be created in different RADOS pools by default,
100101thus providing the necessary scaling. The layout and naming of these pools
101102is controlled by a 'policy' setting.[3]
102103
103- An RGW object may consist of several RADOS objects, the first of which
104- is the head that contains the metadata, such as manifest, ACLs, content type,
104+ An RGW object may comprise multiple RADOS objects, the first of which
105+ is the ``HEAD`` that contains metadata including manifest, ACLs, content type,
105106ETag, and user-defined metadata. The metadata is stored in xattrs.
106- The head may also contain up to :confval: `rgw_max_chunk_size ` of object data, for efficiency
107- and atomicity. The manifest describes how each object is laid out in RADOS
107+ The ``HEAD`` object may also inline up to :confval:`rgw_max_chunk_size` of object data, for efficiency
108+ and atomicity. This enables a convenient tiering strategy: index pools
109+ are necessarily replicated (cannot be EC) and should be placed on fast SSD
110+ OSDs. With a mix of small/hot RGW objects and larger, warm/cold RGW
111+ objects like video files, the larger objects will automatically be placed
112+ in the ``buckets.data`` pool, which may be EC and/or slower storage like
113+ HDDs or QLC SSDs.
114+
115+ The manifest describes how each RGW object is laid out across RADOS
108116objects.
109117
110118Bucket and Object Listing
111119-------------------------
112120
113- Buckets that belong to a given user are listed in an omap of an object named
114- " <user_id>.buckets" (for example, " foo.buckets" ) in pool " default.rgw.meta"
115- with namespace " users.uid" .
121+ Buckets that belong to a given user are listed in an omap of a RADOS object named
122+ ``<user_id>.buckets`` (for example, ``foo.buckets``) in pool ``default.rgw.meta``
123+ with namespace ``users.uid``.
116124These objects are accessed when listing buckets, when updating bucket
117125contents, and when updating and retrieving bucket statistics (e.g. for quota).
118126
119- See the user-visible, encoded class ' cls_user_bucket_entry' and its
120- nested class ' cls_user_bucket' for the values of these omap entries.
127+ See the user-visible, encoded class ``cls_user_bucket_entry`` and its
128+ nested class ``cls_user_bucket`` for the values of these omap entries.
121129
122- These listings are kept consistent with buckets in pool " .rgw" .
130+ These listings are kept consistent with buckets in the pool named ``.rgw``.
123131
124132Objects that belong to a given bucket are listed in a bucket index,
125133as discussed in sub-section 'Bucket Index' above. The default naming
126- for index objects is " .dir.<marker>" in pool " default.rgw.buckets.index" .
134+ for index objects is ``.dir.<marker>`` in pool ``default.rgw.buckets.index``.
127135
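The two listing lookups described in this section can be sketched as follows. This is an illustration under stated assumptions, not RGW code: the helper names ``user_buckets_object`` and ``index_object`` are hypothetical, the pool and namespace strings are the defaults quoted in the text, and only the unsharded index name is shown.

```python
# Default locations quoted in the text above.
META_POOL = "default.rgw.meta"
USERS_UID_NAMESPACE = "users.uid"
INDEX_POOL = "default.rgw.buckets.index"


def user_buckets_object(user_id: str) -> tuple:
    """Return (pool, namespace, object name) holding a user's bucket list omap."""
    return (META_POOL, USERS_UID_NAMESPACE, f"{user_id}.buckets")


def index_object(marker: str) -> tuple:
    """Return (pool, object name) of the unsharded bucket index for a marker."""
    return (INDEX_POOL, f".dir.{marker}")


print(user_buckets_object("foo"))
print(index_object("default.7593.4"))
```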
128136Footnotes
129137---------
130138
131139[1] Omap is a key-value store, associated with an object, in a way similar
132- to how Extended Attributes associate with a POSIX file. An object's omap
133- is not physically located in the object's storage, but its precise
134- implementation is invisible and immaterial to RADOS Gateway .
140+ to how Extended Attributes (XATTRs) are associated with a POSIX file. An object's omap
141+ is not physically colocated with the object's payload data, and its precise
142+ implementation is invisible and immaterial to RGW daemons.
135143
136144[2] Before the Dumpling release, the 'bucket.instance' metadata did not
137145exist and the 'bucket' metadata contained its information. It is possible
@@ -140,25 +148,25 @@ to encounter such buckets in old installations.
140148[3] Pool names changed with the Infernalis release.
141149If you are looking at an older setup, some details may be different. In
142150particular there was a different pool for each of the namespaces that are
143- now being used inside the ``default.root.meta `` pool.
151+ now combined inside the ``default.root.meta`` pool.
144152
145153Appendix: Compendium
146154--------------------
147155
148156Known pools:
149157
150- .rgw.root
151- Unspecified region , zone, and global information records, one per object.
158+ ``.rgw.root``
159+ Region, zone, and global information records, one per object.
152160
153- <zone>.rgw.control
161+ ``<zone>.rgw.control``
154162 notify.<N>
155163
156- <zone>.rgw.meta
164+ ``<zone>.rgw.meta``
157165 Multiple namespaces with different kinds of metadata:
158166
159- namespace: root
167+ namespace: ``root``
160168 <bucket>
161- .bucket.meta.<bucket>:<marker> # see put_bucket_instance_info()
169+ ``.bucket.meta.<bucket>:<marker>`` # see put_bucket_instance_info()
162170
163171 The tenant is used to disambiguate buckets, but not bucket instances.
164172 Example::
@@ -170,7 +178,7 @@ Known pools:
170178 prodtx/test%25star
171179 testcont
172180
173- namespace: users.uid
181+ namespace: ``users.uid``
174182 Contains _both_ per-user information (RGWUserInfo) in "<user>" objects
175183 and per-user lists of buckets in omaps of "<user>.buckets" objects.
176184 The "<user>" may contain the tenant if non-empty, for example::
@@ -180,27 +188,27 @@ Known pools:
180188 prodtx$prodt.buckets
181189 test2
182190
183- namespace: users.email
191+ namespace: ``users.email``
184192 Unimportant
185193
186- namespace: users.keys
187- 47UA98JSTJZ9YAN3OS3O
194+ namespace: ``users.keys``
195+ example: ``47UA98JSTJZ9YAN3OS3O``
188196
189197 This allows ``radosgw`` to look up users by their access keys during authentication.
190198
191- namespace: users.swift
199+ namespace: ``users.swift``
192200 test:tester
193201
194- <zone>.rgw.buckets.index
195- Objects are named " .dir.<marker>", each contains a bucket index.
202+ ``<zone>.rgw.buckets.index``
203+ Objects are named ``.dir.<marker>``; each contains a bucket index.
196204 If the index is sharded, each shard appends the shard index after
197205 the marker.
198206
199- <zone>.rgw.buckets.data
200- default.7593.4__shadow_.488urDFerTYXavx4yAd-Op8mxehnvTI_1
207+ ``<zone>.rgw.buckets.data``
208+ example: ``default.7593.4__shadow_.488urDFerTYXavx4yAd-Op8mxehnvTI_1``
201209 <marker>_<key>
202210
203- An example of a marker would be " default.16004.1" or " default.7593.4" .
204- The current format is " <zone>.<instance_id>.<bucket_id>" . But once
211+ An example of a marker would be `` default.16004.1 `` or ` default.7593.4`` .
212+ The current format is `` <zone>.<instance_id>.<bucket_id> `` . But once
205213generated, a marker is not parsed again, so its format may change
206214freely in the future.