Commit 6869996

doc/radosgw: Improve layout.rst
Signed-off-by: Anthony D'Atri <[email protected]>
1 parent 3383af5 commit 6869996

File tree

1 file changed

+63
-55
lines changed

doc/radosgw/layout.rst

Lines changed: 63 additions & 55 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,8 @@
33
===========================
44

55
Although the source code is the ultimate guide, this document helps
6-
new developers to get up to speed with the implementation details.
6+
users and
7+
new developers get up to speed with the implementation details.
78

89
Introduction
910
------------
@@ -12,7 +13,7 @@ Swift offers something called a *container*, which we use interchangeably with
1213
the term *bucket*, so we say that RGW's buckets implement Swift containers.
1314

1415
This document does not consider how RGW operates on these structures,
15-
e.g. the use of encode() and decode() methods for serialization and so on.
16+
e.g. the use of ``encode()`` and ``decode()`` methods for serialization and so on.
1617
1718
Conceptual View
1819
---------------
@@ -24,8 +25,8 @@ metadata, bucket index, and data.
2425
Metadata
2526
^^^^^^^^
2627
27-
We have 3 'sections' of metadata: 'user', 'bucket', and 'bucket.instance'.
28-
You can use the following commands to introspect metadata entries: ::
28+
We have three 'sections' of metadata: 'user', 'bucket', and 'bucket.instance'.
29+
You can use the following commands to inspect metadata entries: ::
2930
3031
$ radosgw-admin metadata list
3132
$ radosgw-admin metadata list bucket
@@ -38,40 +39,40 @@ You can use the following commands to introspect metadata entries: ::
3839
3940
The following variables are used in the commands above:
4041
41-
- user: Holds user information
42-
- bucket: Holds a mapping between bucket name and bucket instance id
43-
- bucket.instance: Holds bucket instance information[2]
42+
- *user*: Holds user information
43+
- *bucket*: Holds a mapping between bucket name and bucket instance id
44+
- *bucket.instance*: Holds bucket instance information[2]
4445
45-
Every metadata entry is kept on a single RADOS object. See below for implementation details.
46+
Each metadata entry is kept on a single RADOS object. See below for implementation details.
4647
4748
Note that the metadata is not indexed. When listing a metadata section we do a
4849
RADOS ``pgls`` operation on the containing pool.
4950

5051
Bucket Index
5152
^^^^^^^^^^^^
5253

53-
It's a different kind of metadata, and kept separately. The bucket index holds
54-
a key-value map in RADOS objects. By default it is a single RADOS object per
54+
The bucket index is a different kind of metadata, and is kept separately. The bucket index holds
55+
a key-value map attached to RADOS objects. By default it is a single RADOS object per
5556
bucket, but it is possible since Hammer to shard that map over multiple RADOS
5657
objects. The map itself is kept in omap, associated with each RADOS object.
57-
The key of each omap is the name of the objects, and the value holds some basic
58+
The key of each omap is the name of the object, and the value holds some basic
5859
metadata of that object -- metadata that shows up when listing the bucket.
5960
Also, each omap holds a header, and we keep some bucket accounting metadata
6061
in that header (number of objects, total size, etc.).
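
As a rough mental model only, the omap layout described above can be sketched
in Python as a dictionary of per-object entries plus a header that holds the
accounting totals. The class and field names (``ToyBucketIndex``,
``num_objects``, ``total_size``) are hypothetical and do not correspond to
RGW's actual encoded structures.

```python
class ToyBucketIndex:
    """Toy stand-in for one bucket index RADOS object (hypothetical names)."""

    def __init__(self):
        # omap entries: object name -> basic metadata shown when listing
        self.omap = {}
        # omap header: bucket accounting metadata
        self.header = {"num_objects": 0, "total_size": 0}

    def put(self, name, size, etag):
        """Add or overwrite an entry and keep the header totals in step."""
        old = self.omap.get(name)
        if old is not None:
            # Overwrite: back out the old entry's contribution first.
            self.header["num_objects"] -= 1
            self.header["total_size"] -= old["size"]
        self.omap[name] = {"size": size, "etag": etag}
        self.header["num_objects"] += 1
        self.header["total_size"] += size

    def remove(self, name):
        """Delete an entry and subtract it from the header totals."""
        old = self.omap.pop(name)
        self.header["num_objects"] -= 1
        self.header["total_size"] -= old["size"]

    def list(self):
        """Return object names in sorted key order."""
        return sorted(self.omap)
```

Listing the bucket in this model reads only the index entries, never the data
objects themselves, which mirrors the point that listing metadata lives in the
index.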
6162

62-
Note that we also hold other information in the bucket index, and it's kept in
63+
Note that we also hold other information in the bucket index, which is kept in
6364
other key namespaces. We can hold the bucket index log there, and for versioned
6465
objects there is more information that we keep on other keys.
6566

6667
Data
6768
^^^^
6869

69-
Objects data is kept in one or more RADOS objects for each rgw object.
70+
Object data is kept in one or more RADOS objects for each RGW object.
7071

7172
Object Lookup Path
7273
------------------
7374

74-
When accessing objects, REST APIs come to RGW with three parameters:
75+
When accessing S3/Swift objects, REST APIs come to RGW with three parameters:
7576
account information (access key in S3 or account name in Swift),
7677
bucket or container name, and object name (or key). At present, RGW only
7778
uses account information to find out the user ID and for access control.
@@ -81,57 +82,64 @@ The user ID in RGW is a string, typically the actual user name from the user
8182
credentials and not a hashed or mapped identifier.
8283

8384
When accessing a user's data, the user record is loaded from an object
84-
"<user_id>" in pool "default.rgw.meta" with namespace "users.uid".
85+
named ``<user_id>`` in pool ``default.rgw.meta`` with namespace ``users.uid``.
8586

86-
Bucket names are represented in the pool "default.rgw.meta" with namespace
87-
"root". Bucket record is
88-
loaded in order to obtain so-called marker, which serves as a bucket ID.
87+
Bucket names are represented in the pool ``default.rgw.meta`` with namespace
88+
``root``. The bucket record is
89+
loaded in order to obtain the so-called marker, which serves as a bucket ID.
8990

90-
The object is located in pool "default.rgw.buckets.data".
91-
Object name is "<marker>_<key>",
92-
for example "default.7593.4_image.png", where the marker is "default.7593.4"
93-
and the key is "image.png". Since these concatenated names are not parsed,
91+
S3/Swift objects are located in a pool named like ``default.rgw.buckets.data``.
92+
RADOS object names are ``<marker>_<key>``,
93+
for example ``default.7593.4_image.png``, where the marker is ``default.7593.4``
94+
and the key is ``image.png``. Since these concatenated names are not parsed,
9495
only passed down to RADOS, the choice of the separator is not important and
9596
causes no ambiguity. For the same reason, slashes are permitted in object
9697
names (keys).
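
The lookup path above can be summarized with a small Python sketch that only
builds the pool, namespace, and object names involved. The helper names and
the ``RadosLocation`` type are hypothetical, the zone is assumed to be
``default``, tenants are ignored, and the empty namespace for data objects is
an assumption rather than something stated here.

```python
from typing import NamedTuple


class RadosLocation(NamedTuple):
    pool: str
    namespace: str
    name: str


def user_record(user_id, zone="default"):
    # The user record is an object named <user_id> in namespace "users.uid".
    return RadosLocation(f"{zone}.rgw.meta", "users.uid", user_id)


def bucket_record(bucket, zone="default"):
    # Bucket names live in namespace "root"; the record yields the marker.
    return RadosLocation(f"{zone}.rgw.meta", "root", bucket)


def head_object(marker, key, zone="default"):
    # The name is a plain concatenation and is never parsed back,
    # so underscores or slashes in the key cause no ambiguity.
    # Assumption: data objects use the default (empty) namespace.
    return RadosLocation(f"{zone}.rgw.buckets.data", "", f"{marker}_{key}")


print(head_object("default.7593.4", "image.png").name)
```

With the marker and key from the example above, this prints
``default.7593.4_image.png``.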
9798

98-
It is also possible to create multiple data pools and make it so that
99+
It is possible to create multiple data pools and make it so that
99100
different users' buckets will be created in different RADOS pools by default,
100101
thus providing the necessary scaling. The layout and naming of these pools
101102
is controlled by a 'policy' setting.[3]
102103

103-
An RGW object may consist of several RADOS objects, the first of which
104-
is the head that contains the metadata, such as manifest, ACLs, content type,
104+
An RGW object may comprise multiple RADOS objects, the first of which
105+
is the ``HEAD`` that contains metadata including manifest, ACLs, content type,
105106
ETag, and user-defined metadata. The metadata is stored in xattrs.
106-
The head may also contain up to :confval:`rgw_max_chunk_size` of object data, for efficiency
107-
and atomicity. The manifest describes how each object is laid out in RADOS
107+
The ``HEAD`` object may also inline up to :confval:`rgw_max_chunk_size` of object data, for efficiency
108+
and atomicity. This enables a convenient tiering strategy: index pools
109+
are necessarily replicated (cannot be EC) and should be placed on fast SSD
110+
OSDs. With a mix of small/hot RGW objects and larger, warm/cold RGW
111+
objects like video files, the larger objects will automatically be placed
112+
in the ``buckets.data`` pool, which may be EC and/or slower storage like
113+
HDDs or QLC SSDs.
114+
115+
The manifest describes how each RGW object is laid out across RADOS
108116
objects.
109117

110118
Bucket and Object Listing
111119
-------------------------
112120

113-
Buckets that belong to a given user are listed in an omap of an object named
114-
"<user_id>.buckets" (for example, "foo.buckets") in pool "default.rgw.meta"
115-
with namespace "users.uid".
121+
Buckets that belong to a given user are listed in an omap of a RADOS object named
122+
``<user_id>.buckets`` (for example, ``foo.buckets``) in pool ``default.rgw.meta``
123+
with namespace ``users.uid``.
116124
These objects are accessed when listing buckets, when updating bucket
117125
contents, and updating and retrieving bucket statistics (e.g. for quota).
118126

119-
See the user-visible, encoded class 'cls_user_bucket_entry' and its
120-
nested class 'cls_user_bucket' for the values of these omap entries.
127+
See the user-visible, encoded class ``cls_user_bucket_entry`` and its
128+
nested class ``cls_user_bucket`` for the values of these omap entries.
121129

122-
These listings are kept consistent with buckets in pool ".rgw".
130+
These listings are kept consistent with buckets in the pool named ``.rgw``.
123131

124132
Objects that belong to a given bucket are listed in a bucket index,
125133
as discussed in sub-section 'Bucket Index' above. The default naming
126-
for index objects is ".dir.<marker>" in pool "default.rgw.buckets.index".
134+
for index objects is ``.dir.<marker>`` in pool ``default.rgw.buckets.index``.
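
As a minimal sketch of the naming described in this section, the two helpers
below build the RADOS object names consulted when listing a user's buckets and
a bucket's objects. The function names are hypothetical, the zone is assumed
to be ``default``, and sharded index names are deliberately left out.

```python
def user_bucket_list_object(user_id, zone="default"):
    """Return (pool, namespace, name) of the object whose omap lists a user's buckets."""
    return (f"{zone}.rgw.meta", "users.uid", f"{user_id}.buckets")


def bucket_index_object(marker, zone="default"):
    """Return (pool, name) of the unsharded bucket index object for a marker."""
    return (f"{zone}.rgw.buckets.index", f".dir.{marker}")


print(user_bucket_list_object("foo"))
print(bucket_index_object("default.7593.4"))
```

For user ``foo`` the first helper yields the ``foo.buckets`` object mentioned
above, and for marker ``default.7593.4`` the second yields
``.dir.default.7593.4``.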
127135

128136
Footnotes
129137
---------
130138

131139
[1] Omap is a key-value store, associated with an object, in a way similar
132-
to how Extended Attributes associate with a POSIX file. An object's omap
133-
is not physically located in the object's storage, but its precise
134-
implementation is invisible and immaterial to RADOS Gateway.
140+
to how Extended Attributes (XATTRs) are associated with a POSIX file. An object's omap
141+
is not physically colocated with the object's payload data, and its precise
142+
implementation is invisible and immaterial to RGW daemons.
135143

136144
[2] Before the Dumpling release, the 'bucket.instance' metadata did not
137145
exist and the 'bucket' metadata contained its information. It is possible
@@ -140,25 +148,25 @@ to encounter such buckets in old installations.
140148
[3] Pool names changed with the Infernalis release.
141149
If you are looking at an older setup, some details may be different. In
142150
particular there was a different pool for each of the namespaces that are
143-
now being used inside the ``default.root.meta`` pool.
151+
now combined inside the ``default.root.meta`` pool.
144152

145153
Appendix: Compendium
146154
--------------------
147155

148156
Known pools:
149157

150-
.rgw.root
151-
Unspecified region, zone, and global information records, one per object.
158+
``.rgw.root``
159+
Region, zone, and global information records, one per object.
152160

153-
<zone>.rgw.control
161+
``<zone>.rgw.control``
154162
notify.<N>
155163

156-
<zone>.rgw.meta
164+
``<zone>.rgw.meta``
157165
Multiple namespaces with different kinds of metadata:
158166

159-
namespace: root
167+
namespace: ``root``
160168
<bucket>
161-
.bucket.meta.<bucket>:<marker> # see put_bucket_instance_info()
169+
``.bucket.meta.<bucket>:<marker>`` # see put_bucket_instance_info()
162170

163171
The tenant is used to disambiguate buckets, but not bucket instances.
164172
Example::
@@ -170,7 +178,7 @@ Known pools:
170178
prodtx/test%25star
171179
testcont
172180

173-
namespace: users.uid
181+
namespace: ``users.uid``
174182
Contains _both_ per-user information (RGWUserInfo) in "<user>" objects
175183
and per-user lists of buckets in omaps of "<user>.buckets" objects.
176184
The "<user>" may contain the tenant if non-empty, for example::
@@ -180,27 +188,27 @@ Known pools:
180188
prodtx$prodt.buckets
181189
test2
182190

183-
namespace: users.email
191+
namespace: ``users.email``
184192
Unimportant
185193

186-
namespace: users.keys
187-
47UA98JSTJZ9YAN3OS3O
194+
namespace: ``users.keys``
195+
example: ``47UA98JSTJZ9YAN3OS3O``
188196

189197
This allows ``radosgw`` to look up users by their access keys during authentication.
190198

191-
namespace: users.swift
199+
namespace: ``users.swift``
192200
test:tester
193201

194-
<zone>.rgw.buckets.index
195-
Objects are named ".dir.<marker>", each contains a bucket index.
202+
``<zone>.rgw.buckets.index``
203+
Objects are named ``.dir.<marker>``: each contains a bucket index.
196204
If the index is sharded, each shard appends the shard index after
197205
the marker.
198206

199-
<zone>.rgw.buckets.data
200-
default.7593.4__shadow_.488urDFerTYXavx4yAd-Op8mxehnvTI_1
207+
``<zone>.rgw.buckets.data``
208+
example: ``default.7593.4__shadow_.488urDFerTYXavx4yAd-Op8mxehnvTI_1``
201209
<marker>_<key>
202210

203-
An example of a marker would be "default.16004.1" or "default.7593.4".
204-
The current format is "<zone>.<instance_id>.<bucket_id>". But once
211+
An example of a marker would be ``default.16004.1`` or ``default.7593.4``.
212+
The current format is ``<zone>.<instance_id>.<bucket_id>``. But once
205213
generated, a marker is not parsed again, so its format may change
206214
freely in the future.
