Commit b1005d3

Merge pull request ceph#62179 from benhanokh/s3_full_object_dedup
rgw/dedup: full object dedup
2 parents b515a50 + 895121f commit b1005d3

44 files changed: 10053 additions and 5 deletions

doc/radosgw/s3_objects_dedup.rst

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
======================
Full RGW Object Dedup:
======================
Adds a ``radosgw-admin`` command that collects and reports deduplication statistics.

.. note:: This utility does not perform dedup and makes no changes to
          the existing system; it only collects statistics and
          reports them.

----

***************
Admin commands:
***************
- ``radosgw-admin dedup stats``:
   Collects and displays the most recent dedup statistics
- ``radosgw-admin dedup pause``:
   Pauses the active dedup session (dedup resources are not released)
- ``radosgw-admin dedup resume``:
   Resumes a paused dedup session
- ``radosgw-admin dedup abort``:
   Aborts the active dedup session and releases all resources used by it
- ``radosgw-admin dedup estimate``:
   Starts a new dedup estimate session (aborting the existing session first, if there is one)

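For scripting, the subcommands listed above can be wrapped in a small helper. The following is a minimal sketch that only builds the argument list. The helper name ``dedup_argv`` is hypothetical, and nothing is assumed about extra flags or about the format of the command output.

```python
import shlex
from typing import List

# Subcommands taken from the list above; no other flags are assumed.
DEDUP_ACTIONS = ("stats", "pause", "resume", "abort", "estimate")


def dedup_argv(action: str) -> List[str]:
    """Return the argv list for `radosgw-admin dedup <action>`."""
    if action not in DEDUP_ACTIONS:
        raise ValueError(f"unknown dedup action: {action!r}")
    return ["radosgw-admin", "dedup", action]


if __name__ == "__main__":
    # Print the commands for starting an estimate and then reading its stats.
    for action in ("estimate", "stats"):
        print(shlex.join(dedup_argv(action)))
```

On a host with cluster access, each list can be passed to ``subprocess.run(argv, check=True)``.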
----

****************
Skipped Objects:
****************
The dedup estimate skips the following objects:

- Objects smaller than 4MB (unless they are multipart)
- Objects with different placement rules
- Objects with different pools
- Objects with different storage classes

The dedup process itself (which will be released later) will also skip
**compressed** and **user-encrypted** objects, but the estimate
process will accept them (since we don't have access to that
information during the estimate process).

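One reading of the skip rules above is a per-object size check plus a pairwise check that two objects share the same placement rule, pool and storage class. The sketch below illustrates that reading only. The ``ObjInfo`` record, its field names and both function names are hypothetical, and the 4MB threshold is assumed to mean 4 * 1024 * 1024 bytes.

```python
from dataclasses import dataclass

# Assumption: "4MB" in the list above means 4 MiB.
MIN_DEDUP_SIZE = 4 * 1024 * 1024


@dataclass(frozen=True)
class ObjInfo:
    """Hypothetical per-object record; field names are illustrative only."""
    size: int
    multipart: bool
    placement_rule: str
    pool: str
    storage_class: str


def is_estimate_candidate(obj: ObjInfo) -> bool:
    """Small objects are skipped unless they were uploaded as multipart."""
    return obj.multipart or obj.size >= MIN_DEDUP_SIZE


def may_pair(a: ObjInfo, b: ObjInfo) -> bool:
    """Two objects are only compared when both pass the size check and
    their placement rule, pool and storage class all match."""
    same_location = (a.placement_rule, a.pool, a.storage_class) == (
        b.placement_rule, b.pool, b.storage_class)
    return is_estimate_candidate(a) and is_estimate_candidate(b) and same_location
```

Compression and user encryption are deliberately absent from the record, since the text states that this information is not available during the estimate.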
----

********************
Estimate Processing:
********************
The dedup estimate process collects all the information it needs directly from
the bucket indices, reading one full bucket-index object (holding thousands of
entries) at a time.

The bucket-index objects are sharded between the participating
members so that every bucket-index object is read exactly once.
The sharding allows processing to scale almost linearly by splitting the
load evenly between the participating members.

The dedup estimate process does not access the objects themselves
(data/metadata), which means its processing time is not affected by
the underlying media storing the objects (SSD/HDD), since the bucket indices are
virtually always stored on a fast medium (SSD with heavy memory
caching).

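The text does not say how bucket-index objects are assigned to members. As an illustration of the two stated properties (each object read exactly once, load split evenly), the sketch below assumes a simple round-robin assignment. The function name and the numbering of shards and members are hypothetical.

```python
from typing import List


def shards_for_member(num_shards: int, num_members: int, member_id: int) -> List[int]:
    """Round-robin assignment (an assumption): shard i goes to member
    i % num_members, so every shard has exactly one reader."""
    if num_members <= 0:
        raise ValueError("num_members must be positive")
    if not 0 <= member_id < num_members:
        raise ValueError("member_id out of range")
    return list(range(member_id, num_shards, num_members))


if __name__ == "__main__":
    # 11 bucket-index objects split between 3 members.
    for member in range(3):
        print(member, shards_for_member(11, 3, member))
```

With this scheme the per-member shard counts differ by at most one, which is what lets the work scale with the number of members.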
----

*************
Memory Usage:
*************
+------------------+---------+
| RGW Object Count | Memory  |
+==================+=========+
| 1M               | 8MB     |
+------------------+---------+
| 4M               | 16MB    |
+------------------+---------+
| 16M              | 32MB    |
+------------------+---------+
| 64M              | 64MB    |
+------------------+---------+
| 256M             | 128MB   |
+------------------+---------+
| 1024M (1G)       | 256MB   |
+------------------+---------+
| 4096M (4G)       | 512MB   |
+------------------+---------+
| 16384M (16G)     | 1024MB  |
+------------------+---------+
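In the table above, each fourfold increase in object count doubles the memory figure, which matches memory = 8MB * sqrt(count / 1M). The sketch below checks that relation against the tabulated rows. It is only a curve fit to those eight points, not a description of how the implementation sizes its memory, and the function name is hypothetical.

```python
import math

# Rows copied from the table: object count in units of M -> memory in MB.
MEMORY_TABLE_MB = {
    1: 8, 4: 16, 16: 32, 64: 64,
    256: 128, 1024: 256, 4096: 512, 16384: 1024,
}


def fitted_memory_mb(obj_count_m: float) -> float:
    """Curve fit to the table: 8MB times the square root of the count in M."""
    return 8.0 * math.sqrt(obj_count_m)


if __name__ == "__main__":
    for count_m, mem_mb in MEMORY_TABLE_MB.items():
        print(f"{count_m:>6}M  table={mem_mb:>5}MB  fit={fitted_memory_mb(count_m):.0f}MB")
```

Values between the tabulated rows are interpolations and should be treated as rough guidance only.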

qa/suites/rgw/dedup/%

Whitespace-only changes.

qa/suites/rgw/dedup/.qa

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
../.qa/

qa/suites/rgw/dedup/beast.yaml

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
.qa/rgw_frontend/beast.yaml
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
.qa/objectstore/bluestore-bitmap.yaml
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
roles:
- [mon.a, mon.c, mgr.y, osd.0, osd.1, osd.2, osd.3, client.0]
- [mon.b, mgr.x, osd.4, osd.5, osd.6, osd.7, client.1]
- [client.2]
openstack:
- volumes: # attached to each instance
    count: 4
    size: 10 # GB
overrides:
  ceph:
    conf:
      osd:
        osd shutdown pgref assert: true
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
.qa/rgw/ignore-pg-availability.yaml

qa/suites/rgw/dedup/overrides.yaml

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
overrides:
  ceph:
    conf:
      client:
        setuser: ceph
        setgroup: ceph
        debug rgw: 20
        debug rgw dedup: 20
  rgw:
    storage classes:
      LUKEWARM:
      FROZEN:
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
.qa/distros/supported-random-distro$/

qa/suites/rgw/dedup/tasks/+

Whitespace-only changes.
