doc/rados/operations/erasure-code.rst: 62 additions & 13 deletions
@@ -199,20 +199,49 @@ Erasure-coded pool overhead

The overhead factor (space amplification) of an erasure-coded pool
is `(k+m) / k`. For a 4,2 profile, the overhead is
thus 1.5, which means that 1.5 GiB of underlying storage is used to store
1 GiB of user data. Contrast with default replication with ``size=3``, with
which the overhead factor is 3.0. Do not mistake erasure coding for a free
lunch: there is a significant performance tradeoff, especially when using HDDs
and when performing cluster recovery or backfill.
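
The overhead factor for any candidate profile can be checked with a quick
calculation. A minimal sketch (the `k=4`, `m=2` values are just the 4,2
profile used in the example above)::

    # (k + m) / k; for k=4, m=2 this prints 1.50
    awk -v k=4 -v m=2 'BEGIN { printf "%.2f\n", (k + m) / k }'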

Below is a table showing the overhead factors for various values of `k` and `m`.
As `k` increases above 4, the incremental capacity overhead gain quickly
experiences diminishing returns, but the performance impact grows proportionally.
We recommend that you do not choose a profile with `k` > 4 or `m` > 2 unless
and until you fully understand the ramifications, including the number of
failure domains your cluster topology presents. If you choose `m=1`,
expect data unavailability during maintenance and data loss when component
failures overlap. Profiles with `m=1` are thus strongly discouraged for
production data.

Deployments that must remain active and avoid data loss even when larger
numbers of component failures overlap may favor a value of `m` > 2. Note
that such profiles result in lower space efficiency and reduced performance,
especially during backfill and recovery.

If you are certain that you wish to use erasure coding for one or more pools
but are not certain which profile to use, select `k=4` and `m=2`. You will
realize double the usable space of replication with `size=3`, with a
relatively tolerable write and recovery performance impact.
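
As an illustrative sketch only (the profile name, pool name, and the `host`
failure domain are placeholders to adapt to your cluster), such a profile and
a pool that uses it might be created as follows::

    ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=host
    ceph osd pool create ecpool erasure ec-4-2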

.. note:: Most erasure-coded pool deployments require at least `k+m` CRUSH
          failure domains, which in most cases means `rack`s or `host`s. There
          are operational advantages to planning EC profiles and cluster
          topology so that there are at least `k+m+1` failure domains. In most
          cases a value of `k` > 8 is discouraged.
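
To confirm the `k`, `m`, and failure-domain settings of a profile before
creating pools against it, list and inspect the defined profiles (the
`ec-4-2` name continues the example above)::

    ceph osd erasure-code-profile ls
    ceph osd erasure-code-profile get ec-4-2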

.. note:: CephFS and RGW deployments with a significant proportion of very
          small user files/objects may wish to plan carefully, as erasure-coded
          data pools can result in considerable additional space amplification.
          Both CephFS and RGW support multiple data pools with different media,
          performance, and data protection strategies, which can enable
          efficient and effective deployments. An RGW deployment might, for
          example, provision a modest complement of TLC SSDs used by replicated
          index and default bucket data pools, and a larger complement of
          erasure-coded QLC SSDs or HDDs to which larger and colder objects are
          directed via storage class, placement target, or Lua scripting.
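
As a sketch of the storage-class approach (the zone and zonegroup names, the
``COLD`` class, and the data pool name are assumptions; consult the RGW
placement documentation for your release), an additional storage class backed
by an erasure-coded data pool might be defined like this::

    radosgw-admin zonegroup placement add --rgw-zonegroup default \
        --placement-id default-placement --storage-class COLD
    radosgw-admin zone placement add --rgw-zone default \
        --placement-id default-placement --storage-class COLD \
        --data-pool default.rgw.cold.data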

.. list-table:: Erasure coding overhead
   :widths: 4 4 4 4 4 4 4 4 4 4 4 4
@@ -363,18 +392,38 @@ failures overlap.
     - 1.82
     - 1.91
     - 2.00
   * - k=12
     - 1.08
     - 1.17
     - 1.25
     - 1.33
     - 1.42
     - 1.50
     - 1.58
     - 1.67
     - 1.75
     - 1.83
     - 1.92
   * - k=20
     - 1.05
     - 1.10
     - 1.15
     - 1.20
     - 1.25
     - 1.30
     - 1.35
     - 1.40
     - 1.45
     - 1.50
     - 1.55
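
Once pools are in use, the ratio between the USED and STORED columns that
``ceph df detail`` reports for an erasure-coded pool roughly reflects this
overhead factor in practice (subject to allocation-size rounding)::

    ceph df detail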

Erasure-coded pools and cache tiering
-------------------------------------

.. note:: Cache tiering was deprecated in Reef. We strongly advise against
          deploying new cache tiers and recommend working to remove them from
          existing deployments.

Erasure-coded pools require more resources than replicated pools and
lack some of the functionality supported by replicated pools (for example, omap).