Commit 7861012

Merge pull request ceph#62572 from anthonyeleven/expand-ec-table
doc/rados/operations: Improve erasure-code.rst

Reviewed-by: Zac Dover <[email protected]>
2 parents f6c9f59 + 0c89feb

1 file changed: +62 -13 lines

doc/rados/operations/erasure-code.rst

Lines changed: 62 additions & 13 deletions
@@ -199,20 +199,49 @@ Erasure-coded pool overhead
 The overhead factor (space amplification) of an erasure-coded pool
 is `(k+m) / k`. For a 4,2 profile, the overhead is
-thus 1.5, which means that 1.5 GiB of underlying storage are used to store
-1 GiB of user data. Contrast with default three-way replication, with
+thus 1.5, which means that 1.5 GiB of underlying storage is used to store
+1 GiB of user data. Contrast with default replication with ``size=3``, with
 which the overhead factor is 3.0. Do not mistake erasure coding for a free
 lunch: there is a significant performance tradeoff, especially when using HDDs
 and when performing cluster recovery or backfill.

 Below is a table showing the overhead factors for various values of `k` and `m`.
-As `m` increases above 2, the incremental capacity overhead gain quickly
+As `k` increases above 4, the incremental capacity overhead gain quickly
 experiences diminishing returns but the performance impact grows proportionally.
-We recommend that you do not choose a profile with `k` > 4 or `m` > 2 until
-and unless you fully understand the ramifications, including the number of
-failure domains your cluster topology must contain. If you choose `m=1`,
-expect data unavailability during maintenance and data loss if component
-failures overlap.
+We recommend that you do not choose a profile with `k` > 4 or `m` > 2 unless
+and until you fully understand the ramifications, including the number of
+failure domains your cluster topology presents. If you choose `m=1`,
+expect data unavailability during maintenance and data loss when component
+failures overlap. Profiles with `m=1` are thus strongly discouraged for
+production data.
+
+Deployments that must remain active and avoid data loss despite larger
+numbers of overlapping component failures may favor a value of `m` > 2.
+Note that such profiles result in lower space efficiency and lessened
+performance, especially during backfill and recovery.
+
+If you are certain that you wish to use erasure coding for one or more pools but
+are not certain which profile to use, select `k=4` and `m=2`. You will realize
+double the usable space compared to replication with `size=3` with relatively
+tolerable write and recovery performance impact.
+
+.. note:: Most erasure-coded pool deployments require at least `k+m` CRUSH
+          failure domains, which in most cases means `rack`s or `host`s.
+          There are operational advantages to planning EC profiles and
+          cluster topology so that there are at least `k+m+1` failure
+          domains. In most cases a value of `k` > 8 is discouraged.
+
+.. note:: CephFS and RGW deployments with a significant proportion of very
+          small user files/objects may wish to plan carefully, as
+          erasure-coded data pools can result in considerable additional
+          space amplification. Both CephFS and RGW support multiple data
+          pools with different media, performance, and data protection
+          strategies, which can enable efficient and effective deployments.
+          An RGW deployment might for example provision a modest complement
+          of TLC SSDs used by replicated index and default bucket data pools,
+          and a larger complement of erasure-coded QLC SSDs or HDDs to which
+          larger and colder objects are directed via storage class, placement
+          target, or Lua scripting.

 .. list-table:: Erasure coding overhead
    :widths: 4 4 4 4 4 4 4 4 4 4 4 4
@@ -363,18 +392,38 @@ failures overlap.
      - 1.82
      - 1.91
      - 2.00
-
-
-
-
+   * - k=12
+     - 1.08
+     - 1.17
+     - 1.25
+     - 1.33
+     - 1.42
+     - 1.50
+     - 1.58
+     - 1.67
+     - 1.75
+     - 1.83
+     - 1.92
+   * - k=20
+     - 1.05
+     - 1.10
+     - 1.15
+     - 1.20
+     - 1.25
+     - 1.30
+     - 1.35
+     - 1.40
+     - 1.45
+     - 1.50
+     - 1.55



 Erasure-coded pools and cache tiering
 -------------------------------------

-.. note:: Cache tiering is deprecated in Reef.
+.. note:: Cache tiering was deprecated in Reef. We strongly advise against
+          deploying new cache tiers and recommend removing them from
+          existing deployments.

 Erasure-coded pools require more resources than replicated pools and
 lack some of the functionality supported by replicated pools (for example, omap).