@@ -48,26 +48,39 @@ place that will cause the cluster to re-replicate the data until the
 
 Stretch Cluster Issues
 ======================
-No matter what happens, Ceph will not compromise on data integrity
-and consistency. If there's a failure in your network or a loss of nodes and
-you can restore service, Ceph will return to normal functionality on its own.
-
-But there are scenarios where you lose data availability despite having
-enough servers available to satisfy Ceph's consistency and sizing constraints, or
-where you may be surprised to not satisfy Ceph's constraints.
-The first important category of these failures resolve around inconsistent
-networks -- if there's a netsplit, Ceph may be unable to mark OSDs down and kick
-them out of the acting PG sets despite the primary being unable to replicate data.
-If this happens, IO will not be permitted, because Ceph can't satisfy its durability
-guarantees.
-
-The second important category of failures is when you think you have data replicated
-across data centers, but the constraints aren't sufficient to guarantee this.
-For instance, you might have data centers A and B, and your CRUSH rule targets 3 copies
-and places a copy in each data center with a ``min_size`` of 2. The PG may go active with
-2 copies in site A and no copies in site B, which means that if you then lose site A you
-have lost data and Ceph can't operate on it. This situation is surprisingly difficult
-to avoid with standard CRUSH rules.
+
+Ceph does not permit the compromise of data integrity and data consistency
+under any circumstances. When service is restored after a network failure or a
+loss of Ceph nodes, Ceph will restore itself to a state of normal functioning
+without operator intervention.
+
+Although Ceph never compromises data integrity or data consistency, there are
+situations in which *data availability* is compromised. These situations can
+occur even when there are enough servers available to satisfy Ceph's
+consistency and sizing constraints. In some situations, you might discover
+that your cluster does not satisfy those constraints.
+
+The first category of these failures that we will discuss involves
+inconsistent networks. If there is a netsplit (a disconnection that splits the
+network into two pieces), Ceph might be unable to mark OSDs ``down`` and
+remove them from the acting PG sets even though the primary OSD is unable to
+replicate data to them (a situation that, under normal circumstances, would
+result in the affected OSDs being marked ``down`` and removed from the PG).
+If this happens, Ceph will be unable to satisfy its durability guarantees and
+consequently IO will not be permitted.
+
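+In a situation of this kind, the standard status commands can be used to check
+whether PGs are stuck in a non-active state and whether client requests are
+blocked. The exact output depends on the cluster and on the nature of the
+network failure. For example::
+
+    ceph status
+    ceph health detail
+    ceph pg dump_stuck inactive
+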
+The second category of failures that we will discuss involves constraints that
+seem to guarantee the replication of data across data centers but are not in
+fact sufficient to do so. For example, suppose that there are two data centers
+named Data Center A and Data Center B, and that the CRUSH rule targets three
+replicas and places a replica in each data center with a ``min_size`` of
+``2``. The PG might go active with two replicas in Data Center A and zero
+replicas in Data Center B. In a situation of this kind, the loss of Data
+Center A means that the data is lost and Ceph will not be able to operate on
+it. This situation is surprisingly difficult to avoid using only standard
+CRUSH rules.
+
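+A CRUSH rule of roughly the following form can produce the placement described
+above (three replicas, with at least one in each data center). The rule name,
+the ``default`` root bucket, and the ``datacenter`` and ``host`` bucket types
+are illustrative; your CRUSH hierarchy may use different names::
+
+    # Illustrative only: the rule name, the "default" root, and the
+    # datacenter and host bucket types are assumptions about the hierarchy.
+    rule multi_dc_rule {
+        id 1
+        type replicated
+        step take default
+        step choose firstn 0 type datacenter
+        step chooseleaf firstn 2 type host
+        step emit
+    }
+
+Because ``min_size`` is ``2``, a PG mapped by this rule can go active with
+only the two replicas placed in Data Center A (for example, while Data Center
+B is unreachable), even though there is no replica in Data Center B.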
 
 Stretch Mode
 ============