conditions between updates to various sites, network, and cluster failures that could reorder the events and change the outcome of the updates performed across geo-distributed writes.

Active-Active databases (formerly known as CRDB) are geo-distributed databases that span multiple Redis Enterprise Software clusters. Active-Active databases depend on [multi-master replication](https://en.wikipedia.org/wiki/Multi-master_replication) and [conflict-free replicated data types (CRDTs)](https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type) to power a simple development experience for geo-distributed applications. Active-Active databases allow developers to use existing Redis data types and commands, but automatically handle conflicting concurrent writes to the same key across multiple geographies. For example, developers can simply use the `INCR` or `INCRBY` method in Redis in all instances of the geo-distributed application, and Active-Active databases handle the additive nature of `INCR` to reflect the correct final value.
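
A minimal sketch of that convergence, assuming two member endpoints (the hostnames and port here are placeholders): each member applies its local increment immediately, and once the sync events propagate both updates, every member reports the sum.

```sh
# On member CRDB1 (hypothetical endpoint)
$ redis-cli -h cluster1.example.com -p 12000 INCRBY counter 5
(integer) 5

# Concurrently, on member CRDB2 (hypothetical endpoint)
$ redis-cli -h cluster2.example.com -p 12000 INCRBY counter 7
(integer) 7

# After synchronization, both members converge on the additive result
$ redis-cli -h cluster1.example.com -p 12000 GET counter
"12"
$ redis-cli -h cluster2.example.com -p 12000 GET counter
"12"
```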

The following example displays a sequence of events over time: t1 to t9. This Active-Active database has two member Active-Active databases: member CRDB1 and member CRDB2. The local operations running in each member Active-Active database are listed under the member Active-Active database name. The sync events represent the moment when synchronization catches up to distribute all local member Active-Active database updates to other participating clusters and other member Active-Active databases.

|**Time**|**Member CRDB1**|**Member CRDB2**|
Databases provide various approaches to address some of these concerns:

- **Active-Passive geo-distributed deployments**: With active-passive distributions, all writes go to an active cluster. Redis Enterprise Software provides a [Replica Of]({{<relref "/operate/rs/databases/import-export/replica-of/">}}) capability that offers a similar approach. This can be employed when the workload is heavily balanced toward reads with few writes. However, WAN performance and availability can be unreliable, and traveling large distances for writes takes away from application performance and availability.
- **Two-phase commit (2PC)**: This approach is designed around a protocol that commits a transaction across multiple transaction managers. Two-phase commit provides a consistent transactional write across regions but fails transactions unless all participating transaction managers are available at the time of the transaction. The number of messages exchanged and its cross-regional availability requirement make two-phase commit unsuitable for even moderate throughputs and cross-geo writes that go over WANs.
- **Sync update with quorum-based writes**: This approach synchronously coordinates a write across the majority of replicas across clusters spanning multiple regions. However, just like two-phase commit, the number of messages exchanged and its cross-regional availability requirement make geo-distributed quorum writes unsuitable for moderate throughputs and cross-geo writes that go over WANs.
- **Last-Writer-Wins (LWW) conflict resolution**: Some systems provide simplistic conflict resolution for all types of writes where system clocks are used to determine the winner across conflicting writes. LWW is lightweight and can be suitable for simpler data. However, LWW can be destructive to updates that are not necessarily conflicting. For example, adding a new element to a set across two geographies concurrently would result in only one of these new elements appearing in the final result with LWW (see the sketch after this list).
- **MVCC (multi-version concurrency control)**: MVCC systems maintain multiple versions of data and may expose ways for applications to resolve conflicts. Even though an MVCC system can provide a flexible way to resolve conflicting writes, it comes at the cost of great complexity in the development of a solution.
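
By contrast, Active-Active databases merge concurrent set additions instead of discarding one of them. A minimal sketch, assuming the same hypothetical member endpoints as above:

```sh
# Concurrent additions to the same set on two members
$ redis-cli -h cluster1.example.com -p 12000 SADD cities "Austin"
(integer) 1
$ redis-cli -h cluster2.example.com -p 12000 SADD cities "Bangalore"
(integer) 1

# After synchronization, both elements appear on every member,
# whereas an LWW system would typically keep only one of them
$ redis-cli -h cluster1.example.com -p 12000 SMEMBERS cities
1) "Austin"
2) "Bangalore"
```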

Even though types and commands in Active-Active databases look identical to standard Redis types and commands, the underlying types in Redis Enterprise Software are enhanced to maintain more metadata to create the conflict-free data type experience. This section explains what you need to know about developing with Active-Active databases on Redis Enterprise Software.
## Eviction

The default policy for Active-Active databases is _noeviction_ mode. Redis Enterprise Software version 6.0.20 and later support all eviction policies for Active-Active databases, unless [Auto Tiering]({{< relref "/operate/rs/databases/auto-tiering" >}}) (previously known as Redis on Flash) is enabled.

For details, see [eviction for Active-Active databases]({{< relref "/operate/rs/databases/memory-performance/eviction-policy#active-active-database-eviction" >}}).

## Expiration

Expiration is supported with special multi-master semantics.

If a key's expiration time is changed at the same time on different members of the Active-Active database, the longer extended time set via TTL on a key is preserved.

If this command was performed on key1 on cluster #1:

```sh
127.0.0.1:6379> EXPIRE key1 10
```

If this command was performed on key1 on cluster #2:

```sh
127.0.0.1:6379> EXPIRE key1 50
```

The `EXPIRE` command setting the key to 50 would win.

And if this command was performed on key1 on cluster #3:
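
```sh
# Assumed command: PERSIST clears the TTL on key1, matching the outcome described next
127.0.0.1:6379> PERSIST key1
```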

It would win out of the three clusters hosting the Active-Active database as it sets the TTL on key1 to an infinite time.
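
Once the sync events have propagated all three updates, a TTL check on any member should reflect the winning value; a minimal sketch of the expected converged state (not captured from a real deployment):

```sh
# -1 indicates no expiration, the result of the winning PERSIST
127.0.0.1:6379> TTL key1
(integer) -1
```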

The replica responsible for the winning expire value is also responsible for expiring the key and propagating a DEL effect when this happens. A losing replica is from this point on not responsible for expiring the key, unless another `EXPIRE` command resets the TTL. Furthermore, a replica that is not the owner of the expired value:

- Silently ignores the key if a user attempts to access it in READ mode, for example, treating it as if it was expired but not propagating a DEL.
- Expires it (sending a DEL) before making any modifications if a user attempts to access it in WRITE mode.
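
For example, on a member that does not own the winning expiration, a read after that TTL has elapsed behaves as if the key were already gone, even though the DEL effect will only arrive later from the owner; a hedged sketch (hypothetical key and session on a non-owner member):

```sh
# The winning TTL has already elapsed; the owner's DEL effect has not arrived yet
127.0.0.1:6379> GET user:online:42
(nil)
```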
{{< note >}}
Expiration values are in the range of [0, 2^49] for Active-Active databases and [0, 2^64] for regular databases.
{{< /note >}}

## Tombstones

For conflict resolution purposes, Active-Active databases cannot immediately release a deleted key. Instead, the key is logically deleted but remains in memory as a tombstone until the garbage collector can safely remove it.

When a deleted key becomes a tombstone, it frees some memory previously consumed by the key. The size of each tombstone varies depending on the data type and the key's history.

The garbage collector automatically removes a tombstone when all instances in the Active-Active database have observed the deletion operation.

To monitor tombstones, you can use shard-level metrics exposed by [`INFO crdt`](#info) and Grafana.
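
A quick way to watch tombstone collection on a shard is to filter the garbage-collection counters out of that section; a minimal sketch, assuming direct access to a shard on a hypothetical port (the values are illustrative):

```sh
$ redis-cli -p 12000 INFO crdt | grep -E 'crdt_gc_(pending|collected)'
crdt_gc_pending:42
crdt_gc_collected:1258
```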
## Out-of-Memory (OOM) {#outofmemory-oom}

If a member Active-Active database is out of memory, that member is marked as "inconsistent", the member stops responding to user traffic, and the syncer initiates full reconciliation with other peers in the Active-Active database.

## Active-Active database key counts

Keys are counted differently for Active-Active databases:

- DBSIZE (in `shard-cli dbsize`) reports key header instances that represent multiple potential values of a key before a replication conflict is resolved.
- expired_keys (in `bdb-cli info`) can be more than the keys count in DBSIZE (in `shard-cli dbsize`) because expires are not always removed when a key becomes a tombstone.
- The Expires average TTL (in `bdb-cli info`) is computed for local expires only.
## INFO

The `INFO` command has an additional CRDT section that provides advanced troubleshooting information:

|**Section**|**Field**|**Description**|
| ------ | ------ | ------ |
|**CRDT Context**| crdt_config_version |Currently active Active-Active database configuration version. |
|**Peers**||A list of currently connected peer replication peers. This is similar to the replicas list reported by Redis. |
|**Backlogs**||A list of peer replication backlogs currently maintained. Typically in a full mesh topology, only a single backlog is used for all peers, as the requested IDs are identical. |
|**CRDT Stats**| crdt_sync_full | Number of inbound full synchronization processes performed. |
|| crdt_sync_partial_ok | Number of partial (backlog based) re-synchronization processes performed. |
|| crdt_sync_partial-err | Number of partial re-synchronization processes that failed due to an exhausted backlog. |
|| crdt_ovc_filtered_effect_reqs | Number of inbound effect requests filtered due to old vector clock. |
|| crdt_gc_pending | Number of elements pending garbage collection. |
|| crdt_gc_attempted | Number of attempts to garbage collect tombstones. |
|| crdt_gc_collected | Number of tombstones garbage collected successfully. |
|| crdt_gc_gvc_min | The minimal globally observed vector clock, as computed locally from all received observed clocks. |
|| crdt_stale_released_with_merge | Indicates the last stale flag transition was a result of a complete full sync. |
|**CRDT Replicas**||A list of crdt_replica \<uid> entries, each describing the known state of a remote instance with the following fields: |
|| config_version | Last configuration version reported. |
|| shards | Number of shards. |
|| slots | Total number of hash slots. |
|| slot_coverage | A flag indicating remote shards provide full coverage (all shards are alive). |
|| max_ops_lag | Number of local operations not yet observed by the least updated remote shard. |
|| min_ops_lag | Number of local operations not yet observed by the most updated remote shard. |
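
To spot a remote member that is falling behind, the per-replica lag fields in this section can be checked directly from a shard; a minimal sketch assuming direct shard access on a hypothetical port (the exact layout of the output depends on the deployment):

```sh
# Each crdt_replica <uid> entry reports max_ops_lag and min_ops_lag for that remote instance
$ redis-cli -p 12000 INFO crdt | grep ops_lag
```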