Commit ba9790a

Arra, Praveen R and jyothsnakonisa authored and committed
Updated documentation on tombstones with examples
Patch by Arra Praveen; reviewed by Jyothsna Konisa, Brad Schoening for CASSANDRA-20800
1 parent a144060 commit ba9790a

1 file changed: +78 -16 lines changed

doc/modules/cassandra/pages/managing/operating/compaction/tombstones.adoc

Lines changed: 78 additions & 16 deletions
@@ -4,17 +4,18 @@

 == What are tombstones?

-{cassandra}'s processes for deleting data are designed to improve performance, and to work with {cassandra}'s built-in properties for data distribution and fault-tolerance.
+{cassandra}'s processes for deleting data are designed to be efficient, and to work with {cassandra}'s native features for data distribution and fault-tolerance.

 {cassandra} treats a deletion as an insertion, and inserts a time-stamped deletion marker called a tombstone.
 The tombstones go through {cassandra}'s write path, and are written to SSTables on one or more nodes.
 The key feature difference of a tombstone is that it has a built-in expiration date/time.
-At the end of its expiration period, the grace period, the tombstone is deleted as part of {cassandra}'s normal compaction process.
+At the end of its expiration period, called the grace period, the tombstone is deleted as part of {cassandra}'s normal compaction process.

 [NOTE]
 ====
-You can also mark a {cassandra} row or column with a time-to-live (TTL) value.
-After this amount of time has ended, {cassandra} marks the object with a tombstone, and handles it like other tombstoned objects.
+In {cassandra}, you can assign a time-to-live (TTL) to a row or column. Once the TTL expires, the data is eligible for removal.
+During compaction, if the `gc_grace_seconds` period is still active, {cassandra} marks the data as expired, handling it like any other deleted item.
+After `gc_grace_seconds` has elapsed, the data is eligible for permanent removal.
 ====
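
As a rough illustration of the TTL behaviour described in the note, the sketch below computes when TTL'd data stops being live. The helper names `ttl_expiry` and `is_expired` are hypothetical and not part of any Cassandra API; the write time and TTL are taken from the sstabledump example later in this commit.

```python
from datetime import datetime, timedelta, timezone


def ttl_expiry(write_time: datetime, ttl_seconds: int) -> datetime:
    """Moment at which data written with a TTL stops being live."""
    return write_time + timedelta(seconds=ttl_seconds)


def is_expired(write_time: datetime, ttl_seconds: int, now: datetime) -> bool:
    """True once `now` has reached the TTL expiry time."""
    return now >= ttl_expiry(write_time, ttl_seconds)


# Values from the sstabledump example: written 2025-11-05T22:43:20Z, ttl 86400.
written = datetime(2025, 11, 5, 22, 43, 20, tzinfo=timezone.utc)
expiry = ttl_expiry(written, 86_400)

print(expiry.isoformat())  # 2025-11-06T22:43:20+00:00
print(is_expired(written, 86_400, written + timedelta(hours=23)))  # False
print(is_expired(written, 86_400, written + timedelta(hours=25)))  # True
```

The computed expiry matches the `expires_at` field in the dump, which suggests that field is simply the write time plus the TTL, truncated to whole seconds.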

 == Why tombstones?
@@ -23,14 +24,14 @@ The tombstone represents the deletion of an object, either a row or column value
 This approach is used instead of removing values because of the distributed nature of {cassandra}.
 Once an object is marked as a tombstone, queries will ignore all values that are time-stamped previous to the tombstone insertion.

-== Zombies
+== Preventing Data Resurrection

-In a multi-node cluster, {cassandra} may store replicas of the same data on two or more nodes.
-This helps prevent data loss, but it complicates the deletion process.
-If a node receives a delete command for data it stores locally, the node tombstones the specified object and tries to pass the tombstone to other nodes containing replicas of that object.
-But if one replica node is unresponsive at that time, it does not receive the tombstone immediately, so it still contains the pre-delete version of the object.
-If the tombstoned object has already been deleted from the rest of the cluster before that node recovers, {cassandra} treats the object on the recovered node as new data, and propagates it to the rest of the cluster.
-This kind of deleted but persistent object is called a https://cassandra.apache.org/_/glossary.html#zombie[zombie].
+In a multi-node {cassandra} cluster, data is often replicated across several nodes to safeguard against loss.
+However, this replication can make deletions more complex.
+When a node receives a request to delete data, it marks the item with a tombstone and attempts to share this tombstone with other nodes that hold copies of the same data.
+If one of these replica nodes is offline or unreachable during the deletion, it won’t get the tombstone right away and will continue to store the original, undeleted data.
+If the rest of the cluster purges the tombstoned data before the offline node comes back online, {cassandra} may mistakenly treat the data on the recovered node as live and repair may replicate it across the cluster again.
+This scenario, where deleted data reappears, is known as a https://cassandra.apache.org/_/glossary.html#zombie[zombie].
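
The resurrection scenario can be sketched with a toy model. Everything here is an assumption made for illustration: replicas are plain dictionaries, `repair` is a simplified "newest timestamp wins" merge, and none of the names correspond to Cassandra internals.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None marks a tombstone
    timestamp: int        # write time; the highest timestamp wins

    @property
    def is_tombstone(self) -> bool:
        return self.value is None


Replica = Dict[str, Cell]


def repair(replicas: List[Replica], key: str) -> None:
    """Toy repair: copy the newest known cell for `key` to every replica."""
    known = [r[key] for r in replicas if key in r]
    if not known:
        return
    newest = max(known, key=lambda cell: cell.timestamp)
    for r in replicas:
        r[key] = newest


def read(replica: Replica, key: str) -> Optional[str]:
    """Return the live value, or None if the key is absent or tombstoned."""
    cell = replica.get(key)
    return None if cell is None or cell.is_tombstone else cell.value


# Three replicas hold the same live value.
a, b, c = ({"k": Cell("v1", timestamp=1)} for _ in range(3))

# The delete reaches a and b while c is offline.
a["k"] = b["k"] = Cell(None, timestamp=2)

# Safe case: c returns before the tombstone is purged, so repair spreads it.
safe = [dict(a), dict(b), dict(c)]
repair(safe, "k")
safe_result = [read(r, "k") for r in safe]

# Zombie case: a and b purge the tombstone (and the data it shadowed) first.
del a["k"], b["k"]
repair([a, b, c], "k")
zombie_result = [read(r, "k") for r in (a, b, c)]

print(safe_result)    # [None, None, None]
print(zombie_result)  # ['v1', 'v1', 'v1']
```

In the second case nothing remains on `a` or `b` to show that `k` was ever deleted, so the stale copy on `c` is the newest cell the toy repair can find.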

 == Grace period

@@ -52,10 +53,10 @@ After the tombstone's grace period ends, {cassandra} deletes the tombstone durin

 == Deletion

-After `gc_grace_seconds` has expired the tombstone may be removed (meaning there will no longer be any object that a certain piece of data was
-deleted).
-But one complication for deletion is that a tombstone can live in one SSTable and the data it marks for deletion in another, so a compaction must also remove both SSTables.
-More precisely, drop an actual tombstone the:
+Once the `gc_grace_seconds` period has passed, the tombstone can be removed, meaning there will no longer be any record indicating that a specific piece of data was deleted.
+However, deleting data can be complicated because the tombstone might exist in one SSTable while the data it marks for deletion is in another.
+Therefore, a compaction process must remove both SSTables.
+More specifically, a tombstone is only dropped when:

 * The tombstone must be older than `gc_grace_seconds`.
 Note that tombstones will not be removed until a compaction event even if `gc_grace_seconds` has elapsed.
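
The age condition in this bullet can be sketched as a simple time comparison. The function name `past_gc_grace` is hypothetical, and 864000 seconds (10 days) is used as the default `gc_grace_seconds`. Passing this check only makes a tombstone a candidate for removal: a compaction still has to run, and the other conditions for dropping it still apply.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_GC_GRACE_SECONDS = 864_000  # 10 days


def past_gc_grace(local_delete_time: datetime,
                  now: datetime,
                  gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """True once the tombstone is older than gc_grace_seconds."""
    return now - local_delete_time > timedelta(seconds=gc_grace_seconds)


# local_delete_time taken from the partition-delete example in this commit.
deleted = datetime(2025, 12, 2, 21, 58, 26, tzinfo=timezone.utc)

print(past_gc_grace(deleted, deleted + timedelta(days=9)))   # False
print(past_gc_grace(deleted, deleted + timedelta(days=11)))  # True
```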
@@ -124,6 +125,67 @@ To avoid keeping tombstones forever, we set `gc_grace_seconds` for every table i

 If an SSTable contains only tombstones and it is guaranteed that SSTable is not shadowing data in any other SSTable, then the compaction can drop
 that SSTable.
-If you see SSTables with only tombstones (note that TTL'd data is considered tombstones once the time-to-live has expired), but it is not being dropped by compaction, it is likely that other SSTables contain older data.
+If you observe SSTables that contain only tombstones or expired TTL data, and compaction is not removing them, it likely indicates that older versions of the data still exist in other SSTables.
 There is a tool called `sstableexpiredblockers` that will list which SSTables are droppable and which are blocking them from being dropped.
 With `TimeWindowCompactionStrategy` it is possible to remove the guarantee (not check for shadowing data) by enabling `unsafe_aggressive_sstable_expiration`.
+
+
+== Examples
+
+Below is the sstabledump output showing a live row with expired flag as "false":
+[source,json]
+----
+{
+  "partition" : {
+    "key" : [ "4ac48f1d-c736-4ca5-9b03-2566150f6af5" ],
+    "position" : 0
+  },
+  "rows" : [
+    {
+      "type" : "row",
+      "position" : 36,
+      "clustering" : [ "2025-11-05T22:43:20.833Z" ],
+      "liveness_info" : { "tstamp" : "2025-11-05T22:43:20.8326Z", "ttl" : 86400, "expires_at" : "2025-11-06T22:43:20Z", "expired" : false },
+      "cells" : [
+        { "name" : "activity_details", "value" : "details_for_row_3099" },
+        { "name" : "activity_type", "value" : "type_7" }
+      ]
+    }
+  ]
+}
+----
+
+Below is the sstabledump output showing expired flag as "true" when TTL has expired (and is considered as tombstone when read):
+[source,json]
+----
+{
+  "partition" : {
+    "key" : [ "4ac48f1d-c736-4ca5-9b03-2566150f6af5" ],
+    "position" : 0
+  },
+  "rows" : [
+    {
+      "type" : "row",
+      "position" : 30,
+      "clustering" : [ "2025-11-05T22:43:20.833Z" ],
+      "liveness_info" : { "tstamp" : "2025-11-05T22:43:20.832650Z", "ttl" : 86400, "expires_at" : "2025-11-06T22:43:20Z", "expired" : true },
+      "cells" : [
+        { "name" : "activity_details", "deletion_info" : { "local_delete_time" : "2025-11-05T22:43:20Z" } },
+        { "name" : "activity_type", "deletion_info" : { "local_delete_time" : "2025-11-05T22:43:20Z" } }
+      ]
+    }
+  ]
+}
+----
+
+Below is the output from sstabledump, showing how a delete operation generates a tombstone:
+[source,json]
+----
+{
+  "partition" : {
+    "key" : [ "4ac48f1d-c736-4ca5-9b03-2566150f6af5" ],
+    "position" : 31,
+    "deletion_info" : { "marked_deleted" : "2025-12-02T21:58:26.185187Z", "local_delete_time" : "2025-12-02T21:58:26Z" }
+  }
+}
+----
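
To show how the three dumps above differ, the sketch below labels a partition object using only the fields that appear in those examples (`deletion_info`, `liveness_info.expired`, `cells`). The function name `classify_partition` is hypothetical and the inputs are trimmed copies of the examples. It assumes one partition object per call, so a dump containing several partitions would need one call per partition.

```python
import json
from typing import Any, Dict, List


def classify_partition(partition: Dict[str, Any]) -> List[str]:
    """Label a partition object shaped like the sstabledump examples."""
    labels = []
    # A partition-level delete shows up as deletion_info on the partition.
    if "deletion_info" in partition.get("partition", {}):
        labels.append("partition tombstone")
    for row in partition.get("rows", []):
        liveness = row.get("liveness_info", {})
        # Cells that carry deletion_info instead of a value are tombstoned.
        dead_cells = sum("deletion_info" in cell for cell in row.get("cells", []))
        if liveness.get("expired"):
            labels.append(f"expired row ({dead_cells} tombstoned cells)")
        else:
            labels.append("live row")
    return labels


live = json.loads("""
{"partition": {"key": ["4ac48f1d-c736-4ca5-9b03-2566150f6af5"], "position": 0},
 "rows": [{"type": "row",
           "liveness_info": {"ttl": 86400, "expired": false},
           "cells": [{"name": "activity_type", "value": "type_7"}]}]}
""")

expired = json.loads("""
{"partition": {"key": ["4ac48f1d-c736-4ca5-9b03-2566150f6af5"], "position": 0},
 "rows": [{"type": "row",
           "liveness_info": {"ttl": 86400, "expired": true},
           "cells": [
             {"name": "activity_details",
              "deletion_info": {"local_delete_time": "2025-11-05T22:43:20Z"}},
             {"name": "activity_type",
              "deletion_info": {"local_delete_time": "2025-11-05T22:43:20Z"}}]}]}
""")

deleted = json.loads("""
{"partition": {"key": ["4ac48f1d-c736-4ca5-9b03-2566150f6af5"], "position": 31,
               "deletion_info": {"marked_deleted": "2025-12-02T21:58:26.185187Z",
                                 "local_delete_time": "2025-12-02T21:58:26Z"}}}
""")

print(classify_partition(live))     # ['live row']
print(classify_partition(expired))  # ['expired row (2 tombstoned cells)']
print(classify_partition(deleted))  # ['partition tombstone']
```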
