:navtitle: Cassandra table schema evolution with CDC
-NOTE: This article is a continuation of the "xref:change-data-capture/index.adoc[]" article. Please read that ahead of this article to understand the fundamentals of what resources are being used.
+[NOTE]
+====
+This article is a continuation of the xref:change-data-capture/index.adoc[] article. Please read that article first to understand the fundamentals of what resources are being used.
+====
-The message schema is of particular importance in completing the CDC pattern. Initially, it is set to match the Cassandra table’s schema as closely as possible. But some data types are not known in Pulsar (or more accurately, Avro). To overcome this, there are adaptations performed when the Cassandra Connector builds the Pulsar message. Some types are not compatible and not able to be adapted. In this case, those columns of data are dropped while creating the Pulsar message.
+The message schema is of particular importance in completing the CDC pattern. Initially, it is set to match the Cassandra table’s schema as closely as possible, but some data types are not known in Pulsar (or more accurately, not known in Avro). To overcome this, there are adaptations performed when the Cassandra Source Connector builds the Pulsar message. Some types are not compatible and not able to be adapted. In this case, those columns of data are dropped while creating the Pulsar message.
To better understand how exactly the CDC agent constructs the event message, here is the pseudo code of how the schema is created:
-Notice the two types used in KeyValue. The byte array is an Avroencoded record that documents the table's primary key(s). The MutationValue is an extended Avro record that has direction on what changed and how to get its specifics.
+Notice the two types used in KeyValue. The byte array is an Avro-encoded record that documents the table's primary key(s). The MutationValue is an extended Avro record that has direction on what changed and how to get its specifics.
CDC sets the initial topic schema on the first change it detects. Once the initial topic schema has been set, a “happy path” has been established to create change data events in Pulsar.
@@ -38,19 +41,19 @@ Here is a brief summary of how the data message schema is created:
== Adding a table column
-This is the easiest of scenarios for table design change. Assuming the new column’s data type is compatible with the source connector, a new schema will replace the existing and message compatibility will be kept. Note that because the schema compatibility is BACKWARD_TRANSITIVE, the new column will need to be optional. Which is the default of any non-primary-key column.
+This is the easiest of scenarios for table design change. Assuming the new column’s data type is compatible with the source connector, a new schema will replace the existing and message compatibility will be kept. Note that because the schema auto-update compatibility strategy is set to BACKWARD_TRANSITIVE, the new column must be optional, which is the default of any non-primary-key column.
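To illustrate why the new column must be optional: under a backward-compatible strategy, the new schema has to be able to read messages written with the earlier schemas, and in Avro that requires every added field to carry a default. The following is a minimal sketch of that single rule, not Pulsar's actual compatibility checker. The schemas are plain dictionaries, and the record and field names (`users`, `email`, `phone`) are hypothetical.

```python
def added_fields_have_defaults(old_schema: dict, new_schema: dict) -> bool:
    """Return True if every field present only in new_schema declares a default.

    This checks one rule only; a real compatibility check also compares
    types, handles removed fields, and so on.
    """
    old_names = {field["name"] for field in old_schema["fields"]}
    return all(
        "default" in field
        for field in new_schema["fields"]
        if field["name"] not in old_names
    )


# Hypothetical value schema for a table with one non-primary-key column.
v1 = {
    "type": "record",
    "name": "users",
    "fields": [
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

# Hypothetical schema after adding a column; the new field is optional.
v2_optional = {
    "type": "record",
    "name": "users",
    "fields": [
        {"name": "email", "type": ["null", "string"], "default": None},
        {"name": "phone", "type": ["null", "string"], "default": None},
    ],
}

# The same change, but with a required field that has no default.
v2_required = {
    "type": "record",
    "name": "users",
    "fields": [
        {"name": "email", "type": ["null", "string"], "default": None},
        {"name": "phone", "type": "string"},
    ],
}

print(added_fields_have_defaults(v1, v2_optional))
print(added_fields_have_defaults(v1, v2_required))
```

The first call passes because the added field is nullable with a default; the second fails because older messages would have no value to supply for the required field.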
-Altering a table column includes renaming a column or changing a column’s type. Assuming the new column’s data type is compatible with the source connector, a new schema will replace the existing and message compatibility will be kept. Once a table has been created, a table’s primary key(s) can not be modified. This fits well with the CDC pattern.
+Altering a table column includes renaming a column or changing a column’s type. Assuming the new column’s data type is compatible with the source connector, a new schema will replace the existing schema and message compatibility will be kept. Once a table has been created, a table’s primary key(s) can not be modified. This fits well with the CDC pattern.
-While technically updating columns is possible when CDC is enabled, it is not recommended. Instead, changes to a Cassandra table should be additive only. If you are familiar with data migrations, this concept is the same. If you need to change the name or type of table column, add a new column. The resulting event messages will have a reference to both columns, and you can handle this migration downstream.
+While technically updating columns is possible when CDC is enabled, it is not recommended. Instead, changes to a Cassandra table should be additive only. (If you are familiar with data migrations, this concept is the same). To change the name or type of table column, add a new column. The resulting event messages will have a reference to both columns, and you can handle this migration downstream.
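As a rough sketch of what handling the migration downstream could look like, a consumer might prefer the new column and fall back to the old one while both are present in the event messages. The column names `email` and `email_address` are hypothetical, and the record is assumed to have already been decoded into a dictionary.

```python
from typing import Optional


def read_email(record: dict) -> Optional[str]:
    """Prefer the newly added column; fall back to the original column.

    Both column names are hypothetical stand-ins for an old column and
    its additive replacement.
    """
    new_value = record.get("email_address")
    if new_value is not None:
        return new_value
    return record.get("email")


# A message written before the new column was populated.
print(read_email({"email": "a@example.com", "email_address": None}))

# A message written after the application switched to the new column.
print(read_email({"email": None, "email_address": "b@example.com"}))
```

Once no remaining consumers depend on the old column, the fallback branch can be retired.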
-Note that this recommendation assumes a schema-compatibility-strategy of BACKWARD_TRANSITIVE. If you are using a different strategy, table updates will be handled differently.
+Note that this recommendation assumes a schema-compatibility-strategy of BACKWARD_TRANSITIVE. If you are using a different schema compatibility strategy, table updates will be handled differently.
== Removing a table column
@@ -62,4 +65,4 @@ An example of removing a column:
== Next
-Now let's move on to consuming event data in Pulsar xref:use-cases-architectures:change-data-capture/consuming-change-data.adoc[].
+Let's move on to consuming event data in Pulsar! xref:use-cases-architectures:change-data-capture/consuming-change-data.adoc[].