Commit f963f7c

committed
cleanup-table-scheme
1 parent 3f9d54b commit f963f7c

1 file changed (+11, -8 lines changed)

modules/use-cases-architectures/pages/change-data-capture/table-schema-evolution.adoc

@@ -5,9 +5,12 @@ David Dieruf <[email protected]>
 :title: Cassandra table schema evolution with CDC
 :navtitle: Cassandra table schema evolution with CDC
 
-NOTE: This article is a continuation of the "xref:change-data-capture/index.adoc[]" article. Please read that ahead of this article to understand the fundamentals of what resources are being used.
+[NOTE]
+====
+This article is a continuation of the xref:change-data-capture/index.adoc[] article. Please read that article first to understand the fundamentals of what resources are being used.
+====
 
-The message schema is of particular importance in completing the CDC pattern. Initially, it is set to match the Cassandra table’s schema as closely as possible. But some data types are not known in Pulsar (or more accurately, Avro). To overcome this, there are adaptations performed when the Cassandra Connector builds the Pulsar message. Some types are not compatible and not able to be adapted. In this case, those columns of data are dropped while creating the Pulsar message.
+The message schema is of particular importance in completing the CDC pattern. Initially, it is set to match the Cassandra table’s schema as closely as possible, but some data types are not known in Pulsar (or more accurately, not known in Avro). To overcome this, there are adaptations performed when the Cassandra Source Connector builds the Pulsar message. Some types are not compatible and not able to be adapted. In this case, those columns of data are dropped while creating the Pulsar message.
 
 To better understand how exactly the CDC agent constructs the event message, here is the pseudo code of how the schema is created:
 
@@ -23,7 +26,7 @@ Schema<KeyValue<byte[], MutationValue>> keyValueSchema = Schema.KeyValue(
 );
 ----
 
-Notice the two types used in KeyValue. The byte array is an Avro encoded record that documents the table's primary key(s). The MutationValue is an extended Avro record that has direction on what changed and how to get its specifics.
+Notice the two types used in KeyValue. The byte array is an Avro-encoded record that documents the table's primary key(s). The MutationValue is an extended Avro record that has direction on what changed and how to get its specifics.
 
 CDC sets the initial topic schema on the first change it detects. Once the initial topic schema has been set, a “happy path” has been established to create change data events in Pulsar.
 
@@ -38,19 +41,19 @@ Here is a brief summary of how the data message schema is created:
 
 == Adding a table column
 
-This is the easiest of scenarios for table design change. Assuming the new column’s data type is compatible with the source connector, a new schema will replace the existing and message compatibility will be kept. Note that because the schema compatibility is BACKWARD_TRANSITIVE, the new column will need to be optional. Which is the default of any non-primary-key column.
+This is the easiest of scenarios for table design change. Assuming the new column’s data type is compatible with the source connector, a new schema will replace the existing and message compatibility will be kept. Note that because the schema auto-update compatibility strategy is set to BACKWARD_TRANSITIVE, the new column must be optional, which is the default of any non-primary-key column.
 
 An example of adding a column:
 
 `ALTER TABLE [keyspace_name.] table_name ADD my-super-awesome-column text;`
 
 == Updating a table column
 
-Altering a table column includes renaming a column or changing a column’s type. Assuming the new column’s data type is compatible with the source connector, a new schema will replace the existing and message compatibility will be kept. Once a table has been created, a table’s primary key(s) can not be modified. This fits well with the CDC pattern.
+Altering a table column includes renaming a column or changing a column’s type. Assuming the new column’s data type is compatible with the source connector, a new schema will replace the existing schema and message compatibility will be kept. Once a table has been created, a table’s primary key(s) can not be modified. This fits well with the CDC pattern.
 
-While technically updating columns is possible when CDC is enabled, it is not recommended. Instead, changes to a Cassandra table should be additive only. If you are familiar with data migrations, this concept is the same. If you need to change the name or type of table column, add a new column. The resulting event messages will have a reference to both columns, and you can handle this migration downstream.
+While technically updating columns is possible when CDC is enabled, it is not recommended. Instead, changes to a Cassandra table should be additive only. (If you are familiar with data migrations, this concept is the same). To change the name or type of table column, add a new column. The resulting event messages will have a reference to both columns, and you can handle this migration downstream.
 
-Note that this recommendation assumes a schema-compatibility-strategy of BACKWARD_TRANSITIVE. If you are using a different strategy, table updates will be handled differently.
+Note that this recommendation assumes a schema compatibility strategy of BACKWARD_TRANSITIVE. If you are using a different schema compatibility strategy, table updates will be handled differently.
 
 == Removing a table column
 
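Annotation on the "Adding a table column" text in the hunk above: the requirement that a new column be optional under BACKWARD_TRANSITIVE can be illustrated with a small, deliberately simplified model. This is only a sketch, not the connector's or Pulsar's actual check. The class `BackwardTransitiveSketch`, its methods, and the field names `id`, `name`, `email`, and `age` are all hypothetical. It assumes that "optional" means the field can be filled in when it is absent from older messages, and it ignores type changes entirely.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model (not a Pulsar or Avro API):
// a schema version is a map of field name -> "is optional".
// Real compatibility checks also consider types, aliases, and promotions.
final class BackwardTransitiveSketch {

    // A newer (reader) schema can read data written with an older (writer)
    // schema if every field the reader adds is optional.
    static boolean canRead(Map<String, Boolean> reader, Map<String, Boolean> writer) {
        for (Map.Entry<String, Boolean> field : reader.entrySet()) {
            boolean missingInWriter = !writer.containsKey(field.getKey());
            if (missingInWriter && !field.getValue()) {
                return false; // required field that old messages cannot supply
            }
        }
        return true;
    }

    // BACKWARD_TRANSITIVE: the candidate must be able to read data written
    // with every earlier schema version, not only the latest one.
    static boolean isBackwardTransitive(Map<String, Boolean> candidate,
                                        List<Map<String, Boolean>> history) {
        return history.stream().allMatch(old -> canRead(candidate, old));
    }

    public static void main(String[] args) {
        Map<String, Boolean> v1 = Map.of("id", false, "name", true);
        Map<String, Boolean> v2 = Map.of("id", false, "name", true, "email", true);
        Map<String, Boolean> bad =
                Map.of("id", false, "name", true, "email", true, "age", false);

        // Adding an optional column keeps compatibility with all prior versions.
        System.out.println(isBackwardTransitive(v2, List.of(v1)));
        // Adding a required column breaks it.
        System.out.println(isBackwardTransitive(bad, List.of(v1, v2)));
    }
}
```

In this model the first check passes and the second fails, which mirrors why the documentation says a newly added column must be optional.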
@@ -62,4 +65,4 @@ An example of removing a column:
 
 == Next
 
-Now let's move on to consuming event data in Pulsar xref:use-cases-architectures:change-data-capture/consuming-change-data.adoc[].
+Let's move on to consuming event data in Pulsar! xref:use-cases-architectures:change-data-capture/consuming-change-data.adoc[].
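
Annotation on the "Updating a table column" text: the additive approach means a consumer may see events that carry only the old column, or both the old and the new one. A minimal sketch of handling that downstream follows. It assumes the event value has already been decoded into a plain `Map`. The class `AdditiveMigrationSketch`, the `resolve` helper, and the column names `zip` and `postal_code` are hypothetical and do not come from the connector.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical consumer-side helper: prefer the newly added column and
// fall back to the column it supersedes while both exist in the table.
final class AdditiveMigrationSketch {

    static Optional<String> resolve(Map<String, Object> row,
                                    String newColumn,
                                    String oldColumn) {
        Object value = row.get(newColumn) != null ? row.get(newColumn) : row.get(oldColumn);
        return Optional.ofNullable(value).map(Object::toString);
    }

    public static void main(String[] args) {
        // Event produced before the new column was populated.
        Map<String, Object> before = new HashMap<>();
        before.put("zip", 12345);

        // Event produced after the new column was added and written.
        Map<String, Object> after = new HashMap<>();
        after.put("zip", 12345);
        after.put("postal_code", "12345-6789");

        System.out.println(resolve(before, "postal_code", "zip").orElse("<none>"));
        System.out.println(resolve(after, "postal_code", "zip").orElse("<none>"));
    }
}
```

Keeping the fallback in one place makes it easier to remove once no consumer depends on the old column.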
