| 1 | +--- |
| 2 | +layout: default |
| 3 | +title: Cassandra |
| 4 | +parent: Data Type Conversion |
| 5 | +nav_order: 4 |
| 6 | +--- |
| 7 | + |
| 8 | +# Schema migration for Cassandra |
| 9 | +{: .no_toc } |
| 10 | + |
| 11 | +The Spanner migration tool makes some assumptions when converting data types from Cassandra to Spanner(GoogleSQL). |
| 12 | +Certain data types also need special handling. Both are described below. |
| 13 | + |
| 14 | +### Adapter Compatibility |
| 15 | +The Spanner migration tool supports only schema migration from Cassandra to the GoogleSQL dialect of Spanner. The generated schema includes `cassandra_type` annotations, ensuring compatibility with the [Cassandra Adapter](https://cloud.google.com/spanner/docs/non-relational/connect-cassandra-adapter), which allows existing Cassandra applications to connect to Google Cloud Spanner (GoogleSQL) with minimal or no code changes. |
| 16 | + |
| 17 | +<details open markdown="block"> |
| 18 | + <summary> |
| 19 | + Table of contents |
| 20 | + </summary> |
| 21 | + {: .text-delta } |
| 22 | +1. TOC |
| 23 | +{:toc} |
| 24 | +</details> |
| 25 | + |
| 26 | +## Data Type Mapping |
| 27 | + |
| 28 | +The Spanner migration tool maps Cassandra primitive types to Spanner(GoogleSQL) types as follows: |
| 29 | + |
| 30 | +| **Cassandra Type** | **Spanner(GoogleSQL) Type** | **Notes** | |
| 31 | +|:-------------------------------------------------:|:---------------------------:|:--------------------------------------------------------:| |
| 32 | +| `ASCII` | `STRING(MAX)` | | |
| 33 | +| `BIGINT` | `INT64` | | |
| 34 | +| `BLOB` | `BYTES(MAX)` | | |
| 35 | +| `BOOLEAN` | `BOOL` | | |
| 36 | +| `COUNTER` | `INT64` | Spanner(GoogleSQL) does not support a counter data type | |
| 37 | +| `DATE` | `DATE` | | |
| 38 | +| `DECIMAL`, `VARINT` | `NUMERIC` | Potential changes of precision | |
| 39 | +| `DOUBLE` | `FLOAT64` | | |
| 40 | +| `FLOAT` | `FLOAT32` | | |
| 41 | +| `INET` | `STRING(MAX)` | | |
| 42 | +| `INT`, `SMALLINT`,<br/>`TINYINT` | `INT64` | Changes in storage size | |
| 43 | +| `TEXT` | `STRING(MAX)` | | |
| 44 | +| `TIME` | `INT64` | Spanner(GoogleSQL) doesn't support a time data type | |
| 45 | +| `TIMESTAMP` | `TIMESTAMP` | | |
| 46 | +| `UUID`, `TIMEUUID` | `STRING(MAX)` | Spanner(GoogleSQL) doesn't validate the uuid | |
| 47 | +| `VARCHAR` | `STRING(MAX)` | | |
| 48 | + |
| 49 | +Unlike primitive types, Cassandra's collection types (maps, sets, and lists) do not have direct, one-to-one equivalents in |
| 50 | +Spanner(GoogleSQL). They are mapped as follows: |
| 51 | + |
| 52 | +| **Cassandra Type** | **Spanner(GoogleSQL) Type** | **Notes** | |
| 53 | +|:-------------------------------------------------:|:---------------------------:|:----------------------------------------------------------------------------------------------------:| |
| 54 | +| `SET` | `ARRAY` | Spanner(GoogleSQL) doesn't support a dedicated set data type. Use ARRAY columns to represent a set | |
| 55 | +| `LIST` | `ARRAY` | Use ARRAY to store a list of typed objects | |
| 56 | +| `MAP` | `JSON` | Spanner(GoogleSQL) doesn't support a dedicated map type. Use JSON columns to represent maps | |
| 57 | + |
| 58 | +Spanner(GoogleSQL) does not support Cassandra's `duration` data type. The `duration` type |
| 59 | +and all other types not listed above map to `STRING(MAX)`. |
| 60 | + |
| 61 | +## DECIMAL and VARINT |
| 62 | + |
| 63 | +[Spanner(GoogleSQL)'s NUMERIC |
| 64 | +type](https://cloud.google.com/spanner/docs/data-types#decimal_type) can store |
| 65 | +up to 29 digits before the decimal point and up to 9 after the decimal point. |
| 66 | +Cassandra's DECIMAL type can potentially support higher precision than this, so |
| 67 | +please verify that Spanner(GoogleSQL)'s NUMERIC support meets your application needs. Note |
| 68 | +that the remarks about DECIMAL apply equally to VARINT. |
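One way to run the check described above is to scan sample values before migrating. The following is a minimal sketch, not part of the Spanner migration tool: the helper name `fits_spanner_numeric` is hypothetical, and the limits are the 29 and 9 digits quoted above. `VARINT` values can be checked by wrapping them in `Decimal` first.

```python
from decimal import Decimal

# Limits quoted in the text above: 29 digits before the decimal point, 9 after.
MAX_INTEGER_DIGITS = 29
MAX_FRACTION_DIGITS = 9


def fits_spanner_numeric(value: Decimal) -> bool:
    """Return True if `value` can be stored in NUMERIC without losing digits.

    Hypothetical helper for illustration only.
    """
    if not value.is_finite():
        return False  # NaN and infinities are treated as not storable here
    if value == 0:
        return True
    _sign, digits, exponent = value.as_tuple()
    digits = list(digits)
    # Trailing fractional zeros carry no information, so ignore them.
    while exponent < 0 and digits[-1] == 0:
        digits.pop()
        exponent += 1
    integer_digits = max(0, len(digits) + exponent)
    fraction_digits = max(0, -exponent)
    return (integer_digits <= MAX_INTEGER_DIGITS
            and fraction_digits <= MAX_FRACTION_DIGITS)
```

For example, `fits_spanner_numeric(Decimal("1.0000000001"))` is `False` because the value needs ten fractional digits.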
| 69 | + |
| 70 | +## UUID and TIMEUUID |
| 71 | + |
| 72 | +Cassandra has two identifier types often used for unique keys: `UUID` and `TIMEUUID`. |
| 73 | +A `UUID` value is typically a version 4 UUID, which is randomly generated. A `TIMEUUID` is a version 1 UUID, which |
| 74 | +embeds a timestamp, so values can be sorted chronologically. Cassandra's |
| 75 | +drivers and functions are aware of the internal structure of these types. |
| 76 | + |
| 77 | +Spanner(GoogleSQL) does not have a native `UUID` or `TIMEUUID` data type. Instead, these are typically |
| 78 | +stored using the `STRING` type (for the hexadecimal string representation) |
| 79 | +or `BYTES` (specifically `BYTES(16)` for the 16-byte UUID value). |
| 80 | + |
| 81 | +Spanner(GoogleSQL) does not validate the internal structure or format of the `UUID` or `TIMEUUID` values |
| 82 | +it stores. For example, it does not check for correct version bits, variant bits, |
| 83 | +or a valid time component for `TIMEUUID`. |
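Because the database does not perform this validation, an application may want to check values before writing them. A minimal sketch using Python's standard `uuid` module follows. The helper name `is_valid_uuid` is hypothetical, and expecting version 4 for `UUID` columns and version 1 for `TIMEUUID` columns is an assumption based on the description above.

```python
import uuid


def is_valid_uuid(text: str, expected_version: int) -> bool:
    """Check that `text` parses as an RFC 4122 UUID of the expected version.

    Hypothetical helper: use expected_version=1 for TIMEUUID columns and,
    as an assumption, 4 for UUID columns.
    """
    try:
        parsed = uuid.UUID(text)
    except ValueError:
        return False  # not a well-formed UUID string
    # `version` is only meaningful when the variant is RFC 4122.
    return parsed.variant == uuid.RFC_4122 and parsed.version == expected_version
```

A call such as `is_valid_uuid(str(uuid.uuid1()), 1)` returns `True`, while the same string checked against version 4 returns `False`.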
| 84 | + |
| 85 | +## COUNTER |
| 86 | + |
| 87 | +Cassandra's `COUNTER` type provides atomic, distributed increments/decrements. |
| 88 | + |
| 89 | +Spanner(GoogleSQL) doesn't have a direct equivalent to Cassandra's `COUNTER`. While we typically map this |
| 90 | +data to an `INT64` column in Spanner(GoogleSQL), you'll need to implement counter logic within your |
| 91 | +application's transactions (read, increment, write) to ensure correctness. |
| 92 | + |
| 93 | +## DURATION |
| 94 | + |
| 95 | +Cassandra has a `DURATION` type for periods of time. Spanner(GoogleSQL) doesn't have a native equivalent, |
| 96 | +so it is typically mapped to a `STRING` (e.g., in ISO 8601 format). Make sure your |
| 97 | +application parses and formats these values. |
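As an illustration of the string handling left to the application, the sketch below formats a duration as an ISO 8601 string. It assumes the duration is available as Cassandra's three components (months, days, nanoseconds) and that all three are non-negative. The function name `duration_to_iso8601` is hypothetical, and the exact string format your data uses may differ.

```python
def duration_to_iso8601(months: int, days: int, nanoseconds: int) -> str:
    """Format a (months, days, nanoseconds) duration as ISO 8601 text.

    Hypothetical helper; negative durations are out of scope for this sketch.
    """
    if min(months, days, nanoseconds) < 0:
        raise ValueError("negative durations are not handled in this sketch")

    years, months = divmod(months, 12)
    seconds, nanos = divmod(nanoseconds, 1_000_000_000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)

    # Emit only the non-zero components of the date part.
    date_part = "".join(
        f"{amount}{unit}"
        for amount, unit in ((years, "Y"), (months, "M"), (days, "D"))
        if amount
    )
    time_part = "".join(
        f"{amount}{unit}" for amount, unit in ((hours, "H"), (minutes, "M")) if amount
    )
    if seconds or nanos:
        # Keep sub-second precision without trailing zeros.
        fraction = f".{nanos:09d}".rstrip("0") if nanos else ""
        time_part += f"{seconds}{fraction}S"

    if not date_part and not time_part:
        return "PT0S"  # zero-length duration
    return "P" + date_part + ("T" + time_part if time_part else "")
```

For instance, 14 months, 3 days, and 90 seconds expressed in nanoseconds format as `P1Y2M3DT1M30S`.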
| 98 | + |
| 99 | +## TIME |
| 100 | + |
| 101 | +Cassandra has a `TIME` type for the time of day (nanoseconds since midnight). Spanner(GoogleSQL) doesn't |
| 102 | +have a native equivalent, so it is typically mapped to an `INT64` that stores the nanosecond count. |
| 103 | +Make sure your application converts between this integer and a time of day. |
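The conversion between the stored integer and a readable time of day is simple arithmetic. The following is a minimal sketch. The function names are hypothetical, and the value is assumed to be a non-negative nanosecond count smaller than one day.

```python
NANOS_PER_SECOND = 1_000_000_000
NANOS_PER_DAY = 24 * 60 * 60 * NANOS_PER_SECOND


def nanos_to_time_string(nanos: int) -> str:
    """Render nanoseconds since midnight as HH:MM:SS.nnnnnnnnn."""
    if not 0 <= nanos < NANOS_PER_DAY:
        raise ValueError("value is outside a single day")
    seconds, fraction = divmod(nanos, NANOS_PER_SECOND)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{fraction:09d}"


def time_to_nanos(hours: int, minutes: int, seconds: int, nanos: int = 0) -> int:
    """Inverse direction: build the INT64 value from time-of-day components."""
    return ((hours * 60 + minutes) * 60 + seconds) * NANOS_PER_SECOND + nanos
```

Python's `datetime.time` only keeps microseconds, which is why this sketch works with plain integers instead.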
| 104 | + |
| 105 | +## SET and LIST |
| 106 | + |
| 107 | +Cassandra uses `SET` (an unordered collection of unique elements) and |
| 108 | +`LIST` (an ordered collection of non-unique elements). |
| 109 | + |
| 110 | +Both of these are typically mapped to Spanner(GoogleSQL)'s `ARRAY` type (e.g., `SET<TEXT>` to |
| 111 | +`ARRAY<STRING(MAX)>`, `LIST<INT>` to `ARRAY<INT64>`). When mapping `SET` to `ARRAY`, |
| 112 | +note that Spanner(GoogleSQL)'s ARRAY is ordered and allows duplicates. Therefore, your application |
| 113 | +must handle uniqueness if required. |
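If set semantics matter, the application can remove duplicates before writing the array. Here is a minimal sketch with a hypothetical helper name. It keeps the first occurrence of each element and assumes the elements are hashable, which holds for strings and numbers.

```python
def as_set_array(values):
    """Drop duplicates while keeping first-seen order.

    Hypothetical helper: the result is a list suitable for an ARRAY column
    that should behave like a set.
    """
    # dict preserves insertion order, so this keeps the first occurrence.
    return list(dict.fromkeys(values))
```

Applied to `["b", "a", "b", "c", "a"]`, the helper yields `["b", "a", "c"]`.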
| 114 | + |
| 115 | +## MAP |
| 116 | + |
| 117 | +Cassandra uses `MAP` for storing typed key-value pairs. Spanner(GoogleSQL) does not have a native `MAP` type. |
| 118 | +Cassandra's `MAP` typically maps to Spanner(GoogleSQL)'s `JSON` type. Unlike Cassandra, Spanner(GoogleSQL) does not |
| 119 | +validate the internal `JSON` structure or types, so your application must ensure data integrity. |
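One way for the application to keep map data consistent is to check key and value types before serializing to JSON. The sketch below is illustrative only: the helper name `encode_map` is hypothetical, and it assumes map keys are represented as strings because JSON object keys must be strings. How keys of other Cassandra types should be encoded is left open.

```python
import json


def encode_map(mapping: dict, value_type: type) -> str:
    """Serialize a map to JSON after checking key and value types.

    Hypothetical helper; raises TypeError on the first mismatch.
    """
    for key, value in mapping.items():
        if not isinstance(key, str):
            raise TypeError(f"key {key!r} is not a string")
        # Compare exact types so that, for example, bool is not accepted as int.
        if type(value) is not value_type:
            raise TypeError(f"value for {key!r} is not {value_type.__name__}")
    return json.dumps(mapping, sort_keys=True)
```

With this helper, `encode_map({"a": 1, "b": 2}, int)` returns the JSON text, while a map containing a string value raises `TypeError`.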
| 120 | + |
| 121 | +## Storage Use |
| 122 | + |
| 123 | +The Spanner migration tool maps several Cassandra types to Spanner(GoogleSQL) types that use more storage. |
| 124 | +For example, `SMALLINT` is a two-byte integer, but it maps to Spanner(GoogleSQL)'s `INT64`, |
| 125 | +an eight-byte integer. |
| 126 | + |
| 127 | +## Primary Keys |
| 128 | + |
| 129 | +Spanner(GoogleSQL) requires a primary key for every table. The tool derives the Spanner(GoogleSQL) primary key |
| 130 | +as a composite of the Cassandra partition key and clustering key columns. |
| 131 | + |
| 132 | +## Column Nullability |
| 133 | + |
| 134 | +Cassandra does not require every column to have a value in every row, so the corresponding Spanner(GoogleSQL) columns are |
| 135 | +created as nullable by default. Spanner(GoogleSQL) primary key columns, however, are inherently `NOT NULL`. |
| 136 | +You can explicitly define other columns as `NOT NULL` in Spanner(GoogleSQL) if the Cassandra data guarantees a value. |
| 137 | + |
| 138 | +## Foreign Keys |
| 139 | + |
| 140 | +Cassandra does not support native foreign key constraints. Therefore, no such constraints exist |
| 141 | +to convert when migrating from Cassandra to Spanner(GoogleSQL). |
| 142 | + |
| 143 | +## Secondary Indexes |
| 144 | + |
| 145 | +The tool currently doesn't support the migration of Cassandra secondary indexes to Spanner(GoogleSQL) secondary indexes. |
| 146 | + |
| 147 | +## Other Cassandra Types |
| 148 | +Cassandra's other complex types, such as nested collection types and User Defined Types (UDTs), are currently |
| 149 | +not natively supported in Spanner(GoogleSQL). By default, these types are mapped to `STRING(MAX)`. |
| 150 | + |
| 151 | +## Note |
| 152 | +See [Migrating from Cassandra to Cloud Spanner(GoogleSQL)](https://cloud.google.com/spanner/docs/non-relational/migrate-from-cassandra-to-spanner) |
| 153 | +for details on data migration, because the Spanner migration tool (SMT) currently supports schema migration only. |