4 changes: 3 additions & 1 deletion docs/cli/flags.md
@@ -38,12 +38,14 @@ such as `avro` etc.

* **`user`**: Specifies the user for the source database.

* **`dbName`**: Specifies the name of the source database.
* **`dbName`**: Specifies the name of the source database. For Cassandra, this corresponds to the keyspace.

* **`port`**: Specifies the port for the source database.

* **`password`**: Specifies the password for the source database.

* **`datacenter`**: Optional flag. Specifies the datacenter for the source database. This parameter applies only to Cassandra sources and is ignored for all other databases.

* **`streamingCfg`**: Optional flag. Specifies the file path for streaming config.
Please note that streaming migration is only supported for MySQL and PostgreSQL databases currently.
Here is an example of a [streamingCfg JSON](./config-json.md#streamingcfg-for-non-sharded-minimal-downtime-migrations) and [how to use it in the CLI](./schema-and-data.md#examples).
153 changes: 153 additions & 0 deletions docs/data-types/cassandra.md
@@ -0,0 +1,153 @@
---
layout: default
title: Cassandra
parent: Data Type Conversion
nav_order: 4
---

# Schema migration for Cassandra
{: .no_toc }

The Spanner migration tool makes some assumptions when converting data types from Cassandra to Spanner(GoogleSQL).
Certain data types also need special handling. Both are described below.

### Adapter Compatibility:
For Cassandra sources, the Spanner migration tool supports schema migration only, and only to the GoogleSQL dialect of Spanner. The generated schema includes `cassandra_type` annotations, ensuring compatibility with the [Cassandra Adapter](https://cloud.google.com/spanner/docs/non-relational/connect-cassandra-adapter), which allows existing Cassandra applications to connect to Google Cloud Spanner (GoogleSQL) with minimal or no code changes.

<details open markdown="block">
<summary>
Table of contents
</summary>
{: .text-delta }
1. TOC
{:toc}
</details>

## Data Type Mapping

The Spanner migration tool maps Cassandra primitive types to Spanner(GoogleSQL) types as follows:

| **Cassandra Type** | **Spanner(GoogleSQL) Type** | **Notes** |
|:-------------------------------------------------:|:---------------------------:|:--------------------------------------------------------:|
| `ASCII` | `STRING(MAX)` | |
| `BIGINT` | `INT64` | |
| `BLOB` | `BYTES(MAX)` | |
| `BOOLEAN` | `BOOL` | |
| `COUNTER` | `INT64` | Spanner(GoogleSQL) does not support a counter data type |
| `DATE` | `DATE` | |
| `DECIMAL`, `VARINT` | `NUMERIC` | Potential changes of precision |
| `DOUBLE` | `FLOAT64` | |
| `FLOAT` | `FLOAT32` | |
| `INET` | `STRING(MAX)` | |
| `INT`, `SMALLINT`,<br/>`TINYINT` | `INT64` | Changes in storage size |
| `TEXT` | `STRING(MAX)` | |
| `TIME` | `INT64` | Spanner(GoogleSQL) doesn't support a time data type |
| `TIMESTAMP` | `TIMESTAMP` | |
| `UUID`, `TIMEUUID` | `STRING(MAX)` | Spanner(GoogleSQL) doesn't validate the uuid |
| `VARCHAR` | `STRING(MAX)` | |

Unlike primitive types, Cassandra's collection types such as Maps, Sets, and Lists do not have direct, one-to-one equivalents in
Spanner(GoogleSQL). The tool maps them as follows:

| **Cassandra Type** | **Spanner(GoogleSQL) Type** | **Notes** |
|:-------------------------------------------------:|:---------------------------:|:----------------------------------------------------------------------------------------------------:|
| `SET` | `ARRAY` | Spanner(GoogleSQL) doesn't support a dedicated set data type. Use ARRAY columns to represent a set |
| `LIST` | `ARRAY` | Use ARRAY to store a list of typed objects |
| `MAP` | `JSON` | Spanner(GoogleSQL) doesn't support a dedicated map type. Use JSON columns to represent maps |

Spanner(GoogleSQL) does not support Cassandra's `duration` data type. `duration`, like every
other type not listed in the tables above, maps to `STRING(MAX)`.
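
The mapping rules above can be summarized as a lookup function. This is a minimal sketch for illustration, not the tool's actual implementation: the function name `to_spanner_type` is hypothetical, and the way element types inside `SET` and `LIST` are resolved is an assumption based on the tables and the fallback rule.

```python
# Primitive mappings copied from the table above (keys are lowercase CQL type names).
PRIMITIVE_TYPES = {
    "ascii": "STRING(MAX)",
    "bigint": "INT64",
    "blob": "BYTES(MAX)",
    "boolean": "BOOL",
    "counter": "INT64",
    "date": "DATE",
    "decimal": "NUMERIC",
    "varint": "NUMERIC",
    "double": "FLOAT64",
    "float": "FLOAT32",
    "inet": "STRING(MAX)",
    "int": "INT64",
    "smallint": "INT64",
    "tinyint": "INT64",
    "text": "STRING(MAX)",
    "time": "INT64",
    "timestamp": "TIMESTAMP",
    "uuid": "STRING(MAX)",
    "timeuuid": "STRING(MAX)",
    "varchar": "STRING(MAX)",
}

FALLBACK = "STRING(MAX)"  # duration, nested collections, UDTs, anything unknown


def to_spanner_type(cassandra_type: str) -> str:
    """Hypothetical helper: return the Spanner(GoogleSQL) type for a CQL type name."""
    t = cassandra_type.strip().lower()
    if t in PRIMITIVE_TYPES:
        return PRIMITIVE_TYPES[t]
    for prefix in ("set<", "list<"):
        if t.startswith(prefix) and t.endswith(">"):
            element = t[len(prefix):-1].strip()
            # Assumption: only collections of primitive elements become ARRAY.
            if element in PRIMITIVE_TYPES:
                return f"ARRAY<{PRIMITIVE_TYPES[element]}>"
            return FALLBACK
    if t.startswith("map<") and t.endswith(">"):
        parts = [p.strip() for p in t[len("map<"):-1].split(",")]
        # Assumption: only maps with primitive keys and values become JSON.
        if len(parts) == 2 and all(p in PRIMITIVE_TYPES for p in parts):
            return "JSON"
    return FALLBACK
```

For example, `to_spanner_type("set<text>")` yields `ARRAY<STRING(MAX)>`, while `duration` and nested collections fall through to `STRING(MAX)`.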

## DECIMAL and VARINT

[Spanner(GoogleSQL)'s NUMERIC
type](https://cloud.google.com/spanner/docs/data-types#decimal_type) can store
up to 29 digits before the decimal point and up to 9 after the decimal point.
Cassandra's DECIMAL type is variable-precision and can exceed these limits, so
please verify that Spanner(GoogleSQL)'s NUMERIC support meets your application needs. The
same caveat applies to VARINT, an arbitrary-precision integer that can exceed 29 digits.
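
One way to verify the limits above before loading data is to check each value's digit counts. The following is a minimal sketch using Python's standard `decimal` module; the helper name `fits_spanner_numeric` is hypothetical and not part of the migration tool.

```python
from decimal import Decimal

MAX_INTEGER_DIGITS = 29    # digits allowed before the decimal point
MAX_FRACTIONAL_DIGITS = 9  # digits allowed after the decimal point


def fits_spanner_numeric(value: Decimal) -> bool:
    """Hypothetical check: can this value be stored in NUMERIC without loss?"""
    if not value.is_finite():
        return False  # NaN and infinities have no NUMERIC representation
    if value == 0:
        return True
    _sign, digits, exponent = value.as_tuple()
    digits = list(digits)
    # Drop trailing fractional zeros so that 1.50 counts as one fractional digit.
    while len(digits) > 1 and digits[-1] == 0 and exponent < 0:
        digits.pop()
        exponent += 1
    fractional_digits = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    return (integer_digits <= MAX_INTEGER_DIGITS
            and fractional_digits <= MAX_FRACTIONAL_DIGITS)
```

A VARINT value can be checked the same way by wrapping it first, for example `fits_spanner_numeric(Decimal(10**30))`.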

## UUID and TIMEUUID

Cassandra has two primary identifier types often used for unique keys: `UUID` and `TIMEUUID`.
`UUID` holds a standard UUID, typically a randomly generated version 4 value. `TIMEUUID` is a
version 1 UUID, which embeds a timestamp and is time-ordered, providing a natural chronological
sorting. Cassandra's drivers and functions are aware of the internal structure of these types.

Spanner(GoogleSQL) does not have a native `UUID` or `TIMEUUID` data type. Instead, these are typically
stored using the `STRING` type (for the hexadecimal string representation)
or `BYTES` (specifically `BYTES(16)` for the 16-byte UUID value). The Spanner migration tool uses `STRING(MAX)`.

Spanner(GoogleSQL) does not validate the internal structure or format of the `UUID` or `TIMEUUID`
values it stores (e.g., it does not check for correct version bits, variant bits,
or a valid time component for TIMEUUID).
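
Because this validation is absent on the Spanner side, an application can perform it before writing. The sketch below uses Python's standard `uuid` module; the helper name `check_cassandra_uuid` and its `expect_timeuuid` parameter are hypothetical.

```python
import uuid


def check_cassandra_uuid(text: str, expect_timeuuid: bool = False) -> str:
    """Hypothetical helper: validate a UUID string and return its canonical form.

    Raises ValueError if the text is not a well-formed UUID, or if a TIMEUUID
    is expected and the value is not a version 1 UUID.
    """
    parsed = uuid.UUID(text)  # raises ValueError on malformed input
    if expect_timeuuid and parsed.version != 1:
        raise ValueError(f"{text!r} is not a version 1 (time-based) UUID")
    # str() yields the lowercase, hyphenated form suitable for a STRING column.
    return str(parsed)
```

Normalizing to the canonical lowercase form before writing keeps string comparisons on the column consistent.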

## COUNTER

Cassandra's `COUNTER` type provides atomic, distributed increments/decrements.

Spanner(GoogleSQL) doesn't have a direct equivalent to Cassandra's `COUNTER`. The tool maps this
data to an `INT64` column in Spanner(GoogleSQL), so you'll need to implement counter logic within your
application's transactions (read, increment, write) to ensure correctness.

## DURATION

Cassandra has a `DURATION` type for periods of time. Spanner(GoogleSQL) doesn't have a native equivalent,
so the tool maps it to `STRING(MAX)`, which can hold a textual form such as ISO 8601. Your
application must format and parse these values itself.
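
As an illustration of that application-side handling, the sketch below formats a duration as an ISO 8601 string. It assumes the duration is available as three non-negative components (months, days, nanoseconds), which is how Cassandra represents the type; the function name `duration_to_iso8601` is hypothetical, and negative durations are deliberately out of scope.

```python
def duration_to_iso8601(months: int, days: int, nanoseconds: int) -> str:
    """Hypothetical helper: format (months, days, nanoseconds) as ISO 8601."""
    if min(months, days, nanoseconds) < 0:
        raise ValueError("this sketch handles non-negative durations only")
    years, months = divmod(months, 12)
    total_seconds, nanos = divmod(nanoseconds, 1_000_000_000)
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)

    # Emit only the non-zero components, as ISO 8601 allows.
    date_part = "".join(
        f"{n}{unit}" for n, unit in ((years, "Y"), (months, "M"), (days, "D")) if n
    )
    time_part = "".join(
        f"{n}{unit}" for n, unit in ((hours, "H"), (minutes, "M")) if n
    )
    if seconds or nanos:
        fraction = f".{nanos:09d}".rstrip("0") if nanos else ""
        time_part += f"{seconds}{fraction}S"
    if not date_part and not time_part:
        return "PT0S"  # zero-length duration
    return "P" + date_part + ("T" + time_part if time_part else "")
```

For example, 14 months, 3 days, and 90 minutes becomes `P1Y2M3DT1H30M`.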

## TIME

Cassandra has a `TIME` type for the time of day (nanoseconds since midnight). Spanner(GoogleSQL) doesn't
have a native equivalent, so the tool maps it to `INT64`, which stores the nanosecond count.
Your application must convert these values to and from a time of day itself.
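
A minimal sketch of that conversion, using Python's standard `datetime.time`, is shown below. The function names are hypothetical. Note that `datetime.time` has microsecond resolution, so the sub-microsecond part of a value is truncated when converting from nanoseconds.

```python
from datetime import time

NANOS_PER_SECOND = 1_000_000_000
NANOS_PER_DAY = 86_400 * NANOS_PER_SECOND


def nanos_to_time(nanos: int) -> time:
    """Hypothetical helper: INT64 nanoseconds since midnight -> time of day."""
    if not 0 <= nanos < NANOS_PER_DAY:
        raise ValueError("value is outside a single day")
    seconds, remainder = divmod(nanos, NANOS_PER_SECOND)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    # Truncates to microseconds: datetime.time cannot hold nanoseconds.
    return time(hour, minute, second, remainder // 1000)


def time_to_nanos(t: time) -> int:
    """Hypothetical helper: time of day -> INT64 nanoseconds since midnight."""
    seconds = (t.hour * 60 + t.minute) * 60 + t.second
    return seconds * NANOS_PER_SECOND + t.microsecond * 1000
```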

## SET and LIST

Cassandra uses `SET` (an unordered collection of unique elements) and
`LIST` (an ordered collection of non-unique elements).

Both of these are typically mapped to Spanner(GoogleSQL)'s `ARRAY` type (e.g., `SET<TEXT>` to
`ARRAY<STRING(MAX)>`, `LIST<INT>` to `ARRAY<INT64>`). When mapping `SET` to `ARRAY`,
note that Spanner(GoogleSQL)'s ARRAY is ordered and allows duplicates. Therefore, your application
must handle uniqueness if required.
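
One simple way to enforce uniqueness on the application side is to deduplicate values before writing them to the `ARRAY` column. The helper below is a hypothetical sketch; it keeps the first occurrence of each element and preserves the original order.

```python
def set_to_array(values):
    """Hypothetical helper: drop duplicates while keeping first-seen order."""
    # dict preserves insertion order, so fromkeys() acts as an ordered set.
    return list(dict.fromkeys(values))
```

The same function should be applied on every update, since Spanner(GoogleSQL) will not reject an array that contains duplicates.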

## MAP

Cassandra uses `MAP` for storing typed key-value pairs. Spanner(GoogleSQL) does not have a native `MAP` type.
Cassandra's `MAP` typically maps to Spanner(GoogleSQL)'s `JSON` type. Unlike Cassandra, Spanner(GoogleSQL) does not
validate the internal `JSON` structure or types, so your application must ensure data integrity.
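
A sketch of such an integrity check is shown below: it verifies key and value types before serializing a map to a JSON string. The function name `map_to_json` and its `value_type` parameter are hypothetical. It assumes string keys, because JSON object keys are always strings.

```python
import json


def map_to_json(mapping: dict, value_type: type) -> str:
    """Hypothetical helper: type-check a map, then serialize it as JSON text."""
    for key, value in mapping.items():
        if not isinstance(key, str):
            raise TypeError(f"key {key!r} is not a string")
        if not isinstance(value, value_type):
            raise TypeError(f"value for {key!r} is not {value_type.__name__}")
    # sort_keys gives a stable serialization for identical maps.
    return json.dumps(mapping, sort_keys=True)
```

For a Cassandra `MAP<TEXT, INT>` column, the call would be `map_to_json(row_value, int)`.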

## Storage Use

The Spanner migration tool maps several Cassandra types to Spanner(GoogleSQL) types that use more storage.
For example, `SMALLINT` is a two-byte integer, but it maps to Spanner(GoogleSQL)'s `INT64`,
an eight-byte integer.

## Primary Keys

Spanner(GoogleSQL) requires primary keys for all tables. The tool derives each table's primary key
as a composite of the Cassandra partition key and clustering key columns.
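
The derivation can be pictured as concatenating the two column lists. The sketch below is illustrative only: the function name `primary_key_clause` is hypothetical, and placing partition key columns before clustering key columns is an assumption that mirrors Cassandra's own key order. Clustering order (`ASC`/`DESC`) is not handled.

```python
def primary_key_clause(partition_key, clustering_key):
    """Hypothetical helper: build a PRIMARY KEY clause from Cassandra key columns."""
    # Assumption: partition key columns come first, then clustering columns.
    columns = [*partition_key, *clustering_key]
    if not columns:
        raise ValueError("a Spanner table needs at least one key column")
    return "PRIMARY KEY (" + ", ".join(columns) + ")"
```

For a Cassandra key of `((tenant_id), created_at, event_id)`, this produces `PRIMARY KEY (tenant_id, created_at, event_id)`.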

## Column Nullability

Cassandra does not require every row to have a value for every column, so the corresponding Spanner(GoogleSQL)
columns are created as nullable by default. Spanner(GoogleSQL) primary key columns, however, are inherently `NOT NULL`.
You can explicitly define other columns as `NOT NULL` in Spanner(GoogleSQL) if the Cassandra data guarantees a value.

## Foreign Keys

Cassandra does not support native foreign key constraints. Therefore, no such constraints exist
to convert when migrating from Cassandra to Spanner(GoogleSQL).

## Secondary Indexes

The tool currently doesn't support the migration of Cassandra secondary indexes to Spanner(GoogleSQL) secondary indexes.

## Other Cassandra Types
Cassandra's other complex types, such as nested collection types and User Defined Types (UDTs), are currently
not natively supported in Spanner(GoogleSQL). By default, these types are mapped to `STRING(MAX)`.

## Note
See [Migrating from Cassandra to Cloud Spanner(GoogleSQL)](https://cloud.google.com/spanner/docs/non-relational/migrate-from-cassandra-to-spanner)
for details on data migration, because SMT currently supports schema-only migration for Cassandra.
2 changes: 1 addition & 1 deletion docs/data-types/schema.md
@@ -9,6 +9,6 @@ permalink: /data-types-conv
# Data type conversions using schema migrations
{: .no_toc }

SMT currently supports performing schema migrations for MySQL and PostgreSQL. Certain features of relational databases, especially those that don't map directly to Spanner features, are ignored, e.g. stored functions and procedures, and sequences. Types such as integers, floats, char/text, bools, timestamps, and (some) array types, map fairly directly to Spanner, but many other types do not and instead are mapped to Spanner's `STRING(MAX)`.
SMT currently supports performing schema migrations for MySQL, PostgreSQL, and Cassandra. For Cassandra, schema migrations are supported only to the GoogleSQL dialect. Certain features of relational databases, especially those that don't map directly to Spanner features, are ignored, e.g. stored functions and procedures, and sequences. Types such as integers, floats, char/text, bools, timestamps, and (some) array types, map fairly directly to Spanner, but many other types do not and instead are mapped to Spanner's `STRING(MAX)`.

SMT supports converting to both GoogleSQL and PostgreSQL [dialects](https://cloud.google.com/spanner/docs) of Spanner.
2 changes: 1 addition & 1 deletion docs/index.md
@@ -54,7 +54,7 @@ To find out how to configure custom transformations, refer [here](./transformati

## Supported Sources and Targets

- **Schema Migrations**: SMT supports schema migrations for MySQL and PostgreSQL.
- **Schema Migrations**: SMT supports schema migrations for MySQL, PostgreSQL and Cassandra.
- **Data Migrations**: SMT supports minimal downtime migrations and POC migration for MySQL and PostgreSQL databases.

## About the project