@msmygit msmygit commented Jan 24, 2026

Description

Upgrade to the latest Cassandra Java Driver 4.x.

Motivation and Context

The Cassandra Java Driver 3.x line has reached EOL and mostly no longer receives fixes or features. DataStax has already donated the driver to the ASF and is spearheading the 4.x series.

Resolves #26852 #26762

Impact

This moves Presto to a more stable, supported, non-EOL Cassandra Java driver.

Test Plan

All relevant tests are updated to use the latest Cassandra Java Driver.

Contributor checklist

  • [x] Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Cassandra Connector Changes
* Upgrade to Cassandra Java Driver `4.x`

Summary by Sourcery

Upgrade the Cassandra connector to use the Cassandra Java Driver 4.x and adapt the connector’s session, metadata, type handling, and configuration to the new driver APIs, including support for DataStax Astra secure connect bundles.

New Features:

  • Add support for configuring Cassandra connections via DataStax Astra secure connect bundles.
  • Introduce a ReopeningSession wrapper to transparently reopen CqlSession instances on failure.

Enhancements:

  • Refactor Cassandra client wiring to build CqlSession with programmatic driver configuration and updated retry, load-balancing, and speculative execution settings.
  • Update connector metadata, schema, and token range handling to the new driver 4.x metadata and token map model.
  • Adjust Cassandra type mappings and value conversions to the driver 4.x type system and Java time APIs.
  • Simplify configuration by removing deprecated protocol and whitelist settings and relying on driver 4.x defaults.
  • Update write path (page sink) and test utilities to use the new query builder and statement APIs.
  • Replace host-based replica/address handling with the new Node-based APIs and adapt split/token code accordingly.

Build:

  • Replace the legacy DataStax driver dependency with the Apache Cassandra Java Driver 4.x core and query-builder modules, and add the LZ4 runtime and jsr305 annotations dependencies.

sourcery-ai bot commented Jan 24, 2026

Reviewer's Guide

Upgrades the Presto Cassandra connector from the DataStax 3.x driver to the Apache Cassandra Java Driver 4.x, refactoring session/metadata APIs, query building, type mappings, retries, and configuration to align with the new driver while adding Astra/secure-connect support and updating tests and dependencies.

Class diagram for updated Cassandra session and client wiring

classDiagram
    class CassandraClientModule {
        +createCassandraSession(ConnectorId connectorId, CassandraClientConfig config, JsonCodec_List_ExtraColumnMetadata extraColumnMetadataCodec) CassandraSession
    }

    class CassandraSession {
        <<interface>>
        +String PRESTO_COMMENT_METADATA
        +String getCassandraVersion()
        +String getPartitioner()
        +Set~TokenRange~ getTokenRanges()
        +Set~Node~ getReplicas(String caseSensitiveSchemaName, TokenRange tokenRange)
        +Set~Node~ getReplicas(String caseSensitiveSchemaName, ByteBuffer partitionKey)
        +String getCaseSensitiveSchemaName(String caseInsensitiveSchemaName)
        +List~String~ getCaseSensitiveSchemaNames()
        +List~String~ getCaseSensitiveTableNames(String caseSensitiveSchemaName)
        +CassandraTable getTable(SchemaTableName schemaTableName)
        +List~SizeEstimate~ getSizeEstimates(String keyspaceName, String tableName)
        +PreparedStatement prepare(String statement)
        +ResultSet execute(String cql, Object values)
        +ResultSet execute(Statement statement)
    }

    class NativeCassandraSession {
        -String connectorId
        -JsonCodec_List_ExtraColumnMetadata extraColumnMetadataCodec
        -ReopeningSession reopeningSession
        -Supplier~CqlSession~ session
        -Duration noHostAvailableRetryTimeout
        -static boolean caseSensitiveNameMatchingEnabled
        +NativeCassandraSession(String connectorId, JsonCodec_List_ExtraColumnMetadata extraColumnMetadataCodec, ReopeningSession reopeningSession, Duration noHostAvailableRetryTimeout, boolean caseSensitiveNameMatchingEnabled)
        +String getCassandraVersion()
        +String getPartitioner()
        +Set~TokenRange~ getTokenRanges()
        +Set~Node~ getReplicas(String caseSensitiveSchemaName, TokenRange tokenRange)
        +Set~Node~ getReplicas(String caseSensitiveSchemaName, ByteBuffer partitionKey)
        +String getCaseSensitiveSchemaName(String caseSensitiveSchemaName)
        +List~String~ getCaseSensitiveSchemaNames()
        +List~String~ getCaseSensitiveTableNames(String caseSensitiveSchemaName)
        +CassandraTable getTable(SchemaTableName schemaTableName)
        +boolean isMaterializedView(SchemaTableName schemaTableName)
        +List~SizeEstimate~ getSizeEstimates(String keyspaceName, String tableName)
        +PreparedStatement prepare(String statement)
        +ResultSet execute(String cql, Object values)
        +ResultSet execute(Statement statement)
        -T executeWithSession(SessionCallable sessionCallable)
    }

    class ReopeningSession {
        -CqlSession delegate
        -boolean closed
        -Supplier~CqlSession~ sessionSupplier
        +ReopeningSession(Supplier~CqlSession~ sessionSupplier)
        +synchronized CqlSession get()
        +synchronized CqlSession getSession()
        +synchronized void close()
        +synchronized boolean isClosed()
    }

    class CassandraClientConfig {
        -DefaultConsistencyLevel consistencyLevel
        -int fetchSize
        -List~String~ contactPoints
        -int nativeProtocolPort
        -boolean allowDropTable
        -String username
        -String password
        -Duration clientReadTimeout
        -Duration clientConnectTimeout
        -Integer clientSoLinger
        -RetryPolicyType retryPolicy
        -boolean useDCAware
        -String dcAwareLocalDC
        -int dcAwareUsedHostsPerRemoteDc
        -boolean dcAwareAllowRemoteDCsForLocal
        -boolean useTokenAware
        -boolean tokenAwareShuffleReplicas
        -Duration noHostAvailableRetryTimeout
        -int speculativeExecutionLimit
        -Duration speculativeExecutionDelay
        -boolean tlsEnabled
        -File truststorePath
        -String truststorePassword
        -File keystorePath
        -String keystorePassword
        -File secureConnectBundle
        -boolean caseSensitiveNameMatchingEnabled
        +DefaultConsistencyLevel getConsistencyLevel()
        +CassandraClientConfig setConsistencyLevel(DefaultConsistencyLevel level)
        +List~String~ getContactPoints()
        +CassandraClientConfig setContactPoints(String commaSeparatedList)
        +Optional~File~ getSecureConnectBundle()
        +CassandraClientConfig setSecureConnectBundle(File secureConnectBundle)
        +Duration getNoHostAvailableRetryTimeout()
        +boolean isTlsEnabled()
        +String getDcAwareLocalDC()
        +boolean isUseDCAware()
        +boolean isUseTokenAware()
        +RetryPolicyType getRetryPolicy()
        +int getSpeculativeExecutionLimit()
        +Duration getSpeculativeExecutionDelay()
    }

    class RetryPolicyType {
        <<enum>>
        +DEFAULT
        +BACKOFF
        +DOWNGRADING_CONSISTENCY
        +FALLTHROUGH
        -Class_retryPolicy_ policyClass
        +Class_retryPolicy_ getPolicyClass()
    }

    class BackoffRetryPolicy {
        <<implements RetryPolicy>>
        -static int MAX_RETRIES
        +RetryDecision onReadTimeout(Request request, ConsistencyLevel cl, int blockFor, int received, boolean dataPresent, int retryCount)
        +RetryDecision onWriteTimeout(Request request, ConsistencyLevel cl, WriteType writeType, int blockFor, int received, int retryCount)
        +RetryDecision onUnavailable(Request request, ConsistencyLevel cl, int required, int alive, int retryCount)
        +RetryDecision onRequestAborted(Request request, Throwable error, int retryCount)
        +RetryDecision onErrorResponse(Request request, CoordinatorException error, int retryCount)
        +void close()
    }

    CassandraClientModule ..> CassandraClientConfig : uses
    CassandraClientModule ..> ReopeningSession : creates
    CassandraClientModule ..> CqlSession : builds
    CassandraClientModule ..> RetryPolicyType : reads
    CassandraClientModule ..> DefaultLoadBalancingPolicy : configures

    NativeCassandraSession ..> CqlSession : uses
    NativeCassandraSession ..> ReopeningSession : holds
    NativeCassandraSession ..|> CassandraSession

    ReopeningSession ..> CqlSession : wraps

    CassandraClientConfig ..> RetryPolicyType

    RetryPolicyType ..> RetryPolicy : policyClass
    BackoffRetryPolicy ..|> RetryPolicy
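
The ReopeningSession entry in the diagram above (a delegate, a closed flag, a session supplier, and synchronized accessors) can be illustrated with a minimal sketch. This is not the PR's implementation: `ClosableSession` is a hypothetical stand-in for the driver's session type so that the example compiles without the driver, and the reopen-on-closed rule is an assumption based on the guide's description of the wrapper.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical stand-in for the driver's session type, so the sketch compiles without the driver.
interface ClosableSession extends AutoCloseable {
    boolean isClosed();

    @Override
    void close();
}

// Hands out one shared session and replaces it with a fresh one once it has been closed.
final class ReopeningSessionSketch<S extends ClosableSession> {
    private final Supplier<S> sessionSupplier;
    private S delegate;
    private boolean closed;

    ReopeningSessionSketch(Supplier<S> sessionSupplier) {
        this.sessionSupplier = Objects.requireNonNull(sessionSupplier, "sessionSupplier is null");
    }

    // Returns the current session, creating a new one if none exists or the old one was closed.
    synchronized S get() {
        if (closed) {
            throw new IllegalStateException("wrapper has been closed");
        }
        if (delegate == null || delegate.isClosed()) {
            delegate = sessionSupplier.get();
        }
        return delegate;
    }

    // Closes the wrapper and the session it currently holds; get() fails afterwards.
    synchronized void close() {
        closed = true;
        if (delegate != null) {
            delegate.close();
            delegate = null;
        }
    }

    synchronized boolean isClosed() {
        return closed;
    }
}
```

Callers would ask the wrapper for a session on every use instead of caching the returned object, which is what lets a replacement session be picked up transparently.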

Class diagram for updated Cassandra type mapping and value handling

classDiagram
    class CassandraType {
        <<enum>>
        ASCII
        BIGINT
        BLOB
        BOOLEAN
        COUNTER
        CUSTOM
        DATE
        DECIMAL
        DOUBLE
        FLOAT
        INET
        INT
        LIST
        MAP
        SET
        SMALLINT
        TEXT
        TIMESTAMP
        TIMEUUID
        TINYINT
        TUPLE
        UUID
        VARCHAR
        VARINT
        -Type nativeType
        -Class_java_javaType
        +int getTypeArgumentSize()
        +static CassandraType getCassandraType(DataType dataType)
        +static NullableValue getColumnValue(Row row, int position, CassandraType cassandraType, List~CassandraType~ typeArguments)
        +static NullableValue getColumnValueForPartitionKey(Row row, int position, CassandraType cassandraType, List~CassandraType~ typeArguments)
        +static String getColumnValueForCql(Row row, int position, CassandraType cassandraType, List~CassandraType~ typeArguments)
        +Object getJavaValue(Object nativeValue)
        +static CassandraType toCassandraType(Type type)
        -static String bytesToHex(ByteBuffer buffer)
    }

    class NativeCassandraSession {
        -CassandraColumnHandle buildColumnHandle(TableMetadata tableMetadata, ColumnMetadata columnMeta, boolean partitionKey, boolean clusteringKey, int ordinalPosition, boolean hidden) CassandraColumnHandle
    }

    class CassandraPageSink {
        -CassandraSession cassandraSession
        -PreparedStatement insert
        -List~Type~ columnTypes
        -boolean generateUUID
        +CassandraPageSink(CassandraSession cassandraSession, String schemaName, String tableName, List~String~ columnNames, List~Type~ columnTypes, boolean generateUUID)
        +Collection~Slice~ finish()
        +void appendPage(Page page)
        -Object toCassandraObject(Type type, Block block, int position)
    }

    class CassandraRecordCursor {
        -List~FullCassandraType~ fullCassandraTypes
        -ResultSet rs
        -Row currentRow
        -long count
        -Iterator_Row_ iterator
        +CassandraRecordCursor(CassandraSession cassandraSession, List~FullCassandraType~ fullCassandraTypes, String cql)
        +boolean advanceNextPosition()
        +double getDouble(int i)
        +long getLong(int i)
    }

    CassandraType ..> DataType : maps
    CassandraType ..> Row : reads
    CassandraType ..> NullableValue : returns
    CassandraType ..> ByteBuffer : uses
    CassandraType ..> Instant : uses
    CassandraType ..> LocalDate : uses

    NativeCassandraSession ..> CassandraType : uses
    CassandraPageSink ..> CassandraType : via Type mapping
    CassandraRecordCursor ..> CassandraType : via FullCassandraType
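
The private `bytesToHex(ByteBuffer)` helper listed on CassandraType is not shown in this guide. A minimal sketch of such a helper might look like the following; the `0x` prefix (CQL blob literal syntax), the lowercase digits, and the class name `HexSketch` are assumptions, not details taken from the PR.

```java
import java.nio.ByteBuffer;

final class HexSketch {
    private static final char[] HEX_DIGITS = "0123456789abcdef".toCharArray();

    private HexSketch() {}

    // Renders the remaining bytes of the buffer as a CQL-style blob literal, for example 0x0aff.
    static String bytesToHex(ByteBuffer buffer) {
        // duplicate() shares the content but has its own position, so the caller's buffer is not consumed
        ByteBuffer view = buffer.duplicate();
        StringBuilder hex = new StringBuilder(2 + view.remaining() * 2).append("0x");
        while (view.hasRemaining()) {
            int unsigned = view.get() & 0xff;
            hex.append(HEX_DIGITS[unsigned >>> 4]).append(HEX_DIGITS[unsigned & 0x0f]);
        }
        return hex.toString();
    }
}
```

Reading through a duplicate matters because buffers taken from a row may be read again later; advancing the original position would make a second read see no bytes.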

File-Level Changes

Change Details Files
Migrate core Cassandra session, metadata, and token/replica APIs from driver 3.x to driver 4.x types and patterns.
  • Replace Cluster/Session with CqlSession and a new ReopeningSession wrapper, updating NativeCassandraSession to memoize and use CqlSession instances.
  • Switch metadata access from cluster.getMetadata() to session.getMetadata() and TokenMap, updating partitioner, token range, replicas, keyspace, table, and view lookups.
  • Change Host to Node for replica/endpoints and adapt HostAddressFactory, CassandraTokenSplitManager, CassandraSplitManager, and CassandraSession interface to the new types.
  • Return Cassandra version as a String instead of VersionNumber and introduce simple string-based version comparison where needed.
presto-cassandra/src/main/java/com/facebook/presto/cassandra/NativeCassandraSession.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraSession.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/util/HostAddressFactory.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraTokenSplitManager.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraSplitManager.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraClusteringPredicatesExtractor.java
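
The guide does not show how the string-based version comparison is done. One possible approach, sketched here under that caveat, compares dotted segments numerically, because plain lexicographic comparison would order "4.0.11" before "4.0.2". The class and method names are hypothetical.

```java
final class VersionSketch {
    private VersionSketch() {}

    // Compares dotted version strings segment by segment; missing segments count as 0.
    static int compareVersions(String left, String right) {
        String[] a = left.split("\\.");
        String[] b = right.split("\\.");
        int segments = Math.max(a.length, b.length);
        for (int i = 0; i < segments; i++) {
            int x = i < a.length ? leadingNumber(a[i]) : 0;
            int y = i < b.length ? leadingNumber(b[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // Parses the leading digits of a segment, so a suffix such as "0-beta1" is read as 0.
    private static int leadingNumber(String segment) {
        int end = 0;
        while (end < segment.length() && Character.isDigit(segment.charAt(end))) {
            end++;
        }
        return end == 0 ? 0 : Integer.parseInt(segment.substring(0, end));
    }
}
```

Note that this sketch ignores pre-release suffixes entirely, so "4.0-beta1" and "4.0" compare as equal; whether that is acceptable depends on how the connector uses the result.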
Update query building, prepared/executed statements, and record access to driver 4.x APIs.
  • Replace driver 3.x QueryBuilder and Statement hierarchy with java-driver-query-builder Select/RegularInsert and SimpleStatement, manually constructing WHERE clauses where the builder is not yet used.
  • Change CassandraSession.prepare/execute signatures to use String and Statement<?> and adjust all call sites including CassandraPageSink, NativeCassandraSession, and test utilities.
  • Update CassandraRecordCursor and CassandraType to use new Row getters (getUuid, getInstant, getLocalDate, getBigDecimal, getBigInteger, getInetAddress) and adjust ResultSet consumption to use iterators instead of isExhausted/one().
  • Refactor CassandraCqlUtils select helpers to use driver 4.x QueryBuilder.selectFrom and .asCql() instead of getQueryString().
presto-cassandra/src/main/java/com/facebook/presto/cassandra/util/CassandraCqlUtils.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraPageSink.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraRecordCursor.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraRecordSetProvider.java
presto-cassandra/src/test/java/com/facebook/presto/cassandra/CassandraTestingUtils.java
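
The switch from isExhausted/one() to iterator-based consumption can be illustrated without the driver, since the cursor only needs something iterable. The sketch below mirrors the `iterator`, `currentRow`, and `count` fields from the class diagram; the generic row type `R` and the class name are assumptions made so the example is self-contained.

```java
import java.util.Iterator;

// Cursor that walks any Iterable of rows, tracking the current row and how many were read.
final class IteratorCursorSketch<R> {
    private final Iterator<R> iterator;
    private R currentRow;
    private long count;

    IteratorCursorSketch(Iterable<R> resultSet) {
        this.iterator = resultSet.iterator();
    }

    // Moves to the next row; returns false once the underlying iterator has no more rows.
    boolean advanceNextPosition() {
        if (!iterator.hasNext()) {
            currentRow = null;
            return false;
        }
        currentRow = iterator.next();
        count++;
        return true;
    }

    R currentRow() {
        return currentRow;
    }

    long count() {
        return count;
    }
}
```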
Revise type system integration and temporal/collection handling for the new driver.
  • Change CassandraType mapping from DataType.Name to the driver 4.x DataType/DataTypes hierarchy, including list/set/map/tuple/custom detection via instanceof checks.
  • Switch TIMESTAMP backing type from java.util.Date to java.time.Instant and adjust conversions for TIMESTAMP and DATE (Instant.ofEpochMilli, LocalDate.ofEpochDay) in CassandraType and CassandraPageSink.
  • Update numeric and binary accessors to use getBigDecimal/getBigInteger and replace Bytes.toHexString with an internal bytesToHex helper consistent with driver 4.x ByteBuffer handling.
  • Remove protocol-version-dependent DATE handling and simplify CassandraType.toCassandraType to no longer accept ProtocolVersion; adjust CassandraMetadata and clustering/date conversion callers accordingly.
presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraType.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraPageSink.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraMetadata.java
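
The TIMESTAMP and DATE conversions named above (Instant.ofEpochMilli, LocalDate.ofEpochDay) can be sketched as a pair of round-trip helpers. The sketch assumes the engine side carries timestamps as epoch milliseconds and dates as epoch days; the helper names are hypothetical and not taken from the PR.

```java
import java.time.Instant;
import java.time.LocalDate;

final class TemporalSketch {
    private TemporalSketch() {}

    // Engine-side epoch milliseconds to the java.time type used for TIMESTAMP.
    static Instant toInstant(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    static long toEpochMillis(Instant instant) {
        return instant.toEpochMilli();
    }

    // Engine-side days since 1970-01-01 to the java.time type used for DATE.
    static LocalDate toLocalDate(long epochDays) {
        return LocalDate.ofEpochDay(epochDays);
    }

    static long toEpochDays(LocalDate date) {
        return date.toEpochDay();
    }
}
```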
Rebuild client configuration, retry policy, and session construction for driver 4.x, adding Astra/secure connect support.
  • Rework CassandraClientModule to build CqlSession via CqlSessionBuilder and programmatic DriverConfigLoader, configuring timeouts, consistency, page size, reconnection, retries, load balancing, and speculative execution using DefaultDriverOption.
  • Introduce ReopeningSession as the new reusable session wrapper that recreates CqlSession instances on closure; deprecate ReopeningCluster and adapt NativeCassandraSession and tests to use ReopeningSession.
  • Update CassandraClientConfig to use DefaultConsistencyLevel, remove protocol-version and white-list configs, adjust default timeouts, and add cassandra.cloud.secure-connect-bundle support with corresponding File field and Optional accessor.
  • Port RetryPolicyType and BackoffRetryPolicy to the driver 4.x retry API (the com.datastax.oss.driver.api.core.retry interfaces, which keep their package names under the org.apache.cassandra Maven coordinates) and change configuration wiring to pass retry policy classes instead of instances.
presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraClientModule.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraClientConfig.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/RetryPolicyType.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/BackoffRetryPolicy.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/ReopeningSession.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/ReopeningCluster.java
presto-cassandra/src/test/java/com/facebook/presto/cassandra/CassandraServer.java
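
Passing retry policy classes instead of instances can be pictured as an enum whose constants each carry a class token. The sketch below follows the shape of RetryPolicyType in the class diagram, but every type in it is a hypothetical stand-in (the driver's real interface and the PR's policy classes are not reproduced here), and `configValue()` is an illustrative addition.

```java
// Hypothetical stand-in for the driver's retry policy interface.
interface RetryPolicySketch {}

// Hypothetical stand-ins for the four policy implementations named by the enum.
final class DefaultRetrySketch implements RetryPolicySketch {}
final class BackoffRetrySketch implements RetryPolicySketch {}
final class DowngradingRetrySketch implements RetryPolicySketch {}
final class FallthroughRetrySketch implements RetryPolicySketch {}

enum RetryPolicyTypeSketch {
    DEFAULT(DefaultRetrySketch.class),
    BACKOFF(BackoffRetrySketch.class),
    DOWNGRADING_CONSISTENCY(DowngradingRetrySketch.class),
    FALLTHROUGH(FallthroughRetrySketch.class);

    private final Class<? extends RetryPolicySketch> policyClass;

    RetryPolicyTypeSketch(Class<? extends RetryPolicySketch> policyClass) {
        this.policyClass = policyClass;
    }

    // The class token handed to configuration; nothing is instantiated here.
    Class<? extends RetryPolicySketch> getPolicyClass() {
        return policyClass;
    }

    // Fully qualified class name, as it could be written into a class-valued config option.
    String configValue() {
        return policyClass.getName();
    }
}
```

With this shape, the connector configuration only selects an enum constant, and creating the policy object is left to whatever consumes the class.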
Adapt write path, metadata, and connector wiring to the updated driver and configuration.
  • Drop ProtocolVersion from CassandraPageSinkProvider and CassandraPageSink construction, always using driver 4.x behavior for DATE/TIMESTAMP and insert statement construction.
  • Update CassandraMetadata table-creation logic to call CassandraType.toCassandraType without protocol version and wire case-sensitive name matching through to the new metadata APIs.
  • Adjust size estimates querying and system.size_estimates presence checks to use Optional<KeyspaceMetadata/TableMetadata> from the new metadata API and SimpleStatement-based queries.
  • Extend CassandraErrorCode with a generic CASSANDRA_ERROR used in new error cases (e.g., missing secure connect bundle, missing required local DC).
presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraPageSinkProvider.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraPageSink.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraMetadata.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/NativeCassandraSession.java
presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraErrorCode.java
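
The Optional-based presence checks for system.size_estimates can be sketched as two chained lookups. `MetadataSketch` and `KeyspaceSketch` below are hypothetical stand-ins for the driver's metadata types, reduced to maps so the example runs on its own; only the Optional chaining is the point.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for keyspace metadata: table name mapped to a description.
final class KeyspaceSketch {
    private final Map<String, String> tablesByName;

    KeyspaceSketch(Map<String, String> tablesByName) {
        this.tablesByName = tablesByName;
    }

    Optional<String> getTable(String name) {
        return Optional.ofNullable(tablesByName.get(name));
    }
}

// Hypothetical stand-in for cluster metadata: keyspace name mapped to its metadata.
final class MetadataSketch {
    private final Map<String, KeyspaceSketch> keyspacesByName;

    MetadataSketch(Map<String, KeyspaceSketch> keyspacesByName) {
        this.keyspacesByName = keyspacesByName;
    }

    Optional<KeyspaceSketch> getKeyspace(String name) {
        return Optional.ofNullable(keyspacesByName.get(name));
    }

    // True only if both the keyspace and the table exist; a missing keyspace short-circuits to false.
    boolean hasTable(String keyspace, String table) {
        return getKeyspace(keyspace)
                .flatMap(ks -> ks.getTable(table))
                .isPresent();
    }
}
```

Compared with null checks on each level, the flatMap chain keeps the "keyspace missing" and "table missing" cases on the same path.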
Update Maven dependencies and plugin configuration for the new driver and supporting libraries.
  • Replace com.datastax.cassandra:cassandra-driver-core with org.apache.cassandra:java-driver-core and add org.apache.cassandra:java-driver-query-builder in both the root POM and presto-cassandra module POM, centralized under dep.cassandra-java-driver.version.
  • Add at.yawk.lz4:lz4-java as a runtime dependency required by Cassandra Java Driver 4.x and declare its version in the root POM.
  • Introduce a jsr305 dependency in presto-cassandra and configure maven-dependency-plugin to ignore it as unused declared.
  • Mark legacy configuration properties (protocol-version, load-policy white-list) as defunct in CassandraClientConfig to reflect their removal.
pom.xml
presto-cassandra/pom.xml
presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraClientConfig.java

Assessment against linked issues

Issue Objective Addressed Explanation
#26852 Upgrade the Cassandra connector to use the latest supported Cassandra Java Driver 4.x instead of the EOL 3.x driver, including updating dependencies and adapting code to the new driver APIs.
#26852 Maintain existing Cassandra connector behavior and functionality (session management, metadata access, query execution, partitioning, write path, and tests) so that the upgrade is transparent to users.

@msmygit msmygit force-pushed the attempt2 branch 6 times, most recently from d43e8ab to 3f521cc Compare January 25, 2026 18:00
@msmygit msmygit force-pushed the attempt2 branch 4 times, most recently from fb5735d to 38a3f11 Compare January 26, 2026 01:20
@msmygit msmygit force-pushed the attempt2 branch 2 times, most recently from 6cadce7 to a87b630 Compare January 26, 2026 02:10

Development

Successfully merging this pull request may close these issues.

Upgrade to using the most latest and supported Cassandra Java Driver 4.x