Skip to content

Conversation

@wraymo
Copy link

@wraymo wraymo commented Jun 13, 2025

Description

This PR introduces a CLP connector. The background and proposed implementation details are outlined in the associated RFC.

This PR implements one part of phase 1 of the proposed implementation, namely the Java implementation for the coordinator. The worker implementation will leverage Velox as the default query engine, so once the Velox PR is merged, we will submit another PR to this repo to add the necessary changes to presto-native-execution.

Like other connectors, we have created a presto-clp module and implemented all required connector interfaces. The plan optimizer will be a future PR.

The important classes in the connector are described below.


Core Classes in Java

ClpConfig

The configuration class for CLP. Currently, we support the following properties:

  • clp.metadata-expire-interval: Defines the time interval after which metadata entries are considered expired and removed from the cache.
  • clp.metadata-refresh-interval: Specifies how frequently metadata should be refreshed from the source to ensure up-to-date information.
  • clp.polymorphic-type-enabled: Enables or disables support for polymorphic types within CLP. This determines whether dynamic type resolution is allowed.
  • clp.metadata-provider-type: Defines the type of the metadata provider. It could be a database, a file-based store, or another external system. By default, we use MySQL.
  • clp.metadata-db-url: The connection URL for the metadata database, used when clp.metadata-provider-type is configured to use a database.
  • clp.metadata-db-name: The name of the metadata database.
  • clp.metadata-db-user: The database user with access to the metadata database.
  • clp.metadata-db-password: The password for the metadata database user.
  • clp.metadata-table-prefix: A prefix applied to table names in the metadata database.
  • clp.split-provider-type: Defines the type of split provider for query execution. By default, we use MySQL, and the connection parameters are the same as those for the metadata database.

ClpSchemaTree

A helper class for constructing a nested schema representation from CLP’s column definitions. It supports hierarchical column names (e.g., a.b.c), handles name/type conflicts when the clp.polymorphic-type-enabled option is enabled, and maps serialized CLP types to Presto types. The schema tree produces a flat list of ClpColumnHandle instances, including RowType for nested structures, making it suitable for dynamic or semi-structured data formats.

When polymorphic types are enabled, conflicting fields are given unique names by appending a type-specific suffix to the column name. For instance, if an integer field named "a" and a Varstring (CLP type) field named "a" coexist in CLP’s schema tree, they are represented as a_bigint and a_varchar in Presto. This approach ensures that such fields remain queryable while adhering to Presto’s constraints.


ClpMetadataProvider

An interface responsible for retrieving metadata from a specified source.

public interface ClpMetadataProvider {
    List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName);  
    List<ClpTableHandle> listTableNames(String schema);  
}

We provide a default implementation called ClpMySqlMetadataProvider, which uses two MySQL tables. One of these is the datasets table, defined with the schema shown below. Currently, we support only a single Presto schema named default, and this metadata table stores all table names, paths, and storage types associated with that Presto schema.

Column Name Data Type Constraints
name VARCHAR(255) PRIMARY KEY
archive_storage_type VARCHAR(4096) NOT NULL
archive_storage_directory VARCHAR(4096) NOT NULL

The second MySQL table contains column metadata, defined by the schema shown below. Each Presto table is associated with a corresponding MySQL table that stores metadata about its columns.

Column Name Data Type Constraints
name VARCHAR(512) NOT NULL
type TINYINT NOT NULL
Primary Key (name, type)

ClpSplitProvider

In CLP, an archive is the fundamental unit for searching, and we treat each archive as a Presto Split. This allows independent parallel searches across archives. The ClpSplitProvider interface, shown below, defines how to retrieve split information from a specified source:

public interface ClpSplitProvider {
    List<ClpSplit> listSplits(ClpTableLayoutHandle clpTableLayoutHandle);  
}

We provide a default implementation called ClpMySqlSplitProvider. It uses an archive table to store archive IDs associated with each table. The table below shows part of the schema (some irrelevant fields are omitted).

Column Name Data Type Constraints
pagination_id BIGINT AUTO_INCREMENT PRIMARY KRY
id VARCHAR(128) NOT NULL
... ... ...

By concatenating the table path (archive_storage_directory) and the archive ID (id), we can retrieve all split paths for a table.


ClpMetadata

This interface enables Presto to access various metadata. All requests are delegated to ClpMetadataProvider

For metadata management, it also maintains two caches and periodically refreshes the metadata.

  • columnHandleCache: A LoadingCache<SchemaTableName, List<ClpColumnHandle>> that maps a SchemaTableName to its corresponding list of ClpColumnHandle objects.
  • tableHandleCache: A LoadingCache<String, List<ClpTableHandle>> that maps a schema name (String) to a list of ClpTableHandle objects

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • All unit tests passed successfully.
  • End-to-end tests verified, including Velox workers and Prestissimo protocol integration.

Summary by CodeRabbit

  • New Features

    • Introduced the CLP Connector for querying CLP-S archives in Presto with support for MySQL metadata and split providers.
    • Added support for nested and polymorphic column types, mapping CLP data types to Presto types including complex ROW structures.
    • Provided extensive configuration options for metadata refresh, expiration intervals, polymorphic type support, and provider selection.
    • Integrated lifecycle management and seamless plugin framework support within Presto.
  • Tests

    • Added tests covering metadata and split provider functionality, validating schema, table, and split retrieval.
    • Included utilities for setting up and tearing down H2-based test metadata databases.
  • Documentation

    • Delivered comprehensive CLP Connector documentation detailing setup, configuration, data type mappings, and usage.
    • Updated connector index to include the new CLP Connector.
  • Chores

    • Updated build and provisioning configurations to incorporate the new CLP Connector module and plugin packaging.

@coderabbitai
Copy link

coderabbitai bot commented Jun 13, 2025

Walkthrough

A new Presto connector module named presto-clp is introduced, enabling SQL access to CLP-S archives via Presto. The change includes all core connector classes, metadata and split providers (with MySQL implementations), configuration, error handling, and test infrastructure. Documentation and server provisioning are updated to support and describe the new connector.

Changes

File(s) / Group Change Summary
pom.xml, presto-clp/pom.xml Added the presto-clp module to the Maven build and introduced its project configuration with dependencies.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/
(ClpColumnHandle, ClpConfig, ClpConnector, ClpConnectorFactory, ClpErrorCode, ClpHandleResolver, ClpMetadata, ClpModule, ClpPlugin, ClpRecordSetProvider, ClpSplit, ClpSplitManager, ClpTableHandle, ClpTableLayoutHandle, ClpTransactionHandle)
Implemented all main connector classes, configuration, error codes, handle resolvers, and plugin entry point for the CLP connector.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/
(ClpMetadataProvider, ClpMySqlMetadataProvider, ClpSchemaTree, ClpSchemaTreeNodeType)
Added metadata provider interface and MySQL implementation, schema tree structure for columns, and node type enum.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/
(ClpMySqlSplitProvider, ClpSplitProvider)
Added split provider interface and MySQL-based split provider implementation.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/
(ClpMetadataDbSetUp, TestClpMetadata, TestClpSplit)
Introduced test utilities for H2-based metadata DB setup and teardown, and tests for metadata and split provider functionality.
presto-docs/src/main/sphinx/connector.rst Added "connector/clp" to the list of documented Presto connectors.
presto-docs/src/main/sphinx/connector/clp.rst Added comprehensive documentation for the CLP Connector, including configuration, usage, and type mapping details.
presto-server/src/main/provisio/presto.xml Registered the presto-clp plugin artifact for server provisioning.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant PrestoEngine
    participant ClpPlugin
    participant ClpConnectorFactory
    participant ClpConnector
    participant ClpMetadata
    participant ClpSplitManager
    participant ClpRecordSetProvider
    participant ClpMetadataProvider (MySQL)
    participant ClpSplitProvider (MySQL)

    User->>PrestoEngine: Submit SQL query
    PrestoEngine->>ClpPlugin: Load connector factories
    ClpPlugin->>ClpConnectorFactory: getConnectorFactories()
    PrestoEngine->>ClpConnectorFactory: create()
    ClpConnectorFactory->>ClpConnector: instantiate with dependencies
    PrestoEngine->>ClpConnector: beginTransaction()
    PrestoEngine->>ClpConnector: getMetadata()
    PrestoEngine->>ClpMetadata: listSchemaNames()/listTables()/getTableMetadata()
    ClpMetadata->>ClpMetadataProvider (MySQL): listTableHandles()/listColumnHandles()
    PrestoEngine->>ClpConnector: getSplitManager()
    PrestoEngine->>ClpSplitManager: getSplits()
    ClpSplitManager->>ClpSplitProvider (MySQL): listSplits()
    PrestoEngine->>ClpConnector: getRecordSetProvider()
    PrestoEngine->>ClpRecordSetProvider: getRecordSet() (not supported)
Loading
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 34

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1f0bdbd and 6873557.

📒 Files selected for processing (30)
  • pom.xml (1 hunks)
  • presto-clp/pom.xml (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpHandleResolver.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlugin.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplitManager.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTransactionHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpNodeType.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1 hunks)
  • presto-docs/src/main/sphinx/connector.rst (1 hunks)
  • presto-docs/src/main/sphinx/connector/clp.rst (1 hunks)
  • presto-server/src/main/provisio/presto.xml (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
  • ClpSplit (28-62)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
  • ClpTableLayoutHandle (25-79)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
  • ClpColumnHandle (26-117)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1)
  • ClpTableHandle (25-96)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java (1)
  • ColumnMetadata (29-205)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
  • ClpColumnHandle (26-117)
⏰ Context from checks skipped due to timeout of 90000ms (17)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: prestocpp-linux-build-for-test
  • GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (9)
presto-docs/src/main/sphinx/connector.rst (1)

17-17: Add CLP connector entry to documentation index

The CLP connector has been included in the toctree. Please verify that connector/clp.rst exists and that the reference resolves correctly in the generated documentation.

presto-server/src/main/provisio/presto.xml (1)

236-240: Provision new CLP connector plugin

An artifactSet for presto-clp has been added. Confirm that the Maven coordinates (${project.groupId}:presto-clp:zip:${project.version}) align with your module’s POM and that the plugin directory is populated as expected during provisioning.

pom.xml (1)

218-218: Register the CLP connector module.
The new <module>presto-clp</module> is correctly added to the root POM. Ensure this aligns with provisioning and documentation updates.

Please verify that the connector is also declared in presto-server/src/main/provisio/presto.xml.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlugin.java (1)

23-27: Expose connector factory.
getConnectorFactories() correctly returns an immutable list containing ClpConnectorFactory, conforming to Presto SPI expectations for a single‐factory plugin.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1)

25-35:

❌ Incorrect review comment

Advertised provider throws unconditionally – verify connector wiring

Registering a ConnectorRecordSetProvider that always throws risks runtime failures if the engine ever routes scan tasks here (e.g., during fallback when no PageSourceProvider is present). If record sets are intentionally unsupported, omit this class from the Guice bindings, or implement the proper PageSourceProvider and let this class disappear.

-@Singleton
-bind(ConnectorRecordSetProvider.class).to(ClpRecordSetProvider.class);
+// Remove binding – record sets not supported
+// bind(ConnectorPageSourceProvider.class).to(ClpPageSourceProvider.class);

Please double-check ClpModule to ensure only the supported scan path is exposed.


To verify how ClpModule is wiring ConnectorRecordSetProvider, let’s inspect its bindings:


🏁 Script executed:

#!/bin/bash
# Locate ClpModule and show its Guice bindings
rg "class ClpModule" -n -A200

# Check for any binding references to ConnectorRecordSetProvider
rg "bind.*ConnectorRecordSetProvider" -n

Length of output: 3767


Ignore record-set provider removal suggestion

The ClpModule only binds the implementation class (ClpRecordSetProvider) to itself—it does not bind ConnectorRecordSetProvider to ClpRecordSetProvider. As a result, this provider is never registered or invoked by the engine, and there’s no risk of unconditionally thrown exceptions at runtime.

Likely an incorrect or invalid review comment.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1)

25-30: Error-code block looks good

The enum follows Presto conventions (unique high-order bits, EXTERNAL type). No issues spotted.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)

39-46:

❓ Verification inconclusive

Configuration read occurs before injector is fully initialised

buildConfigObject(ClpConfig.class) pulls configuration during module bootstrap.
If a required property is missing or malformed, the thrown PrestoException will abort the entire server startup before Guice has a chance to give a helpful​ diagnostics trace.

Consider delaying provider-type validation to @PostConstruct in ClpConnector (or dedicated validator) so that bootstrap can succeed and the connector reports a clearer error in the query failure path.

This also removes hard coupling between module wiring and provider enumeration values.


🏁 Script executed:

#!/bin/bash
set -eux

# Locate ClpModule.java
CLP_MODULE_PATH=$(fd ClpModule.java -t f | head -n1)
echo "ClpModule file path: $CLP_MODULE_PATH"

# Display the first 200 lines of ClpModule.java for context
sed -n '1,200p' "$CLP_MODULE_PATH"

# Find occurrences of buildConfigObject in the repository
grep -R "buildConfigObject" -n .

# Locate AbstractConfigurationAwareModule to inspect buildConfigObject implementation
ACAM_PATH=$(grep -R "class AbstractConfigurationAwareModule" -n . | cut -d: -f1 | head -n1)
echo "AbstractConfigurationAwareModule file path: $ACAM_PATH"
sed -n '1,200p' "$ACAM_PATH"

Length of output: 13425


Delay provider-type validation until connector startup

Currently, ClpModule.setup calls buildConfigObject(ClpConfig.class) during Guice module configuration, which will throw a PrestoException and abort server startup if metadataProviderType (or splitProviderType) is missing or invalid—before the injector can provide full diagnostics. Move these checks into a @PostConstruct method in ClpConnector (or a dedicated validator) so that the server can start normally and the connector reports clear errors at runtime. This also decouples module wiring from enum values.

  • File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
    Lines: 39–46
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (2)

70-80: Add null/blank checks for mandatory DB connection properties

metadataDbUrl, metadataDbName, and metadataDbUser are essential for the MySQL provider but are accepted unchecked. A misconfigured cluster would fail much later at runtime. Consider validating them eagerly (e.g., in a @PostConstruct method or individual setters) and throwing PrestoException with CLP_UNSUPPORTED_CONFIG_OPTION if a required value is missing or blank.

Also applies to: 82-92, 94-104


24-32: Enum extensibility looks fine

Using enums for provider types with a single MYSQL option is reasonable for the initial drop and leaves room for future back-ends. No action required.

Comment on lines 21 to 27
public interface ClpSplitProvider
{
/**
* Returns a list of splits for the given table layout handle.
*/
List<ClpSplit> listSplits(ClpTableLayoutHandle clpTableLayoutHandle);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Return type limits scalability

For very large tables, materialising all splits eagerly into a List<ClpSplit> can exhaust memory and block planning. Presto’s SPI usually streams splits via ConnectorSplitSource. Consider changing the contract to return a ConnectorSplitSource (or at least Iterable<ClpSplit>) so implementations can lazily yield splits.

🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java
around lines 21 to 27, the method listSplits currently returns a List<ClpSplit>,
which eagerly materializes all splits and can cause memory issues for large
tables. Change the return type to ConnectorSplitSource or at least
Iterable<ClpSplit> to enable lazy streaming of splits. Update the method
signature accordingly and adjust implementations to yield splits lazily instead
of returning a fully populated list.

Comment on lines 23 to 26
/**
* Returns a list of splits for the given table layout handle.
*/
List<ClpSplit> listSplits(ClpTableLayoutHandle clpTableLayoutHandle);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Missing nullability / immutability hints

Annotating the parameter and return value with @Nonnull (and returning an ImmutableList) clarifies expectations and prevents accidental nulls.

🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java
around lines 23 to 26, the method listSplits lacks nullability annotations and
does not specify immutability for its return type. Add @Nonnull annotations to
the parameter clpTableLayoutHandle and the method return type to indicate they
should never be null. Change the return type to ImmutableList<ClpSplit> to
ensure the returned list is immutable, improving safety and clarity.

Comment on lines 22 to 33
public interface ClpMetadataProvider
{
/**
* Returns the list of column handles for the given table.
*/
List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName);

/**
* Returns the list of table handles in the given schema.
*/
List<ClpTableHandle> listTableHandles(String schema);
}
Copy link

@coderabbitai coderabbitai bot Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Interface should specify caching / concurrency guarantees

Metadata calls may be invoked concurrently by Presto workers. The interface should document whether implementations must be thread-safe and whether returned collections may be cached / mutated. Adding @ThreadSafe (if applicable) and returning immutable collections would make the contract explicit.

🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
around lines 22 to 33, the interface lacks documentation on thread-safety and
caching guarantees. Update the interface to explicitly state whether
implementations must be thread-safe, add the @ThreadSafe annotation if
applicable, and modify the methods to return immutable collections to prevent
mutation and clarify caching behavior.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wraymo Understood—if only a single instance lives in the coordinator process, concurrency concerns are moot, so the extra thread-safety annotations and immutability constraints aren’t required. I’ll keep that context in mind for future reviews. Thanks for clarifying!


✏️ Learnings added
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.302Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (8)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (2)

57-65: 🧹 Nitpick (assertive)

Still relying on Apache Math3 Pair; consider a lighter alternative

Prior feedback suggested replacing org.apache.commons.math3.util.Pair with a smaller dependency (e.g., ImmutablePair, a Java record, or Map.entry). Keeping Math3 just for tests pulls in ~1 MB of transitive jars.

-                                new Pair<>("a", ClpNodeType.Integer),
-                                new Pair<>("a", ClpNodeType.VarString),
+                                Map.entry("a", ClpNodeType.Integer),
+                                Map.entry("a", ClpNodeType.VarString),

You’d then adapt the helper to accept Map.Entry<String, ClpNodeType>.
Same logic, leaner class-path.


83-86: 🧹 Nitpick (assertive)

Replace mutable HashSet with immutable collection for clarity

A mutable HashSet is built only to be compared immediately. Use ImmutableSet to reduce noise:

-        HashSet<SchemaTableName> tables = new HashSet<>();
-        tables.add(new SchemaTableName(DEFAULT_SCHEMA_NAME, tableName));
-        assertEquals(new HashSet<>(metadata.listTables(SESSION, Optional.empty())), tables);
+        ImmutableSet<SchemaTableName> expected =
+                ImmutableSet.of(new SchemaTableName(DEFAULT_SCHEMA_NAME, tableName));
+        assertEquals(ImmutableSet.copyOf(
+                metadata.listTables(SESSION, Optional.empty())), expected);
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)

64-76: 🧹 Nitpick (assertive)

Hard-coded /tmp clean-up path is brittle

The tear-down assumes H2 always writes to /tmp. This breaks on Windows or when the JVM’s temp dir is re-configured. Capture the actual file location from ClpMetadataDbSetUp (or use Files.createTempDirectory) and delete via that handle instead of hard-coding.

No functional risk today, but the test will fail in CI environments with non-standard temp dirs.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)

58-62: 🧹 Nitpick (assertive)

Add toString() for easier debugging of splits

A small helper greatly simplifies log inspection and test diffs.

@@
     public Object getInfo()
     {
         return this;
     }
+
+    @Override
+    public String toString()
+    {
+        return "ClpSplit{path='" + path + "'}";
+    }
 }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1)

38-46: 🧹 Nitpick (assertive)

Fail fast on null handle components

schemaTableName, tablePath, and storageType are required for identity but accepted unchecked. Add requireNonNull to surface mis-wired injections early.

-        this.schemaTableName = schemaTableName;
-        this.tablePath = tablePath;
-        this.storageType = storageType;
+        this.schemaTableName = java.util.Objects.requireNonNull(schemaTableName, "schemaTableName is null");
+        this.tablePath = java.util.Objects.requireNonNull(tablePath, "tablePath is null");
+        this.storageType = java.util.Objects.requireNonNull(storageType, "storageType is null");
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)

35-45: 🧹 Nitpick (assertive)

Enforce non-null invariants for column handle fields

For consistency with other Presto handle classes, guard all reference fields:

-        this.columnName = columnName;
-        this.originalColumnName = originalColumnName;
-        this.columnType = columnType;
+        this.columnName = java.util.Objects.requireNonNull(columnName, "columnName is null");
+        this.originalColumnName = java.util.Objects.requireNonNull(originalColumnName, "originalColumnName is null");
+        this.columnType = java.util.Objects.requireNonNull(columnType, "columnType is null");
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1)

137-147: 🧹 Nitpick (assertive)

Consider sorting root-level columns for deterministic order

Nested ROW fields are now sorted, but top-level columns are still emitted in HashMap iteration order. While functional correctness is unaffected, stable ordering helps with plan-cache keys, golden-file tests, and diff-based reviews.

-for (Map.Entry<String, ClpNode> entry : root.children.entrySet()) {
+for (String name : new java.util.TreeSet<>(root.children.keySet())) {
+    ClpNode child = root.children.get(name);
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)

140-142: Replace System.out.println with a logger
Same concern raised earlier; direct stdout still appears here.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6873557 and 657c9ad.

📒 Files selected for processing (8)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (17)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: prestocpp-linux-build-for-test

Comment on lines 162 to 167
if (prefix.getTableName() == null) {
schemaTableNames = listTables(session, Optional.ofNullable(prefix.getSchemaName()));
}
else {
schemaTableNames = ImmutableList.of(new SchemaTableName(prefix.getSchemaName(), prefix.getTableName()));
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Null schema triggers NPE in listTableColumns

When prefix.getSchemaName() is null and prefix.getTableName() is provided, you build
new SchemaTableName(null, …), which violates Presto’s non-null contract and blows up at runtime.

-else {
-    schemaTableNames = ImmutableList.of(
-            new SchemaTableName(prefix.getSchemaName(), prefix.getTableName()));
+else {
+    String schema = Optional.ofNullable(prefix.getSchemaName())
+            .orElse(DEFAULT_SCHEMA_NAME);
+    schemaTableNames = ImmutableList.of(new SchemaTableName(schema, prefix.getTableName()));
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
around lines 162 to 167, the code creates a SchemaTableName with a null schema
name when prefix.getSchemaName() is null but prefix.getTableName() is not,
causing a NullPointerException. To fix this, add a null check for
prefix.getSchemaName() before constructing SchemaTableName and handle the null
case appropriately, ensuring that SchemaTableName is never created with a null
schema name.

@kirkrodrigues kirkrodrigues requested a review from anlowee June 13, 2025 14:19
// The remaining (non-translatable) portion of the RowExpression, if any.
private final Optional<RowExpression> remainingExpression;

public ClpExpression(Optional<String> definition, Optional<RowExpression> remainingExpression)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some IDEA warnings about that the Optional should only be used in the resturn type not the argument type. I think it make sense. Maybe in the caller we can call something like new ClpExpression(definition.orElse(null), remainingExpression.orElse(null)) instead of definiing it as Opational here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think it's pretty common in Presto code. e.g. BasicQueryInfo, QueryStateInfo, SessionRepresentation, etc.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. But I think we may want to make the code style consistent within our connector. In my metadata filter PR, I also need to add properties to this class, shall I also make it Optional? Maybe the higer level question is when to use Optional and when not use. WDYT @kirkrodrigues ?

Copy link
Member

@kirkrodrigues kirkrodrigues Jun 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read a bit about it. I don't think there's a good reason to use Optional for parameters.

However, Presto's style guide does require it for public method parameters if the method isn't performance critical.

Personally, I would just use overloading in this case (i.e., remove this constructor and get rid of the Optional in the other constructors). This case doesn't really need Optional though since we have overloads, right?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this class is delayed to the next PR, I think this constructor can be considered as performance critical because it is in the critical path (to generate the pushdown and metadata filter SQL). So I think we can use @Nullable to annotate the nullable parameters.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not in the critical path, as there are over 100 plan optimizers, and in most real-world cases, the coordinator accounts for less than 10%—or even under 1%—of the total query time. Even if all of the coordinator's time were spent on plan optimization, its impact on the overall query latency would be below 0.1%. The true critical path lies in data scanning and expression evaluation. As discussed before, our plan was to pass raw types but store them as Optional

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (2)

165-173: Still constructs SchemaTableName with null schema – runtime NPE persists

schemaName can be null when the caller supplied only a table name (prefix.getSchemaName() == null).
new SchemaTableName(null, …) violates Presto’s non-null contract and blows up at runtime (same issue raised in the previous review).

-else {
-    SchemaTableName table = new SchemaTableName(schemaName, prefix.getTableName());
+else {
+    // Default the schema when not provided to avoid NPEs
+    String effectiveSchema = Optional.ofNullable(schemaName).orElse(DEFAULT_SCHEMA_NAME);
+    SchemaTableName table = new SchemaTableName(effectiveSchema, prefix.getTableName());

175-179: getTableHandle() may return null ⇒ guaranteed NPE

The stream unconditionally passes the result of getTableHandle() to getTableMetadata().
When the table is missing, this evaluates to getTableMetadata(session, null) and dereferences null on the first cast inside getTableMetadata().

A minimal Java-8-compatible fix:

 return schemaTableNames.stream()
-        .collect(ImmutableMap.toImmutableMap(
-                Function.identity(),
-                tableName -> getTableMetadata(session, getTableHandle(session, tableName)).getColumns()));
+        .map(tableName -> {
+            ConnectorTableHandle handle = getTableHandle(session, tableName);
+            return handle == null ? null : ImmutableMap.of(
+                    tableName,
+                    getTableMetadata(session, handle).getColumns());
+        })
+        .filter(Objects::nonNull)
+        .flatMap(map -> map.entrySet().stream())
+        .collect(ImmutableMap.toImmutableMap(Map.Entry::getKey, Map.Entry::getValue));

(or switch to an ImmutableMap.Builder loop for clarity).

Leaving this unresolved will crash any metadata listing that references a non-existent table.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 70f07dd and 04ded2d.

📒 Files selected for processing (3)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1)
  • ClpMySqlMetadataProvider (33-135)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
  • ClpMySqlSplitProvider (32-97)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: prestocpp-linux-build-for-test
🔇 Additional comments (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)

84-92: LGTM – deterministic comparison fixes earlier flakiness

Switching to ImmutableSet removes the ordering assumption and makes the test reliable. Nice catch.

Comment on lines 40 to 75
private final String databaseName = "split_testdb";

@BeforeMethod
public void setUp()
{
clpMetadataDbSetUp = new ClpMetadataDbSetUp();
tableSplits = new HashMap<>();

int numKeys = 3;
int numValuesPerKey = 10;

for (int i = 0; i < numKeys; i++) {
String key = "test_" + i;
List<String> values = new ArrayList<>();

for (int j = 0; j < numValuesPerKey; j++) {
values.add("id_" + j);
}

tableSplits.put(key, values);
}
clpSplitProvider = clpMetadataDbSetUp.setupSplit(databaseName, tableSplits);
}

@AfterMethod
public void tearDown()
{
clpMetadataDbSetUp.tearDown(databaseName);
}

@Test
public void testListSplits()
{
for (Map.Entry<String, List<String>> entry : tableSplits.entrySet()) {
String tableName = entry.getKey();
String tablePath = "/tmp/archives/" + tableName;
List<String> expectedSplits = entry.getValue();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Hard-coded DB name & /tmp path risk collisions across parallel runs

Using the constant split_testdb and a fixed on-disk directory means two test invocations (CI shards, IDE re-runs, etc.) will trample each other’s files or fail to start H2 because the DB is already open.

Consider:

- private final String databaseName = "split_testdb";
+ private final String databaseName = "split_testdb_" + System.nanoTime();

and parameterising tablePath via Files.createTempDirectory("archives_") propagated through ClpMetadataDbSetUp.

This keeps the test hermetic and OS-portable.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
private final String databaseName = "split_testdb";
@BeforeMethod
public void setUp()
{
clpMetadataDbSetUp = new ClpMetadataDbSetUp();
tableSplits = new HashMap<>();
int numKeys = 3;
int numValuesPerKey = 10;
for (int i = 0; i < numKeys; i++) {
String key = "test_" + i;
List<String> values = new ArrayList<>();
for (int j = 0; j < numValuesPerKey; j++) {
values.add("id_" + j);
}
tableSplits.put(key, values);
}
clpSplitProvider = clpMetadataDbSetUp.setupSplit(databaseName, tableSplits);
}
@AfterMethod
public void tearDown()
{
clpMetadataDbSetUp.tearDown(databaseName);
}
@Test
public void testListSplits()
{
for (Map.Entry<String, List<String>> entry : tableSplits.entrySet()) {
String tableName = entry.getKey();
String tablePath = "/tmp/archives/" + tableName;
List<String> expectedSplits = entry.getValue();
private final String databaseName = "split_testdb_" + System.nanoTime();
@BeforeMethod
public void setUp()
{
clpMetadataDbSetUp = new ClpMetadataDbSetUp();
tableSplits = new HashMap<>();
int numKeys = 3;
int numValuesPerKey = 10;
for (int i = 0; i < numKeys; i++) {
String key = "test_" + i;
List<String> values = new ArrayList<>();
for (int j = 0; j < numValuesPerKey; j++) {
values.add("id_" + j);
}
tableSplits.put(key, values);
}
clpSplitProvider = clpMetadataDbSetUp.setupSplit(databaseName, tableSplits);
}
@AfterMethod
public void tearDown()
{
clpMetadataDbSetUp.tearDown(databaseName);
}
@Test
public void testListSplits()
{
for (Map.Entry<String, List<String>> entry : tableSplits.entrySet()) {
String tableName = entry.getKey();
String tablePath = "/tmp/archives/" + tableName;
List<String> expectedSplits = entry.getValue();
🤖 Prompt for AI Agents
In presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
between lines 40 and 76, the test uses a hard-coded database name "split_testdb"
and a fixed directory path "/tmp/archives/", which can cause conflicts when
tests run in parallel. To fix this, modify the setup to generate a unique
temporary directory for the tablePath using
Files.createTempDirectory("archives_") and update ClpMetadataDbSetUp to accept
and use this dynamic path. Also, parameterize the database name if needed to
avoid collisions, ensuring tests run hermetically and are OS-portable.

Comment on lines 38 to 43
public static final String metadataDbUrlTemplate =
"jdbc:h2:file:/tmp/%s;MODE=MySQL;DATABASE_TO_UPPER=FALSE";
public static final String metadataDbTablePrefix = "clp_";
public static final String metadataDbUser = "sa";
public static final String metadataDbPassword = "";
private final String datasetsTableName = metadataDbTablePrefix + "datasets";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

/tmp hard-coding limits portability

metadataDbUrlTemplate writes directly to /tmp, which doesn’t exist (or isn’t writable) on all OSes.
Generate a temp directory instead:

-    public static final String metadataDbUrlTemplate =
-            "jdbc:h2:file:/tmp/%s;MODE=MySQL;DATABASE_TO_UPPER=FALSE";
+    private static final String TMP_DIR =
+            System.getProperty("java.io.tmpdir").replace(File.separatorChar, '/');
+    public static final String metadataDbUrlTemplate =
+            "jdbc:h2:file:" + TMP_DIR + "/%s;MODE=MySQL;DATABASE_TO_UPPER=FALSE";

This preserves behaviour while remaining cross-platform.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public static final String metadataDbUrlTemplate =
"jdbc:h2:file:/tmp/%s;MODE=MySQL;DATABASE_TO_UPPER=FALSE";
public static final String metadataDbTablePrefix = "clp_";
public static final String metadataDbUser = "sa";
public static final String metadataDbPassword = "";
private final String datasetsTableName = metadataDbTablePrefix + "datasets";
private static final String TMP_DIR =
System.getProperty("java.io.tmpdir").replace(File.separatorChar, '/');
public static final String metadataDbUrlTemplate =
"jdbc:h2:file:" + TMP_DIR + "/%s;MODE=MySQL;DATABASE_TO_UPPER=FALSE";
public static final String metadataDbTablePrefix = "clp_";
public static final String metadataDbUser = "sa";
public static final String metadataDbPassword = "";
private final String datasetsTableName = metadataDbTablePrefix + "datasets";
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
around lines 38 to 43, the metadataDbUrlTemplate hardcodes the /tmp directory,
which is not portable across all operating systems. Modify the code to generate
a temporary directory dynamically using Java's standard library (e.g.,
Files.createTempDirectory or System.getProperty("java.io.tmpdir")) and use that
path in the JDBC URL template instead of /tmp. This change will preserve the
intended behavior while ensuring cross-platform compatibility.

@wraymo wraymo force-pushed the clp_integration-0.293 branch from 04ded2d to 996d67c Compare June 16, 2025 14:17
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1)

155-178: listTableColumns can still throw NPEs for missing tables
The earlier review comment about filtering out null handles and avoiding null schema names is still unresolved.

Please apply the previously suggested Java-8 compatible loop or guard the stream chain by:

return schemaTableNames.stream()
        .map(name -> Optional.ofNullable(getTableHandle(session, name))
                .map(handle -> Map.entry(name, getTableMetadata(session, handle).getColumns())))
        .filter(Optional::isPresent)
        .map(Optional::get)
        .collect(ImmutableMap.toImmutableMap(Map.Entry::getKey, Map.Entry::getValue));

and ensure schemaName defaults to DEFAULT_SCHEMA_NAME when prefix.getSchemaName() is null.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)

39-75: Hard-coded DB name and /tmp path still risk collisions and break Windows
The previous feedback about generating a unique database name and using java.io.tmpdir (or Files.createTempDirectory) hasn’t been addressed.

Adopt something like:

private final Path tempDir = Files.createTempDirectory("clp_split_");
private final String databaseName = "split_testdb_" + System.nanoTime();
...
String tablePath = tempDir.resolve(tableName).toString();
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (2)

38-43: /tmp is still hard-coded – not portable
Reuse System.getProperty("java.io.tmpdir") or a java.nio.file.Path created via Files.createTempDirectory so the tests run on Windows and sandboxed CI environments.


140-148: Check File.delete() return value
Ignoring the boolean result hides failed clean-ups and can leave artefacts behind. Log a warning when deletion fails, as previously suggested.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 04ded2d and 996d67c.

📒 Files selected for processing (3)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1)
  • ClpMySqlMetadataProvider (33-135)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
  • ClpMySqlSplitProvider (32-97)
⏰ Context from checks skipped due to timeout of 90000ms (18)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: prestocpp-linux-build-for-test
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)

Comment on lines +57 to +65
this.columnHandleCache = CacheBuilder.newBuilder()
.expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
.refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
.build(CacheLoader.from(this::loadColumnHandles));
this.tableHandleCache = CacheBuilder.newBuilder()
.expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
.refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
.build(CacheLoader.from(this::loadTableHandles));

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add an upper bound to the caches and assign the provider first

Creating unbounded LoadingCaches can lead to unrestrained memory growth in long-lived coordinator JVMs.
Also, initialise clpMetadataProvider before building caches so that a future code change (or constructor re-ordering) cannot inadvertently trigger a load while the provider is still null.

-        this.columnHandleCache = CacheBuilder.newBuilder()
+        this.clpMetadataProvider = requireNonNull(clpMetadataProvider, "ClpMetadataProvider is null");
+
+        this.columnHandleCache = CacheBuilder.newBuilder()
+                .maximumSize(10_000)   // tune appropriately
                 .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
                 .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
                 .build(CacheLoader.from(this::loadColumnHandles));
-        this.tableHandleCache = CacheBuilder.newBuilder()
+        this.tableHandleCache = CacheBuilder.newBuilder()
+                .maximumSize(1_000)    // tune appropriately
                 .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
                 .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
                 .build(CacheLoader.from(this::loadTableHandles));
-        this.clpMetadataProvider = requireNonNull(clpMetadataProvider, "ClpMetadataProvider is null");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
this.columnHandleCache = CacheBuilder.newBuilder()
.expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
.refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
.build(CacheLoader.from(this::loadColumnHandles));
this.tableHandleCache = CacheBuilder.newBuilder()
.expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
.refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
.build(CacheLoader.from(this::loadTableHandles));
// assign provider first to avoid any loader invocation with a null field
this.clpMetadataProvider = requireNonNull(clpMetadataProvider, "ClpMetadataProvider is null");
this.columnHandleCache = CacheBuilder.newBuilder()
.maximumSize(10_000) // tune appropriately
.expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
.refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
.build(CacheLoader.from(this::loadColumnHandles));
this.tableHandleCache = CacheBuilder.newBuilder()
.maximumSize(1_000) // tune appropriately
.expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
.refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
.build(CacheLoader.from(this::loadTableHandles));
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
around lines 57 to 65, the caches are created without an upper bound which risks
unbounded memory growth. Add a maximum size limit to both columnHandleCache and
tableHandleCache using CacheBuilder's maximumSize method. Also, ensure that
clpMetadataProvider is assigned before building these caches to prevent any
accidental cache loading when the provider might still be null.

anlowee
anlowee previously approved these changes Jun 16, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (5)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (2)

29-29: Replace heavyweight Pair with lighter/idiomatic alternative

The test still uses org.apache.commons.math3.util.Pair, bringing in an otherwise-unused Math3 dependency. Switching to ImmutablePair, a Java 17 record, or Map.entry would eliminate the extra JAR and reduce class-path surface.
This was raised in a previous review and remains unaddressed.

Also applies to: 58-64


34-34: Use immutable collections instead of HashSet for expectations

HashSet is verbose and mutable; ImmutableSet.of(...) (or ImmutableSet.builder()) is the convention used elsewhere in the codebase and avoids accidental mutation.

Also applies to: 82-85

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)

42-57: Hard-coded DB name risks collisions; generate a unique one

getDbHandle("split_testdb") still uses a constant name. Parallel CI shards can trample each other’s files. Append System.nanoTime() or reuse the UUID technique employed in ClpMetadataDbSetUp.getDbHandle.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (2)

39-57: Avoid hard-coding /tmp; use the platform temp dir

The JDBC URL template and database path rely on /tmp, which is not guaranteed to exist or be writable on all OSes (e.g., Windows). Build the path from System.getProperty("java.io.tmpdir") or Files.createTempDirectory(...).


154-162: Check File.delete() return value to surface clean-up failures

delete() is invoked without verifying its boolean result, silently ignoring failures (file locks, permissions, etc.). Log a warning when deletion returns false so leftover artefacts aren’t missed.

-if (dbFile.exists()) {
-    dbFile.delete();
-    log.info("Deleted database file: " + dbFile.getAbsolutePath());
-}
+if (dbFile.exists() && !dbFile.delete()) {
+    log.warn("Failed to delete database file: %s", dbFile.getAbsolutePath());
+}
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 996d67c and d5e3ccc.

📒 Files selected for processing (3)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (17)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: prestocpp-linux-build-for-test
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (17.0.13)

Comment on lines 93 to 129
ImmutableSet<ColumnMetadata> columnMetadata = ImmutableSet.<ColumnMetadata>builder()
.add(ColumnMetadata.builder()
.setName("a_bigint")
.setType(BigintType.BIGINT)
.setNullable(true)
.build())
.add(ColumnMetadata.builder()
.setName("a_varchar")
.setType(VarcharType.VARCHAR)
.setNullable(true)
.build())
.add(ColumnMetadata.builder()
.setName("b_double")
.setType(DoubleType.DOUBLE)
.setNullable(true)
.build())
.add(ColumnMetadata.builder()
.setName("b_varchar")
.setType(VarcharType.VARCHAR)
.setNullable(true)
.build())
.add(ColumnMetadata.builder()
.setName("c")
.setType(RowType.from(ImmutableList.of(
RowType.field("d", BooleanType.BOOLEAN),
RowType.field("e", VarcharType.VARCHAR))))
.setNullable(true)
.build())
.add(ColumnMetadata.builder()
.setName("f")
.setType(RowType.from(ImmutableList.of(
RowType.field("g",
RowType.from(ImmutableList.of(
RowType.field("h", new ArrayType(VarcharType.VARCHAR))))))))
.setNullable(true)
.build())
.build();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider extracting expected column metadata into a helper for readability

The 35-line builder chain obscures the intent of the test. A small helper such as expectedColumns("a_bigint", BIGINT, …) or an ImmutableSet<ColumnMetadata> returned from a dedicated method would make the assertion easier to read and maintain.

🤖 Prompt for AI Agents
In presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
around lines 93 to 129, the long builder chain for creating the expected column
metadata reduces readability. Extract this column metadata construction into a
separate helper method that returns the ImmutableSet<ColumnMetadata> or use a
concise helper like expectedColumns(...) to build the set. This refactor will
simplify the test method and improve maintainability by isolating the setup
logic.

Comment on lines 175 to 189
private static void updateDatasetsTable(Connection conn, String tableName) throws SQLException
{
final String insertDatasetTableSQL = String.format(
"INSERT INTO %s (name, archive_storage_type, archive_storage_directory) VALUES (?, ?, ?)", datasetsTableName);
try (PreparedStatement pstmt = conn.prepareStatement(insertDatasetTableSQL)) {
pstmt.setString(1, tableName);
pstmt.setString(2, "fs");
pstmt.setString(3, "/tmp/archives/" + tableName);
pstmt.executeUpdate();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Hard-coded archive path leaks /tmp usage into metadata

The helper inserts /tmp/archives/ directly into the datasets table. Consider parameterising the root directory (or using the same temp dir as the DB) so tests stay hermetic and portable.

🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
around lines 175 to 183, the archive storage directory path is hard-coded to
"/tmp/archives/", which leaks the use of the /tmp directory and reduces test
portability. Modify the method to accept the root archive directory as a
parameter or retrieve it from a configurable source, then use this parameter to
construct the archive_storage_directory value instead of the hard-coded path.
This will make the tests more hermetic and portable by avoiding fixed filesystem
paths.

@wraymo wraymo force-pushed the clp_integration-0.293 branch from d5e3ccc to 6859026 Compare June 16, 2025 16:28
@wraymo wraymo force-pushed the clp_integration-0.293 branch from 6859026 to 75fea65 Compare June 16, 2025 16:32
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (2)

82-84: Use ImmutableSet for cleaner code

Replace HashSet construction with ImmutableSet for better immutability and brevity.

-        HashSet<SchemaTableName> tables = new HashSet<>();
-        tables.add(new SchemaTableName(DEFAULT_SCHEMA_NAME, tableName));
-        assertEquals(new HashSet<>(metadata.listTables(SESSION, Optional.empty())), tables);
+        ImmutableSet<SchemaTableName> expectedTables = ImmutableSet.of(
+                new SchemaTableName(DEFAULT_SCHEMA_NAME, tableName));
+        assertEquals(ImmutableSet.copyOf(metadata.listTables(SESSION, Optional.empty())), expectedTables);

93-129: Extract column metadata creation for better readability

The long builder chain makes the test harder to read. Consider extracting to a helper method.

+    private static ImmutableSet<ColumnMetadata> buildExpectedColumnMetadata()
+    {
+        return ImmutableSet.<ColumnMetadata>builder()
+                .add(ColumnMetadata.builder()
+                        .setName("a_bigint")
+                        .setType(BigintType.BIGINT)
+                        .setNullable(true)
+                        .build())
+                .add(ColumnMetadata.builder()
+                        .setName("a_varchar")
+                        .setType(VarcharType.VARCHAR)
+                        .setNullable(true)
+                        .build())
+                .add(ColumnMetadata.builder()
+                        .setName("b_double")
+                        .setType(DoubleType.DOUBLE)
+                        .setNullable(true)
+                        .build())
+                .add(ColumnMetadata.builder()
+                        .setName("b_varchar")
+                        .setType(VarcharType.VARCHAR)
+                        .setNullable(true)
+                        .build())
+                .add(ColumnMetadata.builder()
+                        .setName("c")
+                        .setType(RowType.from(ImmutableList.of(
+                                RowType.field("d", BooleanType.BOOLEAN),
+                                RowType.field("e", VarcharType.VARCHAR))))
+                        .setNullable(true)
+                        .build())
+                .add(ColumnMetadata.builder()
+                        .setName("f")
+                        .setType(RowType.from(ImmutableList.of(
+                                RowType.field("g",
+                                        RowType.from(ImmutableList.of(
+                                                RowType.field("h", new ArrayType(VarcharType.VARCHAR))))))))
+                        .setNullable(true)
+                        .build())
+                .build();
+    }
+
     @Test
     public void testGetTableMetadata()
     {
         ClpTableHandle clpTableHandle =
                 (ClpTableHandle) metadata.getTableHandle(SESSION, new SchemaTableName(DEFAULT_SCHEMA_NAME, tableName));
         ConnectorTableMetadata tableMetadata = metadata.getTableMetadata(SESSION, clpTableHandle);
-        ImmutableSet<ColumnMetadata> columnMetadata = ImmutableSet.<ColumnMetadata>builder()
-                .add(ColumnMetadata.builder()
-                        .setName("a_bigint")
-                        .setType(BigintType.BIGINT)
-                        .setNullable(true)
-                        .build())
-                // ... rest of the builder chain ...
-                .build();
+        ImmutableSet<ColumnMetadata> columnMetadata = buildExpectedColumnMetadata();
         assertEquals(columnMetadata, ImmutableSet.copyOf(tableMetadata.getColumns()));
     }
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (2)

162-168: Check file deletion results to detect cleanup failures

The return value of File.delete() is ignored, which can hide cleanup failures.

 if (dbFile.exists()) {
-    dbFile.delete();
-    log.info("Deleted database file: " + dbFile.getAbsolutePath());
+    if (dbFile.delete()) {
+        log.info("Deleted database file: " + dbFile.getAbsolutePath());
+    }
+    else {
+        log.warn("Failed to delete database file: " + dbFile.getAbsolutePath());
+    }
 }
 if (lockFile.exists()) {
-    lockFile.delete();
+    if (!lockFile.delete()) {
+        log.warn("Failed to delete lock file: " + lockFile.getAbsolutePath());
+    }
 }

181-190: Parameterize archive directory path for better test isolation

The archive storage directory is hard-coded to /tmp/archives/, reducing portability and test isolation.

Consider adding a parameter to accept the archive root directory:

-private static void updateDatasetsTable(Connection conn, String tableName) throws SQLException
+private static void updateDatasetsTable(Connection conn, String tableName, String archiveRootDir) throws SQLException
 {
     final String insertDatasetTableSQL = String.format(
             "INSERT INTO %s (name, archive_storage_type, archive_storage_directory) VALUES (?, ?, ?)", datasetsTableName);
     try (PreparedStatement pstmt = conn.prepareStatement(insertDatasetTableSQL)) {
         pstmt.setString(1, tableName);
         pstmt.setString(2, "fs");
-        pstmt.setString(3, "/tmp/archives/" + tableName);
+        pstmt.setString(3, archiveRootDir + "/" + tableName);
         pstmt.executeUpdate();
     }
 }

Then update the callers to pass a configurable archive root directory.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6859026 and 75fea65.

📒 Files selected for processing (3)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: prestocpp-linux-build-for-test
🔇 Additional comments (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)

81-89: Good fix for the ordering issue!

Using ImmutableSet to compare splits as sets rather than relying on order is the correct approach. This eliminates the test flakiness that could occur from non-deterministic ordering.


public static DbHandle getDbHandle(String dbName)
{
return new DbHandle(String.format("/tmp/presto-clp-test/%s-%s", dbName, UUID.randomUUID()));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Use system temp directory for cross-platform compatibility

While the path now includes UUID for uniqueness, it still hard-codes /tmp which doesn't exist on all operating systems (e.g., Windows).

+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;

 public static DbHandle getDbHandle(String dbName)
 {
-    return new DbHandle(String.format("/tmp/presto-clp-test/%s-%s", dbName, UUID.randomUUID()));
+    try {
+        Path tempDir = Files.createTempDirectory("presto-clp-test");
+        return new DbHandle(tempDir.resolve(dbName + "-" + UUID.randomUUID()).toString());
+    }
+    catch (IOException e) {
+        throw new RuntimeException("Failed to create temporary directory", e);
+    }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return new DbHandle(String.format("/tmp/presto-clp-test/%s-%s", dbName, UUID.randomUUID()));
// add at top of file
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
public static DbHandle getDbHandle(String dbName)
{
try {
Path tempDir = Files.createTempDirectory("presto-clp-test");
return new DbHandle(tempDir.resolve(dbName + "-" + UUID.randomUUID()).toString());
}
catch (IOException e) {
throw new RuntimeException("Failed to create temporary directory", e);
}
}
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
at line 62, the file path is hard-coded to "/tmp", which is not cross-platform
compatible. Replace the hard-coded "/tmp" with a call to the system's temporary
directory, such as using Java's System.getProperty("java.io.tmpdir"), to ensure
the path works on all operating systems. Keep the UUID for uniqueness but
construct the full path using the system temp directory dynamically.

@anlowee
Copy link

anlowee commented Jun 19, 2025

@coderabbitai please learn from my comments, especially for the static import and Guava:
Prefer Immutable collections in Guava.
Static imports For example, in your code, do not use something like

String.format("%s", rule.getClass().getSimpleName())

Instead, static import this method and use “format()”

import static java.lang.String.format;
...
format("%s", rule.getClass().getSimpleName())

@coderabbitai
Copy link

coderabbitai bot commented Jun 19, 2025

Tip

For best results, initiate chat on the files or code changes.

@anlowee, thank you for the clear coding style guidance! I've noted both preferences and will apply them in future reviews of this repository.

🐰
/| /|
( ✌️ )
|(_)|


✏️ Learnings added
Learnt from: anlowee
URL: https://github.com/y-scope/presto/pull/15

Timestamp: 2025-06-19T08:41:46.280Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.

Learnt from: anlowee
URL: https://github.com/y-scope/presto/pull/15

Timestamp: 2025-06-19T08:41:46.280Z
Learning: In the y-scope/presto repository, prefer static imports for commonly used methods like String.format(). Instead of writing `String.format("%s", value)`, use static import `import static java.lang.String.format;` and call `format("%s", value)` directly.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (22)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (3)

16-28: Consider alphabetizing imports for consistency.

The import statements could be organized alphabetically for better maintainability, as suggested in previous reviews.


54-63: Missing strictConfig() in Bootstrap initialization.

The Bootstrap is initialized without strictConfig(), which means configuration typos or deprecated keys will be silently ignored, potentially leading to runtime issues.

Apply this fix to enable strict configuration:

-            Bootstrap app = new Bootstrap(new JsonModule(), new ClpModule(), binder -> {
+            Bootstrap app = new Bootstrap(new JsonModule(), new ClpModule(), binder -> {
                 binder.bind(FunctionMetadataManager.class).toInstance(context.getFunctionMetadataManager());
                 binder.bind(NodeManager.class).toInstance(context.getNodeManager());
                 binder.bind(RowExpressionService.class).toInstance(context.getRowExpressionService());
                 binder.bind(StandardFunctionResolution.class).toInstance(context.getStandardFunctionResolution());
                 binder.bind(TypeManager.class).toInstance(context.getTypeManager());
-            });
+            }).strictConfig();

55-59: Consider alphabetizing the binding statements.

The service bindings could be organized alphabetically for consistency, as suggested in previous reviews.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (2)

89-91: Use ImmutableSet instead of HashSet for better immutability.

Replace HashSet construction with ImmutableSet for consistency with the codebase preferences.

-        HashSet<SchemaTableName> tables = new HashSet<>();
-        tables.add(new SchemaTableName(DEFAULT_SCHEMA_NAME, TABLE_NAME));
-        assertEquals(new HashSet<>(metadata.listTables(SESSION, Optional.empty())), tables);
+        ImmutableSet<SchemaTableName> expectedTables = ImmutableSet.of(
+                new SchemaTableName(DEFAULT_SCHEMA_NAME, TABLE_NAME));
+        assertEquals(ImmutableSet.copyOf(metadata.listTables(SESSION, Optional.empty())), expectedTables);

99-135: Extract expected column metadata into a helper method.

The 35-line builder chain obscures the test intent. Consider extracting this into a helper method for better readability and maintainability.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)

44-56: Consider using a strategy map to simplify provider binding.

The repeated if/else blocks for metadata and split provider binding could be simplified using a strategy map pattern as suggested in previous reviews.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)

61-64: Implement getInfoMap instead of deprecated getInfo method.

The getInfo() method is deprecated. Consider implementing getInfoMap() instead, which provides better type safety and flexibility for split information.

Based on the HiveSplit example from previous reviews:

-    @Override
-    public Map<String, String> getInfo()
-    {
-        return ImmutableMap.of("path", path);
-    }
+    @Override
+    public Map<String, String> getInfoMap()
+    {
+        return ImmutableMap.of("path", path);
+    }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)

82-84: Critical: Swallowing SQL exceptions hides split failures

The catch block logs at WARN level and returns an empty list, causing queries to proceed with zero splits and produce confusing "table is empty" results instead of surfacing the real metadata issue.

Apply this fix to propagate the error properly:

-        catch (SQLException e) {
-            log.warn("Database error while processing splits for %s: %s", tableName, e);
-        }
+        catch (SQLException e) {
+            throw new PrestoException(StandardErrorCode.CONNECTOR_ERROR,
+                    format("Unable to list splits for table %s: %s", tableName, e.getMessage()), e);
+        }
presto-docs/src/main/sphinx/connector/clp.rst (1)

54-56: Add missing target label for Data Types cross-reference

The link See \Data Types`_ for details` won't resolve because there's no label for the Data Types heading.

Add a target label before the Data Types heading:

+.. _Data Types:
+
 Data Types
 ----------
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (2)

62-64: Add null check for input parameter

The method should validate that fullName is not null to prevent a potential NullPointerException at line 65 when calling split().

Add a null check at the beginning:

 public void addColumn(String fullName, byte type)
 {
+    if (fullName == null) {
+        throw new PrestoException(ClpErrorCode.CLP_INVALID_INPUT, "Column name cannot be null");
+    }
     Type prestoType = mapColumnType(type);

167-170: Sanitise polymorphic type suffixes

type.getDisplayName() can contain spaces, commas, and parentheses (e.g., row(col bigint)), producing invalid column identifiers like id_row(col bigint).

Replace non-alphanumeric characters with underscores:

 private String getTypeSuffix(Type type)
 {
-    return (type instanceof ArrayType) ? "array" : type.getDisplayName();
+    String suffix = (type instanceof ArrayType) ? "array" : type.getDisplayName();
+    return suffix.replaceAll("[^a-zA-Z0-9_]", "_");
 }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1)

78-78: Validate table name before SQL embedding

schemaTableName.getTableName() is directly embedded into the SQL query without validation, risking SQL injection or broken queries if the name contains backticks.

Add validation before constructing the query:

+String tableName = schemaTableName.getTableName();
+if (!isValidIdentifier(tableName)) {
+    throw new PrestoException(ClpErrorCode.CLP_INVALID_TABLE_NAME, "Invalid table name: " + tableName);
+}
-String query = format(SQL_SELECT_COLUMN_METADATA_TEMPLATE, config.getMetadataTablePrefix(), schemaTableName.getTableName());
+String query = format(SQL_SELECT_COLUMN_METADATA_TEMPLATE, config.getMetadataTablePrefix(), tableName);
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (2)

69-77: Add upper bounds to caches for memory safety

Creating unbounded LoadingCaches can lead to uncontrolled memory growth in long-lived coordinator JVMs.

Add reasonable maximum size limits:

 this.columnHandleCache = CacheBuilder.newBuilder()
+        .maximumSize(10_000)
         .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
         .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
         .build(CacheLoader.from(this::loadColumnHandles));
 this.tableHandleCache = CacheBuilder.newBuilder()
+        .maximumSize(1_000)
         .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
         .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
         .build(CacheLoader.from(this::loadTableHandles));

167-170: Potential NPE from null table handle

getTableHandle() can return null, but listTableColumns() unconditionally passes its result to getTableMetadata(), which will throw a NullPointerException.

Filter out null handles before processing:

 return schemaTableNames.stream()
+        .map(tableName -> {
+            ConnectorTableHandle handle = getTableHandle(session, tableName);
+            return handle != null ? Map.entry(tableName, getTableMetadata(session, handle).getColumns()) : null;
+        })
+        .filter(Objects::nonNull)
         .collect(toImmutableMap(
-                Function.identity(),
-                tableName -> getTableMetadata(session, getTableHandle(session, tableName)).getColumns()));
+                Map.Entry::getKey,
+                Map.Entry::getValue));

Note: Add import java.util.Objects; if not already present.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (3)

52-52: Replace hardcoded /tmp with system temp directory.

The archive storage directory still uses a hardcoded /tmp path, which isn't portable across operating systems.

Consider using System.getProperty("java.io.tmpdir") for cross-platform compatibility.


65-65: Use system temp directory instead of hardcoded /tmp.

The database handle creation still uses a hardcoded /tmp path which was flagged in previous reviews for portability issues.

Replace with a system-agnostic temporary directory approach.


175-175: Use parameterized logging instead of string concatenation.

String concatenation creates unnecessary object allocation.

-                log.info("Deleted database dir" + dir.getAbsolutePath());
+                log.info("Deleted database dir: %s", dir.getAbsolutePath());
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (5)

33-34: Consider using Duration objects for time intervals.

Storing time intervals as raw long values lacks unit safety and validation. Presto's configuration framework supports Duration objects which provide better type safety.

This would allow human-friendly configuration values like "60s" or "10m" and prevent invalid negative values.


103-108: Add security annotation for password configuration.

The password setter lacks @ConfigSecuritySensitive annotation which prevents accidental logging of sensitive values.

 @Config("clp.metadata-db-password")
+@ConfigSecuritySensitive
 public ClpConfig setMetadataDbPassword(String metadataDbPassword)

128-150: Time interval methods should use Duration type.

As noted in previous reviews, these methods should use com.facebook.airlift.units.Duration instead of long for better unit safety and validation.

Duration objects provide human-friendly parsing and prevent unit ambiguities.


103-108: Duplicate comment: Password annotation missing

This setter should be annotated with @ConfigSecuritySensitive to prevent accidental logging of the password, as previously mentioned in past review comments.


133-138: Duplicate comment: Time intervals should use Duration

As noted in past comments, these time intervals should use com.facebook.airlift.units.Duration instead of raw long values for better unit safety and validation.

Also applies to: 145-150

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3b7cb7 and 12f2515.

📒 Files selected for processing (12)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1 hunks)
  • presto-docs/src/main/sphinx/connector/clp.rst (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.280Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: prestocpp-linux-build-for-test
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
🔇 Additional comments (13)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)

33-70: Well-structured connector factory implementation.

The factory correctly implements the ConnectorFactory interface, properly handles null validation, and appropriately wraps exceptions. The dependency injection setup is sound.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (1)

80-137: Comprehensive test coverage with good validation.

The tests effectively validate schema listing, table listing, and detailed table metadata retrieval including complex nested row and array types. The test setup and assertions are thorough.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (2)

86-94: Good fix for order-dependent test issues.

The test now properly compares splits as sets rather than relying on order, which addresses the potential flakiness identified in previous reviews. This is a solid improvement.


72-96: Well-structured split validation test.

The test effectively validates that the split provider returns the expected splits with correct paths. The setup with multiple tables and the set-based comparison ensures robust testing.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)

31-58: Solid dependency injection configuration.

The module properly configures all necessary bindings for the CLP connector, including conditional provider binding based on configuration. The error handling for unsupported provider types is appropriate.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)

31-65: Clean and well-implemented split class.

The split implementation correctly handles immutability, JSON serialization, and null validation. The structure follows Presto connector patterns appropriately.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (3)

68-120: Well-structured metadata setup implementation.

The method properly uses try-with-resources for database connections and prepared statements, follows the configuration builder pattern, and handles batch operations efficiently.


122-167: Clean split provider setup implementation.

Good use of try-with-resources, proper batch processing, and consistent configuration pattern.


42-43: Apply static import preference from coding guidelines

Based on the PR coding style guidance, consider using static import for String.format:

+import static java.lang.String.format;
-import static java.lang.String.format;

The static import is already present and correctly applied throughout the codebase.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (4)

115-126: Excellent validation implementation for metadata table prefix.

The validation logic properly checks for null values and uses regex pattern matching to ensure safe SQL identifiers, with clear error messaging.


164-172: Clean enum definitions for provider types.

The enum structure provides a solid foundation for future extensibility while currently supporting only MySQL providers as intended.


23-23: ```shell
#!/bin/bash

Locate all long-typed configuration fields in ClpConfig.java

rg -n "^\s*private\s+long" presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java


---

`118-122`: **Consider validating non-empty prefix**

The current validation allows empty strings but not null. Consider whether an empty prefix should be valid or if additional validation is needed:

```diff
-        if (metadataTablePrefix == null || !SAFE_SQL_TABLE_NAME_PATTERN.matcher(metadataTablePrefix).matches()) {
+        if (metadataTablePrefix == null || metadataTablePrefix.trim().isEmpty() || 
+            !SAFE_SQL_TABLE_NAME_PATTERN.matcher(metadataTablePrefix).matches()) {

This would prevent accidental empty prefixes that might cause issues with table naming.

Likely an incorrect or invalid review comment.

ClpTableHandle clpTableHandle = clpTableLayoutHandle.getTable();
String tablePath = clpTableHandle.getTablePath();
String tableName = clpTableHandle.getSchemaTableName().getTableName();
String archivePathQuery = format(SQL_SELECT_ARCHIVES_TEMPLATE, config.getMetadataTablePrefix(), tableName);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Validate table name before SQL embedding

The table name from tableName is directly embedded into the SQL query without validation, risking SQL injection or broken queries if the name contains backticks or other special characters.

Add validation before constructing the query:

 String tableName = clpTableHandle.getSchemaTableName().getTableName();
+if (!isValidTableName(tableName)) {
+    throw new PrestoException(ClpErrorCode.CLP_INVALID_TABLE_NAME, "Invalid table name: " + tableName);
+}
 String archivePathQuery = format(SQL_SELECT_ARCHIVES_TEMPLATE, config.getMetadataTablePrefix(), tableName);

Add this validation method:

private boolean isValidTableName(String tableName) {
    return tableName != null && ClpConfig.SAFE_SQL_TABLE_NAME_PATTERN.matcher(tableName).matches();
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
at line 70, the tableName is directly embedded into the SQL query without
validation, which risks SQL injection or malformed queries. To fix this, add a
validation method isValidTableName that checks the tableName against a safe
pattern, and call this method before constructing the query. If the tableName is
invalid, handle it appropriately (e.g., throw an exception) to prevent unsafe
SQL execution.

Comment on lines +169 to +181
public static void tearDown(DbHandle dbHandle)
{
File dir = new File(dbHandle.dbPath).getParentFile();
if (dir.exists()) {
try {
FileUtils.deleteDirectory(dir);
log.info("Deleted database dir" + dir.getAbsolutePath());
}
catch (IOException e) {
log.warn("Failed to delete directory " + dir + ": " + e.getMessage());
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Effective cleanup implementation with minor logging improvement.

The teardown logic properly handles directory removal and error cases.

Consider using parameterized logging on line 178 as well:

-                log.warn("Failed to delete directory " + dir + ": " + e.getMessage());
+                log.warn("Failed to delete directory %s: %s", dir, e.getMessage());
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public static void tearDown(DbHandle dbHandle)
{
File dir = new File(dbHandle.dbPath).getParentFile();
if (dir.exists()) {
try {
FileUtils.deleteDirectory(dir);
log.info("Deleted database dir" + dir.getAbsolutePath());
}
catch (IOException e) {
log.warn("Failed to delete directory " + dir + ": " + e.getMessage());
}
}
}
public static void tearDown(DbHandle dbHandle)
{
File dir = new File(dbHandle.dbPath).getParentFile();
if (dir.exists()) {
try {
FileUtils.deleteDirectory(dir);
log.info("Deleted database dir" + dir.getAbsolutePath());
}
catch (IOException e) {
- log.warn("Failed to delete directory " + dir + ": " + e.getMessage());
+ log.warn("Failed to delete directory %s: %s", dir, e.getMessage());
}
}
}
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
around lines 169 to 181, the log statement on line 178 concatenates strings
directly, which is less efficient and not recommended. Change the log.info call
to use parameterized logging by passing the directory path as a parameter
instead of concatenating it into the message string.

try (PreparedStatement pstmt = conn.prepareStatement(insertDatasetsTableSql)) {
pstmt.setString(1, tableName);
pstmt.setString(2, "fs");
pstmt.setString(3, ARCHIVE_STORAGE_DIRECTORY_BASE + tableName);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider parameterizing the archive storage directory for consistency

The hardcoded concatenation of ARCHIVE_STORAGE_DIRECTORY_BASE + tableName could benefit from using the same format pattern used elsewhere in the class:

-            pstmt.setString(3, ARCHIVE_STORAGE_DIRECTORY_BASE + tableName);
+            pstmt.setString(3, format("%s%s", ARCHIVE_STORAGE_DIRECTORY_BASE, tableName));

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
at line 210, replace the hardcoded string concatenation of
ARCHIVE_STORAGE_DIRECTORY_BASE + tableName with the class's existing format
pattern for archive storage directory to maintain consistency. Identify the
format pattern used elsewhere in the class for this purpose and apply it here
when setting the string parameter in pstmt.

@wraymo
Copy link
Author

wraymo commented Jun 19, 2025

@anlowee I’ve updated the PR. There are a few cases I intentionally left unchanged:

  1. For general method names like of, copyOf, equals, or symbols such as RowType.fromType that are not typically used as static imports in PrestoDB, I avoided static imports. Using the qualified form improves readability by preventing ambiguity with overly generic names.
  2. For tearDown, I added a qualifier to clearly distinguish it from the class's own tearDown method.
  3. The private modifier on enum constructors is redundant, so I removed it.
  4. For enum comparisons, using == is preferred over .equals() for clarity and correctness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants