@EdwardArchive
Contributor

@EdwardArchive EdwardArchive commented Nov 24, 2025

Overview
#6839

This PR adds comprehensive support for StarRocks as an OLAP engine in Rill, enabling users to connect to existing StarRocks clusters and power Rill dashboards with external
tables.

StarRocks (https://www.starrocks.io/) is an open-source, high-performance OLAP database designed for real-time analytics. It provides a MySQL-compatible interface with columnar storage
and vectorized query execution, making it well suited to large-scale data analytics workloads.


Key Features

  1. Full OLAP Store Implementation
  • Query execution with proper type mapping (BOOLEAN, INT, DECIMAL, DATE, DATETIME, JSON, ARRAY, MAP, STRUCT, etc.)
  • Schema inference from query results
  • Connection pooling with lazy initialization
  • External catalog support (Iceberg, Hive) via SET CATALOG and USE database
  2. Information Schema Support
  • ListDatabaseSchemas() - List all databases in a catalog
  • ListTables() - List tables/views in a database with pagination
  • GetTable() - Get table metadata including column types
  • Lookup() - Look up specific table by catalog.database.table
  3. Model Executor (CTAS Support)
  • Self-to-self execution - Create tables from SQL within same connector
  • Cross-connector execution - ETL from external catalog to default catalog
  • Table models - DUPLICATE, AGGREGATE, UNIQUE, PRIMARY KEY
  • Partitioning - RANGE and LIST partitioning support
  • Distribution - HASH and RANDOM distribution with configurable buckets
  • Incremental models - Append strategy support
  4. Model Manager
  • Create(), Rename(), Delete(), Exists() operations
  • Staging table support for atomic deployments
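
As a rough illustration of the type mapping mentioned under item 1, the sketch below reduces a StarRocks column type string to a coarse category. The `typeCategory` values and the `categorize` function are hypothetical names chosen for this example; they are not taken from the PR or from Rill's runtime types.

```go
package main

import (
	"fmt"
	"strings"
)

// typeCategory is a hypothetical coarse type used only in this sketch.
type typeCategory string

const (
	catBool      typeCategory = "bool"
	catInt       typeCategory = "int"
	catFloat     typeCategory = "float"
	catDecimal   typeCategory = "decimal"
	catString    typeCategory = "string"
	catBytes     typeCategory = "bytes"
	catDate      typeCategory = "date"
	catTimestamp typeCategory = "timestamp"
	catJSON      typeCategory = "json"
	catArray     typeCategory = "array"
	catMap       typeCategory = "map"
	catStruct    typeCategory = "struct"
	catUnknown   typeCategory = "unknown"
)

// categorize strips type parameters such as "(10,2)" or "<INT>" and
// maps the remaining base name to a category.
func categorize(dbType string) typeCategory {
	t := strings.ToUpper(strings.TrimSpace(dbType))
	if i := strings.IndexAny(t, "(<"); i >= 0 {
		t = strings.TrimSpace(t[:i])
	}
	switch t {
	case "BOOLEAN":
		return catBool
	case "TINYINT", "SMALLINT", "INT", "BIGINT", "LARGEINT":
		// LARGEINT is 128-bit, so a caller cannot assume it fits in int64.
		return catInt
	case "FLOAT", "DOUBLE":
		return catFloat
	case "CHAR", "VARCHAR", "STRING":
		return catString
	case "BINARY", "VARBINARY":
		return catBytes
	case "DATE":
		return catDate
	case "DATETIME":
		return catTimestamp
	case "JSON":
		return catJSON
	case "ARRAY":
		return catArray
	case "MAP":
		return catMap
	case "STRUCT":
		return catStruct
	}
	if strings.HasPrefix(t, "DECIMAL") {
		return catDecimal
	}
	return catUnknown
}

func main() {
	for _, t := range []string{"DECIMAL(10,2)", "ARRAY<INT>", "datetime", "LARGEINT"} {
		fmt.Printf("%s -> %s\n", t, categorize(t))
	}
}
```

Unrecognized names fall through to `catUnknown` rather than failing, which lets a caller decide whether to treat them as strings or reject them.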

Architecture Decisions

StarRocks 3-Level Hierarchy Mapping

StarRocks -> Rill

Catalog -> Database
Database -> DatabaseSchema
Table -> Table
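
The mapping above can be sketched as a small helper that splits a fully qualified StarRocks name into the three Rill-side fields. The `rillTableRef` struct and `parseStarRocksName` function are hypothetical names for illustration, and the sketch assumes unquoted identifiers that contain no dots.

```go
package main

import (
	"fmt"
	"strings"
)

// rillTableRef is a hypothetical struct mirroring the mapping above:
// StarRocks catalog -> Database, StarRocks database -> DatabaseSchema.
type rillTableRef struct {
	Database       string
	DatabaseSchema string
	Table          string
}

// parseStarRocksName splits "catalog.database.table" into its parts.
// It does not handle quoted identifiers that contain dots.
func parseStarRocksName(name string) (rillTableRef, error) {
	parts := strings.Split(name, ".")
	if len(parts) != 3 {
		return rillTableRef{}, fmt.Errorf("expected catalog.database.table, got %q", name)
	}
	for _, p := range parts {
		if p == "" {
			return rillTableRef{}, fmt.Errorf("empty name component in %q", name)
		}
	}
	return rillTableRef{Database: parts[0], DatabaseSchema: parts[1], Table: parts[2]}, nil
}

func main() {
	ref, err := parseStarRocksName("default_catalog.analytics.events")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", ref)
}
```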

External Catalog Handling

For external catalogs (Iceberg, Hive), the database is NOT included in the DSN because it doesn't exist in default_catalog. Instead:

  1. Connect without database in DSN
  2. Execute SET CATALOG <catalog_name>
  3. Execute USE <database_name>
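
A minimal sketch of steps 2 and 3, assuming backtick-quoted identifiers are accepted in both statements. The helper names `quoteIdent` and `switchStatements` are hypothetical and are not claimed to match the functions in utils.go.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps an identifier in backticks and doubles any embedded
// backtick (MySQL-style quoting; assumed to be accepted by StarRocks).
func quoteIdent(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

// switchStatements returns the statements to run, in order, on a
// connection that was opened without a database in the DSN.
// Empty arguments are skipped.
func switchStatements(catalog, database string) []string {
	var stmts []string
	if catalog != "" {
		stmts = append(stmts, "SET CATALOG "+quoteIdent(catalog))
	}
	if database != "" {
		stmts = append(stmts, "USE "+quoteIdent(database))
	}
	return stmts
}

func main() {
	for _, s := range switchStatements("iceberg_catalog", "sales") {
		fmt.Println(s)
	}
}
```

Both statements change session state, so with Go's `database/sql` pool they need to run on the same connection as the query that follows, for example one obtained through `db.Conn(ctx)`.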

Files Added/Modified

Backend (Go)

| File | Description |
| --- | --- |
| runtime/drivers/starrocks/starrocks.go | Driver registration, connection config, DSN parsing |
| runtime/drivers/starrocks/olap.go | OLAPStore implementation, query execution, type mapping |
| runtime/drivers/starrocks/information_schema.go | InformationSchema implementation |
| runtime/drivers/starrocks/model_executor.go | ModelExecutor for CTAS operations |
| runtime/drivers/starrocks/model_manager.go | ModelManager for table lifecycle |
| runtime/drivers/starrocks/utils.go | Utility functions (keyword escaping, catalog switching) |
| runtime/drivers/dialect.go | Added DialectStarRocks |
| runtime/queries/*.go | Added StarRocks dialect support for various queries |

Frontend (TypeScript/Svelte)

| File | Description |
| --- | --- |
| web-common/src/features/connectors/connectors-utils.ts | StarRocks table name formatting |
| web-common/src/components/icons/connectors/StarRocksIcon.svelte | StarRocks logo icon |

Query Support

The following query types are supported for StarRocks:

| Query | Status | Notes |
| --- | --- | --- |
| column_desc_stats.go | Supported | Uses percentile_approx() for quantiles |
| column_numeric_histogram.go | Supported | Custom bucket generation |
| column_time_range.go | Supported | MIN/MAX for time columns |
| column_topk.go | Supported | Top-K with COUNT aggregation |
| table_head.go | Supported | Preview with LIMIT |
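
The PR does not say how the custom bucket generation for the numeric histogram works. One plausible approach, shown here purely as an assumption, is to compute equal-width bucket boundaries in Go from the column's minimum and maximum and then count rows per bucket in SQL. The `bucket` type and `equalWidthBuckets` function are hypothetical.

```go
package main

import "fmt"

// bucket is a hypothetical half-open interval [Low, High); the last
// bucket is treated as closed so that the maximum value is counted.
type bucket struct {
	Low, High float64
}

// equalWidthBuckets splits [lo, hi] into n buckets of equal width.
// It returns nil for invalid input and a single bucket when lo == hi.
func equalWidthBuckets(lo, hi float64, n int) []bucket {
	if n <= 0 || hi < lo {
		return nil
	}
	if lo == hi {
		return []bucket{{Low: lo, High: hi}}
	}
	width := (hi - lo) / float64(n)
	out := make([]bucket, n)
	for i := range out {
		out[i] = bucket{
			Low:  lo + float64(i)*width,
			High: lo + float64(i+1)*width,
		}
	}
	// Pin the final edge to hi to avoid floating-point drift.
	out[n-1].High = hi
	return out
}

func main() {
	for i, b := range equalWidthBuckets(0, 10, 5) {
		fmt.Printf("bucket %d: [%g, %g)\n", i, b.Low, b.High)
	}
}
```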

Testing

  • Go build passes
  • Manual testing with StarRocks cluster (internal tables + external Iceberg catalog)

Future Improvements

  • Add unit tests for StarRocks driver
  • Support merge incremental strategy for UNIQUE/PRIMARY KEY models
  • Add physical size calculation via system tables
  • Support more StarRocks-specific features (Bitmap index, Bloom filter, etc.)

@EdwardArchive EdwardArchive changed the title from "Add olap connection type starrocks" to "Add StarRocks Driver Support" Nov 24, 2025
Add Support Dialect
- EscapeTable
- JoinOnExpression
- SelectTimeRangeBins
- GetTimeExpr
- SupportsILike (disabled)
@EdwardArchive
Contributor Author

I just resolved the conflicts.

@k-anshul
Member

Hey @EdwardArchive

Thanks for the PR. I tested it locally and noticed a few issues:

  1. Some of the profiling queries are returning 5xx errors.
  2. A few queries are also returning data in binary rather than strings.

Can you take a deeper look at the profiling queries and address these?

Separately, after internal discussion, we’d like to evaluate supporting StarRocks through its MySQL compatibility layer instead of treating it as a dedicated OLAP engine. In this model, we would not support modeling in Rill and would only enable dashboards on top of existing datasets, similar to how we support Druid today.

@EdwardArchive
Contributor Author

EdwardArchive commented Nov 29, 2025

Hi @k-anshul

Thank you for the review and feedback!

Regarding the technical issues (5xx errors and binary data):

I'm glad you raised these issues - I've addressed them in the current commits. The fixes include improved type mapping for LARGEINT, BINARY/VARBINARY,
DECIMAL types, and proper handling for complex types (ARRAY, MAP, STRUCT).
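
To illustrate the kind of fix described here: Go drivers that speak the MySQL protocol can hand text values back as `[]byte` when rows are scanned into untyped destinations, which then show up as binary in API responses. The sketch below shows one way to normalize such values. The `normalizeValue` function and its `isBinaryColumn` flag are hypothetical and are not taken from the PR.

```go
package main

import "fmt"

// normalizeValue converts a raw scanned value for display.
// Text-like columns that arrive as []byte become strings; genuinely
// binary columns (BINARY/VARBINARY) are left as bytes.
// The isBinaryColumn flag is assumed to come from column metadata.
func normalizeValue(v any, isBinaryColumn bool) any {
	b, ok := v.([]byte)
	if !ok {
		return v // already a typed value such as int64 or time.Time
	}
	if isBinaryColumn {
		return b
	}
	return string(b)
}

func main() {
	fmt.Printf("%v\n", normalizeValue([]byte("hello"), false))
	fmt.Printf("%v\n", normalizeValue([]byte{0x01, 0x02}, true))
	fmt.Printf("%v\n", normalizeValue(int64(42), false))
}
```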

If you encounter any additional issues during testing, please let me know the specific cases and I'll be happy to investigate further.

Regarding the strategic direction (MySQL compatibility layer vs dedicated OLAP):

I appreciate the team's consideration on this approach. Before proceeding, I'd like to understand the reasoning behind preferring the MySQL compatibility layer over the dedicated OLAP implementation.

From my perspective, StarRocks offers powerful capabilities that could benefit Rill users - particularly features like external catalog support (Iceberg, Hive), advanced profiling queries (histogram, time range analysis), and the modeling layer. These were tested and working in the current implementation.

Additionally, StarRocks natively supports Arrow Flight SQL, which aligns well with Rill's architecture and could provide better performance for large-scale data transfers in future iterations.

That said, I'm open to discussing a simplified approach. If the concern is maintenance overhead or complexity, perhaps we could consider keeping the core OLAP functionality while removing the model executor, similar to how other connectors handle read-only data sources?

I'd love to hear more about the team's considerations so we can find the best path forward.

PS. I just made some updates to the first comment.

@k-anshul
Member

k-anshul commented Dec 2, 2025

Hi @EdwardArchive

Thanks for the fixes. I will take a look at them and get back to you if I find any issues.

> Before proceeding, I'd like to understand the reasoning behind preferring the MySQL compatibility layer over the dedicated OLAP implementation.

The reasoning is that instead of implementing a dialect for each OLAP system, we implement the MySQL and Postgres dialects so that all systems that support those are automatically supported. As I understand it, there aren't any performance trade-offs with using StarRocks in MySQL compatibility mode, at least when querying data stored in it.
I also saw that many other BI tools support StarRocks through its MySQL compatibility.

> StarRocks offers powerful capabilities that could benefit Rill users - particularly features like external catalog support (Iceberg, Hive), advanced profiling queries (histogram, time range analysis), and the modeling layer

Do you think it is not possible to implement the profiling queries using the MySQL dialect in a performant way? If that is the case, then we can consider StarRocks support as well.
We may not need external catalog support or modeling layer support since we don't intend to support modeling in StarRocks right now.

> Additionally, StarRocks natively supports Arrow Flight SQL, which aligns well with Rill's architecture and could provide better performance for large-scale data transfers in future iterations.

Yeah, I think using Arrow to export data would be nice and performant, though exports are a small use case of the application.

> That said, I'm open to discussing a simplified approach. If the concern is maintenance overhead or complexity, perhaps we could consider keeping the core OLAP functionality while removing the model executor, similar to how other connectors handle read-only data sources?

Yeah, in the first cut we should definitely keep the scope limited, and we can remove modeling.

@EdwardArchive
Contributor Author

Hi @k-anshul, thank you for sharing your thoughts.

As the first step, I will create a new PR based on the current code while excluding the model functionality.
In the meantime, if you encounter any errors in the tested code, I would appreciate your feedback.

Thank you.

@k-anshul
Member

k-anshul commented Dec 2, 2025

> Hi @k-anshul, thank you for sharing your thoughts.
>
> As the first step, I will create a new PR based on the current code while excluding the model functionality. In the meantime, if you encounter any errors in the tested code, I would appreciate your feedback.
>
> Thank you.

Please hold on to implementing MySQL dialect.

@k-anshul
Member

k-anshul commented Dec 3, 2025

Hey @EdwardArchive

Let us go ahead with your current implementation using the StarRocks dialect.
Please remove the modeling support from the PR. Once that is done, I can take a deeper look at the PR.

@EdwardArchive
Contributor Author

Hey @k-anshul,

I split out the model removal work on StarRocks into this PR.

Let’s go over it when you’re free:
#8454

And I'll close this PR.
