Skip to content

gate schema discovery behind "auto" for remote engines#9784

Open
Light2Dark wants to merge 2 commits into
mainfrom
cursor/9a445275
Open

gate schema discovery behind "auto" for remote engines#9784
Light2Dark wants to merge 2 commits into
mainfrom
cursor/9a445275

Conversation

@Light2Dark
Copy link
Copy Markdown
Collaborator

📝 Summary

Closes #9775

The datasources panel scanned every catalog/schema on remote backends such as Snowflake because the ibis and adbc engines hardcoded auto_discover_schemas=True. Switch them to "auto" so discovery is gated by the cheap-dialect heuristic, matching the sqlalchemy engine.

Also unify the shared default discovery config (default_inference_config) and the "auto" resolution (_resolve_should_auto_discover / _is_cheap_discovery) across engines to prevent this kind of drift in the future.

📋 Pre-Review Checklist

  • For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (Please provide a link if applicable).
  • Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
  • Video or media evidence is provided for any visual changes (optional).

✅ Merge Checklist

  • I have read the contributor guidelines.
  • Documentation has been updated where applicable, including docstrings for API changes.
  • Tests have been added for the changes made.

The datasources panel scanned every catalog/schema on remote backends such
as Snowflake because the ibis and adbc engines hardcoded
auto_discover_schemas=True. Switch them to "auto" so discovery is gated by
the cheap-dialect heuristic, matching the sqlalchemy engine.

Also unify the shared default discovery config (default_inference_config) and
the "auto" resolution (_resolve_should_auto_discover / _is_cheap_discovery)
across engines to prevent this kind of drift in the future.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copilot AI review requested due to automatic review settings June 4, 2026 05:39
@vercel
Copy link
Copy Markdown

vercel Bot commented Jun 4, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
marimo-docs Ready Ready Preview, Comment Jun 4, 2026 6:53am

Request Review

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a performance issue in the datasources panel for remote SQL backends (notably Snowflake) by ensuring schema discovery is no longer unconditionally enabled for the Ibis and ADBC engines. It aligns discovery behavior across engines by centralizing the default inference config and the "auto" discovery resolution heuristic, matching the existing intent of the SQLAlchemy engine.

Changes:

  • Switch Ibis and ADBC engines from auto_discover_schemas=True to a shared default_inference_config() where schemas/tables are gated by "auto".
  • Centralize "auto" resolution and cheap-dialect detection via EngineCatalog._resolve_should_auto_discover() / _is_cheap_discovery() and is_cheap_dialect().
  • Add regression tests asserting SQLAlchemy/Ibis/ADBC engines use the shared default inference config (and that schemas default to "auto").

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
marimo/_sql/utils.py Adds is_cheap_dialect() helper to centralize the “cheap discovery” dialect heuristic.
marimo/_sql/engines/types.py Introduces default_inference_config() and centralizes "auto" discovery resolution in the base engine catalog type.
marimo/_sql/engines/sqlalchemy.py Switches SQLAlchemy engine to use default_inference_config() and removes duplicated "auto" resolution logic.
marimo/_sql/engines/ibis.py Fixes Ibis schema discovery default by using default_inference_config() (prevents eager schema scanning on remote backends).
marimo/_sql/engines/adbc.py Fixes ADBC schema discovery default by using default_inference_config(); uses shared cheap-dialect helper.
marimo/_sql/engines/redshift.py Updates engine override to implement _is_cheap_discovery() instead of duplicating "auto" resolution.
marimo/_sql/engines/pyiceberg.py Removes duplicated "auto" resolver override; relies on base resolver + _is_cheap_discovery() override.
marimo/_sql/engines/clickhouse.py Updates engine override to implement _is_cheap_discovery() instead of duplicating "auto" resolution.
tests/_sql/test_sqlalchemy.py Adds test asserting SQLAlchemy engine uses the shared default inference config.
tests/_sql/test_ibis.py Adds test asserting Ibis engine uses the shared default inference config and schemas default to "auto".
tests/_sql/test_adbc.py Adds test asserting ADBC engine uses the shared default inference config and schemas default to "auto".

Comment thread marimo/_sql/engines/types.py Outdated
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 11 files

Architecture diagram
sequenceDiagram
    participant UI as Datasources Panel
    participant Engine as SQL Engine (base)
    participant Adbc as ADBC Engine
    participant Ibis as Ibis Engine
    participant SA as SQLAlchemy Engine
    participant ClickHouse as ClickHouse Engine
    participant Redshift as Redshift Engine
    participant PyIceberg as PyIceberg Engine
    participant Types as types.py
    participant Utils as utils.py
    
    Note over UI,PyIceberg: Schema/Table Discovery Flow
    
    UI->>Engine: request datasource discovery
    Engine->>Engine: inference_config
    Note over Engine: Default: auto_discover_schemas="auto"
    
    alt Engine is ADBC or Ibis or SQLAlchemy
        Engine->>Types: call default_inference_config()
        Types-->>Engine: InferenceConfig with auto_discover_schemas="auto"
    else Engine is ClickHouse or Redshift
        Engine->>Engine: hardcoded override (not using default)
        Note over Engine: _is_cheap_discovery() returns False
    else Engine is PyIceberg
        Engine->>Engine: hardcoded override
        Note over Engine: _is_cheap_discovery() returns True
    end
    
    Engine->>Engine: _resolve_should_auto_discover("auto")
    alt Resolution by dialect heuristic
        Engine->>Utils: is_cheap_dialect(dialect)
        Utils->>Utils: check CHEAP_DISCOVERY_DATABASES set
        alt dialect in cheap set (e.g., sqlite, duckdb)
            Utils-->>Engine: return True
            Engine-->>UI: auto-discover schemas
        else dialect not in cheap set (e.g., snowflake, bigquery)
            Utils-->>Engine: return False
            Engine-->>UI: skip auto-discovery
        end
    else Override by engine policy
        ClickHouse-->>ClickHouse: return False (always expensive)
        Redshift-->>Redshift: return False (opt out)
        PyIceberg-->>PyIceberg: return True (metadata always fast)
    end
    
    Note over UI,PyIceberg: Dialect-based gating unified via base class
Loading

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread marimo/_sql/engines/clickhouse.py
Comment thread marimo/_sql/utils.py
Comment thread marimo/_sql/engines/redshift.py
Comment thread marimo/_sql/engines/types.py Outdated
- Resolve include_tables in ClickHouse server get_databases so "auto" no
  longer scans every database's tables (it stays truthy otherwise).
- Add mariadb to the cheap-dialect allowlist so MariaDB connections still
  auto-discover (SQLAlchemy reports dialect.name == "mariadb").
- Fix grammar in default_inference_config docstring.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.

@Light2Dark Light2Dark added the enhancement New feature or request label Jun 4, 2026
@Light2Dark Light2Dark requested a review from kirangadhave June 5, 2026 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Datasources panel slow on Snowflake: auto_discover_schemas should be "auto"

2 participants