gate schema discovery behind "auto" for remote engines#9784
Open
Light2Dark wants to merge 2 commits into
Open
Conversation
The datasources panel scanned every catalog/schema on remote backends such as Snowflake because the ibis and adbc engines hardcoded auto_discover_schemas=True. Switch them to "auto" so discovery is gated by the cheap-dialect heuristic, matching the sqlalchemy engine. Also unify the shared default discovery config (default_inference_config) and the "auto" resolution (_resolve_should_auto_discover / _is_cheap_discovery) across engines to prevent this kind of drift in the future. Co-authored-by: Cursor <cursoragent@cursor.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes a performance issue in the datasources panel for remote SQL backends (notably Snowflake) by ensuring schema discovery is no longer unconditionally enabled for the Ibis and ADBC engines. It aligns discovery behavior across engines by centralizing the default inference config and the "auto" discovery resolution heuristic, matching the existing intent of the SQLAlchemy engine.
Changes:
- Switch Ibis and ADBC engines from
auto_discover_schemas=Trueto a shareddefault_inference_config()where schemas/tables are gated by"auto". - Centralize
"auto"resolution and cheap-dialect detection viaEngineCatalog._resolve_should_auto_discover()/_is_cheap_discovery()andis_cheap_dialect(). - Add regression tests asserting SQLAlchemy/Ibis/ADBC engines use the shared default inference config (and that schemas default to
"auto").
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
marimo/_sql/utils.py |
Adds is_cheap_dialect() helper to centralize the “cheap discovery” dialect heuristic. |
marimo/_sql/engines/types.py |
Introduces default_inference_config() and centralizes "auto" discovery resolution in the base engine catalog type. |
marimo/_sql/engines/sqlalchemy.py |
Switches SQLAlchemy engine to use default_inference_config() and removes duplicated "auto" resolution logic. |
marimo/_sql/engines/ibis.py |
Fixes Ibis schema discovery default by using default_inference_config() (prevents eager schema scanning on remote backends). |
marimo/_sql/engines/adbc.py |
Fixes ADBC schema discovery default by using default_inference_config(); uses shared cheap-dialect helper. |
marimo/_sql/engines/redshift.py |
Updates engine override to implement _is_cheap_discovery() instead of duplicating "auto" resolution. |
marimo/_sql/engines/pyiceberg.py |
Removes duplicated "auto" resolver override; relies on base resolver + _is_cheap_discovery() override. |
marimo/_sql/engines/clickhouse.py |
Updates engine override to implement _is_cheap_discovery() instead of duplicating "auto" resolution. |
tests/_sql/test_sqlalchemy.py |
Adds test asserting SQLAlchemy engine uses the shared default inference config. |
tests/_sql/test_ibis.py |
Adds test asserting Ibis engine uses the shared default inference config and schemas default to "auto". |
tests/_sql/test_adbc.py |
Adds test asserting ADBC engine uses the shared default inference config and schemas default to "auto". |
Contributor
There was a problem hiding this comment.
4 issues found across 11 files
Architecture diagram
sequenceDiagram
participant UI as Datasources Panel
participant Engine as SQL Engine (base)
participant Adbc as ADBC Engine
participant Ibis as Ibis Engine
participant SA as SQLAlchemy Engine
participant ClickHouse as ClickHouse Engine
participant Redshift as Redshift Engine
participant PyIceberg as PyIceberg Engine
participant Types as types.py
participant Utils as utils.py
Note over UI,PyIceberg: Schema/Table Discovery Flow
UI->>Engine: request datasource discovery
Engine->>Engine: inference_config
Note over Engine: Default: auto_discover_schemas="auto"
alt Engine is ADBC or Ibis or SQLAlchemy
Engine->>Types: call default_inference_config()
Types-->>Engine: InferenceConfig with auto_discover_schemas="auto"
else Engine is ClickHouse or Redshift
Engine->>Engine: hardcoded override (not using default)
Note over Engine: _is_cheap_discovery() returns False
else Engine is PyIceberg
Engine->>Engine: hardcoded override
Note over Engine: _is_cheap_discovery() returns True
end
Engine->>Engine: _resolve_should_auto_discover("auto")
alt Resolution by dialect heuristic
Engine->>Utils: is_cheap_dialect(dialect)
Utils->>Utils: check CHEAP_DISCOVERY_DATABASES set
alt dialect in cheap set (e.g., sqlite, duckdb)
Utils-->>Engine: return True
Engine-->>UI: auto-discover schemas
else dialect not in cheap set (e.g., snowflake, bigquery)
Utils-->>Engine: return False
Engine-->>UI: skip auto-discovery
end
else Override by engine policy
ClickHouse-->>ClickHouse: return False (always expensive)
Redshift-->>Redshift: return False (opt out)
PyIceberg-->>PyIceberg: return True (metadata always fast)
end
Note over UI,PyIceberg: Dialect-based gating unified via base class
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
- Resolve include_tables in ClickHouse server get_databases so "auto" no longer scans every database's tables (it stays truthy otherwise). - Add mariadb to the cheap-dialect allowlist so MariaDB connections still auto-discover (SQLAlchemy reports dialect.name == "mariadb"). - Fix grammar in default_inference_config docstring. Co-authored-by: Cursor <cursoragent@cursor.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
📝 Summary
Closes #9775
The datasources panel scanned every catalog/schema on remote backends such as Snowflake because the ibis and adbc engines hardcoded auto_discover_schemas=True. Switch them to "auto" so discovery is gated by the cheap-dialect heuristic, matching the sqlalchemy engine.
Also unify the shared default discovery config (default_inference_config) and the "auto" resolution (_resolve_should_auto_discover / _is_cheap_discovery) across engines to prevent this kind of drift in the future.
📋 Pre-Review Checklist
✅ Merge Checklist