Conversation

@rusackas rusackas commented Dec 23, 2025

Summary

This PR introduces an automated database documentation system that generates documentation pages from engine spec metadata attributes. Each database engine spec now contains its own documentation metadata, providing a single source of truth.

Key Features

  • Single source of truth: All database documentation lives in each engine spec's metadata attribute (this removes the 1,150+ line DATABASE_DOCS dict from lib.py)
  • Auto-generated pages: Individual MDX pages for each database with connection strings, drivers, auth methods
  • Overview table: Searchable/filterable table showing all databases with scores, features, time grains
  • Metadata linter: Script to track completeness and identify specs needing documentation
  • PostgreSQL-compatible DBs: TimescaleDB, YugabyteDB, Hologres now have their own engine specs
  • Simplified build: Generator script parses engine spec metadata via AST - no Flask context required for CI

Architecture

superset/db_engine_specs/
├── postgres.py          # PostgresEngineSpec with metadata = {...}
├── mysql.py             # MySQLEngineSpec with metadata = {...}
├── timescaledb.py       # NEW: PostgreSQL-compatible stub spec
├── yugabytedb.py        # NEW: PostgreSQL-compatible stub spec
├── hologres.py          # NEW: PostgreSQL-compatible stub spec
├── arc.py               # NEW: Stub spec for Arc
├── d1.py                # NEW: Stub spec for Cloudflare D1
├── lint_metadata.py     # Metadata completeness linter
├── METADATA_STATUS.md   # Auto-generated completeness report
└── README.md            # Updated with "How to Add a Database" guide

How to Add a New Database

  1. Create a new file in superset/db_engine_specs/ (e.g., mydatabase.py)
  2. Add a metadata attribute with required fields:
    from superset.db_engine_specs.base import BaseEngineSpec, DatabaseCategory
    
    class MyDatabaseEngineSpec(BaseEngineSpec):
        engine = "mydatabase"
        engine_name = "My Database"
    
        metadata = {
            "description": "Brief description of the database.",
            "category": DatabaseCategory.TRADITIONAL_RDBMS,
            "pypi_packages": ["my-driver"],
            "connection_string": "mydb://{username}:{password}@{host}:{port}/{database}",
            "logo": "mydb.svg",
            "homepage_url": "https://mydb.example.com/",
            "default_port": 5432,
        }
  3. Run the linter: python superset/db_engine_specs/lint_metadata.py
  4. Add a logo (optional) in docs/static/img/databases/

Changes

New Files:

  • superset/db_engine_specs/arc.py - Arc data platform stub spec
  • superset/db_engine_specs/d1.py - Cloudflare D1 stub spec
  • superset/db_engine_specs/hologres.py - Alibaba Cloud Hologres (PostgreSQL-compatible)
  • superset/db_engine_specs/timescaledb.py - TimescaleDB (PostgreSQL-compatible)
  • superset/db_engine_specs/yugabytedb.py - YugabyteDB (PostgreSQL-compatible)
  • superset/db_engine_specs/lint_metadata.py - Metadata completeness linter
  • superset/db_engine_specs/METADATA_STATUS.md - Auto-generated status report

Modified Files:

  • superset/db_engine_specs/lib.py - Removed DATABASE_DOCS dict (~1150 lines), updated get_documentation_metadata()
  • superset/db_engine_specs/README.md - Added comprehensive "How to Add a Database" guide
  • superset/db_engine_specs/*.py - Added metadata attributes to 60+ engine specs
  • docs/scripts/generate-database-docs.mjs - Simplified to read from engine spec metadata via AST (removed DATABASE_DOCS fallback)

Documentation Build Modes

The generate-database-docs.mjs script supports two modes:

  1. Full mode (with Flask context): Runs diagnose() to get detailed feature scores. Requires Superset installed locally.
  2. AST fallback (CI/Netlify): Parses engine spec files directly to extract metadata attributes. Works without Flask.
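The AST fallback idea can be sketched in Python with the standard ast module. This is only an illustration, not the code in generate-database-docs.mjs: the helper names extract_metadata and _to_value are hypothetical, and the sketch assumes metadata is a class-level dict literal whose values are constants, lists, nested dicts, or dotted names such as DatabaseCategory.TRADITIONAL_RDBMS.

```python
import ast
from typing import Any


def _to_value(node: ast.expr) -> Any:
    """Convert a small subset of AST nodes into plain Python values."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, (ast.List, ast.Tuple)):
        return [_to_value(item) for item in node.elts]
    if isinstance(node, ast.Dict):
        # A key of None marks a ** spread entry; this sketch skips those.
        return {
            _to_value(key): _to_value(value)
            for key, value in zip(node.keys, node.values)
            if key is not None
        }
    if isinstance(node, ast.Attribute):
        # Enum-style references are kept as the bare member name (assumption).
        return node.attr
    # Anything else is kept as its source text rather than evaluated.
    return ast.unparse(node)


def extract_metadata(source: str) -> dict[str, dict[str, Any]]:
    """Map each class name to its `metadata` dict literal, without importing the module."""
    result: dict[str, dict[str, Any]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        for stmt in node.body:
            is_metadata = isinstance(stmt, ast.Assign) and any(
                isinstance(target, ast.Name) and target.id == "metadata"
                for target in stmt.targets
            )
            if is_metadata and isinstance(stmt.value, ast.Dict):
                result[node.name] = _to_value(stmt.value)
    return result


SAMPLE = '''
class MyDatabaseEngineSpec(BaseEngineSpec):
    engine = "mydatabase"
    metadata = {
        "description": "Brief description of the database.",
        "category": DatabaseCategory.TRADITIONAL_RDBMS,
        "pypi_packages": ["my-driver"],
        "default_port": 5432,
    }
'''

print(extract_metadata(SAMPLE))
```

Because the source is parsed and never imported, names like BaseEngineSpec do not need to resolve, which is why no Flask context is required.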

Metadata Completeness

Current status (63 engine specs with metadata):

  • All have required fields (description, category, pypi_packages, connection_string)
  • Most have recommended fields (logo, homepage_url, default_port)

Run python superset/db_engine_specs/lint_metadata.py to see the full report.
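As a rough illustration of what a completeness check involves, here is a minimal sketch that uses the required and recommended field names listed above. The function name check_metadata and the shape of its return value are hypothetical; the real lint_metadata.py may report differently.

```python
REQUIRED_FIELDS = ("description", "category", "pypi_packages", "connection_string")
RECOMMENDED_FIELDS = ("logo", "homepage_url", "default_port")


def check_metadata(metadata: dict) -> dict[str, list[str]]:
    """Report which required and recommended keys are absent from a metadata dict."""
    return {
        "missing_required": [f for f in REQUIRED_FIELDS if f not in metadata],
        "missing_recommended": [f for f in RECOMMENDED_FIELDS if f not in metadata],
    }


report = check_metadata(
    {
        "description": "Brief description of the database.",
        "category": "TRADITIONAL_RDBMS",
        "pypi_packages": ["my-driver"],
        "connection_string": "mydb://{username}:{password}@{host}:{port}/{database}",
        "logo": "mydb.svg",
    }
)
print(report)
```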

Test Plan

  1. Run python superset/db_engine_specs/lint_metadata.py to verify metadata extraction
  2. Run cd docs && yarn build to verify documentation generation
  3. Check that the generated databases.json contains all 63 databases
  4. Verify each database page renders correctly at /docs/databases/<database-name>

@github-actions github-actions bot added doc Namespace | Anything related to documentation and removed size/XXL labels Dec 23, 2025
@codeant-ai-for-open-source codeant-ai-for-open-source bot added the size:XXL This PR changes 1000+ lines, ignoring generated files label Dec 23, 2025
@apache apache deleted a comment from codeant-ai-for-open-source bot Dec 23, 2025
@apache apache deleted a comment from bito-code-review bot Dec 23, 2025
netlify bot commented Jan 6, 2026

Deploy Preview for superset-docs-preview ready!

Name Link
🔨 Latest commit 5873ead
🔍 Latest deploy log https://app.netlify.com/projects/superset-docs-preview/deploys/696eb5a0981f24000849ed27
😎 Deploy Preview https://deploy-preview-36805--superset-docs-preview.netlify.app

rusackas and others added 13 commits January 8, 2026 10:45
Rebuild the database documentation system so that lib.py is the
single source of truth. The script outputs JSON that React components
consume to render the documentation pages.

Changes:
- Add comprehensive DATABASE_DOCS dictionary to lib.py with 53 databases
- Create generate-database-docs.mjs build script
- Create DatabaseIndex and DatabasePage React components
- Replace 1900 lines of manual markdown with component-based rendering
- Integrate into docs build pipeline (yarn start/build)

To update documentation, just update DATABASE_DOCS in lib.py.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Run the diagnostic tests with Flask context to get actual feature
scores for each database engine spec.

Top scores:
- Presto: 159/201
- Trino: 149/201
- Apache Hive/Spark: 140/201
- PostgreSQL: 104/201

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add compatible databases (YugabyteDB, TimescaleDB, Hologres) to the
overview table with a link to their parent database's documentation.

Compatible DBs show a "PostgreSQL compatible" tag and inherit feature
scores from their parent.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update generate-database-docs.mjs to create individual MDX files
- Each database now has its own page at /docs/configuration/databases/{slug}
- Overview page at /docs/configuration/databases/ with filterable table
- Fix category counts in filter dropdown
- Links in table now point to individual pages
- Use cached databases.json when it has full diagnostic data

Generated 64 database pages + index page.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Move Databases section from Configuration to top-level navigation
- Add Databases to Documentation dropdown menu in navbar
- Set "Next" version as default documentation version
- Improve database page layout with larger logos (height: 120)
- Hide duplicate H1 headings via hide_title frontmatter
- Fix diagnostics preservation in fallback mode when Flask context unavailable
- Add logos and homepage URLs to DATABASE_DOCS in lib.py
- Show compatible databases (e.g., YugabyteDB) in overview table
- Dynamically generate front page database grid from databases.json

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The generate-database-docs script now updates the main README.md
with database logos between marker comments:
- <!-- SUPPORTED_DATABASES_START -->
- <!-- SUPPORTED_DATABASES_END -->

This ensures the README stays in sync with DATABASE_DOCS in lib.py.
Also updated docs links to point to new /docs/databases path.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
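The marker-based update described in the commit above can be sketched as follows. The generator itself is a Node script, so this Python version only illustrates the technique; replace_between_markers is a hypothetical name, and raising an error when the markers are missing is an assumption.

```python
import re

START = "<!-- SUPPORTED_DATABASES_START -->"
END = "<!-- SUPPORTED_DATABASES_END -->"


def replace_between_markers(readme: str, new_block: str) -> str:
    """Replace everything between the two marker comments, keeping the markers."""
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    if not pattern.search(readme):
        raise ValueError("marker comments not found in README")
    replacement = f"{START}\n{new_block}\n{END}"
    # A function replacement avoids backslash interpretation in new_block.
    return pattern.sub(lambda _match: replacement, readme, count=1)


readme = f"intro\n{START}\nold logos\n{END}\noutro\n"
print(replace_between_markers(readme, "new logos"))
```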
The generate-database-docs script now only updates README.md when
explicitly requested via:
- --update-readme flag
- UPDATE_README=true env var

Added npm script: yarn update:readme-db-logos

This prevents CI from failing due to uncommitted README changes
during docs builds.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
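The opt-in check from the commit above amounts to something like the sketch below. should_update_readme is a hypothetical name, and the exact-match comparison against "true" is an assumption about how the env var is read.

```python
from typing import Mapping, Sequence


def should_update_readme(argv: Sequence[str], env: Mapping[str, str]) -> bool:
    """README updates are opt-in: either the CLI flag or the env var enables them."""
    return "--update-readme" in argv or env.get("UPDATE_README") == "true"


print(should_update_readme(["--update-readme"], {}))
print(should_update_readme([], {"UPDATE_README": "true"}))
print(should_update_readme([], {}))
```

With neither the flag nor the env var set, a plain docs build leaves README.md untouched.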
Fixes CodeQL security alert: incomplete string escaping.
Backslashes must be escaped before quotes to prevent
malformed YAML frontmatter in generated MDX files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
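The ordering rule from the commit above, shown as a small Python sketch. The actual fix is in the Node generator script; escape_yaml_double_quoted is a hypothetical helper for a value placed inside a double-quoted YAML scalar.

```python
def escape_yaml_double_quoted(value: str) -> str:
    """Escape a string for use inside a double-quoted YAML scalar."""
    # Backslashes go first. If quotes were escaped first, the backslashes
    # added for them would be doubled by the second pass and the quote
    # would end up unescaped.
    return value.replace("\\", "\\\\").replace('"', '\\"')


print(escape_yaml_double_quoted('say "hi"'))
print(escape_yaml_double_quoted("C:\\data"))
```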
Fixes CodeQL security warning about shell commands built from
environment values. Now uses spawnSync with:
- cwd option instead of cd in shell command
- env option for environment variables
- arguments passed as array (no shell parsing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Security fix for CodeQL warning about shell command injection.
Converted extractDatabaseDocs() and extractDatabaseDocsSimple()
to use spawnSync with cwd option instead of execSync with shell
string interpolation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
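The two security commits above replace a shell string with an argument array plus cwd and env options. The PR does this with Node's spawnSync; the sketch below shows the same pattern using Python's subprocess.run as an analogue. run_python and DEMO_VALUE are hypothetical names used only for illustration.

```python
import os
import subprocess
import sys


def run_python(script_args: list[str], cwd: str, extra_env: dict[str, str]) -> str:
    """Run the Python interpreter without a shell and return its stdout."""
    result = subprocess.run(
        [sys.executable, *script_args],  # argument list, so no shell parsing
        cwd=cwd,  # working directory set here instead of `cd ... &&`
        env={**os.environ, **extra_env},  # env passed explicitly
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# A value containing shell metacharacters is passed through as plain data.
output = run_python(
    ["-c", "import os; print(os.environ['DEMO_VALUE'])"],
    cwd=".",
    extra_env={"DEMO_VALUE": "a; echo injected"},
)
print(output)
```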
Compatible databases can share the same name across multiple parent
engines. Using only the name as rowKey leads to duplicate React keys.
Fixed by combining parent engine name with database name for compatible
database entries.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
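The row key fix above can be illustrated with a short sketch. The real change is in the React table component; row_key and the ":" separator are hypothetical.

```python
from typing import Optional


def row_key(name: str, parent_engine: Optional[str] = None) -> str:
    """Build a table row key; compatible databases are prefixed with their parent engine."""
    return f"{parent_engine}:{name}" if parent_engine else name


# Two compatible entries with the same name under different parents.
keys = [row_key("Aurora", "MySQL"), row_key("Aurora", "PostgreSQL"), row_key("MySQL")]
print(keys)
```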
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The language prop was defined but never used since CodeBlock
doesn't implement syntax highlighting.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions github-actions bot added the github_actions Pull requests that update GitHub Actions code label Jan 18, 2026
Added three new licensing categories to DatabaseCategory:
- OPEN_SOURCE: Self-hosted open source databases (PostgreSQL, MySQL, ClickHouse, etc.)
- HOSTED_OPEN_SOURCE: Managed services running open source software (Aurora, MotherDuck, Databricks)
- PROPRIETARY: Closed source databases (Snowflake, BigQuery, Oracle, etc.)

Updated all 60 database engine specs with appropriate licensing categories.
Also added categories to compatible_databases entries (Aurora MySQL/PostgreSQL,
MotherDuck, IBM Db2 for i) and updated CompatibleDatabase TypedDict to support
the categories field.

This gives users three dimensions to filter databases:
1. Cloud provider (AWS, GCP, Azure)
2. Database type (Analytical, RDBMS, NoSQL, etc.)
3. Licensing (Open Source, Hosted Open Source, Proprietary)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@apache apache deleted a comment from codeant-ai-for-open-source bot Jan 19, 2026
rusackas and others added 3 commits January 18, 2026 18:13
Updated the docs generation and React component to properly handle
the categories array (instead of singular category):

- generate-database-docs.mjs: Fixed byCategory stats to use docs.categories
  array and map constant names to display names
- DatabaseIndex.tsx: Updated to render multiple category tags per database
  and filter by any matching category
- types.ts: Changed category to categories (array) in TypeScript types
- Regenerated databases.json with correct category mappings

Each database now correctly shows all its categories (database type,
cloud provider, and licensing) as separate tags.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
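Filtering on a categories array, as described in the commit above, amounts to a membership test. A minimal sketch follows; the real filter lives in DatabaseIndex.tsx, and the sample entries and the "Traditional RDBMS" and "Cloud Data Warehouse" display names are assumptions.

```python
def filter_by_category(databases: list[dict], selected: str) -> list[dict]:
    """Keep databases whose categories array contains the selected category."""
    return [db for db in databases if selected in db.get("categories", [])]


databases = [
    {"name": "PostgreSQL", "categories": ["Traditional RDBMS", "Open Source"]},
    {"name": "Snowflake", "categories": ["Cloud Data Warehouse", "Proprietary"]},
]
print([db["name"] for db in filter_by_category(databases, "Open Source")])
```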
Added 'categories' to NON_INHERITABLE_FIELDS in the deep_merge function.
This prevents child classes from accumulating parent categories, which was
causing databases like Apache Spark SQL to show duplicate category tags.

Each engine spec class now defines only its own categories without
inheriting from parent classes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
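One way to picture the fix above is the sketch below. It is not the deep_merge from the PR: it assumes lists are merged by appending new items (which is how parent categories could accumulate in the first place) and that non-inheritable keys are simply dropped from the parent side.

```python
from typing import Any

NON_INHERITABLE_FIELDS = {"categories"}


def deep_merge(parent: dict[str, Any], child: dict[str, Any]) -> dict[str, Any]:
    """Merge child metadata over parent metadata, skipping non-inheritable parent keys."""
    merged = {k: v for k, v in parent.items() if k not in NON_INHERITABLE_FIELDS}
    for key, value in child.items():
        if key in NON_INHERITABLE_FIELDS:
            merged[key] = value  # the child's own value, never combined
        elif isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        elif isinstance(value, list) and isinstance(merged.get(key), list):
            merged[key] = merged[key] + [v for v in value if v not in merged[key]]
        else:
            merged[key] = value
    return merged


parent = {"categories": ["A"], "pypi_packages": ["p1"], "homepage_url": "h"}
child = {"categories": ["B"], "pypi_packages": ["p2"]}
print(deep_merge(parent, child))
```

Under these assumptions a child that declares no categories ends up with none, rather than silently showing its parent's tags.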
The generate-database-docs.mjs script was not adding a trailing newline
when writing databases.json, causing the end-of-file-fixer pre-commit
hook to fail.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
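The fix above is a one-character change in the Node script. The same idea in Python, as a sketch with a hypothetical helper name:

```python
import json


def dump_json_with_newline(data: object) -> str:
    """Serialize to JSON and end the text with exactly one newline."""
    # json.dumps never emits a trailing newline, so it is appended here
    # to satisfy end-of-file checks.
    return json.dumps(data, indent=2) + "\n"


text = dump_json_with_newline({"databases": []})
print(repr(text))
```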
rusackas and others added 8 commits January 19, 2026 14:11
When hovering over the time grain count in the database index table,
users now see a tooltip listing all supported time grains for that
database (e.g., "Second, Minute, Hour, Day, Week, Month, Quarter, Year").

Time grain names are formatted for readability (e.g., FIVE_MINUTES -> "5 min").

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
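The formatting step above might look like the sketch below. Only the FIVE_MINUTES -> "5 min" mapping comes from the commit message; the lookup tables, the fallback rule, and the name format_time_grain are assumptions.

```python
NUMBER_WORDS = {"FIVE": 5, "TEN": 10, "FIFTEEN": 15, "THIRTY": 30}
UNIT_ABBREVIATIONS = {"SECONDS": "sec", "MINUTES": "min", "HOURS": "hr"}


def format_time_grain(name: str) -> str:
    """Turn a constant-style time grain name into a short readable label."""
    parts = name.split("_")
    if len(parts) == 2 and parts[0] in NUMBER_WORDS and parts[1] in UNIT_ABBREVIATIONS:
        return f"{NUMBER_WORDS[parts[0]]} {UNIT_ABBREVIATIONS[parts[1]]}"
    # Fallback: replace underscores and capitalize the first letter.
    return name.replace("_", " ").capitalize()


print(format_time_grain("FIVE_MINUTES"))
print(format_time_grain("SECOND"))
```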
Display time grains as individual tags wrapped in a flex container
instead of a comma-separated string.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added 5 new feature indicators to the Features column:
- File Upload (38 DBs) - Can upload CSV/Excel files
- Query Cancel (15 DBs) - Can cancel running queries
- Cost Estimation (7 DBs) - Can estimate query cost before running
- User Impersonation (7 DBs) - Supports user impersonation for RLS
- SQL Validation (2 DBs) - Can validate SQL syntax

All features are filterable in the table header dropdown.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added compatible_databases for cloud-hosted versions of open source databases:

- MotherDuck: Now has its own metadata with motherduck.png logo (was inheriting
  DuckDB's logo)
- StarRocks: Added CelerData (cloud-hosted StarRocks)
- ClickHouse: Added ClickHouse Cloud and Altinity.Cloud
- Trino: Added Starburst Galaxy and Starburst Enterprise
- Elasticsearch: Added Elastic Cloud and Amazon OpenSearch Service

Also deduplicated the logo wall on the docs homepage by filtering out
duplicate logo filenames (fixes duplicate DB2 logos appearing).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
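The logo wall deduplication above can be sketched as an order-preserving filter on the logo filename. The function name, the sample filenames, and the choice to skip entries without a logo are assumptions.

```python
def dedupe_by_logo(databases: list[dict]) -> list[dict]:
    """Keep the first database for each logo filename, preserving order."""
    seen: set[str] = set()
    unique = []
    for db in databases:
        logo = db.get("logo")
        if not logo or logo in seen:
            continue  # no logo to show, or this logo is already on the wall
        seen.add(logo)
        unique.append(db)
    return unique


wall = dedupe_by_logo(
    [
        {"name": "IBM Db2", "logo": "db2.png"},
        {"name": "IBM Db2 for i", "logo": "db2.png"},
        {"name": "DuckDB", "logo": "duckdb.png"},
    ]
)
print([db["name"] for db in wall])
```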
Imply is the enterprise/cloud distribution of Apache Druid.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…rds smaller

Added unique logos for:
- CelerData (starrocks cloud)
- Starburst (trino cloud/enterprise)
- Altinity (clickhouse managed)
- Imply (druid cloud/enterprise)

Also made the database logo cards smaller on the homepage:
- 8 columns instead of 5
- Smaller card height (80px vs 120px)
- Tighter spacing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added optional `link` prop to SectionHeader component and used it
to link the "Supported Databases" title to /docs/databases.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…s" sidebar

Reorganized the sidebar structure so individual database pages are
nested under a collapsible "Supported Databases" section:

Before:
- Databases
  - Overview
  - Amazon Athena
  - Apache Druid
  - ...

After:
- Databases
  - Overview
  - Supported Databases (collapsible)
    - Amazon Athena
    - Apache Druid
    - ...

Updated:
- generate-database-docs.mjs to output MDX to supported/ subdirectory
- DatabaseIndex.tsx links to use /supported/ path
- Homepage database card links to use /supported/ path

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@apache apache deleted a comment from codeant-ai-for-open-source bot Jan 20, 2026
@rusackas rusackas merged commit b460ca9 into master Jan 21, 2026
71 checks passed
@rusackas rusackas deleted the feat/db-engine-docs branch January 21, 2026 18:54
Labels

doc Namespace | Anything related to documentation github_actions Pull requests that update GitHub Actions code preset-io size/XXL size:XXL This PR changes 1000+ lines, ignoring generated files

3 participants