# Data Dictionary & Introspection

SQLSpec provides a unified Data Dictionary API for introspecting database schemas. It returns table, column, index, and foreign key metadata in a consistent format regardless of the underlying database engine, although feature coverage varies by adapter (see the Adapter Support Matrix below).

## Core Concepts

The data dictionary is accessed through the `driver.data_dictionary` property, and the session objects in the examples below expose the same property. Its methods query the database catalog and take the driver itself as their first argument, as in `session.data_dictionary.get_tables(session)`.

### Introspection Capabilities

- **Tables**: List the tables in a schema.
- **Columns**: Get column details (name, type, nullability, default).
- **Indexes**: Get index definitions (columns, uniqueness).
- **Foreign Keys**: Get foreign key constraints and the tables and columns they reference.
- **Topological Sorting**: Get tables in dependency order (useful for data loading, cleanups, and migrations).

## Usage

### Basic Introspection

```python
# `config` is an async SQLSpec adapter configuration created elsewhere.
async def inspect_users(config) -> None:
    async with config.provide_session() as session:
        # Get all tables in the default schema
        tables = await session.data_dictionary.get_tables(session)
        print(f"Tables: {tables}")

        # Get column metadata for a specific table
        columns = await session.data_dictionary.get_columns(session, "users")
        for col in columns:
            print(f"{col['column_name']}: {col['data_type']}")
```
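
To show the kind of information a columns lookup reads from a catalog, here is a minimal sketch against SQLite. It uses only the standard library `sqlite3` module and SQLite's `pragma_table_info` table-valued function. It is not SQLSpec's implementation. The `sqlite_columns` helper is hypothetical, and apart from `column_name` and `data_type` (taken from the example above), the dictionary keys are assumptions.

```python
import sqlite3
from typing import Any


def sqlite_columns(conn: sqlite3.Connection, table: str) -> list[dict[str, Any]]:
    """Hypothetical helper: read column metadata for ``table`` from SQLite's catalog."""
    # pragma_table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk)
    rows = conn.execute("SELECT * FROM pragma_table_info(?)", (table,)).fetchall()
    return [
        {
            "column_name": name,
            "data_type": col_type,
            "is_nullable": not notnull,  # assumed key name
            "column_default": default,  # assumed key name
        }
        for _cid, name, col_type, notnull, default, _pk in rows
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, active INTEGER DEFAULT 1)")
for col in sqlite_columns(conn, "users"):
    print(f"{col['column_name']}: {col['data_type']}")
```

Each engine exposes this information through a different catalog: `information_schema` views, system tables, or pragmas. A data dictionary layer hides those differences behind one call.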

### Topological Sort (Dependency Ordering)

`get_tables` returns table names in dependency order: parent tables appear before child tables, where a child is a table with a foreign key referencing a parent.

This ordering is essential for:

- **Data Loading**: Insert into parent tables first, so every foreign key has a row to reference.
- **Cleanup**: Delete in reverse order, so no row is removed while another table still references it.

```python
# `config` is an async SQLSpec adapter configuration created elsewhere.
async def print_table_order(config) -> None:
    async with config.provide_session() as session:
        # Get tables sorted parent -> child
        sorted_tables = await session.data_dictionary.get_tables(session)

        print("Insertion Order:", sorted_tables)
        print("Deletion Order:", list(reversed(sorted_tables)))
```
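
The following self-contained sketch shows why the reverse order matters. It uses the standard library `sqlite3` module with foreign key enforcement switched on. The three-table schema is made up for illustration, and the insertion order is hard-coded here rather than obtained from `get_tables`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is enabled
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL REFERENCES users (id));
    CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER NOT NULL REFERENCES orders (id));
    """
)

# Hypothetical parent -> child order, as a dependency-sorted table list would provide.
insertion_order = ["users", "orders", "order_items"]
conn.execute("INSERT INTO users (id) VALUES (1)")
conn.execute("INSERT INTO orders (id, user_id) VALUES (1, 1)")
conn.execute("INSERT INTO order_items (id, order_id) VALUES (1, 1)")

# Deleting a parent row that child rows still reference violates the foreign key.
parent_first_failed = False
try:
    conn.execute("DELETE FROM users WHERE id = 1")
except sqlite3.IntegrityError:
    parent_first_failed = True

# Deleting in reversed order (children first) succeeds; table names are trusted literals here.
for table in reversed(insertion_order):
    conn.execute(f"DELETE FROM {table}")
conn.commit()
print("Parent-first delete failed:", parent_first_failed)
```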

**Implementation Details**:

- **Postgres / SQLite / DuckDB / MySQL 8+**: Sort in SQL with recursive CTEs (`WITH RECURSIVE`).
- **Oracle**: Uses hierarchical `CONNECT BY` queries.
- **Others (BigQuery, MySQL 5.7)**: Fall back to a Python-based topological sort using the standard library's `graphlib` module.
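
The Python fallback can be pictured with a minimal sketch built on the standard library's `graphlib.TopologicalSorter`. This is not SQLSpec's code. The `sort_tables` helper is hypothetical, and so is its input format: a list of `(child_table, parent_table)` pairs, one per foreign key.

```python
from graphlib import TopologicalSorter


def sort_tables(tables: list[str], foreign_keys: list[tuple[str, str]]) -> list[str]:
    """Hypothetical helper: order ``tables`` parent -> child.

    ``foreign_keys`` holds (child_table, parent_table) pairs, one per foreign key.
    """
    # Map each table to the set of tables it depends on (its parents).
    graph: dict[str, set[str]] = {table: set() for table in tables}
    for child, parent in foreign_keys:
        if child != parent:  # skip self-references, which would otherwise count as a cycle
            graph.setdefault(child, set()).add(parent)
    # static_order() emits every table after all of its parents;
    # it raises graphlib.CycleError if tables reference each other in a loop.
    return list(TopologicalSorter(graph).static_order())


order = sort_tables(
    ["order_items", "orders", "products", "users"],
    [("orders", "users"), ("order_items", "orders"), ("order_items", "products")],
)
print("Insertion Order:", order)
print("Deletion Order:", list(reversed(order)))
```

Self-referencing foreign keys (for example, an `employees.manager_id` column) are skipped in this sketch because they do not constrain the order between tables. Genuine cycles between distinct tables raise `graphlib.CycleError`.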

### Metadata Types

SQLSpec represents metadata results as regular classes that define `__slots__`, which keeps them compatible with mypyc and memory-efficient.

```python
from sqlspec.driver import ForeignKeyMetadata


# `config` is an async SQLSpec adapter configuration created elsewhere.
async def print_order_foreign_keys(config) -> None:
    async with config.provide_session() as session:
        fks: list[ForeignKeyMetadata] = await session.data_dictionary.get_foreign_keys(session, "orders")

        for fk in fks:
            print(f"FK: {fk.column_name} -> {fk.referenced_table}.{fk.referenced_column}")
```
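
The slotted-class pattern looks roughly like the minimal sketch below. `ForeignKeySketch` is a hypothetical stand-in, not SQLSpec's `ForeignKeyMetadata`. It reuses the `column_name`, `referenced_table`, and `referenced_column` attributes from the example above, and its `table_name` attribute is an assumption. Declaring `__slots__` gives each instance fixed attribute storage instead of a per-instance `__dict__`.

```python
class ForeignKeySketch:
    """Hypothetical slotted metadata class (not SQLSpec's definition)."""

    # Fixed attribute set; `table_name` is an assumed field.
    __slots__ = ("table_name", "column_name", "referenced_table", "referenced_column")

    def __init__(self, table_name: str, column_name: str, referenced_table: str, referenced_column: str) -> None:
        self.table_name = table_name
        self.column_name = column_name
        self.referenced_table = referenced_table
        self.referenced_column = referenced_column


fk = ForeignKeySketch("orders", "user_id", "users", "id")
print(f"FK: {fk.column_name} -> {fk.referenced_table}.{fk.referenced_column}")

# Without a per-instance __dict__, attributes outside __slots__ are rejected.
rejected = False
try:
    fk.extra = 1  # type: ignore[attr-defined]
except AttributeError:
    rejected = True
```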

## Adapter Support Matrix

| Feature | Postgres | SQLite | Oracle | MySQL | DuckDB | BigQuery |
|---------|----------|--------|--------|-------|--------|----------|
| Tables | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Columns | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Indexes | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Foreign Keys | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Topological Sort | ✅ (CTE) | ✅ (CTE) | ✅ (`CONNECT BY`) | ✅ (CTE on 8+, Python on 5.7) | ✅ (CTE) | ✅ (Python) |

## API Reference

For the complete API reference of the Data Dictionary components, including `DataDictionaryMixin`, `AsyncDataDictionaryBase`, `SyncDataDictionaryBase`, and the metadata classes (`ForeignKeyMetadata`, `ColumnMetadata`, `IndexMetadata`), see {doc}`/reference/driver`.