Commit c7b9bdb

feat: topological sorting and foreign key retrieval enhancements (#260)
Introduce methods for topological sorting of tables and retrieval of foreign key metadata across multiple database adapters. Update metadata class patterns and improve test isolation in examples. Add integration tests to validate new functionalities.
1 parent 93c6627 commit c7b9bdb

23 files changed (+1916 -79 lines changed)

AGENTS.md

Lines changed: 52 additions & 0 deletions
@@ -451,6 +451,58 @@ if supports_where(obj):
     result = obj.where("condition")
 ```
 
+### Mypyc-Compatible Metadata Class Pattern
+
+When defining data-holding classes intended for core modules (`sqlspec/core/`, `sqlspec/driver/`) that will be compiled with MyPyC, use regular classes with `__slots__` and explicitly implement `__init__`, `__repr__`, `__eq__`, and `__hash__`. This approach ensures optimal performance and MyPyC compatibility, as `dataclasses` are not directly supported by MyPyC for compilation.
+
+**Key Principles:**
+
+- **`__slots__`**: Reduces memory footprint and speeds up attribute access.
+- **Explicit `__init__`**: Defines the constructor for the class.
+- **Explicit `__repr__`**: Provides a clear string representation for debugging.
+- **Explicit `__eq__`**: Enables correct equality comparisons.
+- **Explicit `__hash__`**: Makes instances hashable, allowing them to be used in sets or as dictionary keys. The hash implementation should be based on all fields that define the object's identity.
+
+**Example Implementation:**
+
+```python
+class MyMetadata:
+    __slots__ = ("field1", "field2", "optional_field")
+
+    def __init__(self, field1: str, field2: int, optional_field: str | None = None) -> None:
+        self.field1 = field1
+        self.field2 = field2
+        self.optional_field = optional_field
+
+    def __repr__(self) -> str:
+        return f"MyMetadata(field1={self.field1!r}, field2={self.field2!r}, optional_field={self.optional_field!r})"
+
+    def __eq__(self, other: object) -> bool:
+        if not isinstance(other, MyMetadata):
+            return NotImplemented
+        return (
+            self.field1 == other.field1
+            and self.field2 == other.field2
+            and self.optional_field == other.optional_field
+        )
+
+    def __hash__(self) -> int:
+        return hash((self.field1, self.field2, self.optional_field))
+```
+
+**When to Use:**
+
+- For all new data-holding classes in performance-critical paths (e.g., `sqlspec/driver/_common.py`).
+- When MyPyC compilation is enabled for the module containing the class.
+
+**Anti-Patterns to Avoid:**
+
+- Using `@dataclass` decorators for classes intended for MyPyC compilation.
+- Omitting `__slots__` when defining performance-critical data structures.
+- Relying on default `__eq__` or `__hash__` behavior for complex objects, especially for equality comparisons in collections.
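To illustrate the last anti-pattern, here is a minimal sketch contrasting a slotted class that keeps Python's default identity-based `__eq__`/`__hash__` with one that defines them explicitly. `DefaultKey` and `ValueKey` are hypothetical names used only for this illustration and are not taken from the SQLSpec source.

```python
class DefaultKey:
    """Slotted class that keeps the default identity-based equality and hash."""

    __slots__ = ("schema", "table")

    def __init__(self, schema: str, table: str) -> None:
        self.schema = schema
        self.table = table


class ValueKey:
    """Slotted class with explicit value-based equality and hash."""

    __slots__ = ("schema", "table")

    def __init__(self, schema: str, table: str) -> None:
        self.schema = schema
        self.table = table

    def __repr__(self) -> str:
        return f"ValueKey(schema={self.schema!r}, table={self.table!r})"

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ValueKey):
            return NotImplemented
        return self.schema == other.schema and self.table == other.table

    def __hash__(self) -> int:
        return hash((self.schema, self.table))


# Two instances with identical field values compare differently in each class.
default_equal = DefaultKey("public", "users") == DefaultKey("public", "users")
value_equal = ValueKey("public", "users") == ValueKey("public", "users")

# Sets deduplicate only when equality and hash are value-based.
default_unique = len({DefaultKey("public", "users"), DefaultKey("public", "users")})
value_unique = len({ValueKey("public", "users"), ValueKey("public", "users")})
```

With the default behavior, two objects holding the same values are treated as distinct, so they do not deduplicate in a set or match as dictionary keys; the explicit methods make them interchangeable.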
+
+---
+
 ### Performance Patterns (MANDATORY)
 
 **PERF401 - List Operations**:

docs/examples/usage/usage_drivers_and_querying_10.py

Lines changed: 50 additions & 23 deletions
@@ -12,30 +12,57 @@
 
 def test_example_10_duckdb_config(tmp_path: Path) -> None:
     # start-example
+    import tempfile
+
     from sqlspec import SQLSpec
     from sqlspec.adapters.duckdb import DuckDBConfig
 
-    spec = SQLSpec()
-    # In-memory
-    config = DuckDBConfig()
-
-    # Persistent
-    database_file = tmp_path / "analytics.duckdb"
-    config = DuckDBConfig(pool_config={"database": database_file.name, "read_only": False})
-
-    with spec.provide_session(config) as session:
-        # Create table from Parquet
-        session.execute(f"""
-            CREATE TABLE if not exists users AS
-            SELECT * FROM read_parquet('{Path(__file__).parent.parent / "queries/users.parquet"}')
-        """)
-
-        # Analytical query
-        session.execute("""
-            SELECT date_trunc('day', created_at) as day,
-                   count(*) as user_count
-            FROM users
-            GROUP BY day
-            ORDER BY day
-        """)
+    # Use a temporary directory for the DuckDB database for test isolation
+    with tempfile.TemporaryDirectory() as tmpdir:
+        db_path = Path(tmpdir) / "analytics.duckdb"
+
+        spec = SQLSpec()
+        # In-memory
+        in_memory_db = spec.add_config(DuckDBConfig())
+        persistent_db = spec.add_config(DuckDBConfig(pool_config={"database": str(db_path)}))
+
+        try:
+            # Test with in-memory config
+            with spec.provide_session(in_memory_db) as session:
+                # Create table from Parquet
+                session.execute(f"""
+                    CREATE TABLE if not exists users AS
+                    SELECT * FROM read_parquet('{Path(__file__).parent.parent / "queries/users.parquet"}')
+                """)
+
+                # Analytical query
+                session.execute("""
+                    SELECT date_trunc('day', created_at) as day,
+                           count(*) as user_count
+                    FROM users
+                    GROUP BY day
+                    ORDER BY day
+                """)
+
+            # Test with persistent config
+            with spec.provide_session(persistent_db) as session:
+                # Create table from Parquet
+                session.execute(f"""
+                    CREATE TABLE if not exists users AS
+                    SELECT * FROM read_parquet('{Path(__file__).parent.parent / "queries/users.parquet"}')
+                """)
+
+                # Analytical query
+                session.execute("""
+                    SELECT date_trunc('day', created_at) as day,
+                           count(*) as user_count
+                    FROM users
+                    GROUP BY day
+                    ORDER BY day
+                """)
+        finally:
+            # Close the pools for both the in-memory and persistent configs
+            spec.get_config(in_memory_db).close_pool()
+            spec.get_config(persistent_db).close_pool()
+            # The TemporaryDirectory context manager handles directory cleanup automatically
     # end-example
Lines changed: 32 additions & 21 deletions
@@ -1,6 +1,7 @@
 # Test module converted from docs example - code-block 6
 """Minimal smoke test for drivers_and_querying example 6."""
 
+import tempfile
 from pathlib import Path
 
 from sqlspec import SQLSpec
@@ -12,24 +13,34 @@ def test_example_6_sqlite_config(tmp_path: Path) -> None:
     # start-example
     from sqlspec.adapters.sqlite import SqliteConfig
 
-    spec = SQLSpec()
-
-    database_file = tmp_path / "myapp.db"
-    config = SqliteConfig(pool_config={"database": database_file.name, "timeout": 5.0, "check_same_thread": False})
-
-    with spec.provide_session(config) as session:
-        # Create table
-        session.execute("""
-            CREATE TABLE IF NOT EXISTS usage6_users (
-                id INTEGER PRIMARY KEY,
-                name TEXT NOT NULL
-            )
-        """)
-
-        # Insert with parameters
-        session.execute("INSERT INTO usage6_users (name) VALUES (?)", "Alice")
-
-        # Query
-        result = session.execute("SELECT * FROM usage6_users")
-        result.all()
-    # end-example
+    # Use a temporary file for the SQLite database for test isolation
+    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp_db_file:
+        db_path = tmp_db_file.name
+
+    spec = SQLSpec()
+
+    db = spec.add_config(
+        SqliteConfig(pool_config={"database": db_path, "timeout": 5.0, "check_same_thread": False})
+    )
+
+    try:
+        with spec.provide_session(db) as session:
+            # Create table
+            session.execute("""
+                CREATE TABLE IF NOT EXISTS usage6_users (
+                    id INTEGER PRIMARY KEY,
+                    name TEXT NOT NULL
+                )
+            """)
+
+            # Insert with parameters
+            session.execute("INSERT INTO usage6_users (name) VALUES (?)", "Alice")
+
+            # Query
+            result = session.execute("SELECT * FROM usage6_users")
+            result.all()
+    finally:
+        # Clean up the temporary database file
+        spec.get_config(db).close_pool()
+        Path(db_path).unlink()
+    # end-example

docs/extensions/aiosql/api.rst

Lines changed: 2 additions & 0 deletions
@@ -24,6 +24,7 @@ AiosqlAsyncAdapter
    :members:
    :undoc-members:
    :show-inheritance:
+   :no-index:
 
 AiosqlSyncAdapter
 -----------------
@@ -32,6 +33,7 @@ AiosqlSyncAdapter
    :members:
    :undoc-members:
    :show-inheritance:
+   :no-index:
 
 Query Operators
 ===============

docs/extensions/litestar/api.rst

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ SQLSpecPlugin
    :members:
    :undoc-members:
    :show-inheritance:
+   :no-index:
 
 Configuration
 =============
Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
+# Data Dictionary & Introspection
+
+SQLSpec provides a unified Data Dictionary API to introspect database schemas across all supported adapters. This allows you to retrieve table metadata, columns, indexes, and foreign keys in a consistent format, regardless of the underlying database engine.
+
+## Core Concepts
+
+The `DataDictionary` is accessed via the `driver.data_dictionary` property. It provides methods to query the database catalog.
+
+### Introspection Capabilities
+
+- **Tables**: List tables in a schema.
+- **Columns**: Get column details (name, type, nullable, default).
+- **Indexes**: Get index definitions (columns, uniqueness).
+- **Foreign Keys**: Get foreign key constraints and relationships.
+- **Topological Sorting**: Get tables sorted by dependency order (useful for cleanups or migrations).
+
+## Usage
+
+### Basic Introspection
+
+```python
+async with config.provide_session() as session:
+    # Get all tables in the default schema
+    tables = await session.data_dictionary.get_tables(session)
+    print(f"Tables: {tables}")
+
+    # Get columns for a specific table
+    columns = await session.data_dictionary.get_columns(session, "users")
+    for col in columns:
+        print(f"{col['column_name']}: {col['data_type']}")
+```
+
+### Topological Sort (Dependency Ordering)
+
+`get_tables` now returns table names sorted such that parent tables appear before child tables (tables with foreign keys to parents).
+
+This is essential for:
+
+- **Data Loading**: Insert into parents first.
+- **Cleanup**: Delete in reverse order to avoid foreign key violations.
+
+```python
+async with config.provide_session() as session:
+    # Get tables sorted parent -> child
+    sorted_tables = await session.data_dictionary.get_tables(session)
+
+    print("Insertion Order:", sorted_tables)
+    print("Deletion Order:", list(reversed(sorted_tables)))
+```
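To make the cleanup case concrete, the following is a minimal sketch using only the standard-library `sqlite3` module rather than SQLSpec itself. The `users`/`orders` schema and the hard-coded `sorted_tables` list are assumptions standing in for whatever `get_tables` would return on a real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL REFERENCES users(id))"
)
conn.execute("INSERT INTO users (id) VALUES (1)")
conn.execute("INSERT INTO orders (id, user_id) VALUES (10, 1)")
conn.commit()

# Hypothetical stand-in for a parent -> child list from get_tables.
sorted_tables = ["users", "orders"]

# Deleting the parent first is rejected while a child row still references it.
try:
    conn.execute("DELETE FROM users")
    parent_first_failed = False
except sqlite3.IntegrityError:
    parent_first_failed = True

# Deleting in reverse (child -> parent) order succeeds.
for table in reversed(sorted_tables):
    conn.execute(f'DELETE FROM "{table}"')
conn.commit()

remaining = {
    table: conn.execute(f'SELECT count(*) FROM "{table}"').fetchone()[0]
    for table in sorted_tables
}
```

The same reversal applies to any engine that enforces foreign keys immediately; deferred constraints or `ON DELETE CASCADE` would change the behavior.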
+
+**Implementation Details**:
+
+- **Postgres / SQLite / MySQL 8+**: Uses efficient Recursive CTEs in SQL.
+- **Oracle**: Uses `CONNECT BY` queries.
+- **Others (BigQuery, MySQL 5.7)**: Falls back to a Python-based topological sort using `graphlib`.
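The Python fallback can be pictured with the standard-library `graphlib.TopologicalSorter`. This is only a sketch of the general technique: the commit text says `graphlib` is used but does not show how, so the `sort_tables` helper and the `(child_table, parent_table)` tuple format are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def sort_tables(tables: list[str], foreign_keys: list[tuple[str, str]]) -> list[str]:
    """Return tables ordered so that referenced (parent) tables come first.

    ``foreign_keys`` holds hypothetical (child_table, parent_table) pairs.
    """
    # TopologicalSorter expects {node: predecessors}; a table's predecessors
    # are the tables it references through foreign keys.
    graph: dict[str, set[str]] = {table: set() for table in sorted(tables)}
    for child, parent in foreign_keys:
        if child != parent:  # a self-referencing key would otherwise raise CycleError
            graph.setdefault(child, set()).add(parent)
    return list(TopologicalSorter(graph).static_order())


ordered = sort_tables(
    ["order_items", "orders", "users", "products"],
    [("orders", "users"), ("order_items", "orders"), ("order_items", "products")],
)
```

`static_order()` raises `graphlib.CycleError` when tables reference each other in a genuine cycle, so a real implementation would need to decide how to handle that case.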
+
+### Metadata Types
+
+SQLSpec uses regular classes with `__slots__` for metadata results to ensure mypyc compatibility and memory efficiency.
+
+```python
+from sqlspec.driver import ForeignKeyMetadata
+
+async with config.provide_session() as session:
+    fks: list[ForeignKeyMetadata] = await session.data_dictionary.get_foreign_keys(session, "orders")
+
+    for fk in fks:
+        print(f"FK: {fk.column_name} -> {fk.referenced_table}.{fk.referenced_column}")
+```
+
+## Adapter Support Matrix
+
+| Feature | Postgres | SQLite | Oracle | MySQL | DuckDB | BigQuery |
+|---------|----------|--------|--------|-------|--------|----------|
+| Tables |||||||
+| Columns |||||||
+| Indexes |||||||
+| Foreign Keys |||||||
+| Topological Sort | ✅ (CTE) | ✅ (CTE) | ✅ (Connect By) | ✅ (CTE/Python) | ✅ (CTE) | ✅ (Python) |
+
+## API Reference
+
+For a complete API reference of the Data Dictionary components, including `DataDictionaryMixin`, `AsyncDataDictionaryBase`, `SyncDataDictionaryBase`, and the metadata classes (`ForeignKeyMetadata`, `ColumnMetadata`, `IndexMetadata`), please refer to the :doc:`/reference/driver`.

docs/reference/driver.rst

Lines changed: 37 additions & 0 deletions
@@ -103,6 +103,43 @@ Connection Pooling
    :undoc-members:
    :show-inheritance:
 
+Data Dictionary
+===============
+
+The Data Dictionary API provides standardized introspection capabilities across all supported databases.
+
+.. currentmodule:: sqlspec.driver
+
+.. autoclass:: DataDictionaryMixin
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+.. autoclass:: AsyncDataDictionaryBase
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+.. autoclass:: SyncDataDictionaryBase
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+.. autoclass:: ForeignKeyMetadata
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+.. autoclass:: ColumnMetadata
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+.. autoclass:: IndexMetadata
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
 Driver Protocols
 ================
 

docs/usage/drivers_and_querying.rst

Lines changed: 1 addition & 1 deletion
@@ -471,7 +471,7 @@ Performance Tips
    :start-after: # start-example
    :end-before: # end-example
    :caption: ``asyncpg connection pooling``
-   :dedent: 4
+   :dedent: 2
 
 **2. Batch Operations**
 
