DataBridge is a Python utility that bridges SQL database schemas with modern data analysis and optimization workflows. It provides a fluent API for schema discovery, relationship mapping, query optimization, and multi-format schema export.
- Quick Start Guide - Get up and running in minutes
- Installation Guide - Detailed installation and setup
- Configuration Guide - System configuration
- Database Setup - Set up SQL Server with Docker (see the sketch after this list)
- User Guide - Complete user manual with examples
- API Reference - Comprehensive API documentation
- Filtering Guide - Advanced filtering capabilities
- Architecture Guide - Technical architecture and design
- Examples Gallery - Working code examples and demonstrations
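The guides above assume a reachable SQL Server instance. As an illustrative sketch (not a substitute for the Database Setup guide), a local instance matching the `sa` credentials used throughout this README can be started with Docker:

```bash
# Start SQL Server 2022 locally; the password matches config.yaml in this README
docker run -d --name databridge-sql \
  -e "ACCEPT_EULA=Y" \
  -e "MSSQL_SA_PASSWORD=DataBridge2025!" \
  -p 1433:1433 \
  mcr.microsoft.com/mssql/server:2022-latest
```

Note that the `pocdb` database referenced in `config.yaml` still has to be created inside that instance.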
- Fluent Schema Discovery: Intuitive API for extracting and analyzing database schemas
- Advanced Filtering: Pattern-based table filtering with regex support
- Relationship Mapping: Multi-source relationship analysis (database FKs, CSV, XML)
- Query Optimization: Intelligent SQL generation with index analysis
- Multi-Format Export: YAML, XML, JSON schema serialization
- Index Analysis: Performance optimization recommendations
- DAG Visualization: Network graph generation for relationship analysis
- Modular Architecture: Layered design for extensibility and testing
- Connection Management: Robust database connection handling (pyodbc)
- XML Processing: Advanced XML parsing for metadata and relationships
- Network Analysis: Graph-based relationship analysis using NetworkX
- Comprehensive Logging: Detailed operation tracking and debugging
- Configuration Management: YAML-based configuration system
```
databridge/
├── img/                          # Project assets and logos
├── src/
│   ├── database/                 # Core database operations
│   │   ├── datamodel_service.py     # Main DataBridge orchestration
│   │   ├── schema_extractor.py      # Schema discovery engine
│   │   ├── relationship_manager.py  # Relationship mapping
│   │   ├── query_builder.py         # Optimized query generation
│   │   └── schema_serializer.py     # Multi-format export
│   ├── xml/                      # XML processing and parsing
│   ├── utils/                    # Utilities (logging, config)
│   ├── tests/                    # Comprehensive unit tests
│   └── main.py                   # Demonstration entry point
├── data/                         # Example data and configurations
├── requirements.txt              # Python dependencies
└── config.yaml                   # Configuration file
```
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/databridge.git
  cd databridge
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure the database connection in `config.yaml`:

  ```yaml
  source_database:
    connection_string: "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=pocdb;UID=sa;PWD=DataBridge2025!"
  ```

- Run the demonstration:

  ```bash
  python src/main.py
  ```
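Before running the demo, you can sanity-check the connection string with plain pyodbc (a minimal sketch using the same credentials as `config.yaml`):

```python
import pyodbc

# Same connection string as config.yaml; adjust server/credentials as needed
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=pocdb;UID=sa;PWD=DataBridge2025!"
)
cursor = conn.cursor()
cursor.execute("SELECT @@VERSION")
print(cursor.fetchone()[0])  # Prints the SQL Server version string
conn.close()
```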
- Python 3.8+
- pyodbc: Database connectivity
- networkx: Relationship analysis and graph visualization
- matplotlib: Graph plotting and visualization
```python
import pyodbc

from src.database.datamodel_service import DataBridge

# Initialize the connection (string mirrors config.yaml)
connection_string = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=pocdb;UID=sa;PWD=DataBridge2025!"
)
db_conn = pyodbc.connect(connection_string)
bridge = DataBridge(db_conn)

# Discover the schema with the fluent API
schema = (bridge.discover_schema()
          .from_database()
          .only_tables(['users', 'orders', 'products'])
          .with_relationships_from_csv('data/relationships.csv')
          .build())

# Generate optimized queries
query = (bridge.generate_query()
         .select_all()
         .where({'users': ['user_id']})
         .with_joins()
         .build())

# Export to multiple formats
bridge.export_schema().to_yaml('output/schema.yaml')
bridge.export_schema().to_json('output/schema.json')

print(f"Discovered {len(schema.tables)} tables")
print(f"Generated query: {query['query']}")
```
For detailed installation instructions, see the Installation Guide.
```python
# Complex schema discovery with filtering
schema = (bridge.discover_schema()
          .from_database()
          .excluding_pattern([r'^sys.*', r'^temp.*'])
          .with_relationships_from_database()
          .with_relationships_from_csv('data/custom_relationships.csv')
          .build())

# Generate a relationship DAG and render it as a network graph
dag_info = bridge.relationship_manager.create_dag(schema.tables)
bridge.relationship_manager.visualize_dag(dag_info['dag'], 'output/schema_graph.png')

# Generate optimized queries with index analysis
result = (bridge.generate_query()
          .select_all()
          .where({'users': ['user_id', 'email']})
          .with_joins()
          .optimize_with_indexes()
          .build())

print("Generated SQL:", result['query'])
print("Index Recommendations:", result['index_recommendations'])

# Export the schema to multiple formats
bridge.export_schema().to_yaml('output/schema.yaml')
bridge.export_schema().to_xml('output/schema.xml')
bridge.export_schema().to_json('output/schema.json')
```
DataBridge supports multiple relationship data sources:
```python
# CSV relationships
bridge.discover_schema().with_relationships_from_csv('data/relationships.csv')

# XML relationships
bridge.discover_schema().with_relationships_from_xml('data/relationships.xml')

# Database foreign keys
bridge.discover_schema().with_relationships_from_database()

# Combine multiple sources
schema = (bridge.discover_schema()
          .from_database()
          .with_relationships_from_database()
          .with_relationships_from_csv('data/custom_relationships.csv')
          .build())
```
```python
# Analyze existing indexes
index_analysis = bridge.query_builder.analyze_indexes(['users', 'orders'])

# Generate optimization recommendations
recommendations = bridge.query_builder.suggest_index_optimizations(
    tables=['users', 'orders'],
    query_patterns=['user_id', 'order_date', 'status']
)

print("Index Recommendations:")
for rec in recommendations:
    print(f"  {rec['table']}: {rec['recommendation']}")
```
```python
# Create and visualize the relationship DAG
dag_info = bridge.relationship_manager.create_dag(schema.tables)

# Generate a network graph
bridge.relationship_manager.visualize_dag(
    dag_info['dag'],
    'output/relationships.png',
    layout='spring',
    node_size=1000,
    font_size=8
)

print(f"DAG has {dag_info['node_count']} nodes and {dag_info['edge_count']} edges")
```
DataBridge supports multiple export formats for maximum compatibility:
YAML export example:

```yaml
tables:
  users:
    columns:
      user_id:
        type: int
        nullable: false
        primary_key: true
      email:
        type: varchar
        max_length: 255
        nullable: false
    indexes:
      - name: idx_users_email
        type: NONCLUSTERED
        columns: [email]
    relationships:
      - target_table: orders
        relationship_type: one-to-many
        foreign_key: user_id
```
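Exported YAML can be consumed by any standard YAML parser. A minimal read-back sketch using PyYAML, assuming the structure shown above:

```python
import yaml

with open('output/schema.yaml') as f:
    schema_doc = yaml.safe_load(f)

# List each table with its column names
for table_name, table in schema_doc['tables'].items():
    print(f"{table_name}: {', '.join(table['columns'])}")
```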
XML export example:

```xml
<schema>
  <table name="users">
    <columns>
      <column name="user_id" type="int" nullable="false" primary_key="true"/>
      <column name="email" type="varchar" max_length="255" nullable="false"/>
    </columns>
    <relationships>
      <relationship target="orders" type="one-to-many" key="user_id"/>
    </relationships>
  </table>
</schema>
```
Custom relationships can be supplied as CSV (`data/relationships.csv`):

```csv
table,parent,relation,parent_column,child_column
orders,users,many-to-one,user_id,user_id
order_items,orders,many-to-one,order_id,order_id
order_items,products,many-to-one,product_id,product_id
```
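For reference, this format reads cleanly with the standard library; a sketch of how such a file parses (illustrative only, not DataBridge's internal loader):

```python
import csv

with open('data/relationships.csv', newline='') as f:
    for row in csv.DictReader(f):
        # Each row maps a child table/column to a parent table/column
        print(f"{row['table']}.{row['child_column']} -> "
              f"{row['parent']}.{row['parent_column']} ({row['relation']})")
```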
The same relationships can alternatively be expressed as XML (`data/relationships.xml`):

```xml
<relationships>
  <relationship>
    <child_table>orders</child_table>
    <parent_table>users</parent_table>
    <relationship_type>many-to-one</relationship_type>
    <parent_column>user_id</parent_column>
    <child_column>user_id</child_column>
  </relationship>
</relationships>
```
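The equivalent parse with the standard library's `xml.etree.ElementTree` (again illustrative, not the project's internal XML layer):

```python
import xml.etree.ElementTree as ET

root = ET.parse('data/relationships.xml').getroot()
for rel in root.findall('relationship'):
    child = rel.findtext('child_table')
    parent = rel.findtext('parent_table')
    kind = rel.findtext('relationship_type')
    print(f"{child} -> {parent} ({kind})")
```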
Run the comprehensive test suite:
```bash
# Run all tests
python -m pytest src/tests/

# Run specific test modules
python -m pytest src/tests/test_schema_extractor.py
python -m pytest src/tests/test_relationship_manager.py

# Run with coverage
python -m pytest src/tests/ --cov=src/
```
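New tests follow the same pattern. A minimal, hypothetical example of a test that avoids a live database by mocking the connection (the constructor signature is taken from the Quick Start above):

```python
# src/tests/test_smoke.py (hypothetical example)
from unittest.mock import MagicMock

from src.database.datamodel_service import DataBridge

def test_databridge_wraps_connection():
    mock_conn = MagicMock()         # stands in for a pyodbc connection
    bridge = DataBridge(mock_conn)  # DataBridge(db_conn), per Quick Start
    assert bridge is not None
```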
Customize DataBridge behavior via `config.yaml`:
```yaml
# Database connections
source_database:
  connection_string: "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=pocdb;UID=sa;PWD=DataBridge2025!"

# Logging configuration
logging:
  level: INFO
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

# Schema discovery settings
schema_discovery:
  include_system_tables: false
  max_table_count: 1000
  relationship_sources:
    - database_foreign_keys
    - csv_files

# Export settings
export:
  default_format: yaml
  include_indexes: true
  include_relationships: true
```
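Application code can load this file with PyYAML and hand the connection string to pyodbc; a minimal sketch (the key names match the file above):

```python
import pyodbc
import yaml

with open('config.yaml') as f:
    config = yaml.safe_load(f)

# Connect using the configured source database
conn = pyodbc.connect(config['source_database']['connection_string'])
```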
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- pyodbc: Database connectivity foundation
- NetworkX: Graph analysis capabilities
- matplotlib: Visualization support
- SQL Server community for database best practices
DataBridge - Bridging the gap between database schemas and modern data analysis.