A modern 3-layer data warehouse implementation using SQL Server, following the Bronze-Silver-Gold (medallion) architecture pattern. This project demonstrates ETL processes, data quality checks, and dimensional modeling for analytics and reporting.
This data warehouse follows the medallion architecture with three distinct layers:
- 🥉 **Bronze Layer**: Raw data ingestion from source systems
- 🥈 **Silver Layer**: Cleaned and transformed data with business logic applied
- 🥇 **Gold Layer**: Aggregated, business-ready data optimized for analytics
```
sql-warehouse/
├── datasets/
│   ├── source_crm/              # CRM system data files
│   │   ├── cust_info.csv
│   │   ├── prd_info.csv
│   │   └── sales_details.csv
│   └── source_erp/              # ERP system data files
│       ├── CUST_AZ12.csv
│       ├── LOC_A101.csv
│       └── PX_CAT_G1V2.csv
├── scripts/
│   ├── init_database.sql        # Database and schema initialization
│   ├── bronze/                  # Bronze layer scripts
│   │   ├── ddl_bronze.sql
│   │   └── proc_load_bronze.sql
│   ├── silver/                  # Silver layer scripts
│   │   ├── ddl_silver.sql
│   │   └── proc_load_silver.sql
│   └── gold/                    # Gold layer scripts
│       └── ddl_gold.sql
├── tests/                       # Data quality validation scripts
│   ├── quality_checks_silver.sql
│   └── quality_checks_gold.sql
├── docs/
│   └── dataintegration.jpg      # Architecture diagram
└── README.md
```
- Customer Information: Demographics and account details
- Product Information: Product catalog with pricing and categories
- Sales Details: Transaction records with order information
- Customer Demographics: Extended customer data (birthdate, gender)
- Location Data: Customer geographic information
- Product Categories: Product classification and maintenance data
- Direct ingestion from CSV files using `BULK INSERT`
- Minimal transformation, preserving source data structure
- Tables: `bronze.crm_*` and `bronze.erp_*`
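A minimal sketch of the ingestion pattern. The actual statements live in `scripts/bronze/proc_load_bronze.sql`; the table name, file path, and CSV options below are illustrative assumptions, not copied from the repo:

```sql
-- Illustrative sketch only; see scripts/bronze/proc_load_bronze.sql for the real logic.
-- Full refresh: clear the bronze table, then bulk-load the raw CSV as-is.
TRUNCATE TABLE bronze.crm_cust_info;

BULK INSERT bronze.crm_cust_info
FROM 'C:\sql-warehouse\datasets\source_crm\cust_info.csv'  -- adjust to your environment
WITH (
    FIRSTROW = 2,            -- skip the CSV header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    TABLOCK                  -- take a table lock for faster minimal-logging loads
);
```

Because the Bronze layer preserves source structure, no column mapping or cleansing happens at this stage.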
- Data standardization and normalization
- Business logic application
- Data quality improvements:
  - Duplicate removal
  - Null value handling
  - Date format standardization
  - Value mapping (codes to descriptions)
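The quality improvements above typically combine in a single transformation step. This sketch shows the general shape (the column names are assumptions for illustration; the real logic is in `scripts/silver/proc_load_silver.sql`):

```sql
-- Illustrative sketch: deduplicate, handle nulls, and map codes to descriptions
-- while loading from bronze into silver. Column names are hypothetical.
INSERT INTO silver.crm_cust_info (cst_id, cst_firstname, cst_marital_status, cst_gndr)
SELECT
    cst_id,
    TRIM(cst_firstname),
    CASE UPPER(TRIM(cst_marital_status))     -- value mapping: codes to descriptions
        WHEN 'M' THEN 'Married'
        WHEN 'S' THEN 'Single'
        ELSE 'n/a'                           -- null/unknown value handling
    END,
    CASE UPPER(TRIM(cst_gndr))
        WHEN 'F' THEN 'Female'
        WHEN 'M' THEN 'Male'
        ELSE 'n/a'
    END
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY cst_id
                              ORDER BY cst_create_date DESC) AS rn
    FROM bronze.crm_cust_info
    WHERE cst_id IS NOT NULL                 -- drop rows without a key
) AS ranked
WHERE rn = 1;                                -- duplicate removal: keep latest record
```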
- Star schema implementation with:
  - Dimension Tables: `dim_customers`, `dim_products`
  - Fact Tables: `fact_sales`
- Optimized for reporting and analytics
- Views combining data from multiple silver tables
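A gold object is a view over joined silver tables, as sketched below. The silver table and column names here are assumptions inferred from the source file names (`CUST_AZ12.csv`, `LOC_A101.csv`); the actual definitions are in `scripts/gold/ddl_gold.sql`:

```sql
-- Illustrative sketch: a gold dimension built as a view over silver tables.
CREATE VIEW gold.dim_customers AS
SELECT
    ROW_NUMBER() OVER (ORDER BY ci.cst_id) AS customer_key,  -- surrogate key
    ci.cst_id        AS customer_id,
    ci.cst_firstname AS first_name,
    la.cntry         AS country,      -- geographic info from the ERP location feed
    ca.bdate         AS birthdate     -- demographics from the ERP customer feed
FROM silver.crm_cust_info ci
LEFT JOIN silver.erp_cust_az12 ca ON ci.cst_id = ca.cid
LEFT JOIN silver.erp_loc_a101  la ON ci.cst_id = la.cid;
```

Because gold objects are views, they reflect the latest silver data without a separate load step.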
- SQL Server (2016 or later)
- SQL Server Management Studio (SSMS) or Azure Data Studio
- Appropriate file system permissions for BULK INSERT operations
1. **Clone the repository**

   ```bash
   git clone <repository-url>
   cd sql-warehouse
   ```

2. **Initialize the database**

   ```sql
   -- Run in SQL Server Management Studio
   :r scripts/init_database.sql
   ```

3. **Create Bronze layer tables**

   ```sql
   :r scripts/bronze/ddl_bronze.sql
   ```

4. **Create Silver layer tables**

   ```sql
   :r scripts/silver/ddl_silver.sql
   ```

5. **Create Gold layer views**

   ```sql
   :r scripts/gold/ddl_gold.sql
   ```

6. **Load Bronze layer**

   ```sql
   EXEC bronze.load_bronze;
   ```

7. **Load Silver layer**

   ```sql
   EXEC silver.load_silver;
   ```

8. **Query Gold layer** (views are populated automatically)

   ```sql
   SELECT * FROM gold.dim_customers;
   SELECT * FROM gold.dim_products;
   SELECT * FROM gold.fact_sales;
   ```
The project includes comprehensive data quality checks:
- Primary key integrity validation
- Data standardization verification
- Date range and order validation
- Business rule consistency checks
```sql
-- Silver layer quality checks
:r tests/quality_checks_silver.sql

-- Gold layer quality checks
:r tests/quality_checks_gold.sql
```
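As an example of what these checks look for, a primary key integrity check verifies that the key of a dimension is unique and non-null. The query shape below is illustrative; the actual checks live in the `tests/` scripts:

```sql
-- Illustrative sketch: expect zero rows back.
-- Any result indicates a duplicate or NULL customer key in the gold dimension.
SELECT
    customer_key,
    COUNT(*) AS occurrences
FROM gold.dim_customers
GROUP BY customer_key
HAVING COUNT(*) > 1 OR customer_key IS NULL;
```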
```sql
-- Customer demographics overview
SELECT
    country,
    gender,
    marital_status,
    COUNT(*) AS customer_count
FROM gold.dim_customers
GROUP BY country, gender, marital_status;
```

```sql
-- Monthly sales performance
SELECT
    YEAR(order_date) AS year,
    MONTH(order_date) AS month,
    SUM(sales_amount) AS total_sales,
    COUNT(DISTINCT order_number) AS order_count
FROM gold.fact_sales
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY year, month;
```

```sql
-- Top-selling product categories
SELECT
    p.category,
    p.subcategory,
    SUM(f.sales_amount) AS total_sales,
    SUM(f.quantity) AS total_quantity
FROM gold.fact_sales f
JOIN gold.dim_products p ON f.product_key = p.product_key
GROUP BY p.category, p.subcategory
ORDER BY total_sales DESC;
```
- **File Paths**: Update the CSV file paths in `proc_load_bronze.sql` to match your environment
- **Backup Strategy**: The `init_database.sql` script drops and recreates the entire database
- **Performance**: Consider indexing strategies for production environments
- **Security**: Implement appropriate access controls and data encryption
- Create new tables in the Bronze layer
- Add transformation logic in Silver layer procedures
- Update Gold layer views to include new data
- Add corresponding quality checks
- Update transformation logic in
scripts/silver/proc_load_silver.sql
- Adjust data mappings and standardization rules
- Update quality checks accordingly
- Implement incremental data loading
- Add automated data quality monitoring
- Create data lineage documentation
- Implement CI/CD pipeline for deployments
- Add performance optimization (indexes, partitioning)
- Create business intelligence dashboards
- Implement data retention policies
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
For questions or issues, please:
- Open an issue in the GitHub repository
- Review the data quality check results
- Check SQL Server error logs for detailed error messages
Built with ❤️ for modern data warehousing