A data engineering project focused on building a data warehouse using SQL Server. This project demonstrates the practical implementation of the Medallion Architecture to extract, clean, and model raw data into a structured format ready for analytics.
This project involves:
- Data Architecture: Designing a Modern Data Warehouse Using Medallion Architecture Bronze, Silver, and Gold layers.
- ETL Pipelines: Extracting, transforming, and loading data from source systems into the warehouse.
- Data Modeling: Developing fact and dimension tables optimized for analytical queries.
- Analytics & Reporting: Creating SQL-based reports and dashboards for actionable insights.
Develop a modern data warehouse using SQL Server to consolidate sales data, enabling analytical reporting and informed decision-making.
- Data Sources: Import data from two source systems (ERP and CRM) provided as CSV files.
- Data Quality: Cleanse and resolve data quality issues prior to analysis.
- Integration: Combine both sources into a single, user-friendly data model designed for analytical queries.
- Scope: Focus on the latest dataset only; historization of data is not required.
- Documentation: Provide clear documentation of the data model to support both business stakeholders and analytics teams.
The data architecture for this project follows Medallion Architecture Bronze, Silver, and Gold layers:
- Bronze Layer: Stores raw data as-is from the source systems. Data is ingested from CSV Files into SQL Server Database.
- Silver Layer: This layer includes data cleansing, standardization, and normalization processes to prepare data for analysis.
- Gold Layer: Houses business-ready data modeled into a star schema required for reporting and analytics.
This data flow diagram illustrates how records move and transform through the project's Medallion Architecture. It maps the journey of legacy CRM and ERP data from raw ingestion (Bronze), through data cleansing and type standardization (Silver), to final business-level aggregations and Star Schema modeling (Gold).
This diagram illustrates the logical relationships and shared keys between tables in the source systems (CRM and ERP) that will be integrated into the Bronz layer. Identifying these connections is essential for designing robust ingestion pipelines and ensuring data integrity during transformation.
The Gold layer of this SQL Enterprise Data Warehouse is modeled using a standard Star Schema, optimized for fast aggregations and BI reporting.
- Database: MS SQL Server
- Skills: T-SQL, ETL / ELT Development, Data Cleansing, Stored Procedures
- Data Modeling: Star Schema, Dimensional Modeling
- Architecture: Medallion Architecture (Bronze, Silver, Gold)
sql-enterprise-data-warehouse/
│
├── datasets/ # Raw datasets used for the project (ERP and CRM data)
│
├── docs/ # Project documentation and architecture details
│ ├── etl.drawio # Draw.io file shows all different techniquies and methods of ETL
│ ├── data_architecture.drawio.svg # Draw.io file shows the project's architecture
│ ├── data_catalog.md # Catalog of datasets, including field descriptions and metadata
│ ├── data_flow.drawio # Draw.io file for the data flow diagram
│ ├── data_models.drawio # Draw.io file for data models (star schema)
│ ├── naming-conventions.md # Consistent naming guidelines for tables, columns, and files
│
├── scripts/ # SQL scripts for ETL and transformations
│ ├── bronze/ # Scripts for extracting and loading raw data
│ ├── silver/ # Scripts for cleaning and transforming data
│ ├── gold/ # Scripts for creating analytical models
│
├── tests/ # Test scripts and quality files
│
├── README.md # Project overview and instructions
├── LICENSE # License information for the repository
├── .gitignore # Files and directories to be ignored by Git
└── requirements.txt # Dependencies and requirements for the project
This project is licensed under the MIT License. You are free to use, modify, and share this project with proper attribution.
Hi there! I am Harvie Lorenz C. Babuyo, a 3rd-year B.S. in Computer Science student at the University of Science and Technology of Southern Philippines - Cagayan de Oro.
Let's connect: