Welcome to my Data Engineering Journey repository — a living portfolio and documentation of everything I learn and build as a data engineer. This repo is updated daily with new learnings, notes, hands-on projects, and tips across the full data engineering stack.
All daily learnings are tracked in the daily-logs/ folder, organized by date. Example:
- 2025-06-08.md – Learned about Kafka architecture and implemented a producer/consumer setup locally.
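As a minimal sketch of the naming convention above, a small helper could create the day's log file. The function names `daily_log_path` and `create_daily_log` and the heading written into the file are illustrative assumptions, not scripts that exist in this repo.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def daily_log_path(day: Optional[date] = None, root: str = "daily-logs") -> Path:
    """Return the path of the log file for `day` (defaults to today)."""
    day = day or date.today()
    # isoformat() yields YYYY-MM-DD, matching names like 2025-06-08.md
    return Path(root) / f"{day.isoformat()}.md"


def create_daily_log(day: Optional[date] = None, root: str = "daily-logs") -> Path:
    """Create the log file with a date heading if it does not exist yet."""
    path = daily_log_path(day, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        # Hypothetical template: a single heading with the date
        path.write_text(f"# {path.stem}\n\n", encoding="utf-8")
    return path
```

Calling `create_daily_log()` from the repo root would create today's file under `daily-logs/` and leave an existing file untouched.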
| Folder | Description |
|---|---|
| `notes/` | Conceptual and practical notes categorized by tool (SQL, Spark, Airflow, etc.) |
| `projects/` | Real-world data engineering projects with full pipelines, code, and diagrams |
| `tools-and-utilities/` | Scripts, utilities, and Jupyter notebooks for exploration |
| `assets/` | Diagrams, visuals, and architecture references |
| `resume/` | My updated resume as a Data Engineer |
- Languages: Python, SQL, Bash
- ETL & Pipelines: Apache Airflow, dbt
- Big Data: Apache Spark, Kafka
- Cloud Platforms: AWS, GCP, Azure
- Warehousing: Snowflake, Redshift, BigQuery
- Data Quality: Great Expectations
- Orchestration & Infra: Docker, CI/CD, Terraform (coming soon)
| Project | Tech Stack | Description |
|---|---|---|
| ETL Pipeline: COVID-19 API | Python, Airflow, PostgreSQL | Extract data from public API, transform with Pandas, load into DB |
| Streaming Pipeline | Kafka, Spark, S3 | Real-time stream from simulated sensors to a lake |
| Lakehouse Architecture | Delta Lake, Spark | Bronze-Silver-Gold layer transformation using Spark |
| Data Quality Framework | Great Expectations | Monitor and alert on data anomalies |
| dbt Analytics | dbt, BigQuery | Transform and model analytics data with dbt |
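The transform and load steps of the ETL pipeline in the first row can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: it uses the standard-library `sqlite3` in place of PostgreSQL and plain dictionaries in place of Pandas so that it runs without extra dependencies, it skips the API call entirely, and the record fields (`country`, `date`, `cases`) and the `daily_cases` table are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[str, str, int]


def transform(records: Iterable[dict]) -> List[Row]:
    """Keep well-formed records and normalize them to (country, day, cases)."""
    rows: List[Row] = []
    for rec in records:
        # Skip records missing a required field or with a null case count
        if rec.get("cases") is None or not rec.get("country") or not rec.get("date"):
            continue
        rows.append((rec["country"].strip().upper(), rec["date"], int(rec["cases"])))
    return rows


def load(conn: sqlite3.Connection, rows: List[Row]) -> int:
    """Upsert rows so re-running the pipeline does not duplicate data."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_cases ("
        "country TEXT, day TEXT, cases INTEGER, PRIMARY KEY (country, day))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO daily_cases (country, day, cases) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    # Return the total row count in the table after loading
    return conn.execute("SELECT COUNT(*) FROM daily_cases").fetchone()[0]


if __name__ == "__main__":
    # Hypothetical sample payload standing in for the API response
    sample = [
        {"country": " in ", "date": "2021-01-01", "cases": "120"},
        {"country": "us", "date": "2021-01-01", "cases": None},  # dropped
    ]
    conn = sqlite3.connect(":memory:")
    print(load(conn, transform(sample)))
```

The primary key on `(country, day)` combined with `INSERT OR REPLACE` makes the load idempotent, which matters when an orchestrator such as Airflow retries a failed task.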
- Build production-grade, cloud-native pipelines
- Master streaming and batch processing
- Learn data modeling and warehouse optimization
- Implement monitoring, logging, and cost-aware pipelines
- Share open knowledge with the community 💡
- Clone or fork this repo to track your own learning journey
- Navigate to `projects/` to deep dive into hands-on cases
- Follow along in `daily-logs/` or `notes/` for study material
- Feel free to contribute via issues or pull requests
MIT License. Free to fork, learn, and use.