codesVarun/data-engineering-journey

🛠️ Data Engineering Journey 🚀

Welcome to my Data Engineering Journey repository, a living portfolio and record of what I learn and build as a data engineer. I update it daily with notes, hands-on projects, and tips from across the data engineering stack.


📅 Daily Logs

All daily learnings are tracked in the daily-logs/ folder, organized by date. Example:

  • 2025-06-08.md – Learned about Kafka architecture and implemented a producer/consumer setup locally.
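
A minimal sketch of how a date-named log file could be created, assuming the `daily-logs/YYYY-MM-DD.md` naming shown above. The helper names (`daily_log_path`, `create_daily_log`) and the file template are hypothetical and are not taken from this repo.

```python
from datetime import date
from pathlib import Path


def daily_log_path(day: date, root: Path = Path("daily-logs")) -> Path:
    """Return the log path for a given day, e.g. daily-logs/2025-06-08.md."""
    return root / f"{day.isoformat()}.md"


def create_daily_log(day: date, summary: str, root: Path = Path("daily-logs")) -> Path:
    """Create the day's log file if it does not exist yet and return its path."""
    path = daily_log_path(day, root)
    root.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        # Hypothetical template: a date heading followed by a one-line summary.
        path.write_text(f"# {day.isoformat()}\n\n{summary}\n", encoding="utf-8")
    return path
```

Calling `create_daily_log(date.today(), "What I learned today")` from the repo root would add today's entry and leave an existing file untouched.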

🗂 Repository Structure

| Folder | Description |
| --- | --- |
| notes/ | Conceptual and practical notes categorized by tool (SQL, Spark, Airflow, etc.) |
| projects/ | Real-world data engineering projects with full pipelines, code, and diagrams |
| tools-and-utilities/ | Scripts, utilities, and Jupyter notebooks for exploration |
| assets/ | Diagrams, visuals, and architecture references |
| resume/ | My updated resume as a Data Engineer |

🔧 Skills Covered

  • Languages: Python, SQL, Bash
  • ETL & Pipelines: Apache Airflow, dbt
  • Big Data: Apache Spark, Kafka
  • Cloud Platforms: AWS, GCP, Azure
  • Warehousing: Snowflake, Redshift, BigQuery
  • Data Quality: Great Expectations
  • Infrastructure & DevOps: Docker, CI/CD, Terraform (coming soon)

🌍 Featured Projects

| Project | Tech Stack | Description |
| --- | --- | --- |
| ETL Pipeline: COVID-19 API | Python, Airflow, PostgreSQL | Extract data from a public API, transform with Pandas, load into a database |
| Streaming Pipeline | Kafka, Spark, S3 | Real-time stream from simulated sensors to a data lake |
| Lakehouse Architecture | Delta Lake, Spark | Bronze-Silver-Gold layer transformation using Spark |
| Data Quality Framework | Great Expectations | Monitor and alert on data anomalies |
| dbt Analytics | dbt, BigQuery | Transform and model analytics data with dbt |
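
To make the extract-transform-load shape of the first project concrete, here is a minimal sketch under stated assumptions. It is not the project's code: it uses the standard library's `sqlite3` in place of PostgreSQL, plain Python in place of Pandas, and a hard-coded sample payload in place of a real API call. The table name, field names, and sample values are all hypothetical.

```python
import sqlite3

# Hypothetical sample payload standing in for a public API response.
SAMPLE_RESPONSE = [
    {"date": "2021-01-01", "country": "A", "new_cases": "120"},
    {"date": "2021-01-02", "country": "A", "new_cases": None},
    {"date": "2021-01-01", "country": "B", "new_cases": "45"},
]


def extract():
    """Stand-in for an HTTP request; returns the raw records."""
    return list(SAMPLE_RESPONSE)


def transform(records):
    """Drop rows with a missing count and cast the count to int."""
    cleaned = []
    for row in records:
        if row["new_cases"] is None:
            continue
        cleaned.append((row["date"], row["country"], int(row["new_cases"])))
    return cleaned


def load(rows, conn):
    """Write rows to the (hypothetical) daily_cases table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_cases ("
        "date TEXT, country TEXT, new_cases INTEGER, "
        "PRIMARY KEY (date, country))"
    )
    # INSERT OR REPLACE keeps reruns idempotent: the same (date, country)
    # key overwrites the earlier row instead of duplicating it.
    conn.executemany("INSERT OR REPLACE INTO daily_cases VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def run_pipeline(conn):
    """Chain the three stages in order."""
    return load(transform(extract()), conn)
```

Running `run_pipeline(sqlite3.connect(":memory:"))` exercises all three stages without any external services. In an orchestrated setup, each function would typically become its own task so a failed stage can be retried independently.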

🧠 Learning Goals

  • Build production-grade, cloud-native pipelines
  • Master streaming and batch processing
  • Learn data modeling and warehouse optimization
  • Implement monitoring and logging, and build cost-aware pipelines
  • Share open knowledge with the community 💡

📈 How to Use This Repo

  • Clone or fork it to track your own learning journey
  • Browse projects/ for end-to-end, hands-on examples
  • Follow along in daily-logs/ or notes/ for study material
  • Feel free to contribute via issues or pull requests

🙌 Let's Connect


📝 License

Released under the MIT License. Free to fork, learn from, and use.

About

This repo contains everything about data engineering.
