All transformations were written in modular `.sql` models, configured via `dbt_p…`
It shows how each model in the pipeline is derived from raw external source tables in BigQuery:
- **Sources** (`SRC`) like `claims_data_external`, `patient_data_external`, and `ehr_data_external` represent external tables that directly query files stored in Google Cloud Storage
- **Models** (`MDL`) like `high-claim-patients`, `chronic-conditions-summary`, and `health-anomalies` represent transformed tables built using SQL logic in DBT
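As an illustrative sketch of that pattern (the source name, schema, and column names below are assumptions for illustration, not this project's actual definitions), a model such as `high-claim-patients` selects from a declared external source and applies SQL logic:

```sql
-- models/high-claim-patients.sql (illustrative sketch; source and column names are assumptions)
-- Aggregates claims per patient and keeps only high-cost patients.
select
    patient_id,
    sum(claim_amount) as total_claim_amount
from {{ source('raw_healthcare', 'claims_data_external') }}
group by patient_id
having sum(claim_amount) > 10000
```

Because the model references `source()` rather than a hard-coded table, DBT can draw the lineage from the external table to the model automatically.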
To ensure data quality and trust in the pipeline, I implemented column-level tests and added documentation using `schema.yml` files in DBT.
DBT allows us to define tests and metadata **alongside our models** — all inside YAML. These tests run automatically using `dbt test`.
#### Why I Used `schema.yml`:
- To enforce data integrity on critical columns (`not_null`, `unique`)
- To validate raw data coming from external sources
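A minimal sketch of what such a `schema.yml` can look like (the model and column names here are assumptions for illustration):

```yml
version: 2

models:
  - name: high-claim-patients
    description: "Patients whose total claim amount exceeds a threshold"
    columns:
      - name: patient_id
        description: "Unique patient identifier"
        tests:
          - not_null
          - unique
```

Running `dbt test` then executes each declared test as a query and fails the run if any rows violate the constraint.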
<p align="center"> <img src="./images/github-actions-cd-success.png" alt="CD to Production" width="700"/> </p>
Secure deployment:
GCP credentials stored as GitHub Secrets
gcp-key.json and profiles.yml are generated at runtime (not stored in repo)
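A sketch of what such a CD workflow can look like (the workflow file name, secret names, and exact commands are assumptions, not this repo's actual configuration):

```yml
# .github/workflows/cd.yml (illustrative sketch; secret names are assumptions)
name: CD to Production
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Generate credentials at runtime (never committed)
        run: |
          echo '${{ secrets.GCP_SA_KEY }}' > gcp-key.json
          mkdir -p ~/.dbt
          echo '${{ secrets.DBT_PROFILES_YML }}' > ~/.dbt/profiles.yml
      - name: Build and test models
        run: |
          pip install dbt-bigquery
          dbt run --target prod
          dbt test --target prod
```

Writing the key file and `profiles.yml` from GitHub Secrets at runtime keeps credentials out of version control while still letting the workflow authenticate to BigQuery.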
- Practiced writing modular, testable SQL models with automated validations
- Built an end-to-end pipeline that mirrors real-world engineering workflows
## Conclusion
This project started as a hands-on learning exercise and became a full-stack, automated data engineering pipeline. I worked with industry-standard tools (GCP, DBT, GitHub Actions), built my own data sources, and pushed transformations all the way to production.
It reflects both the technical skills I’ve developed and my drive to learn independently and build real, usable solutions.
---
## Acknowledgements
This project was built by closely following a YouTube tutorial by [DATA TIME](https://www.youtube.com/playlist?list=PLs9W2D7jqlTXbHWkpNUzIC_G8KpLMH6yZ), which covered how to build an end-to-end data pipeline using DBT, BigQuery, and GitHub Actions.