@@ -21,9 +21,7 @@
 5. [Data Lineage & Model Flow](#data-lineage--model-flow)
 6. [Screenshots & Walkthrough](#screenshots--walkthrough)
 7. [Key Learnings](#key-learnings)
-8. [Why This Project Matters](#why-this-project-matters)
-9. [Conclusion](#conclusion)
-10. [Contact Me](#contact-me)
+
 
 ---
 
@@ -65,6 +63,15 @@ Raw synthetic healthcare data is generated and stored in GCS, externalized into
 - Created a new Google Cloud project (`root-matrix-457217-p5`)
 - Enabled **BigQuery**, **Cloud Storage**, and **IAM** APIs
 - Created service accounts with proper IAM roles (`BigQuery Admin`, `Storage Admin`, etc.)
+- Created a dedicated **service account** in GCP for secure, programmatic access
+- Assigned the necessary roles:
+  - `BigQuery Admin`
+  - `Storage Admin`
+  - `BigQuery Job User`
+- Downloaded the service account's **JSON key**
+- Used this key for:
+  - Local development (`profiles.yml` with a `keyfile:` path)
+  - GitHub Actions (`gcp-key.json` generated dynamically from GitHub Secrets)
 
 <p align="center">
   <img src="./images/gcp-project-setup.png" alt="GCP Project Setup" width="700"/>
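The `profiles.yml` + keyfile setup referenced in the added lines could look roughly like this dbt BigQuery profile; the profile name, dataset, and key path are illustrative assumptions, not taken from the commit:

```yaml
# Hypothetical profiles.yml sketch -- profile and dataset names are assumptions
my_healthcare_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: root-matrix-457217-p5   # project ID from the README
      dataset: healthcare_raw          # assumed dataset name
      keyfile: /path/to/gcp-key.json   # JSON key downloaded for the service account
      threads: 4
```

In CI, the same profile works if the workflow first writes the key from GitHub Secrets to `gcp-key.json` and points `keyfile:` at it.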
@@ -100,25 +107,6 @@ To simulate a real-world healthcare data pipeline, I wrote a Python script that:
 
 > 📁 Script location: [`data_generator/synthetic_data_generator.py`](./data_generator/synthetic_data_generator.py)
 
-#### 🔑 Key Logic Overview
-
-```python
-# Create the bucket if it doesn't exist
-def create_bucket():
-    bucket = storage_client.bucket(BUCKET_NAME)
-    if not bucket.exists():
-        storage_client.create_bucket(BUCKET_NAME)
-
-# Generate synthetic patients
-def generate_patients(num_records):
-    ...
-    return pd.DataFrame(patients)
-
-# Upload CSV, JSON, or Parquet to GCS
-def upload_to_gcs(data, path, filename, file_format):
-    ...
-```
-
 
 ### 4.4 External Table Creation
 - Used BigQuery to create external tables from GCS
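The external-table step above can be expressed as BigQuery DDL along these lines; the dataset, table, and bucket names are assumptions for illustration:

```sql
-- Hypothetical sketch: external table over synthetic CSVs in GCS.
-- Dataset, table, and bucket names are assumed, not from the README.
CREATE OR REPLACE EXTERNAL TABLE `root-matrix-457217-p5.healthcare_raw.patients_ext`
OPTIONS (
  format = 'CSV',
  uris = ['gs://<your-bucket>/patients/*.csv'],
  skip_leading_rows = 1  -- skip the CSV header row
);
```

Because the table is external, BigQuery reads the GCS files at query time, so newly uploaded files appear without a reload step.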