Commit 268aa33

Update README.md
1 parent 3ffa500 commit 268aa33

1 file changed (+10 −22 lines changed)

1 file changed

+10
-22
lines changed

README.md

Lines changed: 10 additions & 22 deletions
```diff
@@ -21,9 +21,7 @@
 5. [Data Lineage & Model Flow](#data-lineage--model-flow)
 6. [Screenshots & Walkthrough](#screenshots--walkthrough)
 7. [Key Learnings](#key-learnings)
-8. [Why This Project Matters](#why-this-project-matters)
-9. [Conclusion](#conclusion)
-10. [Contact Me](#contact-me)
+

 ---

```

```diff
@@ -65,6 +63,15 @@ Raw synthetic healthcare data is generated and stored in GCS, externalized into
 - Created a new Google Cloud project (`root-matrix-457217-p5`)
 - Enabled **BigQuery**, **Cloud Storage**, and **IAM** APIs
 - Created service accounts with proper IAM roles (`BigQuery Admin`, `Storage Admin`, etc.)
+- Created a dedicated **service account** in GCP for secure, programmatic access
+- Assigned necessary roles:
+  - `BigQuery Admin`
+  - `Storage Admin`
+  - `BigQuery Job User`
+- Downloaded the service account's **JSON key**
+- Used this key for:
+  - Local development (`profiles.yml` with `keyfile:` path)
+  - GitHub Actions (`gcp-key.json` generated dynamically from GitHub Secrets)

 <p align="center">
 <img src="./images/gcp-project-setup.png" alt="GCP Project Setup" width="700"/>
```
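For context, the keyfile-based local setup the diff describes is typically expressed in dbt's `profiles.yml`; a minimal sketch, assuming a BigQuery service-account connection (the profile name, dataset, key path, and thread count are hypothetical, not taken from this commit):

```yaml
healthcare_pipeline:            # hypothetical profile name
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account   # authenticate with the downloaded JSON key
      project: root-matrix-457217-p5
      dataset: healthcare_raw   # hypothetical dataset
      keyfile: /path/to/gcp-key.json
      threads: 4
```

In CI, the same file can point `keyfile:` at the `gcp-key.json` written out from GitHub Secrets.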
````diff
@@ -100,25 +107,6 @@ To simulate a real-world healthcare data pipeline, I wrote a Python script that:

 > 📁 Script location: [`data_generator/synthetic_data_generator.py`](./data_generator/synthetic_data_generator.py)

-#### 🔑 Key Logic Overview
-
-```python
-# Create the bucket if it doesn't exist
-def create_bucket():
-    bucket = storage_client.bucket(BUCKET_NAME)
-    if not bucket.exists():
-        storage_client.create_bucket(BUCKET_NAME)
-
-# Generate synthetic patients
-def generate_patients(num_records):
-    ...
-    return pd.DataFrame(patients)
-
-# Upload CSV, JSON, or Parquet to GCS
-def upload_to_gcs(data, path, filename, file_format):
-    ...
-```
-

 ### 4.4 External Table Creation
 - Used BigQuery to create external tables from GCS
````
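An external table over the GCS files can be defined with BigQuery DDL; a minimal sketch as a Python helper that builds the statement (the dataset, table, and bucket names in the example are hypothetical, not from this repo):

```python
def external_table_ddl(dataset: str, table: str, uri: str, fmt: str = "CSV") -> str:
    """Build a BigQuery CREATE EXTERNAL TABLE statement backed by GCS files."""
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{dataset}.{table}`\n"
        f"OPTIONS (\n"
        f"  format = '{fmt}',\n"
        f"  uris = ['{uri}']\n"
        f");"
    )


# Example with hypothetical names:
print(external_table_ddl("healthcare_raw", "patients_ext", "gs://my-bucket/patients/*.csv"))
```

The resulting DDL can be run in the BigQuery console or via `bq query --use_legacy_sql=false`.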

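Although the commit removes the `Key Logic Overview` excerpt from the README, the generate-and-serialize portion of that script can be sketched in runnable form; the column names, value choices, and `serialize` helper below are illustrative assumptions, not the actual schema of `synthetic_data_generator.py`:

```python
import io
import random
import uuid

import pandas as pd


def generate_patients(num_records: int) -> pd.DataFrame:
    """Generate synthetic patient rows; the columns here are illustrative only."""
    patients = [
        {
            "patient_id": str(uuid.uuid4()),
            "age": random.randint(0, 95),
            "gender": random.choice(["F", "M"]),
            "diagnosis_code": random.choice(["E11.9", "I10", "J45.909"]),
        }
        for _ in range(num_records)
    ]
    return pd.DataFrame(patients)


def serialize(df: pd.DataFrame, file_format: str) -> bytes:
    """Serialize a DataFrame to CSV, JSON, or Parquet bytes ahead of a GCS upload."""
    if file_format == "csv":
        return df.to_csv(index=False).encode("utf-8")
    if file_format == "json":
        return df.to_json(orient="records", lines=True).encode("utf-8")
    if file_format == "parquet":
        buf = io.BytesIO()
        df.to_parquet(buf, index=False)  # needs pyarrow or fastparquet installed
        return buf.getvalue()
    raise ValueError(f"unsupported format: {file_format}")
```

The real script then uploads these bytes to the GCS bucket (the `upload_to_gcs` step elided in the removed excerpt).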