
Commit ef39076

Fix Markdown syntax issues
1 parent defbd5a commit ef39076

File tree

  • Workloads-Specific/DataWarehouse/Medallion_Architecture

1 file changed: +19 −15 lines

Workloads-Specific/DataWarehouse/Medallion_Architecture/README.md

Lines changed: 19 additions & 15 deletions
@@ -23,13 +23,12 @@ Last updated: 2025-05-03
 
 - [Overview](#overview)
 - [Demo](#demo)
-- [Step 1: Set Up Your Environment](#step-1-set-up-your-environment)
-- [Step 2: Ingest Data into the Bronze Layer](#step-2-ingest-data-into-the-bronze-layer)
-- [Step 3: Transform Data in the Silver Layer](#step-3-transform-data-in-the-silver-layer)
-- [Step 4: Curate Data in the Gold Layer](#step-4-curate-data-in-the-gold-layer)
-- [Step 5: Set Up Pipelines for Orchestration](#step-5-set-up-pipelines-for-orchestration)
-- [Step 6: Enable Data Access for Reporting](#step-6-enable-data-access-for-reporting)
-
+- [Step 1: Set Up Your Environment](#step-1-set-up-your-environment)
+- [Step 2: Ingest Data into the Bronze Layer](#step-2-ingest-data-into-the-bronze-layer)
+- [Step 3: Transform Data in the Silver Layer](#step-3-transform-data-in-the-silver-layer)
+- [Step 4: Curate Data in the Gold Layer](#step-4-curate-data-in-the-gold-layer)
+- [Step 5: Set Up Pipelines for Orchestration](#step-5-set-up-pipelines-for-orchestration)
+- [Step 6: Enable Data Access for Reporting](#step-6-enable-data-access-for-reporting)
 
 </details>
 
@@ -49,14 +48,14 @@ Last updated: 2025-05-03
 > [!IMPORTANT]
 > If you are not able to see the `auto-create report` option or `copilot`, be aware that you need to enable AI features in your tenant; click [here](https://github.com/brown9804/MicrosoftCloudEssentialsHub/blob/main/0_Azure/2_AzureAnalytics/0_Fabric/demos/6_PBiCopilot.md#tenant-configuration) to see how.
 
-
 <img width="550" alt="image" src="https://github.com/user-attachments/assets/7eec0098-7b7b-453c-9dbb-ee1a6390577b">
 
 <img width="550" alt="image" src="https://github.com/user-attachments/assets/4bbb5f10-415b-44b2-8fd0-a5a12482ce2c">
 
 ## Demo
 
 Implementing a medallion architecture provides several benefits:
+
 - **Data Quality**: By organizing data into layers, you can apply quality checks and transformations in a structured manner, ensuring that the data in the Gold layer is reliable and ready for analysis.
 - **Scalability**: The architecture allows you to scale your data processing pipelines independently for each layer, providing flexibility and efficiency.
 - **Performance**: The Gold layer is optimized for performance, which means that your reporting and analytics queries will run faster.
@@ -79,14 +78,13 @@ Implementing a medallion architecture provides several benefits:
 - Click on `Workspaces`, then select either your existing workspace or create a new one by clicking `New Workspace`:
 - Provide a name and other required details, then create the workspace.
 
-
 <img width="550" alt="image" src="https://github.com/user-attachments/assets/2f3225fc-6aa6-4eeb-8207-75038b36f18f">
 
 - Now, assign the Fabric Capacity to your workspace by clicking on `Workspace settings` and selecting the fabric capacity under the license.
 
 <img width="550" alt="image" src="https://github.com/user-attachments/assets/1831c97d-6b9a-4470-968d-e7803bc58b80">
 
-https://github.com/user-attachments/assets/c524741c-be91-4fe4-82bc-c841fae8c6c9
+<https://github.com/user-attachments/assets/c524741c-be91-4fe4-82bc-c841fae8c6c9>
 
 2. **Create Lakehouses**: Set up three lakehouses for the Bronze, Silver, and Gold layers.
 
@@ -98,7 +96,7 @@ Implementing a medallion architecture provides several benefits:
 
 <img width="958" alt="image" src="https://github.com/user-attachments/assets/828adf9d-8722-4bef-8694-8c22de330797">
 
-https://github.com/user-attachments/assets/fdb64dd2-a6ec-4da0-a385-e55f875c8f8e
+<https://github.com/user-attachments/assets/fdb64dd2-a6ec-4da0-a385-e55f875c8f8e>
 
 ### Step 2: Ingest Data into the Bronze Layer
 
@@ -116,7 +114,7 @@ Implementing a medallion architecture provides several benefits:
 | --- | --- |
 | <img width="550" alt="image" src="https://github.com/user-attachments/assets/09994e75-3029-4f61-aac8-b50f7c5fd2b1"> | <img width="550" alt="image" src="https://github.com/user-attachments/assets/2b25d187-85e2-48e7-9a97-e7549f28ed9c"> |
 
-https://github.com/user-attachments/assets/56308a58-cf72-4f0f-bf3e-e9e1669fa0df
+<https://github.com/user-attachments/assets/56308a58-cf72-4f0f-bf3e-e9e1669fa0df>
 
 > Suppose you need to extract data from your `sql database`
 
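As a point of reference only: besides the pipeline copy activity this demo uses, a Fabric notebook could also pull such a table straight from the SQL database over JDBC. The sketch below assumes a Fabric notebook session where `spark` is already available; the server, database, table, and credential values are placeholders, not values from this repository.

```python
# Hedged sketch: read a table from Azure SQL Database into the Bronze lakehouse via Spark's JDBC reader.
# Every connection detail below is a placeholder; prefer a secret store over inline credentials.
jdbc_url = "jdbc:sqlserver://<your-server>.database.windows.net:1433;database=<your-database>"

source_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.<your-table>")       # hypothetical source table
    .option("user", "<your-sql-user>")
    .option("password", "<your-sql-password>")
    .load()
)

# Land the data unchanged in the Bronze lakehouse as a Delta table.
source_df.write.format("delta").mode("overwrite").save(
    "abfss://<your-container-name>@<your-storage-account-name>.dfs.core.windows.net/"
    "<your-bronze-lakehousename>.Lakehouse/Tables/<table name>"
)
```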
@@ -145,7 +143,7 @@ VALUES
 (5, 'Sarah', 'Davis', '1995-09-30', '2020-11-20', 'Marketing Specialist', 60000.0000);
 ```
 
-https://github.com/user-attachments/assets/357184bf-cc49-4311-84d4-6369514b3366
+<https://github.com/user-attachments/assets/357184bf-cc49-4311-84d4-6369514b3366>
 
 > [!IMPORTANT]
 > Besides using Data pipelines to bring your SQL information, you can also leverage Microsoft Fabric's mirrored SQL capability. This feature allows you to create a mirrored copy of your SQL database, improving data availability, reliability, and disaster recovery. By maintaining a synchronized copy of your database in a different location, it ensures that your data is always accessible, even in the event of a failure or outage.
@@ -155,8 +153,7 @@ https://github.com/user-attachments/assets/357184bf-cc49-4311-84d4-6369514b3366
 > `For example, both Azure SQL Database and Microsoft Fabric are Microsoft products. However, the concept of outbound connections still applies because the data is moving from one service (Azure SQL Database) to another service (Microsoft Fabric), even though they are both within the Microsoft ecosystem. This movement of data is considered outbound because it is leaving the Azure SQL Database environment and entering the Microsoft Fabric environment`. <br/> <br/>
 > Under the Zero Trust Architecture, both inbound and outbound connections are treated with the same level of scrutiny and security protocols. This means that whether the connection is inbound or outbound, it is subject to strict verification processes to ensure it is safe and authorized. Key principles of Zero Trust include verification of every access request, least privilege access, continuous monitoring, and micro-segmentation. By applying these principles, Azure ensures that both inbound and outbound connections are secure, reducing the risk of unauthorized access and data breaches.
 
-
-https://github.com/user-attachments/assets/2a64762a-f120-4448-b0fb-7a49f4d1bedb
+<https://github.com/user-attachments/assets/2a64762a-f120-4448-b0fb-7a49f4d1bedb>
 
 2. **Create Dataflows or Pipelines**: Use Data Factory to create dataflows or pipelines that ingest data into the Bronze lakehouse.
 - In Data Factory, create a new pipeline.
@@ -182,6 +179,7 @@ https://github.com/user-attachments/assets/2a64762a-f120-4448-b0fb-7a49f4d1bedb
 <img width="550" alt="image" src="https://github.com/user-attachments/assets/5e0ae097-e747-47a5-b0c3-e7408e90292a">
 
 ### Step 3: Transform Data in the Silver Layer
+
 1. **Create Notebooks or Dataflows**: Use Fabric's notebooks or dataflows to read data from the Bronze layer.
 - In the Fabric workspace, create a new notebook.
 
@@ -198,6 +196,7 @@ https://github.com/user-attachments/assets/2a64762a-f120-4448-b0fb-7a49f4d1bedb
 - Use the `write.format("delta").save()` method to save the data to the `cleansed_Silver` lakehouse.
 
 > **PySpark Code to Move Data from Bronze to Silver**:
+
 ```python
 # Read data from the Bronze layer
 bronze_df = spark.read.format("delta").load("abfss://<your-container-name>@<your-storage-account-name>.dfs.core.windows.net/<your-bronze-lakehousename>.Lakehouse/Tables/<table name>")
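The hunk above captures only the opening of that Bronze-to-Silver block. Purely for orientation, a complete pass following the same pattern might look like the sketch below; it assumes a Fabric notebook session where `spark` is defined, and the cleansing rules and column names (`EmployeeID`, `HireDate`) are illustrative assumptions rather than the repository's actual logic.

```python
# Minimal Bronze-to-Silver sketch (paths and column names are placeholder assumptions).
from pyspark.sql import functions as F

# Read the raw table from the Bronze lakehouse.
bronze_df = spark.read.format("delta").load(
    "abfss://<your-container-name>@<your-storage-account-name>.dfs.core.windows.net/"
    "<your-bronze-lakehousename>.Lakehouse/Tables/<table name>"
)

# Illustrative cleansing: drop exact duplicates, remove rows missing a key column,
# and normalize a date column.
silver_df = (
    bronze_df.dropDuplicates()
    .dropna(subset=["EmployeeID"])
    .withColumn("HireDate", F.to_date("HireDate"))
)

# Persist the cleansed table to the Silver lakehouse as Delta.
silver_df.write.format("delta").mode("overwrite").save(
    "abfss://<your-container-name>@<your-storage-account-name>.dfs.core.windows.net/"
    "<your-silver-lakehousename>.Lakehouse/Tables/<table name>"
)
```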
@@ -216,6 +215,7 @@ https://github.com/user-attachments/assets/2a64762a-f120-4448-b0fb-7a49f4d1bedb
 <img width="550" alt="image" src="https://github.com/user-attachments/assets/5affce77-ec21-4b03-881e-877ff2425b9d">
 
 ### Step 4: Curate Data in the Gold Layer
+
 1. **Read Data from Silver Layer**: Use notebooks or dataflows to read data from the Silver lakehouse.
 - In a new notebook, connect to the `cleansed_Silver` lakehouse.
 2. **Apply Business Logic**: Apply any additional business logic or aggregations.
@@ -231,6 +231,7 @@ https://github.com/user-attachments/assets/2a64762a-f120-4448-b0fb-7a49f4d1bedb
 > Applying some transformations: If you want to see more, click [here](./src/1_notebook_silver_to_gold.ipynb) to see a sample of the notebook.
 
 > **PySpark Code to Move Data from Silver to Gold**:
+
 ```python
 # Read data from the Silver layer
 silver_df = spark.read.format("delta").load("abfss://<your-container-name>@<your-storage-account-name>.dfs.core.windows.net/<your-silver-lakehouse>.Lakehouse/Tables/<table name>")
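The diff shows only the Silver read here and the Gold write in the next hunk; the transformation in between is not part of this change. As a rough illustration of what that middle step could look like (the grouping and column names are assumptions, not the repository's logic):

```python
# Illustrative Silver-to-Gold aggregation; column names are assumptions.
from pyspark.sql import functions as F

# Example business-level summary: headcount and average salary per job title.
gold_df = (
    silver_df.groupBy("JobTitle")
    .agg(
        F.count("*").alias("EmployeeCount"),
        F.avg("Salary").alias("AvgSalary"),
    )
)
```

The Gold write shown in the next hunk would then persist `gold_df` unchanged.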
@@ -245,9 +246,11 @@ https://github.com/user-attachments/assets/2a64762a-f120-4448-b0fb-7a49f4d1bedb
 # Write data to the Gold layer
 gold_df.write.mode("overwrite").option("mergeSchema", "true").format("delta").save("abfss://<your-container-name>@<your-storage-account-name>.dfs.core.windows.net/<your-gold-lakehouse name>.Lakehouse/Tables/<your table name>")
 ```
+
 <img width="550" alt="image" src="https://github.com/user-attachments/assets/d092d34f-86f5-4853-aea7-88ff4062f4af">
 
 ### Step 5: Set Up Pipelines for Orchestration
+
 1. **Create Pipelines**: Create pipelines to automate the movement of data from the Bronze layer to the Silver layer, and from the Silver layer to the Gold layer.
 - In Data Factory, create a new pipeline.
 - Add a copy activity to move data from the `raw_Bronze` lakehouse to the `cleansed_Silver` lakehouse.
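If orchestrating from a notebook better suits your setup than a Data Factory pipeline, a parent Fabric notebook could chain the two moves as sketched below; `notebookutils` is available in Fabric notebook sessions, and the child notebook names are placeholders for your own Bronze-to-Silver and Silver-to-Gold notebooks. This is an alternative sketch, not the pipeline setup this demo describes.

```python
# Sketch: run the two transformation notebooks in sequence from a parent notebook.
# Each call blocks until the child notebook finishes; the second argument is a timeout in seconds.
notebookutils.notebook.run("<your-bronze-to-silver-notebook>", 600)
notebookutils.notebook.run("<your-silver-to-gold-notebook>", 600)
```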
@@ -257,6 +260,7 @@ https://github.com/user-attachments/assets/2a64762a-f120-4448-b0fb-7a49f4d1bedb
 - Consider the frequency of data updates and the latency that is acceptable for your use case.
 
 ### Step 6: Enable Data Access for Reporting
+
 1. **Configure SQL Analytics Endpoint**:
 - Validate that the SQL Analytics Endpoint is configured; you can review it from the workspace view. It is primarily required for the Gold layer to be accessible to your reporting tools.
 
