Commit defbd5a

Merge cdc674d into 390439c
2 parents 390439c + cdc674d commit defbd5a

18 files changed: +4605 -49 lines changed

README.md

Lines changed: 8 additions & 8 deletions
@@ -5,7 +5,7 @@ Costa Rica
55
[![GitHub](https://img.shields.io/badge/--181717?logo=github&logoColor=ffffff)](https://github.com/)
66
[brown9804](https://github.com/brown9804)
77

8-
Last updated: 2025-05-02
8+
Last updated: 2025-05-03
99

1010
------------------------------------------
1111

@@ -147,7 +147,7 @@ From [Microsoft Documentation](https://learn.microsoft.com/pt-br/fabric/fundamen
147147
4. **You want to empower data consumers** (analysts, scientists, engineers) to discover and understand data assets easily.
148148
5. **You are scaling your data operations** and need consistent governance policies across teams and projects.
149149

150-
Click to read more about [Microsoft Purview for Fabric - Overview](./Purview-Fabric.md).
150+
Click to read more about [Microsoft Purview for Fabric - Overview](./Workloads-Specific/Purview/PurviewforFabric.md).
151151

152152
## Networking
153153

@@ -201,13 +201,13 @@ Click to read more about [Microsoft Purview for Fabric - Overview](./Purview-Fab
201201

202202
- [Azure Data Factory (ADF) - Best Practices Overview](./Workloads-Specific/DataFactory/BestPractices.md)
203203
- [Data Engineering - Best Practices Overview](./Workloads-Specific/DataEngineering/BestPractices.md)
204-
- [Data Warehouse - Best Practices Overview]() - in progress
205-
- [Data Science - Best Practices Overview]() - in progress
206-
- [Real-Time Intelligence - Best Practices Overview]() - in progress
204+
- [Data Warehouse - Best Practices Overview](./Workloads-Specific/DataWarehouse/BestPractices.md)
205+
- [Data Science - Best Practices Overview](./Workloads-Specific/DataScience/BestPractices.md) - in progress
206+
- [Real-Time Intelligence - Best Practices Overview](./Workloads-Specific/RealTimeIntelligence/BestPractices.md) - in progress
207207
- [Power BI - Best Practices Overview](./Workloads-Specific/PowerBi/BestPractices.md)
208-
- [Copilot - Best Practices Overview]() - in progress
209-
- [Purview - Best Practices Overview]() - in progress
210-
- [OneLake - Best Practices Overview]() - in progress
208+
- [Copilot - Best Practices Overview](./Workloads-Specific/Copilot/BestPractices.md) - in progress
209+
- [Purview - Best Practices Overview](./Workloads-Specific/Purview/BestPractices.md) - in progress
210+
- [OneLake - Best Practices Overview](./Workloads-Specific/OneLake/BestPractices.md) - in progress
211211

212212
<div align="center">
213213
<h3 style="color: #4CAF50;">Total Visitors</h3>
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
# Copilot - Best Practices Overview
2+
3+
Costa Rica
4+
5+
[![GitHub](https://badgen.net/badge/icon/github?icon=github&label)](https://github.com)
6+
[![GitHub](https://img.shields.io/badge/--181717?logo=github&logoColor=ffffff)](https://github.com/)
7+
[brown9804](https://github.com/brown9804)
8+
9+
Last updated: 2025-05-03
10+
11+
----------
12+
13+
<details>
14+
<summary><b>List of References</b> (Click to expand)</summary>
15+
16+
</details>
17+
18+
<div align="center">
19+
<h3 style="color: #4CAF50;">Total Visitors</h3>
20+
<img src="https://profile-counter.glitch.me/brown9804/count.svg" alt="Visitor Count" style="border: 2px solid #4CAF50; border-radius: 5px; padding: 5px;"/>
21+
</div>

Workloads-Specific/DataEngineering/BestPractices.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ Costa Rica
66
[![GitHub](https://img.shields.io/badge/--181717?logo=github&logoColor=ffffff)](https://github.com/)
77
[brown9804](https://github.com/brown9804)
88

9-
Last updated: 2025-05-02
9+
Last updated: 2025-05-03
1010

1111
----------
1212

@@ -61,7 +61,7 @@ Last updated: 2025-05-02
6161
- **Comprehensive Schema Documentation:** Create detailed, auto-generated documentation for every endpoint; include sample queries, expected responses, and precise error messages to aid developer understanding.
6262
- **Robust Error Handling:** Implement consistent, informative error responses and integrate thorough test suites to guarantee smooth operation and backward compatibility as the API evolves.
6363

64-
https://github.com/user-attachments/assets/8971651d-9aff-4b41-94ca-9a35b9241f22
64+
<https://github.com/user-attachments/assets/8971651d-9aff-4b41-94ca-9a35b9241f22>
6565

6666
<div align="center">
6767
<h3 style="color: #4CAF50;">Total Visitors</h3>

Workloads-Specific/DataFactory/BestPractices.md

Lines changed: 5 additions & 1 deletion
@@ -6,7 +6,7 @@ Costa Rica
66
[![GitHub](https://img.shields.io/badge/--181717?logo=github&logoColor=ffffff)](https://github.com/)
77
[brown9804](https://github.com/brown9804)
88

9-
Last updated: 2025-05-02
9+
Last updated: 2025-05-03
1010

1111
----------
1212

@@ -56,6 +56,9 @@ Last updated: 2025-05-02
5656

5757
</details>
5858

59+
<div align="center">
60+
<img src="https://github.com/user-attachments/assets/658689cd-f045-491f-996c-e64e4008acd1" alt="Centered Image" style="border: 2px solid #4CAF50; border-radius: 5px; padding: 5px;"/>
61+
</div>
5962

6063
## Clear Pipeline Structure
6164

@@ -364,6 +367,7 @@ graph TD
364367
## Source Control
365368

366369
> Benefits of Git Integration: <br/>
370+
>
367371
> - **Version Control**: Track and audit changes, and revert to previous versions if needed. <br/>
368372
> - **Collaboration**: Multiple team members can work on the same project simultaneously. <br/>
369373
> - **Incremental Saves**: Save partial changes without publishing them live. <br/>
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
# Data Science - Best Practices Overview
2+
3+
Costa Rica
4+
5+
[![GitHub](https://badgen.net/badge/icon/github?icon=github&label)](https://github.com)
6+
[![GitHub](https://img.shields.io/badge/--181717?logo=github&logoColor=ffffff)](https://github.com/)
7+
[brown9804](https://github.com/brown9804)
8+
9+
Last updated: 2025-05-03
10+
11+
----------
12+
13+
<details>
14+
<summary><b>List of References</b> (Click to expand)</summary>
15+
16+
</details>
17+
18+
<div align="center">
19+
<h3 style="color: #4CAF50;">Total Visitors</h3>
20+
<img src="https://profile-counter.glitch.me/brown9804/count.svg" alt="Visitor Count" style="border: 2px solid #4CAF50; border-radius: 5px; padding: 5px;"/>
21+
</div>
Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
1+
# Data Warehouse - Best Practices Overview
2+
3+
Costa Rica
4+
5+
[![GitHub](https://badgen.net/badge/icon/github?icon=github&label)](https://github.com)
6+
[![GitHub](https://img.shields.io/badge/--181717?logo=github&logoColor=ffffff)](https://github.com/)
7+
[brown9804](https://github.com/brown9804)
8+
9+
Last updated: 2025-05-03
10+
11+
----------
12+
13+
> Ensure that your data warehouse solution is engineered for scalability, resilience, and efficient integration of diverse data sources. Every component (from the core warehouse to mirrored databases) should adhere to strict best practices for structure, documentation, and management, ensuring long-term maintainability and robust disaster recovery.
14+
15+
<details>
16+
<summary><b>List of References</b> (Click to expand)</summary>
17+
18+
- [Ingest data into the Warehouse](https://learn.microsoft.com/en-us/fabric/data-warehouse/ingest-data)
19+
- [Performance guidelines in Fabric Data Warehouse](https://learn.microsoft.com/en-us/fabric/data-warehouse/guidelines-warehouse-performance)
20+
21+
</details>
22+
23+
<details>
24+
<summary><b>Table of Contents</b> (Click to expand)</summary>
25+
26+
- [Sample Warehouse Environment](#sample-warehouse-environment)
27+
- [Structured Warehouse Implementation](#structured-warehouse-implementation)
28+
- [Interactive Notebooks for Data Warehousing](#interactive-notebooks-for-data-warehousing)
29+
- [Using Mirroring to Your Benefit](#using-mirroring-to-your-benefit)
30+
31+
</details>
32+
33+
<div align="center">
34+
<img src="https://github.com/user-attachments/assets/47c01e2a-48aa-4bc5-9a0f-fd2630618687" alt="Centered Image" style="border: 2px solid #4CAF50; border-radius: 5px; padding: 5px;"/>
35+
</div>
36+
37+
## Sample Warehouse Environment
38+
39+
> Develop an isolated sample warehouse to prototype, test, and train on the data warehouse structure. This environment mimics the production warehouse architecture but contains a representative subset of data. Its purpose is to validate new queries, ETL routines, and performance tuning while insulating production operations from potential disruptions. You can deploy a sample warehouse using anonymized or synthetic data. For example, use a smaller, mirrored version of the production warehouse structure to experiment with SQL queries, develop new ETL pipelines, or train team members without impacting live data and processes.
40+
41+
<https://github.com/user-attachments/assets/acaecdd1-e81c-4e3a-b14a-db054f700f3e>
42+
43+
## Structured Warehouse Implementation
44+
45+
> Build a robust, centralized data warehouse that organizes data into well-defined layers (often referred to as Bronze, Silver, and Gold). Layering the data warehouse ensures fast query performance, streamlined management, and strong governance. Leverage proper indexing, partitioning schemes, metadata tagging, and lineage tracking to support compliance and facilitate troubleshooting.
46+
47+
Create a warehouse solution that segments data as follows:
48+
49+
- Bronze Layer: Ingests raw, untransformed data maintaining source fidelity.
50+
- Silver Layer: Applies data cleansing, validation, and enrichment.
51+
- Gold Layer: Produces analytics-ready data using optimized storage formats like Parquet or Delta Lake, with partitioning by date or region. Integrate metadata catalogs and RBAC controls for added governance.
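
The three layers above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a Fabric API: the function names (`to_bronze`, `to_silver`, `to_gold`), the sample rows, and the column names are hypothetical, and in-memory lists stand in for warehouse tables.

```python
from collections import defaultdict
from datetime import date


def to_bronze(raw_rows):
    """Bronze: keep records exactly as received (source fidelity)."""
    return [dict(row) for row in raw_rows]


def to_silver(bronze_rows):
    """Silver: cleanse and validate; rows that fail validation are dropped."""
    silver = []
    for row in bronze_rows:
        try:
            silver.append({
                "order_date": date.fromisoformat(row["order_date"].strip()),
                "region": row["region"].strip().upper(),
                "amount": float(row["amount"]),
            })
        except (KeyError, ValueError, TypeError, AttributeError):
            # Hypothetical policy: skip bad rows. A real pipeline might
            # route them to a quarantine table instead.
            continue
    return silver


def to_gold(silver_rows):
    """Gold: analytics-ready totals keyed by (date, region)."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[(row["order_date"], row["region"])] += row["amount"]
    return dict(totals)


# Hypothetical sample input; the third row has an invalid date.
RAW = [
    {"order_date": "2025-05-01", "region": " emea ", "amount": "10.5"},
    {"order_date": "2025-05-01", "region": "EMEA", "amount": "4.5"},
    {"order_date": "not-a-date", "region": "AMER", "amount": "3"},
]

gold = to_gold(to_silver(to_bronze(RAW)))
print(gold)
```

Keeping each layer as its own function mirrors the separation of concerns in the list: bronze never alters data, silver owns every validation rule, and gold only aggregates already-clean rows.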
52+
53+
> Here is a [reference of a medallion architecture using only Fabric](./Medallion_Architecture/). If you need to handle `complex data transformations and large-scale data processing`, you can use our combined solution of **Fabric + Databricks**. This powerful combination leverages the strengths of both platforms to provide a robust data processing pipeline. This workshop on [Fabric with Databricks for Data Analytics](https://microsoft.github.io/TechExcel-Fabric-with-Databricks-for-Data-Analytics/) offers a comprehensive step-by-step guide on developing Medallion Architecture using Fabric and Databricks. <br/>
54+
55+
| Medallion Architecture using only Fabric | Medallion Architecture Fabric + Databricks |
56+
| --- | --- |
57+
| <img width="550" alt="image" src="https://github.com/user-attachments/assets/b4394d54-9bb0-453b-abf8-cfaaa8e532d2" /> | <img width="550" alt="image" src="https://github.com/user-attachments/assets/c866098c-ffd1-4438-bc77-565786c91601"> |
58+
59+
## Interactive Notebooks for Data Warehousing
60+
61+
> Use interactive notebooks as exploratory and documentation tools for your warehouse operations. These notebooks serve as an effective interface for testing queries, performing data analysis, and capturing transformation logic. Rich markdown annotations, code segmentation, and version control increase collaboration while ensuring reproducibility across the team.
62+
63+
Create notebooks that are segmented into distinct sections:
64+
65+
- Data Loading: Scripts to pull data from the warehouse.
66+
- Data Transformation: Blocks that illustrate cleaning and enrichment steps.
67+
- Analysis & Visualization: SQL queries and charts generated from warehouse data, supplemented with detailed markdown explanations and inline comments to clarify business logic.
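
As a rough sketch of that notebook layout, the cells below follow the same three sections. Everything here is an assumption for illustration: an in-memory SQLite database stands in for the warehouse connection, and the `sales` table, its columns, and its rows are hypothetical.

```python
import sqlite3

# --- Data Loading: pull rows from the warehouse ---
# Assumption: an in-memory SQLite database stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("emea", 10.0), ("EMEA ", 5.0), ("amer", None)],
)
rows = conn.execute("SELECT region, amount FROM sales").fetchall()

# --- Data Transformation: cleaning and enrichment ---
# Normalize region names and drop rows with a missing amount.
clean = [
    (region.strip().upper(), amount)
    for region, amount in rows
    if amount is not None
]

# --- Analysis & Visualization: summarize for reporting ---
# A text summary stands in for a chart here.
totals = {}
for region, amount in clean:
    totals[region] = totals.get(region, 0.0) + amount

for region, total in sorted(totals.items()):
    print(f"{region}: {total:.2f}")

conn.close()
```

In a real notebook each commented section would be its own cell, preceded by a markdown cell that explains the business logic behind it.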
68+
69+
## Using Mirroring to Your Benefit
70+
71+
> Mirroring offers a modern, efficient way to continuously and seamlessly access and ingest data from operational databases or data warehouses. It works by replicating a snapshot of the source database into OneLake, and then keeping that replica in near real-time sync with the original. This ensures that your data is always up to date and readily available for analytics or downstream processing. `As part of the value offering, each Fabric compute SKU includes a built-in allowance of free Mirroring storage, proportional to the compute capacity you provision. For example, provisioning an F64 SKU grants you 64 terabytes of free Mirroring storage. You only begin incurring OneLake storage charges if your mirrored data exceeds this free limit or if the compute capacity is paused.` Click [here](https://azure.microsoft.com/en-us/pricing/details/microsoft-fabric/?msockid=38ec3806873362243e122ce086486339) to read more about it.
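
The storage allowance described above can be expressed as a small calculation. This is a hedged sketch: it assumes the allowance scales one terabyte per capacity unit (generalizing from the F64 example) and that all mirrored storage becomes billable while the capacity is paused. The function name and parameters are hypothetical; check the pricing page for the authoritative rules.

```python
def billable_mirroring_tb(sku_capacity_units, mirrored_tb, capacity_paused=False):
    """Estimate mirrored storage (in TB) billed at OneLake rates.

    Assumptions (not confirmed by this document beyond the F64 example):
    - the free allowance is 1 TB per capacity unit, e.g. F64 -> 64 TB;
    - while the capacity is paused, the free allowance does not apply.
    """
    if sku_capacity_units <= 0:
        raise ValueError("sku_capacity_units must be positive")
    if mirrored_tb < 0:
        raise ValueError("mirrored_tb cannot be negative")
    if capacity_paused:
        return float(mirrored_tb)
    return max(0.0, float(mirrored_tb) - sku_capacity_units)


print(billable_mirroring_tb(64, 50))                        # within the allowance
print(billable_mirroring_tb(64, 70))                        # 6 TB over the allowance
print(billable_mirroring_tb(64, 50, capacity_paused=True))  # paused capacity
```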
72+
73+
<div align="center">
74+
<img src="https://github.com/user-attachments/assets/ed868665-1823-42ff-9cd7-d0ee3310c184" alt="Centered Image" style="border: 2px solid #4CAF50; border-radius: 5px; padding: 5px;"/>
75+
</div>
76+
77+
| **Mirroring Option** | Details |
78+
|--------------------------------------------------|--------------------|
79+
| **Mirrored Azure SQL Database** | Configure a mirrored Azure SQL Database with geo-redundancy and automatic failover. For example, use Azure’s built-in replication to maintain a secondary copy that seamlessly takes over during primary instance outages, ensuring continuous data availability. |
80+
| **Mirrored Snowflake** | Deploy a Snowflake mirror by setting up data replication between your primary instance and a secondary environment. Regularly validate synchronization and monitor rollback capabilities to confirm that the mirror remains current and can support operations during failover or testing cycles. |
81+
| **Mirrored Azure Cosmos DB** | Configure an Azure Cosmos DB mirroring setup in preview mode that replicates data across multiple regions. Test the environment by simulating high-load queries and failover events to ensure that global access is maintained with minimal latency. |
82+
| **Mirrored Azure Database for PostgreSQL** | Set up a mirrored Azure Database for PostgreSQL in its preview configuration. Create read replicas with continuous synchronization, perform failover drills, and track replication latency to guarantee that the mirrored instance maintains data integrity and high availability during operational stress. |
83+
| **Mirrored Azure SQL Managed Instance** | Configure an Azure SQL Managed Instance in a mirrored setup using strategies like log shipping or transactional replication. Monitor key performance metrics to ensure that replication latency is minimal and that the mirror can support a swift transition during outages or maintenance windows. |
84+
| **Mirrored Database** | Set up a mirrored database configuration that synchronizes periodically with a primary instance. Schedule automated tests and synchronization checks, and simulate failover events to validate that the data remains consistent, with built-in alerts and monitoring demonstrating the mirror’s readiness for production use. |
85+
86+
<div align="center">
87+
<h3 style="color: #4CAF50;">Total Visitors</h3>
88+
<img src="https://profile-counter.glitch.me/brown9804/count.svg" alt="Visitor Count" style="border: 2px solid #4CAF50; border-radius: 5px; padding: 5px;"/>
89+
</div>
