2 changes: 1 addition & 1 deletion Sources/onboarding-brewery-solution
Submodule onboarding-brewery-solution updated 43 files
+0 −3 .gitignore
+20 −0 API/Scenario_parameters_whatif.json
+6 −0 API/Scenario_whatif.json
+59 −0 API/Solution.yaml
+105 −0 API/Solution_etl.yaml
+8 −0 API/deployment_info.sh
+4 −4 Simulation/BusinessApp_Simulation.sml.xml
+188 −0 Simulation/CSV_Simulation.sml.xml
+95 −0 Simulation/Resource/Brewery.ist.xml
+23 −0 Simulation/Resource/InstanceCalibration.ini.xml
+0 −0 Simulation/Resource/scenariorun-data/Bar.csv
+1 −1 Simulation/Resource/scenariorun-data/Customer.csv
+1 −1 Simulation/Resource/scenariorun-data/arc_to_Customer.csv
+144 −0 Simulation/XML_Simulation.sml.xml
+4 −0 code/requirements.txt
+0 −9 code/run_templates/ETL/run.json
+0 −15 code/run_templates/Example/run.json
+155 −0 code/run_templates/etl_with_azure_storage/etl.py
+17 −0 code/run_templates/etl_with_azure_storage/run.json
+147 −0 code/run_templates/etl_with_local_file/etl.py
+17 −0 code/run_templates/etl_with_local_file/run.json
+2 −0 code/run_templates/orchestrator_tutorial/dataset/Bar.csv
+64 −0 code/run_templates/orchestrator_tutorial/dataset/Customer.csv
+499 −0 code/run_templates/orchestrator_tutorial/dataset/arc_to_Customer.csv
+20 −0 code/run_templates/orchestrator_tutorial/parameters/parameters.json
+4 −0 code/run_templates/orchestrator_tutorial/replace_scenariorun_data.sh
+4 −0 code/run_templates/orchestrator_tutorial/restore_scenariorun_data.sh
+69 −0 code/run_templates/orchestrator_tutorial/run.json
+0 −15 code/run_templates/scripts/display_csm_env.py
+0 −9 code/run_templates/scripts/get_logger.py
+0 −21 code/run_templates/scripts/run_simulator.py
+0 −15 code/run_templates/scripts/use_default_dataset.py
+0 −0 code/run_templates/what_if/parameters_handler/__init__.py
+118 −0 code/run_templates/what_if/parameters_handler/main.py
+0 −0 code/run_templates/what_if/parameters_handler/tests/__init__.py
+2 −0 code/run_templates/what_if/parameters_handler/tests/test-data/expected-main/Bar.csv
+2 −0 code/run_templates/what_if/parameters_handler/tests/test-data/expected-update-dataset/Bar.csv
+3 −0 code/run_templates/what_if/parameters_handler/tests/test-data/scenariorun-data/Bar.csv
+5 −0 code/run_templates/what_if/parameters_handler/tests/test-data/scenariorun-data/Customer.csv
+9 −0 code/run_templates/what_if/parameters_handler/tests/test-data/scenariorun-data/arc_to_Customer.csv
+20 −0 code/run_templates/what_if/parameters_handler/tests/test-data/scenariorun-parameters/parameters.json
+170 −0 code/run_templates/what_if/parameters_handler/tests/test_main.py
+1 −3 project.csm
9 changes: 7 additions & 2 deletions Tutorial/docs/tutorials/web-app/.nav.yml
@@ -1,5 +1,10 @@
nav:
- index.md
- Frontend: frontend.md
- ADX database: adx-database.md
- Power BI: power-bi.md
- ADX:
- Overview: adx-database.md
- Functions & Performance: adx-functions-performance.md
- Power BI:
- Overview: power-bi.md
- Parameters & Reuse: power-bi-parameters.md
- Embedding & Security: power-bi-embedding.md
204 changes: 202 additions & 2 deletions Tutorial/docs/tutorials/web-app/adx-database.md
@@ -1,3 +1,203 @@
# ADX database
---
title: ADX Database Overview
summary: Integrate and manage the Azure Data Explorer (ADX) layer for your web application.
tags:
- adx
- data
- web-app
---

Integrate and manage the ADX database for your web application.
# ADX Database Overview

This page introduces how the web-app consumes data from Azure Data Explorer (ADX) and how to structure your data estate for analytics & dashboards.

## 1. Why ADX for the Solution?

- Sub-second exploratory analytics on time-series & events
- Native integration with Power BI (DirectQuery / import) and Azure ecosystem
- Scalable ingestion & retention policies

## 2. High-Level Flow (Project-specific)

```
Sources (onboarding data / storage)
  └─> ETL (Cosmo Tech runner templates)
        - code/run_templates/etl_with_local_file
        - code/run_templates/etl_with_azure_storage
  └─> Backend pipelines (Solution_etl.yaml, Create.kql)
        - Apply ADX tables/mappings/policies
        - Register update policies / materialized views
        - Create serving functions
  → ADX Raw / Prepared Tables
  → Serving Functions
  → Power BI (embedded via backend)
```

Notes:
- The backend is the source of truth for ADX artifacts via `Create.kql` and CI/CD.
- Onboarding data is optional and used only in non-prod to validate end-to-end.

## 3. Core Concepts (in this project)

| Concept | Where it’s defined | Example |
|---------|---------------------|---------|
| Tables & Mappings | Backend `Create.kql` | `BrewerEvents`, JSON/CSV mappings |
| Update Policy / MV | Backend `Create.kql` | Raw -> Cleaned / 15-min aggregates |
| Serving Function | Backend KQL scripts | `Dashboard_Fact(...)` |

Avoid manual table/function creation in shared environments; propose changes via PR to the backend scripts.
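
For orientation, the sketch below shows roughly what such artifacts look like in a `Create.kql`-style script. The table, mapping, and target names are hypothetical and follow the examples on this page, not the actual backend script.

```kusto
// Hypothetical sketch only - the authoritative definitions live in the backend Create.kql.
// CSV ingestion mapping for the raw table (ordinals must match the source file columns):
.create-or-alter table BrewerEvents ingestion csv mapping 'BrewerEvents_CSV'
'[{"column":"Timestamp","Properties":{"Ordinal":"0"}},{"column":"Site","Properties":{"Ordinal":"1"}},{"column":"BatchId","Properties":{"Ordinal":"2"}},{"column":"EventType","Properties":{"Ordinal":"3"}},{"column":"DurationSeconds","Properties":{"Ordinal":"4"}}]'

// Update policy: each ingestion into BrewerEvents also runs a cleaning query
// whose output lands in CleanedBrewerEvents (same schema as the query result):
.alter table CleanedBrewerEvents policy update
'[{"IsEnabled": true, "Source": "BrewerEvents", "Query": "BrewerEvents | where isnotempty(Site)", "IsTransactional": false}]'
```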

## 4. Schema Source of Truth & Design Patterns

`Create.kql` dictates the schema. Use these patterns when proposing changes:

| Column Type | Purpose | Naming |
|-------------|---------|--------|
| Timestamp | Query time axis | `Timestamp` first column |
| Entity Keys | Filter dimensions | `Site`, `BatchId`, `Product` |
| Measures | Numeric metrics | `DurationSeconds`, `Temperature` |
| Status / Type | Categorical attributes | `EventType` |
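
As a concrete illustration of this pattern (hypothetical columns; the authoritative schema stays in `Create.kql`), a proposed table might look like:

```kusto
// Timestamp first, then entity keys, then categorical attributes and measures.
.create-merge table BrewerEvents (
    Timestamp: datetime,
    Site: string,
    BatchId: string,
    Product: string,
    EventType: string,
    DurationSeconds: real,
    Temperature: real
)
```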

Retention example (applied by backend):

```kusto
.alter table BrewerEvents policy retention softdelete = 30d
```

## 5. Ingestion Tips (Project-specific)

- For dev/onboarding data, use the provided ETL templates to generate the source dataset:
- Local file template: `Sources/onboarding-brewery-solution/code/run_templates/etl_with_local_file/etl.py`
- Azure Storage template: `Sources/onboarding-brewery-solution/code/run_templates/etl_with_azure_storage/etl.py`
- Sample inputs live in `Sources/onboarding-brewery-solution/Simulation/Resource/scenariorun-data/` and, when zipped, under `reference/Nodes/*.csv` and `reference/Edges/*.csv`.
- ADX ingestion is executed by the backend pipeline using `Create.kql` mappings. Do not ingest directly into curated tables. If you need to validate a schema change, ingest into a staging table defined in `Create.kql` (see the dev-only sketch after this list) and let update policies/materialized views populate the serving tables.
- Keep file formats aligned with the repository (CSV for Nodes/ and Edges/). For larger volumes in higher environments, consider Parquet in storage and update the ingestion mapping in `Create.kql` accordingly.
- For near real-time flows, rely on backend-managed connectors and orchestration defined under `Sources/onboarding-brewery-solution/API/` (e.g., `Solution_etl.yaml`) rather than ad-hoc ingestion; coordinate with backend to register any new sources.
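
For the staging-table validation mentioned above, a dev-only one-off ingestion could look like the sketch below. The table name, blob URI, SAS token, and mapping reference are placeholders; never run this against a shared or production database.

```kusto
// Dev-only sketch: load one sample CSV into a staging table defined in Create.kql.
.ingest into table Staging_BrewerEvents (
    h'https://<storageaccount>.blob.core.windows.net/<container>/Bar.csv?<sas-token>'
) with (
    format = 'csv',
    ingestionMappingReference = 'BrewerEvents_CSV',
    ignoreFirstRecord = true
)
```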

## 6. Schema Provisioning via Backend

In this project, the ADX schema (tables, mappings, policies) is defined in the backend `Create.kql` script and deployed through backend pipelines. Do not recreate tables manually from the UI.

Workflow for changes:
1. Propose updates in `Create.kql` (new columns, policies, mappings)
2. Run local validation against a dev cluster (optional)
3. Open a PR; backend CI applies changes to non-prod
4. After approval, pipeline promotes to production

Keep serving function contracts stable; if a breaking change is unavoidable, version the function (e.g., `Dashboard_Fact_v2`).
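
A minimal sketch of that versioning approach (hypothetical names, reusing the `BrewerEvents` example from this page):

```kusto
// Publish the breaking revision under a versioned name and keep Dashboard_Fact
// untouched until every consumer (Power BI, services) has migrated.
.create-or-alter function with (folder = 'serving', docstring = 'v2 of the dashboard serving contract')
Dashboard_Fact_v2(startTime:datetime, endTime:datetime, site:string, product:string)
{
    BrewerEvents
    | where Timestamp between (startTime .. endTime)
    | where Site == site
    | where product == '' or Product == product
    | summarize Events = count(), TotalDuration = sum(DurationSeconds)
}
```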

## 7. Serving Functions

All dashboard/API queries should route via serving functions (see: [ADX Functions & Performance](./adx-functions-performance.md)).

Benefits:
- Centralize filter logic
- Enforce parameter contracts
- Reduce duplication across Power BI & services

## 8. Governance & Source Control

- Store Kusto artifacts in backend repo alongside `Create.kql`.
- Use PRs and CI to apply changes to dev first, then promote.

## 9. Observability

Use built-in commands:
```kusto
.show table BrewerEvents details
.show functions
.show queries | where StartedOn > ago(15m)
.show operations | where StartedOn > ago(1h) and State != 'Completed'
// If using managed ingestion into tables, check ingestion state
.show ingestion failures | where Database == '<database>' and Table == 'BrewerEvents' and FailedOn > ago(1h)
```

## 10. Next

Proceed to implementing reusable functions and performance tuning.

> See: [Functions & Performance](./adx-functions-performance.md)

## 11. Junior Onboarding: ADX Setup (Step-by-step)

This walkthrough assumes:
- You have access to the Azure subscription and target ADX cluster/database
- Backend owns deployment of schema via `Create.kql`
- You have the onboarding repo checked out at `Sources/onboarding-brewery-solution`

### A. Prerequisites (local)

- Azure CLI (logged in): `az login`
- ADX permissions: can view database and run read-only queries
- Coordinates from your team:
- Cluster URI: `https://<cluster>.<region>.kusto.windows.net`
- Database name: `<database>`
- Backend pipeline path for `Create.kql` (read-only)
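
With those coordinates in hand, a minimal read-only check from the ADX Web UI (https://dataexplorer.azure.com) or Kusto Explorer confirms you can reach the right database:

```kusto
// Should return a single row with the cluster, database, and current time
// if your connection and permissions are in place.
print cluster = current_cluster_endpoint(), database = current_database(), at = now()
```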

### B. Provision (done by Backend)

Backend CI will apply `Create.kql` to provision tables, mappings, policies and serving functions. As a junior, you don’t run schema changes yourself.

What you should do: verify that objects exist after deployment.

- In Kusto Explorer or Web UI, run:
```kusto
.show tables
.show functions
```
Confirm presence of domain tables (e.g., `BrewerEvents`) and serving functions (e.g., `Dashboard_Fact` or similar per project naming).
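
To narrow the check to the objects you expect (the name prefixes below follow this page's examples and may differ in your project), filter the output:

```kusto
// List only the domain tables and serving functions you care about.
.show tables | where TableName startswith 'Brewer'
.show functions | where Name startswith 'Dashboard_'
```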

### C. Load Sample Data (optional, non-prod)

If the environment is empty and your team allows sample data loads, use the onboarding dataset.

Location in repo:
- `Sources/onboarding-brewery-solution/Simulation/Resource/scenariorun-data/`
- `Bar.csv`, `Customer.csv`, `arc_to_Customer.csv` (example graph data)

Typical approaches (confirm which is permitted):
- Use backend ETL runner templates under `code/run_templates/` (e.g., `etl_with_local_file`) — usually interacts with Cosmo Tech API & datasets
- Or a one-off ADX ingestion (dev only) via Web UI upload to a staging table that matches `Create.kql` schemas

Ask your mentor which path to use in your environment.

### D. Sanity Queries (read-only)

Run simple queries to check data landed and functions work:
```kusto
// Replace names with your actual tables/functions
BrewerEvents | take 5
BrewerEvents | summarize count() by Site
Dashboard_Fact(ago(12h), now(), 'Lyon_01', '') | take 10
```

Expect a few rows and reasonable counts. If empty:
- Confirm ingestion happened
- Check RLS/filters (if querying through Power BI)

### E. Power BI Integration Check

This project embeds Power BI via backend. Your tasks:
- Confirm the dataset’s main query references the serving function(s)
- Validate that parameter values (Site, TimeWindowHours) match what the function expects
- Open the embedded dashboard and verify visuals populate for a known site/time window

If visuals are blank, coordinate with backend to check RLS mapping and embed token scopes.
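
A quick way to confirm the parameter contract on the ADX side (hypothetical function name) is to inspect the declared signature and compare it with what the Power BI query passes:

```kusto
// Returns the function's Name, Parameters, Body, Folder, and DocString.
.show function Dashboard_Fact
```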

### F. Change Requests

When you need schema or serving function changes:
1. Discuss with backend owner
2. Propose edits in `Create.kql` or function `.kql` files
3. Open a PR → CI applies to dev → validate → promote

Avoid ad-hoc manual changes in shared environments.

### G. What to Document During Onboarding

- Cluster/Database you used
- Tables/functions you validated
- Screenshots of the sanity queries results
- Any discrepancies found & who you contacted

This helps the next newcomer follow the same path.
177 changes: 177 additions & 0 deletions Tutorial/docs/tutorials/web-app/adx-functions-performance.md
@@ -0,0 +1,177 @@
---
title: ADX Functions & Performance
summary: Create reusable functions and optimize Kusto queries powering the web-app dashboards.
tags:
- adx
- performance
- web-app
- optimization
---

# ADX Functions & Performance

This guide shows how to encapsulate logic into Azure Data Explorer (ADX) functions and optimize query performance for responsive dashboards.

> Focus areas:
> 1. Reuse logic across dashboards & web-app deployments
> 2. Reduce data scanned & transferred to Power BI
> 3. Improve freshness vs cost balance
> 4. Make parameters first-class citizens

## 1. Organize Your ADX Layer

| Layer | Purpose | Artifacts |
|-------|---------|-----------|
| Raw ingestion | Land data as-is | Ingestion mappings, staging tables |
| Harmonized | Cleaned, typed, conformed | Update policies, materialized views |
| Serving | Query-ready, thin, parameterizable | Functions, export tables |

Keep functions in a dedicated folder in source control (e.g. `infrastructure/adx/functions/`). Version them and deploy declaratively (ARM/Bicep/Terraform / Kusto scripts).

## 2. Defining Functions

Two main kinds:
- Inline (lightweight) functions
- Stored functions created once in the database

Example: `GetBrewEvents(startTime:datetime, endTime:datetime, site:string)`

```kusto
.create-or-alter function with (docstring = 'Brew events filtered by time & site', folder='serving/brew')
GetBrewEvents(startTime:datetime, endTime:datetime, site:string)
{
BrewerEvents
| where Timestamp between (startTime .. endTime)
| where Site == site
| project Timestamp, Site, BatchId, EventType, DurationSeconds
}
```

### Best Practices
- Use PascalCase for function names, camelCase for parameters.
- Provide `docstring` and `folder` for discoverability.
- Avoid hard-coded constants; expose as parameters.

## 3. Composing Functions

Create small focused functions and compose:

```kusto
// Base filter
.create-or-alter function GetBatches(startTime:datetime, endTime:datetime){
Batches | where StartTime between (startTime .. endTime)
| project BatchId, Product, StartTime, EndTime
}

// Duration enrichment
.create-or-alter function GetBatchDurations(startTime:datetime, endTime:datetime){
GetBatches(startTime, endTime)
| extend DurationMinutes = datetime_diff('minute', EndTime, StartTime)
}

// Aggregated KPI
.create-or-alter function GetOutputKPI(startTime:datetime, endTime:datetime){
GetBatchDurations(startTime, endTime)
| summarize AvgDuration = avg(DurationMinutes), TotalBatches = count()
}
```

## 4. Parameter Patterns for Power BI

When embedding, the web-app can compute a time window & site, then pass them as query parameters.

Pattern: a single root function that accepts all Power BI-relevant parameters and outputs a narrow dataset.

```kusto
.create-or-alter function Dashboard_Fact(startTime:datetime, endTime:datetime, site:string, product:string){
GetBrewEvents(startTime, endTime, site)
| where product == '' or Product == product
| summarize Events=count(), TotalDuration=sum(DurationSeconds)
}
```

In Power BI M query (parameterized):

```m
let
    StartTime = DateTimeZone.UtcNow() - #duration(0, 12, 0, 0),
    EndTime = DateTimeZone.UtcNow(),
    Site = Text.From(EnvSiteParameter),
    Product = "",
    Source = Kusto.Contents("https://<cluster>.<region>.kusto.windows.net", "<database>",
        "Dashboard_Fact(datetime(" & DateTimeZone.ToText(StartTime, "yyyy-MM-dd HH:mm:ss") & "), datetime(" & DateTimeZone.ToText(EndTime, "yyyy-MM-dd HH:mm:ss") & "), '" & Site & "', '" & Product & "')")
in
    Source
```

(Replace the dynamic site & product with actual PBI parameters linked to slicers.)

## 5. Performance Tactics

| Goal | Tactic | Notes |
|------|--------|-------|
| Min scan | Use narrower projection early | Always `project` right after filters |
| Fast filter | Use ingestion time partitioning & `between` | Align queries to partition keys |
| Reuse | Materialized views | For heavy joins/expensive lookups |
| Lower transfer | Summarize before export to PBI | Return aggregated rows, not raw events |
| Adaptive freshness | Tiered functions (raw vs MV) | Switch via a boolean parameter |

### Example: Hybrid Live + Historical

```kusto
.create-or-alter function Dashboard_Fact_Live(startAgo:timespan, site:string){
// Aggregate to the same schema as the historical branch so the union below can sum both.
GetBrewEvents(now()-startAgo, now(), site)
| summarize Events=count(), TotalDuration=sum(DurationSeconds)
}

.create-or-alter function Dashboard_Fact_History(startTime:datetime, endTime:datetime, site:string){
Materialized_BrewAgg
| where WindowStart between (startTime .. endTime)
| where Site == site
}

.create-or-alter function Dashboard_Fact_Union(startTime:datetime, endTime:datetime, liveAgo:timespan, site:string){
union isfuzzy=true
(Dashboard_Fact_History(startTime, endTime, site))
(Dashboard_Fact_Live(liveAgo, site))
| summarize Events=sum(Events), TotalDuration=sum(TotalDuration)
}
```
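
The history branch above reads from a pre-aggregated source such as `Materialized_BrewAgg`. A minimal sketch of that materialized view (hypothetical name and window size; in this project it would be provisioned via the backend Kusto scripts):

```kusto
// Incrementally maintained 15-minute, per-site aggregates over the raw events.
.create materialized-view Materialized_BrewAgg on table BrewerEvents
{
    BrewerEvents
    | summarize Events = count(), TotalDuration = sum(DurationSeconds)
        by Site, WindowStart = bin(Timestamp, 15m)
}
```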

## 6. Monitoring & Diagnostics

```kusto
.show functions
.show commands-and-queries | where StartedOn > ago(1h)
```

Enable diagnostic logs (Azure Monitor) and alerts for the following; a starting query is sketched below:
- Long-running queries > 15s
- High data scanned vs returned ratio
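
As a starting point for both alerts (column names as exposed by the standard `.show queries` output), something like this surfaces recent expensive queries:

```kusto
// Recent queries slower than 15 seconds, most expensive first; CacheStatistics
// and MemoryPeak help judge how much data was scanned versus returned.
.show queries
| where StartedOn > ago(1h) and Duration > 15s
| project StartedOn, Duration, Text, State, MemoryPeak, CacheStatistics
| order by Duration desc
```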

## 7. Deployment Automation

In this project, functions are deployed by the backend CI using the authoritative Kusto scripts. Prefer adding/altering function definitions in the backend repository and let the pipeline apply them idempotently.

If you need local validation, run them against a dev database from your admin tools, but do not hand-deploy to shared environments.

### CI Hints

- Store each function in its own `.kql` file with `create-or-alter`
- Group in folders by domain (e.g., `serving/brew`)
- Add lightweight unit checks (syntax validation) in CI
- Tag releases with a migration note

## 8. Checklist

- [ ] All dashboard queries go through a root serving function
- [ ] Parameters documented (name, type, default)
- [ ] Functions have docstrings & folder metadata
- [ ] Query completion p50 < 3s
- [ ] No function returns > 100k rows to Power BI
- [ ] Materialized views used for joins > 10M rows

## 9. Next Steps

Continue with Power BI parameterization to bind these functions to dynamic dashboards.

> See: [Power BI Parameters & Reuse](./power-bi-parameters.md)