From 4820ff26f6966f7c298ebb411aede87f511e6534 Mon Sep 17 00:00:00 2001
From: Shima
Date: Wed, 12 Nov 2025 14:05:47 +0100
Subject: [PATCH] Update group-by file.

---
 examples/group-by.md | 42 +++++++++++++++++++++++++++---------------
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/examples/group-by.md b/examples/group-by.md
index 66b34500c..a366a00ee 100644
--- a/examples/group-by.md
+++ b/examples/group-by.md
@@ -1,39 +1,51 @@
 # Group check results by category with Soda Core

-You can use a SQL query in a failed row check to group failed check results by one or more categories using Soda Core.
+You can use a SQL query in a failed row check to group failed check results by one or more categories using Soda Core. This approach is particularly useful when you want to identify specific subsets of data that fail quality checks and understand patterns in your data quality issues.

 Use a SQL editor to build and test a SQL query with your data source, then add the query to a failed rows check to execute it during a Soda scan.

-The following example illustrates how to build a query that identifies the countries where the average age of people is less than 25.
+## Example: Identifying countries with low average age
+
+The following example demonstrates how to build a query that identifies countries where the average age of people is less than 25. This step-by-step approach helps you develop and test your query before implementing it in Soda Core.
+
+1. Beginning with a basic query, the output shows the data this example works with.

-1. Begining with a basic query, the output shows the data this example works with.
 ```sql
 SELECT * FROM Customers;
 ```
-![group-by-1](/docs/assets/images/group-by-1.png){:height="600px" width="600px"}
+
+![group-by-1](/docs/assets/images/group-by-1.png)
+
 2. Build a query to select groups with the relevant aggregations.
+
 ```sql
 SELECT country, AVG(age) as avg_age
 FROM Customers
 GROUP BY country
 ```
-![group-by-2](/docs/assets/images/group-by-2.png){:height="600px" width="600px"}
+
+![group-by-2](/docs/assets/images/group-by-2.png)
+
 3. Identify the "bad" group (where the average age is less than 25) from among the grouped results.
+
 ```sql
-  SELECT country, AVG(age) as avg_age
-  FROM Customers
-  GROUP BY country
-  HAVING AVG(age) < 25
+SELECT country, AVG(age) as avg_age
+FROM Customers
+GROUP BY country
+HAVING AVG(age) < 25
 ```
-![group-by-3](/docs/assets/images/group-by-3.png){:height="600px" width="600px"}
+
+![group-by-3](/docs/assets/images/group-by-3.png)
+
 4. Now that the query yields the expected results, add the query to a failed row check, as per the following example.
+
 ```yaml
 checks for dim_customers:
   - failed rows:
       name: Average age of citizens is less than 25
       fail query: |
-          SELECT country, AVG(age) as avg_age
-          FROM Customers
-          GROUP BY country
-          HAVING AVG(age) < 25
-```
\ No newline at end of file
+        SELECT country, AVG(age) as avg_age
+        FROM Customers
+        GROUP BY country
+        HAVING AVG(age) < 25
+```
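
Note for reviewers: the final `HAVING` query in this patch can be sanity-checked without a Soda installation or a real data source. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the data source. The sample rows, the `name` column, and the `failing_groups` helper are hypothetical and exist only for illustration; only the `Customers` table name, the `country` and `age` columns, and the query itself come from the documentation page.

```python
import sqlite3

# Hypothetical sample data; the real Customers table lives in your data source.
ROWS = [
    ("Ana", "Portugal", 22),
    ("Rui", "Portugal", 24),
    ("Lena", "Germany", 41),
    ("Jonas", "Germany", 35),
    ("Mika", "Finland", 19),
    ("Aino", "Finland", 33),
]

# The same query the patch places under `fail query` in the failed rows check.
FAIL_QUERY = """
SELECT country, AVG(age) AS avg_age
FROM Customers
GROUP BY country
HAVING AVG(age) < 25
"""


def failing_groups(rows):
    """Load rows into an in-memory SQLite table and return the groups the query flags."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Customers (name TEXT, country TEXT, age INTEGER)"
        )
        conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", rows)
        return conn.execute(FAIL_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for country, avg_age in failing_groups(ROWS):
        print(f"{country}: average age {avg_age:.1f}")
```

With this sample data only Portugal (average age 23) falls below the threshold, so it is the single group the check would report as failed. SQLite is used here purely for convenience; SQL dialect details may differ in the data source that Soda actually scans.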