Commit 1dabd66

Merge pull request #10652 from BohuTANG/doc-groupby
docs(group by): add group by query syntax
2 parents 7ad8c21 + b7fa0dc commit 1dabd66

18 files changed, +428 -129 lines changed

docs/doc/14-sql-commands/00-ddl/20-table/10-ddl-create-table.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -80,7 +80,7 @@ AS SELECT query
8080

8181
Creates a transient table.
8282

83-
Transient tables are used to hold transitory data that does not require a data protection or recovery mechanism. Databend does not hold historical data for a transient table so you will not be able to query from a previous version of the transient table with the Time Travel feature, for example, the [AT](./../../20-query-syntax/03-dml-at.md) clause in the SELECT statement will not work for transient tables. Please note that you can still [drop](./20-ddl-drop-table.md) and [undrop](./21-ddl-undrop-table.md) a transient table.
83+
Transient tables are used to hold transitory data that does not require a data protection or recovery mechanism. Databend does not hold historical data for a transient table so you will not be able to query from a previous version of the transient table with the Time Travel feature, for example, the [AT](./../../20-query-syntax/03-query-at.md) clause in the SELECT statement will not work for transient tables. Please note that you can still [drop](./20-ddl-drop-table.md) and [undrop](./21-ddl-undrop-table.md) a transient table.
8484

8585
Transient tables help save your storage expenses because they do not need extra space for historical data compared to non-transient tables. See [example](#create-transient-table-1) for detailed explanations.
8686

docs/doc/14-sql-commands/00-ddl/20-table/60-optimize-table.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ Snapshot, segment, and block are the concepts Databend uses for data storage. Da
1414

1515
Databend automatically creates table snapshots upon data updates. A snapshot represents a version of the table's segment metadata.
1616

17-
When working with Databend, you're most likely to access a snapshot with the snapshot ID when you retrieve and query a previous version of the table's data with the [AT](../../20-query-syntax/03-dml-at.md) clause.
17+
When working with Databend, you're most likely to access a snapshot with the snapshot ID when you retrieve and query a previous version of the table's data with the [AT](../../20-query-syntax/03-query-at.md) clause.
1818

1919
A snapshot is a JSON file that does not save the table's data but indicates the segments the snapshot links to. If you run [FUSE_SNAPSHOT](../../../15-sql-functions/111-system-functions/fuse_snapshot.md) against a table, you can find the saved snapshots for the table.
2020

docs/doc/14-sql-commands/00-ddl/20-table/70-ddl-restore-table.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ The capability to restore a table is subject to these conditions:
1414

1515
- You cannot roll back after restoring a table to a prior state, but you can restore the table again to an earlier state.
1616

17-
- Databend recommends this command for emergency recovery only. To query the history data of a table, use the [AT](../../20-query-syntax/03-dml-at.md) clause.
17+
- Databend recommends this command for emergency recovery only. To query the history data of a table, use the [AT](../../20-query-syntax/03-query-at.md) clause.
1818

1919
## Syntax
2020

docs/doc/14-sql-commands/20-query-syntax/01-dml-select.md renamed to docs/doc/14-sql-commands/20-query-syntax/01-query-select.md

Lines changed: 2 additions & 2 deletions
@@ -98,7 +98,7 @@ SELECT number FROM numbers(3) AS a;
9898

9999
## AT Clause
100100

101-
The AT clause enables you to query previous versions of your data. For more information, see [AT](./03-dml-at.md).
101+
The AT clause enables you to query previous versions of your data. For more information, see [AT](./03-query-at.md).
102102

103103
## WHERE Clause
104104

@@ -143,7 +143,7 @@ SELECT number%2 as c1, number%3 as c2, MAX(number) FROM numbers(10000) GROUP BY
143143
```
144144

145145

146-
`GROUP BY` can be extended with [GROUPING SETS](./21-grouping-sets.md) to do more complex grouping operations.
146+
`GROUP BY` can be extended with [GROUPING SETS](./07-query-group-by-grouping-sets.md) to do more complex grouping operations.
147147

148148
## HAVING Clause
149149

docs/doc/14-sql-commands/20-query-syntax/02-dml-with.md renamed to docs/doc/14-sql-commands/20-query-syntax/02-query-with.md

File renamed without changes.

docs/doc/14-sql-commands/20-query-syntax/03-dml-at.md renamed to docs/doc/14-sql-commands/20-query-syntax/03-query-at.md

File renamed without changes.

docs/doc/14-sql-commands/20-query-syntax/04-dml-join.md renamed to docs/doc/14-sql-commands/20-query-syntax/04-query-join.md

File renamed without changes.
Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
1+
---
2+
title: GROUP BY
3+
---
4+
5+
The GROUP BY clause in Databend SQL allows you to group rows sharing the same group-by-item expressions and apply aggregate functions to the resulting groups. A group-by-item expression can be a column name, a number referencing a position in the [SELECT](./01-query-select.md) list, or a general expression.
6+
7+
Extensions include [GROUP BY CUBE](./08-query-group-by-cube.md), [GROUP BY GROUPING SETS](./07-query-group-by-grouping-sets.md), and [GROUP BY ROLLUP](./09-query-group-by-rollup.md).
8+
9+
## Syntax
10+
11+
```sql
12+
SELECT ...
13+
FROM ...
14+
[ ... ]
15+
GROUP BY groupItem [ , groupItem [ , ... ] ]
16+
[ ... ]
17+
```
18+
19+
Where:
20+
```sql
21+
groupItem ::= { <column_alias> | <position> | <expr> }
22+
```
23+
24+
- `<column_alias>`: Column alias appearing in the query block’s SELECT list
25+
26+
- `<position>`: Position of an expression in the SELECT list
27+
28+
- `<expr>`: Any expression on tables in the current scope
29+
30+
31+
## Examples
32+
33+
Sample Data Setup:
34+
```sql
35+
-- Create a sample employees table
36+
CREATE TABLE employees (
37+
id INT,
38+
first_name VARCHAR(50),
39+
last_name VARCHAR(50),
40+
department_id INT,
41+
job_id INT,
42+
hire_date DATE
43+
);
44+
45+
-- Insert sample data into the employees table
46+
INSERT INTO employees (id, first_name, last_name, department_id, job_id, hire_date)
47+
VALUES (1, 'John', 'Doe', 1, 101, '2021-01-15'),
48+
(2, 'Jane', 'Smith', 1, 101, '2021-02-20'),
49+
(3, 'Alice', 'Johnson', 1, 102, '2021-03-10'),
50+
(4, 'Bob', 'Brown', 2, 201, '2021-03-15'),
51+
(5, 'Charlie', 'Miller', 2, 202, '2021-04-10'),
52+
(6, 'Eve', 'Davis', 2, 202, '2021-04-15');
53+
```
54+
55+
### Group By One Column
56+
57+
This query groups employees by their `department_id` and counts the number of employees in each department:
58+
```sql
59+
SELECT department_id, COUNT(*) AS num_employees
60+
FROM employees
61+
GROUP BY department_id;
62+
```
63+
64+
Output:
65+
```sql
66+
+---------------+---------------+
67+
| department_id | num_employees |
68+
+---------------+---------------+
69+
| 1 | 3 |
70+
| 2 | 3 |
71+
+---------------+---------------+
72+
```
73+
74+
### Group By Multiple Columns
75+
76+
This query groups employees by `department_id` and `job_id`, then counts the number of employees in each group:
77+
```sql
78+
SELECT department_id, job_id, COUNT(*) AS num_employees
79+
FROM employees
80+
GROUP BY department_id, job_id;
81+
```
82+
83+
Output:
84+
```sql
85+
+---------------+--------+---------------+
86+
| department_id | job_id | num_employees |
87+
+---------------+--------+---------------+
88+
| 1 | 101 | 2 |
89+
| 1 | 102 | 1 |
90+
| 2 | 201 | 1 |
91+
| 2 | 202 | 2 |
92+
+---------------+--------+---------------+
93+
```
94+
95+
### Group By Position
96+
97+
This query is equivalent to the "Group By One Column" example above. The position 1 refers to the first item in the SELECT list, which is `department_id`:
98+
```sql
99+
SELECT department_id, COUNT(*) AS num_employees
100+
FROM employees
101+
GROUP BY 1;
102+
```
103+
104+
Output:
105+
```sql
106+
+---------------+---------------+
107+
| department_id | num_employees |
108+
+---------------+---------------+
109+
| 1 | 3 |
110+
| 2 | 3 |
111+
+---------------+---------------+
112+
```
113+
114+
115+
### Group By Expression
116+
117+
This query groups employees by the year they were hired and counts the number of employees hired in each year:
118+
```sql
119+
SELECT EXTRACT(YEAR FROM hire_date) AS hire_year, COUNT(*) AS num_hires
120+
FROM employees
121+
GROUP BY EXTRACT(YEAR FROM hire_date);
122+
```
123+
124+
Output:
125+
```sql
126+
+-----------+-----------+
127+
| hire_year | num_hires |
128+
+-----------+-----------+
129+
| 2021 | 6 |
130+
+-----------+-----------+
131+
```
132+
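
The Syntax section above also lists `<column_alias>` as a valid group item, which none of the examples exercise. As a minimal sketch, assuming Databend resolves SELECT-list aliases in GROUP BY as that section describes, the "Group By Expression" query could be written against the same `employees` table using the `hire_year` alias instead of repeating the expression:

```sql
-- Illustrative sketch, not part of the committed page.
-- Groups by the SELECT-list alias hire_year rather than repeating
-- the EXTRACT expression in the GROUP BY clause.
SELECT EXTRACT(YEAR FROM hire_date) AS hire_year, COUNT(*) AS num_hires
FROM employees
GROUP BY hire_year;
```

If the alias is resolved as described, this should return the same single row as the expression form, since all six sample rows have a hire date in 2021.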
Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
1+
---
2+
title: GROUP BY GROUPING SETS
3+
---
4+
5+
`GROUP BY GROUPING SETS` is an extension of the [GROUP BY](./06-query-group-by.md) clause that computes multiple groupings in a single statement. Each grouping set is a set of dimension columns.
6+
7+
`GROUP BY GROUPING SETS` is equivalent to the UNION ALL of two or more GROUP BY operations, combined into a single result set:
8+
9+
- `GROUP BY GROUPING SETS((a))` is equivalent to the single grouping set operation `GROUP BY a`.
10+
11+
- `GROUP BY GROUPING SETS((a),(b))` is equivalent to `GROUP BY a UNION ALL GROUP BY b`.
12+
13+
## Syntax
14+
15+
```sql
16+
SELECT ...
17+
FROM ...
18+
[ ... ]
19+
GROUP BY GROUPING SETS ( groupSet [ , groupSet [ , ... ] ] )
20+
[ ... ]
21+
```
22+
23+
Where:
24+
```sql
25+
groupSet ::= { <column_alias> | <position> | <expr> }
26+
```
27+
28+
- `<column_alias>`: Column alias appearing in the query block’s SELECT list
29+
30+
- `<position>`: Position of an expression in the SELECT list
31+
32+
- `<expr>`: Any expression on tables in the current scope
33+
34+
35+
## Examples
36+
37+
Sample Data Setup:
38+
```sql
39+
-- Create a sample sales table
40+
CREATE TABLE sales (
41+
id INT,
42+
sale_date DATE,
43+
product_id INT,
44+
store_id INT,
45+
quantity INT
46+
);
47+
48+
-- Insert sample data into the sales table
49+
INSERT INTO sales (id, sale_date, product_id, store_id, quantity)
50+
VALUES (1, '2021-01-01', 101, 1, 5),
51+
(2, '2021-01-01', 102, 1, 10),
52+
(3, '2021-01-01', 101, 2, 15),
53+
(4, '2021-01-02', 102, 1, 8),
54+
(5, '2021-01-02', 101, 2, 12),
55+
(6, '2021-01-02', 103, 2, 20);
56+
```
57+
58+
### GROUP BY GROUPING SETS with column aliases
59+
60+
```sql
61+
SELECT product_id AS pid,
62+
store_id AS sid,
63+
SUM(quantity) AS total_quantity
64+
FROM sales
65+
GROUP BY GROUPING SETS((pid), (sid));
66+
```
67+
68+
This query is equivalent to:
69+
70+
```sql
71+
SELECT product_id AS pid,
72+
NULL AS sid,
73+
SUM(quantity) AS total_quantity
74+
FROM sales
75+
GROUP BY pid
76+
UNION ALL
77+
SELECT NULL AS pid,
78+
store_id AS sid,
79+
SUM(quantity) AS total_quantity
80+
FROM sales
81+
GROUP BY sid;
82+
```
83+
84+
Output:
85+
```sql
86+
+------+------+----------------+
87+
| pid | sid | total_quantity |
88+
+------+------+----------------+
89+
| 102 | NULL | 18 |
90+
| NULL | 2 | 47 |
91+
| 101 | NULL | 32 |
92+
| 103 | NULL | 20 |
93+
| NULL | 1 | 23 |
94+
+------+------+----------------+
95+
```
96+
97+
### GROUP BY GROUPING SETS with positions
98+
99+
```sql
100+
SELECT product_id,
101+
store_id,
102+
SUM(quantity) AS total_quantity
103+
FROM sales
104+
GROUP BY GROUPING SETS((1), (2));
105+
```
106+
107+
This query is equivalent to:
108+
109+
```sql
110+
SELECT product_id,
111+
NULL AS store_id,
112+
SUM(quantity) AS total_quantity
113+
FROM sales
114+
GROUP BY product_id
115+
UNION ALL
116+
SELECT NULL AS product_id,
117+
store_id,
118+
SUM(quantity) AS total_quantity
119+
FROM sales
120+
GROUP BY store_id;
121+
```
122+
123+
Output:
124+
```sql
125+
+------------+----------+----------------+
126+
| product_id | store_id | total_quantity |
127+
+------------+----------+----------------+
128+
| 102 | NULL | 18 |
129+
| NULL | 2 | 47 |
130+
| 101 | NULL | 32 |
131+
| 103 | NULL | 20 |
132+
| NULL | 1 | 23 |
133+
+------------+----------+----------------+
134+
```
135+
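
The examples above use single-column grouping sets only. Because a grouping set is described as a set of dimension columns, a set may presumably list more than one column. The following sketch assumes Databend accepts multi-column sets in the standard SQL form. It combines a per-pair grouping with a per-product subtotal on the same `sales` table:

```sql
-- Illustrative sketch, not part of the committed page.
-- One two-column grouping set plus one single-column grouping set.
SELECT product_id,
       store_id,
       SUM(quantity) AS total_quantity
FROM sales
GROUP BY GROUPING SETS((product_id, store_id), (product_id));
```

Following the pattern of the equivalences above, this would correspond to `GROUP BY product_id, store_id` combined by UNION ALL with `GROUP BY product_id`, where `store_id` is NULL in the subtotal rows. With the sample data, that is four distinct (product_id, store_id) pairs plus three per-product subtotals.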
Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
1+
---
2+
title: GROUP BY CUBE
3+
---
4+
5+
`GROUP BY CUBE` is an extension of the [GROUP BY](./06-query-group-by.md) clause similar to [GROUP BY ROLLUP](./09-query-group-by-rollup.md). In addition to producing all the rows of a `GROUP BY ROLLUP`, `GROUP BY CUBE` adds all the "cross-tabulation" rows. Subtotal rows are rows that aggregate further: their values are derived by computing the same aggregate functions that were used to produce the grouped rows.
6+
7+
A `CUBE` grouping is equivalent to a series of grouping sets and is essentially a shorter specification: a `CUBE` specification with N elements corresponds to 2^N `GROUPING SETS`.
8+
9+
## Syntax
10+
11+
```sql
12+
SELECT ...
13+
FROM ...
14+
[ ... ]
15+
GROUP BY CUBE ( groupCube [ , groupCube [ , ... ] ] )
16+
[ ... ]
17+
```
18+
19+
Where:
20+
```sql
21+
groupCube ::= { <column_alias> | <position> | <expr> }
22+
```
23+
24+
- `<column_alias>`: Column alias appearing in the query block’s SELECT list
25+
26+
- `<position>`: Position of an expression in the SELECT list
27+
28+
- `<expr>`: Any expression on tables in the current scope
29+
30+
31+
## Examples
32+
33+
Let's assume we have a `sales_data` table with the following schema and sample data:
34+
35+
```sql
36+
CREATE TABLE sales_data (
37+
region VARCHAR(255),
38+
product VARCHAR(255),
39+
sales_amount INT
40+
);
41+
42+
INSERT INTO sales_data (region, product, sales_amount) VALUES
43+
('North', 'WidgetA', 200),
44+
('North', 'WidgetB', 300),
45+
('South', 'WidgetA', 400),
46+
('South', 'WidgetB', 100),
47+
('West', 'WidgetA', 300),
48+
('West', 'WidgetB', 200);
49+
```
50+
51+
Now, let's use the `GROUP BY CUBE` clause to get the total sales amount for each region and product, along with all possible aggregations:
52+
53+
```sql
54+
SELECT region, product, SUM(sales_amount) AS total_sales
55+
FROM sales_data
56+
GROUP BY CUBE (region, product);
57+
```
58+
59+
The result will be:
60+
```sql
61+
+--------+---------+-------------+
62+
| region | product | total_sales |
63+
+--------+---------+-------------+
64+
| South | NULL | 500 |
65+
| NULL | WidgetB | 600 |
66+
| West | NULL | 500 |
67+
| North | NULL | 500 |
68+
| West | WidgetB | 200 |
69+
| NULL | NULL | 1500 |
70+
| North | WidgetB | 300 |
71+
| South | WidgetA | 400 |
72+
| North | WidgetA | 200 |
73+
| NULL | WidgetA | 900 |
74+
| West | WidgetA | 300 |
75+
| South | WidgetB | 100 |
76+
+--------+---------+-------------+
77+
```
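
To make the 2^N correspondence concrete, the query above has N = 2 elements, so it covers 2^2 = 4 groupings: `(region, product)`, `(region)`, `(product)`, and the grand total. The sketch below spells these out as a UNION ALL of plain GROUP BY queries, in the same style as the equivalences shown for GROUPING SETS. It is an illustration, not part of the committed page, and it assumes the untyped `NULL` columns unify with the `VARCHAR` columns across branches, as the GROUPING SETS equivalences also assume:

```sql
-- Illustrative expansion of GROUP BY CUBE (region, product).
-- Grouping (region, product): one row per region/product pair
SELECT region, product, SUM(sales_amount) AS total_sales
FROM sales_data
GROUP BY region, product
UNION ALL
-- Grouping (region): subtotal per region
SELECT region, NULL AS product, SUM(sales_amount) AS total_sales
FROM sales_data
GROUP BY region
UNION ALL
-- Grouping (product): subtotal per product
SELECT NULL AS region, product, SUM(sales_amount) AS total_sales
FROM sales_data
GROUP BY product
UNION ALL
-- Empty grouping: grand total over all rows
SELECT NULL AS region, NULL AS product, SUM(sales_amount) AS total_sales
FROM sales_data;
```

With the sample data, the four branches contribute 6, 3, 2, and 1 rows respectively, which accounts for the 12 rows in the result shown above.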
