
Commit 56ad116

docs: DOC-913 groovy group agg redo (#7329)
Co-authored-by: elijahpetty <[email protected]>
1 parent 3335e8e commit 56ad116


60 files changed (+1056, -926 lines)


docs/groovy/how-to-guides/combined-aggregations.md

Lines changed: 6 additions & 6 deletions
@@ -18,7 +18,7 @@ Aggregators are applied to data by the [`aggBy`](../reference/table-operations/g
 
 The general syntax follows:
 
-```groovy skip-test
+```groovy syntax
 import static io.deephaven.api.agg.Aggregation.AggAvg
 import static io.deephaven.api.agg.Aggregation.AggLast
 
@@ -27,7 +27,7 @@ agg_list = [
 AggLast("inputColumn = outputColumn") // second aggregation
 ]
 
-result = source.aggBy(agg_list, groupingColumns...) // apply the aggregations to data .aggBy
+result = source.aggBy(agg_list, groupingColumns...) // apply the aggregations to data
 ```
 
 ## What aggregations are available?
@@ -49,12 +49,12 @@ A number of built-in aggregations are available:
 - [`AggMin`](../reference/table-operations/group-and-aggregate/AggMin.md) - Minimum value for each group.
 - [`AggPartition`](../reference/table-operations/group-and-aggregate/AggPartition.md) - Creates partition for the aggregation group.
 - [`AggPct`](../reference/table-operations/group-and-aggregate/AggPct.md) - Percentile of values for each group.
-- [`AggSortedFirst`](../reference/table-operations/group-and-aggregate/AggSortedFirst.md) - First value of each column within an aggregation group, sorted.
-- [`AggSortedLast`](../reference/table-operations/group-and-aggregate/AggSortedLast.md) - Last value of each column within an aggregation group, sorted.
-- [`AggStd`](../reference/table-operations/group-and-aggregate/AggStd.md) - Standard deviation for each group.
+- [`AggSortedFirst`](../reference/table-operations/group-and-aggregate/AggSortedFirst.md) - Sorts in ascending order, then computes the first value for each group.
+- [`AggSortedLast`](../reference/table-operations/group-and-aggregate/AggSortedLast.md) - Sorts in descending order, then computes the last value for each group.
+- [`AggStd`](../reference/table-operations/group-and-aggregate/AggStd.md) - Sample standard deviation for each group.
 - [`AggSum`](../reference/table-operations/group-and-aggregate/AggSum.md) - Sum of values for each group.
 - [`AggUnique`](../reference/table-operations/group-and-aggregate/AggUnique.md) - Returns one single value for a column, or a default.
-- [`AggVar`](../reference/table-operations/group-and-aggregate/AggVar.md) - Variance for each group.
+- [`AggVar`](../reference/table-operations/group-and-aggregate/AggVar.md) - Sample variance for each group.
 - [`AggWAvg`](../reference/table-operations/group-and-aggregate/AggWAvg.md) - Weighted average for each group.
 - [`AggWSum`](../reference/table-operations/group-and-aggregate/AggWSum.md) - Weighted sum for each group.
 
docs/groovy/how-to-guides/dedicated-aggregations.md

Lines changed: 16 additions & 11 deletions
@@ -1,11 +1,8 @@
 ---
-title: Perform dedicated aggregations for groups
-sidebar_label: Dedicated aggregations
+title: Single aggregation
 ---
 
-<!--TODO: will be retitled "Single Aggregation"-->
-
-This guide will show you how to programmatically compute summary information on groups of data using dedicated data aggregations.
+This guide will show you how to compute summary information on groups of data using dedicated data aggregations.
 
 Often when working with data, you will want to break the data into subgroups and then perform calculations on the grouped data. For example, a large multi-national corporation may want to know their average employee salary by country, or a teacher might want to calculate grade information for groups of students or in certain subject areas.
 
@@ -17,29 +14,37 @@ Deephaven provides many dedicated aggregations, such as [`maxBy`](../reference/t
 
 The general syntax follows:
 
+```groovy skip-test
+result = source.DEDICATED_AGG(columnNames)
+```
+
 The `columnNames` parameter determines the column(s) by which to group data.
 
-- `NULL` uses the whole table as a single group
+- `DEDICATED_AGG` should be substituted with one of the chosen aggregations below
+- `NULL` uses the whole table as a single group.
 - `"X"` will output the desired value for each group in column `X`.
 - `"X", "Y"` will output the desired value for each group designated from the `X` and `Y` columns.
 
 ## Single aggregators
 
 Each dedicated aggregator performs one calculation at a time:
 
+- [`absSumBy`](../reference/table-operations/group-and-aggregate/absSumBy.md) - Sum of absolute values of each group.
 - [`avgBy`](../reference/table-operations/group-and-aggregate/avgBy.md) - Average (mean) of each group.
 - [`countBy`](../reference/table-operations/group-and-aggregate/countBy.md) - Number of rows in each group.
 - [`firstBy`](../reference/table-operations/group-and-aggregate/firstBy.md) - First row of each group.
-- [`groupBy`](../reference/table-operations/group-and-aggregate/groupBy.md) - Array of values in each group.
+- [`groupBy`](../reference/table-operations/group-and-aggregate/groupBy.md) - Group column content into vectors.
 - [`headBy`](../reference/table-operations/group-and-aggregate/headBy.md) - First `n` rows of each group.
 - [`lastBy`](../reference/table-operations/group-and-aggregate/lastBy.md) - Last row of each group.
 - [`maxBy`](../reference/table-operations/group-and-aggregate/maxBy.md) - Maximum value of each group.
 - [`medianBy`](../reference/table-operations/group-and-aggregate/medianBy.md) - Median of each group.
 - [`minBy`](../reference/table-operations/group-and-aggregate/minBy.md) - Minimum value of each group.
-- [`stdBy`](../reference/table-operations/group-and-aggregate/stdBy.md) - Standard deviation of each group.
+- [`stdBy`](../reference/table-operations/group-and-aggregate/stdBy.md) - Sample standard deviation of each group.
 - [`sumBy`](../reference/table-operations/group-and-aggregate/sumBy.md) - Sum of each group.
 - [`tailBy`](../reference/table-operations/group-and-aggregate/tailBy.md) - Last `n` rows of each group.
-- [`varBy`](../reference/table-operations/group-and-aggregate/varBy.md) - Variance of each group.
+- [`varBy`](../reference/table-operations/group-and-aggregate/varBy.md) - Sample variance of each group.
+- [`weightedAvgBy`](../reference/table-operations/group-and-aggregate/wavgBy.md) - Weighted average of each group.
+- [`weightedSumBy`](../reference/table-operations/group-and-aggregate/wsumBy.md) - Weighted sum of each group.
 
 In the following examples, we have test results in various subjects for some students. We want to summarize this information to see if students perform better in one class or another.
 
@@ -189,15 +194,15 @@ mean = source.dropColumns("Subject").avgBy("Name")
 
 ### `stdBy`
 
-In this example, [`stdBy`](../reference/table-operations/group-and-aggregate/stdBy.md) calculates the standard deviation of test scores for each `Name`. Because a standard deviation cannot be computed for the string column `Subject`, this column is dropped before applying [`stdBy`](../reference/table-operations/group-and-aggregate/stdBy.md).
+In this example, [`stdBy`](../reference/table-operations/group-and-aggregate/stdBy.md) calculates the sample standard deviation of test scores for each `Name`. Because a sample standard deviation cannot be computed for the string column `Subject`, this column is dropped before applying [`stdBy`](../reference/table-operations/group-and-aggregate/stdBy.md).
 
 ```groovy test-set=1
 stdDev = source.dropColumns("Subject").stdBy("Name")
 ```
 
 ### `varBy`
 
-In this example, [`varBy`](../reference/table-operations/group-and-aggregate/varBy.md) calculates the variance of test scores for each `Name`. Because a variance cannot be computed for the string column `Subject`, this column is dropped before applying [`varBy`](../reference/table-operations/group-and-aggregate/varBy.md).
+In this example, [`varBy`](../reference/table-operations/group-and-aggregate/varBy.md) calculates the sample variance of test scores for each `Name`. Because sample variance cannot be computed for the string column `Subject`, this column is dropped before applying [`varBy`](../reference/table-operations/group-and-aggregate/varBy.md).
 
 ```groovy test-set=1
 var = source.dropColumns("Subject").varBy("Name")

docs/groovy/how-to-guides/formulas.md

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ In joins, formulas are used to define the columns to join the data on, and in ag
 - Aggregations
 - [Dedicated](./dedicated-aggregations.md)
 - [Combined](./combined-aggregations.md)
-- [Rolling](./rolling-calculations.md)
+- [Rolling](./rolling-aggregations.md)
 
 Additionally, formulas can be used in [partitioned table](./partitioned-tables.md) operations.
 
docs/groovy/how-to-guides/grouping-data.md

Lines changed: 44 additions & 13 deletions
@@ -17,9 +17,9 @@ apples = newTable(
 )
 ```
 
-## Group data with `groupBy`
+## `groupBy`
 
-[`groupBy`](../reference/table-operations/group-and-aggregate/groupBy.md) groups columnar data into [arrays](../reference/query-language/types/arrays.md). A list of grouping column names defines grouping keys. All rows from the input table with the same key values are grouped together.
+The [`groupBy`](../reference/table-operations/group-and-aggregate/groupBy.md) method groups columnar data into [arrays](../reference/query-language/types/arrays.md). A list of grouping column names defines grouping keys. All rows from the input table with the same key values are grouped together. The values in the arrays for each group in the output table maintain their order from the input table.
 
 If no input is supplied to [`groupBy`](../reference/table-operations/group-and-aggregate/groupBy.md), then there will be one group, which contains all of the data. The resultant table will contain a single row, where column data is grouped into a single [array](../reference/query-language/types/arrays.md). This is shown in the example below:
 
@@ -48,9 +48,28 @@ applesByClassAndDiet = apples.updateView(
 .groupBy("Class", "Diet")
 ```
 
+## `AggGroup`
+
+The [`AggGroup`](../reference/table-operations/group-and-aggregate/AggGroup.md) method returns an aggregator that computes an array of all values within an aggregation group, for each column. Like the other aggregation methods, it is used in conjunction with the [`aggBy`](../reference/table-operations/group-and-aggregate/aggBy.md) method.
+
+> [!NOTE]
+> Unlike [`groupBy`](../reference/table-operations/group-and-aggregate/groupBy.md), [`AggGroup`](../reference/table-operations/group-and-aggregate/AggGroup.md) throws an error if you don't supply any column names.
+
+In this example, we will group `Color`, `WeightGrams`, and `Calories` by `Type`:
+
+```groovy test-set=1 order=applesByType
+applesByType = apples.aggBy(AggGroup("WeightGrams", "Calories", "Color"), "Type")
+```
+
+If the `by` parameter is not supplied, the `AggGroup` method will group all the values from each column:
+
+```groovy test-set=1 order=applesByNoColumn2
+applesByNoColumn2 = apples.aggBy(AggGroup("Type", "Color", "WeightGrams", "Calories"))
+```
+
 ## Ungroup data with `ungroup`
 
-The [`ungroup`](../reference/table-operations/group-and-aggregate/ungroup.md) method is the reverse of [`groupBy`](../reference/table-operations/group-and-aggregate/groupBy.md). It expands content from [arrays](../reference/query-language/types/arrays.md) or vectors and builds a new set of rows from it. The method takes optional columns as input. If no inputs are supplied, all [array](../reference/query-language/types/arrays.md) or vector columns are expanded. If one or more columns are given as input, only those columns will have their [array](../reference/query-language/types/arrays.md) values expanded into new rows.
+The [`ungroup`](../reference/table-operations/group-and-aggregate/ungroup.md) method is the opposite of [`groupBy`](../reference/table-operations/group-and-aggregate/groupBy.md). It expands content from [arrays](../reference/query-language/types/arrays.md) or vectors into columns of singular values and builds a new set of rows from it. The method takes optional columns as input. If no inputs are supplied, all [array](../reference/query-language/types/arrays.md) or vector columns are expanded. If one or more columns are given as input, only those columns will have their [array](../reference/query-language/types/arrays.md) values expanded into new rows.
 
 The example below shows how [`ungroup`](../reference/table-operations/group-and-aggregate/ungroup.md) reverses the [`groupBy`](../reference/table-operations/group-and-aggregate/groupBy.md) operation used to create `applesByClassAndDiet` when no columns are given as input. Notice how all [array](../reference/query-language/types/arrays.md) columns have been expanded, leaving a single element in each row of the resultant table:
 
@@ -85,19 +104,23 @@ t = newTable(
 t_ungrouped = t.ungroup()
 ```
 
-## Different array lengths
+## Handling different array lengths
 
 The [`ungroup`](../reference/table-operations/group-and-aggregate/ungroup.md) method cannot unpack a row that contains [arrays](../reference/query-language/types/arrays.md) of different length.
 
-The example below uses the [`emptyTable`](../reference/table-operations/create/emptyTable.md) method to create a table with two columns and one row. Each column contains a Java array, but one has three elements and the other has two. Calling [`ungroup`](../reference/table-operations/group-and-aggregate/ungroup.md) without an input column will result in an error.
+To demonstrate this, we'll start by creating a table with two columns and one row.
 
-```groovy skip-test
+```groovy test-set=2 order=t
 t = emptyTable(1).update("X = new int[]{1, 2, 3}", "Z = new int[]{4, 5}")
-t_ungrouped = t.ungroup() // This results in an error
 ```
 
-![The above table with a different array length in each column](../assets/how-to/t_diffArrayLengths.png)
-![The error message generated by Deephaven upon running the above `t_ungrouped`](../assets/how-to/t_ungrouped_Error.png)
+Each column in the above table contains a Java array, but one has three elements and the other has two. Since the arrays are not the same size, calling [`ungroup`](../reference/table-operations/group-and-aggregate/ungroup.md) without an input column will result in an error.
+
+```groovy test-set=2 should-fail
+t_ungrouped = t.ungroup() // This results in an error
+```
+
+![A collapsed error message highlighted in the Deephaven IDE](../assets/how-to/t_ungrouped_Error.png)
 
 It is only possible to ungroup columns of the same length. [Arrays](../reference/query-language/types/arrays.md) of different lengths must be ungrouped separately.
 
@@ -109,7 +132,7 @@ t_ungroupedByZ = t.ungroup("Z")
 
 ## Null values
 
-Using [`groupBy`](../reference/table-operations/group-and-aggregate/groupBy.md) on a table with null values will work properly. Null values will appear as empty [array](../reference/query-language/types/arrays.md) elements when grouped with [`groupBy`](../reference/table-operations/group-and-aggregate/groupBy.md). Null [array](../reference/query-language/types/arrays.md) elements unwrapped using [`ungroup`](../reference/table-operations/group-and-aggregate/ungroup.md) will appear as null (empty) row entries in the corresponding column.
+Using [`groupBy`](../reference/table-operations/group-and-aggregate/groupBy.md) on a table with null values will work properly. Null values will appear as empty [array](../reference/query-language/types/arrays.md) elements when grouped with [`groupBy`](../reference/table-operations/group-and-aggregate/groupBy.md). Null [array](../reference/query-language/types/arrays.md) elements expanded using [`ungroup`](../reference/table-operations/group-and-aggregate/ungroup.md) will appear as null (empty) row entries in the corresponding column.
 
 The example below uses the [`emptyTable`](../reference/table-operations/create/emptyTable.md) method and the [ternary operator](../how-to-guides/ternary-if-how-to.md) to create a table with two columns of 5 rows. The first and second rows contain null values. Null values behave as expected during grouping and ungrouping.
 
@@ -126,13 +149,21 @@ t = emptyTable(1).update("X = (int[])(null)")
 t_ungrouped = t.ungroup()
 ```
 
+## Use of grouping in table operations
+
+Many Deephaven table operations use grouping internally. For example, [`aggBy`](../reference/table-operations/group-and-aggregate/aggBy.md) creates groups specified by the key column(s) given in the `by` parameter. The grouping is done automatically, and the resultant table shows summary statistics calculated for each group.
+
+Table operations that require grouping do the grouping internally. It is always more performant to use these table operations than to group data first and then apply some calculations over the groups.
+
 ## Related documentation
 
-- [Create new and empty tables](./new-and-empty-table.md)
-- [Choose the right selection method](../how-to-guides/use-select-view-update.md#choose-the-right-column-selection-method)
+- [Create a new table](./new-and-empty-table.md#newtable)
+- [Choose the right selection method](./use-select-view-update.md#choose-the-right-column-selection-method)
+- [Formulas in query strings](./formulas.md)
+- [Filters in query strings](./filters.md)
+- [Operators in query strings](./operators.md)
 - [Arrays](../reference/query-language/types/arrays.md)
 - [`emptyTable`](../reference/table-operations/create/emptyTable.md)
 - [`groupBy`](../reference/table-operations/group-and-aggregate/groupBy.md)
 - [`newTable`](../reference/table-operations/create/newTable.md)
-- [ternary-if](../how-to-guides/ternary-if-how-to.md)
 - [`ungroup`](../reference/table-operations/group-and-aggregate/ungroup.md)
