[Source](./src/transforms/group.js) · [Examples](https://observablehq.com/@data-workflows/plot-group) · Aggregates ordinal or categorical data — such as names — into groups and then computes summary statistics for each group such as a count or sum. The group transform is like a discrete [bin transform](#bin). There are separate transforms depending on which dimensions need grouping: [Plot.groupZ](#plotgroupzoutputs-options) for *z*; [Plot.groupX](#plotgroupxoutputs-options) for *x* and *z*; [Plot.groupY](#plotgroupyoutputs-options) for *y* and *z*; and [Plot.group](#plotgroupoutputs-options) for *x*, *y*, and *z*.

-The group transforms take two arguments: *outputs* and *inputs*. The input data is grouped on one or several input channels (for example on *x*), and a new data array is created for each group. Each property set in the *outputs* object creates an aggregation channel, that receives as input the groups, and reduces them to a value for each group. A value channel is defined for each aggregation channel, for example *y* when grouping on *x*.
+Given input *data* = [*d₀*, *d₁*, *d₂*, …], by default the resulting grouped data is an array of arrays where each inner array is a subset of the input data [[*d₀₀*, *d₀₁*, …], [*d₁₀*, *d₁₁*, …], [*d₂₀*, *d₂₁*, …], …]. Each inner array is in input order. The outer array is in natural ascending order according to the associated dimension (*x* then *y*). Empty groups are skipped. By specifying a different aggregation method for the *data* output, as described below, you can change how the grouped data is computed.
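The default grouped data described above can be pictured with a small plain-JavaScript sketch. The `groupRows` helper and the sample rows are hypothetical and are not part of Plot; the sketch only assumes that rows are bucketed by a key, that each bucket keeps input order, and that buckets are emitted in ascending key order.

```js
// Hypothetical helper (not part of Plot): bucket rows by a key accessor.
function groupRows(data, key) {
  const groups = new Map();
  for (const d of data) {
    const k = key(d);
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(d); // each inner array keeps input order
  }
  // Emit the outer array in natural ascending order of the key.
  // Keys that never occur produce no bucket, so empty groups are skipped.
  return Array.from(groups)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([, rows]) => rows);
}

// Made-up sample rows for illustration only.
const sampleRows = [
  {species: "Gentoo", id: 0},
  {species: "Adelie", id: 1},
  {species: "Gentoo", id: 2},
  {species: "Chinstrap", id: 3},
  {species: "Adelie", id: 4},
  {species: "Gentoo", id: 5}
];

const groupedRows = groupRows(sampleRows, (d) => d.species);
console.log(groupedRows.map((g) => g.map((d) => d.id)));
// Adelie: [1, 4], Chinstrap: [3], Gentoo: [0, 2, 5]
```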

-Supported reducers:
+While it is possible to compute channel values on the grouped data by defining channel values as a function, more commonly channel values are computed directly by the group transform, either implicitly or explicitly. In addition to data, the following channels are automatically aggregated:

-* *first* - first element of the group, in input order
-* *last* - last element of the group, in input order
-* *count* - number of elements in the group
-* *sum* - sum of the values of the elements in the group; defaults to the *count* if the value channel is not defined
-* *proportion* - *sum* of the group divided by the total *sum* of all groups
-* *proportion-facet* - *sum* of the group divided by the total *sum* of groups in the current facet
-* *deviation* - standard deviation of the values in the group
-* *min* - minimum of the values in the group
-* *max* - maximum of the values in the group
-* *mean* - mean of the values in the group
-* *median* - median of the values in the group
-* *variance* - variance of the values in the group
+* **x** - the horizontal position of the group
+* **y** - the vertical position of the group
+* **z** - the first value of the *z* channel, if any
+* **fill** - the first value of the *fill* channel, if any
+* **stroke** - the first value of the *stroke* channel, if any

-#### Plot.group(*outputs*, *options*)
+The **x** output channel is only computed by the Plot.groupX and Plot.group transforms; similarly the **y** output channel is only computed by the Plot.groupY and Plot.group transforms.

-Groups on *x*, *y*, and the first of *z*, *fill*, or *stroke*, if any. The value channel is the input with the same name as the aggregation channel.
+You can declare additional channels to aggregate by specifying the channel name and desired aggregation method in the *outputs* object, which is the first argument to the transform. For example, to use [Plot.groupX](#plotgroupxoutputs-options) to generate a **y** channel of group counts as in a frequency histogram:

```js
-Plot.group({fill:"count"}, {
-  x:"island",
-  y:"species"
-})
+Plot.groupX({y:"count"}, {x:"species"})
```
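To make the frequency-histogram case above concrete, here is a plain-JavaScript sketch of the values a *count* output on **y** would hold when grouping on *x*. The `countByX` helper and the sample rows are hypothetical; this is an illustration of the idea, not Plot's implementation.

```js
// Hypothetical stand-in for the counting step (not Plot's implementation).
function countByX(data, x) {
  const counts = new Map();
  for (const d of data) counts.set(d[x], (counts.get(d[x]) ?? 0) + 1);
  // One {x, y} pair per group, in ascending order of x.
  return Array.from(counts, ([key, count]) => ({x: key, y: count}))
    .sort((a, b) => (a.x < b.x ? -1 : a.x > b.x ? 1 : 0));
}

// Made-up sample rows for illustration only.
const speciesRows = [
  {species: "Gentoo"},
  {species: "Adelie"},
  {species: "Adelie"}
];

const bars = countByX(speciesRows, "species");
console.log(bars); // Adelie has y = 2, Gentoo has y = 1
```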

+The following aggregation methods are supported:
+
+* *first* - the first value, in input order
+* *last* - the last value, in input order
+* *count* - the number of elements (frequency)
+* *sum* - the sum of values
+* *proportion* - the sum proportional to the overall total (weighted frequency)
+* *proportion-facet* - the sum proportional to the facet total
+* *deviation* - the standard deviation
+* *min* - the minimum value
+* *max* - the maximum value
+* *mean* - the mean value (average)
+* *median* - the median value
+* *variance* - the variance per [Welford’s algorithm](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm)
+* a function - passed the array of values for each group
+* an object with a *reduce* method - passed the index for each group, and all values
+
+Most aggregation methods require binding the output channel to an input channel; for example, if you want the **y** output channel to be a *sum* (not merely a count), there should be a corresponding **y** input channel specifying which values to sum. If there is not, *sum* will be equivalent to *count*.
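The relationship between *sum* and *count*, and the function form of an aggregation method, can be sketched in plain JavaScript. The `reducers` table and `reduceGroup` helper below are hypothetical names introduced for illustration; they mimic the behavior described above and are not Plot's internals.

```js
// Hypothetical reducer table (a sketch, not Plot's internals).
const reducers = {
  count: (rows) => rows.length,
  sum: (rows, value) =>
    value === undefined
      ? rows.length // no bound input channel: sum behaves like count
      : rows.reduce((total, d) => total + value(d), 0)
};

// Hypothetical dispatcher: `method` is a name or a function.
function reduceGroup(rows, method, value) {
  // A function is passed the array of values for the group.
  if (typeof method === "function") return method(value ? rows.map(value) : rows);
  return reducers[method](rows, value);
}

// Made-up group for illustration only.
const oneGroup = [{mass: 3}, {mass: 5}, {mass: 4}];

const summed = reduceGroup(oneGroup, "sum", (d) => d.mass); // 12
const fallback = reduceGroup(oneGroup, "sum"); // 3, same as count
const largest = reduceGroup(oneGroup, (values) => Math.max(...values), (d) => d.mass); // 5
console.log(summed, fallback, largest);
```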
If any of **z**, **fill**, or **stroke** is a channel, the first of these channels is considered the *z* dimension and will be used to subdivide groups.
Groups on *y* and the first of *z*, *fill*, or *stroke*, if any.

#### Plot.groupZ(*outputs*, *options*)

-Groups on the first of *z*, *fill*, or *stroke*, if any; if none of *z*, *fill*, or *stroke* are channels, then all data (within each facet) is placed into a single group. The value channel is the input with the same name as the aggregation channel.
+```js
+Plot.groupZ({x:"proportion"}, {fill:"species"})
+```
+
+Groups on the first of *z*, *fill*, or *stroke*, if any. If none of *z*, *fill*, or *stroke* are channels, then all data (within each facet) is placed into a single group.
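As a rough illustration of what a *proportion* output means when grouping only on the *z* dimension, the sketch below computes each group's share of the total. With no bound input channel, the sum is taken to be the count, as described for *sum* above. The `proportionByZ` helper and the sample rows are hypothetical and not part of Plot.

```js
// Hypothetical helper (not part of Plot): each z group's share of all rows.
function proportionByZ(data, z) {
  const counts = new Map();
  for (const d of data) counts.set(d[z], (counts.get(d[z]) ?? 0) + 1);
  const total = data.length;
  // One entry per group, in order of first appearance.
  return Array.from(counts, ([key, count]) => ({z: key, x: count / total}));
}

// Made-up sample rows for illustration only.
const fillRows = [
  {species: "Adelie"},
  {species: "Adelie"},
  {species: "Adelie"},
  {species: "Gentoo"}
];

const shares = proportionByZ(fillRows, "species");
console.log(shares); // Adelie has x = 0.75, Gentoo has x = 0.25
```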
-Returns a function that formats a week day (numbered from 0—Sunday to 6—Saturday) according to the *locale* and *format*.
+Returns a function that formats a week day number (from 0 = Sunday to 6 = Saturday) according to the *locale* and *format*.
- *locale*: any valid [BCP 47 language tag](https://tools.ietf.org/html/bcp47); defaults to "en-US". Use navigator.language to respect the browser’s setting.
- *format*: any valid [weekday format](https://tc39.es/ecma402/#datetimeformat-objects), *i.e.* one of "narrow", "short", "long"; defaults to "short".

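One way such a weekday formatter could be built on the standard `Intl.DateTimeFormat` API is sketched below. The name `formatWeekdaySketch` is hypothetical, and the choice of reference date (31 December 2000, a Sunday, as day 0) is an assumption made for this sketch rather than a statement about Plot's source.

```js
// Hypothetical sketch of a weekday formatter built on Intl.DateTimeFormat.
function formatWeekdaySketch(locale = "en-US", format = "short") {
  const formatter = new Intl.DateTimeFormat(locale, {timeZone: "UTC", weekday: format});
  // Day 0 of January 2001 is 31 December 2000, a Sunday, so day i
  // lands on the matching weekday; Date.UTC normalizes out-of-range days.
  return (i) => formatter.format(new Date(Date.UTC(2001, 0, +i)));
}

const shortDay = formatWeekdaySketch();
const longDay = formatWeekdaySketch("en-US", "long");
console.log(shortDay(0), shortDay(6), longDay(0)); // Sun Sat Sunday
console.log(shortDay(-1)); // Sat, since the mapping is periodic
```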
@@ -1304,7 +1331,7 @@ This function is periodic: day -1 is Saturday, and day 8 is Sunday.

#### Plot.formatMonth(*locale*, *format*)

-Returns a function that formats a month (numbered from 0—January to 11—December) according to the *locale* and *format*.
+Returns a function that formats a month number (from 0 = January to 11 = December) according to the *locale* and *format*.
- *locale*: any valid [BCP 47 language tag](https://tools.ietf.org/html/bcp47); defaults to "en-US". Use navigator.language to respect the browser’s setting.
- *format*: any valid [month format](https://tc39.es/ecma402/#datetimeformat-objects), *i.e.* one of "2-digit", "numeric", "narrow", "short", "long"; defaults to "short".
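A month formatter with the signature above could be approximated with `Intl.DateTimeFormat` as in the following sketch. The name `formatMonthSketch` is hypothetical, and the reference year 2000 is an arbitrary assumption; this is not taken from Plot's source.

```js
// Hypothetical sketch of a month formatter built on Intl.DateTimeFormat.
function formatMonthSketch(locale = "en-US", format = "short") {
  const formatter = new Intl.DateTimeFormat(locale, {timeZone: "UTC", month: format});
  // Month i of an arbitrary reference year, interpreted in UTC.
  return (i) => formatter.format(new Date(Date.UTC(2000, +i)));
}

const shortMonth = formatMonthSketch();
const longMonth = formatMonthSketch("en-US", "long");
console.log(shortMonth(0), shortMonth(11), longMonth(1)); // Jan Dec February
```

Passing `navigator.language` as the first argument in a browser would follow the user's locale, as the *locale* option above suggests.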