
Commit 38d0c8a

Update README
1 parent ff6a856 commit 38d0c8a


1 file changed: +58 -31 lines changed


README.md

Lines changed: 58 additions & 31 deletions
```diff
@@ -1049,55 +1049,82 @@ Bins on *y*. Groups on *x* and first of *z*, *fill*, or *stroke*, if any.
 
 [<img src="./img/group.png" width="320" height="198" alt="a histogram of penguins by species">](https://observablehq.com/@data-workflows/plot-group)
 
-[Source](./src/transforms/group.js) · [Examples](https://observablehq.com/@data-workflows/plot-group)
+[Source](./src/transforms/group.js) · [Examples](https://observablehq.com/@data-workflows/plot-group) · Aggregates ordinal or categorical data — such as names — into groups and then computes summary statistics for each group such as a count or sum. The group transform is like a discrete [bin transform](#bin). There are separate transforms depending on which dimensions need grouping: [Plot.groupZ](#plotgroupzoutputs-options) for *z*; [Plot.groupX](#plotgroupxoutputs-options) for *x* and *z*; [Plot.groupY](#plotgroupyoutputs-options) for *y* and *z*; and [Plot.group](#plotgroupoutputs-options) for *x*, *y*, and *z*.
 
-The group transforms take two arguments: *outputs* and *inputs*. The input data is grouped on one or several input channels (for example on *x*), and a new data array is created for each group. Each property set in the *outputs* object creates an aggregation channel, that receives as input the groups, and reduces them to a value for each group. A value channel is defined for each aggregation channel, for example *y* when grouping on *x*.
+Given input *data* = [*d₀*, *d₁*, *d₂*, …], by default the resulting grouped data is an array of arrays where each inner array is a subset of the input data [[*d₀₀*, *d₀₁*, …], [*d₁₀*, *d₁₁*, …], [*d₂₀*, *d₂₁*, …], …]. Each inner array is in input order. The outer array is in natural ascending order according to the associated dimension (*x* then *y*). Empty groups are skipped. By specifying a different aggregation method for the *data* output, as described below, you can change how the grouped data is computed.
 
```
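The added paragraph describes the default grouped data: an array of arrays, with each inner array in input order, the outer array in natural ascending order of the group key, and no empty groups. The plain JavaScript sketch below approximates that behavior. It is not Plot's code from `src/transforms/group.js`. The helper `groupSorted` and the sample rows are hypothetical.

```javascript
// Hypothetical helper that approximates the default grouped data described in
// the README. It is an illustration, not Plot's own implementation.
function groupSorted(data, key) {
  const groups = new Map(); // group key -> array of data, in input order
  for (const d of data) {
    const k = key(d);
    let group = groups.get(k);
    if (group === undefined) groups.set(k, (group = []));
    group.push(d); // pushing in iteration order preserves input order
  }
  // Groups exist only for keys that occur in the data, so none are empty.
  // The outer array is sorted in natural ascending order of the key.
  return Array.from(groups)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([, group]) => group);
}

// Hypothetical sample rows, loosely modeled on the penguins dataset.
const penguins = [
  {species: "Gentoo", body_mass_g: 5000},
  {species: "Adelie", body_mass_g: 3750},
  {species: "Chinstrap", body_mass_g: 3500},
  {species: "Adelie", body_mass_g: 3800}
];

const masses = groupSorted(penguins, d => d.species).map(g => g.map(d => d.body_mass_g));
// masses: [[3750, 3800], [3500], [5000]]
```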
````diff
-Supported reducers:
+While it is possible to compute channel values on the grouped data by defining channel values as a function, more commonly channel values are computed directly by the group transform, either implicitly or explicitly. In addition to data, the following channels are automatically aggregated:
 
-* *first* - first element of the group, in input order
-* *last* - last element of the group, in input order
-* *count* - number of elements in the group
-* *sum* - sum of the values of the elements in the group; defaults to* the *count* if the value channel is not defined
-* *proportion* - *sum* of the group divided by the total *sum* of all groups
-* *proportion-facet* - *sum* of the group divided by the total *sum* of groups in the current facet
-* *deviation* - standard deviation of the values in the group
-* *min* - minimum of the values in the group
-* *max* - maximum of the values in the group
-* *mean* - mean of the values in the group
-* *median* - median of the values in the group
-* *variance* - variance of the values in the group
+* **x** - the horizontal position of the group
+* **y** - the vertical position of the group
+* **z** - the first value of the *z* channel, if any
+* **fill** - the first value of the *fill* channel, if any
+* **stroke** - the first value of the *stroke* channel, if any
 
-#### Plot.group(*outputs*, *options*)
+The **x** output channel is only computed by the Plot.groupX and Plot.group transforms; similarly, the **y** output channel is only computed by the Plot.groupY and Plot.group transforms.
 
-Groups on *x*, *y*, and the first of *z*, *fill*, or *stroke*, if any. The value channel is the input with the same name as the aggregation channel.
+You can declare additional channels to aggregate by specifying the channel name and desired aggregation method in the *outputs* object, which is the first argument to the transform. For example, to use [Plot.groupX](#plotgroupxoutputs-options) to generate a **y** channel of group counts as in a frequency histogram:
 
 ```js
-Plot.group({fill: "count"}, {
-  x: "island",
-  y: "species"
-})
+Plot.groupX({y: "count"}, {x: "species"})
 ```
 
````
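The sketch below computes by hand the result that `Plot.groupX({y: "count"}, {x: "species"})` is described as producing. There is one row per distinct **x** value, with **x** set to the group key and **y** set to the number of elements in that group. The helper `groupXCount` and the sample data are hypothetical. Plot's transform works on mark channels and does not return plain objects like these.

```javascript
// Hypothetical helper: a frequency count per x value, mirroring a groupX
// transform with a "count" reducer for y. Not Plot's implementation.
function groupXCount(data, x) {
  const counts = new Map(); // x value -> number of elements in that group
  for (const d of data) {
    counts.set(d[x], (counts.get(d[x]) ?? 0) + 1);
  }
  // One row per non-empty group: x is the group key and y is its count,
  // sorted in ascending order of x.
  return Array.from(counts, ([key, count]) => ({x: key, y: count}))
    .sort((a, b) => (a.x < b.x ? -1 : a.x > b.x ? 1 : 0));
}

const observations = [
  {species: "Adelie"}, {species: "Gentoo"}, {species: "Adelie"},
  {species: "Chinstrap"}, {species: "Adelie"}
];

const histogram = groupXCount(observations, "species");
// histogram: [{x: "Adelie", y: 3}, {x: "Chinstrap", y: 1}, {x: "Gentoo", y: 1}]
```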
```diff
+The following aggregation methods are supported:
+
+* *first* - the first value, in input order
+* *last* - the last value, in input order
+* *count* - the number of elements (frequency)
+* *sum* - the sum of values
+* *proportion* - the sum proportional to the overall total (weighted frequency)
+* *proportion-facet* - the sum proportional to the facet total
+* *deviation* - the standard deviation
+* *min* - the minimum value
+* *max* - the maximum value
+* *mean* - the mean value (average)
+* *median* - the median value
+* *variance* - the variance per [Welford’s algorithm](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm)
+* a function - passed the array of values for each group
+* an object with a *reduce* method - passed the index for each group, and all values
+
```
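Most of the aggregation methods listed above map the values of one group to a single number. The sketch below implements several of them, using Welford's online algorithm for *variance* as the README states. The `reducers` object and its signatures are illustrative assumptions, not Plot's internal reducer interface. Dividing by *n* - 1 (the sample variance, as d3-array's `variance` does) is also an assumption about Plot. *proportion* and *proportion-facet* are left out because they also need a total across groups.

```javascript
// Welford's online algorithm: one pass that updates the running mean and the
// sum of squared deviations (m2), avoiding subtraction of large, nearly equal sums.
function variance(values) {
  let n = 0, mean = 0, m2 = 0;
  for (const v of values) {
    n += 1;
    const delta = v - mean;
    mean += delta / n;
    m2 += delta * (v - mean);
  }
  // Sample variance (divide by n - 1); undefined for fewer than two values.
  return n > 1 ? m2 / (n - 1) : undefined;
}

// Illustrative reducers: each maps the (non-empty) values of one group to a number.
const reducers = {
  first: values => values[0],
  last: values => values[values.length - 1],
  count: values => values.length,
  sum: values => values.reduce((a, b) => a + b, 0),
  min: values => Math.min(...values),
  max: values => Math.max(...values),
  mean: values => reducers.sum(values) / values.length,
  median: values => {
    const sorted = [...values].sort((a, b) => a - b);
    const mid = sorted.length >> 1;
    return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  },
  variance,
  deviation: values => {
    const v = variance(values);
    return v === undefined ? undefined : Math.sqrt(v);
  }
};

const masses = [3750, 3800, 3250, 3450];
// reducers.mean(masses) is 3562.5 and reducers.median(masses) is 3600
```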
````diff
+Most aggregation methods require binding the output channel to an input channel; for example, if you want the **y** output channel to be a *sum* (not merely a count), there should be a corresponding **y** input channel specifying which values to sum. If there is not, *sum* will be equivalent to *count*.
+
 ```js
-Plot.group({fill: "max"}, {
-  x: d => d.date.getUTCDate(),
-  y: d => d.date.getUTCMonth(),
-  fill: "temp_max"
-})
+Plot.groupX({y: "sum"}, {x: "species", y: "body_mass_g"})
+```
+
+If any of **z**, **fill**, or **stroke** is a channel, the first of these channels is considered the *z* dimension and will be used to subdivide groups.
+
````
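The sketch below combines the two rules above. Without a **y** input, *sum* reduces to *count*. The first of **z**, **fill**, or **stroke** subdivides each **x** group. `groupXSum` is a hypothetical helper that takes accessor functions in place of Plot channel definitions and treats any accessor it is given as a channel. It returns groups in first-seen order and omits sorting for brevity. It is not Plot's API.

```javascript
// Hypothetical helper: groups on x and on the first of z, fill, or stroke (if any),
// then sums y within each group. Without a y accessor each element adds 1, so
// "sum" is equivalent to "count", as the README describes.
function groupXSum(data, {x, y, z, fill, stroke}) {
  const zOf = z ?? fill ?? stroke; // first provided of z, fill, or stroke
  const groups = new Map(); // composite [x, z] key -> {x, z, y}
  for (const d of data) {
    const xv = x(d);
    const zv = zOf ? zOf(d) : undefined;
    const key = JSON.stringify([xv, zv]);
    let g = groups.get(key);
    if (!g) groups.set(key, (g = {x: xv, z: zv, y: 0}));
    g.y += y ? y(d) : 1;
  }
  return Array.from(groups.values()); // first-seen order
}

const sample = [
  {species: "Adelie", sex: "FEMALE", body_mass_g: 3400},
  {species: "Adelie", sex: "MALE", body_mass_g: 4000},
  {species: "Adelie", sex: "FEMALE", body_mass_g: 3600},
  {species: "Gentoo", sex: "MALE", body_mass_g: 5500}
];

// No y accessor, so the sum acts as a count: Adelie 3, Gentoo 1.
groupXSum(sample, {x: d => d.species});

// Total body mass per species, subdivided by sex via the fill accessor:
// Adelie/FEMALE 7000, Adelie/MALE 4000, Gentoo/MALE 5500.
groupXSum(sample, {x: d => d.species, y: d => d.body_mass_g, fill: d => d.sex});
```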
````diff
+#### Plot.group(*outputs*, *options*)
+
+```js
+Plot.group({fill: "count"}, {x: "island", y: "species"})
 ```
 
+Groups on *x*, *y*, and the first of *z*, *fill*, or *stroke*, if any.
+
 #### Plot.groupX(*outputs*, *options*)
 
-Groups on *x* and the first of *z*, *fill*, or *stroke*, if any. The value channel is *y*.
+```js
+Plot.groupX({y: "sum"}, {x: "species", y: "body_mass_g"})
+```
+
+Groups on *x* and the first of *z*, *fill*, or *stroke*, if any.
 
 #### Plot.groupY(*outputs*, *options*)
 
-Groups on *y* and the first of *z*, *fill*, or *stroke*, if any. The value channel is *x*.
+```js
+Plot.groupY({x: "sum"}, {y: "species", x: "body_mass_g"})
+```
+
+Groups on *y* and the first of *z*, *fill*, or *stroke*, if any.
 
 #### Plot.groupZ(*outputs*, *options*)
 
-Groups on the first of *z*, *fill*, or *stroke*, if any; if none of *z*, *fill*, or *stroke* are channels, then all data (within each facet) is placed into a single group. The value channel is the input with the same name as the aggregation channel.
+```js
+Plot.groupZ({x: "proportion"}, {fill: "species"})
+```
+
+Groups on the first of *z*, *fill*, or *stroke*, if any. If none of *z*, *fill*, or *stroke* are channels, then all data (within each facet) is placed into a single group.
 
 ### Map
 
````
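The Plot.groupZ example `Plot.groupZ({x: "proportion"}, {fill: "species"})` has no **x** input channel, so *sum* behaves like *count*. Each group's value is therefore its share of all elements. The sketch below also shows the single-group fallback when no *z*-like accessor is given. `groupZProportion` is hypothetical and does not model faceting, so it does not cover *proportion-facet*.

```javascript
// Hypothetical helper: groups on an optional z accessor and returns each group's
// share of the overall total. Each element contributes 1 (sum falls back to count).
function groupZProportion(data, z) {
  const counts = new Map(); // z value -> number of elements
  for (const d of data) {
    // Without a z accessor every element lands in one group (key undefined).
    const key = z ? z(d) : undefined;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  const total = data.length;
  return Array.from(counts, ([key, count]) => ({z: key, x: count / total}));
}

const records = [
  {species: "Adelie"}, {species: "Adelie"}, {species: "Gentoo"}, {species: "Chinstrap"}
];

groupZProportion(records, d => d.species);
// Adelie 0.5, Gentoo 0.25, Chinstrap 0.25 (first-seen order)
groupZProportion(records);
// a single group with x = 1
```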
```diff
@@ -1284,7 +1311,7 @@ Plot.formatIsoDate(new Date(Date.UTC(2020, 0, 1, 12, 23))) // "2020-01-01T12:23Z
 
 #### Plot.formatWeekday(*locale*, *format*)
 
-Returns a function that formats a week day (numbered from 0Sunday to 6Saturday) according to the *locale* and *format*.
+Returns a function that formats a week day number (from 0 = Sunday to 6 = Saturday) according to the *locale* and *format*.
 - *locale*: any valid [BCP 47 language tag](https://tools.ietf.org/html/bcp47); defaults to "en-US". Use navigator.language to respect the browser’s setting.
 - *format*: any valid [weekday format](https://tc39.es/ecma402/#datetimeformat-objects), *i.e.* one of "narrow", "short", "long"; defaults to "short".
 
```
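The standard `Intl.DateTimeFormat` API can reproduce the weekday behavior described for Plot.formatWeekday. The anchor date is an assumption of this sketch and is not taken from Plot's source: day 0 of January 2001 in UTC, which is Sunday, 31 December 2000. The name `formatWeekdaySketch` is also hypothetical. `Date.UTC` normalizes out-of-range day numbers, so the resulting function repeats every seven days.

```javascript
// Hypothetical re-implementation for illustration only.
function formatWeekdaySketch(locale = "en-US", format = "short") {
  // timeZone "UTC" keeps the local time zone from shifting the date.
  const formatter = new Intl.DateTimeFormat(locale, {timeZone: "UTC", weekday: format});
  // Date.UTC(2001, 0, 0) is Sunday, 31 December 2000; day i is i days later.
  return i => formatter.format(Date.UTC(2001, 0, +i));
}

const weekday = formatWeekdaySketch();
weekday(0);  // "Sun"
weekday(6);  // "Sat"
weekday(-1); // "Sat" (periodic)
weekday(7);  // "Sun" (periodic)
```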
```diff
@@ -1304,7 +1331,7 @@ This function is periodic: day -1 is Saturday, and day 7 is Sunday.
 
 #### Plot.formatMonth(*locale*, *format*)
 
-Returns a function that formats a month (numbered from 0January to 11December) according to the *locale* and *format*.
+Returns a function that formats a month number (from 0 = January to 11 = December) according to the *locale* and *format*.
 - *locale*: any valid [BCP 47 language tag](https://tools.ietf.org/html/bcp47); defaults to "en-US". Use navigator.language to respect the browser’s setting.
 - *format*: any valid [month format](https://tc39.es/ecma402/#datetimeformat-objects), *i.e.* one of "2-digit", "numeric", "narrow", "short", "long"; defaults to "short".
 
```
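A month formatter with the behavior described for Plot.formatMonth follows the same pattern. This is a minimal sketch under stated assumptions: `formatMonthSketch` is a hypothetical name, and anchoring on the first day of each month in 2000 (UTC) is an illustrative choice rather than Plot's documented implementation.

```javascript
// Hypothetical re-implementation for illustration only.
function formatMonthSketch(locale = "en-US", format = "short") {
  // timeZone "UTC" prevents the local time zone from moving the date into another month.
  const formatter = new Intl.DateTimeFormat(locale, {timeZone: "UTC", month: format});
  // Month i of 2000 (January = 0); Date.UTC rolls i = 12 over to January 2001.
  return i => formatter.format(Date.UTC(2000, +i, 1));
}

const month = formatMonthSketch();
month(0);  // "Jan"
month(11); // "Dec"
formatMonthSketch("en-US", "long")(1); // "February"
```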