README.md: 47 additions & 17 deletions
@@ -940,9 +940,54 @@ Plot’s option transforms, listed below, do more than populate the **transform*
 [<img src="./img/bin.png" width="320" height="198" alt="a histogram of athletes by weight">](https://observablehq.com/@data-workflows/plot-bin)
 
-[Source](./src/transforms/bin.js) · [Examples](https://observablehq.com/@data-workflows/plot-bin) · Aggregates continuous, quantitative data (such as temperatures or times) into discrete bins. You can then compute summary statistics for each bin, such as a count or sum. The bin transform is like a [group transform](#group) for quantitative data, and is most often used to make histograms or heatmaps.
+[Source](./src/transforms/bin.js) · [Examples](https://observablehq.com/@data-workflows/plot-bin) · Aggregates continuous data (quantitative or temporal values such as temperatures or times) into discrete bins, and then computes summary statistics for each bin, such as a count or sum. The bin transform is like a continuous [group transform](#group) and is often used to make histograms.
 
-TODO Describe how the binning dimensions and output channels are specified. Describe the resulting binned data.
+There are several variants of the bin transform depending on which dimensions need binning: [Plot.binX](#plotbinxoutputs-options) for *x*; [Plot.binY](#plotbinyoutputs-options) for *y*; and [Plot.bin](#plotbinoutputs-options) for both.
+
+Given input *data* = [*d₀*, *d₁*, *d₂*, …], by default the resulting binned data is an array of arrays, where each inner array is a subset of the input data: [[*d₀₀*, *d₀₁*, …], [*d₁₀*, *d₁₁*, …], [*d₂₀*, *d₂₁*, …], …]. Each inner array is in input order, while the outer array is in natural order according to the associated dimension (*x* then *y*). Empty bins are skipped. By specifying a different aggregation method for the *data* output, as described next, you can change how the binned data is computed.
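To make the shape of the binned data concrete, here is a minimal sketch in plain JavaScript. It does not use Plot and is not Plot's implementation; the `binByThresholds` helper, the thresholds, and the sample values are all hypothetical. It only illustrates the three properties described above: inner arrays keep input order, bins appear in order along the dimension, and empty bins are skipped.

```js
// Hypothetical helper (not part of Plot): assigns each datum to the half-open
// bin [thresholds[i], thresholds[i + 1]) and returns only the non-empty bins.
function binByThresholds(data, value, thresholds) {
  const bins = Array.from({length: thresholds.length - 1}, () => []);
  for (const d of data) {
    const v = value(d);
    // Find the bin whose interval contains v.
    const i = thresholds.findIndex(
      (t, j) => j < bins.length && t <= v && v < thresholds[j + 1]
    );
    if (i >= 0) bins[i].push(d); // values outside every bin are ignored in this sketch
  }
  return bins.filter((bin) => bin.length > 0); // empty bins are skipped
}

const data = [{w: 72}, {w: 55}, {w: 91}, {w: 58}, {w: 75}];
const binned = binByThresholds(data, (d) => d.w, [50, 60, 70, 80, 90, 100]);
// → [[{w: 55}, {w: 58}], [{w: 72}, {w: 75}], [{w: 91}]]
```

The bins for 60–70 and 80–90 are empty and therefore absent from the result, and within each bin the elements appear in the same relative order as in `data`.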
+
+While it is possible to compute channel values on the binned data by defining channel values as a function, more commonly channel values are computed by the bin transform, either implicitly or explicitly. The following channels are automatically computed by the bin transform:
+
+* **x1** - the starting horizontal position of the bin
+* **x2** - the ending horizontal position of the bin
+* **x** - the horizontal center of the bin
+* **y1** - the starting vertical position of the bin
+* **y2** - the ending vertical position of the bin
+* **y** - the vertical center of the bin
+* **z** - the first value of the *z* channel, if any
+* **fill** - the first value of the *fill* channel, if any
+* **stroke** - the first value of the *stroke* channel, if any
+
+The **x1**, **x2**, and **x** output channels are only computed by the Plot.binX and Plot.bin transforms; similarly, the **y1**, **y2**, and **y** output channels are only computed by the Plot.binY and Plot.bin transforms.
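The relationship between these automatic channels can be sketched for a single bin as follows. This is an illustration only, not Plot's implementation: `binChannels` is a hypothetical helper, and it assumes that the "center" of a bin is the midpoint of its start and end positions.

```js
// Hypothetical sketch (not Plot's implementation): derive the automatic
// horizontal output channels for one bin from its extent and its members.
function binChannels(x1, x2, members, fill) {
  return {
    x1, // starting horizontal position of the bin
    x2, // ending horizontal position of the bin
    x: (x1 + x2) / 2, // center, assumed here to be the midpoint
    fill: members.length ? fill(members[0]) : undefined // first value, if any
  };
}

const channels = binChannels(
  70,
  80,
  [{w: 72, sex: "F"}, {w: 75, sex: "M"}],
  (d) => d.sex
);
// → {x1: 70, x2: 80, x: 75, fill: "F"}
```

The same pattern would apply vertically to **y1**, **y2**, and **y**, and to **z** and **stroke** in place of **fill**.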
+
+In addition to the automatically binned channels, you can declare additional channels to bin by specifying the desired aggregation method in the *outputs* object, which is the first argument to the transform. For example, to use [Plot.binX](#plotbinxoutputs-options) to generate a **y** channel of bin counts, as in a frequency histogram:
+
+```js
+Plot.binX({y: "count"}, {x: "culmen_length_mm"})
+```
+
+The following aggregation methods are supported:
+
+* *first* - the first value, in input order
+* *last* - the last value, in input order
+* *count* - the number of elements (frequency)
+* *sum* - the sum of values
+* *proportion* - the sum proportional to the overall total (weighted frequency)
+* *proportion-facet* - the sum proportional to the facet total
+* *deviation* - the standard deviation
+* *min* - the minimum value
+* *max* - the maximum value
+* *mean* - the mean value (average)
+* *median* - the median value
+* *variance* - the variance per [Welford’s algorithm](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm)
+* a function - passed the array of values for each bin
+* an object with a *reduce* method - passed the index for each bin, and all values
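As a rough guide to what several of these methods compute, the sketch below writes them as plain functions over the array of values in one bin. The `reducers` object and the `range` function are hypothetical names used for illustration; this is not how Plot implements its aggregation methods, and *variance*, *deviation*, and the proportion methods are left out.

```js
// Hypothetical reducers over the array of values in one bin; a sketch of the
// semantics listed above, not Plot's implementation.
const reducers = {
  first: (values) => values[0],
  last: (values) => values[values.length - 1],
  count: (values) => values.length,
  sum: (values) => values.reduce((a, b) => a + b, 0),
  min: (values) => Math.min(...values),
  max: (values) => Math.max(...values),
  mean: (values) => values.reduce((a, b) => a + b, 0) / values.length,
  median: (values) => {
    const sorted = [...values].sort((a, b) => a - b); // copy, then sort numerically
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  }
};

// A custom aggregation written as a function of the bin's values: the range.
const range = (values) => reducers.max(values) - reducers.min(values);

const binValues = [4, 1, 7, 2];
const summary = {
  count: reducers.count(binValues), // 4
  sum: reducers.sum(binValues), // 14
  median: reducers.median(binValues), // 3
  range: range(binValues) // 6
};
```

A function such as `range` corresponds to the "a function" entry in the list: it receives the array of values for each bin and returns the aggregated value.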
+
+Most aggregation methods require binding the output channel to an input channel; for example, if you want the **y** output channel to be a *sum* (not merely a count), there should be a corresponding **y** input channel specifying which values to sum. If there is not, *sum* will be equivalent to *count*.
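The difference between a bound and an unbound *sum* can be sketched in plain JavaScript. `sumChannel` is a hypothetical helper, and treating a missing input channel as a weight of 1 per element is an assumption chosen to reproduce the behavior described above, not a statement about Plot's internals.

```js
// Hypothetical sketch: sum the bound input channel over a bin's members.
// When no input channel is given, each member contributes 1 (an assumption
// chosen to reproduce the documented sum-equals-count behavior).
function sumChannel(members, input) {
  return members.reduce((total, d) => total + (input ? input(d) : 1), 0);
}

const members = [{mass: 3.5}, {mass: 4.0}, {mass: 2.5}];
const withInput = sumChannel(members, (d) => d.mass); // 10
const withoutInput = sumChannel(members); // 3, the same as a count
```

In terms of the transform itself, binding would mean passing a **y** input channel in the options alongside `{y: "sum"}` in the outputs; the `mass` field here is invented for the example.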