d3.stack for tidy data? #158

@mbostock

d3.stack is designed to work with non-tidy data where each row corresponds to a “group” (the set of observations for all layers, e.g., year) with properties for each “layer” a.k.a. series (e.g., format) recording the observed value (e.g., revenue).

Year  8 - Track   Cassette   Cassette Single
1973  2699600000  419600000  0
1974  2730600000  433600000  0

In the tidy format, in contrast, rows correspond to observations and columns correspond to variables. (This is less efficient as the layer names are repeated, but oh well.)

Year  Format           Revenue
1973  8 - Track        2699600000
1973  Cassette         419600000
1973  Cassette Single  0
1974  8 - Track        2730600000
1974  Cassette         433600000
1974  Cassette Single  0
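
To make the relationship between the two layouts concrete, here is a minimal sketch in plain JavaScript (no D3) that pivots the tidy rows above back into the wide shape d3.stack expects. The helper name `pivotWide` and its parameters are hypothetical, made up for illustration; they are not part of D3.

```javascript
// Hypothetical helper (not part of D3): pivot tidy rows into one wide row
// per group, with one property per layer.
function pivotWide(rows, groupKey, layerKey, valueKey) {
  const byGroup = new Map(); // group value -> wide row under construction
  for (const row of rows) {
    const g = row[groupKey];
    if (!byGroup.has(g)) byGroup.set(g, {[groupKey]: g});
    byGroup.get(g)[row[layerKey]] = row[valueKey];
  }
  return Array.from(byGroup.values()); // groups in order of first appearance
}

// The tidy table from above.
const tidy = [
  {Year: 1973, Format: "8 - Track", Revenue: 2699600000},
  {Year: 1973, Format: "Cassette", Revenue: 419600000},
  {Year: 1973, Format: "Cassette Single", Revenue: 0},
  {Year: 1974, Format: "8 - Track", Revenue: 2730600000},
  {Year: 1974, Format: "Cassette", Revenue: 433600000},
  {Year: 1974, Format: "Cassette Single", Revenue: 0}
];

// One object per year, with a property for each format.
const wide = pivotWide(tidy, "Year", "Format", "Revenue");
```

A pivot like this works, but it means reshaping the data before every call to d3.stack, and a layer with no observation for some year is simply absent from that year's row.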

It’s possible to use tidy data with d3.stack, but it’s a little convoluted.

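series = d3.stack()
    // Layer keys: the distinct names, in order of first appearance.
    .keys(d3.group(data, d => d.name).keys())
    // Each group is a Map from name to observation; read that observation's value.
    .value((group, key) => group.get(key).value)
    .order(d3.stackOrderReverse)
  // Input: one Map (name -> observation) per year.
  (d3.rollup(data, ([d]) => d, d => d.year, d => d.name).values())
    // Replace each point's data (the year's Map) with the observation for this series.
    .map(s => (s.forEach(d => d.data = d.data.get(s.key)), s))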

It’d be nice if it were more convenient to give d3.stack tidy data, say like so:

series = d3.stack()
    .key(d => [d.name, d.year])
    .value(d => d.value)
    .order(d3.stackOrderReverse)
  (data)

Here the key accessor would return a two-part key: the layer key and the group key. And the value accessor wouldn’t need to know the current keys. (Because the data is tidy, the value accessor is the same for all observations.)

An implication of the proposed design is that the data can be sparse: some layers may be missing observations for some groups (and equivalently vice versa). That’s not possible with the current design because the layer keys (stack.keys) and group keys (data) are specified as separate arrays, but it should be easy enough for d3.stack to compute the union of layer keys and the union of group keys to fill in the missing data. d3.stack probably will also need some facility for ordering the group keys, as the order may not be consistent across layers.
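
As a rough illustration of the union-and-fill idea, here is a sketch in plain JavaScript (no D3). Everything in it is an assumption rather than a proposed implementation: the name `stackTidy`, the options object, the `missing` default of 0, the ascending sort of group keys, and the `group` property on each point are all hypothetical. It also assumes keys are primitives (strings or numbers), and the sample values are invented.

```javascript
// Hypothetical sketch of the proposed behavior; `stackTidy` is not a D3 function.
function stackTidy(data, {key, value, missing = 0}) {
  const layerKeys = [];    // union of layer keys, in order of first appearance
  const groupKeys = [];    // union of group keys
  const cells = new Map(); // layer key -> Map(group key -> value)
  for (const d of data) {
    const [layer, group] = key(d);
    if (!cells.has(layer)) {
      cells.set(layer, new Map());
      layerKeys.push(layer);
    }
    if (!groupKeys.includes(group)) groupKeys.push(group);
    cells.get(layer).set(group, value(d));
  }
  // Assumed ordering: sort group keys ascending so every layer agrees.
  groupKeys.sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
  // Running top of the stack for each group.
  const baseline = new Map(groupKeys.map(g => [g, 0]));
  return layerKeys.map(layer => {
    const points = groupKeys.map(group => {
      const layerCells = cells.get(layer);
      const v = layerCells.has(group) ? layerCells.get(group) : missing;
      const y0 = baseline.get(group);
      const y1 = y0 + v;
      baseline.set(group, y1);
      return Object.assign([y0, y1], {group}); // [y0, y1] plus its group key
    });
    return Object.assign(points, {key: layer}); // series tagged with its layer key
  });
}

// Sparse tidy data (invented values): "Cassette" has no observation for 1973.
const sparse = [
  {name: "Cassette", year: 1974, value: 4},
  {name: "8 - Track", year: 1973, value: 10},
  {name: "8 - Track", year: 1974, value: 12}
];

const series = stackTidy(sparse, {
  key: d => [d.name, d.year],
  value: d => d.value
});
```

Here the missing Cassette observation for 1973 is filled with a zero-height point, and the 1974 rows stack in layer order even though the input listed 1974 before 1973. Stack order and offset options are left out of this sketch.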

I imagine it’ll be difficult to make this backwards-compatible, but maybe it’s possible, or maybe it could be under a new name such as d3.stackTidy.
