Commit b76c04f

[BREAKING] Refactor unstack (#2494)
1 parent f4db95f commit b76c04f

4 files changed

+444
-219
lines changed


NEWS.md

Lines changed: 3 additions & 0 deletions
@@ -41,6 +41,9 @@
 * in `describe` the specification of custom aggregation is now `function => name`;
 old `name => function` order is now deprecated
 ([#2401](https://github.com/JuliaData/DataFrames.jl/pull/2401))
+* `unstack` now produces row and column keys in the order of their first appearance
+and has two new keyword arguments `allowmissing` and `allowduplicates`
+([#2494](https://github.com/JuliaData/DataFrames.jl/pull/2494))

 ## New functionalities

docs/src/man/reshaping_and_pivoting.md

Lines changed: 24 additions & 9 deletions
@@ -58,7 +58,9 @@ julia> last(d, 6)
 │ 6 │ Iris-virginica │ PetalWidth │ 1.8 │
 ```

-The second optional argument to `stack` indicates the columns to be stacked. These are normally referred to as the measured variables. Column names can also be given:
+The second optional argument to `stack` indicates the columns to be stacked.
+These are normally referred to as the measured variables. Column names can also
+be given:

 ```jldoctest reshape
 julia> d = stack(iris, [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]);
@@ -88,11 +90,18 @@ julia> last(d, 6)
 │ 6 │ Iris-virginica │ PetalWidth │ 1.8 │
 ```

-Note that all columns can be of different types. Type promotion follows the rules of `vcat`.
+Note that all columns can be of different types. Type promotion follows the
+rules of `vcat`.

-The stacked `DataFrame` that results includes all of the columns not specified to be stacked. These are repeated for each stacked column. These are normally refered to as identifier (id) columns. In addition to the id columns, two additional columns labeled `:variable` and `:values` contain the column identifier and the stacked columns.
+The stacked `DataFrame` that results includes all of the columns not specified
+to be stacked. These are repeated for each stacked column. These are normally
+refered to as identifier (id) columns. In addition to the id columns, two
+additional columns labeled `:variable` and `:values` contain the column
+identifier and the stacked columns.

-A third optional argument to `stack` represents the id columns that are repeated. This makes it easier to specify which variables you want included in the long format:
+A third optional argument to `stack` represents the id columns that are
+repeated. This makes it easier to specify which variables you want included in
+the long format:

 ```jldoctest reshape
 julia> d = stack(iris, [:SepalLength, :SepalWidth], :Species);
@@ -152,7 +161,9 @@ julia> last(d, 6)
 │ 6 │ Iris-virginica │ PetalWidth │ 1.8 │
 ```

-`unstack` converts from a long format to a wide format. The default is requires specifying which columns are an id variable, column variable names, and column values:
+`unstack` converts from a long format to a wide format.
+The default is requires specifying which columns are an id variable,
+column variable names, and column values:

 ```jldoctest reshape
 julia> iris.id = 1:size(iris, 1)
@@ -267,7 +278,8 @@ julia> last(widedf, 6)
 │ 6 │ Iris-virginica │ 150 │ 5.9 │ 3.0 │ 5.1 │ 1.8 │
 ```

-You can even skip passing the `:variable` and `:value` values as positional arguments, as they will be used by default, and write:
+You can even skip passing the `:variable` and `:value` values as positional
+arguments, as they will be used by default, and write:
 ```jldoctest reshape
 julia> widedf = unstack(longdf);

@@ -296,7 +308,8 @@ julia> last(widedf, 6)
 │ 6 │ Iris-virginica │ 150 │ 5.9 │ 3.0 │ 5.1 │ 1.8 │
 ```

-Passing `view=true` to `stack` returns a data frame whose columns are views into the original wide data frame. Here is an example:
+Passing `view=true` to `stack` returns a data frame whose columns are views into
+the original wide data frame. Here is an example:

 ```jldoctest reshape
 julia> d = stack(iris, view=true);
@@ -337,7 +350,9 @@ This is provides a view of the original columns stacked together.
 Id columns -- `RepeatedVector`
 This repeats the original columns N times where N is the number of columns stacked.

-None of these reshaping functions perform any aggregation. To do aggregation, use the split-apply-combine functions in combination with reshaping. Here is an example:
+None of these reshaping functions perform any aggregation. To do aggregation,
+use the split-apply-combine functions in combination with reshaping. Here is an
+example:

 ```jldoctest reshape
 julia> using Statistics
@@ -356,7 +371,7 @@ julia> first(d, 6)
 │ 5 │ Iris-setosa │ SepalLength │ 5.0 │
 │ 6 │ Iris-setosa │ SepalLength │ 5.4 │

-julia> x = by(d, [:variable, :Species], :value => mean => :vsum);
+julia> x = combine(groupby(d, [:variable, :Species]), :value => mean => :vsum);

 julia> first(x, 6)
 │ Row │ variable │ Species │ vsum │
