Commit d23077a

add Cols support (#2495)
1 parent 40c368d commit d23077a

21 files changed, 171 additions and 21 deletions

NEWS.md

Lines changed: 1 addition & 0 deletions
@@ -77,6 +77,7 @@
 * passing empty sets of columns in `filter`/`filter!` and in `select`/`transform`/`combine`
 with `ByRow` is now accepted ([#2476](https://github.com/JuliaData/DataFrames.jl/pull/2476))
 * add `permutedims` method for `AbstractDataFrame` ([#2447](https://github.com/JuliaData/DataFrames.jl/pull/2447))
+* add support for `Cols` from DataAPI.jl ([#2495](https://github.com/JuliaData/DataFrames.jl/pull/2495))
 
 ## Deprecated

Project.toml

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ test = ["DataStructures", "DataValues", "Dates", "Logging", "Random", "Test"]
 julia = "1"
 CategoricalArrays = "0.8.3"
 Compat = "3.17"
-DataAPI = "1.2"
+DataAPI = "1.3"
 InvertedIndices = "1"
 IteratorInterfaceExtensions = "0.1.1, 1"
 Missings = "0.4.2"

docs/src/lib/indexing.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ The rules for a valid type of index into a column are the following:
 * a vector of `Bool` that has to be a subtype of `AbstractVector{Bool}`;
 * a regular expression, which gets expanded to a vector of matching column names;
 * a `Not` expression (see [InvertedIndices.jl](https://github.com/mbauman/InvertedIndices.jl));
-* an `All` or `Between` expression (see [DataAPI.jl](https://github.com/JuliaData/DataAPI.jl));
+* an `Cols`, `All` or `Between` expression (see [DataAPI.jl](https://github.com/JuliaData/DataAPI.jl));
 * a colon literal `:`.
 
 The rules for a valid type of index into a row are the following:

docs/src/man/getting_started.md

Lines changed: 17 additions & 6 deletions
@@ -456,7 +456,11 @@ julia> df[!, :A] == df[:, :A]
 true
 ```
 
-In the first case, `[:A]` is a vector, indicating that the resulting object should be a `DataFrame`. On the other hand, `:A` is a single symbol, indicating that a single column vector should be extracted. Note that in the first case a vector is required to be passed (not just any iterable), so e.g. `df[:, (:x1, :x2)]` is not allowed, but `df[:, [:x1, :x2]]` is valid.
+In the first case, `[:A]` is a vector, indicating that the resulting object
+should be a `DataFrame`. On the other hand, `:A` is a single symbol, indicating
+that a single column vector should be extracted. Note that in the first case a
+vector is required to be passed (not just any iterable), so e.g. `df[:, (:x1,
+:x2)]` is not allowed, but `df[:, [:x1, :x2]]` is valid.
 
 It is also possible to use a regular expression as a selector of columns matching it:
 ```jldoctest dataframe
@@ -475,7 +479,9 @@ julia> df[!, r"x"]
 │ 1 │ 1 │ 2 │
 ```
 
-A `Not` selector (from the [InvertedIndices](https://github.com/mbauman/InvertedIndices.jl) package) can be used to select all columns excluding a specific subset:
+A `Not` selector (from the
+[InvertedIndices](https://github.com/mbauman/InvertedIndices.jl) package) can be
+used to select all columns excluding a specific subset:
 
 ```jldoctest dataframe
 julia> df[!, Not(:x1)]
@@ -486,8 +492,13 @@ julia> df[!, Not(:x1)]
 │ 1 │ 2 │ 3 │
 ```
 
-Finally, you can use `Not`, `Between`, and `All` selectors in more complex column selection scenarios.
-The following examples move all columns whose names match `r"x"` regular expression respectively to the front and to the end of a data frame:
+Finally, you can use `Not`, `Between`, `Cols` and `All` selectors in more
+complex column selection scenarios (note that `Cols()` selects no columns while
+`All()` selects all columns therefore `Cols` is a preferred selector if you
+write generic code). The following examples move all columns whose names match
+`r"x"` regular expression respectively to the front and to the end of a data
+frame:
+
 ```
 julia> df = DataFrame(r=1, x1=2, x2=3, y=4)
 1×4 DataFrame
@@ -496,14 +507,14 @@ julia> df = DataFrame(r=1, x1=2, x2=3, y=4)
 ├─────┼───────┼───────┼───────┼───────┤
 │ 1 │ 1 │ 2 │ 3 │ 4 │
 
-julia> df[:, All(r"x", :)]
+julia> df[:, Cols(r"x", :)]
 1×4 DataFrame
 │ Row │ x1 │ x2 │ r │ y │
 │ │ Int64 │ Int64 │ Int64 │ Int64 │
 ├─────┼───────┼───────┼───────┼───────┤
 │ 1 │ 2 │ 3 │ 1 │ 4 │
 
-julia> df[:, All(Not(r"x"), :)]
+julia> df[:, Cols(Not(r"x"), :)]
 1×4 DataFrame
 │ Row │ r │ y │ x1 │ x2 │
 │ │ Int64 │ Int64 │ Int64 │ Int64 │
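
Reviewer note: the documented difference between `Cols()` and `All()`, and the column reordering shown in the examples, can be checked with `names`. This is a minimal sketch, assuming a DataFrames.jl release that includes this commit is installed; the data frame mirrors the one in the documentation hunk, and the variable names are illustrative only.

```julia
using DataFrames

df = DataFrame(r=1, x1=2, x2=3, y=4)

# Cols takes the union of its arguments in order of first appearance,
# so the columns matching r"x" come first and `:` appends the rest.
front = names(df, Cols(r"x", :))        # x1, x2, r, y
back = names(df, Cols(Not(r"x"), :))    # r, y, x1, x2

# With no arguments the two selectors differ.
no_cols = names(df, Cols())    # selects no columns
all_cols = names(df, All())    # selects every column
```

Because an empty `Cols()` yields an empty selection, code that splats a possibly empty collection of selectors into `Cols` does not need a separate branch for the empty case.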

docs/src/man/sorting.md

Lines changed: 1 addition & 1 deletion
@@ -128,7 +128,7 @@ julia> last(iris, 4)
 Keywords used above include `rev` (to sort in reverse),
 and `by` (to apply a function to values before comparing them).
 Each keyword can either be a single value, a vector with values corresponding to
-individual columns, or a selector: `:`, `All`, `Not`, `Between`, or `Regex`.
+individual columns, or a selector: `:`, `Cols`, `All`, `Not`, `Between`, or `Regex`.
 
 As an alternative to using a vector values you can use `order` to specify
 an ordering for a particular column within a set of columns.

docs/src/man/split_apply_combine.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ Operations can then be applied on each group using one of the following function
 All these functions take a specification of one or more functions to apply to
 each subset of the `DataFrame`. This specification can be of the following forms:
 1. standard column selectors (integers, symbols, vectors of integers, vectors of symbols,
-`All`, `:`, `Between`, `Not` and regular expressions)
+`All`, `Cols`, `:`, `Between`, `Not` and regular expressions)
 2. a `cols => function` pair indicating that `function` should be called with
 positional arguments holding columns `cols`, which can be a any valid column selector
 3. a `cols => function => target_col` form additionally

src/DataFrames.jl

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,7 @@ using Markdown
 import DataAPI,
 DataAPI.All,
 DataAPI.Between,
+DataAPI.Cols,
 DataAPI.describe,
 Tables,
 Tables.columnindex,
@@ -21,6 +22,7 @@ export AbstractDataFrame,
 AsTable,
 Between,
 ByRow,
+Cols,
 DataFrame,
 DataFrameRow,
 GroupedDataFrame,

src/abstractdataframe/abstractdataframe.jl

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ abstract type AbstractDataFrame end
 Return a freshly allocated `Vector{String}` of names of columns contained in `df`.
 
 If `cols` is passed then restrict returned column names to those matching the
-selector (this is useful in particular with regular expressions, `Not`, and `Between`).
+selector (this is useful in particular with regular expressions, `Cols`, `Not`, and `Between`).
 `cols` can be any column selector ($COLUMNINDEX_STR; $MULTICOLUMNINDEX_STR)
 or a `Type`, in which case columns whose `eltype` is a subtype of `cols` are returned.

src/abstractdataframe/selection.jl

Lines changed: 1 addition & 1 deletion
@@ -1106,6 +1106,6 @@ function manipulate(dfv::SubDataFrame, @nospecialize(args...); copycols::Bool, k
 push!(newinds, newind)
 end
 end
-return view(dfv, :, isempty(newinds) ? [] : All(newinds...))
+return view(dfv, :, Cols(newinds...))
 end
 end
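
Reviewer note: the removed line needed the `isempty(newinds)` branch because `All()` with no arguments selects every column, whereas `Cols()` selects none, so the splat handles the empty case by itself. The sketch below illustrates that behavior under the assumption that a DataFrames.jl release containing this commit is installed; `select_view` is a hypothetical helper written for this note, not a function from the package.

```julia
using DataFrames

# Hypothetical helper, not part of DataFrames.jl: builds a view holding
# only the columns listed in `inds`, in the order given.
select_view(df::AbstractDataFrame, inds::Vector{Int}) = view(df, :, Cols(inds...))

df = DataFrame(a=1:3, b=4:6, c=7:9)

picked = select_view(df, [3, 1])         # columns c and a, in that order
nothing_picked = select_view(df, Int[])  # Cols() selects nothing, so no columns remain
```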

src/dataframerow/dataframerow.jl

Lines changed: 1 addition & 1 deletion
@@ -224,7 +224,7 @@ end
 
 Base.@propagate_inbounds Base.getindex(r::DataFrameRow, ::Colon) = r
 
-for T in (:AbstractVector, :Regex, :Not, :Between, :All, :Colon)
+for T in MULTICOLUMNINDEX_TUPLE
 @eval function Base.setindex!(df::DataFrame,
 v::Union{DataFrameRow, NamedTuple, AbstractDict},
 row_ind::Integer,
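
Reviewer note: replacing the inline tuple with a shared constant means a new selector type only has to be registered in one place. The pattern of generating one method per type name with `@eval` can be sketched with Base alone. Everything named here (`SELECTOR_TYPES`, `selector_kind`) is a hypothetical stand-in; the definition of `MULTICOLUMNINDEX_TUPLE` is not shown in this part of the diff.

```julia
# Hypothetical stand-in for a shared tuple of selector type names; the
# contents of the real MULTICOLUMNINDEX_TUPLE are not shown in this diff.
const SELECTOR_TYPES = (:AbstractVector, :Regex, :Colon)

# One method is generated per type name, so adding a name to the tuple
# extends every loop that iterates over it.
for T in SELECTOR_TYPES
    @eval selector_kind(::$T) = $(string(T))
end

kind_vec = selector_kind([1, 2])   # dispatches on AbstractVector
kind_rx = selector_kind(r"x")      # dispatches on Regex
kind_colon = selector_kind(:)      # dispatches on Colon
```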
