Commit df1b417

Allow predicate in Cols (#2881)
1 parent 15bcaae commit df1b417

6 files changed: +57 -6 lines changed

NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -70,6 +70,8 @@
 * the `DataFrame` constructor when matrix is passed to it as a first
   argument now allows `copycols` keyword argument
   ([#2859](https://github.com/JuliaData/DataFrames.jl/pull/2859))
+* `Cols` now accepts a predicate accepting column names as strings.
+  ([#2881](https://github.com/JuliaData/DataFrames.jl/pull/2881))

 ## Bug fixes

Project.toml

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ Unicode = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"
 [compat]
 CategoricalArrays = "0.10.0"
 Compat = "3.17"
-DataAPI = "1.8"
+DataAPI = "1.9"
 InvertedIndices = "1"
 IteratorInterfaceExtensions = "0.1.1, 1"
 Missings = "0.4.2, 1"

docs/src/lib/indexing.md

Lines changed: 11 additions & 1 deletion
@@ -26,7 +26,17 @@ The rules for a valid type of index into a column are the following:
 * a vector of `Bool` that has to be a subtype of `AbstractVector{Bool}`;
 * a regular expression, which gets expanded to a vector of matching column names;
 * a `Not` expression (see [InvertedIndices.jl](https://github.com/mbauman/InvertedIndices.jl));
-* an `Cols`, `All` or `Between` expression (see [DataAPI.jl](https://github.com/JuliaData/DataAPI.jl));
+  the `Not(idx)` selects all indices not in the passed `idx`;
+* a `Cols` expression (see [DataAPI.jl](https://github.com/JuliaData/DataAPI.jl));
+  `Cols(idxs...)` selects the union of the selections in `idxs`; in particular
+  `Cols()` selects no columns and `Cols(:)` selects all columns; a special rule is
+  `Cols(predicate)`, where `predicate` is a predicate function; in this case
+  the columns whose names passed to `predicate` as strings return `true`
+  are selected.
+* a `Between` expression (see [DataAPI.jl](https://github.com/JuliaData/DataAPI.jl));
+  `Between(first, last)` selects the columns between `first` and `last`;
+* an `All` expression (see [DataAPI.jl](https://github.com/JuliaData/DataAPI.jl));
+  `All()` selects all columns, equivalent to `:`;
 * a colon literal `:`.

 The rules for a valid type of index into a row are the following:
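
As a rough illustration of the `Cols` rules documented in this hunk, the sketch below assumes a DataFrames.jl release that already includes this change. It reuses the data frame from the manual; the variable names (`none`, `everything`, `x_first`, `by_predicate`) are only illustrative.

```julia
using DataFrames  # assumption: a release that includes this commit

df = DataFrame(r=1, x1=2, x2=3, y=4)

# Cols() selects no columns, while Cols(:) selects all of them.
none = names(df, Cols())
everything = names(df, Cols(:))

# Cols(idxs...) takes the union of its arguments, keeping first appearances.
x_first = names(df, Cols(r"x", :))

# Cols(predicate): the predicate receives each column name as a String.
by_predicate = names(df, Cols(x -> startswith(x, "x")))
```

Here `names(df, cols)` is used instead of `df[:, cols]` only to keep the results easy to compare; both resolve the selector through the same column index.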

docs/src/man/working_with_dataframes.md

Lines changed: 35 additions & 4 deletions
@@ -255,18 +255,49 @@ julia> df[!, Not(:x1)]
 Finally, you can use `Not`, `Between`, `Cols` and `All` selectors in more
 complex column selection scenarios (note that `Cols()` selects no columns while
 `All()` selects all columns therefore `Cols` is a preferred selector if you
-write generic code). The following examples move all columns whose names match
-`r"x"` regular expression respectively to the front and to the end of a data
-frame:
+write generic code). Here are examples of using each of these selectors:

-```
+```jldoctest dataframe
 julia> df = DataFrame(r=1, x1=2, x2=3, y=4)
 1×4 DataFrame
  Row │ r      x1     x2     y
      │ Int64  Int64  Int64  Int64
 ─────┼────────────────────────────
    1 │     1      2      3      4

+julia> df[:, Not(:r)] # drop :r column
+1×3 DataFrame
+ Row │ x1     x2     y
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     2      3      4
+
+julia> df[:, Between(:r, :x2)] # keep columns between :r and :x2
+1×3 DataFrame
+ Row │ r      x1     x2
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      2      3
+
+julia> df[:, All()] # keep all columns
+1×4 DataFrame
+ Row │ r      x1     x2     y
+     │ Int64  Int64  Int64  Int64
+─────┼────────────────────────────
+   1 │     1      2      3      4
+
+julia> df[:, Cols(x -> startswith(x, "x"))] # keep columns whose name starts with "x"
+1×2 DataFrame
+ Row │ x1     x2
+     │ Int64  Int64
+─────┼──────────────
+   1 │     2      3
+```
+
+The following examples show a more complex use of the `Cols` selector, which moves all
+columns whose names match `r"x"` regular expression respectively to the front
+and to the end of the data frame:
+```jldoctest dataframe
 julia> df[:, Cols(r"x", :)]
 1×4 DataFrame
  Row │ x1     x2     r      y
src/other/index.jl

Lines changed: 3 additions & 0 deletions
@@ -221,6 +221,9 @@ end
     isempty(idx.cols) ? (1:length(x)) : throw(ArgumentError("All(args...) is not supported: use Cols(args...) instead"))
 @inline Base.getindex(x::AbstractIndex, idx::Cols) =
     isempty(idx.cols) ? Int[] : union(getindex.(Ref(x), idx.cols)...)
+@inline Base.getindex(x::AbstractIndex, idx::Cols{Tuple{typeof(:)}}) = x[:]
+@inline Base.getindex(x::AbstractIndex, idx::Cols{<:Tuple{Function}}) =
+    findall(idx.cols[1], names(x))

 @inline function Base.getindex(x::AbstractIndex, idx::AbstractVector{<:Integer})
     if any(v -> v isa Bool, idx)
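
The dispatch pattern in this hunk can be tried without DataFrames.jl. The following is a minimal sketch using only Base: `ToyCols` and `ToyIndex` are hypothetical stand-ins for `DataAPI.Cols` and `AbstractIndex`, not the real types, and the toy index supports only a few selector kinds. It suggests why a lone predicate is resolved by `findall` over the names, while a predicate mixed with other selectors falls through to the generic union method and fails there.

```julia
# Hypothetical stand-ins for DataAPI.Cols and DataFrames' AbstractIndex,
# defined here only so the sketch runs with Base alone.
struct ToyCols{T<:Tuple}
    cols::T
    ToyCols(args...) = new{typeof(args)}(args)
end

struct ToyIndex
    names::Vector{String}
end

Base.names(x::ToyIndex) = x.names

# Basic selectors supported by the toy index.
Base.getindex(x::ToyIndex, ::Colon) = collect(1:length(x.names))
Base.getindex(x::ToyIndex, idx::Vector{Int}) = idx

# Generic rule: union of the selections made by each argument.
Base.getindex(x::ToyIndex, idx::ToyCols) =
    isempty(idx.cols) ? Int[] : union(getindex.(Ref(x), idx.cols)...)
# Special cases mirroring the two methods added in this hunk.
Base.getindex(x::ToyIndex, idx::ToyCols{Tuple{Colon}}) = x[:]
Base.getindex(x::ToyIndex, idx::ToyCols{<:Tuple{Function}}) =
    findall(idx.cols[1], names(x))

idx = ToyIndex(["a1", "a2", "b1", "b2"])

starts_with_a = idx[ToyCols(x -> x[1] == 'a')]   # predicate applied to each name
no_match = idx[ToyCols(x -> x[end] == '3')]      # no name matches
all_cols = idx[ToyCols(:)]                       # colon special case
unioned = idx[ToyCols([3], [1, 3])]              # generic union method

# A predicate combined with another selector reaches the generic method,
# which has no getindex(::ToyIndex, ::Function) to broadcast over.
mixed_throws = try
    idx[ToyCols(x -> true, [1])]
    false
catch err
    err isa MethodError
end
```

In this toy model the predicate method applies only when the tuple holds exactly one function, which matches the `Cols{<:Tuple{Function}}` signature in the hunk.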

test/index.jl

Lines changed: 5 additions & 0 deletions
@@ -474,6 +474,11 @@ end
     df = DataFrame(a1=1, a2=2, b1=3, b2=4)
     @test df[:, Cols(r"a", Not(r"1"))] == df[:, [1, 2, 4]]
     @test df[:, Cols(Not(r"1"), r"a")] == df[:, [2, 4, 1]]
+    @test df[:, Cols(x -> x[1] == 'a')] == df[:, [1, 2]]
+    @test df[:, Cols(x -> x[end] == '1')] == df[:, [1, 3]]
+    @test df[:, Cols(x -> x[end] == '3')] == DataFrame()
+    @test_throws MethodError df[:, Cols(x -> true, 1)] == DataFrame()
+    @test_throws MethodError df[:, Cols(1, x -> true)] == DataFrame()
 end

 @testset "views" begin
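
A note on the two `@test_throws` lines in this hunk: the trailing `== DataFrame()` is never evaluated, because the indexing expression on the left throws first. The sketch below shows the same shape with Base and the `Test` standard library only; `pick` is a hypothetical helper, not part of DataFrames.jl.

```julia
using Test

# Hypothetical helper: it only accepts integers, so a String argument
# raises a MethodError, much like an unsupported selector would.
pick(i::Int) = [i]

# The call throws before `==` is evaluated, so the right-hand side
# plays no role in whether the test passes.
result = @test_throws MethodError pick("a") == Int[]
```

Dropping the comparison from such a test would presumably leave its outcome unchanged.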
