Return the total number of observations contained in `data`.

- If `data` does not have `numobs` defined, then this function
- falls back to `length(data)`.
+ If `data` does not have `numobs` defined, then this function
+ returns the number of rows when `Tables.istable(data) == true`,
+ and `length(data)` otherwise.
+
Authors of custom data containers should implement
`Base.length` for their type instead of `numobs`.
`numobs` should only be implemented for types where there is a
difference between `numobs` and `Base.length`
(such as multi-dimensional arrays).

- See also [`getobs`](@ref)
+ By default, `numobs` supports nested combinations of arrays, tuples,
+ named tuples, and dictionaries.
+
+ See also [`getobs`](@ref).
+
+ # Examples
+ ```jldoctest
+
+ # named tuples
+ x = (a = [1, 2, 3], b = rand(6, 3))
+ numobs(x) == 3
+
+ # dictionaries
+ x = Dict(:a => [1, 2, 3], :b => rand(6, 3))
+ numobs(x) == 3
+ ```
+ All internal containers must have the same number of observations:
+ ```juliarepl
+ julia> x = (a = [1, 2, 3, 4], b = rand(6, 3));
+
+ julia> numobs(x)
+ ERROR: DimensionMismatch: All data containers must have the same number of observations.
+ Stacktrace:
+ [1] _check_numobs_error()
+ @ MLUtils ~/.julia/dev/MLUtils/src/observation.jl:163
+ [2] _check_numobs
+ @ ~/.julia/dev/MLUtils/src/observation.jl:130 [inlined]
+ [3] numobs(data::NamedTuple{(:a, :b), Tuple{Vector{Int64}, Matrix{Float64}}})
+ @ MLUtils ~/.julia/dev/MLUtils/src/observation.jl:177
+ [4] top-level scope
+ @ REPL[35]:1
+ ```
"""
function numobs end

# Generic Fallbacks
- numobs(data) = length(data)
+ @traitfn numobs(data::X) where {X; IsTable{X}} = DataAPI.nrow(data)
+ @traitfn numobs(data::X) where {X; !IsTable{X}} = length(data)
+

"""
getobs(data, [idx])

- Return the observations corresponding to the observation-index `idx`.
+ Return the observations corresponding to the observation index `idx`.
Note that `idx` can be any type as long as `data` has defined
- `getobs` for that type.
+ `getobs` for that type. If `idx` is not provided, then materialize
+ all observations in `data`.
+
+ If `data` does not have `getobs` defined, then this function
+ returns the row(s) in position `idx` when `Tables.istable(data) == true`,
+ and `data[idx]` otherwise.

- If `data` does not have `getobs` defined, then this function
- falls back to `data[idx]`.
Authors of custom data containers should implement
`Base.getindex` for their type instead of `getobs`.
`getobs` should only be implemented for types where there is a
@@ -40,13 +78,37 @@ Every author behind some custom data container can make this
decision themselves.
The output should be consistent when `idx` is a scalar vs vector.

- See also [`getobs!`](@ref) and [`numobs`](@ref)
+ By default, `getobs` supports nested combinations of arrays, tuples,
+ named tuples, and dictionaries.
+
+ See also [`getobs!`](@ref) and [`numobs`](@ref).
+
+ # Examples
+
+ ```jldoctest
+ # named tuples
+ x = (a = [1, 2, 3], b = rand(6, 3))
+
+ getobs(x, 2) == (a = 2, b = x.b[:, 2])
+ getobs(x, [1, 3]) == (a = [1, 3], b = x.b[:, [1, 3]])
+
+
+ # dictionaries
+ x = Dict(:a => [1, 2, 3], :b => rand(6, 3))
+
+ getobs(x, 2) == Dict(:a => 2, :b => x[:b][:, 2])
+ getobs(x, [1, 3]) == Dict(:a => [1, 3], :b => x[:b][:, [1, 3]])
+ ```
"""
function getobs end

# Generic Fallbacks
+
getobs(data) = data
- getobs(data, idx) = data[idx]
+
+ @traitfn getobs(data::X, idx) where {X; IsTable{X}} = Tables.subset(data, idx, viewhint=false)
+ @traitfn getobs(data::X, idx) where {X; !IsTable{X}} = data[idx]
+

"""
getobs!(buffer, data, idx)
@@ -61,6 +123,8 @@ method is provided for the type of `data`, then `buffer` will be
because the type of `data` may not lend itself to the concept
of `copy!`. Thus, supporting a custom `getobs!` is optional
and not required.
+
+ See also [`getobs`](@ref) and [`numobs`](@ref).
"""
function getobs! end
# getobs!(buffer, data) = getobs(data)
@@ -161,3 +225,5 @@ function getobs!(buffers, data::Dict, i)
    return buffers
end
+
+
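The docstrings in this diff advise authors of custom data containers to implement `Base.length` and `Base.getindex` rather than `numobs` and `getobs`, and to keep scalar and vector indexing consistent. The sketch below illustrates that advice without depending on MLUtils. `sketch_numobs`, `sketch_getobs`, and `PairedData` are hypothetical names introduced only for this illustration; the two `sketch_*` functions merely mirror the non-table branch of the generic fallbacks (`length(data)` and `data[idx]`) and are not the library's functions.

```julia
# Hypothetical stand-ins for the non-table generic fallbacks shown in the diff.
# They are NOT MLUtils' `numobs`/`getobs`; they only mirror `length(data)`
# and `data[idx]` so the example runs without any packages.
sketch_numobs(data) = length(data)
sketch_getobs(data, idx) = data[idx]

# A hypothetical custom container: observation i is the pair
# (features[i], labels[i]).
struct PairedData
    features::Vector{Float64}
    labels::Vector{Int}
end

# Implement the Base interface instead of numobs/getobs.
Base.length(d::PairedData) = length(d.features)

# A scalar index returns a single observation.
Base.getindex(d::PairedData, i::Integer) = (d.features[i], d.labels[i])

# A vector index returns a container of the same type, so that element k of
# the result equals the scalar lookup at idx[k].
Base.getindex(d::PairedData, idx::AbstractVector{<:Integer}) =
    PairedData(d.features[idx], d.labels[idx])

d = PairedData([0.1, 0.2, 0.3], [1, 0, 1])

n = sketch_numobs(d)            # dispatches to Base.length
single = sketch_getobs(d, 2)    # dispatches to the scalar getindex
sub = sketch_getobs(d, [1, 3])  # dispatches to the vector getindex
```

With the real package, the same `PairedData` methods would presumably be picked up by the `!IsTable` fallbacks in the diff, assuming the type does not satisfy the table trait; that has not been verified here.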