Commit 115bcc9

improve missings documentation (#2899)
1 parent d6d37ab commit 115bcc9

1 file changed: +122 -20 lines changed

docs/src/man/missing.md

Lines changed: 122 additions & 20 deletions
@@ -1,17 +1,19 @@
11
# Missing Data
22

3-
In Julia, missing values in data are represented using the special object `missing`, which is the single instance of the type `Missing`.
3+
In Julia, missing values in data are represented using the special object
4+
`missing`, which is the single instance of the type `Missing`.
45

56
```jldoctest
67
julia> missing
78
missing
89
910
julia> typeof(missing)
1011
Missing
11-
1212
```
1313

14-
The `Missing` type lets users create `Vector`s and `DataFrame` columns with missing values. Here we create a vector with a missing value and the element-type of the returned vector is `Union{Missing, Int64}`.
14+
The `Missing` type lets users create vectors and `DataFrame` columns with
15+
missing values. Here we create a vector with a missing value and the
16+
element-type of the returned vector is `Union{Missing, Int64}`.
1517

1618
```jldoctest missings
1719
julia> x = [1, 2, missing]
@@ -28,18 +30,19 @@ Union{Missing, Int64}
2830
2931
julia> eltype(x) == Union{Missing, Int}
3032
true
31-
3233
```
3334

34-
`missing` values can be excluded when performing operations by using `skipmissing`, which returns a memory-efficient iterator.
35+
`missing` values can be excluded when performing operations by using
36+
`skipmissing`, which returns a memory-efficient iterator.
3537

3638
```jldoctest missings
3739
julia> skipmissing(x)
3840
skipmissing(Union{Missing, Int64}[1, 2, missing])
39-
4041
```
4142

42-
The output of `skipmissing` can be passed directly into functions as an argument. For example, we can find the `sum` of all non-missing values or `collect` the non-missing values into a new missing-free vector.
43+
The output of `skipmissing` can be passed directly into functions as an
44+
argument. For example, we can find the `sum` of all non-missing values or
45+
`collect` the non-missing values into a new missing-free vector.
4346

4447
```jldoctest missings
4548
julia> sum(skipmissing(x))
@@ -49,21 +52,23 @@ julia> collect(skipmissing(x))
4952
2-element Vector{Int64}:
5053
1
5154
2
52-
5355
```
5456

55-
The function `coalesce` can be used to replace missing values with another value (note the dot, indicating that the replacement should be applied to all entries in `x`):
57+
The function `coalesce` can be used to replace missing values with another value
58+
(note the dot, indicating that the replacement should be applied to all entries
59+
in `x`):
5660

5761
```jldoctest missings
5862
julia> coalesce.(x, 0)
5963
3-element Vector{Int64}:
6064
1
6165
2
6266
0
63-
6467
```
6568

66-
The functions `dropmissing` and `dropmissing!` can be used to remove the rows containing `missing` values from a `DataFrame` and either create a new `DataFrame` or mutate the original in-place respectively.
69+
The functions [`dropmissing`](@ref) and [`dropmissing!`](@ref) can be used to
70+
remove the rows containing `missing` values from a data frame and either create
71+
a new `DataFrame` or mutate the original in-place respectively.
6772

6873
```jldoctest missings
6974
julia> using DataFrames
@@ -90,7 +95,8 @@ julia> dropmissing(df)
9095
2 │ 5 1 e
9196
```
9297

93-
One can specify the column(s) in which to search for rows containing `missing` values to be removed.
98+
One can specify the column(s) in which to search for rows containing `missing`
99+
values to be removed.
94100

95101
```jldoctest missings
96102
julia> dropmissing(df, :x)
@@ -103,10 +109,10 @@ julia> dropmissing(df, :x)
103109
3 │ 5 1 e
104110
```
105111

106-
By default the `dropmissing` and `dropmissing!` functions keep the
107-
`Union{T, Missing}` element type in columns selected for row removal. To remove
108-
the `Missing` part, if present, set the `disallowmissing` option to `true` (it
109-
will become the default behavior in the future).
112+
By default the [`dropmissing`](@ref) and [`dropmissing!`](@ref) functions keep
113+
the `Union{T, Missing}` element type in columns selected for row removal. To
114+
remove the `Missing` part, if present, set the `disallowmissing` keyword
115+
argument to `true` (it will become the default behavior in the future).
110116

111117
```jldoctest missings
112118
julia> dropmissing(df, disallowmissing=true)
@@ -118,9 +124,107 @@ julia> dropmissing(df, disallowmissing=true)
118124
2 │ 5 1 e
119125
```
120126

127+
Sometimes it is useful to allow or disallow support of missing values in some
128+
columns of a data frame. These operations are supported by the
129+
[`allowmissing`](@ref), [`allowmissing!`](@ref), [`disallowmissing`](@ref), and
130+
[`disallowmissing!`](@ref) functions. Here is an example:
131+
132+
```jldoctest missings
133+
julia> df = DataFrame(x=1:3, y=4:6)
134+
3×2 DataFrame
135+
Row │ x y
136+
│ Int64 Int64
137+
─────┼──────────────
138+
1 │ 1 4
139+
2 │ 2 5
140+
3 │ 3 6
141+
142+
julia> allowmissing!(df)
143+
3×2 DataFrame
144+
Row │ x y
145+
│ Int64? Int64?
146+
─────┼────────────────
147+
1 │ 1 4
148+
2 │ 2 5
149+
3 │ 3 6
150+
```
151+
152+
Now `df` allows missing values in all its columns. We can take advantage of this
153+
fact and set some of the values in `df` to `missing`, e.g.:
154+
155+
```jldoctest missings
156+
julia> df[1, 1] = missing
157+
missing
158+
159+
julia> df
160+
3×2 DataFrame
161+
Row │ x y
162+
│ Int64? Int64?
163+
─────┼─────────────────
164+
1 │ missing 4
165+
2 │ 2 5
166+
3 │ 3 6
167+
```
168+
169+
Note that a column selector can be passed as the second positional argument to
170+
[`allowmissing`](@ref) and [`allowmissing!`](@ref) to restrict the change to
171+
only some columns in our data frame.
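
As a minimal sketch of the column-selector form described above (the data frame `df2` and the name `df2_x` are hypothetical, introduced only for illustration), restricting `allowmissing` to `:x` should leave `:y` with its original element type:

```julia
using DataFrames

# Hypothetical data frame used only for this illustration.
df2 = DataFrame(x=1:3, y=4:6)

# Allow missing values only in column :x; a new data frame is returned
# and df2 itself is left unchanged.
df2_x = allowmissing(df2, :x)

eltype(df2_x.x)  # element type of :x now includes Missing
eltype(df2_x.y)  # element type of :y is unchanged
```

The mutating variant `allowmissing!` accepts the same column selector and updates the data frame in place.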
172+
173+
Now let us perform the reverse operation by disallowing missing values in `df`. We
174+
know that column `:y` does not contain missing values so we can use the
175+
[`disallowmissing`](@ref) function passing a column selector as the second
176+
positional argument:
177+
178+
```jldoctest missings
179+
julia> disallowmissing(df, :y)
180+
3×2 DataFrame
181+
Row │ x y
182+
│ Int64? Int64
183+
─────┼────────────────
184+
1 │ missing 4
185+
2 │ 2 5
186+
3 │ 3 6
187+
```
188+
189+
This operation created a new `DataFrame`. If we wanted to update the `df`
190+
in-place the [`disallowmissing!`](@ref) function should be used.
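
A short sketch of the in-place variant, assuming a hypothetical data frame `df3` whose `:y` column allows missing values but contains none:

```julia
using DataFrames

# Hypothetical data frame: :x holds a missing value, while :y merely
# allows missing values without containing any.
df3 = DataFrame(x=[missing, 2, 3], y=Union{Int, Missing}[4, 5, 6])

# Mutate df3 in place, disallowing missing values in :y only.
disallowmissing!(df3, :y)

eltype(df3.y)  # no longer includes Missing
eltype(df3.x)  # still allows Missing
```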
191+
192+
If we tried to disallow missings in the whole data frame using
193+
`disallowmissing(df)` we would get an error. However, it is often useful to
194+
disallow missings in all columns that actually do not contain them but keep the
195+
columns that have some `missing` values unchanged without having to list them
196+
explicitly. This can be accomplished by passing the `error=false` keyword argument:
197+
198+
```jldoctest missings
199+
julia> disallowmissing(df, error=false)
200+
3×2 DataFrame
201+
Row │ x y
202+
│ Int64? Int64
203+
─────┼────────────────
204+
1 │ missing 4
205+
2 │ 2 5
206+
3 │ 3 6
207+
```
208+
121209
The [Missings.jl](https://github.com/JuliaData/Missings.jl) package provides a
122210
few convenience functions to work with missing values.
123211

212+
One of the most commonly used is `passmissing`. It is a higher order function
213+
that takes some function `f` as its argument and returns a new function
214+
which returns `missing` if any of its positional arguments are `missing`
215+
and otherwise applies the function `f` to these arguments. This functionality
216+
is useful in combination with functions that do not support passing `missing`
217+
values as their arguments. For example, trying `uppercase(missing)` would
218+
produce an error, while the following works:
219+
220+
```jldoctest missings
221+
julia> passmissing(uppercase)("a")
222+
"A"
223+
224+
julia> passmissing(uppercase)(missing)
225+
missing
226+
```
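
Because `passmissing` returns an ordinary function, it can also be broadcast over a vector that contains missing values. The following is a minimal sketch; the names `v` and `upper` are hypothetical and not part of the original documentation:

```julia
using Missings

# Hypothetical vector mixing strings and a missing value.
v = ["a", missing, "c"]

# Wrap uppercase so that missing inputs propagate instead of raising an
# error, then apply the wrapped function element-wise with the dot syntax.
upper = passmissing(uppercase).(v)
```

Here the `missing` entry is passed through untouched, while the strings are uppercased.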
227+
124228
The function `Missings.replace` returns an iterator which replaces `missing`
125229
elements with another value:
126230

@@ -138,7 +242,6 @@ julia> collect(Missings.replace(x, 1))
138242
139243
julia> collect(Missings.replace(x, 1)) == coalesce.(x, 1)
140244
true
141-
142245
```
143246

144247
The function `nonmissingtype` returns the element-type `T` in `Union{T, Missing}`.
@@ -149,7 +252,6 @@ Union{Missing, Int64}
149252
150253
julia> nonmissingtype(eltype(x))
151254
Int64
152-
153255
```
154256

155257
The `missings` function constructs `Vector`s and `Array`s supporting missing
@@ -173,7 +275,7 @@ julia> missings(1, 3)
173275
julia> missings(Int, 1, 3)
174276
1×3 Matrix{Union{Missing, Int64}}:
175277
missing missing missing
176-
177278
```
178279

179-
See the [Julia manual](https://docs.julialang.org/en/v1/manual/missing/) for more information about missing values.
280+
See the [Julia manual](https://docs.julialang.org/en/v1/manual/missing/) for
281+
more information about missing values.
