Skip to content

Commit 99a9daa

Browse files
committed
Support non-interpolating quantile definitions
Add a `type` argument to `quantile` to support the three remaining (non-interpolating) types that we didn't support. Some of these are useful in particular because they correspond to actual values from the data and work for types that do not support arithmetic.
1 parent 0ce3149 commit 99a9daa

File tree

2 files changed

+220
-81
lines changed

2 files changed

+220
-81
lines changed

src/Statistics.jl

Lines changed: 160 additions & 78 deletions
Original file line numberDiff line numberDiff line change
@@ -854,8 +854,12 @@ median!(v::AbstractArray) = median!(vec(v))
854854
median(itr)
855855
856856
Compute the median of all elements in a collection.
857-
For an even number of elements no exact median element exists, so the result is
858-
equivalent to calculating mean of two median elements.
857+
858+
For an even number of elements no exact median element exists, so the
859+
mean of two median elements is returned.
860+
This is equivalent to [`quantile(itr, 0.5, type=2)`](@ref).
861+
Use `quantile` with `type=1` or `type=3` to compute median of types
862+
with limited or no support for arithmetic operations, such as `Date`.
859863
860864
!!! note
861865
If `itr` contains `NaN` or [`missing`](@ref) values, the result is also
@@ -923,31 +927,44 @@ julia> median([√1, √3, √2])
923927
median(f::Function, v) = median!(f.(v))
924928

925929
"""
926-
quantile!([q::AbstractArray, ] v::AbstractVector, p; sorted=false, alpha::Real=1.0, beta::Real=alpha)
930+
quantile!([q::AbstractArray, ] v::AbstractVector, p;
931+
sorted=false, type::Integer=7, alpha::Real=1.0, beta::Real=alpha)
927932
928933
Compute the quantile(s) of a vector `v` at a specified probability or vector or tuple of
929934
probabilities `p` on the interval [0,1]. If `p` is a vector, an optional
930935
output array `q` may also be specified. (If not provided, a new output array is created.)
931936
The keyword argument `sorted` indicates whether `v` can be assumed to be sorted; if
932937
`false` (the default), then the elements of `v` will be partially sorted in-place.
933938
934-
Samples quantile are defined by `Q(p) = (1-γ)*x[j] + γ*x[j+1]`,
935-
where `x[j]` is the j-th order statistic of `v`, `j = floor(n*p + m)`,
936-
`m = alpha + p*(1 - alpha - beta)` and `γ = n*p + m - j`.
937-
938-
By default (`alpha = beta = 1`), quantiles are computed via linear interpolation between the points
939-
`((k-1)/(n-1), x[k])`, for `k = 1:n` where `n = length(v)`. This corresponds to Definition 7
939+
By default (`type=7`, or equivalently `alpha = beta = 1`),
940+
quantiles are computed via linear interpolation between the points
941+
`((k-1)/(n-1), x[k])`, for `k = 1:n` where `x[j]` is the j-th order statistic of `itr`
942+
and `n = length(itr)`. This corresponds to Definition 7
940943
of Hyndman and Fan (1996), and is the same as the R and NumPy default.
941944
942-
The keyword arguments `alpha` and `beta` correspond to the same parameters in Hyndman and Fan,
943-
setting them to different values allows to calculate quantiles with any of the methods 4-9
944-
defined in this paper:
945-
- Def. 4: `alpha=0`, `beta=1`
946-
- Def. 5: `alpha=0.5`, `beta=0.5` (MATLAB default)
947-
- Def. 6: `alpha=0`, `beta=0` (Excel `PERCENTILE.EXC`, Python default, Stata `altdef`)
948-
- Def. 7: `alpha=1`, `beta=1` (Julia, R and NumPy default, Excel `PERCENTILE` and `PERCENTILE.INC`, Python `'inclusive'`)
949-
- Def. 8: `alpha=1/3`, `beta=1/3`
950-
- Def. 9: `alpha=3/8`, `beta=3/8`
945+
The keyword argument `type` can be used to choose among the 9 definitions
946+
in Hyndman and Fan (1996). Alternatively, `alpha` and `beta` allow reproducing
947+
any of the methods 4-9 defined in this paper. It is not allowed to specify both
948+
kinds of arguments at the same time.
949+
950+
Definitions 1 to 3 are discontinuous:
951+
- `type=1`: `Q(p) = x[ceil(n*p)]` (SAS-3)
952+
- `type=2`: `Q(p) = middle(x[ceil(n*p), floor(n*p + 1)])` (SAS-5, Stata)
953+
- `type=3`: `Q(p) = x[round(n*p)]` (SAS-2)
954+
955+
Definitions 4 to 9 use linear interpolation between consecutive order statistics.
956+
Samples quantiles are defined by `Q(p) = (1-γ)*x[j] + γ*x[j+1]`,
957+
where `j = floor(n*p + m)`, `m = alpha + p*(1 - alpha - beta)` and `γ = n*p + m - j`.
958+
- `type=4`: `alpha=0`, `beta=1` (SAS-1)
959+
- `type=5`: `alpha=0.5`, `beta=0.5` (MATLAB default)
960+
- `type=6`: `alpha=0`, `beta=0` (Excel `PERCENTILE.EXC`, Python default, Stata `altdef`)
961+
- `type=7`: `alpha=1`, `beta=1` (Julia, R and NumPy default, Excel `PERCENTILE` and
962+
`PERCENTILE.INC`, Python `'inclusive'`)
963+
- `type=8`: `alpha=1/3`, `beta=1/3`
964+
- `type=9`: `alpha=3/8`, `beta=3/8`
965+
966+
Definitions 1 and 3 have the advantage that they work with types that do not support
967+
all arithmetic operations, such as `Date`.
951968
952969
!!! note
953970
An `ArgumentError` is thrown if `v` contains `NaN` or [`missing`](@ref) values.
@@ -956,7 +973,8 @@ defined in this paper:
956973
- Hyndman, R.J and Fan, Y. (1996) "Sample Quantiles in Statistical Packages",
957974
*The American Statistician*, Vol. 50, No. 4, pp. 361-365
958975
959-
- [Quantile on Wikipedia](https://en.wikipedia.org/wiki/Quantile) details the different quantile definitions
976+
- [Quantile on Wikipedia](https://en.wikipedia.org/wiki/Quantile) details
977+
the different quantile definitions
960978
961979
# Examples
962980
```jldoctest
@@ -986,7 +1004,8 @@ julia> y
9861004
```
9871005
"""
9881006
function quantile!(q::AbstractArray, v::AbstractVector, p::AbstractArray;
989-
sorted::Bool=false, alpha::Real=1.0, beta::Real=alpha)
1007+
sorted::Bool=false, type::Union{Integer, Nothing}=nothing,
1008+
alpha::Union{Real, Nothing}=nothing, beta::Union{Real, Nothing}=alpha)
9901009
require_one_based_indexing(q, v, p)
9911010
if size(p) != size(q)
9921011
throw(DimensionMismatch("size of p, $(size(p)), must equal size of q, $(size(q))"))
@@ -997,29 +1016,34 @@ function quantile!(q::AbstractArray, v::AbstractVector, p::AbstractArray;
9971016
_quantilesort!(v, sorted, minp, maxp)
9981017

9991018
for (i, j) in zip(eachindex(p), eachindex(q))
1000-
@inbounds q[j] = _quantile(v,p[i], alpha=alpha, beta=beta)
1019+
@inbounds q[j] = _quantile(v,p[i], type=type, alpha=alpha, beta=beta)
10011020
end
10021021
return q
10031022
end
10041023

10051024
function quantile!(v::AbstractVector, p::Union{AbstractArray, Tuple{Vararg{Real}}};
1006-
sorted::Bool=false, alpha::Real=1., beta::Real=alpha)
1025+
sorted::Bool=false, type::Union{Integer, Nothing}=nothing,
1026+
alpha::Union{Real, Nothing}=nothing, beta::Union{Real, Nothing}=alpha)
10071027
if !isempty(p)
10081028
minp, maxp = extrema(p)
10091029
_quantilesort!(v, sorted, minp, maxp)
10101030
end
1011-
return map(x->_quantile(v, x, alpha=alpha, beta=beta), p)
1031+
return map(x->_quantile(v, x, type=type, alpha=alpha, beta=beta), p)
10121032
end
10131033
quantile!(a::AbstractArray, p::Union{AbstractArray,Tuple{Vararg{Real}}};
1014-
sorted::Bool=false, alpha::Real=1.0, beta::Real=alpha) =
1015-
quantile!(vec(a), p, sorted=sorted, alpha=alpha, beta=alpha)
1034+
sorted::Bool=false, type::Union{Integer, Nothing}=nothing,
1035+
alpha::Union{Real, Nothing}=nothing, beta::Union{Real, Nothing}=alpha) =
1036+
quantile!(vec(a), p, sorted=sorted, type=type, alpha=alpha, beta=alpha)
10161037

10171038
quantile!(q::AbstractArray, a::AbstractArray, p::Union{AbstractArray,Tuple{Vararg{Real}}};
1018-
sorted::Bool=false, alpha::Real=1.0, beta::Real=alpha) =
1019-
quantile!(q, vec(a), p, sorted=sorted, alpha=alpha, beta=alpha)
1039+
sorted::Bool=false, type::Union{Integer, Nothing}=nothing,
1040+
alpha::Union{Real, Nothing}=nothing, beta::Union{Real, Nothing}=alpha) =
1041+
quantile!(q, vec(a), p, sorted=sorted, type=type, alpha=alpha, beta=alpha)
10201042

1021-
quantile!(v::AbstractVector, p::Real; sorted::Bool=false, alpha::Real=1.0, beta::Real=alpha) =
1022-
_quantile(_quantilesort!(v, sorted, p, p), p, alpha=alpha, beta=beta)
1043+
quantile!(v::AbstractVector, p::Real;
1044+
sorted::Bool=false, type::Union{Integer, Nothing}=nothing,
1045+
alpha::Union{Real, Nothing}=nothing, beta::Union{Real, Nothing}=alpha) =
1046+
_quantile(_quantilesort!(v, sorted, p, p), p, type=type, alpha=alpha, beta=beta)
10231047

10241048
# Function to perform partial sort of v for quantiles in given range
10251049
function _quantilesort!(v::AbstractVector, sorted::Bool, minp::Real, maxp::Real)
@@ -1042,65 +1066,112 @@ function _quantilesort!(v::AbstractVector, sorted::Bool, minp::Real, maxp::Real)
10421066
end
10431067

10441068
# Core quantile lookup function: assumes `v` sorted
1045-
@inline function _quantile(v::AbstractVector, p::Real; alpha::Real=1.0, beta::Real=alpha)
1069+
@inline function _quantile(v::AbstractVector, p::Real;
1070+
type::Union{Integer, Nothing},
1071+
alpha::Union{Real, Nothing}, beta::Union{Real, Nothing})
10461072
0 <= p <= 1 || throw(ArgumentError("input probability out of [0,1] range"))
1047-
0 <= alpha <= 1 || throw(ArgumentError("alpha parameter out of [0,1] range"))
1048-
0 <= beta <= 1 || throw(ArgumentError("beta parameter out of [0,1] range"))
10491073
require_one_based_indexing(v)
10501074

1075+
if alpha !== nothing || beta !== nothing
1076+
type === nothing ||
1077+
throw(ArgumentError("it is not allowed to pass both `type` and `alpha` or `beta`"))
1078+
1079+
alpha === nothing && (alpha = 1.0)
1080+
beta === nothing && (beta = alpha)
1081+
1082+
0 <= alpha <= 1 || throw(ArgumentError("alpha parameter out of [0,1] range"))
1083+
0 <= beta <= 1 || throw(ArgumentError("beta parameter out of [0,1] range"))
1084+
elseif type === nothing
1085+
alpha = beta = 1.0
1086+
elseif 4 <= type <= 9
1087+
alpha = (0.0, 1/2, 0.0, 1.0, 1/3, 3/8)[type-3]
1088+
beta = (1.0, 1/2, 0.0, 1.0, 1/3, 3/8)[type-3]
1089+
elseif !(1 <= type <= 3)
1090+
throw(ArgumentError("`type` must be between 1 and 9"))
1091+
end
1092+
10511093
n = length(v)
10521094

10531095
@assert n > 0 # this case should never happen here
10541096

1055-
m = alpha + p * (one(alpha) - alpha - beta)
1056-
# Using fma here avoids some rounding errors when aleph is an integer
1057-
# The use of oftype supresses the promotion caused by alpha and beta
1058-
aleph = fma(n, p, oftype(p, m))
1059-
j = clamp(trunc(Int, aleph), 1, n - 1)
1060-
γ = clamp(aleph - j, 0, 1)
1061-
1062-
if n == 1
1063-
a = v[1]
1064-
b = v[1]
1097+
if type == 1
1098+
return v[clamp(ceil(Int, n*p), 1, n)]
1099+
elseif type == 2
1100+
i = clamp(ceil(Int, n*p), 1, n)
1101+
j = clamp(floor(Int, n*p + 1), 1, n)
1102+
return middle(v[i], v[j])
1103+
elseif type == 3
1104+
return v[clamp(round(Int, n*p), 1, n)]
10651105
else
1066-
a = v[j]
1067-
b = v[j + 1]
1068-
end
1106+
m = alpha + p * (one(alpha) - alpha - beta)
1107+
# Using fma here avoids some rounding errors when aleph is an integer
1108+
# The use of oftype supresses the promotion caused by alpha and beta
1109+
aleph = fma(n, p, oftype(p, m))
1110+
j = clamp(trunc(Int, aleph), 1, n - 1)
1111+
γ = clamp(aleph - j, 0, 1)
1112+
1113+
if n == 1
1114+
a = v[1]
1115+
b = v[1]
1116+
else
1117+
a = v[j]
1118+
b = v[j + 1]
1119+
end
10691120

1070-
# When a ≉ b, b-a may overflow
1071-
# When a ≈ b, (1-γ)*a + γ*b may not be increasing with γ due to rounding
1072-
if isfinite(a) && isfinite(b) &&
1073-
(!(a isa Number) || !(b isa Number) || a b)
1074-
return a + γ*(b-a)
1075-
else
1076-
return (1-γ)*a + γ*b
1121+
try
1122+
# When a ≉ b, b-a may overflow
1123+
# When a ≈ b, (1-γ)*a + γ*b may not be increasing with γ due to rounding
1124+
if isfinite(a) && isfinite(b) &&
1125+
(!(a isa Number) || !(b isa Number) || a b)
1126+
return a + γ*(b-a)
1127+
else
1128+
return (1-γ)*a + γ*b
1129+
end
1130+
catch e
1131+
throw(ArgumentError("error when computing quantile between two data values. " *
1132+
"Pass `type=1` or `type=3` to compute quantiles on types with " *
1133+
"no or limited support for arithmetic operations."))
1134+
end
10771135
end
10781136
end
10791137

10801138
"""
1081-
quantile(itr, p; sorted=false, alpha::Real=1.0, beta::Real=alpha)
1139+
quantile(itr, p;
1140+
sorted=false, type::Integer=7, alpha::Real=1.0, beta::Real=alpha)
10821141
1083-
Compute the quantile(s) of a collection `itr` at a specified probability or vector or tuple of
1142+
Compute the quantile(s) of a collection `qitr` at a specified probability or vector or tuple of
10841143
probabilities `p` on the interval [0,1]. The keyword argument `sorted` indicates whether
10851144
`itr` can be assumed to be sorted.
10861145
1087-
Samples quantile are defined by `Q(p) = (1-γ)*x[j] + γ*x[j+1]`,
1088-
where `x[j]` is the j-th order statistic of `itr`, `j = floor(n*p + m)`,
1089-
`m = alpha + p*(1 - alpha - beta)` and `γ = n*p + m - j`.
1090-
1091-
By default (`alpha = beta = 1`), quantiles are computed via linear interpolation between the points
1092-
`((k-1)/(n-1), x[k])`, for `k = 1:n` where `n = length(itr)`. This corresponds to Definition 7
1146+
By default (`type=7`, or equivalently `alpha = beta = 1`),
1147+
quantiles are computed via linear interpolation between the points
1148+
`((k-1)/(n-1), x[k])`, for `k = 1:n` where `x[j]` is the j-th order statistic of `itr`
1149+
and `n = length(itr)`. This corresponds to Definition 7
10931150
of Hyndman and Fan (1996), and is the same as the R and NumPy default.
10941151
1095-
The keyword arguments `alpha` and `beta` correspond to the same parameters in Hyndman and Fan,
1096-
setting them to different values allows to calculate quantiles with any of the methods 4-9
1097-
defined in this paper:
1098-
- Def. 4: `alpha=0`, `beta=1`
1099-
- Def. 5: `alpha=0.5`, `beta=0.5` (MATLAB default)
1100-
- Def. 6: `alpha=0`, `beta=0` (Excel `PERCENTILE.EXC`, Python default, Stata `altdef`)
1101-
- Def. 7: `alpha=1`, `beta=1` (Julia, R and NumPy default, Excel `PERCENTILE` and `PERCENTILE.INC`, Python `'inclusive'`)
1102-
- Def. 8: `alpha=1/3`, `beta=1/3`
1103-
- Def. 9: `alpha=3/8`, `beta=3/8`
1152+
The keyword argument `type` can be used to choose among the 9 definitions
1153+
in Hyndman and Fan (1996). Alternatively, `alpha` and `beta` allow reproducing
1154+
any of the methods 4-9 defined in this paper. It is not allowed to specify both
1155+
kinds of arguments at the same time.
1156+
1157+
Definitions 1 to 3 are discontinuous:
1158+
- `type=1`: `Q(p) = x[ceil(n*p)]` (SAS-3)
1159+
- `type=2`: `Q(p) = middle(x[ceil(n*p), floor(n*p + 1)])` (SAS-5, Stata)
1160+
- `type=3`: `Q(p) = x[round(n*p)]` (SAS-2)
1161+
1162+
Definitions 4 to 9 use linear interpolation between consecutive order statistics.
1163+
Samples quantiles are defined by `Q(p) = (1-γ)*x[j] + γ*x[j+1]`,
1164+
where `j = floor(n*p + m)`, `m = alpha + p*(1 - alpha - beta)` and `γ = n*p + m - j`.
1165+
- `type=4`: `alpha=0`, `beta=1` (SAS-1)
1166+
- `type=5`: `alpha=0.5`, `beta=0.5` (MATLAB default)
1167+
- `type=6`: `alpha=0`, `beta=0` (Excel `PERCENTILE.EXC`, Python default, Stata `altdef`)
1168+
- `type=7`: `alpha=1`, `beta=1` (Julia, R and NumPy default, Excel `PERCENTILE` and
1169+
`PERCENTILE.INC`, Python `'inclusive'`)
1170+
- `type=8`: `alpha=1/3`, `beta=1/3`
1171+
- `type=9`: `alpha=3/8`, `beta=3/8`
1172+
1173+
Definitions 1 and 3 have the advantage that they work with types that do not support
1174+
all arithmetic operations, such as `Date`.
11041175
11051176
!!! note
11061177
An `ArgumentError` is thrown if `v` contains `NaN` or [`missing`](@ref) values.
@@ -1111,7 +1182,8 @@ defined in this paper:
11111182
- Hyndman, R.J and Fan, Y. (1996) "Sample Quantiles in Statistical Packages",
11121183
*The American Statistician*, Vol. 50, No. 4, pp. 361-365
11131184
1114-
- [Quantile on Wikipedia](https://en.wikipedia.org/wiki/Quantile) details the different quantile definitions
1185+
- [Quantile on Wikipedia](https://en.wikipedia.org/wiki/Quantile) details
1186+
the different quantile definitions
11151187
11161188
# Examples
11171189
```jldoctest
@@ -1130,17 +1202,22 @@ julia> quantile(skipmissing([1, 10, missing]), 0.5)
11301202
5.5
11311203
```
11321204
"""
1133-
quantile(itr, p; sorted::Bool=false, alpha::Real=1.0, beta::Real=alpha) =
1134-
quantile!(collect(itr), p, sorted=sorted, alpha=alpha, beta=beta)
1205+
quantile(itr, p; sorted::Bool=false,
1206+
type::Union{Integer, Nothing}=nothing,
1207+
alpha::Union{Real, Nothing}=nothing, beta::Union{Real, Nothing}=alpha) =
1208+
quantile!(collect(itr), p, sorted=sorted, type=type, alpha=alpha, beta=beta)
11351209

11361210

11371211
"""
1138-
quantile(f, v)
1212+
quantile(f, v;
1213+
sorted=false, type::Integer=7, alpha::Real=1.0, beta::Real=alpha)
11391214
11401215
Apply the function `f` to each element of collection `v`
11411216
and then compute the quantile(s) at a specified probability
11421217
or vector or tuple of probabilities `p` on the interval [0,1].
11431218
1219+
See the other method for documentation on keyword arguments.
1220+
11441221
```jldoctest
11451222
julia> using Statistics
11461223
@@ -1157,11 +1234,16 @@ julia> quantile(.√[1, 3, 2], (0.3, 0.4, 0.5))
11571234
(1.248528137423857, 1.3313708498984762, 1.4142135623730951)
11581235
```
11591236
"""
1160-
quantile(f::Function, v, p; sorted::Bool=false, alpha::Real=1.0, beta::Real=alpha) =
1161-
quantile!(f.(v), p; sorted=sorted, alpha=alpha, beta=beta)
1162-
1163-
quantile(v::AbstractVector, p; sorted::Bool=false, alpha::Real=1.0, beta::Real=alpha) =
1164-
quantile!(sorted ? v : Base.copymutable(v), p; sorted=sorted, alpha=alpha, beta=beta)
1237+
quantile(f::Function, v, p;
1238+
sorted::Bool=false, type::Union{Integer, Nothing}=nothing,
1239+
alpha::Union{Real, Nothing}=nothing, beta::Union{Real, Nothing}=alpha) =
1240+
quantile!(f.(v), p; sorted=sorted, type=type, alpha=alpha, beta=beta)
1241+
1242+
quantile(v::AbstractVector, p;
1243+
sorted::Bool=false, type::Union{Integer, Nothing}=nothing,
1244+
alpha::Union{Real, Nothing}=nothing, beta::Union{Real, Nothing}=alpha) =
1245+
quantile!(sorted ? v : Base.copymutable(v), p;
1246+
sorted=sorted, type=type, alpha=alpha, beta=beta)
11651247

11661248
# If package extensions are not supported in this Julia version
11671249
if !isdefined(Base, :get_extension)

0 commit comments

Comments
 (0)