Commit 02e40c2

Support non-interpolating quantile definitions
Add a `type` argument to `quantile` to support the three remaining (non-interpolating) definitions of Hyndman and Fan (1996). Some of these are particularly useful because they return actual values from the data and work for types that do not support arithmetic.
1 parent bfa5c6b commit 02e40c2
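
To illustrate the motivation in the commit message, here is a minimal standalone sketch of definitions 1 and 3 applied to `Date` values. The helpers `quantile_type1` and `quantile_type3` are hypothetical names introduced only for this illustration; they are not part of Statistics.jl and assume the input is already sorted.

```julia
using Dates

# Hypothetical helpers (not part of Statistics.jl) sketching definitions 1 and 3.
# `x` must already be sorted; `p` is a probability in [0, 1].
quantile_type1(x, p) = x[clamp(ceil(Int, length(x) * p), 1, length(x))]
quantile_type3(x, p) = x[clamp(round(Int, length(x) * p), 1, length(x))]

dates = sort([Date(2020, 1, 10), Date(2020, 1, 1), Date(2020, 1, 20), Date(2020, 1, 5)])

# n*p is about 2.4 here, so `ceil` selects index 3 and `round` selects index 2.
q1 = quantile_type1(dates, 0.6)
q3 = quantile_type3(dates, 0.6)
```

Both results are elements of `dates`: the helpers only index into the sorted data, so no addition or multiplication on `Date` is required.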

2 files changed

+211
-77
lines changed

src/Statistics.jl

Lines changed: 151 additions & 74 deletions
Original file line numberDiff line numberDiff line change
@@ -854,8 +854,12 @@ median!(v::AbstractArray) = median!(vec(v))
854854
median(itr)
855855
856856
Compute the median of all elements in a collection.
857-
For an even number of elements no exact median element exists, so the result is
858-
equivalent to calculating mean of two median elements.
857+
858+
For an even number of elements no exact median element exists, so the
859+
mean of the two middle elements is returned.
860+
This is equivalent to [`quantile(itr, 0.5, type=2)`](@ref).
861+
Use `quantile` with `type=1` or `type=3` to compute the median of types
862+
with limited or no support for arithmetic operations, such as `Date`.
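
The claimed equivalence between the median and definition 2 at `p = 0.5` can be checked with a small standalone sketch. `quantile_type2` is a hypothetical helper written for this illustration (it does not use the new `type` keyword) and assumes a sorted input.

```julia
using Statistics  # for `median` and `middle`

# Hypothetical helper (not part of Statistics.jl): definition 2 on a sorted vector.
function quantile_type2(x, p)
    n = length(x)
    i = clamp(ceil(Int, n * p), 1, n)
    j = clamp(floor(Int, n * p + 1), 1, n)
    return middle(x[i], x[j])
end

even = [1, 3, 5, 10]  # n*p = 2.0, so i = 2 and j = 3: the two middle elements are averaged
odd  = [1, 3, 5]      # n*p = 1.5, so i = j = 2: the single middle element is used

same_even = quantile_type2(even, 0.5) == median(even)
same_odd  = quantile_type2(odd, 0.5) == median(odd)
```

Because `middle` averages its arguments, this definition still needs arithmetic, which is why the docstring points to `type=1` or `type=3` for types such as `Date`.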
859863
860864
!!! note
861865
If `itr` contains `NaN` or [`missing`](@ref) values, the result is also
@@ -905,31 +909,44 @@ _median(v::AbstractArray{T}, ::Colon) where {T} = median!(copyto!(Array{T,1}(und
905909
median(r::AbstractRange{<:Real}) = mean(r)
906910

907911
"""
908-
quantile!([q::AbstractArray, ] v::AbstractVector, p; sorted=false, alpha::Real=1.0, beta::Real=alpha)
912+
quantile!([q::AbstractArray, ] v::AbstractVector, p;
913+
sorted=false, type::Integer=7, alpha::Real=1.0, beta::Real=alpha)
909914
910915
Compute the quantile(s) of a vector `v` at a specified probability or vector or tuple of
911916
probabilities `p` on the interval [0,1]. If `p` is a vector, an optional
912917
output array `q` may also be specified. (If not provided, a new output array is created.)
913918
The keyword argument `sorted` indicates whether `v` can be assumed to be sorted; if
914919
`false` (the default), then the elements of `v` will be partially sorted in-place.
915920
916-
Samples quantile are defined by `Q(p) = (1-γ)*x[j] + γ*x[j+1]`,
917-
where `x[j]` is the j-th order statistic of `v`, `j = floor(n*p + m)`,
918-
`m = alpha + p*(1 - alpha - beta)` and `γ = n*p + m - j`.
919-
920-
By default (`alpha = beta = 1`), quantiles are computed via linear interpolation between the points
921-
`((k-1)/(n-1), x[k])`, for `k = 1:n` where `n = length(v)`. This corresponds to Definition 7
921+
By default (`type=7`, or equivalently `alpha = beta = 1`),
922+
quantiles are computed via linear interpolation between the points
923+
`((k-1)/(n-1), x[k])`, for `k = 1:n` where `x[k]` is the k-th order statistic of `v`
924+
and `n = length(v)`. This corresponds to Definition 7
922925
of Hyndman and Fan (1996), and is the same as the R and NumPy default.
923926
924-
The keyword arguments `alpha` and `beta` correspond to the same parameters in Hyndman and Fan,
925-
setting them to different values allows to calculate quantiles with any of the methods 4-9
926-
defined in this paper:
927-
- Def. 4: `alpha=0`, `beta=1`
928-
- Def. 5: `alpha=0.5`, `beta=0.5` (MATLAB default)
929-
- Def. 6: `alpha=0`, `beta=0` (Excel `PERCENTILE.EXC`, Python default, Stata `altdef`)
930-
- Def. 7: `alpha=1`, `beta=1` (Julia, R and NumPy default, Excel `PERCENTILE` and `PERCENTILE.INC`, Python `'inclusive'`)
931-
- Def. 8: `alpha=1/3`, `beta=1/3`
932-
- Def. 9: `alpha=3/8`, `beta=3/8`
927+
The keyword argument `type` can be used to choose among the 9 definitions
928+
in Hyndman and Fan (1996). Alternatively, `alpha` and `beta` allow reproducing
929+
any of the methods 4-9 defined in this paper. It is not allowed to specify both
930+
kinds of arguments at the same time.
931+
932+
Definitions 1 to 3 are discontinuous:
933+
- `type=1`: `Q(p) = x[ceil(n*p)]` (SAS-3)
934+
- `type=2`: `Q(p) = middle(x[ceil(n*p)], x[floor(n*p + 1)])` (SAS-5, Stata)
935+
- `type=3`: `Q(p) = x[round(n*p)]` (SAS-2)
936+
937+
Definitions 4 to 9 use linear interpolation between consecutive order statistics.
938+
Sample quantiles are defined by `Q(p) = (1-γ)*x[j] + γ*x[j+1]`,
939+
where `j = floor(n*p + m)`, `m = alpha + p*(1 - alpha - beta)` and `γ = n*p + m - j`.
940+
- `type=4`: `alpha=0`, `beta=1` (SAS-1)
941+
- `type=5`: `alpha=0.5`, `beta=0.5` (MATLAB default)
942+
- `type=6`: `alpha=0`, `beta=0` (Excel `PERCENTILE.EXC`, Python default, Stata `altdef`)
943+
- `type=7`: `alpha=1`, `beta=1` (Julia, R and NumPy default, Excel `PERCENTILE` and
944+
`PERCENTILE.INC`, Python `'inclusive'`)
945+
- `type=8`: `alpha=1/3`, `beta=1/3`
946+
- `type=9`: `alpha=3/8`, `beta=3/8`
947+
948+
Definitions 1 and 3 have the advantage that they work with types that do not support
949+
all arithmetic operations, such as `Date`.
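
The interpolation formula for definitions 4 to 9 can be sketched as a standalone function. `quantile_interp` is a hypothetical name used only here; it assumes a sorted numeric vector and clamps `j` and `γ` in the same spirit as the implementation, but it is an illustration rather than the library code.

```julia
using Statistics

# Hypothetical helper (not part of Statistics.jl): Q(p) = (1-γ)*x[j] + γ*x[j+1]
# for a sorted vector `x`, following the formulas in the docstring.
function quantile_interp(x, p; alpha=1.0, beta=alpha)
    n = length(x)
    n == 1 && return float(x[1])
    m = alpha + p * (1 - alpha - beta)
    h = n * p + m
    j = clamp(floor(Int, h), 1, n - 1)  # keep j and j+1 inside 1:n
    γ = clamp(h - j, 0, 1)
    return (1 - γ) * x[j] + γ * x[j + 1]
end

x = [1.0, 2.0, 4.0, 8.0]
q7 = quantile_interp(x, 0.25)                     # definition 7 (alpha = beta = 1)
q6 = quantile_interp(x, 0.25; alpha=0, beta=0)    # definition 6
q4 = quantile_interp(x, 0.25; alpha=0, beta=1)    # definition 4
```

With the default parameters the sketch agrees with `quantile(x, 0.25)` on this input; the other parameter choices move the result toward the lower order statistic.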
933950
934951
!!! note
935952
An `ArgumentError` is thrown if `v` contains `NaN` or [`missing`](@ref) values.
@@ -938,7 +955,8 @@ defined in this paper:
938955
- Hyndman, R.J and Fan, Y. (1996) "Sample Quantiles in Statistical Packages",
939956
*The American Statistician*, Vol. 50, No. 4, pp. 361-365
940957
941-
- [Quantile on Wikipedia](https://en.wikipedia.org/wiki/Quantile) details the different quantile definitions
958+
- [Quantile on Wikipedia](https://en.wikipedia.org/wiki/Quantile) details
959+
the different quantile definitions
942960
943961
# Examples
944962
```jldoctest
@@ -968,7 +986,8 @@ julia> y
968986
```
969987
"""
970988
function quantile!(q::AbstractArray, v::AbstractVector, p::AbstractArray;
971-
sorted::Bool=false, alpha::Real=1.0, beta::Real=alpha)
989+
sorted::Bool=false, type::Union{Integer, Nothing}=nothing,
990+
alpha::Union{Real, Nothing}=nothing, beta::Union{Real, Nothing}=alpha)
972991
require_one_based_indexing(q, v, p)
973992
if size(p) != size(q)
974993
throw(DimensionMismatch("size of p, $(size(p)), must equal size of q, $(size(q))"))
@@ -979,29 +998,34 @@ function quantile!(q::AbstractArray, v::AbstractVector, p::AbstractArray;
979998
_quantilesort!(v, sorted, minp, maxp)
980999

9811000
for (i, j) in zip(eachindex(p), eachindex(q))
982-
@inbounds q[j] = _quantile(v,p[i], alpha=alpha, beta=beta)
1001+
@inbounds q[j] = _quantile(v,p[i], type=type, alpha=alpha, beta=beta)
9831002
end
9841003
return q
9851004
end
9861005

9871006
function quantile!(v::AbstractVector, p::Union{AbstractArray, Tuple{Vararg{Real}}};
988-
sorted::Bool=false, alpha::Real=1., beta::Real=alpha)
1007+
sorted::Bool=false, type::Union{Integer, Nothing}=nothing,
1008+
alpha::Union{Real, Nothing}=nothing, beta::Union{Real, Nothing}=alpha)
9891009
if !isempty(p)
9901010
minp, maxp = extrema(p)
9911011
_quantilesort!(v, sorted, minp, maxp)
9921012
end
993-
return map(x->_quantile(v, x, alpha=alpha, beta=beta), p)
1013+
return map(x->_quantile(v, x, type=type, alpha=alpha, beta=beta), p)
9941014
end
9951015
quantile!(a::AbstractArray, p::Union{AbstractArray,Tuple{Vararg{Real}}};
996-
sorted::Bool=false, alpha::Real=1.0, beta::Real=alpha) =
997-
quantile!(vec(a), p, sorted=sorted, alpha=alpha, beta=alpha)
1016+
sorted::Bool=false, type::Union{Integer, Nothing}=nothing,
1017+
alpha::Union{Real, Nothing}=nothing, beta::Union{Real, Nothing}=alpha) =
1018+
quantile!(vec(a), p, sorted=sorted, type=type, alpha=alpha, beta=beta)
9981019

9991020
quantile!(q::AbstractArray, a::AbstractArray, p::Union{AbstractArray,Tuple{Vararg{Real}}};
1000-
sorted::Bool=false, alpha::Real=1.0, beta::Real=alpha) =
1001-
quantile!(q, vec(a), p, sorted=sorted, alpha=alpha, beta=alpha)
1021+
sorted::Bool=false, type::Union{Integer, Nothing}=nothing,
1022+
alpha::Union{Real, Nothing}=nothing, beta::Union{Real, Nothing}=alpha) =
1023+
quantile!(q, vec(a), p, sorted=sorted, type=type, alpha=alpha, beta=beta)
10021024

1003-
quantile!(v::AbstractVector, p::Real; sorted::Bool=false, alpha::Real=1.0, beta::Real=alpha) =
1004-
_quantile(_quantilesort!(v, sorted, p, p), p, alpha=alpha, beta=beta)
1025+
quantile!(v::AbstractVector, p::Real;
1026+
sorted::Bool=false, type::Union{Integer, Nothing}=nothing,
1027+
alpha::Union{Real, Nothing}=nothing, beta::Union{Real, Nothing}=alpha) =
1028+
_quantile(_quantilesort!(v, sorted, p, p), p, type=type, alpha=alpha, beta=beta)
10051029

10061030
# Function to perform partial sort of v for quantiles in given range
10071031
function _quantilesort!(v::AbstractVector, sorted::Bool, minp::Real, maxp::Real)
@@ -1024,65 +1048,112 @@ function _quantilesort!(v::AbstractVector, sorted::Bool, minp::Real, maxp::Real)
10241048
end
10251049

10261050
# Core quantile lookup function: assumes `v` sorted
1027-
@inline function _quantile(v::AbstractVector, p::Real; alpha::Real=1.0, beta::Real=alpha)
1051+
@inline function _quantile(v::AbstractVector, p::Real;
1052+
type::Union{Integer, Nothing},
1053+
alpha::Union{Real, Nothing}, beta::Union{Real, Nothing})
10281054
0 <= p <= 1 || throw(ArgumentError("input probability out of [0,1] range"))
1029-
0 <= alpha <= 1 || throw(ArgumentError("alpha parameter out of [0,1] range"))
1030-
0 <= beta <= 1 || throw(ArgumentError("beta parameter out of [0,1] range"))
10311055
require_one_based_indexing(v)
10321056

1057+
if alpha !== nothing || beta !== nothing
1058+
type === nothing ||
1059+
throw(ArgumentError("it is not allowed to pass both `type` and `alpha` or `beta`"))
1060+
1061+
alpha === nothing && (alpha = 1.0)
1062+
beta === nothing && (beta = alpha)
1063+
1064+
0 <= alpha <= 1 || throw(ArgumentError("alpha parameter out of [0,1] range"))
1065+
0 <= beta <= 1 || throw(ArgumentError("beta parameter out of [0,1] range"))
1066+
elseif type === nothing
1067+
alpha = beta = 1.0
1068+
elseif 4 <= type <= 9
1069+
alpha = (0.0, 1/2, 0.0, 1.0, 1/3, 3/8)[type-3]
1070+
beta = (1.0, 1/2, 0.0, 1.0, 1/3, 3/8)[type-3]
1071+
elseif !(1 <= type <= 3)
1072+
throw(ArgumentError("`type` must be between 1 and 9"))
1073+
end
1074+
10331075
n = length(v)
10341076

10351077
@assert n > 0 # this case should never happen here
10361078

1037-
m = alpha + p * (one(alpha) - alpha - beta)
1038-
# Using fma here avoids some rounding errors when aleph is an integer
1039-
# The use of oftype suppresses the promotion caused by alpha and beta
1040-
aleph = fma(n, p, oftype(p, m))
1041-
j = clamp(trunc(Int, aleph), 1, n - 1)
1042-
γ = clamp(aleph - j, 0, 1)
1043-
1044-
if n == 1
1045-
a = v[1]
1046-
b = v[1]
1079+
if type == 1
1080+
return v[clamp(ceil(Int, n*p), 1, n)]
1081+
elseif type == 2
1082+
i = clamp(ceil(Int, n*p), 1, n)
1083+
j = clamp(floor(Int, n*p + 1), 1, n)
1084+
return middle(v[i], v[j])
1085+
elseif type == 3
1086+
return v[clamp(round(Int, n*p), 1, n)]
10471087
else
1048-
a = v[j]
1049-
b = v[j + 1]
1050-
end
1088+
m = alpha + p * (one(alpha) - alpha - beta)
1089+
# Using fma here avoids some rounding errors when aleph is an integer
1090+
# The use of oftype suppresses the promotion caused by alpha and beta
1091+
aleph = fma(n, p, oftype(p, m))
1092+
j = clamp(trunc(Int, aleph), 1, n - 1)
1093+
γ = clamp(aleph - j, 0, 1)
1094+
1095+
if n == 1
1096+
a = v[1]
1097+
b = v[1]
1098+
else
1099+
a = v[j]
1100+
b = v[j + 1]
1101+
end
10511102

1052-
# When a ≉ b, b-a may overflow
1053-
# When a ≈ b, (1-γ)*a + γ*b may not be increasing with γ due to rounding
1054-
if isfinite(a) && isfinite(b) &&
1055-
(!(a isa Number) || !(b isa Number) || a ≈ b)
1056-
return a + γ*(b-a)
1057-
else
1058-
return (1-γ)*a + γ*b
1103+
try
1104+
# When a ≉ b, b-a may overflow
1105+
# When a ≈ b, (1-γ)*a + γ*b may not be increasing with γ due to rounding
1106+
if isfinite(a) && isfinite(b) &&
1107+
(!(a isa Number) || !(b isa Number) || a ≈ b)
1108+
return a + γ*(b-a)
1109+
else
1110+
return (1-γ)*a + γ*b
1111+
end
1112+
catch e
1113+
throw(ArgumentError("error when computing quantile between two data values. " *
1114+
"Pass `type=1` or `type=3` to compute quantiles on types with " *
1115+
"no or limited support for arithmetic operations."))
1116+
end
10591117
end
10601118
end
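
The keyword handling at the top of `_quantile` can be read in isolation as follows. `resolve_params` is a hypothetical helper written for this sketch; it mirrors the branch structure above but omits the range checks on `alpha` and `beta`.

```julia
# Hypothetical helper mirroring the argument handling in `_quantile`:
# returns the (alpha, beta) pair to use, or `nothing` for definitions 1 to 3.
function resolve_params(type, alpha, beta)
    if alpha !== nothing || beta !== nothing
        type === nothing ||
            throw(ArgumentError("it is not allowed to pass both `type` and `alpha` or `beta`"))
        alpha === nothing && (alpha = 1.0)
        beta === nothing && (beta = alpha)
        return (alpha, beta)
    elseif type === nothing
        return (1.0, 1.0)  # default: definition 7
    elseif 4 <= type <= 9
        alphas = (0.0, 1/2, 0.0, 1.0, 1/3, 3/8)
        betas  = (1.0, 1/2, 0.0, 1.0, 1/3, 3/8)
        return (alphas[type - 3], betas[type - 3])
    elseif 1 <= type <= 3
        return nothing     # non-interpolating definitions need no parameters
    else
        throw(ArgumentError("`type` must be between 1 and 9"))
    end
end

default_params = resolve_params(nothing, nothing, nothing)
type6_params   = resolve_params(6, nothing, nothing)
alpha_only     = resolve_params(nothing, 0.5, nothing)
```

Using `nothing` as the default for all three keywords is what lets the function tell "not passed" apart from "passed the default value", so a call that mixes `type` with `alpha` or `beta` can be rejected.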
10611119

10621120
"""
1063-
quantile(itr, p; sorted=false, alpha::Real=1.0, beta::Real=alpha)
1121+
quantile(itr, p;
1122+
sorted=false, type::Integer=7, alpha::Real=1.0, beta::Real=alpha)
10641123
10651124
Compute the quantile(s) of a collection `itr` at a specified probability or vector or tuple of
10661125
probabilities `p` on the interval [0,1]. The keyword argument `sorted` indicates whether
10671126
`itr` can be assumed to be sorted.
10681127
1069-
Samples quantile are defined by `Q(p) = (1-γ)*x[j] + γ*x[j+1]`,
1070-
where `x[j]` is the j-th order statistic of `itr`, `j = floor(n*p + m)`,
1071-
`m = alpha + p*(1 - alpha - beta)` and `γ = n*p + m - j`.
1072-
1073-
By default (`alpha = beta = 1`), quantiles are computed via linear interpolation between the points
1074-
`((k-1)/(n-1), x[k])`, for `k = 1:n` where `n = length(itr)`. This corresponds to Definition 7
1128+
By default (`type=7`, or equivalently `alpha = beta = 1`),
1129+
quantiles are computed via linear interpolation between the points
1130+
`((k-1)/(n-1), x[k])`, for `k = 1:n` where `x[k]` is the k-th order statistic of `itr`
1131+
and `n = length(itr)`. This corresponds to Definition 7
10751132
of Hyndman and Fan (1996), and is the same as the R and NumPy default.
10761133
1077-
The keyword arguments `alpha` and `beta` correspond to the same parameters in Hyndman and Fan,
1078-
setting them to different values allows to calculate quantiles with any of the methods 4-9
1079-
defined in this paper:
1080-
- Def. 4: `alpha=0`, `beta=1`
1081-
- Def. 5: `alpha=0.5`, `beta=0.5` (MATLAB default)
1082-
- Def. 6: `alpha=0`, `beta=0` (Excel `PERCENTILE.EXC`, Python default, Stata `altdef`)
1083-
- Def. 7: `alpha=1`, `beta=1` (Julia, R and NumPy default, Excel `PERCENTILE` and `PERCENTILE.INC`, Python `'inclusive'`)
1084-
- Def. 8: `alpha=1/3`, `beta=1/3`
1085-
- Def. 9: `alpha=3/8`, `beta=3/8`
1134+
The keyword argument `type` can be used to choose among the 9 definitions
1135+
in Hyndman and Fan (1996). Alternatively, `alpha` and `beta` allow reproducing
1136+
any of the methods 4-9 defined in this paper. It is not allowed to specify both
1137+
kinds of arguments at the same time.
1138+
1139+
Definitions 1 to 3 are discontinuous:
1140+
- `type=1`: `Q(p) = x[ceil(n*p)]` (SAS-3)
1141+
- `type=2`: `Q(p) = middle(x[ceil(n*p)], x[floor(n*p + 1)])` (SAS-5, Stata)
1142+
- `type=3`: `Q(p) = x[round(n*p)]` (SAS-2)
1143+
1144+
Definitions 4 to 9 use linear interpolation between consecutive order statistics.
1145+
Sample quantiles are defined by `Q(p) = (1-γ)*x[j] + γ*x[j+1]`,
1146+
where `j = floor(n*p + m)`, `m = alpha + p*(1 - alpha - beta)` and `γ = n*p + m - j`.
1147+
- `type=4`: `alpha=0`, `beta=1` (SAS-1)
1148+
- `type=5`: `alpha=0.5`, `beta=0.5` (MATLAB default)
1149+
- `type=6`: `alpha=0`, `beta=0` (Excel `PERCENTILE.EXC`, Python default, Stata `altdef`)
1150+
- `type=7`: `alpha=1`, `beta=1` (Julia, R and NumPy default, Excel `PERCENTILE` and
1151+
`PERCENTILE.INC`, Python `'inclusive'`)
1152+
- `type=8`: `alpha=1/3`, `beta=1/3`
1153+
- `type=9`: `alpha=3/8`, `beta=3/8`
1154+
1155+
Definitions 1 and 3 have the advantage that they work with types that do not support
1156+
all arithmetic operations, such as `Date`.
10861157
10871158
!!! note
10881159
An `ArgumentError` is thrown if `itr` contains `NaN` or [`missing`](@ref) values.
@@ -1093,7 +1164,8 @@ defined in this paper:
10931164
- Hyndman, R.J and Fan, Y. (1996) "Sample Quantiles in Statistical Packages",
10941165
*The American Statistician*, Vol. 50, No. 4, pp. 361-365
10951166
1096-
- [Quantile on Wikipedia](https://en.wikipedia.org/wiki/Quantile) details the different quantile definitions
1167+
- [Quantile on Wikipedia](https://en.wikipedia.org/wiki/Quantile) details
1168+
the different quantile definitions
10971169
10981170
# Examples
10991171
```jldoctest
@@ -1112,11 +1184,16 @@ julia> quantile(skipmissing([1, 10, missing]), 0.5)
11121184
5.5
11131185
```
11141186
"""
1115-
quantile(itr, p; sorted::Bool=false, alpha::Real=1.0, beta::Real=alpha) =
1116-
quantile!(collect(itr), p, sorted=sorted, alpha=alpha, beta=beta)
1117-
1118-
quantile(v::AbstractVector, p; sorted::Bool=false, alpha::Real=1.0, beta::Real=alpha) =
1119-
quantile!(sorted ? v : Base.copymutable(v), p; sorted=sorted, alpha=alpha, beta=beta)
1187+
quantile(itr, p; sorted::Bool=false,
1188+
type::Union{Integer, Nothing}=nothing,
1189+
alpha::Union{Real, Nothing}=nothing, beta::Union{Real, Nothing}=alpha) =
1190+
quantile!(collect(itr), p, sorted=sorted, type=type, alpha=alpha, beta=beta)
1191+
1192+
quantile(v::AbstractVector, p;
1193+
sorted::Bool=false, type::Union{Integer, Nothing}=nothing,
1194+
alpha::Union{Real, Nothing}=nothing, beta::Union{Real, Nothing}=alpha) =
1195+
quantile!(sorted ? v : Base.copymutable(v), p;
1196+
sorted=sorted, type=type, alpha=alpha, beta=beta)
11201197

11211198
# If package extensions are not supported in this Julia version
11221199
if !isdefined(Base, :get_extension)
