-
Notifications
You must be signed in to change notification settings - Fork 13
Add isordered #26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add isordered #26
Changes from 4 commits
105972e
63b920b
22776a0
c32bc8b
6ca10ba
de66603
87c664c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,10 @@ | ||
name = "DataAPI" | ||
uuid = "9a962f9c-6df0-11e9-0e5d-c546b8b5ee8a" | ||
authors = ["quinnj <[email protected]>"] | ||
version = "1.3.0" | ||
version = "1.4.0" | ||
|
||
[deps] | ||
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a" | ||
|
||
[compat] | ||
julia = "1" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,7 @@ | ||
module DataAPI | ||
|
||
import Dates | ||
|
||
""" | ||
defaultarray(T, N) | ||
|
||
|
@@ -89,6 +91,9 @@ If the collection is not sortable then the order of levels is unspecified. | |
|
||
Contrary to [`unique`](@ref), this function may return values which do not | ||
actually occur in the data, and does not preserve their order of appearance in `x`. | ||
|
||
If a type of `A` implements `levels` with a meaningful order then it should also | ||
implement `isordered` as by defualt `isordered` returns `false`. | ||
""" | ||
function levels(x) | ||
T = Base.nonmissingtype(eltype(x)) | ||
|
@@ -100,6 +105,26 @@ function levels(x) | |
levs | ||
end | ||
|
||
""" | ||
isordered(A) | ||
|
||
Test whether entries in `A` can be compared using `<`, `>` and similar operators, | ||
using the ordering of `levels`. | ||
bkamins marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
||
If a type of `A` implements `levels` with a meaningful order then it should also | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @nalimilan - could you please comment in the implementation how your proposal should be incorporated? Thank you! |
||
implement `isordered` as by defualt `isordered` returns `false`. | ||
|
||
""" | ||
function isordered end | ||
|
||
isordered(::AbstractArray{<:Union{Missing, AbstractString}}) = true | ||
isordered(::AbstractArray{<:Union{Missing, Real}}) = true | ||
isordered(::AbstractArray{<:Union{Missing, Symbol}}) = true | ||
isordered(::AbstractArray{<:Union{Missing, AbstractChar}}) = true | ||
isordered(::AbstractArray{<:Union{Missing, Dates.Period}}) = true | ||
isordered(::AbstractArray{<:Union{Missing, Dates.TimeType}}) = true | ||
isordered(::AbstractArray{Missing}) = true | ||
isordered(::AbstractArray{<:Any}) = false | ||
bkamins marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
bkamins marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
||
""" | ||
Between(first, last) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -84,5 +84,17 @@ end | |
end | ||
|
||
end | ||
|
||
@testset "isordered" begin | ||
|
||
@test isordered([1]) | ||
|
||
@test isordered(Union{Real,Missing}[1]) | ||
@test isordered(["1"]) | ||
@test isordered(['1']) | ||
@test !isordered(Any[1]) | ||
@test !isordered(["1", '1']) | ||
@test !isordered(Union{Char, String}["1", '1']) | ||
bkamins marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
||
end | ||
|
||
end # @testset "DataAPI" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence raises many issues about what we really want
isordered
to mean. Types that implement levels areAbstractArray
s, and in general they don't know with what eltypes they will be used. So the ordering doesn't depend on them.CategoricalArray
is a very special case, but even there I would say thatisordered
works because the element type isCategoricalValue
, which supports<
. So everything is defined by the element type.Likewise, below, I wouldn't say that
<
is consistent withlevels
. Since<
is defined on elements, not on arrays, andlevels
is defined on arrays, there's only one way to make them consistent: havelevels
return values sorted according to<
/isless
, which is what thelevels
fallback does. So actually it's<
andisless
that have to be consistent (this is probably always the case).What could happen is that a special array type could define
levels
to return something which isn't consistent with<
/isless
. But that doesn't mean that entries in the arrays wouldn't be comparable using<
. Do we want to support this? That sounds a bit weird, but it could happen e.g. with a special type similar toCategoricalArray
but which wouldn't wrap values inCategoricalValue
(IOW a type similar toPooledArray
but for whichlevels
would return the pool). Should this kind of object defineisordered
to returnfalse
? That would be weird since entries could still be compared using<
-- but the result would be different from the order inlevels
...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. I was thinking how to put it best and I just "kind of" reused what CategoricalArrays.jl defined:
Which implies that
isordered
should tell us if whatlevels(A)
returns is ordered if we use<
for comparison of these elements (so technically ifissorted(levels(A))
returnstrue
).In particular I understood that if
levels
returns something for a type that supports<
that failsissorted(levels(A))
thenisordered
should returnfalse
.This was my understanding of the original intent of what
isordered
should mean i.e. that two conditions must be met:A
supports<
etc.levels
returns values in order consistent with it(but maybe I was wrong).
But if I was right then
levels
andisordered
must be in sync. This is different fromBase.Ordered()
, asBase.Ordered()
is unrelated withlevels
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the problem is that
isordered
was initially designed with only CategoricalArrays in mind, and for them it happens to be the case thatisordered
,levels
and<
/isless
are all consistent. So now to extend it we have to decide whether that should always be the case (and document it) or not. That would be the simplest and most obvious definition. But it's also quite strict as e.g. a custom array for whichlevels(x) == ["c", "b", "a"]
wouldn't respect it. In short, I think we should know what we want to use this trait for to choose the most useful solution.@quinnj You mentioned at #23 (comment) that Arrow.jl could use this for a custom dict-encoded type. How would that type behave? How would it implement
isordered
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The arrow spec says:
So it's not really doing much by itself other than preserving the metadata. The
DictEncoded
array type in Arrow.jl is basically just aPooledArray
, but also includes the booleanisOrdered
field; but any further "semantics" would need to be handled by the user or custom comparison code (or in the example of converting to a CategoricalArray, it could query theDataAPI.isordered
property).Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be OK with both. The original definition came from CategoricalArrays.jl which are special, so we can be more flexible here I think (in which case
isordered
is giving us roughly the same asOrdered
trait in Julia Base).EDIT: so then the question is if we need both (the benefit of having it in DataAPI.jl is that we can change things more easily than in Julia Base).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If
Arrow.DictEncoded
doesn't want to expose the custom order of levels using generic API, then I don't think it makes sense for it to implementisordered
. Otherwise that function will be useless as it won't allow writing generic code that can do something with levels: you would know there's an order, but you wouldn't be able to retrieve it.One issue with
Base.OrderStyle
is that it's supposed to work when passing only the type, but for CategoricalArray it's a runtime property...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But
Arrow.DictEncoded
does want to expose the custom order of levels? I'd like to implementDataAPI.refarray
,DataAPI.refpool
,DataAPI.levels
andDataAPI.isordered
on DictEncoded, so it can participate in all the optimizations built around those.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I thought you didn't want that as you said:
Then we can require
isordered
to be consistent withlevels
at least, but mention that it may be inconsistent with<
andisless
. But we shouldn't say thatisordered
should returntrue
for types that implementlevels
, as it wouldn't always be the case forCategoricalArray
andArrow.DictEncoded
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, yeah, that makes sense to me. Yeah, for
Arrow.DictEncoded
,isordered
is an independent property.