-
Notifications
You must be signed in to change notification settings - Fork 13
Add isordered #26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add isordered #26
Changes from 4 commits
105972e
63b920b
22776a0
c32bc8b
6ca10ba
de66603
87c664c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,7 +1,10 @@ | ||
| name = "DataAPI" | ||
| uuid = "9a962f9c-6df0-11e9-0e5d-c546b8b5ee8a" | ||
| authors = ["quinnj <[email protected]>"] | ||
| version = "1.3.0" | ||
| version = "1.4.0" | ||
|
|
||
| [deps] | ||
| Dates = "ade2ca70-3891-5945-98fb-dc099432e06a" | ||
|
|
||
| [compat] | ||
| julia = "1" | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,7 @@ | ||
| module DataAPI | ||
|
|
||
| import Dates | ||
|
|
||
| """ | ||
| defaultarray(T, N) | ||
|
|
||
|
|
@@ -89,6 +91,9 @@ If the collection is not sortable then the order of levels is unspecified. | |
|
|
||
| Contrary to [`unique`](@ref), this function may return values which do not | ||
| actually occur in the data, and does not preserve their order of appearance in `x`. | ||
|
|
||
| If a type of `A` implements `levels` with a meaningful order then it should also | ||
| implement `isordered` as by defualt `isordered` returns `false`. | ||
| """ | ||
| function levels(x) | ||
| T = Base.nonmissingtype(eltype(x)) | ||
|
|
@@ -100,6 +105,26 @@ function levels(x) | |
| levs | ||
| end | ||
|
|
||
| """ | ||
| isordered(A) | ||
|
|
||
| Test whether entries in `A` can be compared using `<`, `>` and similar operators, | ||
| using the ordering of `levels`. | ||
bkamins marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| If a type of `A` implements `levels` with a meaningful order then it should also | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @nalimilan - could you please comment in the implementation how your proposal should be incorporated? Thank you! |
||
| implement `isordered` as by defualt `isordered` returns `false`. | ||
|
||
| """ | ||
| function isordered end | ||
|
|
||
| isordered(::AbstractArray{<:Union{Missing, AbstractString}}) = true | ||
| isordered(::AbstractArray{<:Union{Missing, Real}}) = true | ||
| isordered(::AbstractArray{<:Union{Missing, Symbol}}) = true | ||
| isordered(::AbstractArray{<:Union{Missing, AbstractChar}}) = true | ||
| isordered(::AbstractArray{<:Union{Missing, Dates.Period}}) = true | ||
| isordered(::AbstractArray{<:Union{Missing, Dates.TimeType}}) = true | ||
| isordered(::AbstractArray{Missing}) = true | ||
| isordered(::AbstractArray{<:Any}) = false | ||
bkamins marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
bkamins marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| """ | ||
| Between(first, last) | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -84,5 +84,17 @@ end | |
| end | ||
|
|
||
| end | ||
|
|
||
| @testset "isordered" begin | ||
|
|
||
| @test isordered([1]) | ||
|
||
| @test isordered(Union{Real,Missing}[1]) | ||
| @test isordered(["1"]) | ||
| @test isordered(['1']) | ||
| @test !isordered(Any[1]) | ||
| @test !isordered(["1", '1']) | ||
| @test !isordered(Union{Char, String}["1", '1']) | ||
bkamins marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| end | ||
|
|
||
| end # @testset "DataAPI" | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence raises many issues about what we really want
isorderedto mean. Types that implement levels areAbstractArrays, and in general they don't know with what eltypes they will be used. So the ordering doesn't depend on them.CategoricalArrayis a very special case, but even there I would say thatisorderedworks because the element type isCategoricalValue, which supports<. So everything is defined by the element type.Likewise, below, I wouldn't say that
<is consistent withlevels. Since<is defined on elements, not on arrays, andlevelsis defined on arrays, there's only one way to make them consistent: havelevelsreturn values sorted according to</isless, which is what thelevelsfallback does. So actually it's<andislessthat have to be consistent (this is probably always the case).What could happen is that a special array type could define
levelsto return something which isn't consistent with</isless. But that doesn't mean that entries in the arrays wouldn't be comparable using<. Do we want to support this? That sounds a bit weird, but it could happen e.g. with a special type similar toCategoricalArraybut which wouldn't wrap values inCategoricalValue(IOW a type similar toPooledArraybut for whichlevelswould return the pool). Should this kind of object defineisorderedto returnfalse? That would be weird since entries could still be compared using<-- but the result would be different from the order inlevels...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. I was thinking how to put it best and I just "kind of" reused what CategoricalArrays.jl defined:
Which implies that
isorderedshould tell us if whatlevels(A)returns is ordered if we use<for comparison of these elements (so technically ifissorted(levels(A))returnstrue).In particular I understood that if
levelsreturns something for a type that supports<that failsissorted(levels(A))thenisorderedshould returnfalse.This was my understanding of the original intent of what
isorderedshould mean i.e. that two conditions must be met:Asupports<etc.levelsreturns values in order consistent with it(but maybe I was wrong).
But if I was right then
levelsandisorderedmust be in sync. This is different fromBase.Ordered(), asBase.Ordered()is unrelated withlevels.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the problem is that
isorderedwas initially designed with only CategoricalArrays in mind, and for them it happens to be the case thatisordered,levelsand</islessare all consistent. So now to extend it we have to decide whether that should always be the case (and document it) or not. That would be the simplest and most obvious definition. But it's also quite strict as e.g. a custom array for whichlevels(x) == ["c", "b", "a"]wouldn't respect it. In short, I think we should know what we want to use this trait for to choose the most useful solution.@quinnj You mentioned at #23 (comment) that Arrow.jl could use this for a custom dict-encoded type. How would that type behave? How would it implement
isordered?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The arrow spec says:
So it's not really doing much by itself other than preserving the metadata. The
DictEncodedarray type in Arrow.jl is basically just aPooledArray, but also includes the booleanisOrderedfield; but any further "semantics" would need to be handled by the user or custom comparison code (or in the example of converting to a CategoricalArray, it could query theDataAPI.isorderedproperty).Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be OK with both. The original definition came from CategoricalArrays.jl which are special, so we can be more flexible here I think (in which case
isorderedis giving us roughly the same asOrderedtrait in Julia Base).EDIT: so then the question is if we need both (the benefit of having it in DataAPI.jl is that we can change things more easily than in Julia Base).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If
Arrow.DictEncodeddoesn't want to expose the custom order of levels using generic API, then I don't think it makes sense for it to implementisordered. Otherwise that function will be useless as it won't allow writing generic code that can do something with levels: you would know there's an order, but you wouldn't be able to retrieve it.One issue with
Base.OrderStyleis that it's supposed to work when passing only the type, but for CategoricalArray it's a runtime property...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But
Arrow.DictEncodeddoes want to expose the custom order of levels? I'd like to implementDataAPI.refarray,DataAPI.refpool,DataAPI.levelsandDataAPI.isorderedon DictEncoded, so it can participate in all the optimizations built around those.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I thought you didn't want that as you said:
Then we can require
isorderedto be consistent withlevelsat least, but mention that it may be inconsistent with<andisless. But we shouldn't say thatisorderedshould returntruefor types that implementlevels, as it wouldn't always be the case forCategoricalArrayandArrow.DictEncoded.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, yeah, that makes sense to me. Yeah, for
Arrow.DictEncoded,isorderedis an independent property.