-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
Fix inconsistent behaviour for last(::Zip)
with differing iterator size types
#59217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for the edits @adienes! Re-committed them via |
seems like a good change to me another pair of eyes is always good; maybe @Seelengrab or @jakobnissen are interested in giving feedback? |
Inconsistent behaviour for `last(::Zip)` with differing iterator size types causes `MethodError`. `last` works with finite zipped iterators of different length, but fails when one of them is an infinite iterator. The desired behaviour is to match the behaviour of `last` with finite iterators of different lengths. Closes #58922
@adienes I had to remove this test: @test last(collect(zip(OffsetArray(1:10, 2), OffsetArray(1:10, 3)))) == (10, 10) As julia> collect(zip(OffsetArray(1:10, 2), OffsetArray(1:10, 3)))
ERROR: DimensionMismatch: a has axes (3:12,), b has axes (4:13,), mismatch at dim 1 (CI/CD.) We already test @test last(zip(OffsetArray(1:10, 2), OffsetArray(1:10, 3))) == (10, 10) So I think that's fine. However, upon re-pushing, the pipeline is now failing because it failed to clone this repository: CI/CD ref. Edit: just as I posted this, it looks like the pipelines are running again 😅 |
base/iterators.jl
Outdated
@@ -1700,6 +1701,9 @@ function _nth(::IteratorSize, itr, n::Integer) | |||
y === nothing && throw(BoundsError(itr, n)) | |||
y[1] | |||
end | |||
|
|||
_nth(::Union{HasShape, HasLength}, z::Zip, n::Integer) = Base.map(nth(n), z.is) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about SizeUnknown
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will hit the generic fallback above that just calls iterate
n times; _nth
specializations exist only to be fast paths
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch. I didn't consider that case. This does indeed fail on iterators of unknown size. For example,
Iterators.filter(x -> x > 0, -5:5)
As humans, it's obvious that this is synonymous to 1:5
, but it doesn't implement length
.
We might need use _zip_lengths_finite_equal
(we explored using internal Iterator
length functions in #58922 but initially decided against it). Do you think this is sufficient (CC @adienes)?
function last(z::Zip)
n = last(_zip_lengths_finite_equal(z.is))
return nth(z, n)
end
Is it worth keeping the length
implementation for iterators whose size is known?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that will fail if we have two iterators of unknown size in the Zip
, e.g.,
zip(Iterators.filter(x -> x > 0, -5:5), Iterators.filter(x -> x % 2 == 0, -5:5))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how last
can know the length of the Zip
if all constituent iterators have SizeUnknown
. Do we explicitly error in this case?
nth
supports iterators whose size are unknown. It would be nice to support it in last
but I don't see any efficient way to get the length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying @Seelengrab. I agree, I think we don't properly handle the second option. What do you think of 193b6fc? Explicitly throwing an ArgumentError
seems better than letting it throw a MethodError
when trying to compute length
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.. I'm not even sure SizeUnknown
of a constituent iterator is handled well for IteratorSize
of a Zip
. Like I said, if there is any HasLength
being zipped over, there is at least an upper limit, but length
cannot accurately report that since the SizeUnknown
may finish first. Yet, it is definitely still possible to return the last zipped element, since you can just iterate up to the length of that HasLength
. I can provide a nice MWE showing what I mean later today.
SizeUnknown
is really a special case that needs to be treated differently on a per-case basis, so I'm not sure just erroring is nice from a principal POV. It's probably fine for this PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
julia> struct Collatz
start::Int
end
julia> Base.eltype(::Type{Collatz}) = Int
julia> Base.IteratorSize(::Type{Collatz}) = Base.SizeUnknown()
julia> Base.iterate(c::Collatz) = (c.start, c.start)
julia> function Base.iterate(c::Collatz, last::Int)
isone(last) && return nothing
next = if iseven(last)
last ÷ 2
else
3*last + 1
end
(next, next)
end
julia> zip(1:10, Collatz(3)) |> collect
8-element Vector{Tuple{Int64, Int64}}:
(1, 3)
(2, 10)
(3, 5)
(4, 16)
(5, 8)
(6, 4)
(7, 2)
(8, 1)
This is an iterator that just produces the Collatz sequence for a given number. This can do anything from being shorter than the other zipped iterator or longer than the zipped iterator (or loop infinitely on its own, who knows!). Still, if it does finish for a given number, we can definitely zip this with a 1:n
range and use last
on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_nth(::Union{HasShape, HasLength}, z::Zip, n::Integer) = Base.map(nth(n), z.is) | |
_nth(::IteratorSize, z::Zip, n::Integer) = Base.map(nth(n), z.is) |
and defer to the component iterators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented in da76f8c 🙂
This should be explicitly handled (astutely noted by @Seelengrab). Addresses #58922
one wrinkle I just realized is that |
Interesting, I don't think I knew that I don't think we should create an implementation of |
Is that promise still in effect in the current doc? The linked doc seems to be exclusively about collections that are indexable in O(1), which is obviously impossible for general iterators. I don't see an issue with extending that. |
that is unclear to me. although only as a matter of precedent, it does seem that every other implementation of |
Let's put that up to triage then, to get some more eyes on this & make a decision about whether it's ok for |
What of the time complexity of this implementation of taking the last last(itr, n::Integer) = reverse!(collect(Iterators.take(Iterators.reverse(itr), n))) |
the cost of that method will be sensitive to the number of elements taken, but not sensitive to the length of the iterator overall |
base/iterators.jl
Outdated
function last(z::Zip) | ||
IteratorSize(z) == SizeUnknown() && | ||
throw(ArgumentError("Cannot get last element of zipped iterators of undefined lengths")) | ||
return nth(z, length(z)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing
getindex.(z.is, minimum(Base.map(lastindex, z.is)))
to
nth(z, length(z))
is a clear win and an easy merge. Thanks!
Triage response to @Seelengrab's question No there is no prohibition on a slow implementation of However, this PR also doesn't introduce any slow |
I don't mean to disobey triage, but if this is indeed the case then I think the docstring needs updating, as it explicitly says
and I also note the distinction made here that
but that PR's proposed (although, in any case as you note I don't think this discussion is PR blocking) |
Rather than implenenting internal nth method for HasShape and HasLength, dispatch generally on IteratorSize and defer to the component iterators. Suggestion by @LilithHafner
Although it may be useful to through an ArgumentError rather than a MethodError if we can't get the length of an iterator whose size is unknown, SizeUnknown does not guarantee that length will throw. Therefore, we remove this check and defer to a MethodError if the iterator does not implement length. Suggestion by @LilithHafner
…size types (JuliaLang#59217) This PR fixes two bugs relating to `last(::Zip)`, closing JuliaLang#58922. First: there is inconsistent behaviour for `last(::Zip)` with differing iterator size types (a mixture of known-size and not) are passed in the `Zip`, causing a `MethodError`. `last(::Zip)` works with finite zipped iterators of different lengths, but fails when one of them is an infinite iterator. In the case where there are a mixture of known-length (finite) and infinite iterators in the `Zip`, we can know their length statically. (I call these `Zip`s "finite-guarded," because the `Zip` is finite due to its finite component.) The desired behaviour is to match the behaviour of `last` with finite iterators of different lengths. Second: while standardising this behaviour, [a bug in `last(::Zip)` for `OffsetArray`s](JuliaLang#58922 (comment)) is fixed. The issue was subtle and didn't error, but produced the wrong answer (noticed by @adienes). We also add an explicit `ArgumentError` when `last` is called on a `Zip` whose size is unknown (good call by @Seelengrab). Previously this was a `MethodError` for `lastindex`, used by the [previous implementation of `last(::Zip)`](https://github.com/JuliaLang/julia/blob/80f7db8e51b2ba1dd21e913611c23a6d5b75ecab/base/iterators.jl#L476). This PR does **not** give support to any [functionality involving iterators of unknown size](https://github.com/JuliaLang/julia/pull/59217/files#discussion_r2256315847). This may be done in future.
This PR fixes two bugs relating to
last(::Zip)
, closing #58922.First: there is inconsistent behaviour for
last(::Zip)
with differing iterator size types (a mixture of known-size and not) are passed in theZip
, causing aMethodError
.last(::Zip)
works with finite zipped iterators of different lengths, but fails when one of them is an infinite iterator. In the case where there are a mixture of known-length (finite) and infinite iterators in theZip
, we can know their length statically. (I call theseZip
s "finite-guarded," because theZip
is finite due to its finite component.)The desired behaviour is to match the behaviour of
last
with finite iterators of different lengths.Second: while standardising this behaviour, a bug in
last(::Zip)
forOffsetArray
s is fixed. The issue was subtle and didn't error, but produced the wrong answer (noticed by @adienes).We also add an explicit
ArgumentError
whenlast
is called on aZip
whose size is unknown (good call by @Seelengrab). Previously this was aMethodError
forlastindex
, used by the previous implementation oflast(::Zip)
.This PR does not give support to any functionality involving iterators of unknown size. This may be done in future.