jakewilliami
Contributor

@jakewilliami jakewilliami commented Aug 6, 2025

This PR fixes two bugs relating to last(::Zip), closing #58922.

First: last(::Zip) behaves inconsistently when iterators with differing size types (a mixture of known-size and not) are passed in the Zip, causing a MethodError.

last(::Zip) works with finite zipped iterators of different lengths, but fails when one of them is an infinite iterator. In the case where there is a mixture of known-length (finite) and infinite iterators in the Zip, we can know the length of the Zip statically. (I call these Zips "finite-guarded," because the Zip is finite due to its finite component.)

The desired behaviour is to match the behaviour of last with finite iterators of different lengths.
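To make the finite-guarded case concrete, here is a minimal sketch. The helpers `nth_by_drop` and `zip_last` are hypothetical names used only for illustration (they are not Base functions and not the PR's implementation), and the sketch assumes that `length` on a `Zip` skips infinite components.

```julia
# Hypothetical helper: the n-th element of any iterator, found by skipping n - 1 items.
nth_by_drop(itr, n::Integer) = first(Iterators.drop(itr, n - 1))

# Hypothetical helper: the last tuple that zipping `its` would yield,
# assuming the zip has a known, non-zero length.
function zip_last(its...)
    n = length(zip(its...))   # the shortest finite component sets the length
    return map(it -> nth_by_drop(it, n), its)
end

# Two finite iterators of different lengths.
finite_pair = zip_last(1:3, 10:20)

# Finite-guarded: one component is infinite, the other bounds the zip.
guarded = zip_last(1:3, Iterators.countfrom(10))
```

Both calls pick the third element of each component, which is the behaviour the description asks `last` to provide in the finite-guarded case.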

Second: while standardising this behaviour, a bug in last(::Zip) for OffsetArrays is fixed. The issue was subtle and didn't error, but produced the wrong answer (noticed by @adienes).

We also add an explicit ArgumentError when last is called on a Zip whose size is unknown (good call by @Seelengrab). Previously this was a MethodError for lastindex, used by the previous implementation of last(::Zip).

This PR does not give support to any functionality involving iterators of unknown size. This may be done in future.

@jakewilliami
Contributor Author

Thanks for the edits @adienes! Re-committed them via rebase to keep the git history clean 🙂

@adienes
Member

adienes commented Aug 6, 2025

seems like a good change to me

another pair of eyes is always good; maybe @Seelengrab or @jakobnissen are interested in giving feedback?

Add tests for zips where one of the iterators is not bounded but the
zip iterator is bounded by the finite one.

Addresses #58922

@adienes had a good idea to test against OffsetArrays
Inconsistent behaviour for `last(::Zip)` with differing iterator size
types causes `MethodError`.  `last` works with finite zipped iterators
of different length, but fails when one of them is an infinite
iterator.  The desired behaviour is to match the behaviour of `last`
with finite iterators of different lengths.

Closes #58922
@jakewilliami
Contributor Author

jakewilliami commented Aug 6, 2025

@adienes I had to remove this test:

@test last(collect(zip(OffsetArray(1:10, 2), OffsetArray(1:10, 3)))) == (10, 10)

As collecting failed:

julia> collect(zip(OffsetArray(1:10, 2), OffsetArray(1:10, 3)))
ERROR: DimensionMismatch: a has axes (3:12,), b has axes (4:13,), mismatch at dim 1

(CI/CD.) We already test

@test last(zip(OffsetArray(1:10, 2), OffsetArray(1:10, 3))) == (10, 10)

So I think that's fine. However, upon re-pushing, the pipeline is now failing because it failed to clone this repository: CI/CD ref.


Edit: just as I posted this, it looks like the pipelines are running again 😅

@@ -1700,6 +1701,9 @@ function _nth(::IteratorSize, itr, n::Integer)
    y === nothing && throw(BoundsError(itr, n))
    y[1]
end

_nth(::Union{HasShape, HasLength}, z::Zip, n::Integer) = Base.map(nth(n), z.is)
Contributor

What about SizeUnknown?

Member

@adienes adienes Aug 6, 2025

it will hit the generic fallback above that just calls iterate n times; _nth specializations exist only to be fast paths
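As a rough sketch of the pattern described here (a generic fallback that iterates, plus trait-dispatched fast paths), assuming hypothetical names `my_nth` and `_my_nth` rather than the actual Base internals:

```julia
# Hypothetical entry point: dispatch on the iterator's size trait.
my_nth(itr, n::Integer) = _my_nth(Base.IteratorSize(itr), itr, n)

# Generic fallback: works for any size trait by stepping through the iterator.
function _my_nth(::Base.IteratorSize, itr, n::Integer)
    y = iterate(Iterators.drop(itr, n - 1))
    y === nothing && throw(BoundsError(itr, n))
    return y[1]
end

# Fast path: ranges can be indexed directly, so no iteration is needed.
_my_nth(::Union{Base.HasShape, Base.HasLength}, r::AbstractRange, n::Integer) =
    r[begin + n - 1]

from_range = my_nth(10:10:50, 3)                                # uses the fast path
from_filter = my_nth(Iterators.filter(x -> x > 0, -5:5), 2)     # uses the fallback
```

The filtered iterator has size trait `SizeUnknown`, so it never matches the fast path and lands in the iterating method.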

Contributor Author

Great catch. I didn't consider that case. This does indeed fail on iterators of unknown size. For example,

Iterators.filter(x -> x > 0, -5:5)

To a human reader, this is obviously equivalent to 1:5, but it doesn't implement length.
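A small check of that claim, written as a sketch (the variable names are only for illustration):

```julia
positives = Iterators.filter(x -> x > 0, -5:5)

# The size trait reported for the filtered iterator.
size_trait = Base.IteratorSize(positives)

# Its elements, obtained by actually iterating.
elements = collect(positives)

# Whether any `length` method accepts this iterator.
length_is_defined = applicable(length, positives)
```

Here `size_trait` is `Base.SizeUnknown()` and `length_is_defined` is `false`, even though `elements` contains exactly the values of `1:5`.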

We might need to use _zip_lengths_finite_equal (we explored using internal Iterator length functions in #58922 but initially decided against it). Do you think this is sufficient (CC @adienes)?

function last(z::Zip)
    n = last(_zip_lengths_finite_equal(z.is))
    return nth(z, n)
end

Is it worth keeping the length implementation for iterators whose size is known?

Contributor Author

Hmm, that will fail if we have two iterators of unknown size in the Zip, e.g.,

zip(Iterators.filter(x -> x > 0, -5:5), Iterators.filter(x -> x % 2 == 0, -5:5))

Contributor Author

I'm not sure how last can know the length of the Zip if all constituent iterators have SizeUnknown. Do we explicitly error in this case?

nth supports iterators whose size is unknown. It would be nice to support it in last but I don't see any efficient way to get the length.

Contributor Author

Thanks for clarifying @Seelengrab. I agree, I think we don't properly handle the second option. What do you think of 193b6fc? Explicitly throwing an ArgumentError seems better than letting it throw a MethodError when trying to compute length.

Contributor

@Seelengrab Seelengrab Aug 7, 2025

Hmm.. I'm not even sure SizeUnknown of a constituent iterator is handled well for IteratorSize of a Zip. Like I said, if there is any HasLength being zipped over, there is at least an upper limit, but length cannot accurately report that since the SizeUnknown may finish first. Yet, it is definitely still possible to return the last zipped element, since you can just iterate up to the length of that HasLength. I can provide a nice MWE showing what I mean later today.

SizeUnknown is really a special case that needs to be treated differently on a per-case basis, so I'm not sure just erroring is nice from a principled POV. It's probably fine for this PR though.

Contributor

julia> struct Collatz
           start::Int
       end

julia> Base.eltype(::Type{Collatz}) = Int

julia> Base.IteratorSize(::Type{Collatz}) = Base.SizeUnknown()

julia> Base.iterate(c::Collatz) = (c.start, c.start)

julia> function Base.iterate(c::Collatz, last::Int)
           isone(last) && return nothing
           next = if iseven(last)
               last ÷ 2
           else
               3*last + 1
           end
           (next, next)
       end

julia> zip(1:10, Collatz(3)) |> collect
8-element Vector{Tuple{Int64, Int64}}:
 (1, 3)
 (2, 10)
 (3, 5)
 (4, 16)
 (5, 8)
 (6, 4)
 (7, 2)
 (8, 1)

This is an iterator that just produces the Collatz sequence for a given number. It can turn out shorter or longer than the other zipped iterator (or loop infinitely on its own, who knows!). Still, if it does finish for a given number, we can definitely zip this with a 1:n range and use last on it.
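The idea of simply iterating until the zip ends can be sketched as follows. `last_by_iteration` is a hypothetical helper, not something this PR adds, and it is O(n) in the number of elements produced. The example uses a filtered iterator (also `SizeUnknown`) in place of `Collatz` so that the block stands on its own.

```julia
# Hypothetical helper: walk the whole iterator and return the final element.
# Assumes the iterator terminates.
function last_by_iteration(itr)
    y = iterate(itr)
    y === nothing && throw(ArgumentError("iterator must be non-empty"))
    x, state = y
    while true
        y = iterate(itr, state)
        y === nothing && return x   # exhausted: x is the last element seen
        x, state = y
    end
end

# The range gives an upper bound of 10, but the filtered iterator ends first.
evens = Iterators.filter(iseven, 1:6)
z = zip(1:10, evens)
z_last = last_by_iteration(z)
```

Because the filtered component yields only 2, 4 and 6, the zip stops after three tuples and `z_last` is the third one.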

Member

Suggested change
- _nth(::Union{HasShape, HasLength}, z::Zip, n::Integer) = Base.map(nth(n), z.is)
+ _nth(::IteratorSize, z::Zip, n::Integer) = Base.map(nth(n), z.is)

and defer to the component iterators.

Contributor Author

Implemented in da76f8c 🙂

This should be explicitly handled (astutely noted by @Seelengrab).

Addresses #58922
@adienes
Member

adienes commented Aug 6, 2025

one wrinkle I just realized is that last promises to always be O(1), but nth does not so technically speaking maybe shouldn't be used here. I'll have to think about how to get around that

@jakewilliami
Contributor Author

Interesting, I don't think I knew that last asserts that it's $O(1)$ (or perhaps I forgot). Nice catch. It's a promise that was made many years ago by @JeffBezanson.

I don't think we should create an implementation of last that isn't $O(1)$, but I do think that there are valid issues addressed in #58922. I will have a think as well. At the very least, we should explicitly error in the case where we would be using nth, and (importantly) fix the bug in last on zipped OffsetArrays.

@Seelengrab
Contributor

Seelengrab commented Aug 7, 2025

Is that promise still in effect in the current doc? The linked doc seems to be exclusively about collections that are indexable in O(1), which is obviously impossible for general iterators. I don't see an issue with extending that.

@adienes
Member

adienes commented Aug 7, 2025

that is unclear to me. although only as a matter of precedent, it does seem that every other implementation of last is O(1) and none of them have iterate-until-done fallbacks

@Seelengrab
Contributor

Let's put that up to triage then, to get some more eyes on this & make a decision about whether it's ok for last to be something other than $O(1)$.

@Seelengrab Seelengrab added the triage This should be discussed on a triage call label Aug 7, 2025
@jakewilliami
Contributor Author

What of the time complexity of this implementation of taking the last $n$ elements of an iterator?

last(itr, n::Integer) = reverse!(collect(Iterators.take(Iterators.reverse(itr), n)))

@adienes
Member

adienes commented Aug 7, 2025

the cost of that method will be sensitive to the number of elements taken, but not sensitive to the length of the iterator overall
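A quick illustration of that point, as a sketch using the same expression on a very long range:

```julia
# A range with a billion elements; reversing a range does not touch its elements.
big = 1:1_000_000_000

# Same shape as the `last(itr, n)` definition quoted above, with n = 3:
# only three elements are ever materialised.
tail3 = reverse!(collect(Iterators.take(Iterators.reverse(big), 3)))
```

This holds for iterators whose reversal is cheap, as it is for ranges; the work done after reversing grows with the number of elements taken.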

function last(z::Zip)
    IteratorSize(z) == SizeUnknown() &&
        throw(ArgumentError("Cannot get last element of zipped iterators of undefined lengths"))
    return nth(z, length(z))
end
Member

Changing
getindex.(z.is, minimum(Base.map(lastindex, z.is)))
to
nth(z, length(z))
is a clear win and an easy merge. Thanks!
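To see why the index-based expression can go wrong with offset indices, here is a sketch that avoids the OffsetArrays dependency. `Shifted` is a hypothetical stand-in for an offset-indexed array, and the position-based line uses `Iterators.drop` in place of `nth`.

```julia
# Hypothetical stand-in for an offset-indexed array:
# valid indices are offset + 1 : offset + length(data).
struct Shifted
    data::Vector{Int}
    offset::Int
end
Base.length(s::Shifted) = length(s.data)
Base.lastindex(s::Shifted) = s.offset + length(s.data)
Base.getindex(s::Shifted, i::Int) = s.data[i - s.offset]
Base.iterate(s::Shifted, st::Int = 1) =
    st > length(s.data) ? nothing : (s.data[st], st + 1)

a = Shifted(collect(1:10), 2)   # indices 3:12
b = Shifted(collect(1:10), 3)   # indices 4:13

# Index-based: one shared index (the smallest lastindex, 12) is applied to every component.
by_index = getindex.((a, b), minimum(map(lastindex, (a, b))))

# Position-based: take the n-th element of each component, with n the shortest length.
n = minimum(map(length, (a, b)))
by_position = map(it -> first(Iterators.drop(it, n - 1)), (a, b))
```

With these stand-ins, index 12 is the last element of `a` but only the ninth element of `b`, so `by_index` is `(10, 9)` while `by_position` is `(10, 10)`.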

@LilithHafner
Member

Triage response to @Seelengrab's question

No there is no prohibition on a slow implementation of last (similar to length).

However, this PR also doesn't introduce any slow last methods that aren't already accessible by calling last on the component iterators.

@adienes
Member

adienes commented Aug 19, 2025

No there is no prohibition on a slow implementation of last

I don't mean to disobey triage, but if this is indeed the case then I think the docstring needs updating, as it explicitly says

last(coll)
Get the last element of an ordered collection, if it can be computed in O(1) time

and I also note the distinction made here that

It might be ok for length to be O(n) provided the iterator indeed has a fixed, known length (e.g. a linked list)

but that PR's proposed iterate-based fallback for length seems to have been fairly negatively received for the general case

(although, in any case as you note I don't think this discussion is PR blocking)

Rather than implementing an internal nth method for HasShape and
HasLength, dispatch generally on IteratorSize and defer to the
component iterators.

Suggestion by @LilithHafner
Although it may be useful to throw an ArgumentError rather than a
MethodError if we can't get the length of an iterator whose size is
unknown, SizeUnknown does not guarantee that length will throw.
Therefore, we remove this check and defer to a MethodError if the
iterator does not implement length.
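As an illustration of why the trait alone is not a reliable guard, here is a sketch of a hypothetical iterator type, `Evens`, that reports `SizeUnknown` and still defines `length`:

```julia
# Hypothetical iterator over the even numbers 2, 4, ..., up to `hi`.
struct Evens
    hi::Int
end

# The type declares its size as unknown...
Base.IteratorSize(::Type{Evens}) = Base.SizeUnknown()
Base.iterate(e::Evens, s::Int = 2) = s > e.hi ? nothing : (s, s + 2)

# ...and nevertheless provides a working `length` method.
Base.length(e::Evens) = e.hi ÷ 2

e = Evens(10)
trait = Base.IteratorSize(e)
n = length(e)
```

A check on `SizeUnknown` would reject `e` even though `length(e)` succeeds, which is the situation the commit message describes.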

Suggestion by @LilithHafner
@adienes adienes merged commit 822be59 into JuliaLang:master Aug 19, 2025
7 checks passed
xal-0 pushed a commit to xal-0/julia that referenced this pull request Aug 20, 2025
…size types (JuliaLang#59217)