More transducers

This has been mentioned in a few scattered places (e.g. https://github.com/JuliaLang/julia/issues/15648#issuecomment-511048192, https://github.com/JuliaLang/julia/pull/33526, and various Slack and Discourse threads), but I think it'd be good to have a general tracking issue for this as a large-scale goal.

Currently, the majority of our infrastructure is built on the paradigm of iterators. These are essentially state machines that tell you how to progress from one state to the next.

@tkf was rightfully a big champion of using **transducers** instead of iterators where possible, with https://github.com/JuliaFolds/Transducers.jl being his gigantic playground for those ideas.

Fundamentally, the idea with transducers is that you replace `iterate` as your fundamental operation for traversing data, with `foldl`, and this enables a lot of interesting things. To borrow from the docs in Transducers.jl, this is what
```julia
sum(Iterators.map(x -> 2x, Iterators.filter(iseven, xs)))
```
is essentially doing:
```julia
function map_filter_iterators(xs, init)
    ret = iterate(xs)
    ret === nothing && return init
    acc = init
    @goto filter
    local state, x
    while true
        while true                                    # input
            ret = iterate(xs, state)                  #
            ret === nothing && return acc             #
            @label filter                             #
            x, state = ret                            #
            iseven(x) && break             # filter   :
        end                                #          :
        y = 2x              # imap         :          :
        acc += y    # +     :              :          :
    end             # :     :              :          :
    #                 + <-- imap <-------- filter <-- input
end
```
whereas the Transducers.jl equivalent
```julia
foldxl(+, xs |> Filter(iseven) |> Map(x -> 2x))
```
does this:
```julia
function map_filter_transducers(xs, init)
    acc = init
    #              input -> Filter --> Map --> +
    for x in xs  # input    :          :       :
        if iseven(x)  #     Filter     :       :
            y = 2x    #                Map     :
            acc += y  #                        +
        end
    end
    return acc
end
```
and this is obtained not through clever compiler magic, but just through algorithm design. The difference is that with iterators, one writes a loop and pulls values out of the iterator: the loop owns the iterator. With transducers, the transducer owns the loop and pushes values out.
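To make the "push" model concrete, here is a minimal sketch in plain Julia that does not use Transducers.jl at all. The names `mapping` and `filtering` are hypothetical helpers invented for this illustration, not that package's API. Each one takes a reducing function `rf(acc, x)` and returns a new reducing function, so the whole pipeline collapses into a single step function that `foldl` drives.

```julia
# Hypothetical helpers for illustration only; not the Transducers.jl API.
# A transducer here is a function from a reducing function to a reducing function.
mapping(f) = rf -> (acc, x) -> rf(acc, f(x))
filtering(pred) = rf -> (acc, x) -> pred(x) ? rf(acc, x) : acc

# Filter first, then map; the outermost transformation sees each input first.
xf = filtering(iseven) ∘ mapping(x -> 2x)

# Wrapping `+` yields one step function: (acc, x) -> iseven(x) ? acc + 2x : acc
rf = xf(+)

# The fold owns the loop and pushes every element through `rf`.
result = foldl(rf, 1:10; init = 0)
```

Because `rf` is an ordinary two-argument function, the loop that `foldl` runs has the same shape as the hand-written `map_filter_transducers` above.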

An important practical benefit of transducers is parallelism. Transducers.jl and packages built on it, like https://github.com/tkf/ThreadsX.jl, give really nice APIs for many parallel workflows because whether an algorithm is amenable to parallelism is built into the representation of a transducer. With iterators, many of these things are quite opaque, since the fundamental paradigm of iteration is sequential.
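As a rough illustration of why a fold-based formulation parallelizes naturally, here is a sketch of a divide-and-conquer reduction. `parallel_fold` is a hypothetical name, and this is not how Transducers.jl or ThreadsX.jl are actually implemented. It assumes `op` is associative, `xs` is a non-empty 1-based vector, and `basesize >= 1`.

```julia
# Hypothetical helper for illustration; assumes `op` is associative,
# `xs` is non-empty and 1-based, and `basesize >= 1`.
function parallel_fold(op, f, xs::AbstractVector; basesize = 1024)
    n = length(xs)
    if n <= basesize
        return mapreduce(f, op, xs)          # sequential base case
    end
    mid = n ÷ 2
    # Reduce the left half on another task while this task handles the right half.
    left = Threads.@spawn parallel_fold(op, f, view(xs, 1:mid); basesize = basesize)
    right = parallel_fold(op, f, view(xs, mid+1:n); basesize = basesize)
    return op(fetch(left), right)
end

total = parallel_fold(+, x -> 2x, collect(1:10_000); basesize = 100)
```

Nothing here depends on a sequential `state` being threaded from one element to the next, which is what makes the split possible. An `iterate`-based consumer has no comparable place to cut the input in half.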

Finally, I'll also mention that because our intermediate representations (IR) represent loops as `while` loops (`goto`s) over iterators, it is a real pain to take lowered Julia code and find out what the structure and intent of the original code was. A lot of IR-level transformations on Julia code need to do a lot of work to rediscover what the original loop layout was.

When represented in terms of `fold`s, though, we could preserve a lot more structured information at the IR level, which could make compiler-level loop optimizations easier.
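For a sense of the difference, here is a small sketch comparing the two source-level forms. `sum_iterate` spells out by hand roughly what a `for` loop means in terms of `iterate`, and `sum_fold` expresses the same computation as one `foldl` call. Both function names are made up for this example, and this is the surface-level picture only, not actual lowered IR.

```julia
# Roughly what `for x in xs; acc += x; end` means in terms of `iterate`.
function sum_iterate(xs)
    acc = 0
    next = iterate(xs)
    while next !== nothing
        x, state = next
        acc += x
        next = iterate(xs, state)
    end
    return acc
end

# The same computation as a single call whose structure is explicit:
# a reducing operation, a collection, and an initial value.
sum_fold(xs) = foldl(+, xs; init = 0)
```

In the first form, "this is a reduction with `+` over `xs`" has to be inferred from the control flow. In the second, it is stated directly in the call.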

More transducers #49735
