Description
I have been doing some work on the package lately and have some thoughts on what needs
improving.
Problems
Here are, as I see it, some problems with the package in its current state.
API Entropy
There are many different ways of doing the same thing, which in some cases differ in arbitrary
and confusing ways, while in other cases they are subtly but crucially different. For example, the
package exports transduce as well as foldxl, and adds methods to foldl. transduce and foldxl
differ primarily in that transduce may return a Reduced, while foldxl and foldl are largely
equivalent (I don't think they have exactly the same methods, but it's hard to even tell).
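To make the Reduced distinction concrete, here is a minimal Base-Julia sketch of the general early-termination pattern: one entry point may hand back the wrapper, the other always unwraps it. Stop, unwrap, fold_raw, fold_unwrapped and capped are hypothetical stand-ins, not Transducers.jl's actual Reduced, transduce or foldxl.

```julia
# Hypothetical stand-in for an early-termination wrapper (not Transducers.jl's Reduced).
struct Stop{T}
    value::T
end

unwrap(x) = x
unwrap(x::Stop) = x.value

# Low-level fold: returns the Stop wrapper itself if the step requests termination.
function fold_raw(step, itr; init)
    acc = init
    for x in itr
        acc = step(acc, x)
        acc isa Stop && return acc
    end
    return acc
end

# User-facing fold: the same loop, but the result is always unwrapped.
fold_unwrapped(step, itr; init) = unwrap(fold_raw(step, itr; init=init))

# Step that requests termination once the running sum exceeds 10.
capped(acc, x) = (s = acc + x; s > 10 ? Stop(s) : s)

fold_raw(capped, 1:10; init=0)        # Stop{Int64}(15)
fold_unwrapped(capped, 1:10; init=0)  # 15
```

The two functions differ only in whether the caller ever sees the wrapper, which is exactly the kind of distinction that is easy to miss when both are public.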
Opacity of Folds
Functions such as copy and collect arguably don't look the way one would reasonably expect, given
fundamentals such as foldxl. For example, it seems reasonable to expect something very much like
collect(xf::Transducer, itr) = foldxl(append!!, xf, itr; init=Empty(Vector))
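For comparison, this is the kind of transparent definition the expectation above describes, written with plain foldl and vcat instead of foldxl and append!!. collect_by_fold is a hypothetical name. It is easy to read and infers nothing ahead of time (the element type grows from the data), but every step allocates a new array, so it is quadratic.

```julia
# Hypothetical: collect as a plain left fold. Each step builds a new array with
# vcat, whose element type is promoted from the accumulator and the new value.
collect_by_fold(f, itr) =
    foldl((acc, x) -> vcat(acc, [f(x)]), itr; init=Vector{Union{}}())

collect_by_fold(x -> x^2, 1:3)          # [1, 4, 9] (a Vector{Int64})
collect_by_fold(identity, Any[1, 2.5])  # [1.0, 2.5] (a Vector{Float64})
```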
The truth is much more complicated than this, for good reason. I am not suggesting that we
compromise the convenience of functions such as collect; however, I consider it a problem when
these functions rely on complicated, opaque voodoo.
The catch here is that, for performance reasons, one should not simply append!! a separately
allocated array for every element; instead, Transducers figures out that it can allocate an undef
Vector and insert objects into it. This is greatly complicated by the need to infer a container type
before executing the fold, which Julia itself provides no public API for (more on this in a moment).
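The preallocating alternative can be sketched in plain Julia as follows. collect_prealloc is a hypothetical helper; it asks inference for the element type through Base.promote_op (an internal function whose own documentation discourages relying on it) and then fills an undef Vector by index. Both the length and the element type must be known before the loop runs, which is exactly where the inference dependency comes from.

```julia
# Hypothetical: preallocate an undef Vector whose element type is guessed by
# inference *before* f is ever called, then fill it by index.
function collect_prealloc(f, itr)
    T = Base.promote_op(f, eltype(itr))   # inference-dependent guess
    out = Vector{T}(undef, length(itr))   # also requires a known length
    for (i, x) in enumerate(itr)
        out[i] = f(x)
    end
    return out
end

collect_prealloc(x -> x^2, 1:3)  # [1, 4, 9]
```

If inference returns an abstract type or a Union, nothing fails; the result may simply get a wider element type than the data needs.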
In my view, this issue is of more than mere aesthetic interest. It leads to a lot of behavior that
reasonable users might not expect; for example, the following three lines all error on the latest main:
copy(Map(x -> x=>(x+1))'(merge!!), Dict, 1:3) # MethodError: copy lacks Reduction argument
copy(Map(x -> SingletonDict(x=>x+1)), Dict, 1:3) # BoundsError, bet you didn't see that coming
copy(Map(x -> SingletonDict(x=>x+1))'(merge!!), Dict, 1:3) # same MethodError as first one
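For reference, all three calls are presumably trying to build Dict(1 => 2, 2 => 3, 3 => 4). As a plain Base fold, with no Transducers.jl involved, that is:

```julia
# Build the Dict one pair at a time; merge! returns the (mutated) accumulator.
d = foldl((acc, x) -> merge!(acc, Dict(x => x + 1)), 1:3; init=Dict{Int,Int}())

d == Dict(1 => 2, 2 => 3, 3 => 4)  # true
```

Here the accumulator's type is fixed up front by hand, which is precisely the part the Transducers.jl calls above try to work out on their own.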
Too Much Dependence on Complicated Type Inference Voodoo
I am worried that code using Transducers.jl could easily become too difficult to optimize and,
because it relies on undocumented Julia internals, might degrade over time. The crux of this is a
method of transduce, used in practically every invocation of Transducers.jl, that one immediately
runs into when analyzing code with JET.jl, found here.
In particular, there can be type instability arising from convert(realtype, ur_result). This is
an extremely difficult problem to address because it is more a fundamental problem with Julia than
with Transducers.jl in particular. The issue, of course, is that Julia does not expose its type
inference machinery through a public API, so calls to Core.Compiler.return_type may be unavoidable
for the crucial use case of inferring an appropriate container type before the fold is executed.
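The sensitivity to inference is easy to see directly. Core.Compiler.return_type is the internal function mentioned above; the answers in the comments are what recent Julia versions report for these two toy functions (stable and unstable are hypothetical names), but since it is not a public API, results for less tidy code can change between releases.

```julia
stable(x::Int) = 2x
unstable(x::Int) = x > 0 ? 2x : 2.0x   # returns an Int or a Float64

Core.Compiler.return_type(stable, Tuple{Int})    # Int64
Core.Compiler.return_type(unstable, Tuple{Int})  # Union{Float64, Int64}
```

Any container type derived from the second answer is a small Union rather than a concrete type, and whatever is built from it inherits that.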
Threaded Functions are Inflexible
Threaded functions, i.e. those based on foldxt, rely on being given an associative reduction step.
This is reasonable as a default, but unfortunately there is no way to check a priori whether a step
is associative, which can lead to undefined behavior. From a user's perspective, this can produce
rather confusing results when swapping foldxl for foldxt. The associativity requirement comes from
foldxt's need to know how to combine the results from each thread, but it currently offers no way to
specify a non-associative operation for the thread-local fold alongside a separate operation for
combining the per-thread results. As a result, users essentially have to write the threading code
"from scratch" if they need this. Note that non-associative operations such as push!! are quite
common.
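A sequential simulation shows why the associativity assumption matters. chunked_foldl is a hypothetical helper that mimics what a threaded reduction does: fold each chunk separately, then fold the per-chunk results with the same operator. No threads are needed to see the effect.

```julia
# Hypothetical: reduce each chunk with `op`, then reduce the partial results
# with the same `op`, as a threaded fold that assumes associativity would.
chunked_foldl(op, itr, chunksize) =
    foldl(op, [foldl(op, chunk) for chunk in Iterators.partition(itr, chunksize)])

foldl(+, 1:8)             # 36
chunked_foldl(+, 1:8, 4)  # 36, since + is associative
foldl(-, 1:8)             # -34
chunked_foldl(-, 1:8, 4)  # 8: (1-2-3-4) - (5-6-7-8) = -8 - (-16)
```

A push!!-style step fails differently: its second argument is an element, so combining two partial vectors with it would push one vector into the other instead of concatenating them, which is why a separate combining step is needed.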
Possible Solutions
I propose that transduce be made private (this was suggested by TKF in the code), that package
internals be rewritten where appropriate, and that users be instructed to rely on
foldxl(step, xf, itr; init=Init(step))
(or something very similar) as the "fundamental" method. I'm not proposing that frequently used
convenience functions such as collect, copy and their threaded equivalents be discarded, but they
should be related to foldxl in a more transparent and understandable way.
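As a rough illustration of what a transparent relationship could look like, here is the underlying idea in plain Julia: a transducer is a function that transforms a step function, and a convenience function is then a one-line fold. mapping, filtering and collect_via are hypothetical names for this sketch, not Transducers.jl's Map, Filter or collect.

```julia
# Hypothetical transducers: each takes a step function and returns a new one.
mapping(f) = step -> (acc, x) -> step(acc, f(x))
filtering(p) = step -> (acc, x) -> p(x) ? step(acc, x) : acc

# Compose with ∘; data flows through mapping first, then filtering.
xf = mapping(x -> x^2) ∘ filtering(iseven)

# Summation is a fold; collection is the same fold with a different step and init.
foldl(xf(+), 1:6; init=0)                                 # 56
collect_via(xf, itr) = foldl(xf(push!), itr; init=Int[])  # eltype hard-coded here
collect_via(xf, 1:6)                                      # [4, 16, 36]
```

The hard-coded Int[] in collect_via is the container-type problem in miniature: the fold itself is trivial, and all the difficulty lies in choosing init.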
One of the prerequisites for achieving this is a more accessible, transparent way of dealing with
cases where, instead of e.g. combining containers with an associative step such as append!!, one
needs to insert results into a preallocated container. To that end, I propose that the API be
expanded to allow for something like this:
collect(xf::Transducer, itr) = foldxl(insertinto!!, xf, itr; init=Initialized(Vector))
Granted, I'm not yet entirely confident that such an interface is feasible, but if it is, it seems
greatly preferable to the current state of affairs, in which the only equivalent to such a call
involves lots of rather inscrutable internals, which is surely highly undesirable for such a simple
and common use case. Achieving this would likely require some reworking of BangBang.jl. Note that
the concept of default initialization already exists to some extent (for example, there is
Init(op)), but it is not applied uniformly in Transducers.jl internals.
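To gauge feasibility, here is one way a step like the proposed insertinto!! could behave, in plain Julia without BangBang.jl. Everything here (Initialized, Filling, insertinto!!, finish) is hypothetical and much simpler than a real design would need to be: unlike the one-argument Initialized(Vector) above, it takes an explicit length hint, types the buffer from the first element rather than from inference, widens by copying once if a later element doesn't fit, and does not handle empty input or more elements than the hint allows.

```julia
# Hypothetical marker: "allocate a Vector with room for n elements on the first step".
struct Initialized{C}
    n::Int
end
Initialized(::Type{C}, n::Integer) where {C} = Initialized{C}(n)

# Hypothetical fold state: the preallocated buffer plus the next free index.
mutable struct Filling{T}
    buf::Vector{T}
    pos::Int
end

# First step: allocate an undef buffer typed by the first element (no inference).
function insertinto!!(init::Initialized{Vector}, x)
    buf = Vector{typeof(x)}(undef, init.n)
    buf[1] = x
    return Filling(buf, 2)
end

# Later steps: write in place; if x does not fit, copy once into a wider buffer.
function insertinto!!(st::Filling{T}, x) where {T}
    if !(x isa T)
        wide = Vector{promote_type(T, typeof(x))}(undef, length(st.buf))
        copyto!(wide, 1, st.buf, 1, st.pos - 1)
        st = Filling(wide, st.pos)
    end
    st.buf[st.pos] = x
    st.pos += 1
    return st
end

# Trim to the filled prefix (in case the length hint was too generous).
finish(st::Filling) = st.buf[1:st.pos-1]

finish(foldl(insertinto!!, (x^2 for x in 1:4); init=Initialized(Vector, 4)))  # [1, 4, 9, 16]
finish(foldl(insertinto!!, Any[1, 2, 3.5]; init=Initialized(Vector, 3)))      # [1.0, 2.0, 3.5]
```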
As for foldxt, I believe it should be based on a method something like
foldxt(local_step, step, xflocal::Transducer, xf::Transducer, itr; init, kw...)
or perhaps something even more fundamental. We could then add further methods with reasonable
defaults that look more like the current ones.
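A minimal sketch of the proposed split, using only Base threading: local_step runs sequentially inside each chunk and may be non-associative (push! here), while combine merges the per-chunk results in chunk order and therefore must be associative (append! here). twolevel_fold and its keywords are hypothetical; it does not handle empty input, and local_init is a function so that every chunk gets a fresh accumulator.

```julia
using Base.Threads: @spawn

# Hypothetical two-level fold in the spirit of foldxt(local_step, step, ...).
function twolevel_fold(local_step, combine, itr; local_init, chunksize=1024)
    chunks = collect(Iterators.partition(itr, chunksize))
    # Fold each chunk on its own task, each starting from a fresh accumulator.
    tasks = map(chunks) do chunk
        @spawn foldl(local_step, chunk; init=local_init())
    end
    # Combine per-chunk results left to right, preserving chunk order.
    return foldl(combine, map(fetch, tasks))
end

out = twolevel_fold(push!, append!, 1:10; local_init=() -> Int[], chunksize=3)
out == collect(1:10)  # true, regardless of how many threads are available
```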
Lastly, while I don't know whether it's possible to clean up the type inference problems in a
consistent way, I think we should at least try. For one, I believe the notorious transduce method
that calls Core.Compiler.return_type was written quite a few minor versions of Julia ago. We might
also be able to branch the behavior at compile time so that it does "normal stuff" by default and
falls back to Core.Compiler.return_type only when there is no better option. It might also be worth
describing the issue in detail to some of the core Julia developers in the hope of getting useful
advice.
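A minimal sketch of that branching idea in plain Julia: map_collect is a hypothetical helper that preallocates only when inference (via the internal Base.promote_op) reports a concrete element type, and otherwise falls back to a loop that types the output from the values it actually sees. The branch is written as an ordinary check on a type value; whether the compiler resolves it at compile time is an assumption, not something this sketch guarantees.

```julia
# Hypothetical: prefer inference when it gives a concrete eltype; otherwise
# build the output from the actual values, widening the eltype as needed.
function map_collect(f, v::AbstractVector)
    T = Base.promote_op(f, eltype(v))
    if isconcretetype(T)
        out = Vector{T}(undef, length(v))
        for (i, x) in enumerate(v)
            out[i] = f(x)
        end
        return out
    end
    out = Vector{Union{}}()               # narrowest start; widened on demand
    for x in v
        y = f(x)
        if !(y isa eltype(out))
            out = Vector{promote_type(eltype(out), typeof(y))}(out)
        end
        push!(out, y)
    end
    return out
end

map_collect(x -> x^2, [1, 2, 3])                   # [1, 4, 9], preallocated path
map_collect(x -> x > 1 ? 2.0x : 2x, Any[1, 2, 3])  # [2.0, 4.0, 6.0], fallback path
```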