Description
Current status
Currently a `pyconvert` rule consists of:
- the source Python type `t`, which the rule can convert from;
- the target Julia type `T`, which the rule can convert to;
- the `priority` of the rule; and
- the function `func` implementing the rule.
When `pyconvert(R, x)` runs, it first filters the list of rules according to `t` and `T` (roughly `pyisinstance(x, t)` and `typeintersect(R, T) != Union{}`). The rules are then ordered first by `priority`, then by the specificity of `t`, then by the order in which the rules were defined.
The priorities are:
- `jlwrap`: for wrapped Julia objects, which are converted by just unwrapping them;
- `array`: for array-like objects (buffers, numpy arrays, ...);
- `canonical`: for the canonical conversion for a type, e.g. `float` to `Float64`;
- `normal`: for all other reasonable conversions.
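To make the current selection order concrete, here is a minimal Julia sketch. It is a model, not PythonCall's implementation: Python classes are mocked as `Symbol`s in a hand-written MRO, `pyisinstance(x, t)` is approximated by membership of `t` in that MRO, and `Priority`, `Rule` and `select_rules` are hypothetical names. It assumes the priorities rank in the order listed above, with `jlwrap` tried first.

```julia
# Hypothetical model of the current rule selection; not PythonCall's actual code.

# Assumed ranking: a lower enum value is tried first.
@enum Priority JLWRAP ARRAY CANONICAL NORMAL

struct Rule
    t::Symbol           # source Python type, mocked as a Symbol
    T::Type             # target Julia type
    priority::Priority
    func::Function      # the conversion itself (not called by the selection logic)
end

# `mro` is the mocked MRO of the Python object's class, most specific first;
# pyisinstance(x, t) is approximated by `t in mro`.
function select_rules(rules, R, mro)
    keep(r) = r.t in mro && typeintersect(R, r.T) != Union{}
    cands = [(i, r) for (i, r) in enumerate(rules) if keep(r)]
    # Order by priority, then specificity of t (position in the MRO), then definition order.
    key((i, r)) = (Int(r.priority), findfirst(==(r.t), mro), i)
    return [r for (_, r) in sort(cands; by = key)]
end

rules = [
    Rule(:Mapping, Dict, CANONICAL, x -> "generic Mapping to Dict"),
    Rule(:JuliaAnyValue, Any, JLWRAP, x -> "unwrap the Julia object"),
]

# A wrapped Julia dict is also a Mapping, but the jlwrap rule is tried first.
wrapped = select_rules(rules, Dict, [:JuliaDictValue, :JuliaAnyValue, :Mapping, :object])
# A plain Python dict only matches the Mapping rule.
plain = select_rules(rules, Dict, [:dict, :MutableMapping, :Mapping, :object])
```

Here the priority alone decides the order for the wrapped dict; the MRO position of `t` only breaks ties within a priority.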
The priorities are a bit of a hack to work around the fact that ordering by the specificity of `t` isn't quite right. For example, we always want to convert Julia objects by unwrapping them first, so their rules need to come first: even if the object also happens to be a `Mapping` and we are converting to `Dict`, we don't want to use the generic `Mapping` to `Dict` rule. Likewise, if the object is array-like, we want to convert it by getting at the underlying memory instead of using the generic `Sequence` to `Array` rule.
The proposal
So my proposal is to remove `priority` and replace it with:
- the scope Julia type `S`, which must be a supertype of `T`.
We further filter rules by `S` (`R <: S`, except that if `R isa Union` then just one component has to match).
For ordering rules, we no longer use priority, but instead order by the specificity of `S`, i.e. we prefer the smallest `S` that contains `R`.
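A minimal Julia sketch of the proposed filtering and ordering follows. It is a hypothetical model, not PythonCall code: Python classes are mocked as `Symbol`s in a hand-written MRO, and `Rule`, `components`, `in_scope`, `strictsub`, `select_rules` and `targets` are made-up names. "Specificity of `S`" is measured by counting how many other candidate scopes are strict subtypes of it, which respects `<:` and leaves incomparable scopes tied; ties then fall back to the MRO position of `t` and finally to definition order, which is one reading of the proposal. The usage lines encode the `None` rules discussed below.

```julia
# Hypothetical model of the proposed rule selection; not PythonCall's actual code.
struct Rule
    t::Symbol       # source Python type, mocked as a Symbol
    T::Type         # target Julia type
    S::Type         # scope Julia type, must be a supertype of T
    func::Function  # the conversion itself (not called by the selection logic)
    function Rule(t, T, S, func)
        T <: S || throw(ArgumentError("the scope S must be a supertype of T"))
        return new(t, T, S, func)
    end
end

# Split a Union into its components; any other type is a single component.
components(R) = R isa Union ? [components(R.a); components(R.b)] : Any[R]

# R <: S, except that for a Union a single matching component is enough.
in_scope(R, S) = any(C -> C <: S, components(R))

strictsub(A, B) = A <: B && !(B <: A)

# `mro` is the mocked MRO of the Python object's class, most specific first.
function select_rules(rules, R, mro)
    keep(r) = r.t in mro && typeintersect(R, r.T) != Union{} && in_scope(R, r.S)
    cands = [(i, r) for (i, r) in enumerate(rules) if keep(r)]
    # Number of candidates whose scope lies strictly inside r.S: fewer = more specific.
    nbelow(r) = count(c -> strictsub(c[2].S, r.S), cands)
    # Order by specificity of S, then of t, then definition order.
    key((i, r)) = (nbelow(r), findfirst(==(r.t), mro), i)
    return [r for (_, r) in sort(cands; by = key)]
end

none_rules = [
    Rule(:NoneType, Nothing, Any, x -> nothing),     # canonical
    Rule(:NoneType, Missing, Missing, x -> missing), # only when asked for
]

targets(R) = [r.T for r in select_rules(none_rules, R, [:NoneType, :object])]
```

With these rules, `targets(Any)` and `targets(Nothing)` both give `[Nothing]`, while `targets(Missing)` and `targets(Union{Missing, Int})` give `[Missing]`.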
You are only allowed to create rules where you "own" either `t` or `S`.
Discussion
This means you can only have `S=Any` if you own `t`. PythonCall will continue to "own" the Python standard library, and most rules in PythonCall will have `S=Any`. The exception is some rules currently at the `normal` priority. For example, we convert `None` to `Nothing` canonically but can also go to `Missing`. In the new system, the rules will have `T=Nothing, S=Any` and `T=S=Missing`, so you get `Nothing` generically but can get `Missing` if you ask for it. Similarly, `tuple` canonically converts to `Tuple` but can also go to `Array`; the rules for these will become `T=Tuple, S=Any` and `T=Array, S=AbstractArray`, so you will get an `Array` if you specify `Array` or `AbstractArray`.
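The `tuple` example can be checked with the same kind of minimal Julia sketch. It is a hypothetical model, not PythonCall code: Python classes are mocked as `Symbol`s in a hand-written MRO, and `Rule`, `components`, `in_scope`, `select_rules` and `first_target` are made-up names. Ordering uses the specificity of `S` (counting strictly smaller candidate scopes), then the MRO position of `t`, then definition order.

```julia
# Hypothetical model of the proposed rule selection; not PythonCall's actual code.
struct Rule
    t::Symbol       # source Python type, mocked as a Symbol
    T::Type         # target Julia type
    S::Type         # scope Julia type, must be a supertype of T
    func::Function  # the conversion itself (not called by the selection logic)
    function Rule(t, T, S, func)
        T <: S || throw(ArgumentError("the scope S must be a supertype of T"))
        return new(t, T, S, func)
    end
end

# Split a Union into its components; any other type is a single component.
components(R) = R isa Union ? [components(R.a); components(R.b)] : Any[R]
# R <: S, except that for a Union a single matching component is enough.
in_scope(R, S) = any(C -> C <: S, components(R))
strictsub(A, B) = A <: B && !(B <: A)

function select_rules(rules, R, mro)
    keep(r) = r.t in mro && typeintersect(R, r.T) != Union{} && in_scope(R, r.S)
    cands = [(i, r) for (i, r) in enumerate(rules) if keep(r)]
    nbelow(r) = count(c -> strictsub(c[2].S, r.S), cands)  # fewer = more specific S
    key((i, r)) = (nbelow(r), findfirst(==(r.t), mro), i)
    return [r for (_, r) in sort(cands; by = key)]
end

tuple_rules = [
    Rule(:tuple, Tuple, Any, x -> "Tuple"),            # canonical
    Rule(:tuple, Array, AbstractArray, x -> "Array"),  # only under AbstractArray
]

# Target type of the first rule that would be tried.
first_target(R) = first(select_rules(tuple_rules, R, [:tuple, :object])).T
```

Converting to `Any` only sees the `Tuple` rule, because `Any <: AbstractArray` fails the scope check. Converting to `Array`, `AbstractArray` or `Vector{Int}` puts the `Array` rule first.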
If you don't own `t`, then you must own `S`. This lets you define, for example, a generic conversion rule from `list` to some new `MyArray` type you invented. But the rule is only used if you specify `pyconvert(MyArray, x)`; doing `pyconvert(AbstractArray, x)` or `pyconvert(Any, x)` will not use it. Hence rules are well scoped, and we avoid both piracy and cases where the conversion rules applied depend on which packages are loaded.
In particular, since passing Python objects to Julia in JuliaCall normally uses `pyconvert(Any, x)`, only rules created by the "owner" of `pytype(x)` are applied. This makes passing Python values around predictable: a third-party package defining its own `list` to `MyArray` rule will not affect how a `list` gets passed to Julia by default.
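To see the scoping at work, here is the same kind of hypothetical Julia model (mocked MRO, made-up helper names such as `select_rules` and `first_target`) with two `list` rules: one from the owner of `list` with `S=Any`, and one from a third party with `S=MyArray`. `CanonicalStandIn` is a placeholder for whatever type the canonical rule returns, and `MyArray` is the invented type from the text; neither is a real PythonCall type.

```julia
# Hypothetical model of the proposed rule selection; not PythonCall's actual code.
struct Rule
    t::Symbol       # source Python type, mocked as a Symbol
    T::Type         # target Julia type
    S::Type         # scope Julia type, must be a supertype of T
    func::Function  # the conversion itself (not called by the selection logic)
    function Rule(t, T, S, func)
        T <: S || throw(ArgumentError("the scope S must be a supertype of T"))
        return new(t, T, S, func)
    end
end

components(R) = R isa Union ? [components(R.a); components(R.b)] : Any[R]
in_scope(R, S) = any(C -> C <: S, components(R))  # one Union component is enough
strictsub(A, B) = A <: B && !(B <: A)

function select_rules(rules, R, mro)
    keep(r) = r.t in mro && typeintersect(R, r.T) != Union{} && in_scope(R, r.S)
    cands = [(i, r) for (i, r) in enumerate(rules) if keep(r)]
    nbelow(r) = count(c -> strictsub(c[2].S, r.S), cands)  # fewer = more specific S
    key((i, r)) = (nbelow(r), findfirst(==(r.t), mro), i)
    return [r for (_, r) in sort(cands; by = key)]
end

struct CanonicalStandIn <: AbstractVector{Any} end  # placeholder for the canonical target
struct MyArray <: AbstractVector{Any}               # hypothetical third-party array type
    data::Vector{Any}
end

list_rules = [
    Rule(:list, CanonicalStandIn, Any, x -> "canonical"),  # defined by the owner of `list`
    Rule(:list, MyArray, MyArray, x -> "third party"),     # defined by the owner of MyArray
]

list_mro = [:list, :object]
first_target(R) = first(select_rules(list_rules, R, list_mro)).T
```

Asking for `Any` or `AbstractArray` never reaches the third-party rule, because neither is a subtype of its scope `MyArray`; only `first_target(MyArray)` selects it.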
What about `jlwrap` and `array`?
I don't think this new scheme still lets us enforce doing these rules first. For example, we have a general rule `t=JuliaAnyValue, T=Any, S=Any`, which unwraps the Julia object and converts it to the target type. We also have a general rule `t=Mapping, T=Dict, S=Any`. If we have a `JuliaDictValue`, which subtypes both `JuliaAnyValue` and `Mapping`, and do `pyconvert(Any, x)`, then filtering keeps both rules, and ordering by the specificity of `t` is arbitrary, since the MRO could be either way around. The only distinction is in `T`: does it make sense to also order by the least specific `T`?
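The ambiguity can be reproduced in the same kind of hypothetical Julia model (mocked MROs, made-up helper names). The two mocked MROs for `JuliaDictValue` differ only in the order of `JuliaAnyValue` and `Mapping`, and the winning rule flips with them. The optional `general_T_first` keyword is an assumption, one possible reading of the "least specific `T`" question: it is applied after the `S` comparison and before the MRO, and it is not part of the proposal as written.

```julia
# Hypothetical model of the proposed rule selection; not PythonCall's actual code.
struct Rule
    t::Symbol       # source Python type, mocked as a Symbol
    T::Type         # target Julia type
    S::Type         # scope Julia type, must be a supertype of T
    func::Function  # the conversion itself (not called by the selection logic)
    function Rule(t, T, S, func)
        T <: S || throw(ArgumentError("the scope S must be a supertype of T"))
        return new(t, T, S, func)
    end
end

components(R) = R isa Union ? [components(R.a); components(R.b)] : Any[R]
in_scope(R, S) = any(C -> C <: S, components(R))  # one Union component is enough
strictsub(A, B) = A <: B && !(B <: A)

function select_rules(rules, R, mro; general_T_first = false)
    keep(r) = r.t in mro && typeintersect(R, r.T) != Union{} && in_scope(R, r.S)
    cands = [(i, r) for (i, r) in enumerate(rules) if keep(r)]
    nbelow(r) = count(c -> strictsub(c[2].S, r.S), cands)  # fewer = more specific S
    # Optional tie-break: count candidate targets strictly above r.T (fewer = more general T).
    nabove(r) = general_T_first ? count(c -> strictsub(r.T, c[2].T), cands) : 0
    key((i, r)) = (nbelow(r), nabove(r), findfirst(==(r.t), mro), i)
    return [r for (_, r) in sort(cands; by = key)]
end

rules = [
    Rule(:JuliaAnyValue, Any, Any, x -> "unwrap"),
    Rule(:Mapping, Dict, Any, x -> "Mapping to Dict"),
]

# Two equally plausible mocked MROs for a JuliaDictValue.
mro_a = [:JuliaDictValue, :JuliaAnyValue, :Mapping, :object]
mro_b = [:JuliaDictValue, :Mapping, :JuliaAnyValue, :object]

# Source type `t` of the first rule tried when converting to Any.
winner(mro; kw...) = first(select_rules(rules, Any, mro; kw...)).t
```

Without the tie-break, `winner(mro_a)` is `:JuliaAnyValue` but `winner(mro_b)` is `:Mapping`. With `general_T_first = true`, the unwrapping rule (`T=Any`) wins for both MROs.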
Let's consider arrays, where we have `t=<buffer>, T=PyArray, S=Any` but also `t=Sequence, T=PyList, S=Any`. Again the ordering is arbitrary. In this case, we actually insert the pseudo-type `<buffer>` into the type ordering ourselves. Currently we put it high (near `object`, less specific), but maybe we should put it at the bottom (most specific) so it's always picked? Or do this, but with a second `<high-priority-buffer>` pseudo-type.
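Where the pseudo-type lands in the type ordering decides the outcome. The following is a sketch in the same kind of hypothetical Julia model, with a mocked MRO for `bytearray` (a sequence that also exposes a buffer) and with `PyArrayStandIn` and `PyListStandIn` as made-up placeholders for `PyArray` and `PyList`. Placing `:buffer` near `:object` lets the `Sequence` rule win; placing it first makes the buffer rule win.

```julia
# Hypothetical model of the proposed rule selection; not PythonCall's actual code.
struct Rule
    t::Symbol       # source Python type (or pseudo-type), mocked as a Symbol
    T::Type         # target Julia type
    S::Type         # scope Julia type, must be a supertype of T
    func::Function  # the conversion itself (not called by the selection logic)
    function Rule(t, T, S, func)
        T <: S || throw(ArgumentError("the scope S must be a supertype of T"))
        return new(t, T, S, func)
    end
end

components(R) = R isa Union ? [components(R.a); components(R.b)] : Any[R]
in_scope(R, S) = any(C -> C <: S, components(R))  # one Union component is enough
strictsub(A, B) = A <: B && !(B <: A)

function select_rules(rules, R, mro)
    keep(r) = r.t in mro && typeintersect(R, r.T) != Union{} && in_scope(R, r.S)
    cands = [(i, r) for (i, r) in enumerate(rules) if keep(r)]
    nbelow(r) = count(c -> strictsub(c[2].S, r.S), cands)  # fewer = more specific S
    key((i, r)) = (nbelow(r), findfirst(==(r.t), mro), i)
    return [r for (_, r) in sort(cands; by = key)]
end

struct PyArrayStandIn end  # placeholder for PyArray
struct PyListStandIn end   # placeholder for PyList

rules = [
    Rule(:buffer, PyArrayStandIn, Any, x -> "via buffer"),
    Rule(:Sequence, PyListStandIn, Any, x -> "via sequence"),
]

# `:buffer` placed high, near `object` (less specific).
mro_high = [:bytearray, :MutableSequence, :Sequence, :buffer, :object]
# `:buffer` placed at the bottom (most specific), ahead of every real class.
mro_low = [:buffer, :bytearray, :MutableSequence, :Sequence, :object]

winner(mro) = first(select_rules(rules, Any, mro)).T
```

Both rules have `S=Any`, so only the MRO position of `t` separates them: `winner(mro_high)` is `PyListStandIn` and `winner(mro_low)` is `PyArrayStandIn`.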
Or we just continue to use priorities, or special-case the handling of rules for `jlwrap` and `array` by assuming they are always high priority. I don't know. Either way, all of this should be internal: the user-exposed functionality for rules should only specify `t`, `T`, `S` and `func`.
Worked examples
Here are some rules for `t=list`:
1. `T=PyArray, S=Any`: the canonical conversion to a `PyArray`, used if you specify converting to `PyArray`, `AbstractArray` or `Any`.
2. `T=Array, S=DenseArray`: used if you specify converting to `Array` or `DenseArray`, but `AbstractArray` gets you a `PyArray`.
3. `T=Set, S=AbstractSet`: used if you specify converting to `Set` or `AbstractSet`.
4. `T=Tuple, S=Tuple`: used if you specify converting to `Tuple`.
Some rules for `t=None`:
1. `T=Nothing, S=Any`: canonical.
2. `T=Missing, S=Missing`: specify `Missing` (or `Union{Missing, Foo}`).
Some rules for `t=float`:
1. `T=Float64, S=Any`: canonical.
2. `T=Float32, S=Float32`: specify another float type.
3. `T=Number, S=Number`: specify another non-float number type such as `Integer`.
4. `T=Missing, S=Missing` (for NaN).
5. `T=Nothing, S=Nothing` (for NaN).
Some examples for converting a `float`:
- to `Any`: only rule 1 applies (filtering on `S`).
- to `Float32`: rules 2 and 3 apply (rule 1 is ignored due to `T`, the others due to `S`), and `Float32 <: Number`, so rule 2 is tried first.
- to `Integer`: only rule 3 applies (filtering on `T` and `S`).
- to `Union{Integer, Missing}`: rules 3 and 4 apply (filtering on `T` and `S`). There is no subtype relationship between `Number` and `Missing`, so we fall back to definition order and rule 3 is tried first.
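The four `float` examples can be replayed in a minimal hypothetical Julia model. It is not PythonCall code: Python classes are mocked as `Symbol`s in a hand-written MRO, and `Rule`, `components`, `in_scope`, `select_rules` and `targets` are made-up names. The ordering key is an assumption: scope specificity first (counting strictly smaller candidate scopes), then the MRO position of `t`, then definition order. `targets(R)` lists the target types of the applicable rules in the order they would be tried.

```julia
# Hypothetical model of the proposed rule selection; not PythonCall's actual code.
struct Rule
    t::Symbol       # source Python type, mocked as a Symbol
    T::Type         # target Julia type
    S::Type         # scope Julia type, must be a supertype of T
    func::Function  # the conversion itself (not called by the selection logic)
    function Rule(t, T, S, func)
        T <: S || throw(ArgumentError("the scope S must be a supertype of T"))
        return new(t, T, S, func)
    end
end

components(R) = R isa Union ? [components(R.a); components(R.b)] : Any[R]
in_scope(R, S) = any(C -> C <: S, components(R))  # one Union component is enough
strictsub(A, B) = A <: B && !(B <: A)

function select_rules(rules, R, mro)
    keep(r) = r.t in mro && typeintersect(R, r.T) != Union{} && in_scope(R, r.S)
    cands = [(i, r) for (i, r) in enumerate(rules) if keep(r)]
    nbelow(r) = count(c -> strictsub(c[2].S, r.S), cands)  # fewer = more specific S
    key((i, r)) = (nbelow(r), findfirst(==(r.t), mro), i)
    return [r for (_, r) in sort(cands; by = key)]
end

float_rules = [
    Rule(:float, Float64, Any, x -> Float64(x)),      # 1: canonical
    Rule(:float, Float32, Float32, x -> Float32(x)),  # 2: another float type
    Rule(:float, Number, Number, x -> x),             # 3: other number types
    Rule(:float, Missing, Missing, x -> missing),     # 4: for NaN (NaN check omitted)
    Rule(:float, Nothing, Nothing, x -> nothing),     # 5: for NaN (NaN check omitted)
]

targets(R) = [r.T for r in select_rules(float_rules, R, [:float, :object])]
```

This gives `[Float64]` for `Any`, `[Float32, Number]` for `Float32`, `[Number]` for `Integer`, and `[Number, Missing]` for `Union{Integer, Missing}`, matching the four bullets.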
Pros and cons
Pros:
- Strict ownership of rules avoids piracy.
- The return type of `pyconvert` is more predictable.
- Clearer semantics and rule ordering than currently.
- Rule ordering on `t` and `S` will mostly be unique; definition order mostly matters for unions.
- The number of applicable rules is massively cut down by filtering on `S` (usually to 1).
- It is easy to "opt in" to a conversion rule by being more specific about what you are converting to (see the `MyArray` example above).
Cons:
- People might still pirate (i.e. make rules with `S=Any` for which they don't own `t`).
- `pyconvert(Union{AbstractArray,MyArray}, x)` does not do what you might expect (use the generic `AbstractArray` rules plus the special `MyArray` rule), because the union gets normalised down to `AbstractArray` first, so the `MyArray` rule is never considered. You need to use more specific unions like `Union{PyArray,Array,MyArray}`, which is annoying. We could make a helper function to create such a union for you.
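The normalisation comes from Julia itself: when building a `Union`, Julia drops any member that is a subtype of another member, so `Union{AbstractArray, MyArray}` is just `AbstractArray` and no component matches `S=MyArray`. The sketch below uses the same kind of hypothetical model (mocked MRO; `PyArrayStandIn`, `MyArray` and `expand_union` are made-up names) to show this, together with one possible shape for the helper: collect every registered target type below `AbstractArray` into a union that Julia cannot collapse.

```julia
# Hypothetical model of the proposed rule selection; not PythonCall's actual code.
struct Rule
    t::Symbol       # source Python type, mocked as a Symbol
    T::Type         # target Julia type
    S::Type         # scope Julia type, must be a supertype of T
    func::Function  # the conversion itself (not called by the selection logic)
    function Rule(t, T, S, func)
        T <: S || throw(ArgumentError("the scope S must be a supertype of T"))
        return new(t, T, S, func)
    end
end

components(R) = R isa Union ? [components(R.a); components(R.b)] : Any[R]
in_scope(R, S) = any(C -> C <: S, components(R))  # one Union component is enough
strictsub(A, B) = A <: B && !(B <: A)

function select_rules(rules, R, mro)
    keep(r) = r.t in mro && typeintersect(R, r.T) != Union{} && in_scope(R, r.S)
    cands = [(i, r) for (i, r) in enumerate(rules) if keep(r)]
    nbelow(r) = count(c -> strictsub(c[2].S, r.S), cands)  # fewer = more specific S
    key((i, r)) = (nbelow(r), findfirst(==(r.t), mro), i)
    return [r for (_, r) in sort(cands; by = key)]
end

struct PyArrayStandIn <: AbstractVector{Any} end  # placeholder for PyArray
struct MyArray <: AbstractVector{Any}             # hypothetical third-party array type
    data::Vector{Any}
end

rules = [
    Rule(:list, PyArrayStandIn, Any, x -> "PyArray"),
    Rule(:list, Array, DenseArray, x -> "Array"),
    Rule(:list, MyArray, MyArray, x -> "MyArray"),
]

# Hypothetical helper: the union of all registered target types below A.
expand_union(rules, A) = Union{(r.T for r in rules if r.T <: A)...}

naive = Union{AbstractArray, MyArray}          # collapses to AbstractArray
expanded = expand_union(rules, AbstractArray)  # Union{PyArrayStandIn, Array, MyArray}

uses_myarray(R) = any(r -> r.T == MyArray, select_rules(rules, R, [:list, :object]))
```

`uses_myarray(naive)` is `false` while `uses_myarray(expanded)` is `true`: the expanded union keeps `MyArray` as a separate component, so it passes the scope check for `S=MyArray`.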