I have often lamented the fact that scitype cannot be a map from machine type to type, instead of from object to type, because of the infamous CategoricalValue fly in the ointment. As a workaround for performance problems with arrays, we introduced Scitype, which is a map from type to type. Wouldn't it be simpler if implementing Scitype were the "fallback" responsibility of a convention, and we only overloaded scitype for problematic cases like CategoricalValue? What am I missing here?
So, something like (ignoring convention distinctions):
    # fallback:
    scitype(X) = Scitype(typeof(X))
    Scitype(::Type) = Unknown
    Scitype(::Type{<:Integer}) = Count
    # and so forth

    # exceptions:
    function scitype(X::CategoricalValue)
        N = length(pool(X))
        if isordered(X)
            return OrderedFactor{N}
        end
        return Multiclass{N}
    end

To be clear, I'm not suggesting a change in the definition of scitype, only in how it is implemented, although Scitype is something we may want to make part of the public interface.
What got me thinking about this is the case of parametric types like Sampleable{S} and a type I'd like to introduce, called Iterator{S}, for lazily loaded data structures. Here S is the scitype of the objects sampled, or the scitype of the objects iterated, respectively. How do we implement scitypes for these? This is tricky because we may not have an object from which to extract the parameter S, only its machine type. So in this case we are limited to using Scitype.
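To make the parametric case concrete, here is a minimal, self-contained sketch of how a type-level Scitype might handle it. Everything in it is an assumption for illustration: the scientific types are local stand-ins rather than the ScientificTypes.jl definitions, and `LazyLoader{T}` is a hypothetical machine type whose element type `T` is the only information available.

```julia
# Stand-in scientific types, defined locally so the sketch runs on its own.
# These are NOT the ScientificTypes.jl definitions.
abstract type Unknown end
abstract type Count end
abstract type Continuous end
abstract type Iterator{S} end   # S is the scitype of the objects iterated

# Hypothetical machine type for a lazily loaded data structure.
# Only the element type T is known; no element needs to be materialized.
struct LazyLoader{T}
    source::Vector{T}
end

# Type-to-type map, with a catch-all fallback.
Scitype(::Type) = Unknown
Scitype(::Type{<:Integer}) = Count
Scitype(::Type{<:AbstractFloat}) = Continuous

# The parameter S is computed from the machine type alone.
Scitype(::Type{LazyLoader{T}}) where {T} = Iterator{Scitype(T)}

# Object-level fallback: defer to the type-level map.
scitype(X) = Scitype(typeof(X))
```

With these definitions, `Scitype(LazyLoader{Int})` can be evaluated without constructing a loader, and `scitype(LazyLoader([1.0, 2.0]))` goes through the same method via the fallback. Object-level overloads would then be reserved for cases like CategoricalValue, where the machine type does not carry the needed information.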
Thoughts @OkonSamuel @tlienart