Suggestion to simplify implementation of scitype #155

@ablaom

I have often lamented the fact that scitype cannot be a map from machine type to scientific type, instead of from object to scientific type, because of the infamous CategoricalValue fly in the ointment (the number of levels and the ordering have to be read from the value's pool rather than from its type). As a workaround for performance problems with arrays, we introduced Scitype, which is a map from type to type. Wouldn't it be simpler if implementing Scitype were the "fallback" responsibility of a convention, and we only overloaded scitype for problematic cases like CategoricalValue? What am I missing here?

So, something like (ignoring convention distinctions):

# fallback:
scitype(X) = Scitype(typeof(X))

Scitype(::Type) = Unknown
Scitype(::Type{<:Integer}) = Count
# and so forth

# exceptions:
function scitype(X::CategoricalValue)
    N = length(pool(X))
    if isordered(X)
        return OrderedFactor{N}
    end
    return Multiclass{N}
end

To be clear, I'm not suggesting a change in the definition of scitype, only how it is implemented, although Scitype is something we may want to make part of the public interface.
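
The dispatch pattern above can be tried in isolation. The following is a minimal sketch, not the package's implementation: the scientific types are redeclared locally as stand-ins, and `MockCategorical` is a hypothetical substitute for CategoricalValue so that the snippet runs without CategoricalArrays. Its only purpose is to show that objects fall through to `Scitype(typeof(x))` unless a value-level overload exists.

```julia
# Local stand-ins for the scientific types (assumption: the real ones are
# defined by the scientific types packages, not here).
abstract type Unknown end
abstract type Count end
abstract type Continuous end
abstract type Multiclass{N} end
abstract type OrderedFactor{N} end

# Hypothetical stand-in for CategoricalValue: the number of levels and the
# ordering are runtime data, not type parameters.
struct MockCategorical
    level::Int
    nlevels::Int
    ordered::Bool
end

# Type-to-type map: the part a convention would be responsible for.
Scitype(::Type) = Unknown
Scitype(::Type{<:Integer}) = Count
Scitype(::Type{<:AbstractFloat}) = Continuous

# Generic fallback: an object defers to its machine type.
scitype(x) = Scitype(typeof(x))

# Exception: the scitype depends on the value, so overload scitype directly.
scitype(x::MockCategorical) =
    x.ordered ? OrderedFactor{x.nlevels} : Multiclass{x.nlevels}
```

With these definitions, `scitype(3)` goes through the fallback, while `scitype(MockCategorical(1, 3, false))` hits the value-level overload and never consults `Scitype`.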

What got me thinking about this is the case of parametric types like Sampleable{S} and a type I'd like to introduce, called Iterator{S}, for lazily loaded data structures. Here S is the scitype of the objects sampled, or the scitype of the objects iterated, respectively. How do we implement scitypes for these? This is tricky because we may not have an object from which to extract the parameter S, only its machine type. So in this case we are limited to using Scitype.
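
To make the Iterator{S} case concrete, here is a rough sketch under stated assumptions: `LazyRows{T}` is a hypothetical machine type for a lazily loaded container, `Iterator{S}` is declared locally as the proposed scitype, and the other scientific types are local stand-ins. The parameter S is computed from the element type alone, so no element ever needs to be loaded.

```julia
# Local stand-ins for the scientific types (assumptions, declared here only
# so the snippet is self-contained).
abstract type Unknown end
abstract type Count end
abstract type Continuous end
abstract type Iterator{S} end   # the proposed scitype for lazy data

# Hypothetical machine type for a lazily loaded container of T values.
struct LazyRows{T}
    chunks::Vector{Vector{T}}
end

# Type-to-type map, as in the fallback proposal.
Scitype(::Type) = Unknown
Scitype(::Type{<:Integer}) = Count
Scitype(::Type{<:AbstractFloat}) = Continuous

# S is derived from the machine element type T; no object is inspected.
Scitype(::Type{LazyRows{T}}) where {T} = Iterator{Scitype(T)}

# Generic fallback from objects to their machine type.
scitype(x) = Scitype(typeof(x))

# Works on the type alone, and on an empty container with nothing to sample.
Scitype(LazyRows{Int})
scitype(LazyRows{Float64}(Vector{Float64}[]))
```

The same pattern would not work for a lazy container of CategoricalValue elements, since there the element scitype cannot be recovered from the element type; that case would still need a value-level route.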

Thoughts, @OkonSamuel @tlienart?
