Skip to content
Draft
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 7 additions & 2 deletions src/contrasts.jl
Original file line number Diff line number Diff line change
Expand Up @@ -207,8 +207,13 @@ function termnames(C::AbstractContrasts, levels::AbstractVector, baseind::Intege
levels[not_base]
end

Base.getindex(contrasts::ContrastsMatrix{C,T}, rowinds, colinds) where {C,T} =
getindex(contrasts.matrix, getindex.(Ref(contrasts.invindex), rowinds), colinds)
function Base.getindex(contrasts::ContrastsMatrix{C,T}, rowinds, colinds) where {C,T}
# allow rows to be missing
rows = get.(Ref(contrasts.invindex), rowinds, missing)
# create a row of nothing but missings for missing values
mrow = reduce(vcat, [missing for c in getindex(contrasts.matrix, 1, colinds)])
vcat([r === missing ? mrow : getindex(contrasts.matrix, r, colinds) for r in rows])
end

# Making a contrast type T only requires that there be a method for
# contrasts_matrix(T, baseind, n) and optionally termnames(T, levels, baseind)
Expand Down
6 changes: 3 additions & 3 deletions src/modelframe.jl
Original file line number Diff line number Diff line change
Expand Up @@ -79,11 +79,11 @@ missing_omit(data::T, formula::AbstractTerm) where T<:ColumnTable =

function ModelFrame(f::FormulaTerm, data::ColumnTable;
model::Type{M}=StatisticalModel, contrasts=Dict{Symbol,Any}()) where M
data, _ = missing_omit(data, f)

sch = schema(f, data, contrasts)
f = apply_schema(f, sch, M)


data, _ = missing_omit(data, f)

ModelFrame(f, sch, data, model)
end

Expand Down
14 changes: 8 additions & 6 deletions src/schema.jl
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@ Base.haskey(schema::Schema, key) = haskey(schema.schema, key)
Compute all the invariants necessary to fit a model with `terms`. A schema is a dict that
maps `Term`s to their concrete instantiations (either `CategoricalTerm`s or
`ContinuousTerm`s. "Hints" may optionally be supplied in the form of a `Dict` mapping term
names (as `Symbol`s) to term or contrast types. If a hint is not provided for a variable,
names (as `Symbol`s) to term or contrast types. If a hint is not provided for a variable,
the appropriate term type will be guessed based on the data type from the data column: any
numeric data is assumed to be continuous, and any non-numeric data is assumed to be
categorical.
Expand Down Expand Up @@ -91,7 +91,7 @@ StatsModels.Schema with 1 entry:
y => y
```

Note that concrete `ContinuousTerm` and `CategoricalTerm` and un-typed `Term`s print the
Note that concrete `ContinuousTerm` and `CategoricalTerm` and un-typed `Term`s print the
same in a container, but when printed alone are different:

```jldoctest 1
Expand Down Expand Up @@ -176,6 +176,8 @@ concrete_term(t::Term, d) = concrete_term(t, d, nothing)
concrete_term(t, d, hint) = t

concrete_term(t::Term, xs::AbstractVector{<:Number}, ::Nothing) = concrete_term(t, xs, ContinuousTerm)
# and for missing values
concrete_term(t::Term, xs::AbstractVector{Union{Missing,T}} where T<:Number, ::Nothing) = concrete_term(t, xs, ContinuousTerm)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't those lines be

concrete_term(t::Term, xs::AbstractVector{<:Union{<:Number,<:Union{Missing,<:Number}}}) =
    concrete_term(t, xs, ContinuousTerm)

I don't think we need/should specialize.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't just having AbstractVector{<:Union{Missing, <:Number}} work?

julia> [1,2] isa AbstractVector{<:Union{Missing,<:Number}}
true

julia> [1,2, missing] isa AbstractVector{<:Union{Missing,<:Number}}
true

function concrete_term(t::Term, xs::AbstractVector, ::Type{ContinuousTerm})
μ, σ2 = StatsBase.mean_and_var(xs)
min, max = extrema(xs)
Expand All @@ -196,9 +198,9 @@ end
Return a new term that is the result of applying `schema` to term `t` with
destination model (type) `Mod`. If `Mod` is omitted, `Nothing` will be used.

When `t` is a `ContinuousTerm` or `CategoricalTerm` already, the term will be returned
unchanged _unless_ a matching term is found in the schema. This allows
selective re-setting of a schema to change the contrast coding or levels of a
When `t` is a `ContinuousTerm` or `CategoricalTerm` already, the term will be returned
unchanged _unless_ a matching term is found in the schema. This allows
selective re-setting of a schema to change the contrast coding or levels of a
categorical term, or to change a continuous term to categorical or vice versa.

When defining behavior for custom term types, it's best to dispatch on
Expand Down Expand Up @@ -277,7 +279,7 @@ function apply_schema(t::FormulaTerm, schema::Schema, Mod::Type{<:StatisticalMod
end

# strategy is: apply schema, then "repair" if necessary (promote to full rank
# contrasts).
# contrasts).
#
# to know whether to repair, need to know context a term appears in. main
# effects occur in "own" context.
Expand Down
7 changes: 7 additions & 0 deletions src/terms.jl
Original file line number Diff line number Diff line change
Expand Up @@ -483,6 +483,13 @@ modelcols(t::Term, d::NamedTuple) =
modelcols(ft::FunctionTerm{Fo,Fa,Names}, d::NamedTuple) where {Fo,Fa,Names} =
ft.fanon.(getfield.(Ref(d), Names)...)

# this is weird, but using import Base: copy leads to exporting type piracy
# for non missing values, the compiler should hopefully optimize down the extra
# layer of indirection
function copy end
copy(x::Any) = Base.copy(x)
copy(m::Missing) = m
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this feels like a bad idea. Maybe better to just drop the use of copy (and broadcast identity or use copy on the vector).


modelcols(t::ContinuousTerm, d::NamedTuple) = copy.(d[t.sym])

modelcols(t::CategoricalTerm, d::NamedTuple) = t.contrasts[d[t.sym], :]
Expand Down
11 changes: 11 additions & 0 deletions test/terms.jl
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,17 @@ mimestring(x) = mimestring(MIME"text/plain", x)
@test t0.min == 1.0
@test t0.max == 3.0

vals0m = [3, missing, 1]
t0m = concrete_term(t, vals0m)
@test string(t0m) == "aaa"
@test mimestring(t0m) == "aaa(continuous)"
# compute all these values to make sure the behavior of terms matches
# the behavior of other relevant packages
@test isequal(t0m.mean, mean(vals0m))
@test isequal(t0m.var, var(vals0m))
@test isequal(t0m.min, min(vals0m...))
@test isequal(t0m.max, max(vals0m...))

t1 = concrete_term(t, [:a, :b, :c])
@test t1.contrasts isa StatsModels.ContrastsMatrix{DummyCoding}
@test string(t1) == "aaa"
Expand Down