Weights in pipelines #1005

@olivierlabayle

Hello,

I am trying to use weights in MLJ and am surprised by the behaviour in a pipeline, which I think is a bug. In the following example, the supervised component does not support weights, yet no error is thrown and no warning is displayed.

using MLJBase
using MLJModels
using MLJLinearModels

n = 100
X = (A=categorical(rand([0, 1], n)), B=categorical(rand([0, 1], n)))
y = categorical(rand([0, 1], n))
weights = [y_ == true ? 0.1 : 0.8 for y_ in y]
pipe = Pipeline(MLJModels.ContinuousEncoder(), LogisticClassifier())

# Train without weights: all good
unweighted_mach = machine(pipe, X, y)
fit!(unweighted_mach, verbosity=2)
preds_unweighted = predict(unweighted_mach)

# Train with weights: no warning, no error, and the weights are silently ignored
weighted_mach = machine(pipe, X, y, weights)
fit!(weighted_mach, verbosity=2)
preds_weighted = predict(weighted_mach)

[x.prob_given_ref[1] for x in preds_unweighted] == [x.prob_given_ref[1] for x in preds_weighted] # returns true

More generally, outside a pipeline, passing weights to a model that does not support them only prints a warning, and fitting proceeds. I expected an error to be thrown instead:

n = 100
X = (A=rand(n), B=rand(n))
y = categorical(rand([0, 1], n))
weights = [y_ == true ? 0.1 : 0.8 for y_ in y]

weighted_mach = machine(LogisticClassifier(), X, y, weights) # warning here
fit!(weighted_mach)
predict(weighted_mach)
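
For illustration, the stricter behaviour I expected could be sketched roughly as below. This is only a sketch: `strict_machine` is a hypothetical helper, not part of MLJ, and it assumes the `supports_weights` model trait accurately reflects whether the model uses per-observation weights. It only covers the plain (non-pipeline) case.

```julia
using MLJBase
using MLJLinearModels

# Hypothetical helper (not an MLJ function): refuse to construct a machine
# with weights when the model's `supports_weights` trait is false.
function strict_machine(model, X, y, w)
    supports_weights(model) ||
        throw(ArgumentError("$(typeof(model)) does not support per-observation weights"))
    return machine(model, X, y, w)
end

n = 100
X = (A=rand(n), B=rand(n))
y = categorical(rand([0, 1], n))
w = [y_ == 1 ? 0.1 : 0.8 for y_ in y]

# Either a machine is returned, or the helper refuses with an ArgumentError.
try
    strict_machine(LogisticClassifier(), X, y, w)
catch err
    err isa ArgumentError || rethrow()
    println("refused: ", err.msg)
end
```

With such a guard, a model that ignores weights fails loudly at machine construction instead of training as if no weights had been passed.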
