
Conversation

@mcabbott
Member

Quick sketch of one way to easily allow different rules for different arrays, by modifying setup -- see docstring.

PR Checklist

  • Tests are added
  • Documentation, if applicable

@AntonOresten

I think this is elegant and useful. I'm working on some improvements to #203. Muon is optimal for linear layers, but makes less sense for e.g. Flux.Embedding, even though it is linear-like; and since the linear decoder layer in LLMs is often tied to the input encoder layer, it's preferable to disable Muon for that layer as well. I imagine the cleanest way of differentiating between linear layers is with an IdDict inside the setup rule function: you'd, for example, create the function based on which layers are present in some IdDict, and in the same function embed rules for different array shapes.

@AntonOresten

AntonOresten commented Oct 22, 2025

One could do something like:

function fun_rule(model, rule=Muon(), fallback=Adam())
    # Identity-based set of arrays that should not use `rule`,
    # here the (possibly tied) encoder and decoder weights:
    skipped = Base.IdSet{Any}([model.encode.weight, model.decode.weight])
    fun(x::AbstractVector) = fallback                       # vectors (biases etc.) always get the fallback
    fun(x::AbstractArray) = x in skipped ? fallback : rule  # skipped matrices get the fallback too
    return fun
end

opt_state = Optimisers.setup(fun_rule(model), model)

such that:

julia> model = (;
           encode=(; weight=rand(2,2)),
           other=(; weight=rand(2,2), bias=rand(2)),
           decode=(; weight=rand(2,2)));

julia> fun_rule(model)(model.encode.weight)
Adam(eta=0.001, beta=(0.9, 0.999), epsilon=1.0e-8)

julia> fun_rule(model)(model.other.weight)
Muon(0.02, 0.95, 0.01, 1.0e-7, true)

julia> fun_rule(model)(model.other.bias)
Adam(eta=0.001, beta=(0.9, 0.999), epsilon=1.0e-8)

julia> fun_rule(model)(model.decode.weight)
Adam(eta=0.001, beta=(0.9, 0.999), epsilon=1.0e-8)

I generally avoid closures, but this has a certain elegance to it. Base.IdSet is private, but the alternative is slightly cursed:

skipped = keys(IdDict([model.encode.weight, model.decode.weight] .=> nothing))
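A slightly less cursed public-API variant of the same idea might be to keep the `IdDict` itself and test membership with `haskey`, which compares by identity just like an `IdSet` would. A minimal sketch (using `:muon` / `:adam` placeholder values in place of the actual `Muon()` / `Adam()` rules, so it runs without Optimisers):

```julia
# Sketch: an IdDict whose keys act as an identity-based skip set,
# queried with haskey instead of the private Base.IdSet.
function fun_rule2(model; rule=:muon, fallback=:adam)
    skipped = IdDict{Any,Nothing}()
    skipped[model.encode.weight] = nothing  # tied encoder weight
    skipped[model.decode.weight] = nothing  # tied decoder weight
    fun(x::AbstractVector) = fallback
    fun(x::AbstractArray) = haskey(skipped, x) ? fallback : rule
    return fun
end
```

Since `IdDict` hashes by `objectid`, two distinct arrays with equal contents are still kept apart, which is exactly what's wanted for tied weights.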

