Conversation

@blegat (Member) commented Jul 22, 2025

Here is a simplified reproducer of the performance issue identified in #4024:

using JuMP
import JuMP: moi_function  # internal function converting JuMP to MOI expressions

# Right-hand side of a toy dynamical system, nonlinear in the state x and
# the control u.
f(x, u) = [sin(x[1]) - x[1] * u, cos(x[2]) + x[1] * u]

# One step of the classical fourth-order Runge-Kutta scheme. Each stage
# reuses earlier stages, so the resulting expression graph shares
# sub-expressions; converting it as a tree duplicates them exponentially.
function RK4(f, X, u)
    k1 = f(X, u)
    k2 = f(X + k1 / 2, u)
    k3 = f(X + k2 / 2, u)
    k4 = f(X + k3, u)
    return X + (k1 + 2k2 + 2k3 + k4) / 6
end

model = Model()

@variable(model, q[1:2])
@variable(model, u)

x = q
for m in 1:4
    global x = RK4(f, x, u)  # global so this also runs outside the REPL
end
@time moi_function.(x)

Before this PR, this gives:

14.610110 seconds (207.45 M allocations: 6.927 GiB, 68.63% gc time)

After this PR, this gives:

0.000103 seconds (2.16 k allocations: 74.250 KiB)

Note that the MOI.ScalarNonlinearFunctions we generate now share common sub-expressions by pointer! If I'm not mistaken, we don't support modifying these in place, so that shouldn't be an issue.

This fixes the slow model building, but ReverseAD will still be terribly slow. One thing we could do is to detect, in MOI.Nonlinear, the MOI.ScalarNonlinearFunctions that share sub-functions corresponding to the same object (with a dictionary). When it detects two functions with the same pointer, it can then create subexpressions and use its existing support for subexpressions.
The nice thing about this is that we're not making any change to the MOI interface. Creating aliased MOI.ScalarNonlinearFunctions was already possible from the beginning; we just didn't treat them any differently. So this change would just be a performance optimization of the AD, while the interface stays as simple and non-breaking as it gets.
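
To sketch the idea (with hypothetical helper names; this is not part of this PR nor of MOI.Nonlinear's current code), a first pass could count how often each node object occurs, and a second pass could promote every node seen more than once to a subexpression, assuming the leaves are variables or constants that MOI.Nonlinear.add_expression already understands:

import MathOptInterface as MOI

# Pass 1: count occurrences of each node *object*, descending into each
# distinct object only once.
function count_occurrences!(
    counts::Dict{UInt64,Int},
    f::MOI.ScalarNonlinearFunction,
)
    id = objectid(f)
    counts[id] = get(counts, id, 0) + 1
    if counts[id] == 1
        for arg in f.args
            if arg isa MOI.ScalarNonlinearFunction
                count_occurrences!(counts, arg)
            end
        end
    end
    return
end

# Pass 2: rebuild the function as a Julia Expr, replacing every node that
# occurs more than once by the ExpressionIndex returned by
# MOI.Nonlinear.add_expression, so the AD reuses its existing support for
# subexpressions.
function to_subexpressions(
    nl::MOI.Nonlinear.Model,
    counts::Dict{UInt64,Int},
    subexpressions::Dict{UInt64,MOI.Nonlinear.ExpressionIndex},
    f::MOI.ScalarNonlinearFunction,
)
    id = objectid(f)
    if haskey(subexpressions, id)
        return subexpressions[id]
    end
    args = Any[
        arg isa MOI.ScalarNonlinearFunction ?
        to_subexpressions(nl, counts, subexpressions, arg) : arg
        for arg in f.args
    ]
    expr = Expr(:call, f.head, args...)
    if counts[id] > 1  # shared by pointer: promote it to a subexpression
        return subexpressions[id] = MOI.Nonlinear.add_expression(nl, expr)
    end
    return expr
end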

codecov bot commented Jul 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (b19c3e7) to head (ee3be7d).

Additional details and impacted files
@@            Coverage Diff            @@
##            master     #4032   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           43        43           
  Lines         6149      6160   +11     
=========================================
+ Hits          6149      6160   +11     

@odow marked this pull request as draft on July 22, 2025 at 21:48
@odow (Member) commented Jul 22, 2025

> If I'm not mistaken, we don't support modifying these in place, so that shouldn't be an issue.

In theory we don't, but they can be. This change needs very, very careful consideration.

@blegat (Member, Author) commented Jul 22, 2025

Yes, it has nontrivial consequences. It may very well jeopardize our chances of adding a modification API for nonlinear constraints in the future.

@odow (Member) commented Aug 3, 2025

I think we can simplify this to:

function my_moi_function(f::GenericNonlinearExpr{V}) where {V}
    # Map the objectid of each visited sub-expression to its converted
    # MOI.ScalarNonlinearFunction, so aliased nodes are converted only once.
    cache = Dict{UInt64,MOI.ScalarNonlinearFunction}()
    ret = MOI.ScalarNonlinearFunction(f.head, similar(f.args))
    # Explicit stack of (parent, argument index, unconverted child) tuples,
    # to avoid recursion on deeply nested expressions.
    stack = Tuple{MOI.ScalarNonlinearFunction,Int,GenericNonlinearExpr{V}}[]
    for i in length(f.args):-1:1
        if f.args[i] isa GenericNonlinearExpr{V}
            push!(stack, (ret, i, f.args[i]))
        else
            ret.args[i] = moi_function(f.args[i])
        end
    end
    while !isempty(stack)
        parent, i, arg = pop!(stack)
        # On a cache hit, the parent points at the already-converted child.
        parent.args[i] = get!(cache, objectid(arg)) do
            child = MOI.ScalarNonlinearFunction(arg.head, similar(arg.args))
            for j in length(arg.args):-1:1
                if arg.args[j] isa GenericNonlinearExpr{V}
                    push!(stack, (child, j, arg.args[j]))
                else
                    child.args[j] = moi_function(arg.args[j])
                end
            end
            return child
        end
    end
    return ret
end

It doesn't hold state across multiple calls to moi_function, but if there are large, repeated nested expressions within a single expression (the common case), then we exploit that.
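
For example (a hypothetical check, assuming the snippet above was defined with JuMP's internal moi_function in scope, e.g. after import JuMP: moi_function), aliased sub-expressions in the input map to a single object in the output:

using JuMP
model = Model()
@variable(model, y)
s = sin(y)  # a single GenericNonlinearExpr object
g = my_moi_function(s + s * s)
# All three occurrences of s convert to the same child object:
@assert g.args[1] === g.args[2].args[1]
@assert g.args[2].args[1] === g.args[2].args[2]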

@odow (Member) commented Aug 3, 2025

The example gives:

julia> @time moi_function.(x);
  0.000142 seconds (1.53 k allocations: 106.688 KiB)

@blegat (Member, Author) commented Aug 5, 2025

If this were only about moi_function, I would just get the model from the first JuMP.VariableRef I see and then use a dictionary stored inside it. The reason I have to change the API is to merge moi_function and check_belongs_to_model, in view of this benchmark: jump-dev/MathOptInterface.jl#2788 (comment)
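
Roughly what I have in mind, as a sketch with hypothetical names (recursive for brevity; the real code would use an explicit stack and leaf handling as in the snippet above):

using JuMP
import JuMP: check_belongs_to_model, moi_function  # JuMP internals
import MathOptInterface as MOI

# Hypothetical fusion of the two traversals: convert the expression and
# check that every leaf belongs to model in a single pass, caching aliased
# nodes by objectid as before.
function moi_function_and_check(
    f::GenericNonlinearExpr,
    model::GenericModel,
    cache = Dict{UInt64,MOI.ScalarNonlinearFunction}(),
)
    return get!(cache, objectid(f)) do
        args = Any[]
        for arg in f.args
            if arg isa GenericNonlinearExpr
                push!(args, moi_function_and_check(arg, model, cache))
            elseif arg isa AbstractJuMPScalar
                check_belongs_to_model(arg, model)  # fused ownership check
                push!(args, moi_function(arg))
            else
                push!(args, arg)  # numeric constants pass through unchanged
            end
        end
        return MOI.ScalarNonlinearFunction(f.head, args)
    end
end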

@odow (Member) commented Aug 5, 2025

We can fix check_belongs_to_model as well. I don't want to add the model-level cache. The cache should be within the function.

@blegat (Member, Author) commented Aug 6, 2025

It would be weird to have the subexpressions only work when they are in the same function. Especially since the AD backend supports them being in different ones. I guess we can still detect that they are the same later, in post-processing; that could be enough for the sake of passing the model to the AD backend without the exponential blowup.
One issue would be that we start creating many small dictionaries if there are a lot of constraints, but that's probably negligible.
What's the reason for not having a model-level cache?

@odow (Member) commented Aug 6, 2025

> It would be weird to have the subexpressions only work when they are in the same function.

This is an implementation detail. Nothing should be observable at the JuMP level.

> Especially since the AD backend supports them being in different ones.

This is also an implementation detail.

> I guess we can still detect that they are the same later, in post-processing; that could be enough for the sake of passing the model to the AD backend without the exponential blowup.

This is my strongly preferred option. I would like us to provide the full tape to an AD engine, and then have it turn everything into a single global DAG.

> One issue would be that we start creating many small dictionaries if there are a lot of constraints, but that's probably negligible.

Yes, we definitely need benchmarks before merging anything like this.

> What's the reason for not having a model-level cache?

First, it turns every nonlinear expression into a long-lived GC object that will never be freed until the model is.

Second, any expression occurring in multiple constraints seems like much less of an issue than the original example, where nested expressions are created programmatically.

Third, if we start with the internal cache, we can always change to a model-level cache later if needed.

@blegat (Member, Author) commented Aug 7, 2025

That makes sense: if we only take care of not duplicating aliased subexpressions at the level of each function, it's indeed easier.
We would then detect subexpressions in the AD backend using hashes, so the aliases used by the user have no impact on the subexpressions used by the AD.
It means that the user has no control over subexpressions, but on the other hand, relying on the sub-expressions used by the user was probably not the ideal way to give the user that control anyway.
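
To illustrate (again only a sketch with hypothetical names, not ReverseAD's actual code), the backend could compute a structural hash for every node across all input functions and treat any subtree seen at least twice as a candidate subexpression, independently of how the user aliased things. A real implementation would memoize child hashes instead of recomputing them, and would confirm structural equality on hash collisions.

import MathOptInterface as MOI

# Structural hash: equal subtrees hash equal even when they are distinct
# objects, so aliasing by the user has no impact on what gets detected.
function structural_hash(f::MOI.ScalarNonlinearFunction, h::UInt = zero(UInt))
    h = hash(f.head, h)
    for arg in f.args
        if arg isa MOI.ScalarNonlinearFunction
            h = structural_hash(arg, h)
        else
            h = hash(arg, h)
        end
    end
    return h
end

# Count each structural hash across all functions; hashes seen at least
# twice are candidate subexpressions for the AD tape.
function candidate_subexpressions(fs::Vector{MOI.ScalarNonlinearFunction})
    counts = Dict{UInt,Int}()
    stack = MOI.ScalarNonlinearFunction[fs...]
    while !isempty(stack)
        f = pop!(stack)
        h = structural_hash(f)
        counts[h] = get(counts, h, 0) + 1
        for arg in f.args
            if arg isa MOI.ScalarNonlinearFunction
                push!(stack, arg)
            end
        end
    end
    return Set(h for (h, c) in counts if c > 1)
end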
