Switch MTK default from FullSpecialize to AutoSpecialize#4335
AayushSabharwal merged 10 commits into SciML:master from …
Conversation
…ecialize

The @fallback_iip_specialize macro in ModelingToolkitBase was hardcoding SciMLBase.FullSpecialize as the default specialization for all problem constructors (ODEProblem, SDEProblem, etc.). This meant MTK-generated functions never entered the AutoSpecialize path, bypassing the FunctionWrappersWrapper dispatch that enables compilation cache reuse across different ODE functions.

This change makes AutoSpecialize the default, matching SciMLBase's own DEFAULT_SPECIALIZATION. The precompile workload now also warms up the FunctionWrappersWrapper wrapping path (DiffEqBase.wrapfun_iip) so that the first solve() call after loading MTK does not pay this cost.

Note: The full solve path cannot be precompiled in ModelingToolkitBase because it does not depend on any ODE solver. A downstream package or extension would need to add a solve() call to its own @compile_workload to fully precompile the solver dispatch.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
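As a user-facing sketch of the new default (assuming this PR's MTK plus the companion SciMLBase changes; the model mirrors the benchmark scripts later in this thread), the plain constructor now corresponds to AutoSpecialize while FullSpecialize stays available as an explicit opt-in:

```julia
using ModelingToolkit, SciMLBase
using ModelingToolkit: t_nounits as t, D_nounits as D

@variables x(t) y(t)
@parameters α β γ δ
eqs = [D(x) ~ α*x - β*x*y, D(y) ~ δ*x*y - γ*y]
@mtkbuild sys = System(eqs, t)

u0 = [x => 1.0, y => 1.0]
ps = [α => 1.5, β => 1.0, γ => 3.0, δ => 1.0]

# With this PR, the fallback constructor uses AutoSpecialize:
prob_auto = ODEProblem(sys, u0, (0.0, 10.0), ps)
# Full specialization remains available explicitly:
prob_full = ODEProblem{true, SciMLBase.FullSpecialize}(sys, u0, (0.0, 10.0), ps)
```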
Required for AutoSpecialize support in unwrapped_f and remake. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Required for tgrad/jac wrapping in promote_f and mass matrix AutoSpecialize support. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
…tions
When constructing an AutoSpecialize ODEFunction from a System, erase the
concrete initialization_data and nlstep_data type parameters to their
abstract upper bounds (Union{Nothing, OverrideInitData} and
Union{Nothing, ODENLStepData}). This ensures all AutoSpecialize
ODEFunctions share the same type for these parameters regardless of the
model, preventing recompilation of promote_f and solver code for each new
model.
Benchmark results show ~50% reduction in per-model compilation time:
- FullSpecialize M2: 4.3s -> AutoSpecialize M2: 2.1s
- FullSpecialize M3: 4.0s -> AutoSpecialize M3: 2.2s
The type erasure is preserved through SciMLBase's remake pathway via
_has_type_erased_params and _rebuild_preserving_type_erasure.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
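The erasure idea can be seen in a dependency-free toy (hypothetical `MiniFun` standing in for `ODEFunction`; not MTK's actual code): replacing a concrete type parameter with its abstract upper bound makes otherwise-distinct instances share one concrete type, so downstream code compiles once.

```julia
# Hypothetical stand-in for ODEFunction with an initialization_data slot.
struct MiniFun{F, I <: Union{Nothing, NamedTuple}}
    f::F
    initialization_data::I
end

# Erase the concrete init-data type parameter to its abstract upper bound.
erase_init(fun::MiniFun{F}) where {F} =
    MiniFun{F, Union{Nothing, NamedTuple}}(fun.f, fun.initialization_data)

m1 = MiniFun(identity, (; model = :a))  # concrete NamedTuple init data
m2 = MiniFun(identity, nothing)         # no init data
typeof(m1) == typeof(m2)                 # false: distinct concrete types
typeof(erase_init(m1)) == typeof(erase_init(m2))  # true after erasure
```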
Update: OverrideInitData type erasure for AutoSpecialize

Added …

Why this is needed

Each MTK model generates unique …

How it works
Benchmark results (4 MTK models + Rodas5P)
Per-model savings: ~50% reduction (4.3s → 2.1s for M2, 4.7s → 2.2s for M4). Test C shows the default now uses AutoSpecialize, and after the first model compiles, subsequent models are essentially free (0.01s each) due to precompilation and type uniformity.
Required for `_rebuild_preserving_type_erasure` which preserves the erased OverrideInitData/ODENLStepData type parameters through `remake` calls (SciMLBase PR SciML#1242). Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
```julia
nlstep_data` type parameters. This ensures all AutoSpecialize ODEFunctions have identical
types, preventing recompilation of `promote_f` and solver code for each model.
"""
function _erase_init_data_type(f::SciMLBase.ODEFunction)
```
Doesn't this break if we ever add a field to ODEFunction?
Replace the fragile _erase_init_data_type function that manually listed all ODEFunction type parameters with SciMLBase.widen_bounded_type_params, which automatically detects and widens bounded type parameters. Addresses review comment about breaking when fields are added. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
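A dependency-free toy of the idea behind `widen_bounded_type_params` (illustrative only: `Pair2` and `widen_bounded` are made-up names, and SciMLBase's real implementation differs): walk the type's declared parameters and replace any whose declared upper bound is tighter than `Any` with that bound, so no hand-maintained parameter list is needed.

```julia
# Hypothetical struct with one bounded type parameter, mimicking how
# ODEFunction bounds its initialization_data/nlstep_data parameters.
struct Pair2{A, B <: Union{Nothing, AbstractDict}}
    a::A
    b::B
end

# Widen every parameter whose declared upper bound is not `Any`.
function widen_bounded(T::DataType)
    decl = Base.unwrap_unionall(T.name.wrapper)  # declaration with its TypeVars
    newparams = Any[tv isa TypeVar && tv.ub !== Any ? tv.ub : p
                    for (p, tv) in zip(T.parameters, decl.parameters)]
    return T.name.wrapper{newparams...}
end

Ta = widen_bounded(typeof(Pair2(1, Dict("k" => 2))))
Tb = widen_bounded(typeof(Pair2(1, nothing)))
Ta === Tb  # both widen to the same type with B = Union{Nothing, AbstractDict}
```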
Addressed @AayushSabharwal's comment. Moved the logic to SciMLBase, so MTKBase now just calls it.
TTFX Benchmark Results

Tested locally with a dev'd SciMLBase (v2.144.0, includes …).

Benchmark: two different 2-variable ODE models (Lotka-Volterra and a linear exchange system), measuring the time for the second solve.

```julia
using ModelingToolkit, OrdinaryDiffEq, SciMLBase
using ModelingToolkit: t_nounits as t, D_nounits as D

# Model 1: Lotka-Volterra
@variables x(t) y(t)
@parameters α β γ δ
eqs1 = [D(x) ~ α*x - β*x*y, D(y) ~ δ*x*y - γ*y]
@named sys1 = ODESystem(eqs1, t)
prob1 = ODEProblem(structural_simplify(sys1), [x => 1.0, y => 1.0], (0.0, 10.0),
    [α => 1.5, β => 1.0, γ => 3.0, δ => 1.0])

# Model 2: Linear exchange
@variables z(t) w(t)
@parameters k1 k2
eqs2 = [D(z) ~ -k1*z + k2*w, D(w) ~ k1*z - k2*w]
@named sys2 = ODESystem(eqs2, t)
prob2 = ODEProblem(structural_simplify(sys2), [z => 2.0, w => 0.5], (0.0, 5.0),
    [k1 => 0.3, k2 => 0.1])

# Warm up on model 1, then time model 2
solve(prob1, Tsit5())
@elapsed solve(prob2, Tsit5()) # <-- this is what we're measuring
```

Results
Why it works

On master, each MTK model produces an …. This PR widens ….

Type analysis
TTFX Breakdown:

First model (Tsit5)

| Phase | Time |
|---|---|
| `init` (promote_f + integrator construction) | 3.08s |
| `solve!` (stepping) | 1.07s |
| Total | 4.15s |
Second model, same structure (Tsit5)
| Phase | Time |
|---|---|
| `init` | 0.84s |
| `solve!` | 0.0s |
| Total | 0.84s |
First model (Rodas5P)
| Phase | Time |
|---|---|
| `init` | 1.86s |
| `solve!` | 1.91s |
| Total | 3.77s |
Second model (Rodas5P)
| Phase | Time |
|---|---|
| `init` | 0.57s |
| `solve!` | 0.03s |
| Total | 0.60s |
What AutoSpecialize gives us
- `solve!` drops to ~0s for the second model — the solver stepping code is fully shared via `FunctionWrappersWrapper`. This is the core AutoSpecialize win.
- `init` still costs ~0.8s per new model because the `init` path sees the full `ODEFunction{..., GFW{hash}, ...}` type before `promote_f` wraps it. Each model has a unique `GeneratedFunctionWrapper` hash baked into the type, so `init` recompiles per model.
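The `solve!` sharing can be illustrated with the FunctionWrappers.jl package (which the `FunctionWrappersWrapper` machinery builds on); a standalone sketch, not SciMLBase's actual wrapping code: two different functions wrapped at the same signature produce the same concrete wrapper type, so code specialized on the wrapper compiles once.

```julia
using FunctionWrappers: FunctionWrapper

# Two different model RHS-like functions — distinct anonymous-function types.
lotka  = (u, p) -> p * u * (1 - u)
linear = (u, p) -> -p * u

# Wrapping both at the same (return, argument) signature erases the
# function type behind a single concrete wrapper type.
fw1 = FunctionWrapper{Float64, Tuple{Float64, Float64}}(lotka)
fw2 = FunctionWrapper{Float64, Tuple{Float64, Float64}}(linear)

typeof(fw1) === typeof(fw2)  # true, unlike typeof(lotka) === typeof(linear)
fw1(0.5, 2.0)                # calls through the wrapper
```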
What a @compile_workload with solve() would add
If a downstream package (or MTK extension) added a full solve() to its precompile workload:
- First-solve `solve!` cost (1.07s Tsit5, 1.91s Rodas5P) → ~0s (saved to sysimage)
- First-solve `init` cost (3.08s Tsit5, 1.86s Rodas5P) → partially precompiled, but still model-specific due to the unique `F` type
- Estimated first-solve: ~3.1s Tsit5 (down from 4.1s), ~1.9s Rodas5P (down from 3.7s)
- Second-model solve unchanged: ~0.84s Tsit5, ~0.6s Rodas5P
Remaining bottleneck
The init path is the dominant remaining cost. The only differing type parameter between same-structure models is F (param 3) — the GeneratedFunctionWrapper with model-specific RuntimeGeneratedFunction hashes. MTKParameters, u0, and tspan types are all identical across models. To further reduce second-model TTFX below ~0.8s, promote_f wrapping would need to happen before init (i.e., at ODEProblem construction time in MTK), so init only ever sees the uniform FunctionWrappersWrapper type.
Note: the wrapfun_iip precompile currently in precompile.jl (lines 118-126) was measured to have negligible impact (~0.1s within noise), likely because wrapfun_iip alone is cheap and the real cost is in the broader init dispatch chain.
Bump SciMLBase lower bound to 2.144.0 for `widen_bounded_type_params`. Remove dead comment block about downstream precompile workload. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```julia
_u0 = prob_precompile.u0
_p = prob_precompile.p
_t = prob_precompile.tspan[1]
DiffEqBase.wrapfun_iip(_f_unwrapped, (_u0, _u0, _p, _t))
```
wrapfun_iip won't handle dual p right? Is there a hook to make that work?
What if someone calls remake where p has duals?
At least SciMLBase/DiffEqBase has a bunch of auto-unwrap / rewrap stuff. Worth testing on MTK generated codes because it wraps a bit earlier.
| Test | Result |
|---|---|
| Basic solve (AutoSpecialize) | Pass |
| `ForwardDiffSensitivity` gradient | Pass — identical to FullSpecialize |
| SciMLStructures tunable repack path | Pass |
| Manual `remake(prob; p = Dual(...))` + solve | Pass — duals propagate correctly |
| Raw ForwardDiff (no sensealg) | Pass — identical to FullSpecialize |
Gradients match exactly between AutoSpecialize and FullSpecialize: [2.207, -5.119, 0.740, -1.455]
Why it works
With AutoSpecialize, function wrapping is deferred to solve time via promote_f, not done at problem construction:
- `remake(prob; p = dual_p)`: `specialization(f) === AutoSpecialize` (not `FunctionWrapperSpecialize`), so the rewrap block at `remake.jl:259` does NOT trigger. This is correct because `f.f` is still the unwrapped `GeneratedFunctionWrapper`.
- `solve(remade_prob)`: calls `promote_f(f, Val(AutoSpecialize), u0, dual_p, t, ...)` → `wrapfun_iip(f.f, (u0, u0, dual_p, t), Val(CS))`. This creates FunctionWrappers where `T3` (param type = `Vector{Dual{...}}`) is used in all 4 arglist variants. Dual variants for `du`/`u`/`t` are generated on top of that for Jacobian computation.
- No rewrap is needed in `remake` because the wrapping hasn't happened yet — it happens fresh at each `solve` call with the correct types.
Test script
```julia
using ModelingToolkit, OrdinaryDiffEq, SciMLBase, ForwardDiff, SciMLSensitivity, SciMLStructures
using ModelingToolkit: t_nounits as t, D_nounits as D

@variables x(t) y(t)
@parameters α β δ γ
eqs = [D(x) ~ α*x - β*x*y, D(y) ~ δ*x*y - γ*y]
@mtkbuild sys = System(eqs, t)
prob = ODEProblem{true, SciMLBase.AutoSpecialize}(
    sys, [x => 1.0, y => 1.0], (0.0, 1.0), [α => 1.5, β => 1.0, δ => 3.0, γ => 1.0])

# ForwardDiff through remake + solve
function loss(p_vals)
    _prob = remake(prob; p = [α => p_vals[1], β => p_vals[2], δ => p_vals[3], γ => p_vals[4]])
    sol = solve(_prob, Tsit5())
    return sum(sol[end])
end
grad = ForwardDiff.gradient(loss, [1.5, 1.0, 3.0, 1.0]) # works, matches FullSpecialize

# Direct dual remake
tunable, repack, _ = SciMLStructures.canonicalize(SciMLStructures.Tunable(), prob.p)
dual_p = repack(ForwardDiff.Dual.(tunable, 1.0))
prob_dual = remake(prob; p = dual_p)
sol_dual = solve(prob_dual, Tsit5()) # works, duals propagate correctly
```

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
One of the tests in ModelingToolkit/InterfaceI fails (specifically, when running …). A test in ModelingToolkit/Initialization also fails. The failing test checks the type-stability of ….
Investigating reported test failures

@AayushSabharwal Thanks for the report. I've been investigating both failures.

Failure 2:
Benchmark the changes that you suggest will fix the ….

This is a link to the failing test. Attempt to reproduce the error, identify the root cause, and fix it.
The type widening broke `@inferred remake` and `@inferred solve` because
Union type parameters (Union{Nothing, OverrideInitData}, Union{Nothing,
ODENLStepData}) are not inferrable. Without widening, the concrete types
flow through unwrapped_f/promote_f naturally, and both `@inferred remake`
and `@inferred solve` work correctly with AutoSpecialize.
The tradeoff is that different models with different initialization_data
types will produce different ODEFunction types after promote_f, but the
RHS function is still type-erased through FunctionWrappersWrapper which
is the primary goal of AutoSpecialize.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
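The inference failure mode described above can be reproduced without MTK (standalone sketch; `Holder` and `grab` are illustrative names): a field typed as a Union, mirroring a widened `ODEFunction` type parameter, makes the accessor's inferred return type a Union, which `@inferred` rejects even though the call works at runtime.

```julia
using Test

# A Union-typed field stands in for a widened initialization_data parameter.
struct Holder
    data::Union{Nothing, Vector{Float64}}
end

grab(h::Holder) = h.data

h = Holder([1.0, 2.0])
grab(h)  # fine at runtime: returns the vector

# `@inferred grab(h)` throws, because the inferred return type is
# Union{Nothing, Vector{Float64}} while the actual value is a Vector.
inferred_ok = try
    @inferred grab(h)
    true
catch
    false
end
# inferred_ok == false
```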
Fix for …
In this case it's probably best to change the test so that it checks ….
Try:

```julia
@parameters k₁ k₂ k₃
@variables y₁(t) y₂(t) y₃(t)
# reorder the system just to be a little spicier
eqs = [
    D(y₁) ~ -k₁ * y₁ + k₃ * y₂ * y₃,
    0 ~ y₁ + y₂ + y₃ - 1,
    D(y₂) ~ k₁ * y₁ - k₂ * y₂^2 - k₃ * y₂ * y₃ * κ,
]
@named sys = System(eqs, t, initial_conditions = [k₁ => 100, k₂ => 3.0e7, y₁ => 1.0])
sys = complete(sys)
u0 = Pair[]
push!(u0, y₂ => 0.0)
push!(u0, y₃ => 0.0)
p = [
    k₁ => 0.04,
    k₃ => 1.0e4,
]
p2 = [
    k₁ => 0.04,
    k₂ => 3.0e7,
    k₃ => 1.0e4,
]
tspan = (0.0, 100000.0)
prob1 = ODEProblem(sys, [u0; p], tspan)
prob2 = ODEProblem(sys, [u0; p], tspan, jac = true)
prob3 = ODEProblem(sys, [u0; p], tspan, jac = true, sparsity = true)
for (prob, atol) in [(prob1, 1.0e-12), (prob2, 1.0e-12), (prob3, 1.0e-12)]
    local sol
    sol = solve(prob, Rodas5())
    @test all(x -> ≈(sum(x), 1.0, atol = atol), sol.u)
end
```

Does this reproduce the error?
Benchmarks show that removing widen_bounded_type_params causes 3x slower second-model solves (2.3s vs 0.8s for Tsit5, 3.1s vs 0.7s for Rodas5) because different models produce different ODEFunction types, preventing compilation sharing across models. Keep widen_bounded_type_params for AutoSpecialize to preserve compilation sharing. Change the @inferred remake/solve test to use FullSpecialize, since Union type parameters in AutoSpecialize are by design. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Benchmark results and updated approach

@AayushSabharwal was right — removing ….
Without widening, different models produce different ODEFunction types (due to different `initialization_data` types), preventing the solver from reusing compiled code. Changes in this PR (commit f0eef69):
Companion PR: DiffEqBase.jl#1288

Fixes the "No matching function wrapper was found!" error for ….
Requires the sparse Jacobian FunctionWrapper fix from DiffEqBase#1288 which handles AutoSpecialize with jac=true, sparsity=true. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Somehow MTK/InterfaceII timed out, both on 1 and lts. I don't necessarily see anything problematic in the PR, so I'll rerun the testset in case it was a one-off.
Add package extensions for OrdinaryDiffEqDefault, OrdinaryDiffEqRosenbrock, and OrdinaryDiffEqBDF that precompile the full solve() path for standard MTK ODE problems using PrecompileTools.@compile_workload. Each extension builds a simple 1-state ODE system via mtkcompile and solves it with the respective solver (default, Rodas5P, FBDF) during precompilation. This caches the compiled native code so that the first solve() call at runtime avoids the multi-second compilation cost. This follows the pattern suggested in PR SciML#4335, where the precompile workload in ModelingToolkitBase was limited to ODEProblem construction and wrapfun_iip because it has no solver dependency. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Summary
Switches ModelingToolkit's default specialization level from `FullSpecialize` to `AutoSpecialize`, enabling compilation reuse across different MTK models with the same structure:

- `@fallback_iip_specialize` macro: Changed the generated fallback from `ODEProblem{iip, FullSpecialize}(...)` to `ODEProblem{iip, AutoSpecialize}(...)` in `problem_utils.jl`. This is the single line that controls the default specialization for all MTK-generated problems.
- Precompile workload: Updated `precompile.jl` to precompile the `FunctionWrappersWrapper` construction path that `DiffEqBase.promote_f` performs at solve time for AutoSpecialize ODEProblems. This ensures the first `solve()` call after loading MTK doesn't pay the FWW compilation cost. Note: the full solve path (OrdinaryDiffEq dispatch) cannot be precompiled here since ModelingToolkitBase doesn't depend on any ODE solver package.

Context

With the companion PRs to SciMLBase and DiffEqBase, AutoSpecialize erases function types in `ODEFunction` so that different models sharing the same structure (same number of states, same solver) reuse compiled code. Currently each MTK model triggers full recompilation of the solver dispatch, costing 3-5s per model. With AutoSpecialize, the second and subsequent models should see significant speedups.

The remaining compilation bottleneck after this PR is the initialization system inside `OverrideInitData`, which contains `RuntimeGeneratedFunction` types unique per model and accounts for ~50-60% of solve time. This is a separate optimization target.

Dependencies

- … (`unwrapped_f` and `remake`)
- … (`promote_f`)

Test plan
🤖 Generated with Claude Code
Co-Authored-By: Chris Rackauckas accounts@chrisrackauckas.com