
@ChrisRackauckas-Claude

Fixes #1073

Problem

When using LBFGS or BFGS with bounds, Optim.jl wraps the optimizer in Fminbox, which may use ForwardDiff internally for gradient computation. As a result, the callback received ForwardDiff.Dual numbers instead of scalar loss values, and incorrect (sometimes negative) values were reported to the user.

Root Cause

When bounds are specified with LBFGS/BFGS:

  1. The optimizer is automatically wrapped in Optim.Fminbox (line 107)
  2. Fminbox uses ForwardDiff for gradient computation
  3. The trace values (trace_state.value and trace.value) can contain Dual numbers
  4. These Dual numbers were passed directly to callbacks without extracting the scalar value

Solution

  • Added ForwardDiff as a dependency in OptimizationOptimJL
  • Added _scalar_value() utility function to safely extract scalar values from Dual numbers
  • Updated all three _cb callback functions (lines 158-171, 279-298, 372-387) to extract scalar values before passing to user callbacks
  • Added comprehensive test case verifying callbacks receive correct scalar non-negative values with LBFGS/BFGS + bounds
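The extraction utility relies on Julia's multiple dispatch: a generic fallback returns plain numbers unchanged, and a specialized method unwraps duals. A minimal self-contained sketch of that pattern follows; `MockDual` is a hypothetical stand-in for `ForwardDiff.Dual` (primal value plus partials), used here so the example runs without the package:

```julia
# Sketch of the dispatch pattern behind _scalar_value(); MockDual is a
# hypothetical stand-in for ForwardDiff.Dual (primal value + partials).
struct MockDual{T<:Real}
    value::T
    partials::Vector{T}
end

_scalar_value(x) = x                    # plain reals pass through unchanged
_scalar_value(x::MockDual) = x.value    # duals yield their primal value

@assert _scalar_value(3.5) == 3.5
@assert _scalar_value(MockDual(0.25, [1.0, -2.0])) == 0.25
```

In the real implementation the second method dispatches on `ForwardDiff.Dual` and calls `ForwardDiff.value(x)`; the fallback keeps the utility a no-op for ordinary floats.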

Testing

  • All existing tests pass (6870 tests)
  • New test specifically validates:
    • Callback receives Real numbers, not Dual numbers
    • Loss values are non-negative for sum-of-squares loss functions
    • Tests both LBFGS and BFGS with bounds
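The property the new test asserts can be sketched in a self-contained way. The callback signature below mirrors Optimization.jl's `(state, loss) -> halt?` convention, but the optimizer loop is simulated here rather than driven by Optim.jl:

```julia
# Sketch of the property under test: with a sum-of-squares loss, every
# value the callback sees must be a plain non-negative Real.
sumsq(u, p) = sum(abs2, u .- p)

reported = Float64[]
cb(state, loss) = (push!(reported, loss); false)   # false: do not halt

for u in ([1.0, 2.0], [0.5, 1.2], [0.05, 1.01])    # simulated iterates
    cb(u, sumsq(u, [0.0, 1.0]))
end

@assert all(l -> l isa Real && l >= 0, reported)
```

Before the fix, the equivalent check against a real LBFGS-with-bounds run would fail because `loss` could arrive as a `ForwardDiff.Dual`.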

Changes

  • lib/OptimizationOptimJL/Project.toml: Added ForwardDiff to dependencies
  • lib/OptimizationOptimJL/src/OptimizationOptimJL.jl:
    • Import ForwardDiff
    • Add _scalar_value() utility function
    • Update three callback functions to extract scalar values
  • lib/OptimizationOptimJL/test/runtests.jl: Add test case for issue #1073 ((L-)BFGS with bounds reports negative loss to callback)

🤖 Generated with Claude Code

Co-Authored-By: Claude <[email protected]>

… values

Fixes SciML#1073

When using LBFGS or BFGS with bounds, Optim.jl wraps the optimizer in Fminbox,
which may use ForwardDiff internally for gradient computation. This resulted in
the callback receiving ForwardDiff.Dual numbers instead of scalar loss values,
causing incorrect (sometimes negative) values to be reported.

Changes:
- Added ForwardDiff as a dependency in OptimizationOptimJL
- Added _scalar_value() utility function to extract scalar values from Dual numbers
- Updated all three _cb callback functions to extract scalar values before passing to user callbacks
- Added comprehensive test case verifying callbacks receive correct scalar non-negative values

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@ChrisRackauckas-Claude
Author

Investigation and Fix Process

Initial Analysis

  1. Reviewed issue #1073 ((L-)BFGS with bounds reports negative loss to callback): User reported negative loss values in callbacks when using LBFGS with bounds, despite the loss function being a sum of squares (always non-negative).

  2. Explored codebase: Used exploration agent to understand how LBFGS callbacks are handled in OptimizationOptimJL:

    • Found three __solve methods handling different optimizer types
    • Identified that LBFGS with bounds triggers automatic wrapping in Optim.Fminbox (line 107)
    • Located three callback functions that needed fixing (lines 158, 279, 372)

Root Cause Analysis

The issue occurs in this flow:

  1. User calls solve() with LBFGS() and bounds
  2. Code automatically wraps it as Fminbox(LBFGS())
  3. This dispatches to the Fminbox-specific __solve method (lines 240-332)
  4. Optim.jl's Fminbox uses ForwardDiff internally for gradient computation
  5. The trace values (trace_state.value and trace.value) contain ForwardDiff.Dual numbers
  6. These Dual numbers were passed directly to callbacks without extracting the scalar value
  7. The Dual structure contains partials that can appear as strange/negative values when accessed incorrectly
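Steps 5–6 of this flow can be sketched in a self-contained way. The names `FakeDual`, `FakeTraceState`, and the callback wiring below are illustrative stand-ins for the package's actual types, chosen so the example runs without Optim.jl or ForwardDiff:

```julia
# Illustrative stand-in for an Optim.jl trace state whose `value` field
# may hold a dual number while Fminbox drives ForwardDiff internally.
struct FakeDual
    value::Float64        # the primal (true loss) value
    partial::Float64      # a derivative component riding along
end

struct FakeTraceState
    value::Union{Float64, FakeDual}
end

# Extraction step the fix adds before invoking the user callback:
extract(v) = v
extract(v::FakeDual) = v.value

function _cb_sketch(user_callback, trace_state::FakeTraceState)
    loss_val = extract(trace_state.value)   # scalar, never a dual
    user_callback(loss_val)
end

seen = Float64[]
_cb_sketch(l -> push!(seen, l), FakeTraceState(FakeDual(0.5, -3.2)))
_cb_sketch(l -> push!(seen, l), FakeTraceState(0.25))
@show seen   # [0.5, 0.25]
```

Without the `extract` step, the user callback would receive the whole dual-like object, and naive printing or comparison could surface the partials (here `-3.2`) rather than the true loss.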

Solution Implementation

  1. Added ForwardDiff dependency: Moved from extras to main dependencies since it's needed at runtime
  2. Created _scalar_value() utility: Generic function to extract scalar values from Dual numbers
  3. Updated all callback sites: Modified three _cb functions to use _scalar_value()
  4. Added comprehensive test: New test verifies callbacks receive scalar non-negative values

Testing

Ran full test suite locally:

Test Summary:          | Pass  Total     Time
OptimizationOptimJL.jl | 6870   6870  2m21.4s
     Testing OptimizationOptimJL tests passed

The new test specifically validates:

  • Callbacks receive Real numbers, not ForwardDiff.Dual
  • Loss values remain non-negative for sum-of-squares functions
  • Both LBFGS and BFGS with bounds work correctly

Code Formatting

Applied JuliaFormatter with SciMLStyle to all changed files per project conventions.


Ready for review! All tests passing locally.

@ChrisRackauckas-Claude
Author

⎿ Setting 1-hour timer for CI completion as per CLAUDE.md instructions...
Current time: Thu Oct 23 06:44:41 AM EDT 2025
Will check CI results at: Thu Oct 23 07:44:41 AM EDT 2025

Note: Local tests passed successfully (6870/6870 tests). Spell check failure is unrelated to this PR (pre-existing typo in OptimizationManopt).

@ChrisRackauckas-Claude
Author

CI Test Results Analysis

All CI test failures are unrelated to this PR:

Mooncake Precompilation Failures

All test failures show the same root cause:

ERROR: UndefVarError: `inlining_policy` not defined in `Compiler`
Failed to precompile Mooncake

This is a compatibility issue between Mooncake package and Julia 1.12, completely unrelated to my changes which only modify OptimizationOptimJL callback handling.

Spell Check Failure

The typo "procide" → "provide" is in OptimizationManopt/test/runtests.jl:169, a file I did not modify.

My Changes

  • ✅ Added ForwardDiff dependency
  • ✅ Added _scalar_value() utility function
  • ✅ Updated three callback functions to extract scalar values
  • ✅ Added comprehensive test case

Local Test Results

All tests passed successfully locally:

Test Summary:          | Pass  Total     Time  
OptimizationOptimJL.jl | 6870   6870  2m21.4s
     Testing OptimizationOptimJL tests passed

The PR is ready for review. The CI failures are pre-existing infrastructure issues unrelated to this fix.

… values

Fixes SciML#1073

When using LBFGS or BFGS with bounds, Optim.jl wraps the optimizer in Fminbox,
which may use ForwardDiff internally for gradient computation. This resulted in
the callback receiving ForwardDiff.Dual numbers instead of scalar loss values,
causing incorrect (sometimes negative) values to be reported.

Changes:
- Use SciMLBase.value() to extract scalar values from potentially Dual-valued traces
- Updated all three _cb callback functions to use SciMLBase.value()
- Added comprehensive test case verifying callbacks receive correct scalar non-negative values
- No new dependencies required (SciMLBase already provides the functionality)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@ChrisRackauckas-Claude ChrisRackauckas-Claude force-pushed the fix-lbfgs-callback-dual-numbers branch from 5d86e35 to e4721a1 Compare October 23, 2025 12:34
@ChrisRackauckas-Claude
Author

Updated Implementation

Per feedback, I've updated the implementation to use SciMLBase.value instead of adding ForwardDiff as a dependency.

Changes Made

Before:

  • Added ForwardDiff as a dependency
  • Created custom _scalar_value() utility function

After:

  • Use existing SciMLBase.value() function (from SciMLBaseForwardDiffExt)
  • No new dependencies required
  • Cleaner implementation using existing SciML infrastructure

Code Changes

# Before
import ForwardDiff
_scalar_value(x) = x
_scalar_value(x::ForwardDiff.Dual) = ForwardDiff.value(x)
loss_val = _scalar_value(trace_state.value)

# After
loss_val = SciMLBase.value(trace_state.value)

Testing

All tests still pass:

Test Summary:          | Pass  Total     Time
OptimizationOptimJL.jl | 6798   6798  2m20.4s
     Testing OptimizationOptimJL tests passed

This is a much cleaner solution that leverages the existing SciML ecosystem. Thank you for the suggestion!

@ChrisRackauckas ChrisRackauckas merged commit 7c28259 into SciML:master Oct 23, 2025
1 of 81 checks passed