
Commit 4ce84c2

Rework API for AD testing
1 parent 1882f72 commit 4ce84c2

4 files changed: +108 −46 lines

HISTORY.md

Lines changed: 6 additions & 0 deletions

@@ -8,6 +8,12 @@
 
 The `@submodel` macro is fully removed; please use `to_submodel` instead.
 
+### `DynamicPPL.TestUtils.AD.run_ad`
+
+The three keyword arguments `test`, `reference_adtype`, and `expected_value_and_grad` have been merged into a single `test` keyword argument.
+Please see the API documentation for more details.
+(The old `test=true` and `test=false` values are still valid; you only need to adjust the invocation if you were explicitly passing the `reference_adtype` or `expected_value_and_grad` arguments.)
+
 ### Accumulators
 
 This release overhauls how VarInfo objects track variables such as the log joint probability. The new approach is to use what we call accumulators: objects that the VarInfo carries on it that may change their state at each `tilde_assume!!` and `tilde_observe!!` call based on the value of the variable in question. They replace both variables that were previously hard-coded in the `VarInfo` object (`logp` and `num_produce`) and some contexts. This brings with it a number of breaking changes:
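To illustrate the migration described above: a minimal sketch, assuming a hypothetical model `demo_model` and previously computed `logp`/`grad` values (none of these names come from the commit itself):

```julia
using DynamicPPL.TestUtils.AD: run_ad, WithBackend, WithExpectedResult
using ADTypes: AutoForwardDiff, AutoReverseDiff

# Before: the ground truth was specified via separate keyword arguments, e.g.
#   run_ad(demo_model, AutoReverseDiff(); expected_value_and_grad=(logp, grad))
#   run_ad(demo_model, AutoReverseDiff(); reference_adtype=AutoForwardDiff())

# After: both use cases go through the single `test` keyword argument.
run_ad(demo_model, AutoReverseDiff(); test=WithExpectedResult(logp, grad))
run_ad(demo_model, AutoReverseDiff(); test=WithBackend(AutoForwardDiff()))
```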

docs/src/api.md

Lines changed: 15 additions & 0 deletions

@@ -211,6 +211,21 @@ To test and/or benchmark the performance of an AD backend on a model, DynamicPPL
 
 ```@docs
 DynamicPPL.TestUtils.AD.run_ad
+```
+
+The default test setting is to compare against ForwardDiff.
+You can have more fine-grained control over how to test the AD backend using the following types:
+
+```@docs
+DynamicPPL.TestUtils.AD.AbstractADCorrectnessTestSetting
+DynamicPPL.TestUtils.AD.WithBackend
+DynamicPPL.TestUtils.AD.WithExpectedResult
+DynamicPPL.TestUtils.AD.NoTest
+```
+
+These are returned / thrown by the `run_ad` function:
+
+```@docs
 DynamicPPL.TestUtils.AD.ADResult
 DynamicPPL.TestUtils.AD.ADIncorrectException
 ```
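As a rough usage sketch of the settings documented above (`demo_model` is a placeholder for any model defined with `@model`):

```julia
using DynamicPPL.TestUtils.AD: run_ad, WithBackend, NoTest
using ADTypes: AutoForwardDiff, AutoReverseDiff

# Check ReverseDiff against a ForwardDiff reference (this is also the default).
run_ad(demo_model, AutoReverseDiff(); test=WithBackend(AutoForwardDiff()))

# Skip the correctness check entirely, e.g. when only benchmarking.
run_ad(demo_model, AutoReverseDiff(); test=NoTest(), benchmark=true)
```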

src/test_utils/ad.jl

Lines changed: 81 additions & 42 deletions

@@ -4,14 +4,7 @@ using ADTypes: AbstractADType, AutoForwardDiff
 using Chairmarks: @be
 import DifferentiationInterface as DI
 using DocStringExtensions
-using DynamicPPL:
-    Model,
-    LogDensityFunction,
-    VarInfo,
-    AbstractVarInfo,
-    link,
-    DefaultContext,
-    AbstractContext
+using DynamicPPL: Model, LogDensityFunction, VarInfo, AbstractVarInfo, link
 using LogDensityProblems: logdensity, logdensity_and_gradient
 using Random: Random, Xoshiro
 using Statistics: median
@@ -20,12 +13,48 @@ using Test: @test
 export ADResult, run_ad, ADIncorrectException
 
 """
-    REFERENCE_ADTYPE
+    AbstractADCorrectnessTestSetting
 
-Reference AD backend to use for comparison. In this case, ForwardDiff.jl, since
-it's the default AD backend used in Turing.jl.
+Different ways of testing the correctness of an AD backend.
 """
-const REFERENCE_ADTYPE = AutoForwardDiff()
+abstract type AbstractADCorrectnessTestSetting end
+
+"""
+    WithBackend(adtype::AbstractADType=AutoForwardDiff()) <: AbstractADCorrectnessTestSetting
+
+Test correctness by comparing it against the result obtained with `adtype`.
+
+`adtype` defaults to ForwardDiff.jl, since it's the default AD backend used in
+Turing.jl.
+"""
+struct WithBackend{AD<:AbstractADType} <: AbstractADCorrectnessTestSetting
+    adtype::AD
+end
+WithBackend() = WithBackend(AutoForwardDiff())
+
+"""
+    WithExpectedResult(
+        value::T,
+        grad::AbstractVector{T}
+    ) where {T <: AbstractFloat}
+    <: AbstractADCorrectnessTestSetting
+
+Test correctness by comparing it against a known result (e.g. one obtained
+analytically, or one obtained with a different backend previously). Both the
+value of the primal (i.e. the log-density) as well as its gradient must be
+supplied.
+"""
+struct WithExpectedResult{T<:AbstractFloat} <: AbstractADCorrectnessTestSetting
+    value::T
+    grad::AbstractVector{T}
+end
+
+"""
+    NoTest() <: AbstractADCorrectnessTestSetting
+
+Disable correctness testing.
+"""
+struct NoTest <: AbstractADCorrectnessTestSetting end
 
 """
     ADIncorrectException{T<:AbstractFloat}
@@ -84,14 +113,12 @@ end
     run_ad(
         model::Model,
         adtype::ADTypes.AbstractADType;
-        test=true,
+        test::Union{AbstractADCorrectnessTestSetting,Bool}=WithBackend(),
         benchmark=false,
         value_atol=1e-6,
         grad_atol=1e-6,
         varinfo::AbstractVarInfo=link(VarInfo(model), model),
         params::Union{Nothing,Vector{<:AbstractFloat}}=nothing,
-        reference_adtype::ADTypes.AbstractADType=REFERENCE_ADTYPE,
-        expected_value_and_grad::Union{Nothing,Tuple{AbstractFloat,Vector{<:AbstractFloat}}}=nothing,
         verbose=true,
     )::ADResult
@@ -143,22 +170,25 @@ Everything else is optional, and can be categorised into several groups:
    prep_params)`. You could then evaluate the gradient at a different set of
    parameters using the `params` keyword argument.
 
-3. _How to specify the results to compare against._ (Only if `test=true`.)
+3. _How to specify the results to compare against._
 
-   Once logp and its gradient have been calculated with the specified `adtype`,
-   they must be tested for correctness.
+   Once logp and its gradient have been calculated with the specified `adtype`,
+   they can optionally be tested for correctness. The exact way this is tested
+   is specified in the `test` parameter.
 
-   This can be done either by specifying `reference_adtype`, in which case logp
-   and its gradient will also be calculated with this reference in order to
-   obtain the ground truth; or by using `expected_value_and_grad`, which is a
-   tuple of `(logp, gradient)` that the calculated values must match. The
-   latter is useful if you are testing multiple AD backends and want to avoid
-   recalculating the ground truth multiple times.
+   There are several options for this:
 
-   The default reference backend is ForwardDiff. If none of these parameters are
-   specified, ForwardDiff will be used to calculate the ground truth.
+   - You can explicitly specify the correct value using
+     [`WithExpectedResult()`](@ref).
+   - You can compare against the result obtained with a different AD backend
+     using [`WithBackend(adtype)`](@ref).
+   - You can disable testing by passing [`NoTest()`](@ref).
+   - The default is to compare against the result obtained with ForwardDiff,
+     i.e. `WithBackend(AutoForwardDiff())`.
+   - `test=false` and `test=true` are synonyms for
+     `NoTest()` and `WithBackend(AutoForwardDiff())`, respectively.
 
-4. _How to specify the tolerances._ (Only if `test=true`.)
+4. _How to specify the tolerances._ (Only if testing is enabled.)
 
    The tolerances for the value and gradient can be set using `value_atol` and
    `grad_atol`. These default to 1e-6.
@@ -180,48 +210,57 @@ thrown as-is.
 function run_ad(
     model::Model,
     adtype::AbstractADType;
-    test::Bool=true,
+    test::Union{AbstractADCorrectnessTestSetting,Bool}=WithBackend(),
     benchmark::Bool=false,
     value_atol::AbstractFloat=1e-6,
     grad_atol::AbstractFloat=1e-6,
     varinfo::AbstractVarInfo=link(VarInfo(model), model),
     params::Union{Nothing,Vector{<:AbstractFloat}}=nothing,
-    reference_adtype::AbstractADType=REFERENCE_ADTYPE,
-    expected_value_and_grad::Union{Nothing,Tuple{AbstractFloat,Vector{<:AbstractFloat}}}=nothing,
     verbose=true,
 )::ADResult
+    # Convert Boolean `test` to an AbstractADCorrectnessTestSetting
+    if test isa Bool
+        test = test ? WithBackend() : NoTest()
+    end
+
+    # Extract parameters
     if isnothing(params)
         params = varinfo[:]
     end
     params = map(identity, params) # Concretise
 
+    # Calculate log-density and gradient with the backend of interest
     verbose && @info "Running AD on $(model.f) with $(adtype)\n"
     verbose && println(" params : $(params)")
     ldf = LogDensityFunction(model, varinfo; adtype=adtype)
-
     value, grad = logdensity_and_gradient(ldf, params)
+    # collect(): https://github.com/JuliaDiff/DifferentiationInterface.jl/issues/754
    grad = collect(grad)
     verbose && println(" actual : $((value, grad))")
 
-    if test
-        # Calculate ground truth to compare against
-        value_true, grad_true = if expected_value_and_grad === nothing
-            ldf_reference = LogDensityFunction(model, varinfo; adtype=reference_adtype)
-            logdensity_and_gradient(ldf_reference, params)
-        else
-            expected_value_and_grad
+    # Test correctness
+    if test isa NoTest
+        value_true = nothing
+        grad_true = nothing
+    else
+        # Get the correct result
+        if test isa WithExpectedResult
+            value_true = test.value
+            grad_true = test.grad
+        elseif test isa WithBackend
+            ldf_reference = LogDensityFunction(model, varinfo; adtype=test.adtype)
+            value_true, grad_true = logdensity_and_gradient(ldf_reference, params)
+            # collect(): https://github.com/JuliaDiff/DifferentiationInterface.jl/issues/754
+            grad_true = collect(grad_true)
        end
+        # Perform testing
         verbose && println(" expected : $((value_true, grad_true))")
-        grad_true = collect(grad_true)
-
         exc() = throw(ADIncorrectException(value, value_true, grad, grad_true))
         isapprox(value, value_true; atol=value_atol) || exc()
         isapprox(grad, grad_true; atol=grad_atol) || exc()
-    else
-        value_true = nothing
-        grad_true = nothing
     end
 
+    # Benchmark
     time_vs_primal = if benchmark
         primal_benchmark = @be (ldf, params) logdensity(_[1], _[2])
         grad_benchmark = @be (ldf, params) logdensity_and_gradient(_[1], _[2])
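The Boolean-to-setting conversion at the top of `run_ad` keeps existing call sites working; a sketch of what the synonyms resolve to (`demo_model` is a hypothetical placeholder):

```julia
using DynamicPPL.TestUtils.AD: run_ad
using ADTypes: AutoReverseDiff

# `test=true` is rewritten internally to `WithBackend()`, i.e. a comparison
# against ForwardDiff, while `test=false` becomes `NoTest()`.
run_ad(demo_model, AutoReverseDiff(); test=true)
run_ad(demo_model, AutoReverseDiff(); test=false)
```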

test/ad.jl

Lines changed: 6 additions & 4 deletions

@@ -1,4 +1,5 @@
 using DynamicPPL: LogDensityFunction
+using DynamicPPL.TestUtils.AD: run_ad, WithExpectedResult, NoTest
 
 @testset "Automatic differentiation" begin
     # Used as the ground truth that others are compared against.
@@ -31,9 +32,10 @@ using DynamicPPL: LogDensityFunction
         linked_varinfo = DynamicPPL.link(varinfo, m)
         f = LogDensityFunction(m, linked_varinfo)
         x = DynamicPPL.getparams(f)
+
         # Calculate reference logp + gradient of logp using ForwardDiff
-        ref_ldf = LogDensityFunction(m, linked_varinfo; adtype=ref_adtype)
-        ref_logp, ref_grad = LogDensityProblems.logdensity_and_gradient(ref_ldf, x)
+        ref_ad_result = run_ad(m, ref_adtype; varinfo=linked_varinfo, test=NoTest())
+        ref_logp, ref_grad = ref_ad_result.value_actual, ref_ad_result.grad_actual
 
         @testset "$adtype" for adtype in test_adtypes
             @info "Testing AD on: $(m.f) - $(short_varinfo_name(linked_varinfo)) - $adtype"
@@ -63,11 +65,11 @@ using DynamicPPL: LogDensityFunction
                     ref_ldf, adtype
                 )
             else
-                @test DynamicPPL.TestUtils.AD.run_ad(
+                @test run_ad(
                     m,
                     adtype;
                     varinfo=linked_varinfo,
-                    expected_value_and_grad=(ref_logp, ref_grad),
+                    test=WithExpectedResult(ref_logp, ref_grad),
                 ) isa Any
             end
         end
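The updated test shows the intended pattern for avoiding repeated ground-truth computation: run the reference backend once with `NoTest()`, then reuse its `value_actual` and `grad_actual` fields. Distilled into a sketch (with a hypothetical `demo_model`):

```julia
using DynamicPPL.TestUtils.AD: run_ad, WithExpectedResult, NoTest
using ADTypes: AutoForwardDiff, AutoReverseDiff

# Compute the ForwardDiff reference once, without testing it against anything.
ref = run_ad(demo_model, AutoForwardDiff(); test=NoTest())

# Reuse the stored value and gradient when testing another backend.
run_ad(demo_model, AutoReverseDiff(); test=WithExpectedResult(ref.value_actual, ref.grad_actual))
```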
