Should the subfunction merge block apply a mask on the adjoint evaluation? #4177
Merged
connorjward merged 6 commits into master on Apr 9, 2025
dham previously requested changes on Apr 2, 2025
rckirby approved these changes on Apr 8, 2025
pbrubeck pushed a commit that referenced this pull request on Apr 10, 2025
* comment to explain how we evaluate the adjoint of the FunctionMergeBlock
This was referenced Nov 4, 2025
Background
Taking adjoints through subfunctions is hard because a subfunction has to know that it doesn't hold its own data; it just views the full function. This means that when a subfunction is used, the full function also needs to be added onto the tape as a dependency so that the correct data is used.
The pyadjoint.FloatingType base class does this with two custom blocks:

When usub = u.subfunctions[i] for component i of a full function u is added as a dependency to block block0 on the tape, what actually happens is:
1. A SubfunctionBlock is added to the tape with u as a dependency and usub as an output. This block just filters the usub component.
2. usub is then added as a dependency of block0. Now block0 implicitly depends on the value of u after filtering out everything but usub.

When usub is added as an output to block block1 on the tape, what actually happens is:
1. usub is added as an output to block block1.
2. A FunctionMergeBlock is added to the tape with usub as a dependency and u as an output. This block combines the data in usub with all data in u except that of component i.
3. block1 now implicitly has u as an output, but only for the data in component i.

This test just creates a function in the R space, and tapes 1) assigning to it from a control, 2) multiplying by 2, and 3) calculating the square. The tapes with/without using subfunctions are shown below, where you can see the extra blocks in the subfunction case. (The odd numbering is just so the code matches the tape.)
Without using subfunctions:

Using subfunctions:

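As a rough illustration of what the two blocks compute in the forward direction, here is a toy sketch in plain Python. Lists stand in for a mixed function and its components; the helper names subfunction_forward and merge_forward are hypothetical and are not the pyadjoint or Firedrake implementation.

```python
# Toy model only: a plain list stands in for the full function u, and each
# entry stands in for one component. These helpers are hypothetical and do
# not reproduce the real SubfunctionBlock / FunctionMergeBlock classes.

def subfunction_forward(u, i):
    """Forward action of the split: filter component i out of u."""
    return u[i]


def merge_forward(usub, u, i):
    """Forward action of the merge: usub in slot i, u everywhere else."""
    return [usub if j == i else uj for j, uj in enumerate(u)]


u = [1.0, 2.0, 3.0]
i = 1

usub = subfunction_forward(u, i)         # a block like block0 depends on this
new_usub = 2.0 * usub                    # a block like block1 outputs a new usub
u_after = merge_forward(new_usub, u, i)  # u now sees the new component i

print(u_after)  # [1.0, 4.0, 3.0]
```

Only component i changes after the merge; every other component of u passes through untouched.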
Problem
This test fails if using subfunctions!
We want the adj_value of the i-th component to depend only on what happens along the subfunction branch, and the adj_value of all other components to be whatever they were before the subfunctions split/merge.
Instead, the adjoint of FunctionMergeBlock was (correctly) outputting the i-th component of the adj_value along the subfunction branch of the tape, but (incorrectly) outputting all components of the adj_value along the full-function branch of the tape.
When these two branches recombined, the adj_value for the i-th component then had contributions from both branches.
Solution
The merge out[i] = usub; out[not i] = u is a mask. The adjoint of a mask is also a mask, so FunctionMergeBlock now applies a mask before returning the adjoint outputs: output0 = adj_input[i]; output1 = adj_input[not i]
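
The effect of the mask can be checked by hand on a toy problem. The sketch below is plain Python with hypothetical helper names (merge_adjoint, gradient) and a made-up functional J = sum(out[j]**2); it is not the test from this PR and not the pyadjoint code. It splits off component i, doubles it, merges it back, and then propagates adjoints with and without the mask.

```python
# Toy model only: lists stand in for functions. The helper names and the
# functional J are assumptions made for illustration.

def merge_adjoint(adj_out, i, mask=True):
    """Adjoint of out[i] = usub; out[not i] = u.

    Returns (adjoint for usub, adjoint for u). With mask=True the i-th
    entry of the adjoint for u is zeroed, mirroring the forward mask.
    """
    adj_usub = adj_out[i]
    if mask:
        adj_u = [0.0 if j == i else a for j, a in enumerate(adj_out)]
    else:
        adj_u = list(adj_out)  # old behaviour: component i leaks through
    return adj_usub, adj_u


def gradient(u, i, mask):
    # Forward: split off component i, double it, merge it back.
    usub = u[i]
    new_usub = 2.0 * usub
    out = [new_usub if j == i else uj for j, uj in enumerate(u)]

    # Reverse, for J = sum(out[j]**2).
    adj_out = [2.0 * o for o in out]
    adj_new_usub, adj_u = merge_adjoint(adj_out, i, mask)
    adj_usub = 2.0 * adj_new_usub  # adjoint of the doubling
    adj_u[i] += adj_usub           # adjoint of the split: scatter into slot i
    return adj_u


u = [1.0, 2.0, 3.0]
print(gradient(u, 1, mask=True))   # [2.0, 16.0, 6.0]
print(gradient(u, 1, mask=False))  # [2.0, 24.0, 6.0]
```

By hand, dJ/du[1] = 2 * (2 * u[1]) * 2 = 16, which matches the masked result. Without the mask, component 1 picks up an extra 8 from the full-function branch, which is the double counting described under Problem.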