
fix: use adjoint while move it#1772

Open
fogsong233 wants to merge 6 commits into vgvassilev:master from fogsong233:fix-xvalue-reverse

Conversation

@fogsong233
Contributor

I have currently implemented a fix with maximum compatibility by creating a copy. However, xvalues are still being used within the function; this incorrect usage of rvalues is a systemic issue throughout the entire workflow.

It appears that Clad is not correctly modeling the move operation, which prevents the proper transformation of the adjoint. This suggests a potential design conflict between the semantics of a move and the requirements of automatic differentiation.

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

```cpp
if (argDiff.getExpr_dx() && arg->isXValue() &&
    argDiff.getExpr_dx()->isLValue()) {
  llvm::SmallVector<Expr*, 1> moveArg = {argDiff.getExpr_dx()};
  Expr* revSweepAdjoint = Clone(argDiff.getExpr_dx());
```
Owner


We will need a test for this.

Contributor Author


Maybe we also need to change the existing tests, which check the generated code as it was before.

@fogsong233
Contributor Author

```cpp
#include <clad/Differentiator/CladtorchBuiltins.h>
#include <clad/Differentiator/Differentiator.h>

using FTensor = cladtorch::Tensor<float>;

struct M {
  FTensor f(const FTensor& a, const FTensor& b) const {
    auto t = a + b;
    return t;
  }
};

float loss(const M& m, const FTensor& a, const FTensor& b, const FTensor& c) {
  auto y = m.f(a, b);
  auto z = y * c;
  auto n = z.norm();
  return n.scalar();
}

int main() {
  auto grad = clad::gradient(loss, "1");
  (void)grad;
  return 0;
}
```

See this example; after Clad's processing, it generates:

```cpp
clad::ValueAndAdjoint<FTensor, FTensor> f_reverse_forw(const FTensor &a, const FTensor &b, const M *_d_this, const FTensor &_d_a, const FTensor &_d_b) const {
    ::clad::ValueAndAdjoint< ::cladtorch::Tensor<float>, ::cladtorch::Tensor<float> > _t0 = clad::custom_derivatives::class_functions::operator_plus_reverse_forw(&a, b, &_d_a, _d_b);
    cladtorch::Tensor<float> t = _t0.value;
    cladtorch::Tensor<float> _d_t = _t0.adjoint;
    ::clad::ValueAndAdjoint< ::cladtorch::Tensor<float>, ::cladtorch::Tensor<float> > _t1 = clad::custom_derivatives::class_functions::constructor_reverse_forw(clad::Tag<Tensor<float> >(), std::move(t), std::move(_d_t));
    return {_t1.value, _t1.adjoint};
}
void f_pullback(const FTensor &a, const FTensor &b, FTensor _d_y, M *_d_this, FTensor *_d_a, FTensor *_d_b) const {
    ::clad::ValueAndAdjoint< ::cladtorch::Tensor<float>, ::cladtorch::Tensor<float> > _t0 = clad::custom_derivatives::class_functions::operator_plus_reverse_forw(&a, b, _d_a, (*_d_b));
    cladtorch::Tensor<float> t = _t0.value;
    cladtorch::Tensor<float> _d_t = _t0.adjoint;
    ::clad::ValueAndAdjoint< ::cladtorch::Tensor<float>, ::cladtorch::Tensor<float> > _t1 = clad::custom_derivatives::class_functions::constructor_reverse_forw(clad::Tag<Tensor<float> >(), std::move(t), std::move(_d_t));
    clad::custom_derivatives::class_functions::constructor_pullback(std::move(t), &_d_y, &_d_t);
    clad::custom_derivatives::class_functions::operator_plus_pullback(&a, b, _d_t, _d_a, _d_b);
}
void loss_grad_1(const M &m, const FTensor &a, const FTensor &b, const FTensor &c, FTensor *_d_a) {
    M _d_m = {};
    FTensor _d_b(b);
    FTensor _d_c(c);
    clad::ValueAndAdjoint<FTensor, FTensor> _t0 = m.f_reverse_forw(a, b, &_d_m, (*_d_a), _d_b);
    cladtorch::FTensor y = _t0.value;
    cladtorch::FTensor _d_y = _t0.adjoint;
    ::clad::ValueAndAdjoint< ::cladtorch::Tensor<float>, ::cladtorch::Tensor<float> > _t1 = clad::custom_derivatives::class_functions::operator_star_reverse_forw(&y, c, &_d_y, _d_c);
    cladtorch::Tensor<float> z = _t1.value;
    cladtorch::Tensor<float> _d_z = _t1.adjoint;
    ::clad::ValueAndAdjoint< ::cladtorch::Tensor<float>, ::cladtorch::Tensor<float> > _t2 = clad::custom_derivatives::class_functions::norm_reverse_forw(&z, &_d_z);
    cladtorch::Tensor<float> n = _t2.value;
    cladtorch::Tensor<float> _d_n = _t2.adjoint;
    clad::custom_derivatives::class_functions::scalar_pullback(&n, 1, &_d_n);
    clad::custom_derivatives::class_functions::norm_pullback(&z, _d_n, &_d_z);
    clad::custom_derivatives::class_functions::operator_star_pullback(&y, c, _d_z, &_d_y, &_d_c);
    m.f_pullback(a, b, _d_y, &_d_m, _d_a, &_d_b);
}
```

There:

```cpp
    cladtorch::Tensor<float> _d_t = _t0.adjoint;
    ::clad::ValueAndAdjoint< ::cladtorch::Tensor<float>, ::cladtorch::Tensor<float> > _t1 = clad::custom_derivatives::class_functions::constructor_reverse_forw(clad::Tag<Tensor<float> >(), std::move(t), std::move(_d_t));
    clad::custom_derivatives::class_functions::constructor_pullback(std::move(t), &_d_y, &_d_t);
    clad::custom_derivatives::class_functions::operator_plus_pullback(&a, b, _d_t, _d_a, _d_b);
```

`t` and `_d_t` are moved into `constructor_reverse_forw` and then used again in the subsequent pullback calls: a use-after-move.

@vgvassilev vgvassilev requested a review from guitargeek March 17, 2026 07:28
@vgvassilev
Owner

@guitargeek, I lack bandwidth but can you take a look?

@guitargeek
Collaborator

Sure! @fogsong233, could you first make sure that the CI passes? Also, does this PR address an existing GitHub issue?

@guitargeek
Collaborator

In particular, there are merge conflicts that need to be resolved.

Collaborator

@guitargeek left a comment


Needs rebase

@fogsong233
Contributor Author

fogsong233 commented Mar 17, 2026

> Sure! @fogsong233, could you first make sure that the CI passes? Also, does this PR address an existing GitHub issue?

This PR changes the structure of the generated code, but the plan hasn’t been finalized yet. I think we could first implement a solution, and then update the CI to match the new structure once it’s properly defined.
Do you have any ideas on how to fix it? I'm not very familiar with the reverse-mode visitor pipeline.

@fogsong233
Contributor Author

> Sure! @fogsong233, could you first make sure that the CI passes? Also, does this PR address an existing GitHub issue?

This is a new problem and not related to any existing issue.

@vgvassilev
Owner

> This is a new problem and not related to any existing issue.

Can we bisect it? Maybe somewhere there was a faulty merge.


@fogsong233
Contributor Author

> Can we bisect it? Maybe somewhere there was a faulty merge.

Okay, I will try to find it, but I guess it is the commit that implemented the move-specific path in reverse mode that causes it.

@vgvassilev
Owner

> Okay, I will try to find it, but I guess it is the commit that implemented the move-specific path in reverse mode that causes it.

Perhaps, but let's make sure that's the case.

@fogsong233
Contributor Author

> Can we bisect it? Maybe somewhere there was a faulty merge.

The bisect points to 9ed0b9e ("Generate constructor_reverse_forw automatically").
It seems that #1625 is the PR that makes this happen.

@vgvassilev
Owner

Ok, then we will need a fix to move forward, e.g. we need to resolve this either as part of this PR or in a separate PR.


@fogsong233
Contributor Author

fogsong233 commented Mar 26, 2026

> Ok, then we will need a fix to move forward, e.g. we need to resolve this either as part of this PR or in a separate PR.

I currently see two possible ways to address this issue, but neither is fully satisfactory because move construction is special.

Method 1:
Fall back to the pre-PR behavior for copy/move constructors, i.e. do not build or use constructor_reverse_forw at the call site.

This is the simplest and safest fix, and it avoids the immediate use-after-move problem. However, it also means we give up the new constructor_reverse_forw path for copy/move constructors, so any user-defined special behavior encoded there would no longer be used in this case.

Method 2:
Try to continue from the result produced by the first move, instead of reusing the original source object for the later differentiation steps.

At first glance this looks attractive, but I do not think it is generally sound. A move constructor is not just an ordinary pure function call: it may consume or transform the entire source object. In reverse mode, the current pipeline may need to touch the same construction in multiple stages (constructor_reverse_forw, forward reconstruction, and constructor_pullback). If we reuse the moved-to object as if it were the original source, this may be incorrect for user-defined move constructors.

For example:

```cpp
struct S {
  double x;
  S(S&& o) : x(o.x * o.x) { o.x = 0; }
};
```

Here the destination object stores transformed state, not the original pre-move source state. So using the moved-to object for pullback is not generally equivalent to replaying the original move from the original source.

Therefore, I prefer the first method; I think it is simple and sufficient. Since a move constructor should just move the data rather than perform heavy behavior, I think constructor_reverse_forw is rarely useful for it anyway.

@vgvassilev
Owner

> I currently see two possible ways to address this issue, but neither is fully satisfactory because move construction is special. […] Therefore, I prefer the first method […]

I'd opt for Method 1 + opening a new issue to track this discussion and explain that we need a better fix.

@fogsong233
Contributor Author

> I'd opt for Method 1 + opening a new issue to track this discussion and explain that we need a better fix.

Sure.




@codecov

codecov bot commented Apr 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.





@fogsong233
Contributor Author

@vgvassilev It seems that the failed checks are not related to this PR. One reruns the previous benchmark, which fails as expected; the other may be an environment problem.
