fix: use adjoint while move it #1772
Conversation
clang-tidy review says "All clean, LGTM! 👍"
if (argDiff.getExpr_dx() && arg->isXValue() &&
    argDiff.getExpr_dx()->isLValue()) {
  llvm::SmallVector<Expr*, 1> moveArg = {argDiff.getExpr_dx()};
  Expr* revSweepAdjoint = Clone(argDiff.getExpr_dx());
We will need a test for this.
Maybe we also need to change the existing test, which checks the generated code against the previous structure.
#include <clad/Differentiator/CladtorchBuiltins.h>
#include <clad/Differentiator/Differentiator.h>
using FTensor = cladtorch::Tensor<float>;
struct M {
FTensor f(const FTensor& a, const FTensor& b) const {
auto t = a + b;
return t;
}
};
float loss(const M& m, const FTensor& a, const FTensor& b, const FTensor& c) {
auto y = m.f(a, b);
auto z = y * c;
auto n = z.norm();
return n.scalar();
}
int main() {
auto grad = clad::gradient(loss, "1");
(void)grad;
return 0;
}

See this example. After clad's process:

clad::ValueAndAdjoint<FTensor, FTensor> f_reverse_forw(const FTensor &a, const FTensor &b, const M *_d_this, const FTensor &_d_a, const FTensor &_d_b) const {
::clad::ValueAndAdjoint< ::cladtorch::Tensor<float>, ::cladtorch::Tensor<float> > _t0 = clad::custom_derivatives::class_functions::operator_plus_reverse_forw(&a, b, &_d_a, _d_b);
cladtorch::Tensor<float> t = _t0.value;
cladtorch::Tensor<float> _d_t = _t0.adjoint;
::clad::ValueAndAdjoint< ::cladtorch::Tensor<float>, ::cladtorch::Tensor<float> > _t1 = clad::custom_derivatives::class_functions::constructor_reverse_forw(clad::Tag<Tensor<float> >(), std::move(t), std::move(_d_t));
return {_t1.value, _t1.adjoint};
}
void f_pullback(const FTensor &a, const FTensor &b, FTensor _d_y, M *_d_this, FTensor *_d_a, FTensor *_d_b) const {
::clad::ValueAndAdjoint< ::cladtorch::Tensor<float>, ::cladtorch::Tensor<float> > _t0 = clad::custom_derivatives::class_functions::operator_plus_reverse_forw(&a, b, _d_a, (*_d_b));
cladtorch::Tensor<float> t = _t0.value;
cladtorch::Tensor<float> _d_t = _t0.adjoint;
::clad::ValueAndAdjoint< ::cladtorch::Tensor<float>, ::cladtorch::Tensor<float> > _t1 = clad::custom_derivatives::class_functions::constructor_reverse_forw(clad::Tag<Tensor<float> >(), std::move(t), std::move(_d_t));
clad::custom_derivatives::class_functions::constructor_pullback(std::move(t), &_d_y, &_d_t);
clad::custom_derivatives::class_functions::operator_plus_pullback(&a, b, _d_t, _d_a, _d_b);
}
void loss_grad_1(const M &m, const FTensor &a, const FTensor &b, const FTensor &c, FTensor *_d_a) {
M _d_m = {};
FTensor _d_b(b);
FTensor _d_c(c);
clad::ValueAndAdjoint<FTensor, FTensor> _t0 = m.f_reverse_forw(a, b, &_d_m, (*_d_a), _d_b);
cladtorch::FTensor y = _t0.value;
cladtorch::FTensor _d_y = _t0.adjoint;
::clad::ValueAndAdjoint< ::cladtorch::Tensor<float>, ::cladtorch::Tensor<float> > _t1 = clad::custom_derivatives::class_functions::operator_star_reverse_forw(&y, c, &_d_y, _d_c);
cladtorch::Tensor<float> z = _t1.value;
cladtorch::Tensor<float> _d_z = _t1.adjoint;
::clad::ValueAndAdjoint< ::cladtorch::Tensor<float>, ::cladtorch::Tensor<float> > _t2 = clad::custom_derivatives::class_functions::norm_reverse_forw(&z, &_d_z);
cladtorch::Tensor<float> n = _t2.value;
cladtorch::Tensor<float> _d_n = _t2.adjoint;
clad::custom_derivatives::class_functions::scalar_pullback(&n, 1, &_d_n);
clad::custom_derivatives::class_functions::norm_pullback(&z, _d_n, &_d_z);
clad::custom_derivatives::class_functions::operator_star_pullback(&y, c, _d_z, &_d_y, &_d_c);
m.f_pullback(a, b, _d_y, &_d_m, _d_a, &_d_b);
}

There:

cladtorch::Tensor<float> _d_t = _t0.adjoint;
::clad::ValueAndAdjoint< ::cladtorch::Tensor<float>, ::cladtorch::Tensor<float> > _t1 = clad::custom_derivatives::class_functions::constructor_reverse_forw(clad::Tag<Tensor<float> >(), std::move(t), std::move(_d_t));
clad::custom_derivatives::class_functions::constructor_pullback(std::move(t), &_d_y, &_d_t);
clad::custom_derivatives::class_functions::operator_plus_pullback(&a, b, _d_t, _d_a, _d_b);

This is a use-after-move: t and _d_t are moved into constructor_reverse_forw and then read again by the subsequent pullback calls.
@guitargeek, I lack bandwidth, but can you take a look?
Sure! @fogsong233, could you first make sure that the CI passes? Also, does this PR address an existing GitHub issue?
In particular, there are merge conflicts that need to be resolved.
This PR changes the structure of the generated code, but the plan hasn't been finalized yet. I think we could first implement a solution, and then update the CI to match the new structure once it's properly defined.
This is a new problem and not related to any existing issue. |
Force-pushed 822b1f2 to 4713126.
Can we bisect it? Maybe somewhere there was a faulty merge.
Okay, I will try to find it, but I guess it is the commit that implemented the move-specific path in reverse mode that causes it.
Perhaps, but let's make sure that's the case.
9ed0b9e Generate constructor_reverse_forw automatically (#1625)
Ok, then we will need a fix to move forward. E.g., we need to resolve this as either part of this PR or a separate PR.
Force-pushed 4713126 to 6caf896.
I currently see two possible ways to address this issue, but neither is fully satisfactory because move construction is special.

Method 1: copy the adjoint before the move. This is the simplest and safest fix, and it avoids the immediate use-after-move problem. However, it also means we give up the new move-specific path.

Method 2: use the moved-to object in the pullback. At first glance this looks attractive, but I do not think it is generally sound. A move constructor is not just an ordinary pure function call: it may consume or transform the entire source object. In reverse mode, the current pipeline may need to touch the same construction in multiple stages (e.g. constructor_reverse_forw and constructor_pullback). For example:

struct S {
  double x;
  S(S&& o) : x(o.x * o.x) { o.x = 0; }
};

Here the destination object stores transformed state, not the original pre-move source state, so using the moved-to object for the pullback is not generally equivalent to replaying the original move from the original source. Therefore, I prefer the first method; I think it is simple and sufficient, because a move constructor should just move the data, not perform heavy behavior.
I'd opt for Method 1, plus opening a new issue to track this discussion and explain that we need a better fix.
Sure.
Force-pushed 6caf896 to 540a2e4.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
@vgvassilev It seems that the failed checks are not related to this PR. One runs the previous benchmark, which fails as expected; the other may be an environment problem.
I have currently implemented a fix with maximum compatibility by creating a copy. However, xvalues are still being used within the function; this incorrect usage of rvalues is a systemic issue throughout the entire workflow.
It appears that Clad is not correctly modeling the move operation, which prevents the proper transformation of the adjoint. This suggests a potential design conflict between the semantics of a move and the requirements of automatic differentiation.