Skip to content

Conversation

@cptspacemanspiff
Copy link
Contributor

@cptspacemanspiff cptspacemanspiff commented Jan 30, 2025

Issue:

When I have been compiling executorch with LTO enabled (linktime-optimization) I ran into warnings saying that there was an ODR violation. ODR violations scare me, so looking at it, it seems to be caused by a const mismatch between the eigen declaration of the BLAS operation and the CPUBlas.cpp forward declaration.

In kernals/optimized/blas/CPUBlass.cpp

#ifdef ET_BUILD_WITH_BLAS
#ifdef ET_BUILD_FOR_APPLE
#include <Accelerate/Accelerate.h>
#else
// clang-format off
extern "C" void dgemm_(char *transa, char *transb, int *m, int *n, int *k, double *alpha, const double *a, int *lda, const double *b, int *ldb, double *beta, double *c, int *ldc);
extern "C" void sgemm_(char *transa, char *transb, int *m, int *n, int *k, float *alpha, const float *a, int *lda, const float *b, int *ldb, float *beta, float *c, int *ldc);
// clang-format on
#endif // ET_BUILD_FOR_APPLE
#endif // ET_BUILD_WITH_BLAS

In kernels/optimized/third-party/eigen/blas/level3_impl.h (they have a macro that generates the s/d precisions.)

EIGEN_BLAS_FUNC(gemm)
(const char *opa, const char *opb, const int *m, const int *n, const int *k, const RealScalar *palpha,
 const RealScalar *pa, const int *lda, const RealScalar *pb, const int *ldb, const RealScalar *pbeta, RealScalar *pc,
 const int *ldc)

Since the forward declaration is inside a cpp compilation unit, this error usually does not show, but LTO allows the linker to introspect across compilation units for optimizations.

It seems like it was probably a typo/manual copy paste error, to my understanding BLAS apis are pretty standardized, and it does not make sense to have your BLAS operator move your input pointers.

It also probably has not broken things (aside from possibly lost optimization opportunity)?

That being said ODR violations and undefined behavior in general scare me.

System:

FWIW, This is on ubuntu-22.04, x86-64 w/ zen3 cpu, but is probably there on all non-apple machines.

@pytorch-bot
Copy link

pytorch-bot bot commented Jan 30, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8049

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 94decbc with merge base 21ec3a1 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 30, 2025
@manuelcandales manuelcandales added the release notes: ops & kernels Changes to the opset and any new / changed kernel implementations label Jan 30, 2025
@manuelcandales
Copy link
Contributor

cc @swolchok

@manuelcandales manuelcandales added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jan 30, 2025
// clang-format off
extern "C" void dgemm_(char *transa, char *transb, int *m, int *n, int *k, double *alpha, const double *a, int *lda, const double *b, int *ldb, double *beta, double *c, int *ldc);
extern "C" void sgemm_(char *transa, char *transb, int *m, int *n, int *k, float *alpha, const float *a, int *lda, const float *b, int *ldb, float *beta, float *c, int *ldc);
extern "C" void dgemm_(const char *transa, const char *transb, const int *m, const int *n, const int *k, const double *alpha, const double *a, const int *lda, const double *b, const int *ldb, const double *beta, double *c, const int *ldc);
Copy link
Contributor

@swolchok swolchok Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the existing code matches PyTorch core (https://github.com/pytorch/pytorch/blob/9f9904172d5b94e6cd06d915e15721d6e8868b7a/aten/src/ATen/native/CPUBlas.cpp#L19), so I'm hesitant to change this without an explanation of why it is OK in PyTorch core (which supports Eigen as the BLAS implementation) but not here. In particular, C linkage should mean that the name of the function is the only thing that matters. can you include the specific build command you are running and error you are seeing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error is happening at linking, only occurs with LTO (as in building everything with LTO enabled)

&& /usr/bin/c++ -flto=auto -fno-fat-lto-objects -Wl,--whole-archive /home/nlong/execu-tools/cpp/build/stage/lib/libexecutorch.a -Wl,--no-whole-archive -Wl,--whole-archive /home/nlong/execu-tools/cpp/build/stage/lib/liboptimized_native_cpu_ops_lib.a -Wl,--no-whole-archive test/CMakeFiles/ExecuToolsEncoderDecoder.dir/manual_encoder_decoder_runner.cpp.o -o stage/bin/ExecuToolsEncoderDecoder  stage/lib/libExecuTools.a  stage/lib/liboptimized_native_cpu_ops_lib.a  stage/lib/liboptimized_kernels.a  stage/lib/libportable_kernels.a  stage/lib/libextension_module_static.a  stage/lib/libextension_data_loader.a  stage/lib/libextension_tensor.a  stage/lib/libetdump.a  _deps/executorch/third-party/flatcc/lib/libflatccrt.a  stage/lib/libtokenizers_cpp.a  _deps/tokenizers-cpp-build/release/libtokenizers_c.a  stage/lib/libsentencepiece.a  stage/lib/libcpublas.a  stage/lib/libeigen_blas.a  stage/lib/libextension_threadpool.a  stage/lib/libcpuinfo.a  stage/lib/libpthreadpool.a  stage/lib/libexecutorch.a  stage/lib/libexecutorch_core.a  -ldl && :
/home/nlong/execu-tools/cpp/build/_deps/executorch/kernels/optimized/blas/CPUBlas.cpp:19:17: warning: ‘sgemm_’ violates the C++ One Definition Rule [-Wodr]
   19 | extern "C" void sgemm_(char *transa, char *transb, int *m, int *n, int *k, float *alpha, const float *a, int *lda, const float *b, int *ldb, float *beta, float *c, int *ldc);
      |                 ^
/home/nlong/execu-tools/cpp/build/_deps/executorch/kernels/optimized/third-party/eigen/blas/level3_impl.h:12:1: note: type mismatch in parameter 1
   12 | EIGEN_BLAS_FUNC(gemm)
      | ^
/home/nlong/execu-tools/cpp/build/_deps/executorch/kernels/optimized/third-party/eigen/blas/level3_impl.h:12:1: note: type ‘const char’ should match type ‘char’
/home/nlong/execu-tools/cpp/build/_deps/executorch/kernels/optimized/third-party/eigen/blas/level3_impl.h:12:1: note: ‘sgemm_’ was previously declared here
/home/nlong/execu-tools/cpp/build/_deps/executorch/kernels/optimized/blas/CPUBlas.cpp:18:17: warning: ‘dgemm_’ violates the C++ One Definition Rule [-Wodr]
   18 | extern "C" void dgemm_(char *transa, char *transb, int *m, int *n, int *k, double *alpha, const double *a, int *lda, const double *b, int *ldb, double *beta, double *c, int *ldc);
      |                 ^
/home/nlong/execu-tools/cpp/build/_deps/executorch/kernels/optimized/third-party/eigen/blas/level3_impl.h:12:1: note: type mismatch in parameter 1
   12 | EIGEN_BLAS_FUNC(gemm)
      | ^
/home/nlong/execu-tools/cpp/build/_deps/executorch/kernels/optimized/third-party/eigen/blas/level3_impl.h:12:1: note: type ‘const char’ should match type ‘char’
/home/nlong/execu-tools/cpp/build/_deps/executorch/kernels/optimized/third-party/eigen/blas/level3_impl.h:12:1: note: ‘dgemm_’ was previously declared here

This is the command to clean build executorch off of the main branch w/ LTO enabled, which should reproduce the error:

(from the executorch root)

mkdir cmake_out_lto_build && cd cmake_out_lto_build 
cmake .. -DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON -DCMAKE_POLICY_DEFAULT_CMP0069=NEW -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON
make -j 12

In terms of C linkage not exposing this:

(Note: this is to my best understanding...)

This only shows up because of link time optimization, which rather than compiling each static library/compilation unit to bytecode, it compiles them to an IR representation (Clang uses llvm IR, I am not sure what gcc uses). These are all bundled together during link time and an additional compilation pass is ran there, and then the bytecode is actually produced.

This allows for whole program optimization across compilation units, for instance if one of our gemm operators just did a no-op, LTO would optimize the whole call out.

This is why during link time this ODR issue shows up, the linker is looking at the IR representation of the whole program and saying wait these two definitions don't match.

This would not show up if eigen was not compiled with LTO, and was just a regular static library, the linker would fall back to the regular C linkage, and have no idea about the internals.

random thoughts

I cannot comment about pytorch, but I guess I am assuming that they don't do whole program builds with LTO enabled, in which case this would have never appeared? LTO is newish CMAKE added support for non-intel compilers in ~2017?

But depending on how the build is setup, to get LTO working, everything needs to be a static lib, and you need to be able to control the build flags of all artifacts. It's much easier to setup if you're building in tree.

I doubt that this particular ODR violation breaks things, because we are essentially calling with const parameters through a non-const wrapper. But even without the LTO optimization pass, this is probably non-ideal, because somewhere the compiler has to check did these pointers move, or more likely they just get dropped. IDK.

sidenote:

I am kind of curious if enabling LTO in one of the examples affects performance...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional:

Looks like pytorch people are looking at enabling LTO (for binary size savings mainly it seems)?pytorch/pytorch#93955

My guess is that they have not run into this yet, but it is probably a bug on their end too (if my understanding that the gemm operators are a standard API).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two definitions don't match

Where is the second definition? The thing on line 19 of CPUBlas.cpp is a declaration, not a definition.

What compiler version is this? I see that it's only a warning (-Wodr), and perhaps the warning is just wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What compiler version

No repro on Mac with Xcode 16.2, FWIW.

Copy link
Contributor Author

@cptspacemanspiff cptspacemanspiff Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the second definition? The thing on line 19 of CPUBlas.cpp is a declaration, not a definition.

Sorry, that should have been declaration

No repro on Mac with Xcode 16.2, FWIW.

It should not happen on a mac, because they use the accelerate library instead which seems to provide the declaration headers (they use const in theirs FWIW)

I was using gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04), but will try with clang:

I just rebuilt with clang-18, and it is not there, so maybe it is just a compiler warning issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, that should have been declaration

my point here was that you can't break the One Definition Rule with a declaration, hence the suspicion of the compiler

@shoumikhin
Copy link
Contributor

@cptspacemanspiff any updates?
@swolchok how do we want to proceed with this change?

@cptspacemanspiff
Copy link
Contributor Author

So, it is the same error in the gcc-14, an odr warning, because of mismatch between declarations.

Clang does not have the warning because the clang IR representation for LTO does not contain the full function type signature of the declarations to compare. so clang does not have the ability to catch this mismatch (according to a stack overflow article I read somewhere a month ago.)

My view:
All this being said, it does not cause issues, and apparently mirrors pytorch, so if they get annoyed with the warning message if/when enabling LTO builds, probably just fix the issue then.

I am assuming that it was a typo/forgot the const because the gemm function def is pretty well standardized, and since the only way you would notice is if you enable LTO, in gcc, and look at the warnings.

@byjlw
Copy link
Contributor

byjlw commented Mar 31, 2025

So, it is the same error in the gcc-14, an odr warning, because of mismatch between declarations.

Clang does not have the warning because the clang IR representation for LTO does not contain the full function type signature of the declarations to compare. so clang does not have the ability to catch this mismatch (according to a stack overflow article I read somewhere a month ago.)

My view: All this being said, it does not cause issues, and apparently mirrors pytorch, so if they get annoyed with the warning message if/when enabling LTO builds, probably just fix the issue then.

I am assuming that it was a typo/forgot the const because the gemm function def is pretty well standardized, and since the only way you would notice is if you enable LTO, in gcc, and look at the warnings.

What's the plan moving forward on this?

@swolchok
Copy link
Contributor

swolchok commented Apr 1, 2025

What's the plan moving forward on this?

As I replied on Discord on 3/25: my impression was that all parties concerned were happy not taking further action for now

@larryliu0820
Copy link
Contributor

Should we close this if no further action?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. release notes: ops & kernels Changes to the opset and any new / changed kernel implementations triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants