Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions kernels/optimized/blas/CPUBlas.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,8 @@
#include <Accelerate/Accelerate.h>
#else
// clang-format off
extern "C" void dgemm_(char *transa, char *transb, int *m, int *n, int *k, double *alpha, const double *a, int *lda, const double *b, int *ldb, double *beta, double *c, int *ldc);
extern "C" void sgemm_(char *transa, char *transb, int *m, int *n, int *k, float *alpha, const float *a, int *lda, const float *b, int *ldb, float *beta, float *c, int *ldc);
extern "C" void dgemm_(const char *transa, const char *transb, const int *m, const int *n, const int *k, const double *alpha, const double *a, const int *lda, const double *b, const int *ldb, const double *beta, double *c, const int *ldc);
Copy link
Contributor

@swolchok swolchok Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the existing code matches PyTorch core (https://github.com/pytorch/pytorch/blob/9f9904172d5b94e6cd06d915e15721d6e8868b7a/aten/src/ATen/native/CPUBlas.cpp#L19), so I'm hesitant to change this without an explanation of why it is OK in PyTorch core (which supports Eigen as the BLAS implementation) but not here. In particular, C linkage should mean that the name of the function is the only thing that matters. can you include the specific build command you are running and error you are seeing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error is happening at linking, only occurs with LTO (as in building everything with LTO enabled)

&& /usr/bin/c++ -flto=auto -fno-fat-lto-objects -Wl,--whole-archive /home/nlong/execu-tools/cpp/build/stage/lib/libexecutorch.a -Wl,--no-whole-archive -Wl,--whole-archive /home/nlong/execu-tools/cpp/build/stage/lib/liboptimized_native_cpu_ops_lib.a -Wl,--no-whole-archive test/CMakeFiles/ExecuToolsEncoderDecoder.dir/manual_encoder_decoder_runner.cpp.o -o stage/bin/ExecuToolsEncoderDecoder  stage/lib/libExecuTools.a  stage/lib/liboptimized_native_cpu_ops_lib.a  stage/lib/liboptimized_kernels.a  stage/lib/libportable_kernels.a  stage/lib/libextension_module_static.a  stage/lib/libextension_data_loader.a  stage/lib/libextension_tensor.a  stage/lib/libetdump.a  _deps/executorch/third-party/flatcc/lib/libflatccrt.a  stage/lib/libtokenizers_cpp.a  _deps/tokenizers-cpp-build/release/libtokenizers_c.a  stage/lib/libsentencepiece.a  stage/lib/libcpublas.a  stage/lib/libeigen_blas.a  stage/lib/libextension_threadpool.a  stage/lib/libcpuinfo.a  stage/lib/libpthreadpool.a  stage/lib/libexecutorch.a  stage/lib/libexecutorch_core.a  -ldl && :
/home/nlong/execu-tools/cpp/build/_deps/executorch/kernels/optimized/blas/CPUBlas.cpp:19:17: warning: ‘sgemm_’ violates the C++ One Definition Rule [-Wodr]
   19 | extern "C" void sgemm_(char *transa, char *transb, int *m, int *n, int *k, float *alpha, const float *a, int *lda, const float *b, int *ldb, float *beta, float *c, int *ldc);
      |                 ^
/home/nlong/execu-tools/cpp/build/_deps/executorch/kernels/optimized/third-party/eigen/blas/level3_impl.h:12:1: note: type mismatch in parameter 1
   12 | EIGEN_BLAS_FUNC(gemm)
      | ^
/home/nlong/execu-tools/cpp/build/_deps/executorch/kernels/optimized/third-party/eigen/blas/level3_impl.h:12:1: note: type ‘const char’ should match type ‘char’
/home/nlong/execu-tools/cpp/build/_deps/executorch/kernels/optimized/third-party/eigen/blas/level3_impl.h:12:1: note: ‘sgemm_’ was previously declared here
/home/nlong/execu-tools/cpp/build/_deps/executorch/kernels/optimized/blas/CPUBlas.cpp:18:17: warning: ‘dgemm_’ violates the C++ One Definition Rule [-Wodr]
   18 | extern "C" void dgemm_(char *transa, char *transb, int *m, int *n, int *k, double *alpha, const double *a, int *lda, const double *b, int *ldb, double *beta, double *c, int *ldc);
      |                 ^
/home/nlong/execu-tools/cpp/build/_deps/executorch/kernels/optimized/third-party/eigen/blas/level3_impl.h:12:1: note: type mismatch in parameter 1
   12 | EIGEN_BLAS_FUNC(gemm)
      | ^
/home/nlong/execu-tools/cpp/build/_deps/executorch/kernels/optimized/third-party/eigen/blas/level3_impl.h:12:1: note: type ‘const char’ should match type ‘char’
/home/nlong/execu-tools/cpp/build/_deps/executorch/kernels/optimized/third-party/eigen/blas/level3_impl.h:12:1: note: ‘dgemm_’ was previously declared here

This is the command to clean build executorch off of the main branch w/ LTO enabled, which should reproduce the error:

(from the executorch root)

mkdir cmake_out_lto_build && cd cmake_out_lto_build 
cmake .. -DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON -DCMAKE_POLICY_DEFAULT_CMP0069=NEW -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON
make -j 12

In terms of C linkage not exposing this:

(Note: this is to my best understanding...)

This only shows up because of link time optimization, which rather than compiling each static library/compilation unit to bytecode, it compiles them to an IR representation (Clang uses llvm IR, I am not sure what gcc uses). These are all bundled together during link time and an additional compilation pass is ran there, and then the bytecode is actually produced.

This allows for whole program optimization across compilation units, for instance if one of our gemm operators just did a no-op, LTO would optimize the whole call out.

This is why during link time this ODR issue shows up, the linker is looking at the IR representation of the whole program and saying wait these two definitions don't match.

This would not show up if eigen was not compiled with LTO, and was just a regular static library, the linker would fall back to the regular C linkage, and have no idea about the internals.

random thoughts

I cannot comment about pytorch, but I guess I am assuming that they don't do whole program builds with LTO enabled, in which case this would have never appeared? LTO is newish CMAKE added support for non-intel compilers in ~2017?

But depending on how the build is setup, to get LTO working, everything needs to be a static lib, and you need to be able to control the build flags of all artifacts. It's much easier to setup if you're building in tree.

I doubt that this particular ODR violation breaks things, because we are essentially calling with const parameters through a non-const wrapper. But even without the LTO optimization pass, this is probably non-ideal, because somewhere the compiler has to check did these pointers move, or more likely they just get dropped. IDK.

sidenote:

I am kind of curious if enabling LTO in one of the examples affects performance...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional:

Looks like pytorch people are looking at enabling LTO (for binary size savings mainly it seems)?pytorch/pytorch#93955

My guess is that they have not run into this yet, but it is probably a bug on their end too (if my understanding that the gemm operators are a standard API).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two definitions don't match

Where is the second definition? The thing on line 19 of CPUBlas.cpp is a declaration, not a definition.

What compiler version is this? I see that it's only a warning (-Wodr), and perhaps the warning is just wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What compiler version

No repro on Mac with Xcode 16.2, FWIW.

Copy link
Contributor Author

@cptspacemanspiff cptspacemanspiff Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the second definition? The thing on line 19 of CPUBlas.cpp is a declaration, not a definition.

Sorry, that should have been declaration

No repro on Mac with Xcode 16.2, FWIW.

It should not happen on a mac, because they use the accelerate library instead which seems to provide the declaration headers (they use const in theirs FWIW)

I was using gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04), but will try with clang:

I just rebuilt with clang-18, and it is not there, so maybe it is just a compiler warning issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, that should have been declaration

my point here was that you can't break the One Definition Rule with a declaration, hence the suspicion of the compiler

extern "C" void sgemm_(const char *transa, const char *transb, const int *m, const int *n, const int *k, const float *alpha, const float *a,const int *lda, const float *b,const int *ldb, const float *beta, float *c, const int *ldc);
// clang-format on
#endif // ET_BUILD_FOR_APPLE
#endif // ET_BUILD_WITH_BLAS
Expand Down