Closed
As we're preparing for the Triton 3.2 pin update for PyTorch, one of our internal kernels segfaults during compilation in the TTGIR coalescing pass, inside ModuleAxisInfoAnalysis::initialize. We bisected the crash to #4927, so it appears to be related to the LLVM update to b5cc222d.
Repro with the TTGIR and triton-opt (shown here on 185299e / #4927, but it also crashes on main):
git clone https://github.com/triton-lang/triton
cd triton/
git checkout 185299e
TRITON_BUILD_WITH_CLANG_LLD=true TRITON_BUILD_WITH_CCACHE=true pip install -e python
./python/build/cmake.linux-x86_64-cpython-3.10/bin/triton-opt --tritongpu-coalesce ttgir.txt
TTGIR (contents of the ttgir.txt above): https://gist.github.com/aakhundov/4ac07aeab88b00e89f42d2243d787391
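For anyone who wants to narrow this down further, the repro above could be wrapped for `git bisect run`. The following is a minimal sketch, not the script used for the original bisect: the function names `bisect_verdict` and `run_step`, the file name `bisect_step.sh`, and the `TRITON_OPT` / `TTGIR` variables are all hypothetical. The mapping is needed because a segfault typically surfaces as exit status 139 (128 + SIGSEGV), and `git bisect run` aborts on any status above 127 instead of marking the commit bad.

```shell
#!/usr/bin/env bash
# Hypothetical helper for `git bisect run`; names and layout are assumptions.

# Map a raw exit status to a status that `git bisect run` understands:
#   0   -> good (the pass completed)
#   125 -> skip (the commit cannot be tested, e.g. the build failed)
#   1   -> bad  (anything else, including death by signal such as 139)
bisect_verdict() {
  local status="$1"
  if [ "$status" -eq 0 ]; then
    echo 0
  elif [ "$status" -eq 125 ]; then
    echo 125
  else
    echo 1
  fi
}

# Paths taken from the repro above; override them for a different build directory.
TRITON_OPT="${TRITON_OPT:-./python/build/cmake.linux-x86_64-cpython-3.10/bin/triton-opt}"
TTGIR="${TTGIR:-ttgir.txt}"

run_step() {
  # Rebuild at the commit under test; skip the commit if the build itself fails.
  TRITON_BUILD_WITH_CLANG_LLD=true TRITON_BUILD_WITH_CCACHE=true \
    pip install -e python || return 125

  # Run only the coalescing pass on the saved TTGIR and capture its exit status.
  "$TRITON_OPT" --tritongpu-coalesce "$TTGIR" > /dev/null 2>&1
  local status=$?

  return "$(bisect_verdict "$status")"
}
```

Assuming the block is saved as `bisect_step.sh` next to `ttgir.txt` in the checkout, a bisect could be driven with `git bisect start 185299e <known-good-commit>` followed by `git bisect run bash -c '. ./bisect_step.sh && run_step'`, where `<known-good-commit>` is a placeholder for any commit known not to crash.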
Output with the crash backtrace:
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: ./python/build/cmake.linux-x86_64-cpython-3.10/bin/triton-opt --tritongpu-coalesce ttgir.txt
#0 0x00000000033c2b07 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (./python/build/cmake.linux-x86_64-cpython-3.10/bin/triton-opt+0x33c2b07)
#1 0x00000000033c062e llvm::sys::RunSignalHandlers() (./python/build/cmake.linux-x86_64-cpython-3.10/bin/triton-opt+0x33c062e)
#2 0x00000000033c31bf SignalHandler(int) Signals.cpp:0:0
#3 0x00007f425b23e730 __restore_rt (/lib64/libc.so.6+0x3e730)
#4 0x0000000003205743 mlir::dataflow::AbstractSparseForwardDataFlowAnalysis::visit(mlir::ProgramPoint*) (./python/build/cmake.linux-x86_64-cpython-3.10/bin/triton-opt+0x3205743)
#5 0x00000000031e8cf8 mlir::DataFlowSolver::initializeAndRun(mlir::Operation*) (./python/build/cmake.linux-x86_64-cpython-3.10/bin/triton-opt+0x31e8cf8)
#6 0x0000000000ffe95a llvm::LogicalResult::failed() const /home/aakhundov/.triton/llvm/llvm-b5cc222d-ubuntu-x64/include/llvm/Support/LogicalResult.h:43:43
#7 0x0000000000ffe95a llvm::failed(llvm::LogicalResult) /home/aakhundov/.triton/llvm/llvm-b5cc222d-ubuntu-x64/include/llvm/Support/LogicalResult.h:71:58
#8 0x0000000000ffe95a mlir::triton::ModuleAxisInfoAnalysis::initialize(mlir::FunctionOpInterface) /data/users/aakhundov/temp/triton/lib/Analysis/AxisInfo.cpp:1281:7
#9 0x000000000119f795 mlir::triton::ModuleAxisInfoAnalysis::ModuleAxisInfoAnalysis(mlir::ModuleOp) /data/users/aakhundov/temp/triton/include/triton/Analysis/AxisInfo.h:182:19
#10 0x000000000119f2b5 llvm::DenseMap<mlir::Operation*, unsigned int, llvm::DenseMapInfo<mlir::Operation*, void>, llvm::detail::DenseMapPair<mlir::Operation*, unsigned int>>::allocateBuckets(unsigned int) /home/aakhundov/.triton/llvm/llvm-b5cc222d-ubuntu-x64/include/llvm/ADT/DenseMap.h:0:0
#11 0x000000000119f2b5 llvm::DenseMap<mlir::Operation*, unsigned int, llvm::DenseMapInfo<mlir::Operation*, void>, llvm::detail::DenseMapPair<mlir::Operation*, unsigned int>>::init(unsigned int) /home/aakhundov/.triton/llvm/llvm-b5cc222d-ubuntu-x64/include/llvm/ADT/DenseMap.h:805:9
#12 0x000000000119f2b5 llvm::DenseMap<mlir::Operation*, unsigned int, llvm::DenseMapInfo<mlir::Operation*, void>, llvm::detail::DenseMapPair<mlir::Operation*, unsigned int>>::DenseMap(unsigned int) /home/aakhundov/.triton/llvm/llvm-b5cc222d-ubuntu-x64/include/llvm/ADT/DenseMap.h:742:52
#13 0x000000000119f2b5 llvm::MapVector<mlir::Operation*, mlir::Attribute, llvm::DenseMap<mlir::Operation*, unsigned int, llvm::DenseMapInfo<mlir::Operation*, void>, llvm::detail::DenseMapPair<mlir::Operation*, unsigned int>>, llvm::SmallVector<std::pair<mlir::Operation*, mlir::Attribute>, 0u>>::MapVector() /home/aakhundov/.triton/llvm/llvm-b5cc222d-ubuntu-x64/include/llvm/ADT/MapVector.h:36:7
#14 0x000000000119f2b5 mlir::triton::gpu::CoalescePass::runOnOperation() /data/users/aakhundov/temp/triton/lib/Dialect/TritonGPU/Transforms/Coalesce.cpp:163:45
#15 0x0000000002f87576 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) (./python/build/cmake.linux-x86_64-cpython-3.10/bin/triton-opt+0x2f87576)
#16 0x0000000002f87d20 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) (./python/build/cmake.linux-x86_64-cpython-3.10/bin/triton-opt+0x2f87d20)
#17 0x0000000002f8a175 mlir::PassManager::run(mlir::Operation*) (./python/build/cmake.linux-x86_64-cpython-3.10/bin/triton-opt+0x2f8a175)
#18 0x0000000002f5081f performActions(llvm::raw_ostream&, std::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) MlirOptMain.cpp:0:0
#19 0x0000000002f5044d llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_2>(long, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) MlirOptMain.cpp:0:0
#20 0x0000000003354fc6 mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) (./python/build/cmake.linux-x86_64-cpython-3.10/bin/triton-opt+0x3354fc6)
#21 0x0000000002f4b271 mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) (./python/build/cmake.linux-x86_64-cpython-3.10/bin/triton-opt+0x2f4b271)
#22 0x0000000002f4b523 mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) (./python/build/cmake.linux-x86_64-cpython-3.10/bin/triton-opt+0x2f4b523)
#23 0x0000000002f4b8f6 mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&) (./python/build/cmake.linux-x86_64-cpython-3.10/bin/triton-opt+0x2f4b8f6)
#24 0x000000000135b5cb main /data/users/aakhundov/temp/triton/bin/triton-opt.cpp:9:33
#25 0x00007f425b2295d0 __libc_start_call_main (/lib64/libc.so.6+0x295d0)
#26 0x00007f425b229680 __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x29680)
#27 0x0000000000f46005 _start (./python/build/cmake.linux-x86_64-cpython-3.10/bin/triton-opt+0xf46005)
Segmentation fault (core dumped)
Is this a bug in LLVM / MLIR, or is something missing in the Triton code following the LLVM update to b5cc222d? Any help or advice would be greatly appreciated, as this is relatively urgent. CC @ThomasRaoux @peterbell10 @Jokeren. Thank you!