Description
When compiled with clang++ and linked with libomp, stedc_solve fails intermittently if OMP_NUM_THREADS > 1. I originally suspected an accidental double linkage with libgomp through the Fortran linker, but inspecting the compiler and linker command lines showed no such issue.
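The linkage check mentioned above can be scripted. The following is a minimal sketch that assumes Linux with ldd and a grep supporting -o and -E. The names omp_runtimes and check_omp_linkage are hypothetical helpers, not part of SLATE, and the example paths assume the build_slate layout from the Dockerfile in the Environment section. The sketch only sees libraries resolved at load time, so it would not catch a runtime pulled in later via dlopen.

```shell
# Hypothetical helper: list the OpenMP runtimes a binary or shared
# library resolves to (libgomp = GNU, libomp = LLVM, libiomp5 = Intel).
# Assumes Linux with ldd and a grep that supports -o and -E.
omp_runtimes() {
    ldd "$1" 2>/dev/null | grep -oE 'lib(gomp|omp|iomp5)\.so[.0-9]*' | sort -u
}

# Hypothetical helper: print a one-line verdict for one file.
# Returns 0 for zero or one runtime, 1 for several, 2 if the file is missing.
check_omp_linkage() {
    if [ ! -e "$1" ]; then
        echo "MISSING: $1"
        return 2
    fi
    runtimes=$(omp_runtimes "$1")
    # grep -c prints 0 (and exits 1) when nothing matched; keep the 0.
    count=$(printf '%s\n' "$runtimes" | grep -c . || true)
    if [ "$count" -gt 1 ]; then
        # $runtimes is left unquoted on purpose so newlines become spaces.
        echo "WARNING: multiple OpenMP runtimes in $1:" $runtimes
        return 1
    fi
    echo "OK: $1 uses ${runtimes:-no OpenMP runtime}"
    return 0
}

# Example, with paths assumed from the Dockerfile's build layout:
#   check_omp_linkage build_slate/libslate.so
#   check_omp_linkage build_slate/test/tester
```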
Steps To Reproduce
- Build SLATE with clang/libomp
- Run the tests with OMP_NUM_THREADS > 1
$ OMP_NUM_THREADS=1 python3 run_tests.py --syev
<...>
--------------------------------------------------------------------------------
All routines passed
$ OMP_NUM_THREADS=2 python3 run_tests.py --syev
<...>
./tester --origin s --target t --ref n --nb 64,100 --type s,d,c,z --lookahead 1 --dim 100:500:100 --jobz v --method-eig qr,dc heev
% SLATE version 2023.08.25, id 57ea922b
% input: ./tester --origin s --target t --ref n --nb 64,100 --type s,d,c,z --lookahead 1 --dim 100:500:100 --jobz v --method-eig qr,dc heev
% 2023-08-27 21:37:32, 1 MPI ranks, CPU-only MPI, 2 OpenMP threads per MPI rank
type origin target eig A jobz uplo n nb ib p q la pt value err back err Z orth. time (s) ref time (s) status
s scalpk task qr 1 vec lower 100 64 32 1 1 1 1 NA 2.74e-08 1.44e-07 0.0125 NA pass
s scalpk task qr 1 vec lower 100 100 32 1 1 1 1 NA 1.46e-08 1.42e-07 0.00620 NA pass
s scalpk task qr 1 vec lower 200 64 32 1 1 1 1 NA 2.37e-08 1.50e-07 0.0452 NA pass
s scalpk task qr 1 vec lower 200 100 32 1 1 1 1 NA 1.08e-08 1.40e-07 0.0385 NA pass
s scalpk task qr 1 vec lower 300 64 32 1 1 1 1 NA 3.22e-08 1.42e-07 0.114 NA pass
s scalpk task qr 1 vec lower 300 100 32 1 1 1 1 NA 1.35e-08 1.37e-07 0.113 NA pass
s scalpk task qr 1 vec lower 400 64 32 1 1 1 1 NA 9.17e-09 1.28e-07 0.237 NA pass
s scalpk task qr 1 vec lower 400 100 32 1 1 1 1 NA 2.78e-08 1.26e-07 0.232 NA pass
s scalpk task qr 1 vec lower 500 64 32 1 1 1 1 NA 1.55e-08 1.24e-07 0.421 NA pass
s scalpk task qr 1 vec lower 500 100 32 1 1 1 1 NA 1.61e-08 1.35e-07 0.431 NA pass
tester: /application/slate/src/stedc_solve.cc:120: void slate::stedc_solve(std::vector<real_t> &, std::vector<real_t> &, Matrix<real_t> &, Matrix<real_t> &, Matrix<real_t> &, const slate::Options &) [real_t = float]: Assertion `Qii.mb() == ib' failed.
[76d71bce518f:00035] *** Process received signal ***
[76d71bce518f:00035] Signal: Aborted (6)
[76d71bce518f:00035] Signal code: (-6)
[76d71bce518f:00035] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f650ad5c520]
[76d71bce518f:00035] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f650adb0a7c]
[76d71bce518f:00035] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f650ad5c476]
[76d71bce518f:00035] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f650ad427f3]
[76d71bce518f:00035] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x2871b)[0x7f650ad4271b]
[76d71bce518f:00035] [ 5] /lib/x86_64-linux-gnu/libc.so.6(+0x39e96)[0x7f650ad53e96]
[76d71bce518f:00035] [ 6] /application/build_slate/libslate.so(+0xc3c79e)[0x7f650d3b979e]
[76d71bce518f:00035] [ 7] /lib/x86_64-linux-gnu/libomp.so.5(+0x6156c)[0x7f650bdda56c]
[76d71bce518f:00035] [ 8] /lib/x86_64-linux-gnu/libomp.so.5(+0x653b2)[0x7f650bdde3b2]
[76d71bce518f:00035] [ 9] /lib/x86_64-linux-gnu/libomp.so.5(+0x72f90)[0x7f650bdebf90]
[76d71bce518f:00035] [10] /lib/x86_64-linux-gnu/libomp.so.5(+0x6e5ea)[0x7f650bde75ea]
[76d71bce518f:00035] [11] /lib/x86_64-linux-gnu/libomp.so.5(+0x7257e)[0x7f650bdeb57e]
[76d71bce518f:00035] [12] /lib/x86_64-linux-gnu/libomp.so.5(+0x44d3d)[0x7f650bdbdd3d]
[76d71bce518f:00035] [13] /lib/x86_64-linux-gnu/libomp.so.5(+0xa29f4)[0x7f650be1b9f4]
[76d71bce518f:00035] [14] /lib/x86_64-linux-gnu/libc.so.6(+0x94b43)[0x7f650adaeb43]
[76d71bce518f:00035] [15] /lib/x86_64-linux-gnu/libc.so.6(clone+0x44)[0x7f650ae3fbb4]
[76d71bce518f:00035] *** End of error message ***
FAILED: heev, exit code -6
<...>
./tester --origin s --target t --ref n --nb 64,100 --dim 100:500:100 stedc
tester: /application/slate/src/stedc_solve.cc:120: void slate::stedc_solve(std::vector<real_t> &, std::vector<real_t> &, Matrix<real_t> &, Matrix<real_t> &, Matrix<real_t> &, const slate::Options &) [real_t = double]: Assertion `Qii.mb() == ib' failed.
[76d71bce518f:00095] *** Process received signal ***
[76d71bce518f:00095] Signal: Aborted (6)
[76d71bce518f:00095] Signal code: (-6)
[76d71bce518f:00095] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f0d0b6ce520]
[76d71bce518f:00095] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f0d0b722a7c]
[76d71bce518f:00095] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f0d0b6ce476]
[76d71bce518f:00095] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f0d0b6b47f3]
[76d71bce518f:00095] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x2871b)[0x7f0d0b6b471b]
[76d71bce518f:00095] [ 5] /lib/x86_64-linux-gnu/libc.so.6(+0x39e96)[0x7f0d0b6c5e96]
[76d71bce518f:00095] [ 6] /application/build_slate/libslate.so(+0xc3cc9e)[0x7f0d0dd2bc9e]
[76d71bce518f:00095] [ 7] /lib/x86_64-linux-gnu/libomp.so.5(+0x6156c)[0x7f0d0c74c56c]
[76d71bce518f:00095] [ 8] /lib/x86_64-linux-gnu/libomp.so.5(+0x653b2)[0x7f0d0c7503b2]
[76d71bce518f:00095] [ 9] /lib/x86_64-linux-gnu/libomp.so.5(+0x72f90)[0x7f0d0c75df90]
[76d71bce518f:00095] [10] /lib/x86_64-linux-gnu/libomp.so.5(+0x6e5ea)[0x7f0d0c7595ea]
[76d71bce518f:00095] [11] /lib/x86_64-linux-gnu/libomp.so.5(+0x7257e)[0x7f0d0c75d57e]
[76d71bce518f:00095] [12] /lib/x86_64-linux-gnu/libomp.so.5(+0x44d3d)[0x7f0d0c72fd3d]
[76d71bce518f:00095] [13] /lib/x86_64-linux-gnu/libomp.so.5(+0xa29f4)[0x7f0d0c78d9f4]
[76d71bce518f:00095] [14] /lib/x86_64-linux-gnu/libc.so.6(+0x94b43)[0x7f0d0b720b43]
[76d71bce518f:00095] [15] /lib/x86_64-linux-gnu/libc.so.6(clone+0x44)[0x7f0d0b7b1bb4]
[76d71bce518f:00095] *** End of error message ***
FAILED: stedc, exit code -6
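Because the failure is intermittent, a single passing run with OMP_NUM_THREADS=2 proves little. A minimal sketch for rerunning one case several times per thread count follows. The names count_failures and sweep_threads are hypothetical helpers, not part of SLATE or run_tests.py, and the example invocation reuses the stedc command from the log above, assuming it is run from the directory that contains ./tester.

```shell
# Hypothetical helper: run a command RUNS times with OMP_NUM_THREADS set
# to THREADS and print how many runs exited non-zero.
# Usage: count_failures THREADS RUNS command [args...]
count_failures() {
    threads=$1
    runs=$2
    shift 2
    fails=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # For an external command, the prefix assignment sets
        # OMP_NUM_THREADS in that command's environment only.
        if ! OMP_NUM_THREADS=$threads "$@" >/dev/null 2>&1; then
            fails=$((fails + 1))
        fi
        i=$((i + 1))
    done
    echo "$fails"
}

# Hypothetical helper: repeat the same command for several thread counts.
# Usage: sweep_threads RUNS command [args...]
sweep_threads() {
    n=$1
    shift
    for t in 1 2 4 8; do
        echo "OMP_NUM_THREADS=$t: $(count_failures "$t" "$n" "$@")/$n runs failed"
    done
}

# Example, assumed to be run from the directory that contains ./tester:
#   sweep_threads 20 ./tester --origin s --target t --ref n \
#       --nb 64,100 --dim 100:500:100 stedc
```

Rerunning each case many times gives a rough failure rate per thread count, which says more about a threading-dependent assertion than a single pass or fail.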
Environment
I've also attached a Dockerfile to reproduce the build environment.
# Dockerfile
FROM ubuntu:22.04
RUN apt update && \
apt install -y locales && \
locale-gen "en_US.UTF-8" && \
update-locale LANG=en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LANG en_US.UTF-8
ENV LC_ALL en_US.UTF-8
WORKDIR /application
# Base Environment
RUN apt -y update && apt -y install make wget curl \
lsb-release coreutils sudo bash-completion \
apt-transport-https software-properties-common \
ca-certificates gnupg linux-tools-common time pciutils \
build-essential wget curl \
git make ninja-build \
gdb valgrind \
libeigen3-dev \
libblas-dev liblapack-dev liblapacke-dev \
libunwind-dev libtbb-dev libomp-dev \
libopenmpi-dev openmpi-bin libscalapack-openmpi-dev
# CMake + Clang
RUN apt -y install cmake cmake-curses-gui
RUN apt -y install clang-12 libomp-12-dev
# Clone SLATE
RUN git clone --recurse-submodules https://github.com/icl-utk-edu/slate.git
RUN git -C slate checkout 57ea922b4a10876ba990a41648590ef36019acdd
# Build BLASPP
RUN cmake -S slate/blaspp -B build_blaspp -DCMAKE_C_COMPILER=clang-12 -DCMAKE_CXX_COMPILER=clang++-12
RUN cmake --build build_blaspp --target blaspp -j2
# Build LAPACKPP
RUN cmake -S slate/lapackpp -B build_lapackpp -DCMAKE_C_COMPILER=clang-12 -DCMAKE_CXX_COMPILER=clang++-12 -Dblaspp_DIR=$PWD/build_blaspp
RUN cmake --build build_lapackpp --target lapackpp -j2
# Build SLATE
RUN cmake -S slate -B build_slate -DCMAKE_CXX_COMPILER=clang++-12 -Dblaspp_DIR=$PWD/build_blaspp -Dlapackpp_DIR=$PWD/build_lapackpp -DBUILD_TESTING=ON -DSCALAPACK_LIBRARIES="/usr/lib/x86_64-linux-gnu/libscalapack-openmpi.so"
RUN cmake --build build_slate --target all -j2 --verbose
- SLATE version / commit ID (e.g., git log --oneline -n 1): 57ea922
- How installed:
  - git clone
  - release tar file
  - Spack
  - module
- How compiled:
  - makefile (include your make.inc)
  - CMake (include your command line options)
- Compiler & version (e.g., mpicxx --version):
- BLAS library (e.g., MKL, ESSL, OpenBLAS) & version: NETLIB
- CUDA / ROCm / oneMKL version (e.g., nvcc --version): N/A
- MPI library & version (MPICH, Open MPI, Intel MPI, IBM Spectrum, Cray MPI, etc. Sometimes mpicxx -v gives info.): Open MPI
- OS: Ubuntu 22.04
- Hardware (CPUs, GPUs, nodes): AMD EPYC 7302P 16-Core