Commit 9eb7407

Run CUPTI C++ example with timeout and stack (rapidsai#501)
We have seen an instance of `./example_cupti_monitor` deadlocking in CI. This change limits it to a 30 second run and prints the stack when a deadlock occurs, so it may aid us in debugging if it happens again.

Authors:
- Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
- James Lamb (https://github.com/jameslamb)

URL: rapidsai#501
1 parent 4dc6bbb commit 9eb7407

2 files changed: +7 -4 lines changed


ci/run_cpp_example_smoketests.sh

Lines changed: 3 additions & 1 deletion
@@ -4,6 +4,8 @@
 
 set -xeuo pipefail
 
+TIMEOUT_TOOL_PATH="$(dirname "$(realpath "${BASH_SOURCE[0]}")")"/timeout_with_stack.py
+
 # Support customizing the ctests' install location
 cd "${INSTALL_PREFIX:-${CONDA_PREFIX:-/usr}}/bin/examples/librapidsmpf/"
 
@@ -16,7 +18,7 @@ export OMPI_MCA_opal_cuda_support=1 # enable CUDA support in OpenMPI
 mpirun --map-by node --bind-to none -np 2 ./example_shuffle
 
 # Ensure that cupti monitor example is runnable and creates the expected csv file
-./example_cupti_monitor
+python "${TIMEOUT_TOOL_PATH}" 30 ./example_cupti_monitor
 if [[ ! -f cupti_monitor_example.csv ]]; then
     echo "Error: cupti_monitor_example.csv was not created!"
     exit 1
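
The wrapper script `timeout_with_stack.py` is not part of this diff, so its implementation is not shown here. Purely as an illustration of the idea described in the commit message (run a command under a time limit and print a stack trace if it hangs), a wrapper with the same `<seconds> <command>` calling convention might look roughly like the sketch below. The names `run_with_timeout` and `dump_stack`, the use of `gdb` to obtain the stack, and the exit code 124 (borrowed from coreutils `timeout`) are assumptions made for this example and may not match the real script.

```python
import subprocess
import sys


def dump_stack(pid):
    """Best-effort stack dump of a live process via gdb.

    Assumption: gdb is on PATH and is allowed to attach to the process.
    """
    try:
        result = subprocess.run(
            ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"],
            capture_output=True,
            text=True,
            timeout=60,
        )
        return result.stdout + result.stderr
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        return f"stack dump unavailable: {exc}"


def run_with_timeout(cmd, timeout_s, dump=dump_stack):
    """Run cmd; if it exceeds timeout_s, print its stack, kill it, return 124."""
    proc = subprocess.Popen(cmd)
    try:
        # Normal path: the command finishes in time and we forward its exit code.
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        print(f"Timed out after {timeout_s}s: {cmd}", file=sys.stderr)
        # Capture the stack while the process is still alive, then kill it.
        print(dump(proc.pid), file=sys.stderr)
        proc.kill()
        proc.wait()
        return 124  # hypothetical choice, mirroring coreutils `timeout`
```

Under these assumptions, `run_with_timeout(["./example_cupti_monitor"], 30)` would correspond to the invocation added in the script above. Taking the stack dump before killing the process matters, because the trace is only useful while the hung threads still exist.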

dependencies.yaml

Lines changed: 4 additions & 3 deletions
@@ -274,19 +274,20 @@ dependencies:
     common:
       - output_types: conda
         packages:
+          - click >=8.1
           - *cmake_ver
+          - cuda-sanitizer-api
+          - psutil # Used for timeout_with_stack.py
           - openmpi >=5.0 # See <https://github.com/rapidsai/rapidsmpf/issues/17>
           - valgrind
-          - cuda-sanitizer-api
-          - click >=8.1
   test_python:
     common:
       - output_types: conda
         packages:
           - gdb
       - output_types: [conda, pyproject, requirements]
         packages:
-          - psutil
+          - psutil # Used for timeout_with_stack.py
           - pytest
           - nvidia-ml-py
     specific:
