[BUG] Tests fail on a 2-GPU system #802

@leofang

Describe the bug
See the attached test log: test_log.txt.zip

Two tests in test_module_callbacks.py fail, and then most tests from test_multithreads.py onward fail with CUDA_ERROR_ILLEGAL_ADDRESS.

CUDA_ERROR_ILLEGAL_ADDRESS is also observed at interpreter shutdown.

Steps/Code to reproduce bug
Run pytest on a 2-GPU system.
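
To check whether the failures depend on both GPUs being visible, one option is to run the suite twice: once restricted to a single device and once with both. The following is only a sketch and not part of the original report. The helper names `build_env`, `run_pytest`, and `compare_runs` are hypothetical, and it assumes the two GPUs are device indices 0 and 1. `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable for limiting which devices a process can see.

```python
import os
import subprocess
import sys


def build_env(visible_devices, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = visible_devices
    return env


def run_pytest(visible_devices, test_args=()):
    """Run pytest in a subprocess that only sees the given GPUs."""
    cmd = [sys.executable, "-m", "pytest", *test_args]
    return subprocess.run(cmd, env=build_env(visible_devices)).returncode


def compare_runs(test_args=()):
    """Run the suite with one GPU visible, then with both (assumed indices 0 and 1)."""
    results = {}
    for devices in ("0", "0,1"):
        results[devices] = run_pytest(devices, test_args)
    return results
```

Calling `compare_runs()` from the repository root returns the pytest exit code for each configuration. If the single-GPU run passes and the two-GPU run does not, that would suggest the failures are specific to multi-GPU setups.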

Expected behavior
Tests pass.

Environment details (please complete the following information):

Additional context
My system has 2 RTX A6000 GPUs + NVLink v4 + driver 580.65.06 (CUDA 13.0). @brandon-b-miller reproduces this on his DGX box too.

Labels: bug
