Spawning isolated processes for each test #67
Conversation
Add eval_multiprocessing.py to run each test in a separate subprocess for better CUDA error isolation.

Log from test_adverse_cases.py

Log from test_eval_multiprocessing.py
@jiannanWang It would also be useful to add some timings here for running the op_info suite against aten. You mentioned there was a large slowdown, so it's worth seeing how bad it is. Feel free to just use `time {command}`.
nvidia-smi results showing this is multiprocessing. I'm not sure why device 0 got so many processes.
Runtime:
1 GPU:
Main (no multiprocessing)
It turns out that multiprocessing does not provide a speedup in our case. This is likely because our tests run very quickly, and most of the overhead comes from initialization, which multiprocessing cannot accelerate. For now, we might need to avoid using multiprocessing as the default solution. Instead, we can offer it as an alternative option for users who encounter frequent CUDA errors in their experiments.
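The isolation idea can be sketched with the standard library alone. This is a minimal illustration, not the PR's implementation; `run_isolated` and the snippets are hypothetical names:

```python
import subprocess
import sys

def run_isolated(snippet, timeout=60):
    """Run one test snippet in a fresh Python subprocess.

    If the snippet hits a fatal CUDA error, only the child process
    dies; the parent's CUDA context and the rest of the run survive.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok"
    return f"failed (exit code {proc.returncode})"

print(run_isolated("assert 1 + 1 == 2"))    # ok
print(run_isolated("raise SystemExit(3)"))  # failed (exit code 3)
```

Each child pays the full interpreter (and, in practice, CUDA) initialization cost, which is exactly why this approach buys isolation but no speedup when the tests themselves are fast.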
@jiannanWang These results make sense, I think, but what command are you using for testing, and what is the backend/suite? It's unclear whether we should be concerned about using 8 GPUs. Also, can you run a similar time trial on main for record keeping?
Added in the previous comment. I'm using
```diff
@@ -37,4 +37,4 @@ jobs:
         run: uv run python -m BackendBench.scripts.main --suite facto --backend aten --ops "add.Tensor"

       - name: Run pytest tests
-        run: uv run pytest test/
+        run: uv run pytest test/ --deselect test/test_adverse_cases.py
```
Let's not do this here, because then everyone would need to be aware of this test locally. You can add a skip directly in the test file instead.
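A module-level skip in the test file itself could look like this (a sketch; the reason string is illustrative):

```python
import pytest

# Applies to every test in this module: these cases deliberately
# trigger CUDA faults and should only be run on purpose.
pytestmark = pytest.mark.skip(
    reason="adverse cases intentionally crash CUDA; run manually"
)
```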
```python
import pytest
import torch

import BackendBench.multiprocessing_eval as multiprocessing_eval
```
Can we merge this file into the test_adverse_cases file?
Thanks! Please address the feedback before merging.
Fixes #45.

This PR introduces a `--num-workers` flag to enable multiprocessing evaluation. The main purpose is to spawn isolated processes so that CUDA errors cannot cascade through the benchmarking run. Note that multiprocessing does not improve speed: the tests themselves run quickly, and most of the overhead comes from initialization, which multiprocessing cannot accelerate. Multiprocessing is therefore disabled by default and should only be enabled when CUDA errors block an experiment.
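Assuming the flag simply gates between the existing in-process path and per-test worker processes, the dispatch might look roughly like this. This is hypothetical code, not the contents of BackendBench.multiprocessing_eval; `evaluate` and `run_one` are stand-in names, and for brevity it launches one process per test rather than capping concurrency at `num_workers`:

```python
import multiprocessing

def run_one(test):
    # Stand-in for evaluating a single op test; real tests would
    # launch CUDA kernels here.
    return test * test

def _worker(queue, index, test):
    queue.put((index, run_one(test)))

def evaluate(tests, num_workers=0):
    """Hypothetical --num-workers dispatch: 0 keeps the default
    in-process path; anything higher runs each test in its own
    process so a crash cannot take down the whole run."""
    if num_workers == 0:
        return [run_one(t) for t in tests]
    # "fork" keeps this sketch self-contained; for real CUDA
    # isolation, freshly spawned processes would be required.
    ctx = multiprocessing.get_context("fork")
    queue = ctx.Queue()
    procs = [ctx.Process(target=_worker, args=(queue, i, t))
             for i, t in enumerate(tests)]
    for p in procs:
        p.start()
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return [results[i] for i in range(len(tests))]

print(evaluate([1, 2, 3]))                 # [1, 4, 9]
print(evaluate([1, 2, 3], num_workers=2))  # [1, 4, 9]
```

The default path stays exactly as it is today, which matches the PR's intent of offering isolation strictly as an opt-in.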