Docker single-GPU verification, NVSHMEM/pip build fix, and small runtime fixes by rich7420 · Pull Request #140 · hao-ai-lab/DistCA

rich7420 · 2026-02-06T02:42:58Z

- Add a Docker-based single-GPU smoke test and benchmark (no Slurm required).
- Fix the `csrc` build when NVSHMEM is installed from pip (`nvidia-nvshmem-cu12`).
- Fix `rope_scaling` handling when `hf_config` is a `PretrainedConfig` object.
- Improve the `softmax_lse` shape assert message in `dispatch.py` and the flash-attn error message in `fused_comm_attn.py`.
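The `rope_scaling` fix comes down to reading one field that may live either on a plain dict or on a config object. The helper below is a minimal sketch of that pattern, not the code in `distca/utils/megatron_test_utils.py`. The name `get_rope_scaling` is hypothetical, and `SimpleNamespace` stands in for `transformers.PretrainedConfig` so the example runs without `transformers` installed.

```python
from types import SimpleNamespace
from typing import Any, Mapping, Optional


def get_rope_scaling(hf_config: Any) -> Optional[dict]:
    """Hypothetical helper: return rope_scaling as a dict (or None) whether
    hf_config is a dict-like mapping or an attribute-style config object."""
    if isinstance(hf_config, Mapping):
        # Dict-style access, e.g. a config loaded straight from config.json
        rope_scaling = hf_config.get("rope_scaling")
    else:
        # Attribute-style access, e.g. transformers' PretrainedConfig
        rope_scaling = getattr(hf_config, "rope_scaling", None)
    # Return a copy so callers cannot mutate the original config by accident
    return dict(rope_scaling) if rope_scaling is not None else None


if __name__ == "__main__":
    # SimpleNamespace is only a stand-in for a PretrainedConfig instance.
    as_dict = {"rope_scaling": {"type": "linear", "factor": 2.0}}
    as_obj = SimpleNamespace(rope_scaling={"type": "linear", "factor": 2.0})
    print(get_rope_scaling(as_dict) == get_rope_scaling(as_obj))
```

Testing for `Mapping` first, rather than for `PretrainedConfig`, keeps a helper like this free of a hard `transformers` import.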

- Docker / scripts: `Dockerfile`, `scripts/docker_install_and_build.sh`, `scripts/run_docker_benchmark.sh`, `scripts/run_docker_single_gpu_smoke.sh`, `scripts/run_docker_single_gpu_benchmark.sh`, `scripts/single_gpu_smoke.sh`, `scripts/single_gpu_benchmark.sh`. These support either a one-shot smoke test or benchmark (the container exits when it finishes) or an interactive shell (the container keeps running).
- Docs: `README.md` (links to the Docker verification guide), `README.Docker.md` (step-by-step instructions).
- Build: `csrc/CMakeLists.txt` and `csrc/cmake/FindNVSHMEM.cmake` now support NVSHMEM installed from pip, which ships no `NVSHMEMConfig.cmake`.
- Runtime: `distca/runtime/attn_kernels/dispatch.py` (clearer assert message), `distca/runtime/megatron/ops/fused_comm_attn.py` (clearer flash-attn error text), `distca/utils/megatron_test_utils.py` (`rope_scaling` handling for config objects).
- Other: `.gitignore` (ignores `models/`, `.build/`, and similar), `requirements.txt` (pins `transformers`, cleans up comments).
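The build fix hinges on locating NVSHMEM when only the pip wheel is present. As a rough illustration, not the logic in `csrc/cmake/FindNVSHMEM.cmake`, the Python sketch below searches `sys.path` for the wheel's headers and libraries. It assumes the `nvidia-nvshmem-cu12` wheel installs under `nvidia/nvshmem/include` and `nvidia/nvshmem/lib` inside site-packages, as other NVIDIA CUDA wheels do; the function name `find_pip_nvshmem` is hypothetical.

```python
import sys
from pathlib import Path
from typing import Iterable, Optional


def find_pip_nvshmem(search_paths: Optional[Iterable[str]] = None) -> Optional[dict]:
    """Hypothetical locator for a pip-installed NVSHMEM.

    ASSUMPTION: the wheel lays out <site-packages>/nvidia/nvshmem/ with
    include/nvshmem.h and a lib/ directory. Returns the include and lib
    directories of the first match, or None if no such layout is found.
    """
    if search_paths is None:
        # Default: every directory on sys.path, which normally includes site-packages
        search_paths = sys.path
    for entry in search_paths:
        root = Path(entry) / "nvidia" / "nvshmem"
        include_dir = root / "include"
        lib_dir = root / "lib"
        if (include_dir / "nvshmem.h").is_file() and lib_dir.is_dir():
            return {"include": str(include_dir), "lib": str(lib_dir)}
    return None


if __name__ == "__main__":
    found = find_pip_nvshmem()
    if found is None:
        print("pip-installed NVSHMEM not found")
    else:
        print(f"include: {found['include']}")
        print(f"lib: {found['lib']}")
```

Directories like these are the kind of hint a CMake Find module passes to `find_path()` and `find_library()` when no `NVSHMEMConfig.cmake` is available for `find_package()` to use.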

To try it, run from the repo root on a machine with one GPU: `./scripts/run_docker_single_gpu_smoke.sh`
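The scripts' two modes (one-shot vs. interactive) map onto two common `docker run` patterns. The sketch below only illustrates that distinction and is not the actual contents of `scripts/run_docker_single_gpu_smoke.sh`. The function name `docker_run_cmd` and the image tag `distca:dev` are hypothetical, and the `--gpus` flag requires the NVIDIA Container Toolkit on the host.

```python
import shlex
from typing import List, Optional


def docker_run_cmd(image: str, command: Optional[str] = None) -> List[str]:
    """Hypothetical builder for the two modes (flags are assumptions).

    command given -> one-shot: run it with --rm, so the container exits and is
                     removed when the command finishes.
    command None  -> interactive: attach a TTY (-it) and start bash; the
                     container keeps running until you leave the shell.
    """
    base = ["docker", "run", "--gpus", "1"]  # expose exactly one GPU
    if command is not None:
        return [*base, "--rm", image, "bash", "-lc", command]
    return [*base, "-it", image, "bash"]


if __name__ == "__main__":
    # "distca:dev" is a placeholder image tag, not a name from the PR.
    print(shlex.join(docker_run_cmd("distca:dev", "./scripts/single_gpu_smoke.sh")))
    print(shlex.join(docker_run_cmd("distca:dev")))
```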

rich7420 added 2 commits February 3, 2026 22:58

- `becf9e8` finally passed pretrain part
- `819f2d5` add more script and docs
