A minimal example project for running Large Language Model (LLM) inference on CHTC GPU Lab at UW-Madison.
CHTC's documentation is great, but setting up LLM inference involves several gotchas:
- GPU Lab nodes cannot directly access `/staging` via paths in submit files
- Version conflicts between container PyTorch and pip-installed packages
- Efficient environment management to avoid slow pip installs on every job
This project solves all of that with a clean, working setup.
Check available GPUs:

```bash
condor_status -constraint 'Gpus > 0 && State=="Unclaimed" && CUDACapability >= 9.0' \
    -af Machine Name Gpus CUDAGlobalMemoryMb CUDACapability
```
```
├── setup_env.sh     # One-time: Create conda environment in /staging
├── setup_model.sh   # One-time: Download model to /staging
├── gpu-lab.sub      # HTCondor submit file
├── run_gpu_job.sh   # Job execution script
├── infer.py         # Python inference code
├── prompts.txt      # Input prompts (one per line)
├── run.sh           # Convenience script to submit jobs
├── log/             # Job logs (created automatically)
└── responses/       # Model outputs (created automatically)
```
- CHTC account with GPU Lab access
- SSH access to a submit node (e.g., `ap2001.chtc.wisc.edu`)
- Miniconda installed in your home directory
Below, USERNAME means your username. Mine is hhao9.
```bash
ssh USERNAME@ap2001.chtc.wisc.edu
cd ~
git clone https://github.com/hongtaoh/chtc_llm_demo.git llm
```
```bash
cd llm
```

Replace `hhao9` with your username in all files:

```bash
sed -i 's/hhao9/USERNAME/g' setup_env.sh setup_model.sh gpu-lab.sub run_gpu_job.sh
```

This creates a portable conda environment with PyTorch and Transformers:

```bash
bash setup_env.sh
```

This saves `llm.tar.gz` (~5-6GB) to `/staging/USERNAME/`.
This downloads and packages the Qwen 0.5B model:

```bash
bash setup_model.sh
```

This saves `Qwen-0.5B.tar.gz` (~1GB) to `/staging/USERNAME/my_models/`.
After setup, your staging directory should look like:

```
/staging/<username>/
├── llm.tar.gz             # Conda environment (~5-6GB)
└── my_models/
    └── Qwen-0.5B.tar.gz   # Model weights (~1GB)
```
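You can confirm the layout with `ls -lh /staging/USERNAME`, or with a small helper like this sketch (`check_staging` is hypothetical, not part of the repo):

```python
import os

# Hypothetical helper (not part of this repo): report whether the
# expected staging artifacts exist under the given directory.
def check_staging(staging_dir):
    artifacts = ["llm.tar.gz", "my_models/Qwen-0.5B.tar.gz"]
    return {a: os.path.exists(os.path.join(staging_dir, a)) for a in artifacts}
```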
```bash
bash run.sh
```

To see why a job is idle and which machines match its requirements:

```bash
condor_q -better-analyze
```

Edit setup_model.sh:
```bash
MODEL_REPO="meta-llama/Llama-2-7b-chat-hf"  # Change this
MODEL_NAME="Llama-7B"                       # Change this
```

Then re-run:

```bash
bash setup_model.sh
```

Update run_gpu_job.sh to point to the new model:

```bash
MODEL_TARBALL="/staging/USERNAME/my_models/Llama-7B.tar.gz"
export MODEL_ID="./Llama-7B"
```

Edit prompts.txt (one prompt per line):
```
Your first prompt here
Your second prompt here
```
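For reference, a minimal sketch of how infer.py might parse this file, assuming one prompt per line with blank lines ignored (the actual script's internals may differ):

```python
# Sketch of how prompts.txt could be parsed (an assumption, not the
# repo's exact code): one prompt per line, blank lines skipped.
def load_prompts(path="prompts.txt"):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```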
Edit gpu-lab.sub:

```
request_memory = 24GB
request_disk = 50GB
+GPUJobLength = "medium"   # For jobs > 12 hours
```
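For orientation, a hypothetical minimal GPU submit file might look like the following; this is a sketch using standard HTCondor directives, not the repo's actual gpu-lab.sub, which may include additional settings:

```
universe       = vanilla
executable     = run_gpu_job.sh
request_gpus   = 1
request_memory = 24GB
request_disk   = 50GB
log    = log/job_$(Cluster).log
output = log/job_$(Cluster).out
error  = log/job_$(Cluster).err
queue
```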
- Environment: Pre-built conda env (`llm.tar.gz`) contains PyTorch + Transformers
- Model: Pre-downloaded model (`Qwen-0.5B.tar.gz`) avoids runtime downloads
- Staging access: GPU Lab nodes access `/staging` directly at runtime (not via `transfer_input_files`)
- No containers: Uses the vanilla universe with a portable conda environment
This approach gives fast job startup (~30 sec) instead of slow pip installs (~5 min).
MIT