A minimal example project for running Large Language Model (LLM) inference on CHTC GPU Lab at UW-Madison.
CHTC's documentation is great, but setting up LLM inference involves several gotchas:
- GPU Lab nodes cannot directly access `/staging` via paths in submit files
- Version conflicts between container PyTorch and pip-installed packages
- Efficient environment management to avoid slow pip installs on every job
This project solves all of that with a clean, working setup.
Check available GPUs:

```bash
condor_status -constraint 'Gpus > 0 && State=="Unclaimed" && CUDACapability >= 9.0' \
    -af Machine Name Gpus CUDAGlobalMemoryMb CUDACapability
```
```
├── setup_env.sh     # One-time: Create conda environment in /staging
├── setup_model.sh   # One-time: Download model to /staging
├── gpu-lab.sub      # HTCondor submit file
├── run_gpu_job.sh   # Job execution script
├── infer.py         # Python inference code
├── prompts.txt      # Input prompts (one per line)
├── run.sh           # Convenience script to submit jobs
├── log/             # Job logs (created automatically)
└── responses/       # Model outputs (created automatically)
```
- CHTC account with GPU Lab access
- SSH access to a submit node (e.g., `ap2001.chtc.wisc.edu`)
- Miniconda installed in your home directory
Below, USERNAME means your username. Mine is hhao9.
```bash
ssh USERNAME@ap2001.chtc.wisc.edu
cd ~
git clone https://github.com/hongtaoh/chtc_llm_demo.git llm
```
```bash
cd llm
```

Replace `hhao9` with your username in all files:

```bash
sed -i 's/hhao9/USERNAME/g' setup_env.sh setup_model.sh gpu-lab.sub run_gpu_job.sh
```

This creates a portable conda environment with PyTorch and Transformers:

```bash
bash setup_env.sh
```

This saves `llm.tar.gz` (~5-6GB) to `/staging/USERNAME/`.
This downloads and packages the Qwen 0.5B model:

```bash
bash setup_model.sh
```

This saves `Qwen-0.5B.tar.gz` (~1GB) to `/staging/USERNAME/my_models/`.
After setup, your staging directory should look like:

```
/staging/<username>/
├── llm.tar.gz             # Conda environment (~5-6GB)
└── my_models/
    └── Qwen-0.5B.tar.gz   # Model weights (~1GB)
```
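You can confirm the layout with `ls -lh /staging/USERNAME`, or with a small helper like this sketch (`check_staging` is hypothetical, not part of the repo):

```python
import os

# Hypothetical helper (not part of this repo): report whether the
# expected staging artifacts exist under the given directory.
def check_staging(staging_dir):
    artifacts = ["llm.tar.gz", "my_models/Qwen-0.5B.tar.gz"]
    return {a: os.path.exists(os.path.join(staging_dir, a)) for a in artifacts}
```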
```bash
bash run.sh
```

To see why a job is idle and which machines match its requirements:

```bash
condor_q -better-analyze
```

Edit setup_model.sh:
```bash
MODEL_REPO="meta-llama/Llama-2-7b-chat-hf"  # Change this
MODEL_NAME="Llama-7B"                       # Change this
```

Then re-run:

```bash
bash setup_model.sh
```

Update run_gpu_job.sh to point to the new model:

```bash
MODEL_TARBALL="/staging/USERNAME/my_models/Llama-7B.tar.gz"
export MODEL_ID="./Llama-7B"
```

Edit prompts.txt (one prompt per line):
```
Your first prompt here
Your second prompt here
```
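For reference, a minimal sketch of how infer.py might parse this file, assuming one prompt per line with blank lines ignored (the actual script's internals may differ):

```python
# Sketch of how prompts.txt could be parsed (an assumption, not the
# repo's exact code): one prompt per line, blank lines skipped.
def load_prompts(path="prompts.txt"):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```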
Edit gpu-lab.sub:

```
request_memory = 24GB
request_disk = 50GB
+GPUJobLength = "medium"   # For jobs > 12 hours
```
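For orientation, a hypothetical minimal GPU submit file might look like the following; this is a sketch using standard HTCondor directives, not the repo's actual gpu-lab.sub, which may include additional settings:

```
universe       = vanilla
executable     = run_gpu_job.sh
request_gpus   = 1
request_memory = 24GB
request_disk   = 50GB
log    = log/job_$(Cluster).log
output = log/job_$(Cluster).out
error  = log/job_$(Cluster).err
queue
```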
- Environment: Pre-built conda env (`llm.tar.gz`) contains PyTorch + Transformers
- Model: Pre-downloaded model (`Qwen-0.5B.tar.gz`) avoids runtime downloads
- Staging access: GPU Lab nodes access `/staging` directly at runtime (not via `transfer_input_files`)
- No containers: Uses the vanilla universe with a portable conda environment
This approach gives fast job startup (~30 sec) instead of slow pip installs (~5 min).
MIT