CHTC GPU Lab: LLM Inference Example

A minimal example project for running Large Language Model (LLM) inference on CHTC GPU Lab at UW-Madison.

Demo Video

GMT20260116-222906_Recording_1920x1080.mp4

Why This Project?

CHTC's documentation is great, but setting up LLM inference involves several gotchas:

  • GPU Lab nodes cannot directly access /staging via paths in submit files
  • Version conflicts arise between the container's PyTorch and pip-installed packages
  • Re-installing packages with pip on every job is slow, so the environment needs to be pre-built

This project solves all three with a clean, working setup.

Available GPUs

Check available GPUs:

condor_status -constraint 'Gpus > 0 && State=="Unclaimed" && CUDACapability >= 9.0' \
  -af Machine Name Gpus CUDAGlobalMemoryMb CUDACapability

Project Structure

.
├── setup_env.sh       # One-time: Create conda environment in /staging
├── setup_model.sh     # One-time: Download model to /staging
├── gpu-lab.sub        # HTCondor submit file
├── run_gpu_job.sh     # Job execution script
├── infer.py           # Python inference code
├── prompts.txt        # Input prompts (one per line)
├── run.sh             # Convenience script to submit jobs
├── log/               # Job logs (created automatically)
└── responses/         # Model outputs (created automatically)

Prerequisites

  • CHTC account with GPU Lab access
  • SSH access to submit node (e.g., ap2001.chtc.wisc.edu)
  • Miniconda installed in your home directory

Quick Start

1. Clone the repository

Below, USERNAME means your username; mine is hhao9.

ssh USERNAME@ap2001.chtc.wisc.edu
cd ~
git clone https://github.com/hongtaoh/chtc_llm_demo.git llm
cd llm

2. Configure for your username

Replace hhao9 with your username in all files (substitute USERNAME in the command below with your actual username):

sed -i 's/hhao9/USERNAME/g' setup_env.sh setup_model.sh gpu-lab.sub run_gpu_job.sh
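To sanity-check the substitution, the same `sed` pattern can be exercised on a scratch file first. This is a safe-to-run illustration: the username `alice` and the scratch file are stand-ins, not the repo's real files.

```shell
# Safe-to-run demo of the in-place substitution; "alice" stands in for
# your actual username, and the scratch file stands in for the repo files.
tmp=$(mktemp -d)
printf 'STAGING_DIR=/staging/hhao9\n' > "$tmp/setup_env.sh"
sed -i 's/hhao9/alice/g' "$tmp/setup_env.sh"
cat "$tmp/setup_env.sh"   # now reads: STAGING_DIR=/staging/alice
rm -rf "$tmp"
```

A quick `grep -r hhao9 .` afterwards should return nothing in the real repo.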

3. Set up environment (one-time, ~10 min)

This creates a portable conda environment with PyTorch and Transformers:

bash setup_env.sh

This saves llm.tar.gz (~5-6GB) to /staging/USERNAME/.

4. Download model (one-time, ~2 min)

This downloads and packages the Qwen 0.5B model:

bash setup_model.sh

This saves Qwen-0.5B.tar.gz (~1GB) to /staging/USERNAME/my_models/.

Files in /staging

After setup, your staging directory should look like:

/staging/<username>/
├── llm.tar.gz                    # Conda environment (~5-6GB)
└── my_models/
    └── Qwen-0.5B.tar.gz          # Model weights (~1GB)

5. Submit a job

bash run.sh

To inspect the job's resource requests and see why it may be idle:

condor_q -better-analyze

Customization

Use a different model

Edit setup_model.sh:

MODEL_REPO="meta-llama/Llama-2-7b-chat-hf"  # Change this
MODEL_NAME="Llama-7B"                        # Change this

Then re-run:

bash setup_model.sh

Update run_gpu_job.sh to point to the new model:

MODEL_TARBALL="/staging/USERNAME/my_models/Llama-7B.tar.gz"
export MODEL_ID="./Llama-7B"

Change prompts

Edit prompts.txt (one prompt per line):

Your first prompt here
Your second prompt here
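Since the format is one prompt per line, downstream code can iterate the file directly; `infer.py` presumably does the equivalent in Python. A minimal shell illustration (the demo file name here is made up so the snippet runs anywhere):

```shell
# Demo of consuming prompts one per line; writes a scratch file so it
# runs anywhere (the real job reads prompts.txt).
printf 'Your first prompt here\nYour second prompt here\n' > demo_prompts.txt
while IFS= read -r prompt; do
  echo "PROMPT: $prompt"
done < demo_prompts.txt
rm demo_prompts.txt
```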

Request more resources (for larger models)

Edit gpu-lab.sub:

request_memory = 24GB
request_disk   = 50GB
+GPUJobLength  = "medium"  # For jobs > 12 hours
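For context, these lines sit alongside the rest of the submit file. A hypothetical sketch of a complete gpu-lab.sub is below; the attribute values are assumptions, not the repo's exact contents, so check the actual file and CHTC's GPU Lab documentation:

```
# Hypothetical gpu-lab.sub sketch -- values are illustrative
universe       = vanilla
executable     = run_gpu_job.sh
transfer_input_files = infer.py, prompts.txt

log    = log/job_$(Cluster).log
output = log/job_$(Cluster).out
error  = log/job_$(Cluster).err

request_gpus   = 1
request_cpus   = 2
request_memory = 16GB
request_disk   = 30GB

# GPU Lab routing and staging access (CHTC-specific attributes)
requirements   = (Target.HasCHTCStaging == true)
+WantGPULab    = true
+GPUJobLength  = "short"

queue 1
```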

How It Works

  1. Environment: Pre-built conda env (llm.tar.gz) contains PyTorch + Transformers
  2. Model: Pre-downloaded model (Qwen-0.5B.tar.gz) avoids runtime downloads
  3. Staging access: GPU Lab nodes access /staging directly at runtime (not via transfer_input_files)
  4. No containers: Uses vanilla universe with portable conda environment

This approach gives fast job startup (~30 sec) instead of slow pip installs (~5 min).
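The unpack-at-runtime pattern from step 3 can be demonstrated end to end with scratch files standing in for the real tarballs. The actual staging paths live in run_gpu_job.sh, and a conda-pack environment additionally needs `source <env>/bin/activate` after extraction; everything below is a stand-in so the snippet is safe to run anywhere.

```shell
# Self-contained demo of the unpack-at-runtime pattern. A scratch directory
# stands in for /staging; "fake weights" stand in for the real model tarball.
work=$(mktemp -d)
cd "$work"
mkdir -p staging model_src/Qwen-0.5B
echo "fake weights" > model_src/Qwen-0.5B/weights.bin
tar -czf staging/Qwen-0.5B.tar.gz -C model_src Qwen-0.5B   # like setup_model.sh

# On the execute node: extract directly from the staging path at runtime,
# rather than listing the tarball in transfer_input_files.
tar -xzf staging/Qwen-0.5B.tar.gz
ls Qwen-0.5B   # -> weights.bin
cd / && rm -rf "$work"
```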

License

MIT
