
Commit d83995d

cyang49 authored and njhill committed
Enable CUDA ARCH SM 8.9 for exllama builds
This PR enables the SM 8.9 binary build for the exllama kernels to support the L40S (Ada). As for PyTorch itself, the stock build doesn't include SM 8.9: PyTorch developers state that (1) CUDA automatically runs the SM 8.6 binary on SM 8.9 GPUs, and (2) the shipped CUDA binaries don't include SM 8.9 code. So I think we can keep using the stock pre-built PyTorch package for now.
1 parent: 11b8402 · commit: d83995d
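The PyTorch claim is easy to check at runtime. The snippet below is a minimal sketch, not part of this change, that uses PyTorch's public torch.cuda helpers to list the architectures the installed wheel was built for and the compute capability of the current GPU (expected (8, 9) on an L40S):

    import torch

    # Architectures the installed PyTorch build was compiled for,
    # e.g. ['sm_80', 'sm_86', ...]; sm_89 is typically absent from the stock wheel.
    print("Built-in CUDA archs:", torch.cuda.get_arch_list())

    # Compute capability of the current device; an L40S reports (8, 9).
    print("Device capability:", torch.cuda.get_device_capability())

If sm_89 is absent but an SM 8.6 or PTX target is present, PyTorch's own kernels still run on the L40S via the fallback described in (1).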

File tree: 1 file changed (+2 −2 lines)


Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -220,15 +220,15 @@ FROM python-builder as exllama-kernels-builder
 WORKDIR /usr/src
 
 COPY server/exllama_kernels/ .
-RUN TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX" python setup.py build
+RUN TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX;8.9" python setup.py build
 
 ## Build transformers exllamav2 kernels ########################################
 FROM python-builder as exllamav2-kernels-builder
 
 WORKDIR /usr/src
 
 COPY server/exllamav2_kernels/ .
-RUN TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX" python setup.py build
+RUN TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX;8.9" python setup.py build
 
 ## Flash attention cached build image ##########################################
 FROM base as flash-att-cache
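For context on why prefixing setup.py with TORCH_CUDA_ARCH_LIST works: torch.utils.cpp_extension reads that environment variable and turns each listed architecture into the corresponding -gencode flags passed to nvcc. The setup.py below is a minimal sketch under that assumption; the source file names are hypothetical and the real exllama_kernels setup.py may differ.

    from setuptools import setup
    from torch.utils.cpp_extension import BuildExtension, CUDAExtension

    # TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX;8.9" (set in the Dockerfile RUN line) is
    # picked up by torch.utils.cpp_extension and translated into nvcc -gencode flags.
    setup(
        name="exllama_kernels",
        ext_modules=[
            CUDAExtension(
                name="exllama_kernels",
                # Hypothetical source list, for illustration only.
                sources=["exllama_ext.cpp", "cuda_func/q4_matmul.cu"],
            )
        ],
        cmdclass={"build_ext": BuildExtension},
    )

With 8.9 added to the list, the built extension carries native Ada binaries instead of relying on the SM 8.6 binary or JIT compilation of the PTX fallback.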
