Commit 0cc4a2e

🔥 Remove our exllama code because we use auto-gptq vendored kernels (IBM#59)
We recently found that AutoGPTQ vendors its own versions of the exllama and exllamav2 kernels in [autogptq_extension](https://github.com/AutoGPTQ/AutoGPTQ/tree/main/autogptq_extension), and these are installed with the library. Because we install AutoGPTQ after installing our own builds of the exllama kernels, the AutoGPTQ copies overwrite ours. So it turns out that we don't need to vendor and compile our own exllama kernels.

Signed-off-by: Travis Johnson <[email protected]>
1 parent 845a7e9 commit 0cc4a2e
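
The overwrite described in the commit message can be checked inside the built image by asking Python where each kernel module resolves from. The following is a minimal sketch, not part of the commit: `locate_modules` is a hypothetical helper, and the module names `exllama_kernels` and `exllamav2_kernels` are an assumption inferred from the removed directory names.

```python
import importlib.util


def locate_modules(names):
    """Map each top-level module name to the file it would be imported from.

    A value of None means the module is not installed in this environment.
    """
    locations = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        locations[name] = spec.origin if spec is not None else None
    return locations


if __name__ == "__main__":
    # Assumed module names; adjust if the installed extensions are named differently.
    kernel_modules = ["exllama_kernels", "exllamav2_kernels"]
    for name, origin in locate_modules(kernel_modules).items():
        print(f"{name}: {origin or 'not installed'}")
```

This only reports whether a module is present and where it lives; it does not identify which build produced the file, so it is best run once before and once after the AutoGPTQ wheel is installed.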

58 files changed: 0 additions & 7784 deletions

‎Dockerfile

Lines changed: 0 additions & 24 deletions
@@ -253,24 +253,6 @@ COPY server/custom_kernels/ /usr/src/.
 RUN cd /usr/src && python setup.py build_ext && python setup.py install
 
 
-## Build transformers exllama kernels ##########################################
-FROM python-builder as exllama-kernels-builder
-
-WORKDIR /usr/src
-
-COPY server/exllama_kernels/ .
-RUN python setup.py build
-
-
-## Build transformers exllamav2 kernels ########################################
-FROM python-builder as exllamav2-kernels-builder
-
-WORKDIR /usr/src
-
-COPY server/exllamav2_kernels/ .
-RUN python setup.py build
-
-
 ## Flash attention v2 cached build image #######################################
 FROM base as flash-att-v2-cache
 
@@ -301,12 +283,6 @@ ENV PATH=/opt/tgis/bin:$PATH
 RUN --mount=type=bind,from=flash-att-v2-cache,src=/usr/src/flash-attention-v2,target=/usr/src/flash-attention-v2 \
     pip install /usr/src/flash-attention-v2/*.whl --no-cache-dir
 
-# Copy build artifacts from exllama kernels builder
-COPY --from=exllama-kernels-builder /usr/src/build/lib.linux-x86_64-cpython-* ${SITE_PACKAGES}
-
-# Copy build artifacts from exllamav2 kernels builder
-COPY --from=exllamav2-kernels-builder /usr/src/build/lib.linux-x86_64-cpython-* ${SITE_PACKAGES}
-
 # Copy over the auto-gptq wheel and install it
 RUN --mount=type=bind,from=auto-gptq-cache,src=/usr/src/auto-gptq-wheel,target=/usr/src/auto-gptq-wheel \
     pip install /usr/src/auto-gptq-wheel/*.whl --no-cache-dir

‎server/exllama_kernels/exllama_kernels/cuda_buffers.cu

Lines changed: 0 additions & 71 deletions
This file was deleted.

‎server/exllama_kernels/exllama_kernels/cuda_buffers.cuh

Lines changed: 0 additions & 52 deletions
This file was deleted.

‎server/exllama_kernels/exllama_kernels/cuda_compat.cuh

Lines changed: 0 additions & 58 deletions
This file was deleted.

‎server/exllama_kernels/exllama_kernels/cuda_func/column_remap.cu

Lines changed: 0 additions & 61 deletions
This file was deleted.

‎server/exllama_kernels/exllama_kernels/cuda_func/column_remap.cuh

Lines changed: 0 additions & 19 deletions
This file was deleted.
