Commit 16fc318

OpenVINO integration for CausalLM models
Signed-off-by: Helena <[email protected]>
1 parent b5f534a commit 16fc318

3 files changed: +1218 -1174 lines changed

Dockerfile

Lines changed: 4 additions & 4 deletions
@@ -160,9 +160,8 @@ COPY server/Makefile server/Makefile
 # Install server
 COPY proto proto
 COPY server server
-RUN cd server && \
-make gen-server && \
-pip install ".[accelerate]" --no-cache-dir
+# RUN --mount=type=cache,target=/root/.cache/pip cd server && make gen-server && pip install ".[accelerate, openvino]"
+RUN cd server && make gen-server && pip install ".[accelerate, openvino]" --no-cache-dir

 # Patch codegen model changes into transformers 4.35
 RUN cp server/transformers_patch/modeling_codegen.py ${SITE_PACKAGES}/transformers/models/codegen/modeling_codegen.py
@@ -311,7 +310,8 @@ RUN --mount=type=bind,from=auto-gptq-cache,src=/usr/src/auto-gptq-wheel,target=/
 # Install server
 COPY proto proto
 COPY server server
-RUN cd server && make gen-server && pip install ".[accelerate, onnx-gpu, quantize]" --no-cache-dir
+# RUN --mount=type=cache,target=/root/.cache/pip cd server && make gen-server && pip install ".[accelerate, openvino]"
+RUN cd server && make gen-server && pip install ".[accelerate, onnx-gpu, openvino, quantize]" --no-cache-dir

 # Patch codegen model changes into transformers 4.35
 RUN cp server/transformers_patch/modeling_codegen.py ${SITE_PACKAGES}/transformers/models/codegen/modeling_codegen.py
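
Note (not part of this commit): both hunks add the `openvino` extra to the server install. A minimal sketch of one way to sanity-check, inside a built image, that the extra actually pulled in its packages. The helper name `missing_modules` is hypothetical, and the module name `openvino` is an assumption, since the contents of the extra are not shown in this diff.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found.

    Uses importlib.util.find_spec, which locates a top-level module
    without importing it. Pass top-level names only (no dotted paths).
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: the `openvino` extra installs a top-level `openvino` module.
    expected = ["openvino"]
    missing = missing_modules(expected)
    if missing:
        print("missing modules:", ", ".join(missing))
    else:
        print("all expected modules found")
```

Run with the image's Python interpreter after the build. An empty result only shows that the modules are discoverable, not that OpenVINO can load a model.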
