Dockerfile: use base instead of cuda-runtime as base for server-release (#50)

dtrifiro · web-flow · commit 771d023b3277 · 2024-03-13T16:59:35.000-06:00
The cuda-runtime target is ~2.3 GB, and the libraries it provides
duplicate those already installed in the Python virtualenv by `torch`
and the various `nvidia-` modules.

By using base, we shave ~2.3 GB off the final image.
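
One way to check the duplication claim is to measure how much space the
pip-installed NVIDIA libraries take up inside the virtualenv. The sketch
below is illustrative only and not part of this change: the helper names
are hypothetical, and it assumes the `nvidia-` wheels unpack into a
`nvidia/` directory under site-packages.

```python
import os
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if not path.is_symlink():
                total += path.stat().st_size
    return total


def nvidia_package_sizes(site_packages: Path) -> dict[str, int]:
    """Map each subdirectory of site-packages/nvidia to its size in bytes.

    Assumption: the `nvidia-` wheels install under site-packages/nvidia/.
    Returns an empty dict when that directory does not exist.
    """
    nvidia_root = site_packages / "nvidia"
    if not nvidia_root.is_dir():
        return {}
    return {
        child.name: dir_size_bytes(child)
        for child in sorted(nvidia_root.iterdir())
        if child.is_dir()
    }
```

Calling `nvidia_package_sizes` on the directory that `SITE_PACKAGES`
points to in the image gives a per-library breakdown that can be
compared against the size of the cuda-runtime layer.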

Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
diff --git a/Dockerfile b/Dockerfile
@@ -323,7 +323,7 @@ RUN cp server/transformers_patch/modeling_codegen.py ${SITE_PACKAGES}/transforme
 
 
 ## Final Inference Server image ################################################
-FROM cuda-runtime as server-release
+FROM base as server-release
 ARG PYTHON_VERSION
 ARG SITE_PACKAGES=/opt/tgis/lib/python${PYTHON_VERSION}/site-packages