CUDNN lib path fix in tensorrt docker #306
Conversation
Force-pushed from e117c8b to a1842ad
Walkthrough: Replace dynamic Python-based discovery of TensorRT/cuDNN paths in the Dockerfile with static environment variables.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 2
🧹 Nitpick comments (1)
examples/onnx_ptq/docker/Dockerfile (1)
17-17: Use sysconfig over site.getsitepackages for robustness. site.getsitepackages() can be absent or vary across venvs and distros; sysconfig.get_paths()['purelib'] is more reliable.
```diff
-export CUDNN_LIB_DIR=$(python -c "import site; print(site.getsitepackages()[0])")/nvidia/cudnn/lib
+CUDNN_LIB_DIR=$(python -c "import sysconfig, os; print(os.path.join(sysconfig.get_paths()['purelib'], 'nvidia', 'cudnn', 'lib'))")
```
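As a standalone sketch of the suggested resolution (the `nvidia/cudnn/lib` suffix is the wheel layout assumed above):

```python
import os
import sysconfig

# sysconfig.get_paths()['purelib'] is defined for every interpreter,
# including venvs where site.getsitepackages() may be unavailable.
purelib = sysconfig.get_paths()["purelib"]
cudnn_lib_dir = os.path.join(purelib, "nvidia", "cudnn", "lib")
print(cudnn_lib_dir)
```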
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/onnx_ptq/docker/Dockerfile
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: linux
- GitHub Check: code-quality
- GitHub Check: build-docs
🔇 Additional comments (1)
examples/onnx_ptq/docker/Dockerfile (1)
18-18: Double-check library preference order to avoid version skew. Prepending cuDNN ahead of TensorRT libs changes resolution precedence. Ensure this matches the cuDNN/TRT versions in 25.08 to avoid ABI mismatches at runtime.
Would you confirm that the resolved libcudnn*.so at runtime matches the expected version from the base image by running ldd on trtexec inside the built image and sharing the libcudnn paths?
examples/onnx_ptq/docker/Dockerfile (Outdated)

```dockerfile
export CUDNN_LIB_DIR=$(python -c "import site; print(site.getsitepackages()[0])")/nvidia/cudnn/lib && \
export LD_LIBRARY_PATH="${CUDNN_LIB_DIR}:${TRT_PATH}/lib:/usr/include:${LD_LIBRARY_PATH}" && \
export PATH="${TRT_PATH}/bin:${PATH}"
```
🛠️ Refactor suggestion
Exports in RUN don’t persist; fix by registering lib dirs with ldconfig (and drop /usr/include).
The three exports only affect this build layer and won’t be present at runtime or later layers. Also, /usr/include does not belong in LD_LIBRARY_PATH. Persist by adding the library dirs to ld.so.conf.d and running ldconfig; optionally symlink trtexec to be on PATH.
Apply:
```diff
- export CUDNN_LIB_DIR=$(python -c "import site; print(site.getsitepackages()[0])")/nvidia/cudnn/lib && \
- export LD_LIBRARY_PATH="${CUDNN_LIB_DIR}:${TRT_PATH}/lib:/usr/include:${LD_LIBRARY_PATH}" && \
- export PATH="${TRT_PATH}/bin:${PATH}"
+ CUDNN_LIB_DIR=$(python -c "import sysconfig, os; print(os.path.join(sysconfig.get_paths()['purelib'], 'nvidia', 'cudnn', 'lib'))") && \
+ echo "${TRT_PATH}/lib" > /etc/ld.so.conf.d/tensorrt.conf && \
+ if [ -d "${CUDNN_LIB_DIR}" ]; then echo "${CUDNN_LIB_DIR}" > /etc/ld.so.conf.d/cudnn.conf; fi && \
+ ldconfig && \
+ ln -sfn "${TRT_PATH}/bin/trtexec" /usr/local/bin/trtexec
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```dockerfile
# Register TensorRT and cuDNN libs with ldconfig instead of ephemeral exports
CUDNN_LIB_DIR=$(python -c "import sysconfig, os; print(os.path.join(sysconfig.get_paths()['purelib'], 'nvidia', 'cudnn', 'lib'))") && \
echo "${TRT_PATH}/lib" > /etc/ld.so.conf.d/tensorrt.conf && \
if [ -d "${CUDNN_LIB_DIR}" ]; then echo "${CUDNN_LIB_DIR}" > /etc/ld.so.conf.d/cudnn.conf; fi && \
ldconfig && \
ln -sfn "${TRT_PATH}/bin/trtexec" /usr/local/bin/trtexec
```
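The conf-file generation above can be exercised outside the image; this sketch writes to a temp directory instead of /etc/ld.so.conf.d so it runs without root, and the TRT_PATH/CUDNN_LIB_DIR values are placeholders:

```shell
# Placeholder paths standing in for the real wheel locations.
TRT_PATH=/opt/tensorrt
CUDNN_LIB_DIR=/opt/cudnn/lib

# Stand-in for /etc/ld.so.conf.d (writing there needs root).
conf_dir=$(mktemp -d)

echo "${TRT_PATH}/lib" > "${conf_dir}/tensorrt.conf"
# Only register cuDNN if the lib dir actually exists.
if [ -d "${CUDNN_LIB_DIR}" ]; then
    echo "${CUDNN_LIB_DIR}" > "${conf_dir}/cudnn.conf"
fi

cat "${conf_dir}/tensorrt.conf"
```

In the real Dockerfile the step would be followed by `ldconfig` so the dynamic linker picks up the new directories.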
🤖 Prompt for AI Agents
In examples/onnx_ptq/docker/Dockerfile around lines 17 to 19, the RUN-layer
exports (CUDNN_LIB_DIR, LD_LIBRARY_PATH, PATH) do not persist to later layers or
runtime and /usr/include should not be in LD_LIBRARY_PATH; instead create a file
in /etc/ld.so.conf.d (e.g., /etc/ld.so.conf.d/trt_cudnn.conf) containing the
resolved CUDNN lib directory and ${TRT_PATH}/lib, run ldconfig to register them,
remove /usr/include from the LD path commands, and if a persistent PATH change
is desired create a symlink for trtexec into /usr/local/bin (or add an ENV PATH
entry) so the runtime can find the binary.
examples/onnx_ptq/docker/Dockerfile (Outdated)

```dockerfile
export LD_LIBRARY_PATH="$TRT_PATH/lib:/usr/include:${LD_LIBRARY_PATH}" && \
export PATH="$TRT_PATH/bin:${PATH}"
export CUDNN_LIB_DIR=$(python -c "import site; print(site.getsitepackages()[0])")/nvidia/cudnn/lib && \
export LD_LIBRARY_PATH="${CUDNN_LIB_DIR}:${TRT_PATH}/lib:/usr/include:${LD_LIBRARY_PATH}" && \
```
Remove /usr/include from LD_LIBRARY_PATH.
The dynamic linker doesn’t search header directories. Keeping it here is incorrect and can mask configuration issues.
🤖 Prompt for AI Agents
In examples/onnx_ptq/docker/Dockerfile around line 18, remove "/usr/include"
from the LD_LIBRARY_PATH assignment because header directories should not be in
the dynamic linker search path; update the assignment to only include the actual
library directories (CUDNN_LIB_DIR and TRT_PATH/lib and any existing
LD_LIBRARY_PATH), ensuring you don't introduce empty path entries or duplicate
colons when expanding LD_LIBRARY_PATH.
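A colon-safe way to do that expansion (a sketch with placeholder paths; `${VAR:+...}` drops the separator when LD_LIBRARY_PATH is empty, since an empty entry tells the dynamic linker to search the current directory):

```shell
# Placeholder library locations for illustration.
CUDNN_LIB_DIR=/opt/cudnn/lib
TRT_PATH=/opt/tensorrt

# Append the old value only if it is non-empty: no trailing colon,
# no empty entries, no /usr/include.
LD_LIBRARY_PATH="${CUDNN_LIB_DIR}:${TRT_PATH}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
echo "$LD_LIBRARY_PATH"
```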
Force-pushed from a1842ad to 0466bf0
Actionable comments posted: 1
♻️ Duplicate comments (1)
examples/onnx_ptq/docker/Dockerfile (1)
18-18: Remove /usr/include from LD_LIBRARY_PATH. Header directories are not searched by the dynamic linker, and keeping them in LD_LIBRARY_PATH is incorrect.

```diff
-ENV LD_LIBRARY_PATH="${CUDNN_LIB_DIR}:${TRT_PATH}/lib:/usr/include:${LD_LIBRARY_PATH}"
+ENV LD_LIBRARY_PATH="${CUDNN_LIB_DIR}:${TRT_PATH}/lib:${LD_LIBRARY_PATH}"
```

If you adopt ldconfig per my other comment, you can drop this ENV entirely.
🧹 Nitpick comments (2)
examples/onnx_ptq/docker/Dockerfile (2)
15-15: Avoid reinstalling TensorRT inside a TensorRT base image; verify version alignment. The base image nvcr.io/nvidia/tensorrt:25.08-py3 typically ships with a matched TRT stack. Layering pip-installed tensorrt==10.13.2.6 can cause ABI/version skew with preinstalled CUDA/cuDNN/TRT libs.
Please confirm what `trtexec --version` and `python -c "import tensorrt; print(tensorrt.__version__)"` report in a container built from this Dockerfile. If they differ, drop the pip install and rely on the image's TRT, or upgrade the base image to the matching tag. If you keep the pip install, prefer avoiding cache bloat:

```diff
-RUN pip install tensorrt==10.13.2.6
+RUN pip install --no-cache-dir tensorrt==10.13.2.6
```
19-19: Verify trtexec location; the pip wheel may not provide bin/trtexec. `${TRT_PATH}/bin` might not exist in the Python wheel. In NVIDIA TRT containers, trtexec usually lives under /usr/src/tensorrt/bin. Prefer a symlink into /usr/local/bin to make it discoverable, instead of PATH editing. Apply (already included in the earlier diff):

```diff
-ENV PATH="${TRT_PATH}/bin:${PATH}"
+RUN ln -sf /usr/src/tensorrt/bin/trtexec /usr/local/bin/trtexec
```

Please confirm `which trtexec` succeeds in a built image.
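A quick way to sanity-check that kind of discoverability, simulated here with a stub executable since trtexec itself only exists inside the built image:

```shell
# Create a stub trtexec in a temp bin dir (stands in for /usr/local/bin).
bindir=$(mktemp -d)
printf '#!/bin/sh\necho ok\n' > "${bindir}/trtexec"
chmod +x "${bindir}/trtexec"

# With the dir on PATH, `command -v` (the portable `which`) resolves the tool.
PATH="${bindir}:${PATH}"
command -v trtexec
```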
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/onnx_ptq/docker/Dockerfile
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: linux
- GitHub Check: build-docs
- GitHub Check: code-quality
examples/onnx_ptq/docker/Dockerfile (Outdated)

```dockerfile
ENV TRT_PATH=/usr/local/lib/python3.10/dist-packages/tensorrt
ENV CUDNN_LIB_DIR=/usr/local/lib/python3.10/dist-packages/nvidia/cudnn/lib
```
🛠️ Refactor suggestion
Hardcoding site-packages paths is brittle; resolve at build-time and register with ldconfig.
The paths will break if Python minor changes (e.g., 3.11) or if the wheel layout shifts. Prefer computing locations via Python and adding them to ld.so.conf.d, then run ldconfig, so runtime doesn’t depend on LD_LIBRARY_PATH.
Apply:
```diff
-RUN pip install tensorrt==10.13.2.6
-ENV TRT_PATH=/usr/local/lib/python3.10/dist-packages/tensorrt
-ENV CUDNN_LIB_DIR=/usr/local/lib/python3.10/dist-packages/nvidia/cudnn/lib
-ENV LD_LIBRARY_PATH="${CUDNN_LIB_DIR}:${TRT_PATH}/lib:/usr/include:${LD_LIBRARY_PATH}"
-ENV PATH="${TRT_PATH}/bin:${PATH}"
+RUN pip install --no-cache-dir tensorrt==10.13.2.6 && \
+    python -c "import os, sysconfig; \
+pure = sysconfig.get_paths()['purelib']; \
+cands = [os.path.join(pure, 'tensorrt', 'lib'), os.path.join(pure, 'nvidia', 'cudnn', 'lib')]; \
+open('/etc/ld.so.conf.d/tensorrt_cudnn.conf', 'w').write('\n'.join(p for p in cands if os.path.isdir(p)) + '\n')" && \
+    ldconfig && \
+    ln -sf /usr/src/tensorrt/bin/trtexec /usr/local/bin/trtexec
+# No LD_LIBRARY_PATH or PATH mutations needed
```
This is resilient to Python version changes and avoids fragile ENV-based linker config.
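The resolution logic above can be exercised outside a container; this sketch writes the conf file to a temp directory instead of /etc/ld.so.conf.d (the `tensorrt` and `nvidia/cudnn` subdirectories are assumptions about the wheel layout):

```python
import pathlib
import sysconfig
import tempfile

# Resolve candidate lib dirs from the interpreter's purelib.
pure = pathlib.Path(sysconfig.get_paths()["purelib"])
paths = [str(pure / "tensorrt" / "lib")]
cudnn = pure / "nvidia" / "cudnn" / "lib"
if cudnn.is_dir():
    paths.append(str(cudnn))

# In the real Dockerfile this would be /etc/ld.so.conf.d, followed by ldconfig.
conf = pathlib.Path(tempfile.mkdtemp()) / "tensorrt_cudnn.conf"
conf.write_text("\n".join(paths) + "\n")
print(conf.read_text())
```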
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In examples/onnx_ptq/docker/Dockerfile around lines 16-17, avoid hardcoding
Python site-packages paths; instead run a small Python command at build time to
resolve the actual TensorRT and cuDNN library directories, write those resolved
library paths into a file under /etc/ld.so.conf.d/, and run ldconfig so the
dynamic linker finds them at runtime; remove the brittle ENV-assigned
site-packages paths and, if necessary, set only logical ENV names (e.g., TRT_PKG
or CUDNN_PKG) while relying on the ldconfig-registered library locations for
linking.
Signed-off-by: Riyad Islam <[email protected]>
Force-pushed from 0466bf0 to c4474ec
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##             main     #306   +/-   ##
=======================================
  Coverage   73.86%   73.87%
=======================================
  Files         172      172
  Lines       17415    17415
=======================================
+ Hits        12864    12865      +1
+ Misses       4551     4550      -1
```

☔ View full report in Codecov by Sentry.
Signed-off-by: Riyad Islam <[email protected]>
Signed-off-by: Jingyu Xin <[email protected]>
Signed-off-by: Riyad Islam <[email protected]>
What does this PR do?
Type of change: Bug fix
Overview: The cuDNN lib path is not set in the Docker image, so the CUDA and TensorRT execution providers were not working.
Usage
# Add a code snippet demonstrating how to use this
Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
Bug Fixes
Chores