
Conversation

i-riyad
Contributor

@i-riyad i-riyad commented Sep 9, 2025

What does this PR do?

Type of change: Bug fix

Overview: The cuDNN library path is not set in the Docker image, so the CUDA and TensorRT execution providers (EPs) were not working.

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

  • Bug Fixes

    • Improved GPU library path handling in the ONNX PTQ Docker image to reduce runtime linking errors and enhance container stability.
  • Chores

    • Simplified Docker image setup by using fixed environment paths and streamlining package installation for more predictable dependency resolution and consistent tool availability.

@i-riyad i-riyad requested a review from a team as a code owner September 9, 2025 06:33
@i-riyad i-riyad requested a review from ajrasane September 9, 2025 06:33
@i-riyad i-riyad force-pushed the rislam/cudnn-path-fix branch from e117c8b to a1842ad Compare September 9, 2025 06:33

coderabbitai bot commented Sep 9, 2025

Walkthrough

Replace the dynamic, Python-based discovery of TensorRT/cuDNN paths in the Dockerfile with static environment variables: TRT_PATH is set to /usr/local/lib/python3.12/dist-packages/tensorrt and CUDNN_LIB_DIR to /usr/local/lib/python3.12/dist-packages/nvidia/cudnn/lib. LD_LIBRARY_PATH and PATH are set via ENV; the RUN step installs tensorrt==10.13.2.6 and drops the Python introspection.

Changes

Cohort / File(s) Summary of changes
Docker env path updates & RUN simplification
examples/onnx_ptq/docker/Dockerfile
Replace runtime Python-based discovery and shell exports with static ENV declarations: set TRT_PATH=/usr/local/lib/python3.12/dist-packages/tensorrt, add CUDNN_LIB_DIR=/usr/local/lib/python3.12/dist-packages/nvidia/cudnn/lib, rebuild LD_LIBRARY_PATH to prepend CUDNN_LIB_DIR and TRT_PATH/lib, update PATH to include TRT_PATH/bin; simplify RUN to install tensorrt==10.13.2.6 and remove Python introspection and export steps.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I nibble bytes where libraries hide,
Static paths tuck CuDNN by my side.
TRT found, no longer lost in night,
Docker carrots lined up just right.
Hop on, code—my build runs light. 🐇


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0466bf0 and c4474ec.

📒 Files selected for processing (1)
  • examples/onnx_ptq/docker/Dockerfile (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: linux
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (2)
examples/onnx_ptq/docker/Dockerfile (2)

19-19: Ensure trtexec is in PATH or symlink it
NGC images usually install trtexec at /usr/src/tensorrt/bin, so ENV PATH="${TRT_PATH}/bin:${PATH}" may not include it—manually verify in your built container (e.g. docker run --rm trt-ptq:dev which trtexec) and either symlink the binary into /usr/local/bin (as in the prior diff) or conditionally export the actual directory.


15-15: Remove manual TensorRT pip install to avoid version skew
The base image nvcr.io/nvidia/tensorrt:25.08-py3 already bundles matching TensorRT libraries and Python wheels. Manually running pip install tensorrt==10.13.2.6 risks desynchronizing system libs and the wheel, leading to runtime loader errors. Remove this line or align the wheel version exactly with what’s shipped in the image. Unable to validate versions in this environment—please verify manually.

Pre-merge checks (3 passed)

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title clearly and concisely states the main change by specifying that the PR fixes the CuDNN library path in the TensorRT Docker setup, which accurately reflects the primary update to the Dockerfile.
Description Check ✅ Passed The PR description succinctly identifies the change as a bug fix, explains the issue with the missing CuDNN library path, and directly relates to the modifications in the Dockerfile.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
examples/onnx_ptq/docker/Dockerfile (1)

17-17: Use sysconfig over site.getsitepackages for robustness.

site.getsitepackages() can be absent/variant across venvs and distros. sysconfig.get_paths()['purelib'] is more reliable.

-export CUDNN_LIB_DIR=$(python -c "import site; print(site.getsitepackages()[0])")/nvidia/cudnn/lib
+CUDNN_LIB_DIR=$(python -c "import sysconfig, os; print(os.path.join(sysconfig.get_paths()['purelib'], 'nvidia', 'cudnn', 'lib'))")
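As a runnable illustration of the suggested approach (the resulting path is whatever the local interpreter reports, and the `nvidia/cudnn/lib` subdirectory only exists once the cuDNN wheel is installed):

```shell
# Resolve the cuDNN wheel's lib dir from the interpreter's purelib path.
# sysconfig.get_paths()['purelib'] is defined in every CPython environment,
# including venvs where site.getsitepackages() may be absent.
CUDNN_LIB_DIR=$(python3 -c "import sysconfig, os; print(os.path.join(sysconfig.get_paths()['purelib'], 'nvidia', 'cudnn', 'lib'))")
echo "$CUDNN_LIB_DIR"
```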
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a6fa34c and a1842ad.

📒 Files selected for processing (1)
  • examples/onnx_ptq/docker/Dockerfile (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: linux
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (1)
examples/onnx_ptq/docker/Dockerfile (1)

18-18: Double-check library preference order to avoid version skew.

Prepending cuDNN ahead of TensorRT libs changes resolution precedence. Ensure this matches the cuDNN/TRT versions in 25.08 to avoid ABI mismatches at runtime.

Would you confirm that the resolved libcudnn*.so at runtime matches the expected version from the base image by running ldd on trtexec inside the built image and sharing the libcudnn paths?

Comment on lines 17 to 19
export CUDNN_LIB_DIR=$(python -c "import site; print(site.getsitepackages()[0])")/nvidia/cudnn/lib && \
export LD_LIBRARY_PATH="${CUDNN_LIB_DIR}:${TRT_PATH}/lib:/usr/include:${LD_LIBRARY_PATH}" && \
export PATH="${TRT_PATH}/bin:${PATH}"


🛠️ Refactor suggestion

⚠️ Potential issue

Exports in RUN don’t persist; fix by registering lib dirs with ldconfig (and drop /usr/include).

The three exports only affect this build layer and won’t be present at runtime or later layers. Also, /usr/include does not belong in LD_LIBRARY_PATH. Persist by adding the library dirs to ld.so.conf.d and running ldconfig; optionally symlink trtexec to be on PATH.

Apply:

-    export CUDNN_LIB_DIR=$(python -c "import site; print(site.getsitepackages()[0])")/nvidia/cudnn/lib && \
-    export LD_LIBRARY_PATH="${CUDNN_LIB_DIR}:${TRT_PATH}/lib:/usr/include:${LD_LIBRARY_PATH}" && \
-    export PATH="${TRT_PATH}/bin:${PATH}"
+    CUDNN_LIB_DIR=$(python -c "import sysconfig, os; print(os.path.join(sysconfig.get_paths()['purelib'], 'nvidia', 'cudnn', 'lib'))") && \
+    echo "${TRT_PATH}/lib" > /etc/ld.so.conf.d/tensorrt.conf && \
+    if [ -d "${CUDNN_LIB_DIR}" ]; then echo "${CUDNN_LIB_DIR}" > /etc/ld.so.conf.d/cudnn.conf; fi && \
+    ldconfig && \
+    ln -sfn "${TRT_PATH}/bin/trtexec" /usr/local/bin/trtexec
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export CUDNN_LIB_DIR=$(python -c "import site; print(site.getsitepackages()[0])")/nvidia/cudnn/lib && \
export LD_LIBRARY_PATH="${CUDNN_LIB_DIR}:${TRT_PATH}/lib:/usr/include:${LD_LIBRARY_PATH}" && \
export PATH="${TRT_PATH}/bin:${PATH}"
# Register TensorRT and cuDNN libs with ldconfig instead of ephemeral exports
CUDNN_LIB_DIR=$(python -c "import sysconfig, os; print(os.path.join(sysconfig.get_paths()['purelib'], 'nvidia', 'cudnn', 'lib'))") && \
echo "${TRT_PATH}/lib" > /etc/ld.so.conf.d/tensorrt.conf && \
if [ -d "${CUDNN_LIB_DIR}" ]; then echo "${CUDNN_LIB_DIR}" > /etc/ld.so.conf.d/cudnn.conf; fi && \
ldconfig && \
ln -sfn "${TRT_PATH}/bin/trtexec" /usr/local/bin/trtexec
🤖 Prompt for AI Agents
In examples/onnx_ptq/docker/Dockerfile around lines 17 to 19, the RUN-layer
exports (CUDNN_LIB_DIR, LD_LIBRARY_PATH, PATH) do not persist to later layers or
runtime and /usr/include should not be in LD_LIBRARY_PATH; instead create a file
in /etc/ld.so.conf.d (e.g., /etc/ld.so.conf.d/trt_cudnn.conf) containing the
resolved CUDNN lib directory and ${TRT_PATH}/lib, run ldconfig to register them,
remove /usr/include from the LD path commands, and if a persistent PATH change
is desired create a symlink for trtexec into /usr/local/bin (or add an ENV PATH
entry) so the runtime can find the binary.

export LD_LIBRARY_PATH="$TRT_PATH/lib:/usr/include:${LD_LIBRARY_PATH}" && \
export PATH="$TRT_PATH/bin:${PATH}"
export CUDNN_LIB_DIR=$(python -c "import site; print(site.getsitepackages()[0])")/nvidia/cudnn/lib && \
export LD_LIBRARY_PATH="${CUDNN_LIB_DIR}:${TRT_PATH}/lib:/usr/include:${LD_LIBRARY_PATH}" && \


⚠️ Potential issue

Remove /usr/include from LD_LIBRARY_PATH.

The dynamic linker doesn’t search header directories. Keeping it here is incorrect and can mask configuration issues.

🤖 Prompt for AI Agents
In examples/onnx_ptq/docker/Dockerfile around line 18, remove "/usr/include"
from the LD_LIBRARY_PATH assignment because header directories should not be in
the dynamic linker search path; update the assignment to only include the actual
library directories (CUDNN_LIB_DIR and TRT_PATH/lib and any existing
LD_LIBRARY_PATH), ensuring you don't introduce empty path entries or duplicate
colons when expanding LD_LIBRARY_PATH.
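The colon-handling the prompt describes can be sketched as follows (the directory values here are hypothetical stand-ins for the build-time paths):

```shell
# Hypothetical directories standing in for the resolved build-time values
CUDNN_LIB_DIR=/opt/cudnn/lib
TRT_PATH=/opt/tensorrt

# Start from an empty LD_LIBRARY_PATH so the expansion below is observable
unset LD_LIBRARY_PATH

# ${VAR:+:$VAR} appends ":<old value>" only when the old value is non-empty,
# so no empty entry (which the dynamic linker treats as the current
# directory) is introduced and no duplicate colons appear
LD_LIBRARY_PATH="${CUDNN_LIB_DIR}:${TRT_PATH}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
echo "$LD_LIBRARY_PATH"
```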

@i-riyad i-riyad force-pushed the rislam/cudnn-path-fix branch from a1842ad to 0466bf0 Compare September 9, 2025 06:47

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
examples/onnx_ptq/docker/Dockerfile (1)

18-18: Remove /usr/include from LD_LIBRARY_PATH.

Header directories are not searched by the dynamic linker and keeping them in LD_LIBRARY_PATH is incorrect.

-ENV LD_LIBRARY_PATH="${CUDNN_LIB_DIR}:${TRT_PATH}/lib:/usr/include:${LD_LIBRARY_PATH}"
+ENV LD_LIBRARY_PATH="${CUDNN_LIB_DIR}:${TRT_PATH}/lib:${LD_LIBRARY_PATH}"

If you adopt ldconfig per my other comment, you can drop this ENV entirely.

🧹 Nitpick comments (2)
examples/onnx_ptq/docker/Dockerfile (2)

15-15: Avoid reinstalling TensorRT inside a TensorRT base image; verify version alignment.

The base image nvcr.io/nvidia/tensorrt:25.08-py3 typically ships with a matched TRT stack. Layering pip-installed tensorrt==10.13.2.6 can cause ABI/version skew with preinstalled CUDA/cuDNN/TRT libs.

Please confirm what trtexec --version and Python import tensorrt; print(tensorrt.__version__) report in a container built from this Dockerfile. If they differ, drop the pip install and rely on the image’s TRT, or upgrade the base image to the matching tag.

If you keep the pip install, prefer avoiding cache bloat:

-RUN pip install tensorrt==10.13.2.6
+RUN pip install --no-cache-dir tensorrt==10.13.2.6

19-19: Verify trtexec location; the pip wheel may not provide bin/trtexec.

${TRT_PATH}/bin might not exist in the Python wheel. In NVIDIA TRT containers, trtexec usually lives under /usr/src/tensorrt/bin. Prefer a symlink to /usr/local/bin to make it discoverable instead of PATH editing.

Apply (already included in the earlier diff):

-ENV PATH="${TRT_PATH}/bin:${PATH}"
+RUN ln -sf /usr/src/tensorrt/bin/trtexec /usr/local/bin/trtexec

Please confirm which trtexec succeeds in a built image.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1842ad and 0466bf0.

📒 Files selected for processing (1)
  • examples/onnx_ptq/docker/Dockerfile (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: linux
  • GitHub Check: build-docs
  • GitHub Check: code-quality

Comment on lines 16 to 17
ENV TRT_PATH=/usr/local/lib/python3.10/dist-packages/tensorrt
ENV CUDNN_LIB_DIR=/usr/local/lib/python3.10/dist-packages/nvidia/cudnn/lib


🛠️ Refactor suggestion

Hardcoding site-packages paths is brittle; resolve at build-time and register with ldconfig.

The paths will break if Python minor changes (e.g., 3.11) or if the wheel layout shifts. Prefer computing locations via Python and adding them to ld.so.conf.d, then run ldconfig, so runtime doesn’t depend on LD_LIBRARY_PATH.

Apply:

-RUN pip install tensorrt==10.13.2.6
-ENV TRT_PATH=/usr/local/lib/python3.10/dist-packages/tensorrt
-ENV CUDNN_LIB_DIR=/usr/local/lib/python3.10/dist-packages/nvidia/cudnn/lib
-ENV LD_LIBRARY_PATH="${CUDNN_LIB_DIR}:${TRT_PATH}/lib:/usr/include:${LD_LIBRARY_PATH}"
-ENV PATH="${TRT_PATH}/bin:${PATH}"
+RUN pip install --no-cache-dir tensorrt==10.13.2.6 && \
+    python - <<'PY'
+import os, sysconfig, pathlib
+pure = sysconfig.get_paths()['purelib']
+trt = pathlib.Path(pure) / 'tensorrt'
+cudnn = pathlib.Path(pure) / 'nvidia' / 'cudnn' / 'lib'
+paths = [str(trt / 'lib')]
+if cudnn.is_dir():
+    paths.append(str(cudnn))
+open('/etc/ld.so.conf.d/tensorrt_cudnn.conf','w').write('\n'.join(paths)+'\n')
+PY
+    && ldconfig && \
+    ln -sf /usr/src/tensorrt/bin/trtexec /usr/local/bin/trtexec
+# No LD_LIBRARY_PATH or PATH mutations needed

This is resilient to Python version changes and avoids fragile ENV-based linker config.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In examples/onnx_ptq/docker/Dockerfile around lines 16-17, avoid hardcoding
Python site-packages paths; instead run a small Python command at build time to
resolve the actual TensorRT and cuDNN library directories, write those resolved
library paths into a file under /etc/ld.so.conf.d/, and run ldconfig so the
dynamic linker finds them at runtime; remove the brittle ENV-assigned
site-packages paths and, if necessary, set only logical ENV names (e.g., TRT_PKG
or CUDNN_PKG) while relying on the ldconfig-registered library locations for
linking.

@i-riyad i-riyad force-pushed the rislam/cudnn-path-fix branch from 0466bf0 to c4474ec Compare September 9, 2025 06:58

codecov bot commented Sep 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.87%. Comparing base (d6d2e75) to head (d0be96b).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #306   +/-   ##
=======================================
  Coverage   73.86%   73.87%           
=======================================
  Files         172      172           
  Lines       17415    17415           
=======================================
+ Hits        12864    12865    +1     
+ Misses       4551     4550    -1     

@i-riyad i-riyad enabled auto-merge (squash) September 9, 2025 22:50
@i-riyad i-riyad merged commit 669ae05 into main Sep 9, 2025
22 checks passed
@i-riyad i-riyad deleted the rislam/cudnn-path-fix branch September 9, 2025 23:18
jingyu-ml pushed a commit that referenced this pull request Sep 10, 2025
Signed-off-by: Riyad Islam <[email protected]>
Signed-off-by: Jingyu Xin <[email protected]>
benchislett pushed a commit that referenced this pull request Sep 15, 2025
