Conversation


@tzulingk tzulingk commented Dec 6, 2025

Background

Currently, validation errors in _check_arguments() raise ValueError. When the Dynamo handler catches this exception it shuts the engine down, instead of returning a proper HTTP 400 error response to the client.

Why This Fix is Needed

  • RequestError is caught by the executor's error handling system and converted to proper error responses
  • Prevents entire engine shutdown on invalid user input
  • Maintains consistency with other validation errors (e.g., default_max_tokens) which already use RequestError
  • Allows graceful error handling in production environments
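
In essence, the change is a one-line swap plus an import. A simplified sketch (the real _check_arguments takes more parameters and also covers the TRT backend path; names here are abbreviated):

from tensorrt_llm.executor.utils import RequestError  # newly imported

def _check_arguments(self, prompt_len: int, query_len: int, max_num_tokens: int) -> None:
    # PyTorch-backend length validation, surrounding logic omitted.
    if prompt_len + query_len > max_num_tokens:
        # Previously raised ValueError, which the Dynamo handler treated as fatal.
        raise RequestError(
            f"The sum of prompt length ({prompt_len}), query length ({query_len}) "
            f"should not exceed max_num_tokens ({max_num_tokens})")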

Testing

Build the image and run the container

./container/build.sh --framework trtllm
./container/run.sh -it --mount-workspace --gpus all --framework trtllm

agg.yaml

Update the agg.yaml file with the following settings:
tensor_parallel_size: 1
moe_expert_parallel_size: 1
enable_attention_dp: false
max_num_tokens: 23
max_batch_size: 16
trust_remote_code: true
backend: pytorch
enable_chunked_prefill: false

kv_cache_config:
  free_gpu_memory_fraction: 0.85

Inside the container

Set environment variables (with defaults):
export MODEL_PATH=${MODEL_PATH:-"Qwen/Qwen3-0.6B"}
export SERVED_MODEL_NAME=${SERVED_MODEL_NAME:-"Qwen/Qwen3-0.6B"}
export AGG_ENGINE_ARGS=${AGG_ENGINE_ARGS:-"examples/backends/trtllm/engine_configs/qwen3/agg.yaml"}
Run the frontend:
python3 -m dynamo.frontend --router-mode kv --http-port 8080 >& tzuling_frontend.log &
Run the worker:
python3 -m dynamo.trtllm \
  --model-path "$MODEL_PATH" \
  --served-model-name "$SERVED_MODEL_NAME" \
  --extra-engine-args "$AGG_ENGINE_ARGS" \
  --max-seq-len 20 \
  --publish-events-and-metrics >& tzuling_backend.log &

Send a request with prompt length greater than max_seq_len:
curl -s localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-0.6B",
  "messages": [
    {
      "role": "developer",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello! It’s a wonderful day. How are you today? Have you had breakfast?"
    }
  ],
  "stream": false,
  "max_tokens": 300
}'

The request now returns:

{"message":"The sum of prompt length (26.0), query length (0) should not exceed max_num_tokens (22)","type":"Internal Server Error","code":500}

without crashing the engine
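
The same check can be scripted. A rough sketch using requests against the frontend started above (assumes the endpoint and model name from the curl example; after the oversized request is rejected, a short follow-up request should still succeed, confirming the worker survived):

import requests

URL = "http://localhost:8080/v1/chat/completions"

# Oversized request: prompt plus max_tokens exceeds the worker's token budget,
# so the server should return an error body instead of shutting down.
bad = {
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{"role": "user", "content": "Hello! It's a wonderful day. How are you today? Have you had breakfast?"}],
    "stream": False,
    "max_tokens": 300,
}
resp = requests.post(URL, json=bad)
print(resp.status_code, resp.text)  # error response mentioning max_num_tokens

# Short follow-up request: completes normally if the engine is still alive.
ok = {
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{"role": "user", "content": "Hi"}],
    "stream": False,
    "max_tokens": 5,
}
resp = requests.post(URL, json=ok)
print(resp.status_code)  # expected: 200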

Fixes #9760

Summary by CodeRabbit

  • Refactor
    • Improved error handling consistency by updating exception types in input validation.


@tzulingk tzulingk requested a review from a team as a code owner December 6, 2025 06:53
@tzulingk tzulingk requested a review from hchings December 6, 2025 06:53
@coderabbitai
Contributor

coderabbitai bot commented Dec 6, 2025

📝 Walkthrough

Walkthrough

Import RequestError from tensorrt_llm.executor.utils and change the exception type raised in _check_arguments from ValueError to RequestError when prompt and query lengths exceed max_num_tokens under the PyTorch backend.

Changes

Error handling fix (tensorrt_llm/llmapi/llm.py): Added import of RequestError from tensorrt_llm.executor.utils; changed the exception type from ValueError to RequestError in the _check_arguments validation when the sum of prompt and query lengths exceeds max_num_tokens under the PyTorch backend.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

  • Verify that RequestError is the correct exception type for this validation failure
  • Confirm this is the only occurrence of this particular validation issue requiring a fix
  • Ensure downstream error handling in the executor pipeline properly catches and formats RequestError as an HTTP 400 response

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (4 passed)
  • Description check (✅ Passed): The description adequately explains the background, rationale, and testing steps, though the PR title format and some template sections are not fully followed.
  • Linked Issues check (✅ Passed): The PR directly addresses all objectives from issue #9760: changing ValueError to RequestError in _check_arguments() to prevent engine shutdown and enable proper HTTP error responses.
  • Out of Scope Changes check (✅ Passed): All changes are narrowly scoped to fixing the validation error handling in _check_arguments() as specified in issue #9760, with no extraneous modifications.
  • Title check (✅ Passed): The title clearly identifies the main change: using RequestError instead of ValueError for validation errors to prevent engine shutdown.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
tensorrt_llm/llmapi/llm.py (1)

655-657: Consider extracting error message if this pattern repeats.

The static analysis tool suggests avoiding long error messages inline (TRY003). While the current implementation is functional and the descriptive message is helpful, if similar validation errors are added in the future, consider extracting common message formatting into a helper function or exception class attribute.

This is a low-priority style suggestion and not blocking.
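
For illustration, one possible shape for that refactor (hypothetical helper, not part of this PR):

from tensorrt_llm.executor.utils import RequestError

def _token_budget_error(prompt_len: int, query_len: int, max_num_tokens: int) -> RequestError:
    # Hypothetical helper: keeps the long message out of the raise site (TRY003).
    return RequestError(
        f"The sum of prompt length ({prompt_len}), query length ({query_len}) "
        f"should not exceed max_num_tokens ({max_num_tokens})")

# The raise site in _check_arguments would then shrink to:
#     raise _token_budget_error(prompt_len, query_len, max_num_tokens)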

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d2178d and ad73414.

📒 Files selected for processing (1)
  • tensorrt_llm/llmapi/llm.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+
Indent Python code with 4 spaces; do not use tabs
Always maintain the namespace when importing in Python, even if only one class or function from a module is used (e.g., use from package.subpackage import foo and then foo.SomeClass() instead of from package.subpackage.foo import SomeClass)
Python filenames should use snake_case (e.g., some_file.py)
Python class names should use PascalCase (e.g., class SomeClass)
Python function and method names should use snake_case (e.g., def my_awesome_function():)
Python local variable names should use snake_case, with prefix k for variable names that start with a number (e.g., k_99th_percentile = ...)
Python global variables should use upper snake_case with prefix G (e.g., G_MY_GLOBAL = ...)
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...)
Avoid shadowing variables declared in an outer scope in Python
Initialize all externally visible members of a Python class in the constructor
For Python interfaces that may be used outside a file, prefer docstrings over comments
Python comments should be reserved for code within a function, or interfaces that are local to a file
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx
Python attributes and variables can be documented inline with type and description (e.g., self.x = 5 followed by """<type>: Description of 'x'""" )
Avoid using reflection in Python when functionality can be easily achieved without reflection
When using try-except blocks in Python, limit the except clause to the smallest set of specific errors possible instead of catching all exceptions
When using try-except blocks in Python to handle multiple possible variable types (duck-typing), keep the body of the try as small as possible and use the else block to implement the logic

Files:

  • tensorrt_llm/llmapi/llm.py
**/*.{cpp,h,cu,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code files should contain an NVIDIA copyright header that includes the current year at the top

Files:

  • tensorrt_llm/llmapi/llm.py
🧬 Code graph analysis (1)
tensorrt_llm/llmapi/llm.py (1)
tensorrt_llm/executor/utils.py (2)
  • RequestError (76-77)
  • create_mpi_comm_session (48-65)
🪛 Ruff (0.14.7)
tensorrt_llm/llmapi/llm.py

655-657: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
tensorrt_llm/llmapi/llm.py (2)

33-34: LGTM: Import correctly added for RequestError.

The import is properly added to the existing import statement from tensorrt_llm.executor.utils, maintaining the namespace as required by coding guidelines.


655-657: Exception type change correctly prevents engine shutdown for PyTorch backend.

The change from ValueError to RequestError aligns with the PR objectives and ensures validation failures return proper HTTP error responses instead of causing engine shutdown. The error message is descriptive and includes relevant values.

Verify whether the TRT backend validation at line 672 and lines 678-709 should also use RequestError for consistency with the PyTorch backend error handling strategy. If ValueError is intentionally retained for the TRT backend due to different error handling patterns, document this design decision.

@svc-trtllm-gh-bot svc-trtllm-gh-bot added the "Community want to contribute" label (PRs initiated from Community) Dec 6, 2025
@karljang karljang changed the title fix: Use RequestError for validation errors to prevent engine shutdown [#9760][fix] Use RequestError for validation errors to prevent engine shutdown Dec 6, 2025
@karljang
Collaborator

karljang commented Dec 6, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #27182 [ run ] triggered by Bot. Commit: ad73414

@tensorrt-cicd
Collaborator

PR_Github #27182 [ run ] completed with state SUCCESS. Commit: ad73414
/LLM/main/L0_MergeRequest_PR pipeline #20744 completed with status: 'FAILURE'

@karljang
Collaborator

karljang commented Dec 6, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #27189 [ run ] triggered by Bot. Commit: ad73414

@tensorrt-cicd
Collaborator

PR_Github #27189 [ run ] completed with state SUCCESS. Commit: ad73414
/LLM/main/L0_MergeRequest_PR pipeline #20751 completed with status: 'FAILURE'

@karljang
Collaborator

karljang commented Dec 7, 2025

@tzulingk ,
Thanks for the contribution!
Could you also update TestLlmError.test_max_num_token_check accordingly?
https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/unittest/llmapi/test_llm_pytorch.py#L800C1-L809C32

@karljang
Collaborator

karljang commented Dec 7, 2025

Additionally, to address the thread leak failure shown in the Jenkins log, could you add some extra guards like the following:

def test_max_num_token_check(self):
    """ LLM should raise error when got prompt length exceed the valid range. """
    with LLM(llama_model_path,
             kv_cache_config=global_kvcache_config,
             max_num_tokens=100) as llm:
        
        with pytest.raises(RequestError,
                           match="should not exceed max_num_tokens"):
            ids = [random.randint(10, 100) for _ in range(101)]
            llm.generate([ids])

or

def test_max_num_token_check(self):
    """ LLM should raise error when got prompt length exceed the valid range. """
    llm = LLM(llama_model_path,
              kv_cache_config=global_kvcache_config,
              max_num_tokens=100)
    
    try:
        with pytest.raises(RequestError,  # ← Changed from ValueError
                           match="should not exceed max_num_tokens"):
            ids = [random.randint(10, 100) for _ in range(101)]
            llm.generate([ids])
    finally:
        llm.shutdown()  # ← Added cleanup

@karljang
Collaborator

karljang commented Dec 9, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #27562 [ run ] triggered by Bot. Commit: e55afdd

@tensorrt-cicd
Collaborator

PR_Github #27562 [ run ] completed with state FAILURE. Commit: e55afdd
LLM/main/L0_MergeRequest_PR #21034 (Blue Ocean) completed with status: ABORTED

@karljang
Collaborator

karljang commented Dec 9, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #27573 [ run ] triggered by Bot. Commit: e55afdd

@karljang
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #27758 [ run ] triggered by Bot. Commit: e55afdd

@karljang
Collaborator

@hchings , Could you please review or assign this PR? CI is still failing, but the errors don’t seem to be related to this PR.

@tensorrt-cicd
Collaborator

PR_Github #27758 [ run ] completed with state SUCCESS. Commit: e55afdd
/LLM/main/L0_MergeRequest_PR pipeline #21185 completed with status: 'FAILURE'

@karljang
Collaborator

@tzulingk
Another failed case: https://prod.blsm.nvidia.com/sw-tensorrt-top-1/job/LLM/job/main/job/L0_MergeRequest_PR/21185/testReport/junit/DGX_H100-2_GPUs-PyTorch-Ray-1/test_unittests/test_unittests_v2_unittest_llmapi_test_llm_multi_gpu_pytorch_py__m__gpu2__/

unittest/DGX_H100-2_GPUs-PyTorch-Ray-1/unittest/llmapi/test_llm_multi_gpu_pytorch.py::test_llm_capture_request_error FAILED                                                                      [  9%]

I don't know the history, but it seems like the team intended to handle ValueError and RequestError differently. If that’s the case, it might be more effective to catch ‘ValueError’ from trtllm-serve or another source.
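
Something along these lines at the serving layer, for example (rough sketch with hypothetical handler and function names, not the actual trtllm-serve code):

from fastapi import FastAPI, HTTPException

app = FastAPI()

async def run_generation(request: dict) -> dict:
    # Stand-in for the actual call into the LLM API; it would raise ValueError
    # on invalid input such as prompt length > max_num_tokens.
    raise ValueError("prompt length should not exceed max_num_tokens")

@app.post("/v1/chat/completions")
async def chat_completions(request: dict) -> dict:
    try:
        return await run_generation(request)
    except ValueError as e:
        # Invalid user input becomes HTTP 400 without tearing down the engine.
        raise HTTPException(status_code=400, detail=str(e))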

@tzulingk , @hchings , Could you both share your thoughts on this?

@tzulingk
Author

/bot run

1 similar comment
@karljang
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #28336 [ run ] triggered by Bot. Commit: 657bc26

@tensorrt-cicd
Collaborator

PR_Github #28336 [ run ] completed with state SUCCESS. Commit: 657bc26
/LLM/main/L0_MergeRequest_PR pipeline #21677 completed with status: 'FAILURE'

@pcastonguay pcastonguay requested a review from LinPoly December 15, 2025 16:06
@karljang
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #28516 [ run ] triggered by Bot. Commit: 5e8af75

@tensorrt-cicd
Collaborator

PR_Github #28516 [ run ] completed with state SUCCESS. Commit: 5e8af75
/LLM/main/L0_MergeRequest_PR pipeline #21838 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Collaborator

@LinPoly LinPoly left a comment


I think it is reasonable to further distinguish illegal requests from other kinds of exceptions. Thanks for improving this.

@rmccorm4
Contributor

Hi @LinPoly @karljang, is anything else needed to get this PR merged? And which public PyPI wheel RC will this change end up in?

@pcastonguay
Collaborator

/bot run --disable-fail-fast

1 similar comment
@pcastonguay
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #28816 [ run ] triggered by Bot. Commit: 5e8af75

@pcastonguay
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #30763 [ run ] triggered by Bot. Commit: 5e8af75

@tensorrt-cicd
Collaborator

PR_Github #30763 [ run ] completed with state SUCCESS. Commit: 5e8af75
/LLM/main/L0_MergeRequest_PR pipeline #23746 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@LinPoly
Collaborator

LinPoly commented Jan 7, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #30878 [ run ] triggered by Bot. Commit: 5e8af75

@tensorrt-cicd
Collaborator

PR_Github #30878 [ run ] completed with state SUCCESS. Commit: 5e8af75
/LLM/main/L0_MergeRequest_PR pipeline #23842 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Successfully merging this pull request may close these issues.

[Bug]: trtllm crashes when prompt > max_num_tokens
