[#9760][fix] Use RequestError for validation errors to prevent engine shutdown #9761

base: main
Conversation
…tError to prevent engine shutdown Signed-off-by: [email protected] <[email protected]>
📝 Walkthrough: Import changes.

Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes

Pre-merge checks: ❌ Failed checks (1 warning), ✅ Passed checks (4 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
tensorrt_llm/llmapi/llm.py (1)
655-657: Consider extracting the error message if this pattern repeats. The static analysis tool flags long inline error messages (TRY003). The current implementation is functional and the descriptive message is helpful, but if similar validation errors are added in the future, consider extracting the common message formatting into a helper function or an exception class attribute.
This is a low-priority style suggestion and not blocking.
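For reference, the extraction pattern the reviewer suggests might look like the following sketch; `PromptTooLongError` is a hypothetical name, with `RequestError` imported from `tensorrt_llm.executor.utils`:

```python
from tensorrt_llm.executor.utils import RequestError


class PromptTooLongError(RequestError):
    """Raised when a prompt exceeds the engine's max_num_tokens budget."""

    def __init__(self, prompt_len: int, max_num_tokens: int) -> None:
        # The long message lives here, so raise sites stay short (TRY003).
        super().__init__(
            f"prompt length ({prompt_len}) should not exceed "
            f"max_num_tokens ({max_num_tokens})")
```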
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tensorrt_llm/llmapi/llm.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+
Indent Python code with 4 spaces; do not use tabs
Always maintain the namespace when importing in Python, even if only one class or function from a module is used (e.g., use `from package.subpackage import foo` and then `foo.SomeClass()` instead of `from package.subpackage.foo import SomeClass`)
Python filenames should use snake_case (e.g., `some_file.py`)
Python class names should use PascalCase (e.g., `class SomeClass`)
Python function and method names should use snake_case (e.g., `def my_awesome_function():`)
Python local variable names should use snake_case, with prefix `k` for variable names that start with a number (e.g., `k_99th_percentile = ...`)
Python global variables should use upper snake_case with prefix `G` (e.g., `G_MY_GLOBAL = ...`)
Python constants should use upper snake_case (e.g., `MY_CONSTANT = ...`)
Avoid shadowing variables declared in an outer scope in Python
Initialize all externally visible members of a Python class in the constructor
For Python interfaces that may be used outside a file, prefer docstrings over comments
Python comments should be reserved for code within a function, or interfaces that are local to a file
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx
Python attributes and variables can be documented inline with type and description (e.g., `self.x = 5` followed by `"""<type>: Description of 'x'"""`)
Avoid using reflection in Python when functionality can be easily achieved without reflection
When using try-except blocks in Python, limit the except clause to the smallest set of specific errors possible instead of catching all exceptions
When using try-except blocks in Python to handle multiple possible variable types (duck-typing), keep the body of the try as small as possible and use the else block to implement the logic
Files:
tensorrt_llm/llmapi/llm.py
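As an aside, the last two try-except guidelines above can be illustrated with a small hypothetical helper (not from the PR): catch only the specific error, keep the try body minimal, and put the logic in the else block.

```python
def token_count(value):
    """Return the number of token ids in value, treating a scalar id as one."""
    try:
        ids = list(value)  # the only call that may raise TypeError
    except TypeError:
        return 1  # a single, non-iterable token id
    else:
        return len(ids)
```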
**/*.{cpp,h,cu,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code files should contain an NVIDIA copyright header that includes the current year at the top
Files:
tensorrt_llm/llmapi/llm.py
🧬 Code graph analysis (1)
tensorrt_llm/llmapi/llm.py (1)
tensorrt_llm/executor/utils.py (2)
RequestError (76-77)
create_mpi_comm_session (48-65)
🪛 Ruff (0.14.7)
tensorrt_llm/llmapi/llm.py
655-657: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (2)
tensorrt_llm/llmapi/llm.py (2)
33-34: LGTM: Import correctly added for RequestError. The import is properly added to the existing import statement from `tensorrt_llm.executor.utils`, maintaining the namespace as required by the coding guidelines.
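Based on the code-graph snippet above, the resulting import presumably looks something like this sketch (the exact grouping in llm.py may differ):

```python
# Namespace-preserving import; both names exist in tensorrt_llm/executor/utils.py
# per the code graph above.
from tensorrt_llm.executor.utils import (RequestError,
                                         create_mpi_comm_session)
```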
655-657: Exception type change correctly prevents engine shutdown for the PyTorch backend. The change from `ValueError` to `RequestError` aligns with the PR objectives and ensures validation failures return proper HTTP error responses instead of causing engine shutdown. The error message is descriptive and includes the relevant values. Verify whether the TRT backend validation at line 672 and lines 678-709 should also use `RequestError` for consistency with the PyTorch backend's error handling strategy; if `ValueError` is intentionally retained for the TRT backend due to different error handling patterns, document that design decision.
/bot run

PR_Github #27182 [ run ] triggered by Bot. Commit:

PR_Github #27182 [ run ] completed with state

/bot run

PR_Github #27189 [ run ] triggered by Bot. Commit:

PR_Github #27189 [ run ] completed with state
@tzulingk ,

Additionally, to address the thread leak failure shown in the Jenkins log, could you add some extra guards like the following:

```python
def test_max_num_token_check(self):
    """LLM should raise an error when the prompt length exceeds the valid range."""
    with LLM(llama_model_path,
             kv_cache_config=global_kvcache_config,
             max_num_tokens=100) as llm:
        with pytest.raises(RequestError,
                           match="should not exceed max_num_tokens"):
            ids = [random.randint(10, 100) for _ in range(101)]
            llm.generate([ids])
```

or

```python
def test_max_num_token_check(self):
    """LLM should raise an error when the prompt length exceeds the valid range."""
    llm = LLM(llama_model_path,
              kv_cache_config=global_kvcache_config,
              max_num_tokens=100)
    try:
        with pytest.raises(RequestError,  # ← Changed from ValueError
                           match="should not exceed max_num_tokens"):
            ids = [random.randint(10, 100) for _ in range(101)]
            llm.generate([ids])
    finally:
        llm.shutdown()  # ← Added cleanup
```
… adding proper cleanup Signed-off-by: [email protected] <[email protected]>
/bot run

PR_Github #27562 [ run ] triggered by Bot. Commit:

PR_Github #27562 [ run ] completed with state

/bot run

PR_Github #27573 [ run ] triggered by Bot. Commit:

/bot run

PR_Github #27758 [ run ] triggered by Bot. Commit:

@hchings , Could you please review or assign this PR? CI is still failing, but the errors don't seem to be related to this PR.

PR_Github #27758 [ run ] completed with state
@tzulingk I don't know the history, but it seems like the team intended to handle `ValueError` and `RequestError` differently. If that's the case, it might be more effective to catch `ValueError` from trtllm-serve or another source. @tzulingk , @hchings , could you both share your thoughts on this?
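For illustration, catching the exception at the serving layer could look like the following generic FastAPI sketch. This is an assumption for discussion, not the actual trtllm-serve code; the same handler could equally be registered for `ValueError`.

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

from tensorrt_llm.executor.utils import RequestError

app = FastAPI()


@app.exception_handler(RequestError)
async def handle_request_error(request: Request, exc: RequestError) -> JSONResponse:
    # Reject only the offending request with HTTP 400; the engine keeps serving.
    return JSONResponse(status_code=400, content={"error": str(exc)})
```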
…orch backend Signed-off-by: [email protected] <[email protected]>
/bot run

/bot run

PR_Github #28336 [ run ] triggered by Bot. Commit:

PR_Github #28336 [ run ] completed with state

/bot run

PR_Github #28516 [ run ] triggered by Bot. Commit:

PR_Github #28516 [ run ] completed with state
LinPoly left a comment:

I think it is reasonable to further distinguish illegal requests from other kinds of exceptions. Thanks for improving this.
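A hypothetical wrapper (not from the PR) showing that distinction in practice: an illegal request is rejected per-request, while any other exception still propagates and can tear the engine down.

```python
import logging

from tensorrt_llm.executor.utils import RequestError


def generate_or_reject(llm, prompt_ids):
    """Reject an illegal request without shutting the engine down."""
    try:
        return llm.generate([prompt_ids])
    except RequestError as exc:
        # Bad input (e.g., prompt exceeds max_num_tokens): log and move on.
        logging.warning("Rejected illegal request: %s", exc)
        return None
```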
/bot run --disable-fail-fast

/bot run --disable-fail-fast

PR_Github #28816 [ run ] triggered by Bot. Commit:

/bot run --disable-fail-fast

PR_Github #30763 [ run ] triggered by Bot. Commit:

PR_Github #30763 [ run ] completed with state

/bot run

PR_Github #30878 [ run ] triggered by Bot. Commit:

PR_Github #30878 [ run ] completed with state
Background
Currently, validation errors in `_check_arguments()` raise `ValueError`, which causes the engine to shut down when caught by the Dynamo handler instead of returning a proper HTTP 400 error response to the client.

Why This Fix is Needed

- `RequestError` is caught by the executor's error handling system and converted into proper error responses
- This matches the existing validation (e.g., `default_max_tokens`), which already uses `RequestError`
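A minimal sketch of the change, with a simplified signature (the real `_check_arguments` in `llm.py` takes different parameters):

```python
from tensorrt_llm.executor.utils import RequestError


def _check_arguments(prompt_len: int, max_num_tokens: int) -> None:
    # Previously this raised ValueError, which the Dynamo handler treated as
    # fatal; RequestError is instead converted into a per-request error response.
    if prompt_len > max_num_tokens:
        raise RequestError(
            f"prompt length ({prompt_len}) should not exceed "
            f"max_num_tokens ({max_num_tokens})")
```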
Testing

Now returns a proper error response without crashing the engine.
Fixes #9760