Error with runner.generate in TensorRT-LLM 0.14.0 for Qwen Example #2452

@tedqu

Environment

•	Docker Image: nvcr.io/nvidia/tritonserver:24.10-trtllm-python-py3
•	TensorRT-LLM Version: 0.14.0
•	Run Command:

python3 ../run.py \
    --input_text "你好,请问你叫什么?" \
    --max_output_len=50 \
    --tokenizer_dir /data/models/Qwen1.5-7B-Chat/ \
    --engine_dir ./tmp/qwen/7B/trt_engines/fp16/1-gpu/

•	Example Code: examples/qwen/run.py (as described in the README)

Description

While running the run.py script as described in the README of the examples/qwen/ directory, the following error occurs when runner.generate is invoked:

Error Traceback

Traceback (most recent call last):
File "/triton/TensorRT-LLM-release-0.14/examples/qwen/../run.py", line 887, in <module>
main(args)
File "/triton/TensorRT-LLM-release-0.14/examples/qwen/../run.py", line 711, in main
outputs = runner.generate(
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 624, in generate
requests = [
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 625, in <listcomp>
trtllm.Request(
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. tensorrt_llm.bindings.executor.Request(input_token_ids: list[int], *, max_tokens: Optional[int] = None, max_new_tokens: Optional[int] = None, streaming: bool = False, sampling_config: tensorrt_llm.bindings.executor.SamplingConfig = SamplingConfig(), output_config: tensorrt_llm.bindings.executor.OutputConfig = OutputConfig(), end_id: Optional[int] = None, pad_id: Optional[int] = None, position_ids: Optional[list[int]] = None, bad_words: Optional[list[list[int]]] = None, stop_words: Optional[list[list[int]]] = None, embedding_bias: Optional[torch.Tensor] = None, external_draft_tokens_config: Optional[tensorrt_llm.bindings.executor.ExternalDraftTokensConfig] = None, prompt_tuning_config: Optional[tensorrt_llm.bindings.executor.PromptTuningConfig] = None, lora_config: Optional[tensorrt_llm.bindings.executor.LoraConfig] = None, lookahead_config: Optional[tensorrt_llm.bindings.executor.LookaheadDecodingConfig] = None, logits_post_processor_name: Optional[str] = None, encoder_input_token_ids: Optional[list[int]] = None, client_id: Optional[int] = None, return_all_generated_tokens: bool = False, priority: float = 0.5, type: tensorrt_llm.bindings.executor.RequestType = RequestType.REQUEST_TYPE_CONTEXT_AND_GENERATION, context_phase_params: Optional[tensorrt_llm.bindings.executor.ContextPhaseParams] = None, encoder_input_features: Optional[torch.Tensor] = None, encoder_output_length: Optional[int] = None, num_return_sequences: int = 1)

Invoked with: kwargs:
input_token_ids=[151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 108386, 37945, 56007, 56568, 99882, 99245, 11319, 151645, 198, 151644, 77091, 198],
encoder_input_token_ids=None,
encoder_output_length=None,
encoder_input_features=None,
position_ids=None,
max_tokens=50,
num_return_sequences=None,
pad_id=151643,
end_id=151645,
stop_words=None,
bad_words=None,
sampling_config=<tensorrt_llm.bindings.executor.SamplingConfig object at 0x7f000502f830>,
lookahead_config=None,
streaming=False,
output_config=<tensorrt_llm.bindings.executor.OutputConfig object at 0x7f0001cca270>,
prompt_tuning_config=None,
lora_config=None,
return_all_generated_tokens=False,
logits_post_processor_name=None,
external_draft_tokens_config=None

Additional Context

The engine and tokenizer paths are configured as follows:
• --tokenizer_dir: /data/models/Qwen1.5-7B-Chat/
• --engine_dir: ./tmp/qwen/7B/trt_engines/fp16/1-gpu/

The engine appears to load successfully, as indicated by the log output:

[TensorRT-LLM][INFO] Engine version 0.14.0 found in the config file, assuming engine(s) built by new builder API.
...
[11/18/2024-02:33:18] [TRT-LLM] [I] Load engine takes: 12.188158512115479 sec

However, the error indicates an argument-type mismatch in the tensorrt_llm.bindings.executor.Request constructor. Comparing the invoked kwargs against the supported signature, num_return_sequences=None stands out: the binding declares num_return_sequences as a plain int with default 1, not Optional[int], so passing None triggers the TypeError.
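As a hedged sketch of a possible workaround (not an official fix), the caller could drop None-valued kwargs before constructing the Request, so that strict (non-Optional) binding parameters such as num_return_sequences fall back to their pybind11 defaults. The helper name drop_none_kwargs and the trimmed kwargs dict below are hypothetical, for illustration only:

```python
# Hypothetical sketch: filter out None-valued kwargs before calling a strict
# pybind11 constructor such as tensorrt_llm.bindings.executor.Request, whose
# num_return_sequences parameter is a plain int (default 1), not Optional[int].

def drop_none_kwargs(kwargs):
    """Return a copy of kwargs without None values, so non-Optional
    binding parameters use their declared defaults instead."""
    return {k: v for k, v in kwargs.items() if v is not None}

# Abbreviated version of the kwargs shown in the traceback above.
request_kwargs = {
    "input_token_ids": [151644, 8948],
    "max_tokens": 50,
    "end_id": 151645,
    "pad_id": 151643,
    "num_return_sequences": None,  # None is rejected by the int parameter
}

cleaned = drop_none_kwargs(request_kwargs)
# "num_return_sequences" is removed, so the binding default (1) would apply:
# trtllm.Request(**cleaned)
```

This only illustrates the shape of the mismatch; the proper fix would belong in model_runner_cpp.py (or an updated TensorRT-LLM release) rather than in user code.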

If more logs or information are needed, please let me know! Thank you!

Labels: triaged (issue has been triaged by maintainers)