Environment
• Docker Image: nvcr.io/nvidia/tritonserver:24.10-trtllm-python-py3
• TensorRT-LLM Version: 0.14.0
• Run Command:
python3 ../run.py \
    --input_text "你好,请问你叫什么?" \
    --max_output_len=50 \
    --tokenizer_dir /data/models/Qwen1.5-7B-Chat/ \
    --engine_dir ./tmp/qwen/7B/trt_engines/fp16/1-gpu/
• Example Code: examples/qwen/run.py (from the README)
Description
While running the run.py script as described in the README of the examples/qwen/ directory, the following error occurs when runner.generate is invoked:
Error Traceback
Traceback (most recent call last):
File "/triton/TensorRT-LLM-release-0.14/examples/qwen/../run.py", line 887, in <module>
main(args)
File "/triton/TensorRT-LLM-release-0.14/examples/qwen/../run.py", line 711, in main
outputs = runner.generate(
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 624, in generate
requests = [
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 625, in <listcomp>
trtllm.Request(
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. tensorrt_llm.bindings.executor.Request(input_token_ids: list[int], *, max_tokens: Optional[int] = None, max_new_tokens: Optional[int] = None, streaming: bool = False, sampling_config: tensorrt_llm.bindings.executor.SamplingConfig = SamplingConfig(), output_config: tensorrt_llm.bindings.executor.OutputConfig = OutputConfig(), end_id: Optional[int] = None, pad_id: Optional[int] = None, position_ids: Optional[list[int]] = None, bad_words: Optional[list[list[int]]] = None, stop_words: Optional[list[list[int]]] = None, embedding_bias: Optional[torch.Tensor] = None, external_draft_tokens_config: Optional[tensorrt_llm.bindings.executor.ExternalDraftTokensConfig] = None, prompt_tuning_config: Optional[tensorrt_llm.bindings.executor.PromptTuningConfig] = None, lora_config: Optional[tensorrt_llm.bindings.executor.LoraConfig] = None, lookahead_config: Optional[tensorrt_llm.bindings.executor.LookaheadDecodingConfig] = None, logits_post_processor_name: Optional[str] = None, encoder_input_token_ids: Optional[list[int]] = None, client_id: Optional[int] = None, return_all_generated_tokens: bool = False, priority: float = 0.5, type: tensorrt_llm.bindings.executor.RequestType = RequestType.REQUEST_TYPE_CONTEXT_AND_GENERATION, context_phase_params: Optional[tensorrt_llm.bindings.executor.ContextPhaseParams] = None, encoder_input_features: Optional[torch.Tensor] = None, encoder_output_length: Optional[int] = None, num_return_sequences: int = 1)
Invoked with: kwargs:
input_token_ids=[151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 108386, 37945, 56007, 56568, 99882, 99245, 11319, 151645, 198, 151644, 77091, 198],
encoder_input_token_ids=None,
encoder_output_length=None,
encoder_input_features=None,
position_ids=None,
max_tokens=50,
num_return_sequences=None,
pad_id=151643,
end_id=151645,
stop_words=None,
bad_words=None,
sampling_config=<tensorrt_llm.bindings.executor.SamplingConfig object at 0x7f000502f830>,
lookahead_config=None,
streaming=False,
output_config=<tensorrt_llm.bindings.executor.OutputConfig object at 0x7f0001cca270>,
prompt_tuning_config=None,
lora_config=None,
return_all_generated_tokens=False,
logits_post_processor_name=None,
external_draft_tokens_config=None
Additional Context
The engine and tokenizer paths are configured as follows:
• --tokenizer_dir: /data/models/Qwen1.5-7B-Chat/
• --engine_dir: ./tmp/qwen/7B/trt_engines/fp16/1-gpu/
The engine appears to load successfully, as indicated by the log output:
[TensorRT-LLM][INFO] Engine version 0.14.0 found in the config file, assuming engine(s) built by new builder API.
...
[11/18/2024-02:33:18] [TRT-LLM] [I] Load engine takes: 12.188158512115479 sec
However, the error indicates an argument-type mismatch when constructing tensorrt_llm.bindings.executor.Request. Comparing the invocation against the accepted signature, the one kwarg that does not match is num_return_sequences=None: the binding declares num_return_sequences: int = 1 (not Optional), so passing None is rejected by the pybind11 constructor.
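For reference, a possible local workaround (just a sketch, not an official fix; the helper name drop_none_kwargs is made up here) is to strip None-valued kwargs before constructing the request, so the bound defaults apply instead. This is safe because every Optional parameter in the signature already defaults to None:

```python
# Sketch: pybind11-bound constructors reject None for parameters typed
# as plain int (e.g. num_return_sequences: int = 1). Dropping
# None-valued kwargs lets the binding's own defaults take effect.
def drop_none_kwargs(kwargs):
    """Return a copy of kwargs with all None-valued entries removed."""
    return {k: v for k, v in kwargs.items() if v is not None}

# Hypothetical usage inside model_runner_cpp.py's generate():
#   request = trtllm.Request(**drop_none_kwargs(request_kwargs))
kwargs = {
    "input_token_ids": [151644, 8948, 198],
    "max_tokens": 50,
    "num_return_sequences": None,  # the offending kwarg from the traceback
}
print(drop_none_kwargs(kwargs))
```

Passing an explicit integer (e.g. num_return_sequences=1) from the caller would work equally well if patching the library is undesirable.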
If more logs or information are needed, please let me know! Thank you!