Description
Basically the title: across a few different versions of llgtrt & TensorRT-LLM (0.17.0 to 0.20.0.rc0), Llama 3.2 11B Vision Instruct fails on text-only prompts with the error below:
ERROR [llgtrt::routes::completions] received error message (rest): Encountered an error in forwardAsync function: GenericLlmRequest::getEncoderInputLen - Do not have encoder length! (/home/jenkins/agent/workspace/LLM/helpers/Build-x86_64/llm/cpp/include/tensorrt_llm/batch_manager/llmRequest.h:716)
1 0x7f96cf54dc4a /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x6eac4a) [0x7f96cf54dc4a]
2 0x7f96d023fa1b tensorrt_llm::batch_manager::TrtGptModelInflightBatching::forwardAsync(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&) + 1451
3 0x7f96d02cc775 tensorrt_llm::executor::Executor::Impl::forwardAsync(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&) + 437
4 0x7f96d02d84e6 tensorrt_llm::executor::Executor::Impl::executionLoop() + 1206
5 0x7f96b9e6edb4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7f96b9e6edb4]
6 0x7f96b924eaa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7f96b924eaa4]
7 0x7f96b92dbc3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7f96b92dbc3c]
Text-plus-vision prompts work as expected, so I didn't notice any issue for a couple of days of testing. According to @JC1DA this should work, hence filing a bug.
Sample prompt:
curl --location 'http://127.0.0.1:3000/v1/completions' \
--header 'Content-Type: application/json' \
--data '{
"prompt": "give me a random fact in detail about math",
"temperature": 1,
"stream": false,
"max_tokens": 200,
"model": "Llama-3.2-11B-Vision-Instruct"
}'
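For scripted reproduction, here is the same text-only request as a Python sketch (standard library only; the endpoint URL and payload fields are taken from the curl example above, everything else is an assumption about your setup):

```python
import json
import urllib.request


def build_payload(prompt: str) -> dict:
    # Same text-only payload as the curl example; no image content,
    # which is what appears to trigger the getEncoderInputLen error.
    return {
        "prompt": prompt,
        "temperature": 1,
        "stream": False,
        "max_tokens": 200,
        "model": "Llama-3.2-11B-Vision-Instruct",
    }


def send_completion(url: str, payload: dict) -> dict:
    # POST the JSON payload to the llgtrt completions endpoint.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example (requires a running llgtrt server):
#   body = build_payload("give me a random fact in detail about math")
#   send_completion("http://127.0.0.1:3000/v1/completions", body)
```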
Build commands for the model:
python3 /opt/TensorRT-LLM-examples/models/core/mllama/convert_checkpoint.py --model_dir /models/Llama-3.2-11B-Vision-Instruct/ --output_dir /models/11-ckpt --dtype bfloat16
python3 -m tensorrt_llm.commands.build --checkpoint_dir /models/11-ckpt --output_dir /models/11-trt --max_num_tokens 4096 --max_seq_len 2048 --workers 1 --gemm_plugin auto --max_batch_size 128 --max_encoder_input_len 6404 --input_timing_cache model.cache --use_paged_context_fmha enable
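For repeatable builds, the two steps above can be wrapped in a small script; this is a hedged sketch that only assembles the same argument lists as the commands in the report (the paths are the ones used above and should be adjusted for your layout):

```python
import subprocess

# Paths mirror the commands in the report; adjust for your layout (assumption).
MODEL_DIR = "/models/Llama-3.2-11B-Vision-Instruct/"
CKPT_DIR = "/models/11-ckpt"
ENGINE_DIR = "/models/11-trt"


def convert_cmd() -> list:
    # Step 1: convert the HF checkpoint to a TensorRT-LLM checkpoint.
    return [
        "python3",
        "/opt/TensorRT-LLM-examples/models/core/mllama/convert_checkpoint.py",
        "--model_dir", MODEL_DIR,
        "--output_dir", CKPT_DIR,
        "--dtype", "bfloat16",
    ]


def build_cmd() -> list:
    # Step 2: build the engine with the same limits as in the report.
    return [
        "python3", "-m", "tensorrt_llm.commands.build",
        "--checkpoint_dir", CKPT_DIR,
        "--output_dir", ENGINE_DIR,
        "--max_num_tokens", "4096",
        "--max_seq_len", "2048",
        "--workers", "1",
        "--gemm_plugin", "auto",
        "--max_batch_size", "128",
        "--max_encoder_input_len", "6404",
        "--input_timing_cache", "model.cache",
        "--use_paged_context_fmha", "enable",
    ]


# Example (requires TensorRT-LLM installed):
#   subprocess.run(convert_cmd(), check=True)
#   subprocess.run(build_cmd(), check=True)
```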
Lmk what other info might be needed.