Description
Basically the title: across a few different versions of llgtrt & TensorRT-LLM (0.17.0 to 0.20.0.rc0), Llama 3.2 11B Vision Instruct fails on text-only prompts with the error below:
ERROR [llgtrt::routes::completions] received error message (rest): Encountered an error in forwardAsync function: GenericLlmRequest::getEncoderInputLen - Do not have encoder length! (/home/jenkins/agent/workspace/LLM/helpers/Build-x86_64/llm/cpp/include/tensorrt_llm/batch_manager/llmRequest.h:716)
1 0x7f96cf54dc4a /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x6eac4a) [0x7f96cf54dc4a]
2 0x7f96d023fa1b tensorrt_llm::batch_manager::TrtGptModelInflightBatching::forwardAsync(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&) + 1451
3 0x7f96d02cc775 tensorrt_llm::executor::Executor::Impl::forwardAsync(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&) + 437
4 0x7f96d02d84e6 tensorrt_llm::executor::Executor::Impl::executionLoop() + 1206
5 0x7f96b9e6edb4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7f96b9e6edb4]
6 0x7f96b924eaa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7f96b924eaa4]
7 0x7f96b92dbc3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7f96b92dbc3c]
Text-plus-vision prompts work as expected, so I didn't notice any issue for a couple of days of testing. According to @JC1DA this should work, hence filing a bug.
Sample prompt:
curl --location 'http://127.0.0.1:3000/v1/completions' \
--header 'Content-Type: application/json' \
--data '{
"prompt": "give me a random fact in detail about math",
"temperature": 1,
"stream": false,
"max_tokens": 200,
"model": "Llama-3.2-11B-Vision-Instruct"
}'
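For scripted reproduction, here is the same text-only request as a Python sketch (standard library only; the endpoint URL and payload fields are taken from the curl example above, everything else is an assumption about your setup):

```python
import json
import urllib.request


def build_payload(prompt: str) -> dict:
    # Same text-only payload as the curl example; no image content,
    # which is what appears to trigger the getEncoderInputLen error.
    return {
        "prompt": prompt,
        "temperature": 1,
        "stream": False,
        "max_tokens": 200,
        "model": "Llama-3.2-11B-Vision-Instruct",
    }


def send_completion(url: str, payload: dict) -> dict:
    # POST the JSON payload to the llgtrt completions endpoint.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example (requires a running llgtrt server):
#   body = build_payload("give me a random fact in detail about math")
#   send_completion("http://127.0.0.1:3000/v1/completions", body)
```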
Build commands for the model:
python3 /opt/TensorRT-LLM-examples/models/core/mllama/convert_checkpoint.py --model_dir /models/Llama-3.2-11B-Vision-Instruct/ --output_dir /models/11-ckpt --dtype bfloat16
python3 -m tensorrt_llm.commands.build --checkpoint_dir /models/11-ckpt --output_dir /models/11-trt --max_num_tokens 4096 --max_seq_len 2048 --workers 1 --gemm_plugin auto --max_batch_size 128 --max_encoder_input_len 6404 --input_timing_cache model.cache --use_paged_context_fmha enable
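For repeatable builds, the two steps above can be wrapped in a small script; this is a hedged sketch that only assembles the same argument lists as the commands in the report (the paths are the ones used above and should be adjusted for your layout):

```python
import subprocess

# Paths mirror the commands in the report; adjust for your layout (assumption).
MODEL_DIR = "/models/Llama-3.2-11B-Vision-Instruct/"
CKPT_DIR = "/models/11-ckpt"
ENGINE_DIR = "/models/11-trt"


def convert_cmd() -> list:
    # Step 1: convert the HF checkpoint to a TensorRT-LLM checkpoint.
    return [
        "python3",
        "/opt/TensorRT-LLM-examples/models/core/mllama/convert_checkpoint.py",
        "--model_dir", MODEL_DIR,
        "--output_dir", CKPT_DIR,
        "--dtype", "bfloat16",
    ]


def build_cmd() -> list:
    # Step 2: build the engine with the same limits as in the report.
    return [
        "python3", "-m", "tensorrt_llm.commands.build",
        "--checkpoint_dir", CKPT_DIR,
        "--output_dir", ENGINE_DIR,
        "--max_num_tokens", "4096",
        "--max_seq_len", "2048",
        "--workers", "1",
        "--gemm_plugin", "auto",
        "--max_batch_size", "128",
        "--max_encoder_input_len", "6404",
        "--input_timing_cache", "model.cache",
        "--use_paged_context_fmha", "enable",
    ]


# Example (requires TensorRT-LLM installed):
#   subprocess.run(convert_cmd(), check=True)
#   subprocess.run(build_cmd(), check=True)
```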
Lmk what other info might be needed.