
Text-only prompts with Llama 3.2 11B Vision-Instruct gives TensorRT-LLM error #32

@natehofmann

Description


Basically the title: across a few different versions of llgtrt & TensorRT-LLM (0.17.0 to 0.20.0.rc0) with Llama 3.2 11B Vision-Instruct, text-only prompts produce the error below:

ERROR [llgtrt::routes::completions] received error message (rest): Encountered an error in forwardAsync function: GenericLlmRequest::getEncoderInputLen - Do not have encoder length! (/home/jenkins/agent/workspace/LLM/helpers/Build-x86_64/llm/cpp/include/tensorrt_llm/batch_manager/llmRequest.h:716)
1 0x7f96cf54dc4a /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x6eac4a) [0x7f96cf54dc4a]
2 0x7f96d023fa1b tensorrt_llm::batch_manager::TrtGptModelInflightBatching::forwardAsync(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&) + 1451
3 0x7f96d02cc775 tensorrt_llm::executor::Executor::Impl::forwardAsync(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&) + 437
4 0x7f96d02d84e6 tensorrt_llm::executor::Executor::Impl::executionLoop() + 1206
5 0x7f96b9e6edb4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7f96b9e6edb4]
6 0x7f96b924eaa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7f96b924eaa4]
7 0x7f96b92dbc3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7f96b92dbc3c]

Text-vision prompts work as expected, so I didn't notice the issue during a couple of days of testing. According to @JC1DA text-only prompts should work, hence filing a bug.

sample prompt:

curl --location 'http://127.0.0.1:3000/v1/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt": "give me a random fact in detail about math",
    "temperature": 1,
    "stream": false,
    "max_tokens": 200,
    "model": "Llama-3.2-11B-Vision-Instruct"
  }'
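
The same failing request can also be reproduced from Python, which is handy for scripting repeated runs. This is a minimal sketch assuming the llgtrt server above is listening on 127.0.0.1:3000; it uses only the standard library and sends exactly the payload from the curl example.

```python
# Minimal reproduction of the failing text-only completion request.
# Assumes the llgtrt server from this report is running on 127.0.0.1:3000.
import json
import urllib.request

# Same body as the curl example above.
payload = {
    "prompt": "give me a random fact in detail about math",
    "temperature": 1,
    "stream": False,
    "max_tokens": 200,
    "model": "Llama-3.2-11B-Vision-Instruct",
}

def send_completion(url="http://127.0.0.1:3000/v1/completions"):
    """POST the text-only prompt and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # With the affected versions this triggers the
    # GenericLlmRequest::getEncoderInputLen error server-side.
    print(send_completion())
```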

build commands for model:
python3 /opt/TensorRT-LLM-examples/models/core/mllama/convert_checkpoint.py --model_dir /models/Llama-3.2-11B-Vision-Instruct/ --output_dir /models/11-ckpt --dtype bfloat16

python3 -m tensorrt_llm.commands.build --checkpoint_dir /models/11-ckpt --output_dir /models/11-trt --max_num_tokens 4096 --max_seq_len 2048 --workers 1 --gemm_plugin auto --max_batch_size 128 --max_encoder_input_len 6404 --input_timing_cache model.cache --use_paged_context_fmha enable

Let me know what other info might be needed.
