Description
System Info
Device: Jetson Orin AGX (64GB)
OS / SDK: JetPack 6.2.1, Ubuntu 22.04
TensorRT-LLM: v0.21
Model: MS Phi-4 Multimodal
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Run the Phi-4-multimodal example with the --run_profiling option.
Expected behavior
The model generates only the response to my prompt and then stops.
actual behavior
Regardless of the input prompt, the model always generates exactly max_new_tokens tokens.
After the expected response, extra text is appended (in the run below, the same response repeats until the token limit is reached).
Below is one example.
'The image shows a stunning view of the Marina Bay Sands in Singapore, with the iconic Marina Bay Sands hotel and its two iconic towers, known as the "Eye of the Storm" and the "Infinity Pool," illuminated against a backdrop of a beautiful sunset. In the foreground, there is a majestic stone sculpture of the Chinese dragon, known as the "Lion Dance," which is a symbol of good luck and prosperity. The image also features a tranquil water scene with a fountain, creating a serene and picturesque atmosphere. The overall scene captures the beauty and modernity of Singapore's skyline at dusk.The image shows a stunning view of the Marina Bay Sands in Singapore, with the iconic Marina Bay Sands hotel and its two iconic towers, known as the "Eye of the Storm" and the "Infinity Pool," illuminated against a backdrop of a beautiful sunset. In the foreground, there is a majestic stone sculpture of the Chinese dragon, known as the "Lion Dance," which is a symbol of good luck and prosperity. The image also features a tranquil water scene with a fountain, creating a serene and picturesque atmosphere. The overall scene captures the beauty and modernity of Singapore's skyline at dusk.The image shows a stunning view of the Marina Bay Sands in Singapore, with the iconic Marina Bay Sands hotel and its two iconic towers, known as the "Eye of the Storm" and the "Infinity Pool," illuminated against a backdrop of a beautiful sunset. In the foreground, there is a majestic stone sculpture of the Chinese dragon, known as the "Lion Dance," which is a symbol of good luck and prosperity. The image also features a tranquil water scene with a fountain, creating a serene and picturesque atmosphere. 
The overall scene captures the beauty and modernity of Singapore's skyline at dusk.The image shows a stunning view of the Marina Bay Sands in Singapore, with the iconic Marina Bay Sands hotel and its two iconic towers, known as the "Eye of the Storm" and the "Infinity Pool," illuminated against a backdrop of a beautiful sunset. In the foreground, there is a majestic stone sculpture of the Chinese dragon, known as the "Lion Dance," which is a symbol of good luck and prosperity. The image also features a tranquil water scene with a fountain, creating a serene and picturesque atmosphere. The overall scene captures the beauty and modernity of Singapore's skyline at dusk.The image shows a stunning view of the Marina Bay Sands in Singapore, with the iconic Marina Bay Sands hotel and its two iconic towers, known as the "Eye'
additional notes
Root Cause
In MS Phi-4 Multimodal, when the generation should stop, the model returns the <|USER|> token (200020) instead of the standard EOS token (eos_id=199999).
Currently, the MultimodalModelRunner class does not handle this case, so generation continues until max_new_tokens is reached, causing unnecessary tokens to be produced.
Proposed Fix
In the MultimodalModelRunner class, when using MS Phi-4 Multimodal, add the argument stop_words_list=[[[200020]]] to the self.model.generate call so generation stops correctly when <|USER|> is encountered.
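A minimal sketch of the idea, not the actual patch. The helper names build_stop_words_list and truncate_at_stop are hypothetical and are not part of TensorRT-LLM. The first helper only builds the nested list in the same shape as [[[200020]]] above; the second shows a post-processing fallback that cuts the generated token IDs at the first stop token, which may be useful if the runner cannot be modified. The token IDs are the ones reported in this issue.

```python
from typing import Iterable, List, Sequence

EOS_ID = 199999   # standard EOS id, as reported in this issue
USER_ID = 200020  # <|USER|> id, as reported in this issue


def build_stop_words_list(stop_ids: Iterable[int],
                          batch_size: int = 1) -> List[List[List[int]]]:
    """Hypothetical helper: build a nested [batch][stop word][token] list.

    With stop_ids=[200020] and batch_size=1 this yields [[[200020]]],
    the value proposed above for the stop_words_list argument.
    """
    ids = list(stop_ids)  # materialize once so generators work for every batch entry
    return [[[tid] for tid in ids] for _ in range(batch_size)]


def truncate_at_stop(token_ids: Sequence[int],
                     stop_ids: Iterable[int]) -> List[int]:
    """Hypothetical fallback: drop the first stop token and everything after it."""
    stops = set(stop_ids)
    for i, tid in enumerate(token_ids):
        if tid in stops:
            return list(token_ids[:i])
    # No stop token found: return the sequence unchanged.
    return list(token_ids)


if __name__ == "__main__":
    print(build_stop_words_list([USER_ID]))
    # Example ids: three ordinary tokens, then <|USER|>, then trailing tokens.
    print(truncate_at_stop([11, 22, 33, USER_ID, 44, 55], [EOS_ID, USER_ID]))
```

Passing the stop list to generate ends decoding early and saves the wasted compute; truncating afterwards only cleans up the text and still pays for all max_new_tokens steps.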
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.