System Info
CPU: x86_64
RAM: 144 GB
GPU: Nvidia L4, 24 GB
Libraries:
TensorRT-LLM version: 1.2.0rc0
Docker container: nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc0.post1
Nvidia driver: 535.261.03-1
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
Python 3.12.3
pip3 show tensorrt_llm tensorrt torch
Name: tensorrt_llm
Version: 1.2.0rc0
Name: tensorrt
Version: 10.11.0.33
Name: torch
Version: 2.7.1
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
tensorrt_llm/runtime/multimodal_model_runner.py:
def setup_inputs(self, input_text, raw_image, raw_audio=None):
from ..tools.multimodal_builder import compute_rotary_pos_emb **# in-method imports are discouraged by PEP 8**
elif 'qwen2_vl' in self.model_type:
**# same problem, another in-method import:**
from qwen_vl_utils import process_vision_info
from transformers.models.qwen2_vl.modeling_qwen2_vl import \
VisionRotaryEmbedding
messages = [[{
"role":
"user",
"content": [
{
"type": "image",
"image": raw_image[idx],
},
{
"type": "text",
"text": input_text[idx],
},
],
}] for idx in range(self.args.batch_size)] **# This fails as soon as batch_size > 1: load_test_data always returns a single image object, so raw_image[idx] raises an IndexError.**
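A minimal standalone sketch of the failure pattern (the names and values here are illustrative placeholders, not the real TensorRT-LLM objects): batch_size prompts are combined with the single image that load_test_data returned, so the comprehension indexes past the end of the list:

```python
# Illustrative repro of the IndexError, with placeholder data instead of the real runner.
batch_size = 2
input_text = ["describe the image"] * batch_size
raw_image = ["<single image object from load_test_data>"]  # only one element

messages = [[{
    "role": "user",
    "content": [
        {"type": "image", "image": raw_image[idx]},  # IndexError as soon as idx == 1
        {"type": "text", "text": input_text[idx]},
    ],
}] for idx in range(batch_size)]
```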
def load_test_data(self, image_path=None, video_path=None):
....
elif "qwen2_vl" in self.model_type:
images = []  # the list is first defined here
if self.args.image_path is None:
img_url = 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'
image = Image.open(
requests.get(img_url, stream=True,
timeout=5).raw).convert('RGB')
image = image.resize((504, 504))
images.append(image)
else:
images = [] **# then the list is redefined here, and the loop below reads self.args.image_path,
# i.e. the paths passed to the constructor: model = MultimodalModelRunner(Namespace(**AI_MODEL_ARGS)),
# not the visual data I actually pass to model.run():
#   input_text, output_text = model.run(prompts, visual_data, None, max_new_tokens)
# So if a single path is given in args, this method always returns one item instead of batch_size items.**
for image_path in self.args.image_path:
image = Image.open(image_path).convert('RGB')
image = image.resize((504, 504))
images.append(image)
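One possible fix, sketched as a standalone helper rather than the upstream patch: load one image per batch element, repeating the last path when fewer paths than batch elements are given. The helper name and the repeat-last-path policy are my assumptions.

```python
# Sketch of an assumed fix (not the upstream code): produce exactly batch_size images
# so setup_inputs can safely index raw_image[idx].
from PIL import Image


def load_images_for_batch(image_paths, batch_size, size=(504, 504)):
    """Open, convert and resize one image per batch element."""
    if not image_paths:
        raise ValueError("at least one image path is required")
    images = []
    for idx in range(batch_size):
        path = image_paths[min(idx, len(image_paths) - 1)]  # repeat last path if short
        images.append(Image.open(path).convert("RGB").resize(size))
    return images
```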
def run(self, input_text, input_image, input_audio, max_new_tokens):
input_text, pre_prompt, post_prompt, processed_image, decoder_input_ids, other_vision_inputs, other_audio_inputs, other_decoder_inputs = self.setup_inputs(
input_text, input_image, input_audio)
**# run() does not allow passing a sampling_config, and even worse, the project defines two different SamplingConfig classes with different field sets: one in C++ and one in Python. The Python version does not accept all the arguments the C++ one has, for example beam_width.**
output_text = self.generate(pre_prompt,
post_prompt,
processed_image,
decoder_input_ids,
max_new_tokens,
other_vision_inputs=other_vision_inputs,
other_audio_inputs=other_audio_inputs,
other_decoder_inputs=other_decoder_inputs
)
return input_text, output_text
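For reference, this is the plumbing the report is asking for, sketched with hypothetical names rather than the actual MultimodalModelRunner API: an optional sampling_config that run() simply forwards to generate(), with generate() falling back to its current defaults only when nothing is passed.

```python
# Hypothetical pass-through sketch; RunnerSketch and its default values are
# placeholders, not the TensorRT-LLM classes.
class RunnerSketch:
    def run(self, input_text, input_image, input_audio, max_new_tokens,
            sampling_config=None):
        # ... setup_inputs(...) would run here, as in the real runner ...
        return self.generate(input_text, max_new_tokens,
                             sampling_config=sampling_config)

    def generate(self, prompts, max_new_tokens, sampling_config=None):
        # Fall back to a default config only when the caller passed nothing.
        cfg = sampling_config or {"num_beams": 1, "temperature": 1.0}
        return [f"<decoded with {cfg}, max_new_tokens={max_new_tokens}>"
                for _ in prompts]
```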
def generate(self,
pre_prompt,
post_prompt,
image,
decoder_input_ids,
max_new_tokens,
other_vision_inputs={},
other_audio_inputs={},
other_decoder_inputs={}):
...
if sampling_config is None:
**# Here both variables are defined:**
sampling_config_list = [None] * batch_size
use_sampling_config_for_each_request = True
...
else:
sampling_config = copy.deepcopy(sampling_config)
**# But when a sampling_config is passed (this else branch), sampling_config_list and use_sampling_config_for_each_request are never defined, so the later code fails with a NameError.**
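A sketch of one way to avoid the NameError (an assumed fix, not the upstream patch): initialise both variables on every branch before the later code reads them.

```python
import copy


def _build_sampling_configs(sampling_config, batch_size):
    """Assumed fix sketch: both values are defined whichever branch is taken."""
    if sampling_config is None:
        sampling_config_list = [None] * batch_size
        use_sampling_config_for_each_request = True
    else:
        sampling_config = copy.deepcopy(sampling_config)
        sampling_config_list = [sampling_config] * batch_size
        use_sampling_config_for_each_request = False
    return sampling_config_list, use_sampling_config_for_each_request
```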
Expected behavior
I expect a $4.5T company like Nvidia to produce better code, so that I do not have to spend a week debugging your bugs.
Publishing code like this is a slap in the face of the open-source community!
Evidently you are not using this code yourselves, as it contains so many bugs, yet you expect us to debug it while you make astronomical money.
actual behavior
Nothing in TensorRT-LLM works well. Not a single feature.
I bought an Nvidia L4 for 1600 EUR and I cannot use it, because there is no production-ready software!
additional notes
Seriously, you should hire me.
Grok/ChatGPT/Claude and I could refactor your lousy code in no time and save your face.
My LinkedIn profile:
https://www.linkedin.com/in/python-java-erlang-ai-ml-developer/
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.