
[Bug]: Disgraceful quality of code #8652

@pythonjavaerlang

Description


System Info

CPU: x86_64
RAM: 144 GB
GPU: Nvidia L4, 24 GB
Libraries:
TensorRT-LLM version: 1.2.0rc0
Docker container: nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc0.post1
Nvidia driver: 535.261.03-1

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

Python 3.12.3

pip3 show tensorrt_llm tensorrt torch
Name: tensorrt_llm
Version: 1.2.0rc0
Name: tensorrt
Version: 10.11.0.33
Name: torch
Version: 2.7.1

Who can help?

@Tracin @juney-nvidia @kaiyux

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

tensorrt_llm/runtime/multimodal_model_runner.py:

    def setup_inputs(self, input_text, raw_image, raw_audio=None):
        # In-method imports are discouraged by PEP 8.
        from ..tools.multimodal_builder import compute_rotary_pos_emb
        ...
        elif 'qwen2_vl' in self.model_type:
            # Same problem: more in-method imports.
            from qwen_vl_utils import process_vision_info
            from transformers.models.qwen2_vl.modeling_qwen2_vl import \
                VisionRotaryEmbedding

            messages = [[{
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "image": raw_image[idx],
                    },
                    {
                        "type": "text",
                        "text": input_text[idx],
                    },
                ],
            }] for idx in range(self.args.batch_size)]
            # This always fails for batch_size > 1: load_test_data always
            # returns a single image object, so raw_image[idx] raises
            # IndexError.
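
For what it's worth, a minimal sketch of a guard that would at least fail with an actionable message instead of a bare IndexError. All names are taken from the snippet above; broadcasting a single image across the batch is my suggestion, not current behavior:

    # Hypothetical guard before building `messages`.
    if len(raw_image) < self.args.batch_size:
        if len(raw_image) == 1:
            # Reuse the single loaded image for every batch element.
            raw_image = list(raw_image) * self.args.batch_size
        else:
            raise ValueError(
                f"got {len(raw_image)} images for batch_size="
                f"{self.args.batch_size}; pass one image per batch element")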

    def load_test_data(self, image_path=None, video_path=None):
        ...
        elif "qwen2_vl" in self.model_type:
            images = []  # the list is defined here...
            if self.args.image_path is None:
                img_url = 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'
                image = Image.open(
                    requests.get(img_url, stream=True,
                                 timeout=5).raw).convert('RGB')
                image = image.resize((504, 504))
                images.append(image)
            else:
                # ...and redefined here. This branch reads self.args.image_path,
                # which was fixed once in the constructor:
                #     model = MultimodalModelRunner(Namespace(**AI_MODEL_ARGS))
                # and ignores the arguments actually passed to model.run():
                #     input_text, output_text = model.run(prompts, visual_data, None, max_new_tokens)
                # So if one path is specified in args, this method always
                # returns one item instead of one item per batch element.
                images = []
                for image_path in self.args.image_path:
                    image = Image.open(image_path).convert('RGB')
                    image = image.resize((504, 504))
                    images.append(image)
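
A sketch of how this method could honor its own image_path parameter, falling back to the constructor args only when the caller passes nothing. This is my proposed rewrite, not existing code:

    def load_test_data(self, image_path=None, video_path=None):
        ...
        elif "qwen2_vl" in self.model_type:
            # Prefer the per-call argument over the frozen constructor args.
            paths = image_path or self.args.image_path
            images = []
            if not paths:
                img_url = 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'
                image = Image.open(
                    requests.get(img_url, stream=True,
                                 timeout=5).raw).convert('RGB')
                images.append(image.resize((504, 504)))
            else:
                for path in paths:
                    image = Image.open(path).convert('RGB')
                    images.append(image.resize((504, 504)))
            return images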


    def run(self, input_text, input_image, input_audio, max_new_tokens):
        input_text, pre_prompt, post_prompt, processed_image, decoder_input_ids, \
            other_vision_inputs, other_audio_inputs, other_decoder_inputs = \
            self.setup_inputs(input_text, input_image, input_audio)
        # run() gives the caller no way to pass a sampling_config. Even worse,
        # the project defines two SamplingConfig classes with two different
        # field sets, one in C++ and one in Python, and the Python version
        # does not accept all the arguments the C++ one has, e.g. beam_width.
        output_text = self.generate(pre_prompt,
                                    post_prompt,
                                    processed_image,
                                    decoder_input_ids,
                                    max_new_tokens,
                                    other_vision_inputs=other_vision_inputs,
                                    other_audio_inputs=other_audio_inputs,
                                    other_decoder_inputs=other_decoder_inputs)
        return input_text, output_text
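
A sketch of a run() signature that forwards a caller-supplied config. The sampling_config parameter here is my addition and does not exist in the current signature; generate() would need the matching keyword argument as well:

    def run(self, input_text, input_image, input_audio, max_new_tokens,
            sampling_config=None):
        ...
        # Forward the caller's config instead of forcing the default path.
        output_text = self.generate(pre_prompt,
                                    post_prompt,
                                    processed_image,
                                    decoder_input_ids,
                                    max_new_tokens,
                                    sampling_config=sampling_config,
                                    other_vision_inputs=other_vision_inputs,
                                    other_audio_inputs=other_audio_inputs,
                                    other_decoder_inputs=other_decoder_inputs)
        return input_text, output_text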

    def generate(self,
                 pre_prompt,
                 post_prompt,
                 image,
                 decoder_input_ids,
                 max_new_tokens,
                 other_vision_inputs={},
                 other_audio_inputs={},
                 other_decoder_inputs={}):
        ...
        if sampling_config is None:
            # These two names are only ever bound in this branch:
            sampling_config_list = [None] * batch_size
            use_sampling_config_for_each_request = True
            ...
        else:
            sampling_config = copy.deepcopy(sampling_config)
            # When a sampling config IS passed, sampling_config_list and
            # use_sampling_config_for_each_request are never defined, so the
            # later code that reads them fails with a NameError traceback.
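
A minimal fix sketch, using only the names visible in the excerpt: bind both variables before branching so neither path leaves them undefined.

        # Hypothetical fix: define both names up front.
        sampling_config_list = [None] * batch_size
        use_sampling_config_for_each_request = False
        if sampling_config is None:
            use_sampling_config_for_each_request = True
            ...
        else:
            sampling_config = copy.deepcopy(sampling_config)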

Expected behavior

I expect a $4.5T company like Nvidia to produce better code, so that I would not have to spend a week debugging your bugs.
Publishing code like this is a spit in the face of the open-source community!
Evidently you are not using this code yourselves, since it contains so many bugs. But you expect us to debug it while you make astronomical money.

actual behavior

Nothing in TensorRT-LLM works well. Not a single feature.
I bought an Nvidia L4 for 1600 EUR and I can't use it, because there is no production-ready software!

additional notes

Seriously, you should hire me.
Grok/ChatGPT/Claude and I could refactor your lousy code in no time, to save your face.

My profile in LinkedIn:
https://www.linkedin.com/in/python-java-erlang-ai-ml-developer/

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.
