System Info
CPU: x86_64
RAM: 144 GB
GPU: Nvidia L4, 24 GB
Libraries:
TensorRT-LLM version: 1.2.0rc0
Docker container: nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc0.post1
Nvidia driver: 535.261.03-1
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
Python 3.12.3
pip3 show tensorrt_llm tensorrt torch
Name: tensorrt_llm
Version: 1.2.0rc0
Name: tensorrt
Version: 10.11.0.33
Name: torch
Version: 2.7.1
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
tensorrt_llm/runtime/multimodal_model_runner.py:
def setup_inputs(self, input_text, raw_image, raw_audio=None):
from ..tools.multimodal_builder import compute_rotary_pos_emb **# in-method imports are discouraged by PEP 8**
elif 'qwen2_vl' in self.model_type:
**# same problem, another in-method import:**
from qwen_vl_utils import process_vision_info
from transformers.models.qwen2_vl.modeling_qwen2_vl import \
VisionRotaryEmbedding
messages = [[{
"role":
"user",
"content": [
{
"type": "image",
"image": raw_image[idx],
},
{
"type": "text",
"text": input_text[idx],
},
],
}] for idx in range(self.args.batch_size)] **# This fails as soon as batch_size > 1: load_test_data always returns a single image object, so raw_image[idx] raises an IndexError.**
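A minimal standalone sketch of the failure pattern (the names and values here are illustrative placeholders, not the real TensorRT-LLM objects): batch_size prompts are combined with the single image that load_test_data returned, so the comprehension indexes past the end of the list:

```python
# Illustrative repro of the IndexError, with placeholder data instead of the real runner.
batch_size = 2
input_text = ["describe the image"] * batch_size
raw_image = ["<single image object from load_test_data>"]  # only one element

messages = [[{
    "role": "user",
    "content": [
        {"type": "image", "image": raw_image[idx]},  # IndexError as soon as idx == 1
        {"type": "text", "text": input_text[idx]},
    ],
}] for idx in range(batch_size)]
```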
def load_test_data(self, image_path=None, video_path=None):
....
elif "qwen2_vl" in self.model_type:
images = []  # the list is first defined here
if self.args.image_path is None:
img_url = 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'
image = Image.open(
requests.get(img_url, stream=True,
timeout=5).raw).convert('RGB')
image = image.resize((504, 504))
images.append(image)
else:
images = [] **# then the list is redefined here, and the loop below reads self.args.image_path,
# i.e. the paths passed to the constructor: model = MultimodalModelRunner(Namespace(**AI_MODEL_ARGS)),
# not the visual data I actually pass to model.run():
#   input_text, output_text = model.run(prompts, visual_data, None, max_new_tokens)
# So if a single path is given in args, this method always returns one item instead of batch_size items.**
for image_path in self.args.image_path:
image = Image.open(image_path).convert('RGB')
image = image.resize((504, 504))
images.append(image)
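One possible fix, sketched as a standalone helper rather than the upstream patch: load one image per batch element, repeating the last path when fewer paths than batch elements are given. The helper name and the repeat-last-path policy are my assumptions.

```python
# Sketch of an assumed fix (not the upstream code): produce exactly batch_size images
# so setup_inputs can safely index raw_image[idx].
from PIL import Image


def load_images_for_batch(image_paths, batch_size, size=(504, 504)):
    """Open, convert and resize one image per batch element."""
    if not image_paths:
        raise ValueError("at least one image path is required")
    images = []
    for idx in range(batch_size):
        path = image_paths[min(idx, len(image_paths) - 1)]  # repeat last path if short
        images.append(Image.open(path).convert("RGB").resize(size))
    return images
```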
def run(self, input_text, input_image, input_audio, max_new_tokens):
input_text, pre_prompt, post_prompt, processed_image, decoder_input_ids, other_vision_inputs, other_audio_inputs, other_decoder_inputs = self.setup_inputs(
input_text, input_image, input_audio)
**# run() does not allow passing a sampling_config, and even worse, the project defines two different SamplingConfig classes with different field sets: one in C++ and one in Python. The Python version does not accept all the arguments the C++ one has, for example beam_width.**
output_text = self.generate(pre_prompt,
post_prompt,
processed_image,
decoder_input_ids,
max_new_tokens,
other_vision_inputs=other_vision_inputs,
other_audio_inputs=other_audio_inputs,
other_decoder_inputs=other_decoder_inputs
)
return input_text, output_text
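For reference, this is the plumbing the report is asking for, sketched with hypothetical names rather than the actual MultimodalModelRunner API: an optional sampling_config that run() simply forwards to generate(), with generate() falling back to its current defaults only when nothing is passed.

```python
# Hypothetical pass-through sketch; RunnerSketch and its default values are
# placeholders, not the TensorRT-LLM classes.
class RunnerSketch:
    def run(self, input_text, input_image, input_audio, max_new_tokens,
            sampling_config=None):
        # ... setup_inputs(...) would run here, as in the real runner ...
        return self.generate(input_text, max_new_tokens,
                             sampling_config=sampling_config)

    def generate(self, prompts, max_new_tokens, sampling_config=None):
        # Fall back to a default config only when the caller passed nothing.
        cfg = sampling_config or {"num_beams": 1, "temperature": 1.0}
        return [f"<decoded with {cfg}, max_new_tokens={max_new_tokens}>"
                for _ in prompts]
```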
def generate(self,
pre_prompt,
post_prompt,
image,
decoder_input_ids,
max_new_tokens,
other_vision_inputs={},
other_audio_inputs={},
other_decoder_inputs={}):
...
if sampling_config is None:
**# Here both variables are defined:**
sampling_config_list = [None] * batch_size
use_sampling_config_for_each_request = True
...
else:
sampling_config = copy.deepcopy(sampling_config)
**# But when a sampling_config is passed (this else branch), sampling_config_list and use_sampling_config_for_each_request are never defined, so the later code fails with a NameError.**
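A sketch of one way to avoid the NameError (an assumed fix, not the upstream patch): initialise both variables on every branch before the later code reads them.

```python
import copy


def _build_sampling_configs(sampling_config, batch_size):
    """Assumed fix sketch: both values are defined whichever branch is taken."""
    if sampling_config is None:
        sampling_config_list = [None] * batch_size
        use_sampling_config_for_each_request = True
    else:
        sampling_config = copy.deepcopy(sampling_config)
        sampling_config_list = [sampling_config] * batch_size
        use_sampling_config_for_each_request = False
    return sampling_config_list, use_sampling_config_for_each_request
```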
Expected behavior
I expect a $4.5T company like Nvidia to produce better code, so that I do not have to spend a week debugging your bugs.
Publishing code like this is a slap in the face of the open-source community!
Evidently you are not using this code yourselves, as it contains so many bugs, yet you expect us to debug it while you make astronomical money.
actual behavior
Nothing in TensorRT-LLM works well. Not a single feature.
I bought an Nvidia L4 for 1600 EUR and I cannot use it, because there is no production-ready software!
additional notes
Seriously, you should hire me.
Grok/ChatGPT/Claude and I could refactor your lousy code in no time and save your face.
My LinkedIn profile:
https://www.linkedin.com/in/python-java-erlang-ai-ml-developer/
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.