Facing issue in config search with Batch Size, Dynamic Batching, and Sequence Length #957

@harsh-boloai

Description

I am running into some issues with Model Analyzer.

  1. Running it on an ONNX model with the config.yaml below, Model Analyzer only checks batch size 4 (see the screenshot under question 3) and never goes beyond it:
model_repository: /path/to/models_repo/

# Disable run config search
run_config_search_disable: True

# Model profiling configuration
profile_models:
  reranker:
    parameters:
      concurrency:
        start: 5
        stop: 20
        step: 5
      batch_sizes: [4, 8, 16, 32, 64]
    model_config_parameters:
      dynamic_batching:
        max_queue_delay_microseconds: [200, 400, 600]
      instance_group:
        - kind: KIND_GPU
          count: [1, 2]
perf_analyzer_flags:
  shape:
  - input_ids:128
  - attention_mask:128
  - token_type_ids:128
  2. I tried running it via the CLI, keeping only the input shapes in the config file, with the command below:
model-analyzer profile \
  -f config.yaml \
  --triton-launch-mode=docker \
  --output-model-repository-path /path/to/output \
  --run-config-search-max-instance-count 2 \
  --profile-models reranker \
  --run-config-search-max-concurrency 2 \
  --run-config-search-max-model-batch-size 2 \
  --override-output-model-repository \
  --model-repository /path/to/models_repo/

But this surfaces another issue. For every config other than the default, when Model Analyzer loads the Triton server, the logs show the server setting the max batch size to 1 and the dynamic batching preferred size to 4, which results in the following error: "dynamic batching preferred size must be <= max batch size".
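A workaround I am considering, though I have not verified it: pinning max_batch_size explicitly under model_config_parameters, on the assumption that Model Analyzer forwards this field into the generated config.pbtxt. The value 64 here is just my guess at a safe upper bound, not something from the docs:

```yaml
profile_models:
  reranker:
    model_config_parameters:
      # Assumption: max_batch_size is forwarded into the generated
      # config.pbtxt, so keeping it >= every swept dynamic-batching value
      # should avoid "dynamic batching preferred size must be <= max batch size".
      max_batch_size: [64]
      dynamic_batching:
        max_queue_delay_microseconds: [200, 400, 600]
```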

  3. Is there any way to have Model Analyzer also run all configurations with dynamic batching turned off, to compare how it affects throughput? Below is the report generated using the config from question 1; how can I get rows corresponding to dynamic batching disabled into this report?
(screenshot of the generated report, 2025-01-13)
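One workaround I am considering, also unverified: keep two config files that differ only in the dynamic_batching block, run `model-analyzer profile -f <file>` once per file, and compare the two reports by hand. The file name config_db_off.yaml is hypothetical; it would simply omit the block:

```yaml
# config_db_off.yaml (hypothetical second config, dynamic batching omitted)
model_repository: /path/to/models_repo/
run_config_search_disable: True
profile_models:
  reranker:
    parameters:
      concurrency:
        start: 5
        stop: 20
        step: 5
      batch_sizes: [4, 8, 16, 32, 64]
    model_config_parameters:
      # no dynamic_batching block here, so only the instance_group axis is swept
      instance_group:
        - kind: KIND_GPU
          count: [1, 2]
```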
  4. Is there any way to include varying sequence lengths in the analysis? I tried the config below, but it did not work:
perf_analyzer_flags:
  shape:
  - input_ids:[128,256]
  - attention_mask:[128,256]
  - token_type_ids:[128,256]
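My reading, which may be wrong, is that perf_analyzer's shape flag takes a single value per input, so a brute-force workaround would be generating one config file per sequence length and profiling each separately. The file names config_seq128.yaml / config_seq256.yaml are hypothetical:

```shell
# Hypothetical workaround (untested against Model Analyzer): write one
# perf_analyzer_flags fragment per sequence length, then profile each file
# in its own run.
for seq in 128 256; do
  cat > "config_seq${seq}.yaml" <<EOF
perf_analyzer_flags:
  shape:
  - input_ids:${seq}
  - attention_mask:${seq}
  - token_type_ids:${seq}
EOF
done
# then, per file:
#   model-analyzer profile -f config_seq128.yaml --triton-launch-mode=docker ...
```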

It would be really helpful if you could answer these questions. And if it is possible to sweep all of these axes (batch size, dynamic batching, sequence length) with a single config file or CLI command, that would be ideal.

Thank you in advance!
