Facing issue in config search with Batch Size, Dynamic Batching, and Sequence Length #957

@harsh-boloai

Description

I am running into some issues with Model Analyzer.

  1. Running it on an ONNX model with the config.yaml below, Model Analyzer only checks batch size 4 (see the screenshot under question 3) and never goes beyond it:
model_repository: /path/to/models_repo/

# Disable run config search
run_config_search_disable: True

# Model profiling configuration
profile_models:
  reranker:
    parameters:
      concurrency:
        start: 5
        stop: 20
        step: 5
      batch_sizes: [4, 8, 16, 32, 64]
    model_config_parameters:
      dynamic_batching:
        max_queue_delay_microseconds: [200, 400, 600]
      instance_group:
        - kind: KIND_GPU
          count: [1, 2]
perf_analyzer_flags:
  shape:
  - input_ids:128
  - attention_mask:128
  - token_type_ids:128
  2. I tried running it via the CLI, keeping only the input shapes in the config file, with the command below:
model-analyzer profile \
  -f config.yaml \
  --triton-launch-mode=docker \
  --output-model-repository-path /path/to/output \
  --run-config-search-max-instance-count 2 \
  --profile-models reranker \
  --run-config-search-max-concurrency 2 \
  --run-config-search-max-model-batch-size 2 \
  --override-output-model-repository \
  --model-repository /path/to/models_repo/

But this surfaces another issue. For every config other than the default, when Model Analyzer loads the Triton server, the logs show the server setting the max batch size to 1 and the dynamic batching preferred size to 4, which results in the following error: "dynamic batching preferred size must be <= max batch size".
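A workaround I am considering, though I have not verified it: pinning max_batch_size explicitly under model_config_parameters, on the assumption that Model Analyzer forwards this field into the generated config.pbtxt. The value 64 here is just my guess at a safe upper bound, not something from the docs:

```yaml
profile_models:
  reranker:
    model_config_parameters:
      # Assumption: max_batch_size is forwarded into the generated
      # config.pbtxt, so keeping it >= every swept dynamic-batching value
      # should avoid "dynamic batching preferred size must be <= max batch size".
      max_batch_size: [64]
      dynamic_batching:
        max_queue_delay_microseconds: [200, 400, 600]
```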

  3. Is there any way to have Model Analyzer also run all configurations with dynamic batching turned off, to compare how it affects throughput? Below is the report generated using the config from question 1; how can I get rows corresponding to dynamic batching disabled into this report?
(screenshot of the generated report, 2025-01-13)
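One workaround I am considering, also unverified: keep two config files that differ only in the dynamic_batching block, run `model-analyzer profile -f <file>` once per file, and compare the two reports by hand. The file name config_db_off.yaml is hypothetical; it would simply omit the block:

```yaml
# config_db_off.yaml (hypothetical second config, dynamic batching omitted)
model_repository: /path/to/models_repo/
run_config_search_disable: True
profile_models:
  reranker:
    parameters:
      concurrency:
        start: 5
        stop: 20
        step: 5
      batch_sizes: [4, 8, 16, 32, 64]
    model_config_parameters:
      # no dynamic_batching block here, so only the instance_group axis is swept
      instance_group:
        - kind: KIND_GPU
          count: [1, 2]
```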
  4. Is there any way to include varying sequence lengths in the analysis? I tried the config below, but it did not work:
perf_analyzer_flags:
  shape:
  - input_ids:[128,256]
  - attention_mask:[128,256]
  - token_type_ids:[128,256]
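My reading, which may be wrong, is that perf_analyzer's shape flag takes a single value per input, so a brute-force workaround would be generating one config file per sequence length and profiling each separately. The file names config_seq128.yaml / config_seq256.yaml are hypothetical:

```shell
# Hypothetical workaround (untested against Model Analyzer): write one
# perf_analyzer_flags fragment per sequence length, then profile each file
# in its own run.
for seq in 128 256; do
  cat > "config_seq${seq}.yaml" <<EOF
perf_analyzer_flags:
  shape:
  - input_ids:${seq}
  - attention_mask:${seq}
  - token_type_ids:${seq}
EOF
done
# then, per file:
#   model-analyzer profile -f config_seq128.yaml --triton-launch-mode=docker ...
```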

It would be really helpful if you could answer these questions. And if it is possible to sweep all of these axes (batch size, dynamic batching, sequence length) with a single config file or CLI command, that would be ideal.

Thank you in advance!
