I am running into some issues when running the model analyzer.
- When I run it on an ONNX model with the config.yaml file below, model-analyzer only profiles batch size 4 (screenshot attached under the third question) and never goes beyond that:
model_repository: /path/to/models_repo/
# Disable run config search
run_config_search_disable: True
# Model profiling configuration
profile_models:
  reranker:
    parameters:
      concurrency:
        start: 5
        stop: 20
        step: 5
      batch_sizes: [4, 8, 16, 32, 64]
    model_config_parameters:
      dynamic_batching:
        max_queue_delay_microseconds: [200, 400, 600]
      instance_group:
        - kind: KIND_GPU
          count: [1, 2]
    perf_analyzer_flags:
      shape:
        - input_ids:128
        - attention_mask:128
        - token_type_ids:128
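To make the expectation concrete, here is a rough sketch of the sweep I assumed this config would produce. It only enumerates the values from the config above; the assumptions (mine, not confirmed by the docs) are that `stop` is inclusive and that every combination is profiled when the run config search is disabled.

```python
from itertools import product

# Values copied from the config above. Assumption: `stop` is inclusive and
# every combination is profiled when run_config_search_disable is True.
concurrencies = list(range(5, 20 + 1, 5))  # start=5, stop=20, step=5
batch_sizes = [4, 8, 16, 32, 64]
queue_delays_us = [200, 400, 600]
instance_counts = [1, 2]

# One tuple per run I expected to see in the report.
expected_runs = list(
    product(batch_sizes, concurrencies, queue_delays_us, instance_counts)
)

print(f"expected number of runs: {len(expected_runs)}")
print(f"batch sizes expected in the report: {sorted({r[0] for r in expected_runs})}")
```

In the report I actually get, only the rows for batch size 4 show up.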
- I also tried running it from the CLI, with only the input shapes present in the config, using the command below:
model-analyzer profile \
-f config.yaml \
--triton-launch-mode=docker \
--output-model-repository-path /path/to/output \
--run-config-search-max-instance-count 2 \
--profile-models reranker \
--run-config-search-max-concurrency 2 \
--run-config-search-max-model-batch-size 2 \
--override-output-model-repository \
--model-repository /path/to/models_repo/
But this results in another issue. For configs other than the default config, the logs show that when model-analyzer launches the Triton server, the server sets the max batch size to 1 and the dynamic batching preferred size to 4, which results in the following error: "dynamic batching preferred size must be <= max batch size"
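As I understand the error, the generated model config violates a simple constraint: every preferred batch size has to be at most the max batch size. A minimal sketch of that check, using the two values I saw in the logs (the function name is my own and is not part of model-analyzer or Triton):

```python
def oversized_preferred_sizes(max_batch_size, preferred_batch_sizes):
    """Return the preferred batch sizes that exceed max_batch_size."""
    return [size for size in preferred_batch_sizes if size > max_batch_size]


# Values taken from the server logs described above.
violations = oversized_preferred_sizes(max_batch_size=1, preferred_batch_sizes=[4])

if violations:
    print(f"preferred sizes larger than max batch size: {violations}")
```

So the generated configs pair a max batch size of 1 with a preferred size of 4, which cannot load.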
- Is there any way to have model-analyzer also run all configurations with dynamic batching turned off, to compare how it affects throughput? Below is the report generated using the config from the first question. How can I get rows corresponding to dynamic batching disabled in it?
- Is there any way to include varying sequence lengths in the analysis? I tried the config below, but it did not work:
perf_analyzer_flags:
  shape:
    - input_ids:[128,256]
    - attention_mask:[128,256]
    - token_type_ids:[128,256]
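What I am effectively after is the equivalent of one profiling run per sequence length. As a fallback, I could generate one `shape` section per length with a small script like the sketch below and run model-analyzer once per generated config. This assumes each run can only take a single value per input; the helper names are hypothetical and only the input names and lengths come from my config.

```python
# Input names and sequence lengths taken from the config above.
INPUT_NAMES = ["input_ids", "attention_mask", "token_type_ids"]
SEQUENCE_LENGTHS = [128, 256]


def shape_flags(seq_len):
    """Build the per-input shape entries for one sequence length."""
    return [f"{name}:{seq_len}" for name in INPUT_NAMES]


def render_shape_section(seq_len):
    """Render the perf_analyzer_flags fragment for one sequence length."""
    lines = ["perf_analyzer_flags:", "  shape:"]
    lines += [f"    - {flag}" for flag in shape_flags(seq_len)]
    return "\n".join(lines)


for seq_len in SEQUENCE_LENGTHS:
    print(f"# sequence length {seq_len}")
    print(render_shape_section(seq_len))
```

Each printed fragment would replace the `perf_analyzer_flags` section of a separate config file, which is exactly the manual step I would like to avoid.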
It would be really helpful if you could answer these questions. And if it is possible to check all these various configs (batch size/dynamic batch/sequence length) with a single config/CLI command, that would be ideal.
Thank you in advance!