Commit e919f39
save
1 parent f679d15


demos/image_generation/README.md

Lines changed: 41 additions & 55 deletions
@@ -474,7 +474,32 @@ Output file (`output2.png`):
 ![output2](./output2.png)
 
 
-## Measuring performance
+## Measuring throughput
+To increase throughput in image generation scenarios, it is worth changing the plugin config to increase NUM_STREAMS. Additionally, set a static shape for the model to avoid dynamic-shape overhead; this can be done by setting the `resolution` parameter in the graph options, as in the example below.
+
+Edit graph.pbtxt and restart the server:
+```
+input_stream: "HTTP_REQUEST_PAYLOAD:input"
+output_stream: "HTTP_RESPONSE_PAYLOAD:output"
+
+node: {
+  name: "ImageGenExecutor"
+  calculator: "ImageGenCalculator"
+  input_stream: "HTTP_REQUEST_PAYLOAD:input"
+  input_side_packet: "IMAGE_GEN_NODE_RESOURCES:pipes"
+  output_stream: "HTTP_RESPONSE_PAYLOAD:output"
+  node_options: {
+    [type.googleapis.com/mediapipe.ImageGenCalculatorOptions]: {
+      models_path: "./"
+      device: "CPU"
+      num_images_per_prompt: 4  # 4 images per inference request
+      resolution: "512x512"  # reshape to static value
+      plugin_config: '{"PERFORMANCE_HINT":"THROUGHPUT","NUM_STREAMS":8}'
+    }
+  }
+}
+```
+
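For intuition on what that `plugin_config` line does: OpenVINO CPU streams let several inference requests execute in parallel, which is what makes higher client concurrency pay off. Below is a minimal standalone sketch of the same two properties applied through the OpenVINO Python API; the `model.xml` path is a placeholder for illustration, not a file shipped with this demo.

```
# Sketch: what PERFORMANCE_HINT/NUM_STREAMS mean at the OpenVINO level.
# "model.xml" is an assumed placeholder IR file; the model server forwards
# the plugin_config properties to the device plugin in a similar way.
import openvino as ov

core = ov.Core()
compiled = core.compile_model(
    "model.xml", "CPU",
    {"PERFORMANCE_HINT": "THROUGHPUT", "NUM_STREAMS": "8"},
)
# With multiple streams, the plugin can execute this many requests in parallel.
print(compiled.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS"))
```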
 Prepare example request `input_data.json`:
 ```
 {
@@ -484,7 +509,7 @@ Prepare example request `input_data.json`:
     {
       "model": "OpenVINO/stable-diffusion-v1-5-int8-ov",
       "prompt": "dog",
-      "num_inference_steps": 2
+      "num_inference_steps": 50
     }
   ]
 }
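Before running the benchmark, it may be worth confirming that the endpoint answers a single generation request. A minimal sketch, assuming the server listens on localhost:8000 as in the perf_analyzer call below, and assuming the response follows the OpenAI-style images schema with base64-encoded images under `data[*].b64_json`:

```
# Hypothetical sanity check: send one generation request directly to the
# server before benchmarking. Requires the `requests` package.
import base64
import requests

resp = requests.post(
    "http://localhost:8000/v3/images/generations",
    json={
        "model": "OpenVINO/stable-diffusion-v1-5-int8-ov",
        "prompt": "dog",
        "num_inference_steps": 50,
    },
)
resp.raise_for_status()
# Assumed OpenAI-style response shape: base64 images under data[*].b64_json.
for i, item in enumerate(resp.json()["data"]):
    with open(f"sanity_{i}.png", "wb") as f:
        f.write(base64.b64decode(item["b64_json"]))
```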
@@ -503,50 +528,8 @@ docker run --rm -it --net=host -v $(pwd):/work:rw nvcr.io/nvidia/tritonserver:24
   --endpoint=v3/images/generations \
   --async \
   -u localhost:8000 \
-  --request-count 8 \
-  --concurrency-range 8
-```
-
-MCLX23
-```
-*** Measurement Settings ***
-  Service Kind: OPENAI
-Sending 8 benchmark requests
-Using asynchronous calls for inference
-
-Request concurrency: 8
-  Client:
-    Request count: 8
-    Throughput: 0.210501 infer/sec
-    Avg latency: 29514881 usec (standard deviation 1509943 usec)
-    p50 latency: 31140977 usec
-    p90 latency: 36002018 usec
-    p95 latency: 37274567 usec
-    p99 latency: 37274567 usec
-    Avg HTTP time: 29514870 usec (send/recv 3558 usec + response wait 29511312 usec)
-Inferences/Second vs. Client Average Batch Latency
-Concurrency: 8, throughput: 0.210501 infer/sec, latency 29514881 usec
-```
-
-SPR36
-```
-*** Measurement Settings ***
-  Service Kind: OPENAI
-Sending 8 benchmark requests
-Using asynchronous calls for inference
-
-Request concurrency: 8
-  Client:
-    Request count: 8
-    Throughput: 1.14268 infer/sec
-    Avg latency: 5124694 usec (standard deviation 695195 usec)
-    p50 latency: 5252478 usec
-    p90 latency: 5922719 usec
-    p95 latency: 6080321 usec
-    p99 latency: 6080321 usec
-    Avg HTTP time: 5124684 usec (send/recv 15272 usec + response wait 5109412 usec)
-Inferences/Second vs. Client Average Batch Latency
-Concurrency: 8, throughput: 1.14268 infer/sec, latency 5124694 usec
+  --request-count 16 \
+  --concurrency-range 16
 ```
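perf_analyzer keeps 16 requests in flight, which the server can spread across the 8 configured streams. If pulling the Triton SDK container is inconvenient, a rough substitute can be scripted directly; below is a hedged sketch (its numbers will not exactly match perf_analyzer's report):

```
# Rough, hypothetical alternative to perf_analyzer: keep 16 requests in
# flight and report end-to-end throughput. Endpoint and payload mirror the
# perf_analyzer invocation above. Requires the `requests` package.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/v3/images/generations"
PAYLOAD = {"model": "OpenVINO/stable-diffusion-v1-5-int8-ov",
           "prompt": "dog", "num_inference_steps": 50}

def one_request(_):
    t0 = time.perf_counter()
    requests.post(URL, json=PAYLOAD).raise_for_status()
    return time.perf_counter() - t0

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=16) as pool:
    latencies = list(pool.map(one_request, range(16)))
elapsed = time.perf_counter() - start
print(f"throughput: {16 / elapsed:.4f} infer/sec, "
      f"avg latency: {sum(latencies) / 16:.1f} s")
```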
 
 ```
@@ -556,19 +539,22 @@ Concurrency: 8, throughput: 1.14268 infer/sec, latency 5124694 usec
 Using asynchronous calls for inference
 
 Request concurrency: 16
   Client:
     Request count: 16
-    Throughput: 1.33317 infer/sec
-    Avg latency: 8945421 usec (standard deviation 929729 usec)
-    p50 latency: 9395319 usec
-    p90 latency: 11657659 usec
-    p95 latency: 11657659 usec
-    p99 latency: 11659369 usec
-    Avg HTTP time: 8945411 usec (send/recv 491743 usec + response wait 8453668 usec)
+    Throughput: 0.0999919 infer/sec
+    Avg latency: 156783666 usec (standard deviation 1087845 usec)
+    p50 latency: 157110315 usec
+    p90 latency: 158720060 usec
+    p95 latency: 158720060 usec
+    p99 latency: 159494095 usec
+    Avg HTTP time: 156783654 usec (send/recv 8717 usec + response wait 156774937 usec)
 Inferences/Second vs. Client Average Batch Latency
-Concurrency: 16, throughput: 1.33317 infer/sec, latency 8945421 usec
+Concurrency: 16, throughput: 0.0999919 infer/sec, latency 156783666 usec
 ```
 
+0.0999919 infer/sec corresponds to about 0.4 images per second, given the 4 images generated per prompt.
+
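The image rate follows from the request throughput by simple multiplication:

```
# requests/sec reported by perf_analyzer times images per request (graph.pbtxt)
infer_per_sec = 0.0999919
num_images_per_prompt = 4
print(infer_per_sec * num_images_per_prompt)  # ~0.4 images/sec
```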
 ## References
 - [Image Generation API](../../docs/model_server_rest_api_image_generation.md)
 - [Writing client code](../../docs/clients_genai.md)
