@@ -474,7 +474,32 @@ Output file (`output2.png`):
![output2](./output2.png)
- ## Measuring performance
+ ## Measuring throughput
+ To increase throughput in image generation scenarios, it is worth changing the plugin config to increase NUM_STREAMS. Additionally, set a static shape for the model to avoid dynamic shape overhead. This can be done by setting the `resolution` parameter in the graph options, as shown below.
+
+ Edit `graph.pbtxt` and restart the server:
+ ```
+ input_stream: "HTTP_REQUEST_PAYLOAD:input"
+ output_stream: "HTTP_RESPONSE_PAYLOAD:output"
+
+ node: {
+   name: "ImageGenExecutor"
+   calculator: "ImageGenCalculator"
+   input_stream: "HTTP_REQUEST_PAYLOAD:input"
+   input_side_packet: "IMAGE_GEN_NODE_RESOURCES:pipes"
+   output_stream: "HTTP_RESPONSE_PAYLOAD:output"
+   node_options: {
+     [type.googleapis.com/mediapipe.ImageGenCalculatorOptions]: {
+       models_path: "./"
+       device: "CPU"
+       num_images_per_prompt: 4  # 4 images per inference request
+       resolution: "512x512"  # reshape to static value
+       plugin_config: '{"PERFORMANCE_HINT":"THROUGHPUT","NUM_STREAMS":8}'
+     }
+   }
+ }
+ ```
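+
+ After editing, restart the server so the updated graph is loaded. A possible restart sequence is sketched below; it assumes the server was started with the model directory mounted from the host, as earlier in this demo, and the container name, mount path and image tag are placeholders to adjust to your setup:
+ ```
+ # stop the previously started server container (placeholder name - use `docker ps` to find yours)
+ docker stop <ovms_container>
+ # start it again so the updated graph.pbtxt is read on startup
+ docker run -d --rm -p 8000:8000 -v $(pwd):/workspace:ro openvino/model_server:latest \
+   --rest_port 8000 --model_name OpenVINO/stable-diffusion-v1-5-int8-ov --model_path /workspace
+ ```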
+
Prepare example request `input_data.json`:
```
{
@@ -484,7 +509,7 @@ Prepare example request `input_data.json`:
{
"model": "OpenVINO/stable-diffusion-v1-5-int8-ov",
"prompt": "dog",
- "num_inference_steps": 2
+ "num_inference_steps": 50
}
]
}
@@ -503,50 +528,8 @@ docker run --rm -it --net=host -v $(pwd):/work:rw nvcr.io/nvidia/tritonserver:24
--endpoint=v3/images/generations \
--async \
-u localhost:8000 \
- --request-count 8 \
- --concurrency-range 8
- ```
-
- MCLX23
- ```
- *** Measurement Settings ***
- Service Kind: OPENAI
- Sending 8 benchmark requests
- Using asynchronous calls for inference
-
- Request concurrency: 8
- Client:
- Request count: 8
- Throughput: 0.210501 infer/sec
- Avg latency: 29514881 usec (standard deviation 1509943 usec)
- p50 latency: 31140977 usec
- p90 latency: 36002018 usec
- p95 latency: 37274567 usec
- p99 latency: 37274567 usec
- Avg HTTP time: 29514870 usec (send/recv 3558 usec + response wait 29511312 usec)
- Inferences/Second vs. Client Average Batch Latency
- Concurrency: 8, throughput: 0.210501 infer/sec, latency 29514881 usec
- ```
-
- SPR36
- ```
- *** Measurement Settings ***
- Service Kind: OPENAI
- Sending 8 benchmark requests
- Using asynchronous calls for inference
-
- Request concurrency: 8
- Client:
- Request count: 8
- Throughput: 1.14268 infer/sec
- Avg latency: 5124694 usec (standard deviation 695195 usec)
- p50 latency: 5252478 usec
- p90 latency: 5922719 usec
- p95 latency: 6080321 usec
- p99 latency: 6080321 usec
- Avg HTTP time: 5124684 usec (send/recv 15272 usec + response wait 5109412 usec)
- Inferences/Second vs. Client Average Batch Latency
- Concurrency: 8, throughput: 1.14268 infer/sec, latency 5124694 usec
+ --request-count 16 \
+ --concurrency-range 16
```

```
@@ -556,19 +539,22 @@ Concurrency: 8, throughput: 1.14268 infer/sec, latency 5124694 usec
Using asynchronous calls for inference

Request concurrency: 16
- Client:
+ Client:
Request count: 16
- Throughput: 1.33317 infer/sec
- Avg latency: 8945421 usec (standard deviation 929729 usec)
- p50 latency: 9395319 usec
- p90 latency: 11657659 usec
- p95 latency: 11657659 usec
- p99 latency: 11659369 usec
- Avg HTTP time: 8945411 usec (send/recv 491743 usec + response wait 8453668 usec)
+ Throughput: 0.0999919 infer/sec
+ Avg latency: 156783666 usec (standard deviation 1087845 usec)
+ p50 latency: 157110315 usec
+ p90 latency: 158720060 usec
+ p95 latency: 158720060 usec
+ p99 latency: 159494095 usec
+ Avg HTTP time: 156783654 usec (send/recv 8717 usec + response wait 156774937 usec)
Inferences/Second vs. Client Average Batch Latency
- Concurrency: 16, throughput: 1.33317 infer/sec, latency 8945421 usec
+ Concurrency: 16, throughput: 0.0999919 infer/sec, latency 156783666 usec
```
+ A throughput of 0.0999919 infer/sec corresponds to about 0.4 images per second, since each request generates 4 images per prompt.
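+
+ To sanity-check the reconfigured server outside the benchmark tool, a single request can be sent directly to the endpoint. The sketch below assumes the response follows the OpenAI Images API format, with base64-encoded images under `data[].b64_json`; with `num_images_per_prompt: 4` in the graph, one request is expected to return 4 images:
+ ```
+ curl -s http://localhost:8000/v3/images/generations \
+   -H "Content-Type: application/json" \
+   -d '{"model": "OpenVINO/stable-diffusion-v1-5-int8-ov", "prompt": "dog", "num_inference_steps": 50}' \
+   > response.json
+ jq '.data | length' response.json      # expected: 4
+ jq -r '.data[0].b64_json' response.json | base64 --decode > image0.png
+ ```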
+
+
## References
- [Image Generation API](../../docs/model_server_rest_api_image_generation.md)
- [Writing client code](../../docs/clients_genai.md)