install_pod_files/lazy_cuda_graph (1 file changed, +27 -1 lines)
@@ -1097,4 +1097,30 @@ DEBUG 08-06 11:51:06 [core.py:681] EngineCore waiting for work.
 (APIServer pid=210846) INFO 08-06 11:51:12 [ launcher.py:80] Shutting down FastAPI HTTP server.
 (APIServer pid=210846) INFO: Shutting down
 (APIServer pid=210846) INFO: Waiting for application shutdown.
-(APIServer pid=210846) INFO: Application shutdown complete.
+(APIServer pid=210846) INFO: Application shutdown complete.
+
+
+### LAZY but freezing the GC
+
+============ Serving Benchmark Result ============
+Successful requests:                     10
+Benchmark duration (s):                  4.72
+Total input tokens:                      1369
+Total generated tokens:                  1196
+Request throughput (req/s):              2.12
+Output token throughput (tok/s):         253.24
+Total Token throughput (tok/s):          543.12
+---------------Time to First Token----------------
+Mean TTFT (ms):                          55.10
+Median TTFT (ms):                        60.95
+P99 TTFT (ms):                           61.79
+-----Time per Output Token (excl. 1st token)------
+Mean TPOT (ms):                          24.75
+Median TPOT (ms):                        12.69
+P99 TPOT (ms):                           63.33
+---------------Inter-token Latency----------------
+Mean ITL (ms):                           8.29
+Median ITL (ms):                         5.00
+P99 ITL (ms):                            281.93
+==================================================
+
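The three throughput lines in the summary appear to be simple counts divided by the benchmark duration. The Python sketch below recomputes them from the reported totals, assuming that relationship (the log itself does not state it); the helper name `throughputs` is hypothetical. Because the printed duration is rounded to 4.72 s, the recomputed values land within 0.1% of the reported ones rather than matching exactly, and working backwards from the total-token throughput recovers the unrounded duration.

```python
# Sketch: recompute the throughput lines of the "LAZY but freezing the GC" run.
# Assumption (not stated in the log): each throughput = count / benchmark duration.
# The helper name `throughputs` is hypothetical.


def throughputs(requests: int, input_tokens: int, output_tokens: int,
                duration_s: float) -> dict[str, float]:
    """Return request, output-token and total-token throughput per second."""
    return {
        "request_throughput": requests / duration_s,
        "output_token_throughput": output_tokens / duration_s,
        "total_token_throughput": (input_tokens + output_tokens) / duration_s,
    }


# Figures copied from the benchmark summary.
REPORTED = {
    "request_throughput": 2.12,
    "output_token_throughput": 253.24,
    "total_token_throughput": 543.12,
}

# The printed duration (4.72 s) is rounded, so small differences are expected.
computed = throughputs(requests=10, input_tokens=1369, output_tokens=1196, duration_s=4.72)
for name, value in computed.items():
    rel_diff = abs(value - REPORTED[name]) / REPORTED[name]
    print(f"{name}: computed {value:.2f}, reported {REPORTED[name]:.2f}, rel. diff {rel_diff:.2%}")

# Working backwards from total-token throughput gives the unrounded duration.
implied_duration = (1369 + 1196) / REPORTED["total_token_throughput"]
print(f"implied duration: {implied_duration:.4f} s")
```

If the assumption holds, the implied duration (about 4.7227 s) also reproduces the reported request and output-token throughputs once rounded to two decimals.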