Commit 17e88a2

results freezing GC lazy Cuda graph
Signed-off-by: Diego-Castan <[email protected]>
1 parent f3ae88c commit 17e88a2

1 file changed

+27
-1
lines changed

install_pod_files/lazy_cuda_graph/README.md

Lines changed: 27 additions & 1 deletion
@@ -1097,4 +1097,30 @@ DEBUG 08-06 11:51:06 [core.py:681] EngineCore waiting for work.
 (APIServer pid=210846) INFO 08-06 11:51:12 [launcher.py:80] Shutting down FastAPI HTTP server.
 (APIServer pid=210846) INFO: Shutting down
 (APIServer pid=210846) INFO: Waiting for application shutdown.
-(APIServer pid=210846) INFO: Application shutdown complete.
+(APIServer pid=210846) INFO: Application shutdown complete.
+
+
+### LAZY but freezing the GC
+
+============ Serving Benchmark Result ============
+Successful requests: 10
+Benchmark duration (s): 4.72
+Total input tokens: 1369
+Total generated tokens: 1196
+Request throughput (req/s): 2.12
+Output token throughput (tok/s): 253.24
+Total Token throughput (tok/s): 543.12
+---------------Time to First Token----------------
+Mean TTFT (ms): 55.10
+Median TTFT (ms): 60.95
+P99 TTFT (ms): 61.79
+-----Time per Output Token (excl. 1st token)------
+Mean TPOT (ms): 24.75
+Median TPOT (ms): 12.69
+P99 TPOT (ms): 63.33
+---------------Inter-token Latency----------------
+Mean ITL (ms): 8.29
+Median ITL (ms): 5.00
+P99 ITL (ms): 281.93
+==================================================
+
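
The throughput rows in the added benchmark block can be cross-checked against the raw counts and the duration in the same block. The sketch below is a hypothetical helper, not part of the commit or of the benchmark tool: it parses the `name: value` lines and recomputes the three throughput figures. It assumes each throughput is simply a count divided by the reported duration. The recomputed values differ slightly from the reported ones (well under 1%), which is consistent with the duration being rounded to two decimals in the report, though that explanation is itself an assumption.

```python
import re

# Subset of the report added in this diff, copied verbatim.
REPORT = """\
Successful requests: 10
Benchmark duration (s): 4.72
Total input tokens: 1369
Total generated tokens: 1196
Request throughput (req/s): 2.12
Output token throughput (tok/s): 253.24
Total Token throughput (tok/s): 543.12
"""


def parse_report(text):
    """Map each 'name: value' line to a float; lines without a numeric value are skipped."""
    metrics = {}
    for line in text.splitlines():
        match = re.match(r"^(.+?):\s+([0-9.]+)$", line.strip())
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


def derived_throughput(m):
    """Recompute throughput as count / duration (assumed definition, see lead-in)."""
    duration = m["Benchmark duration (s)"]
    total_tokens = m["Total input tokens"] + m["Total generated tokens"]
    return {
        "req/s": m["Successful requests"] / duration,
        "output tok/s": m["Total generated tokens"] / duration,
        "total tok/s": total_tokens / duration,
    }


metrics = parse_report(REPORT)
derived = derived_throughput(metrics)
for name, value in derived.items():
    print(f"{name}: {value:.2f}")
```

The same parser can be pointed at other result blocks in this README to compare runs, as long as they use the same `name: value` layout.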
