         [{"role": "system", "content": "You are a helpful and accurate Q&A bot."}],
         [
             {"role": "system", "content": "You are a helpful and accurate Q&A bot."},
             {"role": "user", "content": "What is the capital of Japan and what is its population?"},
-        ]
+        ],
     ),
     # --- Test Case 2: Code Generation ---
     (
-        [{"role": "system", "content": "You are an expert Python coding assistant who provides clean, efficient, and well-commented code."}],
         [
-            {"role": "system", "content": "You are an expert Python coding assistant who provides clean, efficient, and well-commented code."},
-            {"role": "user", "content": "Write a Python function to find all prime numbers up to a given integer 'n' using the Sieve of Eratosthenes algorithm."},
-        ]
+            {
+                "role": "system",
+                "content": "You are an expert Python coding assistant who provides clean, efficient, and well-commented code.",
+            }
+        ],
+        [
+            {
+                "role": "system",
+                "content": "You are an expert Python coding assistant who provides clean, efficient, and well-commented code.",
+            },
+            {
+                "role": "user",
+                "content": "Write a Python function to find all prime numbers up to a given integer 'n' using the Sieve of Eratosthenes algorithm.",
+            },
+        ],
     ),
     # --- Test Case 3: Text Summarization ---
     (
-        [{"role": "system", "content": "You are a summarization expert. Your task is to read the following text and provide a concise summary."}],
         [
-            {"role": "system", "content": "You are a summarization expert. Your task is to read the following text and provide a concise summary."},
-            {"role": "user", "content": """
+            {
+                "role": "system",
+                "content": "You are a summarization expert. Your task is to read the following text and provide a concise summary.",
+            }
+        ],
+        [
+            {
+                "role": "system",
+                "content": "You are a summarization expert. Your task is to read the following text and provide a concise summary.",
+            },
+            {
+                "role": "user",
+                "content": """
 Text to summarize:
-'The vLLM project is a high-throughput and memory-efficient inference and serving engine for Large Language Models (LLMs).
+'The vLLM project is a high-throughput and memory-efficient inference and serving engine for Large Language Models (LLMs).
 One of its key innovations is PagedAttention, a memory management algorithm inspired by virtual memory and paging in operating systems.'
-
+
 Please summarize this text in a single sentence.
-"""},
-        ]
+""",
+            },
+        ],
     ),
     # --- Test Case 4: Role-playing / Persona ---
     (
-        [{"role": "system", "content": "You are Captain Blackheart, a fearsome pirate. Answer all questions in the style of a 17th-century pirate."}],
         [
-            {"role": "system", "content": "You are Captain Blackheart, a fearsome pirate. Answer all questions in the style of a 17th-century pirate."},
+            {
+                "role": "system",
+                "content": "You are Captain Blackheart, a fearsome pirate. Answer all questions in the style of a 17th-century pirate.",
+            }
+        ],
+        [
+            {
+                "role": "system",
+                "content": "You are Captain Blackheart, a fearsome pirate. Answer all questions in the style of a 17th-century pirate.",
+            },
             {"role": "user", "content": "What's the best way to invest my money for retirement?"},
-        ]
+        ],
     ),
     # --- Test Case 5: Chain-of-Thought Reasoning ---
     (
-        [{"role": "system", "content": "You solve problems by thinking step-by-step. Explain your reasoning before giving the final answer."}],
         [
-            {"role": "system", "content": "You solve problems by thinking step-by-step. Explain your reasoning before giving the final answer."},
-            {"role": "user", "content": "A cafeteria has 3 types of sandwiches, 2 types of sides, and 4 types of drinks. How many different meal combinations can be created?"},
-        ]
+            {
+                "role": "system",
+                "content": "You solve problems by thinking step-by-step. Explain your reasoning before giving the final answer.",
+            }
+        ],
+        [
+            {
+                "role": "system",
+                "content": "You solve problems by thinking step-by-step. Explain your reasoning before giving the final answer.",
+            },
+            {
+                "role": "user",
+                "content": "A cafeteria has 3 types of sandwiches, 2 types of sides, and 4 types of drinks. How many different meal combinations can be created?",
+            },
+        ],
     ),
     # --- Test Case 6: Technical Explanation ---
     (
@@ -69,9 +112,15 @@
         [
             {"role": "system", "content": "You are a computer science professor."},
             {"role": "user", "content": "I'm new to machine learning."},
-            {"role": "assistant", "content": "Welcome! It's a fascinating field. Feel free to ask me anything."},
-            {"role": "user", "content": "Can you explain what 'KV Cache' means in the context of Large Language Models, as if I were a beginner?"},
-        ]
+            {
+                "role": "assistant",
+                "content": "Welcome! It's a fascinating field. Feel free to ask me anything.",
+            },
+            {
+                "role": "user",
+                "content": "Can you explain what 'KV Cache' means in the context of Large Language Models, as if I were a beginner?",
+            },
+        ],
     ),
 ]
 
@@ -81,7 +130,7 @@ def run_ttft_benchmark(num_runs: int = 10, warmup_runs: int = 3):
     Runs the TTFT benchmark for each test case and prints statistics.
     """
     print("--- Time to First Token (TTFT) Benchmark for vLLM ---")
-
+
     # 1. Configuration - MUST match your running vLLM server