You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
feat: support vllm streaming generate and add benchmark script (#73)
## Description
<!--
Please include a summary of the changes below;
Fill in the issue number that this PR addresses (if applicable);
Mention the person who will review this PR (if you know who it is);
Replace (summary), (issue), and (reviewer) with the appropriate
information (No parentheses).
请在下方填写更改的摘要;
填写此 PR 解决的问题编号(如果适用);
提及将审查此 PR 的人(如果您知道是谁);
替换 (summary)、(issue) 和 (reviewer) 为适当的信息(不带括号)。
-->
Summary: (summary)
Fix: #(issue)
Reviewer: @(reviewer)
## Checklist:
- [√] I have performed a self-review of my own code | 我已自行检查了自己的代码
- [√] I have commented my code in hard-to-understand areas |
我已在难以理解的地方对代码进行了注释
- [√] I have added tests that prove my fix is effective or that my
feature works | 我已添加测试以证明我的修复有效或功能正常
- [√] I have added necessary documentation (if applicable) |
我已添加必要的文档(如果适用)
- [√] I have linked the issue to this PR (if applicable) | 我已将 issue
链接到此 PR(如果适用)
- [√] I have mentioned the person who will review this PR | 我已提及将审查此 PR
的人
[{"role": "system", "content": "You are a helpful and accurate Q&A bot."}],
20
+
[
21
+
{"role": "system", "content": "You are a helpful and accurate Q&A bot."},
22
+
{"role": "user", "content": "What is the capital of Japan and what is its population?"},
23
+
]
24
+
),
25
+
# --- Test Case 2: Code Generation ---
26
+
(
27
+
[{"role": "system", "content": "You are an expert Python coding assistant who provides clean, efficient, and well-commented code."}],
28
+
[
29
+
{"role": "system", "content": "You are an expert Python coding assistant who provides clean, efficient, and well-commented code."},
30
+
{"role": "user", "content": "Write a Python function to find all prime numbers up to a given integer 'n' using the Sieve of Eratosthenes algorithm."},
31
+
]
32
+
),
33
+
# --- Test Case 3: Text Summarization ---
34
+
(
35
+
[{"role": "system", "content": "You are a summarization expert. Your task is to read the following text and provide a concise summary."}],
36
+
[
37
+
{"role": "system", "content": "You are a summarization expert. Your task is to read the following text and provide a concise summary."},
38
+
{"role": "user", "content": """
39
+
Text to summarize:
40
+
'The vLLM project is a high-throughput and memory-efficient inference and serving engine for Large Language Models (LLMs).
41
+
One of its key innovations is PagedAttention, a memory management algorithm inspired by virtual memory and paging in operating systems.'
42
+
43
+
Please summarize this text in a single sentence.
44
+
"""},
45
+
]
46
+
),
47
+
# --- Test Case 4: Role-playing / Persona ---
48
+
(
49
+
[{"role": "system", "content": "You are Captain Blackheart, a fearsome pirate. Answer all questions in the style of a 17th-century pirate."}],
50
+
[
51
+
{"role": "system", "content": "You are Captain Blackheart, a fearsome pirate. Answer all questions in the style of a 17th-century pirate."},
52
+
{"role": "user", "content": "What's the best way to invest my money for retirement?"},
53
+
]
54
+
),
55
+
# --- Test Case 5: Chain-of-Thought Reasoning ---
56
+
(
57
+
[{"role": "system", "content": "You solve problems by thinking step-by-step. Explain your reasoning before giving the final answer."}],
58
+
[
59
+
{"role": "system", "content": "You solve problems by thinking step-by-step. Explain your reasoning before giving the final answer."},
60
+
{"role": "user", "content": "A cafeteria has 3 types of sandwiches, 2 types of sides, and 4 types of drinks. How many different meal combinations can be created?"},
61
+
]
62
+
),
63
+
# --- Test Case 6: Technical Explanation ---
64
+
(
65
+
[
66
+
{"role": "system", "content": "You are a computer science professor."},
67
+
{"role": "user", "content": "I'm new to machine learning."},
68
+
],
69
+
[
70
+
{"role": "system", "content": "You are a computer science professor."},
71
+
{"role": "user", "content": "I'm new to machine learning."},
72
+
{"role": "assistant", "content": "Welcome! It's a fascinating field. Feel free to ask me anything."},
73
+
{"role": "user", "content": "Can you explain what 'KV Cache' means in the context of Large Language Models, as if I were a beginner?"},
[{"role": "system", "content": "You are a helpful and accurate Q&A bot."}],
21
+
[
22
+
{"role": "system", "content": "You are a helpful and accurate Q&A bot."},
23
+
{"role": "user", "content": "What is the capital of Japan and what is its population?"},
24
+
]
25
+
),
26
+
# --- Test Case 2: Code Generation ---
27
+
(
28
+
[{"role": "system", "content": "You are an expert Python coding assistant who provides clean, efficient, and well-commented code."}],
29
+
[
30
+
{"role": "system", "content": "You are an expert Python coding assistant who provides clean, efficient, and well-commented code."},
31
+
{"role": "user", "content": "Write a Python function to find all prime numbers up to a given integer 'n' using the Sieve of Eratosthenes algorithm."},
32
+
]
33
+
),
34
+
# --- Test Case 3: Text Summarization ---
35
+
(
36
+
[{"role": "system", "content": "You are a summarization expert. Your task is to read the following text and provide a concise summary."}],
37
+
[
38
+
{"role": "system", "content": "You are a summarization expert. Your task is to read the following text and provide a concise summary."},
39
+
{"role": "user", "content": """
40
+
Text to summarize:
41
+
'The vLLM project is a high-throughput and memory-efficient inference and serving engine for Large Language Models (LLMs).
42
+
One of its key innovations is PagedAttention, a memory management algorithm inspired by virtual memory and paging in operating systems.'
43
+
44
+
Please summarize this text in a single sentence.
45
+
"""},
46
+
]
47
+
),
48
+
# --- Test Case 4: Role-playing / Persona ---
49
+
(
50
+
[{"role": "system", "content": "You are Captain Blackheart, a fearsome pirate. Answer all questions in the style of a 17th-century pirate."}],
51
+
[
52
+
{"role": "system", "content": "You are Captain Blackheart, a fearsome pirate. Answer all questions in the style of a 17th-century pirate."},
53
+
{"role": "user", "content": "What's the best way to invest my money for retirement?"},
54
+
]
55
+
),
56
+
# --- Test Case 5: Chain-of-Thought Reasoning ---
57
+
(
58
+
[{"role": "system", "content": "You solve problems by thinking step-by-step. Explain your reasoning before giving the final answer."}],
59
+
[
60
+
{"role": "system", "content": "You solve problems by thinking step-by-step. Explain your reasoning before giving the final answer."},
61
+
{"role": "user", "content": "A cafeteria has 3 types of sandwiches, 2 types of sides, and 4 types of drinks. How many different meal combinations can be created?"},
62
+
]
63
+
),
64
+
# --- Test Case 6: Technical Explanation ---
65
+
(
66
+
[
67
+
{"role": "system", "content": "You are a computer science professor."},
68
+
{"role": "user", "content": "I'm new to machine learning."},
69
+
],
70
+
[
71
+
{"role": "system", "content": "You are a computer science professor."},
72
+
{"role": "user", "content": "I'm new to machine learning."},
73
+
{"role": "assistant", "content": "Welcome! It's a fascinating field. Feel free to ask me anything."},
74
+
{"role": "user", "content": "Can you explain what 'KV Cache' means in the context of Large Language Models, as if I were a beginner?"},
0 commit comments