|
1 | | -ANALYSIS_PROMPT = """ |
| 1 | +ANALYSIS_PROMPT_EN = """ |
2 | 2 | Analyze the LLM stress test configuration: {test_config} and performance results: {results}, then produce a concise, technical evaluation focused on the metrics below. |
3 | 3 |
|
4 | | - Rules |
| 4 | + Rules: |
5 | 5 | - First_token_latency assessment: Good (<1.00s), Moderate (1.00–2.00s), Poor (>2.00s). |
6 | 6 | - Total_time assessment: Good (<60.00s), Moderate (60.00–180.00s), Poor (>180.00s). |
7 | 7 | - If Total_time is Poor, highlight how First_token_latency, Total_tps, and Avg_total_tokens influence Total_time. |
8 | 8 | - Failure_request: If any request failed, report it under `Identified Issues` and direct the user to the task log for the specific error. |
9 | 9 | - If a metric is missing, display N/A (do not infer). |
10 | 10 | - Keep output under 300 words, technical, and prioritize the most severe issues. |
11 | 11 |
|
12 | | - Required Output Format |
| 12 | + Required Output Format: |
13 | 13 | ### Performance Summary |
14 | 14 | [1–3 sentence overall assessment, including UX judgment and the dominant bottleneck(s).] |
15 | 15 |
|
|
20 | 20 | | First_token_latency(s) | X.XX | Good (<1.00s), Moderate (1.00–2.00s), Poor (>2.00s) | Good/Moderate/Poor | |
21 | 21 | | Total_time(s) | X.XX | Good (<60.00s), Moderate (60.00–180.00s), Poor (>180.00s) | Good/Moderate/Poor | |
22 | 22 | | RPS(req/s) | X.XX | — | — | |
23 | | - | Completion_tps(tokens/s) | X.XX | — | — | |
24 | | - | Total_tps(tokens/s) | X.XX | — | — | |
25 | | - | Avg_completion_tokens(tokens/req) | N | — | — | |
26 | | - | Avg_total_tokens(tokens/req) | N | — | — | |
| 23 | + | Completion Tps(Tokens/s) | X.XX | — | — | |
| 24 | + | Total Tps(Tokens/s) | X.XX | — | — | |
| 25 | + | Avg_completion_tokens(Tokens/req) | N | — | — | |
| 26 | + | Avg_total_tokens(Tokens/req) | N | — | — | |
27 | 27 | | Failure_request | N | — | — | |
28 | 28 |
|
29 | 29 | ### Identified Issues |
30 | 30 | 1. [Most critical issue with metric value and impact, if any] |
31 | 31 | 2. [Highlight failure_request, if any] |
32 | 32 | """ |
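The threshold rules in the prompt above can also be checked outside the LLM, for example to validate the verdicts it returns. The following is a minimal sketch, not part of this change: the helper names `classify`, `classify_first_token_latency`, and `classify_total_time` are hypothetical, and treating the boundary values (1.00s, 2.00s, 60.00s, 180.00s) as Moderate is an assumption drawn from the ranges as written.

```python
from typing import Optional


def classify(value: Optional[float], good_below: float, poor_above: float) -> str:
    """Map a metric value to Good/Moderate/Poor, or N/A when it is missing.

    Hypothetical helper: boundary values count as Moderate, matching
    'Good (<x), Moderate (x to y), Poor (>y)' in the prompt rules.
    """
    if value is None:
        return "N/A"  # the prompt says: do not infer missing metrics
    if value < good_below:
        return "Good"
    if value > poor_above:
        return "Poor"
    return "Moderate"


def classify_first_token_latency(seconds: Optional[float]) -> str:
    # Thresholds from the prompt: Good (<1.00s), Poor (>2.00s)
    return classify(seconds, 1.00, 2.00)


def classify_total_time(seconds: Optional[float]) -> str:
    # Thresholds from the prompt: Good (<60.00s), Poor (>180.00s)
    return classify(seconds, 60.00, 180.00)
```

Keeping the thresholds in one place like this would make it easier to keep the English and Chinese prompts in agreement if the limits ever change.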
| 33 | + |
| 34 | +ANALYSIS_PROMPT_CN = """ |
| 35 | + 请分析 LLM 压测配置:{test_config} 和性能结果:{results},然后针对以下指标和要求生成一份简明的技术评估报告。 |
| 36 | +
|
| 37 | + 规则: |
| 38 | + - First_token_latency 评估:良好(<1.00 秒),中等(1.00-2.00 秒),较差(>2.00 秒)。 |
| 39 | + - Total_time 评估:良好(<60.00 秒),中等(60.00-180.00 秒),较差(>180.00 秒)。 |
| 40 | + - 如果 Total_time 为“较差”,请重点说明和分析 First_token_latency、Total_tps 和 Avg_total_tokens 对 Total_time 的影响。 |
| 41 | + - Failure_request:如果存在失败的请求,请在“已识别问题”中指出。 |
| 42 | + - 如果缺少某个指标,则显示 N/A(不推断)。 |
| 43 | + - 输出内容应控制在 300 字以内,技术性强,并优先处理最严重的问题。 |
| 44 | +
|
| 45 | + 输出格式要求: |
| 46 | + ### 性能总结 |
| 47 | + [1-3 句总体评估,包括用户体验判断和主要瓶颈。] |
| 48 | +
|
| 49 | + ### 关键指标 |
| 50 | + | 指标 | 值(平均值/最大值) | 阈值/目标 | 结论 | |
| 51 | + |---|---|---|---| |
| 52 | + | 并发用户数 | N | — | — | |
| 53 | + | 首Token时延 (s) | X.XX | 良好 (<1.00 秒)、中等 (1.00-2.00 秒)、较差 (>2.00 秒) | 良好/中等/较差 | |
| 54 | + | 总时间 (s) | X.XX | 良好 (<60.00 秒)、中等 (60.00-180.00 秒)、较差 (>180.00 秒) | 良好/中等/较差 | |
| 55 | + | RPS(请求/秒)| X.XX | — | — | |
| 56 | + | Completion Tokens 吞吐量(Tokens/秒)| X.XX | — | — | |
| 57 | + | Total Tokens 吞吐量(Tokens/秒)| X.XX | — | — | |
| 58 | + | 平均每请求输出Token数量(Tokens/请求)| N | — | — | |
| 59 | + | 平均每请求总Token数量(Tokens/请求)| N | — | — | |
| 60 | + | 失败请求| N | — | — | |
| 61 | +
|
| 62 | + ### 已识别的问题 |
| 63 | + 1. [具有指标值和影响的最关键问题(如果有)] |
| 64 | + 2. [重点说明是否存在失败请求,并指引用户查看任务日志以获取具体的错误信息(如果有)] |
| 65 | + """ |
| 66 | + |
| 67 | + |
| 68 | +def get_analysis_prompt(language: str = "en") -> str: |
| 69 | + """ |
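| 70 | + Return the analysis prompt for the given language.
| 71 | +
|
| 72 | + Args: |
| 73 | + language: Language code; supports 'en' (English) and 'zh' (Chinese). Any other value falls back to English.
| 74 | +
|
| 75 | + Returns: |
| 76 | + str: The analysis prompt in the requested language.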
| 77 | + """ |
| 78 | + if language == "zh": |
| 79 | + return ANALYSIS_PROMPT_CN |
| 80 | + else: |
| 81 | + return ANALYSIS_PROMPT_EN |
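As a rough illustration of how a caller might select and fill one of these templates, here is a minimal sketch. It is not the code in this change: `PROMPTS`, `get_prompt`, and `build_prompt` are hypothetical names, the two templates are short stand-ins for `ANALYSIS_PROMPT_EN` and `ANALYSIS_PROMPT_CN`, and accepting locale tags such as `zh-CN` is an assumption that goes beyond the exact `"zh"` match in `get_analysis_prompt`.

```python
import json

# Stand-in templates; the real ones are ANALYSIS_PROMPT_EN / ANALYSIS_PROMPT_CN.
PROMPTS = {
    "en": "Analyze config: {test_config} and results: {results}",
    "zh": "请分析配置:{test_config} 和结果:{results}",
}


def get_prompt(language: str = "en") -> str:
    """Look up a template by language code, falling back to English."""
    # Normalize tags like "zh-CN", "ZH_cn", or " zh " down to "zh".
    key = (language or "en").strip().lower().replace("_", "-").split("-")[0]
    return PROMPTS.get(key, PROMPTS["en"])


def build_prompt(language: str, test_config: dict, results: dict) -> str:
    """Fill the {test_config} and {results} placeholders with JSON text."""
    return get_prompt(language).format(
        test_config=json.dumps(test_config, ensure_ascii=False),
        results=json.dumps(results, ensure_ascii=False),
    )


if __name__ == "__main__":
    print(build_prompt("zh-CN", {"users": 10}, {"rps": 1.5}))
```

A dictionary lookup keeps the fallback to English explicit and makes adding a third language a one-line change. `str.format` works here only because the templates contain no braces other than the two placeholders.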