|
14 | 14 | [1–3 sentence overall assessment, including UX judgment and the dominant bottleneck(s).] |
15 | 15 |
|
16 | 16 | ### Key Metrics |
17 | | - | Metric | Value(avg/max) | Threshold/Target | Verdict | |
| 17 | + | Metric | Description | Value (avg/max) | Verdict | |
18 | 18 | |---|---|---|---| |
19 | | - | Concurrent_users | N | — | — | |
20 | | - | First_token_latency(s) | X.XX | Good (<1.00s), Moderate (1.00–2.00s), Poor (>2.00s) | Good/Moderate/Poor | |
21 | | - | Total_time(s) | X.XX | Good (<60.00s), Moderate (60.00–180.00s), Poor (>180.00s) | Good/Moderate/Poor | |
22 | | - | RPS(req/s) | X.XX | — | — | |
23 | | - | Completion Tps(Tokens/s) | X.XX | — | — | |
24 | | - | Total Tps(Tokens/s) | X.XX | — | — | |
25 | | - | Avg_completion_tokens(Tokens/req) | N | — | — | |
26 | | - | Avg_total_tokens(Tokens/req) | N | — | — | |
27 | | - |Failure_request| N | — | — | |
| 19 | + | Concurrent_users | Number of simultaneous users accessing the system | N | — | |
| 20 | + | First_token_latency(s) | Time taken to receive the first token in response | X.XX | Good/Moderate/Poor | |
| 21 | + | Total_time(s) | Total time taken to complete all requests | X.XX | Good/Moderate/Poor | |
| 22 | + | RPS | Requests processed per second | X.XX | — | |
| 23 | + | Completion_tps | Tokens generated per second for completion only | X.XX | — | |
| 24 | + | Total_tps | Total tokens processed per second (including prompt and completion) | X.XX | — | |
| 25 | + | Avg_completion_tokens/req | Average number of completion tokens per request | X.XX | — | |
| 26 | + | Avg_total_tokens/req | Average total tokens (prompt + completion) per request | X.XX | — | |
| 27 | + | Failure_request | Number of failed requests | N | — | |
28 | 28 |
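The Description column above can be read as a set of simple ratios. The sketch below is one plausible way to derive those values from per-request records; it is an illustration, not the tool's actual implementation. `RequestRecord`, its fields, `summarize`, and the choice to divide by wall-clock duration and to count only successful requests are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RequestRecord:
    """Hypothetical per-request record; field names are assumptions."""
    prompt_tokens: int
    completion_tokens: int
    succeeded: bool


def summarize(records: List[RequestRecord], wall_clock_s: float) -> Dict[str, Optional[float]]:
    """Derive the table's throughput metrics from raw records.

    Assumes wall_clock_s > 0 and that only successful requests
    contribute to throughput and per-request averages.
    """
    ok = [r for r in records if r.succeeded]
    n = len(ok)
    completion = sum(r.completion_tokens for r in ok)
    total = completion + sum(r.prompt_tokens for r in ok)
    return {
        "RPS": n / wall_clock_s,
        "Completion_tps": completion / wall_clock_s,
        "Total_tps": total / wall_clock_s,
        # A missing value stays None so the report can show N/A instead of guessing.
        "Avg_completion_tokens/req": completion / n if n else None,
        "Avg_total_tokens/req": total / n if n else None,
        "Failure_request": len(records) - n,
    }
```

Under these assumptions, two successful requests and one failure over a 10-second window give an RPS of 0.2 and a `Failure_request` count of 1.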
|
29 | 29 | ### Identified Issues |
30 | 30 | 1. [Most critical issue with metric value and impact, if any] |
|
37 | 37 | Rules: |
38 | 38 | - First_token_latency rating: Good (<1.00 s), Moderate (1.00-2.00 s), Poor (>2.00 s). |
39 | 39 | - Total_time rating: Good (<60.00 s), Moderate (60.00-180.00 s), Poor (>180.00 s). |
40 | | - - If Total_time is "Poor", highlight and analyze how First_token_latency, Total_tps, and Avg_total_tokens affect Total_time. |
| 40 | + - If Total_time is "Poor", highlight and analyze how First_token_latency, Total_tps, and Avg_total_tokens/req affect Total_time. |
41 | 41 | - Failure_request: if any requests failed, point this out under "Identified Issues". |
42 | 42 | - If a metric is missing, show N/A (do not infer it). |
43 | 43 | - Keep the output within 300 characters, keep it technical, and address the most severe issue first. |
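The threshold rules above amount to a small lookup. The following is a minimal sketch of that mapping, assuming the boundary values (exactly 1.00 s, 2.00 s, 60.00 s, 180.00 s) fall into the Moderate band as the stated ranges suggest. `THRESHOLDS` and `verdict` are hypothetical names introduced here, not part of the change under review.

```python
from typing import Optional

# (good_below, poor_above) in seconds, copied from the rules in the prompt.
THRESHOLDS = {
    "First_token_latency": (1.00, 2.00),
    "Total_time": (60.00, 180.00),
}


def verdict(metric: str, value: Optional[float]) -> str:
    """Map a rated metric to Good/Moderate/Poor; a missing value yields "N/A"."""
    if value is None:
        # Rule: show N/A for a missing metric rather than inferring one.
        return "N/A"
    good_below, poor_above = THRESHOLDS[metric]
    if value < good_below:
        return "Good"
    if value > poor_above:
        return "Poor"
    return "Moderate"
```

Only the two metrics with thresholds are covered; the remaining rows carry no verdict in the template, so the sketch does not rate them.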
|
47 | 47 | [1-3 sentence overall assessment, including a user-experience judgment and the main bottleneck.] |
48 | 48 |
|
49 | 49 | ### Key Metrics |
50 | | - | Metric | Value (avg/max) | Threshold/Target | Verdict | |
| 50 | + | Metric | Description | Value (avg/max) | Verdict | |
51 | 51 | |---|---|---|---| |
52 | | - | Concurrent users | N | — | — | |
53 | | - | First-token latency (s) | X.XX | Good (<1.00 s), Moderate (1.00-2.00 s), Poor (>2.00 s) | Good/Moderate/Poor | |
54 | | - | Total time (s) | X.XX | Good (<60.00 s), Moderate (60.00-180.00 s), Poor (>180.00 s) | Good/Moderate/Poor | |
55 | | - | RPS (requests/s) | X.XX | — | — | |
56 | | - | Completion token throughput (tokens/s) | X.XX | — | — | |
57 | | - | Total token throughput (tokens/s) | X.XX | — | — | |
58 | | - | Average output tokens per request (tokens/request) | N | — | — | |
59 | | - | Average total tokens per request (tokens/request) | N | — | — | |
60 | | - | Failed requests | N | — | — | |
| 52 | + | Concurrent_users | Number of users accessing the system simultaneously | N | — | |
| 53 | + | First_token_latency(s) | Time taken to receive the first token | X.XX | Good/Moderate/Poor | |
| 54 | + | Total_time(s) | Total time taken to complete all requests | X.XX | Good/Moderate/Poor | |
| 55 | + | RPS | Requests processed per second | X.XX | — | |
| 56 | + | Completion_tps | Output tokens generated per second | X.XX | — | |
| 57 | + | Total_tps | Total input and output tokens per second | X.XX | — | |
| 58 | + | Avg_completion_tokens/req | Average output tokens per request | X.XX | — | |
| 59 | + | Avg_total_tokens/req | Average total input and output tokens per request | X.XX | — | |
| 60 | + | Failure_request | Number of failed requests | N | — | |
61 | 61 |
|
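62 | | - ### Identified Issues |
| 62 | + ### Issue Summary |
63 | 63 | 1. [The most critical issue, with its metric value and impact (if any)] |
64 | 64 | 2. [State whether any requests failed and direct the user to the task log for the specific error messages (if any)] |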
65 | 65 | """ |
|