feat: Add evals-related configurations and CI commands #29
krislavten merged 2 commits into main
📊 Agent-aware Evaluation Results
📈 Task Score Details
Generated by Agent-aware Evals
@claude review this PR

Claude encountered an error (View job). I'll analyze this and get back to you.

@claude review this PR

Claude encountered an error (View job). I'll analyze this and get back to you.

@claude review this PR

Claude finished @krislavten's task in 2m 19s (View job)
🔍 PR Review Complete
PR Summary
This PR adds a comprehensive evaluation system for the Agent-aware project, implementing ~8,900 lines of new code across 52 files. The system follows best practices outlined in Anthropic's evaluation guide and includes:
Review Feedback
Strengths
Issues to Address
1. Missing Error Handler in Process Spawn (Medium)
File: … The …
proc.on('error', (err) => {
  if (!started) {
    reject(new Error(`Dev server error: ${err.message}`));
  }
  // After started=true, errors are silently ignored
});
Suggestion: Add logging for post-startup errors to aid debugging.
2. Duplicate
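One way the suggestion in issue 1 of the review (log errors that arrive after startup instead of dropping them) could look is sketched below. This is a minimal illustration, not the PR's code: `watchProcessErrors`, `isStarted`, and the `log` parameter are hypothetical names, and the only parts taken from the review are the `'error'` listener and the `Dev server error:` message. The helper accepts any `EventEmitter`, so it works with a Node `ChildProcess` as well as a stub.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical helper (not from the PR): attaches an 'error' listener that
// rejects while the server is still starting and logs once it has started.
function watchProcessErrors(
  proc: EventEmitter,
  isStarted: () => boolean,
  reject: (err: Error) => void,
  log: (msg: string) => void = console.error,
): void {
  proc.on("error", (err: Error) => {
    if (!isStarted()) {
      // Startup phase: fail the pending start promise, as in the reviewed code.
      reject(new Error(`Dev server error: ${err.message}`));
      return;
    }
    // After startup: surface the error instead of silently ignoring it.
    log(`Dev server error after startup: ${err.message}`);
  });
}
```

Passing `isStarted` as a callback rather than a boolean lets the listener read the current value of a flag that the caller flips later; injecting `log` keeps the helper easy to exercise without touching the console.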
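Summary of changes compared with the main branch
Commits
Change statistics
Main changes
1. New evaluation system (evals/)
harness/
graders/
tasks/
rubrics/
2. CI/CD configuration
.github/workflows/evals.yml - GitHub Actions evaluation pipeline (441 lines)
3. Project configuration
package.json - adds eval scripts and dependencies
vitest.evals.config.ts - evaluation test configuration
pnpm-lock.yaml - dependency lock file
4. Minor changes
packages/sdk/src/index.ts - minor SDK change
packages/server/src/cli.ts - minor Server CLI change
Evaluation tasks (6 after trimming)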