Run AI evals on PRs and fix doc links #3

Merged
kfinkels merged 22 commits into main from eval_github_actions
Feb 24, 2026

Conversation

@kfinkels
Collaborator

Summary

  • Add pull_request trigger to eval workflow (previously manual-only)
  • Fix broken link to deleted AI-EVALS-WORKPLAN.md in EVAL.md
  • Fix incorrect relative path in WORKFLOWS.md
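
The two documentation fixes above (a link to a deleted file, and a wrong relative path) are the kind of breakage a small link check can catch before merge. A minimal sketch, assuming the docs are Markdown files that use inline `[text](target)` links; `find_broken_links` is a hypothetical helper and is not part of this PR:

```python
import re
from pathlib import Path

# Matches inline Markdown links of the form [text](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches targets that start with a URL scheme such as https: or mailto:.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def find_broken_links(doc_path: Path) -> list[str]:
    """Return relative link targets in doc_path that do not exist on disk."""
    broken = []
    text = doc_path.read_text(encoding="utf-8")
    for target in LINK_RE.findall(text):
        # Skip external URLs and in-page anchors; only file paths are checked.
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue
        # Drop any #fragment before resolving the path.
        path_part = target.split("#", 1)[0]
        # Relative links resolve against the directory of the containing file.
        if not (doc_path.parent / path_part).exists():
            broken.append(target)
    return broken
```

Resolving each target against the containing file's directory is what catches the relative-path case: a link that works from the repository root can still be broken from a file in a subdirectory.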

@github-actions

github-actions bot commented Jan 20, 2026

⚠️ No evaluation results found

Keren Finkelstein added 12 commits January 20, 2026 13:00
2. Rate limiting protection - run tests sequentially with delay
…tarting with 思考: (Chinese)

3. Only keep the actual spec/plan content for assertion checks
…on level where they were appearing, rather than only mentioning them in the general constraints section at the bottom.
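
The "run tests sequentially with delay" commit above describes a common way to stay under an API rate limit. A minimal sketch of that idea, assuming each test is a callable invoked once per case; `run_sequentially`, `run_case`, and the one-second default delay are illustrative names and values, not taken from this repository:

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_sequentially(
    cases: Iterable[T],
    run_case: Callable[[T], R],
    delay_seconds: float = 1.0,  # assumed value; tune to the provider's limits
    sleep: Callable[[float], None] = time.sleep,
) -> list[R]:
    """Run cases one at a time, pausing between consecutive cases."""
    results = []
    for index, case in enumerate(cases):
        # Pause before every case except the first, so there is no
        # wasted delay at the start or after the last case.
        if index > 0:
            sleep(delay_seconds)
        results.append(run_case(case))
    return results
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in normal use the default `time.sleep` applies.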
@github-actions

⚠️ No evaluation results found

@github-actions

github-actions bot commented Feb 16, 2026

⚠️ No evaluation results found

@github-actions

github-actions bot commented Feb 19, 2026

📊 Eval Results

Overall: 10/25 tests passed (40%)

| Metric | Value |
| --- | --- |
| ✅ Passed | 10 |
| ❌ Failed | 15 |
| 📈 Pass Rate | 40% |
| 🪙 Total Tokens | 84,086 |
| 💾 Cached Tokens | 0 |

❌ Failed Tests

  • Unknown (score: 0.89)
  • Unknown (score: 0.83)
  • Unknown (score: 0.80)
  • Unknown (score: 0.80)
  • Unknown (score: 0.83)
  • Unknown (score: 0.80)
  • Unknown (score: 0.78)
  • Unknown (score: 0.83)
  • Unknown (score: 0.80)
  • Unknown (score: 0.89)
  • Unknown (score: 0.57)
  • Unknown (score: 0.83)
  • Unknown (score: 0.83)
  • Unknown (score: 0.88)
  • Unknown (score: 0.80)

⚠️ Quality thresholds not met (target: 85%). Please review failures.
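
The headline numbers in this comment follow from simple arithmetic on the pass and fail counts. A minimal sketch, assuming the 85% target applies to the overall pass rate; `summarize` is a hypothetical helper, not the workflow's actual reporting code:

```python
def summarize(passed: int, failed: int, target: float = 0.85) -> dict:
    """Compute total, pass rate, and whether the pass rate meets the target."""
    total = passed + failed
    # Guard against division by zero when no tests ran.
    pass_rate = passed / total if total else 0.0
    return {
        "total": total,
        "pass_rate": pass_rate,
        "meets_target": pass_rate >= target,
    }


summary = summarize(passed=10, failed=15)
print(f"Overall: 10/{summary['total']} tests passed ({summary['pass_rate']:.0%})")
# prints: Overall: 10/25 tests passed (40%)
```

With 10 passed and 15 failed, the rate is 10/25 = 0.40, which is below 0.85, so `meets_target` is `False`.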

@kfinkels kfinkels merged commit 4007491 into main Feb 24, 2026
2 checks passed