Stabilize disabled-tool real API CI test #587

Draft
cursor[bot] wants to merge 1 commit into master from cursor/ci-pipeline-failure-c8fa

cursor bot commented Apr 1, 2026

Summary

The failing CI check was e2e-tool-config-real-apis. The failed job log for job 69552028161 shows that the actual failure was not the Node 20 deprecation warning in the summary output but a flaky assertion in e2e/tests/tool-config/real-api/disabled-tool.spec.ts.

This PR updates that real-API Playwright test to assert the behavior we actually need to guarantee:

  • read_post is filtered out of the embedded tool list
  • the disabled Read Post tool does not surface as a tool invocation in the RHS

The test no longer fails when a real provider answers by calling other, still-enabled tools (for example, channel-history retrieval). That is a valid outcome, and treating it as a failure was what made the OpenAI variant fail intermittently.
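As a rough illustration of the two guarantees above, here is a minimal sketch of the assertion logic as plain functions. It is not the code in disabled-tool.spec.ts: the `ToolInvocation` shape, the function names, and the `get_channel_history` tool name are all hypothetical, and the real spec would collect the tool list and RHS invocations through Playwright.

```typescript
// Hypothetical shape of a tool invocation rendered in the RHS.
interface ToolInvocation {
  toolName: string;
}

// Names of disabled tools that still appear in the embedded tool list.
function disabledToolsStillListed(embeddedTools: string[], disabledTools: string[]): string[] {
  const disabled = new Set(disabledTools);
  return embeddedTools.filter((name) => disabled.has(name));
}

// Invocations that used a disabled tool. Invocations of enabled tools are ignored on purpose.
function disabledToolInvocations(invocations: ToolInvocation[], disabledTools: string[]): ToolInvocation[] {
  const disabled = new Set(disabledTools);
  return invocations.filter((inv) => disabled.has(inv.toolName));
}

// Collects human-readable violations; an empty array means both guarantees hold.
function checkDisabledToolBehavior(
  embeddedTools: string[],
  invocations: ToolInvocation[],
  disabledTools: string[],
): string[] {
  const violations: string[] = [];
  for (const name of disabledToolsStillListed(embeddedTools, disabledTools)) {
    violations.push(`${name} is still in the embedded tool list`);
  }
  for (const inv of disabledToolInvocations(invocations, disabledTools)) {
    violations.push(`${inv.toolName} surfaced as a tool invocation`);
  }
  return violations;
}

// A provider that answers with a still-enabled tool (hypothetical name) is accepted.
const violations = checkDisabledToolBehavior(
  ["get_channel_history"],
  [{ toolName: "get_channel_history" }],
  ["read_post"],
);
console.log(violations.length === 0 ? "ok" : violations.join("; "));
```

The check only rejects evidence that the disabled tool leaked through, so it does not depend on which of the remaining tools a given provider chooses to call.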

Validation:

  • Reviewed failed CI log with gh run view --job 69552028161 --log-failed --repo mattermost/mattermost-plugin-agents
  • Confirmed there was no existing open PR for branch cursor/ci-pipeline-failure-c8fa
  • Ran cd e2e && npx playwright test tests/tool-config/real-api/disabled-tool.spec.ts --project=chromium --list (--list collects the matching tests without executing them)

Ticket Link

None

Screenshots

None

Release Note

NONE

Co-authored-by: Christopher Speller <crspeller@users.noreply.github.com>

github-actions bot commented Apr 1, 2026

🤖 LLM Evaluation Results

OpenAI

⚠️ Overall: 18/19 tests passed (94.7%)

| Provider | Total | Passed | Failed | Pass Rate |
|---|---|---|---|---|
| ⚠️ OPENAI | 19 | 18 | 1 | 94.7% |

❌ Failed Evaluations

OPENAI

1. TestReactEval/[openai]_react_cat_message

  • Score: 0.00
  • Rubric: The word/emoji is a cat emoji or a heart/love emoji
  • Reason: The output is the text "smile_cat", which is neither a cat emoji (e.g., 🐱/😺) nor a heart/love emoji (e.g., ❤️/😍).

Anthropic

Overall: 19/19 tests passed (100.0%)

| Provider | Total | Passed | Failed | Pass Rate |
|---|---|---|---|---|
| ✅ ANTHROPIC | 19 | 19 | 0 | 100.0% |

This comment was automatically generated by the eval CI pipeline.
