Commit ce2f934

mini2s, mrubens, roomote, daniel-lxs, and roomote[bot] authored
Roo to main (#803)
* Default grok code fast to native tools (RooCodeInc#9717) * Bedrock native tool calling (RooCodeInc#9698) * Support tool calling in native ollama provider (RooCodeInc#9696) Co-authored-by: Roo Code <roomote@roocode.com> * feat: add native tool support for LiteLLM provider (RooCodeInc#9719) * fix: prevent navigation buttons from wrapping on smaller screens (RooCodeInc#9721) Co-authored-by: Roo Code <roomote@roocode.com> * chore: add changeset for v3.35.0 (RooCodeInc#9724) * Changeset version bump (RooCodeInc#9725) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <mrubens@users.noreply.github.com> * chore: bump version to v1.89.0 (RooCodeInc#9718) * fix: flush pending tool results before task delegation (RooCodeInc#9726) When tools are called in parallel (e.g., update_todo_list + new_task), the tool results accumulate in userMessageContent but aren't saved to API history until all tools complete. When new_task triggers delegation, the parent is disposed before these pending results are saved, causing 400 errors when the parent resumes (missing tool_result for tool_use). 
This fix: - Adds flushPendingToolResultsToHistory() method in Task.ts that saves pending userMessageContent to API history - Calls this method in delegateParentAndOpenChild() before disposing the parent task - Safe for both native/XML protocols and sequential/parallel execution (returns early if there's nothing to flush) * Better IPC error logging (RooCodeInc#9727) * chore: add changeset for v3.35.1 (RooCodeInc#9728) * Changeset version bump (RooCodeInc#9729) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <mrubens@users.noreply.github.com> * Pass app version to provider (RooCodeInc#9730) * Allow models to contain default temperature (RooCodeInc#9734) * Look for a tag in the Roo provider to default the model to native tool calling (RooCodeInc#9735) * Assume all LiteLLM models support native tools (RooCodeInc#9736) * chore: add changeset for v3.35.2 (RooCodeInc#9737) * Changeset version bump (RooCodeInc#9738) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <mrubens@users.noreply.github.com> * Merge remote-tracking branch 'upstream/main' into roo-to-main * Switch to new welcome view (RooCodeInc#9741) * web: Homepage changes (RooCodeInc#9675) Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> * Add vendor confidentiality section to the system prompt for stealth models (RooCodeInc#9742) Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> * chore: add changeset for v3.35.3 (RooCodeInc#9743) * Changeset version bump (RooCodeInc#9745) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <mrubens@users.noreply.github.com> * Refactor: Remove line_count parameter from write_to_file tool (RooCodeInc#9667) * fix: remove reasoning 
toggles for GLM-4.5 and GLM-4.6 on z.ai provider (RooCodeInc#9752) Co-authored-by: Roo Code <roomote@roocode.com> * fix: handle malformed native tool calls to prevent hanging (RooCodeInc#9758) Co-authored-by: Matt Rubens <mrubens@users.noreply.github.com> * chore: add changeset for v3.35.4 (RooCodeInc#9763) * Changeset version bump (RooCodeInc#9764) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <mrubens@users.noreply.github.com> * Convert the Roo provider tools for OpenAI (RooCodeInc#9769) * Update the evals keygen command (RooCodeInc#9754) * feat: Add provider routing selection for OpenRouter embeddings (RooCodeInc#9144) (RooCodeInc#9693) Co-authored-by: Sannidhya <sann@Sannidhyas-MacBook-Pro.local> * ux: Updates to CloudView (RooCodeInc#9776) * refactor: remove TabHeader and onDone callback from CloudView - Removed TabHeader component from CloudView as it is no longer needed - Removed onDone prop from CloudView component definition and usage - Updated all test files to reflect the removal of onDone prop - Kept Button import that was accidentally removed initially * Updates upsell copy to reflect today's product * Update webview-ui/src/components/cloud/CloudView.tsx Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> * Update webview-ui/src/i18n/locales/ko/cloud.json Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> * Update webview-ui/src/i18n/locales/zh-CN/cloud.json Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> * Test fixes --------- Co-authored-by: Roo Code <roomote@roocode.com> Co-authored-by: Bruno Bergher <bruno@roocode.com> Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> * Update model key for minimax in MODEL_DEFAULTS (RooCodeInc#9778) Co-authored-by: Roo Code <roomote@roocode.com> * Release v3.35.5 (RooCodeInc#9781) * Changeset 
version bump (RooCodeInc#9783) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <mrubens@users.noreply.github.com> * Use search_and_replace for minimax (RooCodeInc#9780) * fix: restore context when rewinding after condense (RooCodeInc#8295) (RooCodeInc#9665) * fix: remove omission detection logic to fix false positives (RooCodeInc#9787) Co-authored-by: Roo Code <roomote@roocode.com> * Fix Vercel AI Gateway model fetching (RooCodeInc#9791) Co-authored-by: Roo Code <roomote@roocode.com> * refactor: remove insert_content tool (RooCodeInc#9751) Co-authored-by: Roo Code <roomote@roocode.com> * feat: add reasoning_details support to Roo provider (RooCodeInc#9796) - Add currentReasoningDetails accumulator to track reasoning details - Add getReasoningDetails() method to expose accumulated details - Handle reasoning_details array format in streaming responses - Accumulate reasoning details by type-index key - Support reasoning.text, reasoning.summary, and reasoning.encrypted types - Maintain backward compatibility with legacy reasoning format - Follows same pattern as OpenRouter provider Co-authored-by: Roo Code <roomote@roocode.com> * chore: hide parallel tool calls experiment and disable feature (RooCodeInc#9798) * Update next.js (RooCodeInc#9799) * Fix the download count on the homepage (RooCodeInc#9807) * Default to native tools for all models in the Roo provider (RooCodeInc#9811) Co-authored-by: Roo Code <roomote@roocode.com> * Fix/cerebras conservative max tokens (RooCodeInc#9804) Co-authored-by: Matt Rubens <mrubens@users.noreply.github.com> * Release v3.36.0 (RooCodeInc#9814) * Changeset version bump (RooCodeInc#9828) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <mrubens@users.noreply.github.com> * Merge remote-tracking branch 'upstream/main' into roo-to-main * ux: improved error messages and documentation links 
(RooCodeInc#9777) * Minor ui tweaks * Basic setup for richer API request errors * Better errors messages and contact link * i18n * Update webview-ui/src/i18n/locales/en/chat.json Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> * Update webview-ui/src/i18n/locales/en/chat.json Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> * Empty better than null * Update webview-ui/src/i18n/locales/nl/chat.json Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> * i18n * Start retryAttempt at 1 * Reverse retryAttempt number, just ommit it from the message --------- Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> * web: New Pricing Page (RooCodeInc#9821) * Removes Pro, restructures pricing page * Solves provider/credits * Update apps/web-roo-code/src/app/pricing/page.tsx Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> * Updates agent landing pages to not mention a trial that doesn't exist * Updates agent-specific landing pages to reflect new home and trial * Indicate the agent landing page the user came from * Clean up the carousel --------- Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> * Ignore input to the execa terminal process (RooCodeInc#9827) * fix: Overly round follow-up question suggestions (RooCodeInc#9829) Not that rounded * Always enabled reasoning for models that require it (RooCodeInc#9836) * ChatView: smoother stick-to-bottom during streaming (RooCodeInc#8999) * feat: add symlink support for slash commands in .roo/commands folder (RooCodeInc#9838) Co-authored-by: Roo Code <roomote@roocode.com> * fix: sanitize reasoning_details IDs to remove invalid characters (RooCodeInc#9839) * feat(evals-ui): Add filtering, bulk delete, tool consolidation, and run notes (RooCodeInc#9837) * Be safer about 
large file reads (RooCodeInc#9843) validateFileTokenBudget wasn't being called considering the output budget. * Revert "fix: sanitize reasoning_details IDs to remove invalid characters" (RooCodeInc#9846) * Merge remote-tracking branch 'upstream/main' into roo-to-main * Exclude the ID from Roo reasoning details (RooCodeInc#9847) * fix: prevent cascading truncation loop by only truncating visible messages (RooCodeInc#9844) * FIX + feat: add MessageManager layer for centralized history coordination (RooCodeInc#9842) * feat(web-evals): add multi-model launch and UI improvements (RooCodeInc#9845) Co-authored-by: Roo Code <roomote@roocode.com> * Revert "Exclude the ID from Roo reasoning details" (RooCodeInc#9850) * fix: handle unknown/invalid native tool calls to prevent extension freeze (RooCodeInc#9834) * feat: add gpt-5.1-codex-max model to OpenAI provider (RooCodeInc#9848) * Delete .changeset/symlink-commands.md * Release v3.36.1 (RooCodeInc#9851) * Changeset version bump (RooCodeInc#9840) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <mrubens@users.noreply.github.com> * feat: add dynamic settings support for Roo models from API (RooCodeInc#9852) * chore: restrict gpt-5 tool set to apply_patch (RooCodeInc#9853) * Fix chutes model fetching (RooCodeInc#9854) * Release v3.36.2 (RooCodeInc#9855) * Changeset version bump (RooCodeInc#9856) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <mrubens@users.noreply.github.com> * Better error logs for parseToolCall exceptions (RooCodeInc#9857) * (update): Add DeepSeek V3-2 Support for Baseten Provider (RooCodeInc#9861) Co-authored-by: AlexKer <AlexKer@users.noreply.github.com> * web: Product pages (RooCodeInc#9865) Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> * fix: 
sanitize removed/invalid API providers to prevent infinite loop (RooCodeInc#9869) * Update xAI models catalog (RooCodeInc#9872) * refactor: decouple tools from system prompt (RooCodeInc#9784) * Stop making count_tokens requests (RooCodeInc#9884) * Default to using native tools when supported on openrouter (RooCodeInc#9878) * feat: change defaultToolProtocol default from xml to native (RooCodeInc#9892) * feat: change defaultToolProtocol to default to native instead of xml * fix: add missing getMcpHub mock to Subtask Rate Limiting tests --------- Co-authored-by: Roo Code <roomote@roocode.com> * Refactor: Unified context-management architecture with improved UX (RooCodeInc#9795) Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> * Make eval runs deleteable (RooCodeInc#9909) * fix: add Kimi, MiniMax, and Qwen model configurations for Bedrock (RooCodeInc#9905) * fix: add Kimi, MiniMax, and Qwen model configurations for Bedrock - Add moonshot.kimi-k2-thinking with 32K max tokens and 256K context - Add minimax.minimax-m2 with 16K max tokens and 230K context - Add qwen.qwen3-next-80b-a3b with 8K max tokens and 262K context - Add qwen.qwen3-coder-480b-a35b-v1:0 with 8K max tokens and 262K context All models configured with native tool support and appropriate pricing. Fixes RooCodeInc#9902 * fix: add preserveReasoning flag and update Kimi K2 context window - Added preserveReasoning: true to moonshot.kimi-k2-thinking model - Added preserveReasoning: true to minimax.minimax-m2 model - Updated Kimi K2 context window from 256_000 to 262_144 These changes ensure: 1. Reasoning traces are properly preserved for both models 2. Roo correctly recognizes task completion 3. Tool calls within reasoning traces are handled appropriately 4. 
Context window matches AWS Console specification * fix: update MiniMax M2 context window to 196_608 for Bedrock Based on AWS CLI testing, the actual context window limit for MiniMax M2 on Bedrock is 196,608 tokens, not 230,000 as initially configured. * Update packages/types/src/providers/bedrock.ts Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> --------- Co-authored-by: Roo Code <roomote@roocode.com> Co-authored-by: Matt Rubens <mrubens@users.noreply.github.com> Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> * fix: use foreground color for context-management icons (RooCodeInc#9912) * feat: add xhigh reasoning effort for gpt-5.1-codex-max (RooCodeInc#9900) * feat: add xhigh reasoning effort for gpt-5.1-codex-max * fix: Address openai-native.spec.ts test failure * chore: Localisation of 'Extra high' * chore: revert unrelated CustomModesManager refactoring --------- Co-authored-by: Hannes Rudolph <hrudolph@gmail.com> * feat: add search_replace native tool for single-replacement operations (RooCodeInc#9918) Adds a new search_replace tool that performs a single search and replace operation on a file, requiring the old_string to uniquely identify the target text with 3-5 lines of context. 
Parameters: - file_path: Path to file (relative or absolute) - old_string: Text to find (must be unique in file) - new_string: Replacement text (must differ from old_string) * Improve cloud job error logging for RCC provider errors (RooCodeInc#9924) * feat: configure tool preferences for xAI models (RooCodeInc#9923) * fix: process finish_reason to emit tool_call_end events (RooCodeInc#9927) * fix: suppress 'ask promise was ignored' error in handleError (RooCodeInc#9914) * fix: exclude apply_diff from native tools when diffEnabled is false (RooCodeInc#9920) Co-authored-by: Roo Code <roomote@roocode.com> * Try to make OpenAI errors more useful (RooCodeInc#9639) * refactor: consolidate ThinkingBudget components and fix disable handling (RooCodeInc#9930) * Add timeout to OpenAI Compatible Provider Client (RooCodeInc#9898) * fix: add finish_reason processing to xai.ts provider (RooCodeInc#9929) * Remove defaultTemperature from Roo provider configuration (RooCodeInc#9932) Co-authored-by: Roo Code <roomote@roocode.com> * feat: forbid time estimates in architect mode (RooCodeInc#9931) Co-authored-by: Roo Code <roomote@roocode.com> * feat: streaming tool stats + token usage throttling (RooCodeInc#9926) Co-authored-by: Matt Rubens <mrubens@users.noreply.github.com> * feat: Make Architect save to `/plans` and gitignore it (RooCodeInc#9944) Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> Co-authored-by: Roo Code <roomote@roocode.com> * feat: add announcement support CTA and social icons (RooCodeInc#9945) * fix: display actual API error message instead of generic text on retry (RooCodeInc#9954) * feat(roo): add versioned settings support with minPluginVersion gating (RooCodeInc#9934) * Revert "feat: change defaultToolProtocol default from xml to native" (RooCodeInc#9956) * fix: return undefined instead of 0 for disabled API timeout (RooCodeInc#9960) * feat(deepseek): update DeepSeek models to V3.2 with new pricing (RooCodeInc#9962) Co-authored-by: 
roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> * Add a way to save screenshots from the browser tool (RooCodeInc#9963) * Add a way to save screenshots from the browser tool * fix: use cross-platform paths in BrowserSession screenshot tests * fix: validate screenshot paths to prevent filesystem escape --------- Co-authored-by: Roo Code <roomote@roocode.com> * Tweaks to baseten model definitions (RooCodeInc#9866) * fix: always show tool protocol selector for openai-compatible (RooCodeInc#9966) * feat: add API error telemetry to OpenRouter provider (RooCodeInc#9953) Co-authored-by: Roo Code <roomote@roocode.com> * fix: validate and fix tool_result IDs before API requests (RooCodeInc#9952) Co-authored-by: cte <cestreich@gmail.com> Co-authored-by: Roo Code <roomote@roocode.com> Co-authored-by: Hannes Rudolph <hrudolph@gmail.com> * fix: respect explicit supportsReasoningEffort array values (RooCodeInc#9970) * v3.36.3 (RooCodeInc#9972) * fix(activate): unify webview panel identifier to use consistent tabPanelId * feat(gemini): add minimal and medium reasoning effort levels (RooCodeInc#9973) Co-authored-by: Roo Code <roomote@roocode.com> Co-authored-by: cte <cestreich@gmail.com> * Delete changeset files (RooCodeInc#9977) * Add missing release notes for v3.36.3 (RooCodeInc#9979) * feat: add error details modal with on-demand display (RooCodeInc#9985) * feat: add error details modal with on-demand display - Add errorDetails prop to ErrorRow component - Show Info icon on hover in error header when errorDetails is provided - Display detailed error message in modal dialog on Info icon click - Add Copy to Clipboard button in error details modal - Update generic error case to show localized message with details on demand - Add i18n translations for error details UI * UI Tweaks * Properly handles error details * i18n * Lighter visual treatment for errors --------- Co-authored-by: Roo Code <roomote@roocode.com> Co-authored-by: Bruno Bergher <bruno@roocode.com> * 
Fix: Correct TODO list display order in chat view (ROO-107) (RooCodeInc#9991) Co-authored-by: Roo Code <roomote@roocode.com> * fix: prevent premature rawChunkTracker clearing for MCP tools (RooCodeInc#9993) * fix: filter out 429 rate limit errors from API error telemetry (RooCodeInc#9987) Co-authored-by: Roo Code <roomote@roocode.com> Co-authored-by: cte <cestreich@gmail.com> * Release v3.36.4 (RooCodeInc#9994) * Changeset version bump (RooCodeInc#9995) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Chris Estreich <cestreich@gmail.com> * feat(telemetry): add app version to exception captures and filter 402 errors (RooCodeInc#9996) Co-authored-by: cte <cestreich@gmail.com> * Remove Glama provider (RooCodeInc#9801) * @roo-code/types v1.90.0 (RooCodeInc#9998) * fix: apply versioned settings on nightly builds (RooCodeInc#9997) * feat: add toggle for Enter key behavior in chat input (RooCodeInc#10002) * chore: remove list_code_definition_names tool (RooCodeInc#10005) Co-authored-by: cte <cestreich@gmail.com> * Update roomotes.yml (RooCodeInc#10008) * fix: add general API endpoints for Z.ai provider (RooCodeInc#9894) Co-authored-by: Roo Code <roomote@roocode.com> * fix: handle empty Gemini responses and reasoning loops (RooCodeInc#10007) * fix: add missing tool_result blocks to prevent API errors (RooCodeInc#10015) * feat: add gpt-5.2 model to openai-native provider (RooCodeInc#10024) * test: update built-in commands count to 9 * fix: filter orphaned tool_results when more results than tool_uses (RooCodeInc#10027) * Release v3.36.5 (RooCodeInc#10029) * Changeset version bump (RooCodeInc#10032) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Chris Estreich <cestreich@gmail.com> * fix: merge settings and versionedSettings for Roo provider models (RooCodeInc#10030) * Revert "fix: merge settings and versionedSettings for Roo provider models" 
(RooCodeInc#10034) * Revert the 3.6.5 release (we halted it) (RooCodeInc#10036) * Release v3.36.5 (RooCodeInc#10037) * Changeset version bump (RooCodeInc#10038) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Chris Estreich <cestreich@gmail.com> * test: adjust terminal count limits in TerminalRegistry tests * ux: improve auto-approve timer visibility in follow-up suggestions (RooCodeInc#10048) Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> Co-authored-by: Roo Code <roomote@roocode.com> * fix: cancel auto-approval timeout when user starts typing (RooCodeInc#9937) Co-authored-by: Roo Code <roomote@roocode.com> Co-authored-by: Matt Rubens <mrubens@users.noreply.github.com> * fix: extract raw error message from OpenRouter metadata (RooCodeInc#10039) OpenRouter wraps upstream provider errors in a generic message but includes the actual error in metadata.raw. This change: - Adds OpenRouterErrorResponse interface for proper typing - Creates handleStreamingError() helper for DRY error handling - Extracts metadata.raw for actionable error messages in PostHog - Includes nested error structure so getErrorMessage() can extract raw message Before: PostHog receives '400 Provider returned error' (generic) After: PostHog receives 'Model xyz not found' (actionable) This enables proper error tracking and debugging via PostHog telemetry. 
* feat: add tool alias support for model-specific tool customization (RooCodeInc#9989) Co-authored-by: Hannes Rudolph <hrudolph@gmail.com> * fix: show tool protocol dropdown for LiteLLM provider (RooCodeInc#10053) * feat: add WorkspaceTaskVisibility type for organization cloud settings (RooCodeInc#10020) * feat: add WorkspaceTaskVisibility type and workspaceTaskVisibility property to OrganizationCloudSettings * refactor: create workspaceTaskVisibilitySchema and derive WorkspaceTaskVisibility type from it --------- Co-authored-by: Roo Code <roomote@roocode.com> * Release: v1.91.0 (RooCodeInc#10055) chore: bump version to v1.91.0 * feat: sanitize MCP server/tool names for API compatibility (RooCodeInc#10054) * Release v3.36.6 (RooCodeInc#10057) * Changeset version bump (RooCodeInc#10058) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Chris Estreich <cestreich@gmail.com> * fix: use JavaScript-based hover for checkpoint menu visibility (RooCodeInc#10056) * feat: remove auto-approve toggles for to-do and retry actions (RooCodeInc#10062) * feat(openrouter): add improvements to openrouter provider (RooCodeInc#10082) * feat: Add Amazon Nova 2 Lite model to Bedrock provider (RooCodeInc#9830) Co-authored-by: Roo Code <roomote@roocode.com> Co-authored-by: Matt Rubens <mrubens@users.noreply.github.com> * feat: add AWS Bedrock service tier support (RooCodeInc#9955) Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> Co-authored-by: Roo Code <roomote@roocode.com> Co-authored-by: Matt Rubens <mrubens@users.noreply.github.com> * Capture more of OpenRouter's provider specific error details (RooCodeInc#10073) * Capture more of OpenRouter's provider specific error details * Actually match the openrouter structure * feat(web-evals): improve run logs and formatters (RooCodeInc#10081) * Move isToolAllowedForMode out of shared directory (RooCodeInc#10089) * chore: add changeset for v3.36.7 
(RooCodeInc#10091) * Changeset version bump (RooCodeInc#10092) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <mrubens@users.noreply.github.com> * fix: prevent duplicate MCP tools error by deduplicating servers at source (RooCodeInc#10096) * feat: add metadata to error details dialog (RooCodeInc#10050) * feat: add metadata to error details dialog - Prepends extension version, provider, model, and repository info to error details - Helps users provide better bug reports with context - Uses useExtensionState and useSelectedModel hooks for data * Tweaks --------- Co-authored-by: Roo Code <roomote@roocode.com> Co-authored-by: Bruno Bergher <bruno@roocode.com> * web: Fixes link to provider pricing page (RooCodeInc#10107) * feat(read-file): implement incremental token-budgeted file reading (RooCodeInc#10052) * Add config to control public sharing (RooCodeInc#10105) Co-authored-by: Roo Code <roomote@roocode.com> * Release: v1.92.0 (RooCodeInc#10116) * Remove the description from bedrock service tiers (RooCodeInc#10118) * feat: remove strict ARN validation for Bedrock custom ARN users (RooCodeInc#10110) Co-authored-by: Roo Code <roomote@roocode.com> * fix: prevent race condition from deleting wrong API messages (RooCodeInc#10113) Co-authored-by: daniel-lxs <ricciodaniel98@gmail.com> * feat(anthropic): enable native tools by default and add telemetry tracking (RooCodeInc#10021) * feat: enable native tools by default for multiple providers (RooCodeInc#10059) * Release v3.36.8 (RooCodeInc#10119) * fix: add additionalProperties: false to nested MCP tool schemas (RooCodeInc#10109) * fix: normalize tool call IDs for cross-provider compatibility via OpenRouter (RooCodeInc#10102) * feat: add full error details to streaming failure dialog (RooCodeInc#10131) Co-authored-by: Roo Code <roomote@roocode.com> Co-authored-by: cte <cestreich@gmail.com> * fix: validate tool_result IDs in delegation resume flow 
(RooCodeInc#10135) * feat(evals): improve evals UI with tool groups and duration fix (RooCodeInc#10133) Co-authored-by: Roo Code <roomote@roocode.com> * Changeset version bump (RooCodeInc#10120) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Release v3.36.9 (RooCodeInc#10138) * Changeset version bump (RooCodeInc#10137) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Chris Estreich <cestreich@gmail.com> * fix: correct token counting for context truncation display (RooCodeInc#9961) * feat(deepseek): implement interleaved thinking mode for deepseek-reasoner (RooCodeInc#9969) --------- Co-authored-by: Matt Rubens <mrubens@users.noreply.github.com> Co-authored-by: Roo Code <roomote@roocode.com> Co-authored-by: Daniel <57051444+daniel-lxs@users.noreply.github.com> Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Chris Estreich <cestreich@gmail.com> Co-authored-by: Bruno Bergher <bruno@roocode.com> Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> Co-authored-by: Hannes Rudolph <hrudolph@gmail.com> Co-authored-by: SannidhyaSah <sah_sannidhya@outlook.com> Co-authored-by: Sannidhya <sann@Sannidhyas-MacBook-Pro.local> Co-authored-by: John Richmond <5629+jr@users.noreply.github.com> Co-authored-by: Seb Duerr <sebastian.duerr@cerebras.net> Co-authored-by: Alex Ker <thealexker@gmail.com> Co-authored-by: AlexKer <AlexKer@users.noreply.github.com> Co-authored-by: Andrew Ginns <ginns.aw@gmail.com> Co-authored-by: Dennise Bartlett <bartlett.dc.1@gmail.com> Co-authored-by: daniel-lxs <ricciodaniel98@gmail.com>
1 parent 9a2cd4c commit ce2f934
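The commit message describes the new search_replace tool (RooCodeInc#9918) as a single replacement in which old_string must be unique in the file and new_string must differ from old_string. A minimal sketch of those two rules follows. The function name `applySearchReplace`, its error messages, and the empty-string check are assumptions made for illustration, not the Roo Code implementation, and file I/O is left out.

```typescript
// Hypothetical helper: applies one search-and-replace to in-memory text.
// It mirrors the rules stated in the commit message, not the real tool code.
function applySearchReplace(content: string, oldString: string, newString: string): string {
    if (oldString.length === 0) {
        // Assumption: an empty search string is rejected.
        throw new Error("old_string must not be empty")
    }
    if (oldString === newString) {
        throw new Error("new_string must differ from old_string")
    }

    const first = content.indexOf(oldString)
    if (first === -1) {
        throw new Error("old_string was not found in the file")
    }

    // Search again from one character past the first hit so that
    // overlapping occurrences also count as duplicates.
    const second = content.indexOf(oldString, first + 1)
    if (second !== -1) {
        throw new Error("old_string is not unique; include more surrounding context")
    }

    // Splice by index rather than String.replace to avoid "$" pattern handling.
    return content.slice(0, first) + newString + content.slice(first + oldString.length)
}
```

Rejecting ambiguous matches outright, instead of replacing the first one, is what makes the extra lines of context in old_string necessary.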

70 files changed (+2589 -365 lines changed)

apps/web-evals/src/app/runs/[id]/run.tsx

Lines changed: 40 additions & 3 deletions
@@ -321,6 +321,15 @@ export function Run({ run }: { run: Run }) {
         void usageUpdatedAt
         const metrics: Record<number, TaskMetrics> = {}

+        // Helper to calculate duration from database timestamps when streaming duration
+        // is unavailable (e.g., page was loaded after TaskStarted event was published)
+        const calculateDurationFromTimestamps = (task: TaskWithMetrics): number => {
+            if (!task.startedAt) return 0
+            const startTime = new Date(task.startedAt).getTime()
+            const endTime = task.finishedAt ? new Date(task.finishedAt).getTime() : Date.now()
+            return endTime - startTime
+        }
+
         tasks?.forEach((task) => {
             const streamingUsage = tokenUsage.get(task.id)
             const dbMetrics = task.taskMetrics
@@ -331,26 +340,54 @@ export function Run({ run }: { run: Run }) {
                 // Check if DB metrics have meaningful values (not just default/empty)
                 const dbHasData = dbMetrics && (dbMetrics.tokensIn > 0 || dbMetrics.tokensOut > 0 || dbMetrics.cost > 0)
                 if (dbHasData) {
-                    metrics[task.id] = dbMetrics
+                    // If DB duration is 0 but we have timestamps, calculate from timestamps
+                    const duration = dbMetrics.duration || calculateDurationFromTimestamps(task)
+                    metrics[task.id] = { ...dbMetrics, duration }
                 } else if (streamingUsage) {
                     // Fall back to streaming values if DB is empty/stale
+                    // Use streaming duration, or calculate from timestamps if not available
+                    const duration = streamingUsage.duration || calculateDurationFromTimestamps(task)
                     metrics[task.id] = {
                         tokensIn: streamingUsage.totalTokensIn,
                         tokensOut: streamingUsage.totalTokensOut,
                         tokensContext: streamingUsage.contextTokens,
-                        duration: streamingUsage.duration ?? 0,
+                        duration,
                         cost: streamingUsage.totalCost,
                     }
+                } else {
+                    // Task finished but no DB metrics and no streaming data
+                    // (e.g., page loaded after task completed, metrics not persisted)
+                    // Still provide duration calculated from timestamps
+                    metrics[task.id] = {
+                        tokensIn: 0,
+                        tokensOut: 0,
+                        tokensContext: 0,
+                        duration: calculateDurationFromTimestamps(task),
+                        cost: 0,
+                    }
                 }
             } else if (streamingUsage) {
                 // For running tasks, use streaming values
+                // Use streaming duration, or calculate from task.startedAt if not available
+                // (happens when page loads after TaskStarted event was already published)
+                const duration = streamingUsage.duration || calculateDurationFromTimestamps(task)
                 metrics[task.id] = {
                     tokensIn: streamingUsage.totalTokensIn,
                     tokensOut: streamingUsage.totalTokensOut,
                     tokensContext: streamingUsage.contextTokens,
-                    duration: streamingUsage.duration ?? 0,
+                    duration,
                     cost: streamingUsage.totalCost,
                 }
+            } else if (task.startedAt) {
+                // Task has started (has startedAt in DB) but no streaming data yet
+                // This can happen when page loads after TaskStarted but before TokenUsageUpdated
+                metrics[task.id] = {
+                    tokensIn: 0,
+                    tokensOut: 0,
+                    tokensContext: 0,
+                    duration: calculateDurationFromTimestamps(task),
+                    cost: 0,
+                }
             }
         })
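The duration fallback in this file can be read in isolation. The sketch below is an illustration only: `TimedTask`, `durationFromTimestamps`, `resolveDuration`, and the injectable `now` parameter are assumed names introduced here so the logic runs without the component, and they do not appear in the commit. It highlights one detail of the change: replacing `?? 0` with `||` means a reported duration of 0 is treated as missing and recomputed from timestamps.

```typescript
// Assumed minimal task shape; the real TaskWithMetrics type has more fields.
type TimedTask = {
    startedAt?: Date | string | null
    finishedAt?: Date | string | null
}

// Same arithmetic as calculateDurationFromTimestamps in the diff, with `now`
// injectable (an addition for this sketch) so the result is deterministic.
function durationFromTimestamps(task: TimedTask, now: number = Date.now()): number {
    if (!task.startedAt) return 0
    const startTime = new Date(task.startedAt).getTime()
    const endTime = task.finishedAt ? new Date(task.finishedAt).getTime() : now
    return endTime - startTime
}

// `||` rather than `??`: 0 and undefined both fall through to the timestamps.
function resolveDuration(reported: number | undefined, task: TimedTask, now?: number): number {
    return reported || durationFromTimestamps(task, now)
}

const finished: TimedTask = {
    startedAt: "2024-01-01T00:00:00Z",
    finishedAt: "2024-01-01T00:00:05Z",
}

console.log(resolveDuration(1234, finished)) // reported value wins: 1234
console.log(resolveDuration(0, finished)) // falls back to timestamps: 5000
```

For a task that is still running, `finishedAt` is absent and the end time defaults to the current clock, so the value grows on each re-render.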

apps/web-evals/src/components/home/run.tsx

Lines changed: 71 additions & 63 deletions
Original file line numberDiff line numberDiff line change
@@ -44,14 +44,22 @@ import {
4444
ScrollArea,
4545
} from "@/components/ui"
4646

47+
// Tool group type (same as in runs.tsx)
48+
type ToolGroup = {
49+
id: string
50+
name: string
51+
icon: string
52+
tools: string[]
53+
}
54+
4755
type RunProps = {
4856
run: EvalsRun
4957
taskMetrics: EvalsTaskMetrics | null
5058
toolColumns: ToolName[]
51-
consolidatedToolColumns: string[]
59+
toolGroups: ToolGroup[]
5260
}
5361

54-
export function Run({ run, taskMetrics, toolColumns, consolidatedToolColumns }: RunProps) {
62+
export function Run({ run, taskMetrics, toolColumns, toolGroups }: RunProps) {
5563
const router = useRouter()
5664
const [deleteRunId, setDeleteRunId] = useState<number>()
5765
const [showSettings, setShowSettings] = useState(false)
@@ -143,6 +151,62 @@ export function Run({ run, taskMetrics, toolColumns, consolidatedToolColumns }:
     [router, run.id],
   )
 
+  // Helper to render a tool group cell
+  const renderToolGroupCell = (group: ToolGroup) => {
+    if (!taskMetrics?.toolUsage) {
+      return <span className="text-muted-foreground">-</span>
+    }
+
+    let totalAttempts = 0
+    let totalFailures = 0
+    const breakdown: Array<{ tool: string; attempts: number; rate: string }> = []
+
+    for (const toolName of group.tools) {
+      const usage = taskMetrics.toolUsage[toolName as ToolName]
+      if (usage) {
+        totalAttempts += usage.attempts
+        totalFailures += usage.failures
+        const rate =
+          usage.attempts > 0
+            ? `${Math.round(((usage.attempts - usage.failures) / usage.attempts) * 100)}%`
+            : "0%"
+        breakdown.push({ tool: toolName, attempts: usage.attempts, rate })
+      }
+    }
+
+    if (totalAttempts === 0) {
+      return <span className="text-muted-foreground">-</span>
+    }
+
+    const successRate = ((totalAttempts - totalFailures) / totalAttempts) * 100
+    const rateColor =
+      successRate === 100 ? "text-muted-foreground" : successRate >= 80 ? "text-yellow-500" : "text-red-500"
+
+    return (
+      <Tooltip>
+        <TooltipTrigger>
+          <div className="flex flex-col items-center">
+            <span className="font-medium">{totalAttempts}</span>
+            <span className={rateColor}>{Math.round(successRate)}%</span>
+          </div>
+        </TooltipTrigger>
+        <TooltipContent>
+          <div className="text-xs">
+            <div className="font-semibold mb-1">{group.name}</div>
+            {breakdown.map(({ tool, attempts, rate }) => (
+              <div key={tool} className="flex justify-between gap-4">
+                <span>{tool}:</span>
+                <span>
+                  {attempts} ({rate})
+                </span>
+              </div>
+            ))}
+          </div>
+        </TooltipContent>
+      </Tooltip>
+    )
+  }
+
   return (
     <>
       <TableRow className="cursor-pointer hover:bg-muted/50" onClick={handleRowClick}>
@@ -170,68 +234,12 @@ export function Run({ run, taskMetrics, toolColumns, consolidatedToolColumns }:
             </div>
           )}
         </TableCell>
-        {consolidatedToolColumns.length > 0 && (
-          <TableCell className="text-xs text-center">
-            {taskMetrics?.toolUsage ? (
-              (() => {
-                // Calculate aggregated stats for consolidated tools
-                let totalAttempts = 0
-                let totalFailures = 0
-                const breakdown: Array<{ tool: string; attempts: number; rate: string }> = []
-
-                for (const toolName of consolidatedToolColumns) {
-                  const usage = taskMetrics.toolUsage[toolName as ToolName]
-                  if (usage) {
-                    totalAttempts += usage.attempts
-                    totalFailures += usage.failures
-                    const rate =
-                      usage.attempts > 0
-                        ? `${Math.round(((usage.attempts - usage.failures) / usage.attempts) * 100)}%`
-                        : "0%"
-                    breakdown.push({ tool: toolName, attempts: usage.attempts, rate })
-                  }
-                }
-
-                const consolidatedRate =
-                  totalAttempts > 0 ? ((totalAttempts - totalFailures) / totalAttempts) * 100 : 100
-                const rateColor =
-                  consolidatedRate === 100
-                    ? "text-muted-foreground"
-                    : consolidatedRate >= 80
-                      ? "text-yellow-500"
-                      : "text-red-500"
-
-                return totalAttempts > 0 ? (
-                  <Tooltip>
-                    <TooltipTrigger>
-                      <div className="flex flex-col items-center">
-                        <span className="font-medium">{totalAttempts}</span>
-                        <span className={rateColor}>{Math.round(consolidatedRate)}%</span>
-                      </div>
-                    </TooltipTrigger>
-                    <TooltipContent>
-                      <div className="text-xs">
-                        <div className="font-semibold mb-1">Consolidated Tools:</div>
-                        {breakdown.map(({ tool, attempts, rate }) => (
-                          <div key={tool} className="flex justify-between gap-4">
-                            <span>{tool}:</span>
-                            <span>
-                              {attempts} ({rate})
-                            </span>
-                          </div>
-                        ))}
-                      </div>
-                    </TooltipContent>
-                  </Tooltip>
-                ) : (
-                  <span className="text-muted-foreground">-</span>
-                )
-              })()
-            ) : (
-              <span className="text-muted-foreground">-</span>
-            )}
+        {/* Tool Group Columns */}
+        {toolGroups.map((group) => (
+          <TableCell key={group.id} className="text-xs text-center">
+            {renderToolGroupCell(group)}
           </TableCell>
-        )}
+        ))}
         {toolColumns.map((toolName) => {
           const usage = taskMetrics?.toolUsage?.[toolName]
           const successRate =
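
The aggregation inside `renderToolGroupCell` can be read independently of the JSX. The sketch below extracts only the arithmetic and the colour thresholds (100% muted, 80% and above yellow, otherwise red) into a plain function so they can be checked in isolation. `summarizeToolGroup`, `GroupSummary`, and the sample tool names in the usage note are hypothetical names introduced for illustration; the commit itself keeps this logic inline in the component.

```typescript
// Hypothetical, simplified shape of taskMetrics.toolUsage.
type ToolUsageMap = Record<string, { attempts: number; failures: number } | undefined>

type GroupSummary = {
  totalAttempts: number
  successRate: number // percentage, 0 to 100
  rateColor: string
}

// Returns null where the component would render the "-" placeholder:
// no usage data at all, or no attempts recorded for any tool in the group.
function summarizeToolGroup(tools: string[], toolUsage: ToolUsageMap | undefined): GroupSummary | null {
  if (!toolUsage) return null

  let totalAttempts = 0
  let totalFailures = 0
  for (const toolName of tools) {
    const usage = toolUsage[toolName]
    if (usage) {
      totalAttempts += usage.attempts
      totalFailures += usage.failures
    }
  }

  if (totalAttempts === 0) return null

  const successRate = ((totalAttempts - totalFailures) / totalAttempts) * 100
  const rateColor =
    successRate === 100 ? "text-muted-foreground" : successRate >= 80 ? "text-yellow-500" : "text-red-500"

  return { totalAttempts, successRate, rateColor }
}
```

For example, a group of two tools with 3 attempts and 0 failures on one and 1 attempt and 1 failure on the other sums to 4 attempts and 1 failure, a 75% success rate, which falls below the 80% threshold and would be shown in red.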
