feat: Enable Video Uploads for Multimodal Analysis #6144

@VooDisss

Description

What specific problem does this solve?

Currently, Roo Code does not support video uploads, which limits users' ability to provide rich context for UI-related issues. Developers working on graphical user interfaces (GUIs) often encounter problems that are difficult to describe with text alone.

Who is affected: All Roo Code users, especially those developing applications with a GUI.
When this happens: When a developer needs to troubleshoot a complex UI bug or explain a visual interaction.
Current behavior: Users can only upload images.
Expected behavior: Users should be able to upload video files (e.g., screen recordings) for analysis.
Impact: This limitation causes frustration and wastes significant time, as developers struggle to articulate visual problems in writing.


πŸ› οΈ Contributing & Technical Analysis

βœ… I'm interested in implementing this feature
βœ… I understand this needs approval before implementation begins

πŸ” Comprehensive Technical Analysis

Implementation Target

The goal is to enable video uploads for multimodal analysis in Roo Code, specifically for models that support it (e.g., Gemini 2.5 Pro). This will allow users to provide video context for UI troubleshooting and other development tasks.

Affected Components

  • Primary File: webview-ui/src/components/chat/ChatView.tsx

Current Implementation Analysis

The codebase has foundational support for video handling. Recent commits have prepared the backend (gemini-format.ts) and UI components (ChatTextArea.tsx, MediaThumbnails.tsx) to process and display video files. The main blocker is the acceptedFileTypes logic in ChatView.tsx, which currently only allows image formats.

Proposed Implementation

Step 1: Update acceptedFileTypes

  • File: webview-ui/src/components/chat/ChatView.tsx
  • Changes: Modify the acceptedFileTypes useMemo hook to include video formats (e.g., MP4 and MOV, whose MIME types are video/mp4 and video/quicktime) when a video-capable model is selected.
  • Rationale: This is the primary change required to enable the feature.
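A minimal sketch of what Step 1 could look like, written as a pure function so it can be checked outside React. Everything here is an assumption for illustration: the ModelInfoLike shape, the supportsVideo flag, and the exact lists of accepted extensions are hypothetical and not taken from ChatView.tsx.

```typescript
// Hypothetical model capability shape; `supportsVideo` is an assumed flag.
interface ModelInfoLike {
  supportsImages?: boolean
  supportsVideo?: boolean
}

// Assumed extension lists, for illustration only.
const IMAGE_TYPES: readonly string[] = ["png", "jpeg", "webp"]
const VIDEO_TYPES: readonly string[] = ["mp4", "mov"]

// Returns the file extensions the chat input should accept for the given model.
function getAcceptedFileTypes(model?: ModelInfoLike): string[] {
  const images = model && model.supportsImages ? IMAGE_TYPES : []
  const videos = model && model.supportsVideo ? VIDEO_TYPES : []
  return [...images, ...videos]
}
```

Inside the component, the existing hook could then wrap this helper, for example useMemo(() => getAcceptedFileTypes(model), [model]), so the list is recomputed only when the selected model changes.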

Testing Requirements

  • Unit Tests: No new unit tests are required as the change is a configuration update.
  • Integration Tests:
    • Verify that video files can be attached when a supported model is selected.
    • Confirm that video files are rejected for unsupported models.
  • Manual Tests:
    • Drag-and-drop and paste a video file to ensure it's accepted.
    • Send a prompt with a video to confirm the API call is successful and the response is relevant.

Performance Impact

  • Expected performance change: Neutral for the UI, but processing large videos into base64 may be resource-intensive.
  • Benchmarking needed: No.
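To make the base64 cost concrete: standard base64 with padding emits 4 output characters for every 3 input bytes, so the encoded payload is roughly one third larger than the raw video. The helper below is an illustrative sketch (the function name is hypothetical) that could be used to estimate the payload size before encoding.

```typescript
// Length in characters of padded base64 output for `byteLength` raw bytes.
// Every group of up to 3 input bytes becomes exactly 4 output characters.
function base64EncodedLength(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3)
}

// Example: estimate the encoded size of a 50 MiB screen recording.
const rawBytes = 50 * 1024 * 1024
const encodedChars = base64EncodedLength(rawBytes)
console.log(`raw: ${rawBytes} bytes, base64: ${encodedChars} chars`)
```

A check like this could back a size limit in the UI, so very large recordings are rejected before they are read into memory.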

Security Considerations

  • None.

Migration Strategy

  • Not applicable.

Rollback Plan

  • Revert the change in ChatView.tsx.

Dependencies and Breaking Changes

  • None.

Implementation Complexity

  • Estimated effort: Small
  • Risk level: Low

Acceptance Criteria

Given a user has selected a Gemini 2.5 Pro model,
When they drag and drop or paste a supported video file (e.g., MP4, MOV) into the chat text area,
Then the video file should be accepted and a video thumbnail should appear in the attachments area.

Given a user has attached a video file,
When they send a prompt asking for analysis of the video,
Then the message should be sent to the Gemini API with the video data correctly formatted.

Given the Gemini API processes the video successfully,
When the user receives a response,
Then the response should contain an accurate analysis of the video's content.

Given a user has selected a model that does not support video,
When they attempt to attach a video file,
Then the file should be rejected, and no thumbnail should appear.
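The accept and reject criteria above could be exercised with a small check like the following. This is a sketch under assumptions: isFileAccepted is a hypothetical helper, and matching on the file extension is only one possible approach (the real component may inspect MIME types instead).

```typescript
// Returns true when the file's extension appears in the accepted list.
// Comparison is case-insensitive; files without an extension are rejected.
function isFileAccepted(fileName: string, acceptedTypes: readonly string[]): boolean {
  const dot = fileName.lastIndexOf(".")
  if (dot < 0 || dot === fileName.length - 1) return false
  const ext = fileName.slice(dot + 1).toLowerCase()
  return acceptedTypes.indexOf(ext) !== -1
}

// Video-capable model: image and video extensions are accepted.
const videoModelTypes = ["png", "jpeg", "webp", "mp4", "mov"]
// Model without video support: only image extensions are accepted.
const imageModelTypes = ["png", "jpeg", "webp"]

console.log(isFileAccepted("recording.MP4", videoModelTypes)) // accepted
console.log(isFileAccepted("recording.mp4", imageModelTypes)) // rejected
```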

Metadata

Assignees

Labels

Issue - In Progress: Someone is actively working on this. Should link to a PR soon.
enhancement: New feature or request

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
