Skip to content

Comments

feat(stage-tamagotchi): vision#917

Draft
nekomeowww wants to merge 4 commits intomainfrom
neko/dev/vision
Draft

feat(stage-tamagotchi): vision#917
nekomeowww wants to merge 4 commits intomainfrom
neko/dev/vision

Conversation

@nekomeowww
Copy link
Member

@nekomeowww nekomeowww commented Jan 8, 2026

Description

Depends on #941

Linked Issues

Additional Context

@nekomeowww nekomeowww marked this pull request as draft January 8, 2026 19:27
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @nekomeowww, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the groundwork for a new 'Vision' module within the application, providing the foundational capabilities for screen content analysis. It integrates system-level screen capture functionality with a user-facing interface for selecting and configuring vision models, and a developer tool for testing screen frame acquisition. The changes span across the Electron main process for core functionality, the renderer process for user interaction, and the shared UI components for state management and model cataloging.

Highlights

  • New Vision Module Implementation: Introduced a new 'Vision' module, enabling screen capture and integration with various vision models. This includes both Electron main process logic for screen source selection and a comprehensive frontend UI for configuration and testing.
  • Electron Main Process Enhancements: Added desktopCapturer and session imports to the Electron main process. A new invoke handler, modulesVisionPrepareScreenSourceSelection, was implemented to manage screen source selection for vision capabilities, automatically granting access to the first screen found.
  • Renderer Process UI and Routing: New routes and corresponding UI components have been added for /settings/modules/vision to configure vision providers and models, and /devtools/vision for a dedicated page to capture and inspect screen frames.
  • Vision Module State Management: A new Pinia store, useVisionStore, was created to manage the state of the Vision module, including the active provider, selected model, loading states, and error handling. It also integrates with the existing providers store to fetch available models.
  • Vision Model Catalog: A VISION_MODEL_CATALOG was introduced, defining various vision models with details such as ID, name, description, tags, and recommended use cases, supporting both listed and custom models.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 8, 2026

⏳ Approval required for deploying to Cloudflare Workers (Preview) for stage-web.

Name Link
🔭 Waiting for approval For maintainers, approve here

Hey, @nekomeowww, @sumimakito, @luoling8192, @LemonNekoGH, kindly take some time to review and approve this deployment when you are available. Thank you! 🙏

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new 'vision' feature, complete with a screen capture devtool, settings pages for provider and model configuration, and a new Pinia store for state management. My review focuses on enhancing the robustness and correctness of this new functionality. I've identified a critical conflict with an existing feature, proposed improvements for error handling and resource management, and pointed out a state mutation anti-pattern and a logic bug within the new store.

Comment on lines +41 to +46
session.defaultSession.setDisplayMediaRequestHandler((_request, callback) => {
desktopCapturer.getSources({ types: ['screen'] }).then((sources) => {
// Grant access to the first screen found.
callback({ video: sources[0], audio: 'loopback' })
})
}, { useSystemPicker: false })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

There are a couple of issues with the implementation of setDisplayMediaRequestHandler:

  1. Critical Conflict: As noted in the TODO on line 40, this implementation overwrites any existing handler. This creates a direct conflict with the beat-sync feature, meaning only one can work at a time. This is a critical architectural issue that could lead to unpredictable behavior and should be resolved by implementing a centralized manager for this handler before this feature is enabled.

  2. Missing Error Handling: The call to desktopCapturer.getSources() lacks proper error handling. If the promise rejects or if no screen sources are found, it will lead to an unhandled promise rejection or a runtime error when accessing sources[0]. You should add a .catch() block and check if the sources array is not empty.

Here is a suggestion to make it more robust:

    session.defaultSession.setDisplayMediaRequestHandler((_request, callback) => {
      desktopCapturer.getSources({ types: ['screen'] })
        .then((sources) => {
          if (sources?.length > 0) {
            // Grant access to the first screen found.
            callback({ video: sources[0], audio: 'loopback' })
          } 
          else {
            console.error('No screen sources found.')
            callback({}) // Deny the request
          }
        })
        .catch((error) => {
          console.error('Failed to get desktop sources:', error)
          callback({}) // Deny the request
        })
    }, { useSystemPicker: false })

<RadioCardManySelect
v-model="activeModel"
v-model:search-query="modelSearchQuery"
:items="providerModels.sort((a, b) => a.id === activeModel ? -1 : b.id === activeModel ? 1 : 0)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

You are sorting the providerModels array in-place. Since providerModels is a computed property that returns an array from the Pinia store state, this is a direct mutation of the state from a component, which is an anti-pattern and can lead to unpredictable behavior. You should create a shallow copy of the array before sorting it.

            :items="[...providerModels].sort((a, b) => a.id === activeModel ? -1 : b.id === activeModel ? 1 : 0)"

Comment on lines +57 to +61
async function loadModelsForProvider(provider: string) {
if (provider && providerMetadata.value?.capabilities.listModels !== undefined) {
await providersStore.fetchModelsForProvider(provider)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This function incorrectly uses providerMetadata.value, which is a computed property based on activeProvider.value. However, the function receives a provider argument that might be different. This will lead to incorrect behavior when trying to load models for a provider that is not currently active. You should get the metadata for the provider passed as an argument.

Suggested change
async function loadModelsForProvider(provider: string) {
if (provider && providerMetadata.value?.capabilities.listModels !== undefined) {
await providersStore.fetchModelsForProvider(provider)
}
}
async function loadModelsForProvider(provider: string) {
const metadata = providersStore.providerMetadata[provider]
if (provider && metadata?.capabilities.listModels !== undefined) {
await providersStore.fetchModelsForProvider(provider)
}
}

Comment on lines +63 to +69
async function getModelsForProvider(provider: string) {
if (provider && providerMetadata.value?.capabilities.listModels !== undefined) {
return providersStore.getModelsForProvider(provider)
}

return []
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to loadModelsForProvider, this function incorrectly uses providerMetadata.value, which depends on activeProvider.value. It should use the provider argument to get the correct metadata to check for capabilities. This can lead to bugs where models for a non-active provider cannot be retrieved.

Suggested change
async function getModelsForProvider(provider: string) {
if (provider && providerMetadata.value?.capabilities.listModels !== undefined) {
return providersStore.getModelsForProvider(provider)
}
return []
}
async function getModelsForProvider(provider: string) {
const metadata = providersStore.providerMetadata[provider]
if (provider && metadata?.capabilities.listModels !== undefined) {
return providersStore.getModelsForProvider(provider)
}
return []
}

Comment on lines 29 to 78
async function captureFrame() {
errorMessage.value = ''
screenshotDataUrl.value = ''
isCapturing.value = true

try {
await prepareScreenSourceSelection()

const stream = await navigator.mediaDevices.getDisplayMedia({ video: true, audio: false })
const video = videoRef.value

if (!video) {
stream.getTracks().forEach(track => track.stop())
throw new Error('Video element not ready')
}

video.srcObject = stream
await video.play()

await new Promise<void>((resolve) => {
if (video.readyState >= 2) {
resolve()
return
}

video.onloadedmetadata = () => resolve()
})

const canvas = document.createElement('canvas')
canvas.width = video.videoWidth
canvas.height = video.videoHeight

const ctx = canvas.getContext('2d')
if (!ctx)
throw new Error('Failed to create canvas context')

ctx.drawImage(video, 0, 0, canvas.width, canvas.height)
screenshotDataUrl.value = canvas.toDataURL('image/png')

stream.getTracks().forEach(track => track.stop())
video.pause()
video.srcObject = null
}
catch (error) {
errorMessage.value = `Failed to capture screen: ${errorMessageFrom(error)}`
}
finally {
isCapturing.value = false
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The captureFrame function has duplicated logic for cleaning up the MediaStream. The stream tracks are stopped in two different places. This can be simplified by using a finally block to ensure resources are always released, making the code cleaner and more robust.

async function captureFrame() {
  errorMessage.value = ''
  screenshotDataUrl.value = ''
  isCapturing.value = true

  let stream: MediaStream | null = null
  try {
    await prepareScreenSourceSelection()

    stream = await navigator.mediaDevices.getDisplayMedia({ video: true, audio: false })
    const video = videoRef.value

    if (!video) {
      throw new Error('Video element not ready')
    }

    video.srcObject = stream
    await video.play()

    await new Promise<void>((resolve) => {
      if (video.readyState >= 2) {
        resolve()
        return
      }

      video.onloadedmetadata = () => resolve()
    })

    const canvas = document.createElement('canvas')
    canvas.width = video.videoWidth
    canvas.height = video.videoHeight

    const ctx = canvas.getContext('2d')
    if (!ctx)
      throw new Error('Failed to create canvas context')

    ctx.drawImage(video, 0, 0, canvas.width, canvas.height)
    screenshotDataUrl.value = canvas.toDataURL('image/png')

    video.pause()
    video.srcObject = null
  }
  catch (error) {
    errorMessage.value = `Failed to capture screen: ${errorMessageFrom(error)}`
  }
  finally {
    stream?.getTracks().forEach(track => track.stop())
    isCapturing.value = false
  }
}

@sumimakito
Copy link
Member

Update: Currently working on bringing the screen capturing features to our repo with some code from electron-audio-loopback.

@Neko-233 Neko-233 mentioned this pull request Jan 10, 2026
62 tasks
@nekomeowww nekomeowww force-pushed the neko/dev/vision branch 2 times, most recently from e44caab to eaf2538 Compare January 12, 2026 19:35
@nekomeowww nekomeowww changed the title [DO NOT MERGE] feat(stage-tamagotchi): vision feat(stage-tamagotchi): vision Jan 12, 2026
@nekomeowww nekomeowww force-pushed the neko/dev/vision branch 2 times, most recently from d4b379c to 1cbec33 Compare January 14, 2026 04:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants