feat(stage-tamagotchi): vision by nekomeowww · Pull Request #917 · moeru-ai/airi

nekomeowww · 2026-01-08T19:27:02Z

Description

Depends on #941

Linked Issues

Additional Context

gemini-code-assist · 2026-01-08T19:27:25Z

Summary of Changes

Hello @nekomeowww, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the groundwork for a new 'Vision' module within the application, providing the foundational capabilities for screen content analysis. It integrates system-level screen capture functionality with a user-facing interface for selecting and configuring vision models, and a developer tool for testing screen frame acquisition. The changes span the Electron main process for core functionality, the renderer process for user interaction, and the shared UI components for state management and model cataloging.

Highlights

New Vision Module Implementation: Introduced a new 'Vision' module, enabling screen capture and integration with various vision models. This includes both Electron main process logic for screen source selection and a comprehensive frontend UI for configuration and testing.
Electron Main Process Enhancements: Added desktopCapturer and session imports to the Electron main process. A new invoke handler, modulesVisionPrepareScreenSourceSelection, was implemented to manage screen source selection for vision capabilities, automatically granting access to the first screen found.
Renderer Process UI and Routing: New routes and corresponding UI components have been added for /settings/modules/vision to configure vision providers and models, and /devtools/vision for a dedicated page to capture and inspect screen frames.
Vision Module State Management: A new Pinia store, useVisionStore, was created to manage the state of the Vision module, including the active provider, selected model, loading states, and error handling. It also integrates with the existing providers store to fetch available models.
Vision Model Catalog: A VISION_MODEL_CATALOG was introduced, defining various vision models with details such as ID, name, description, tags, and recommended use cases, supporting both listed and custom models.
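For readers who have not opened the diff, the catalog described in the last highlight could look roughly like the sketch below. This is an illustration only: the summary names the fields (ID, name, description, tags, recommended use cases) and says custom models are supported, but the type name, the property spellings, the placeholder entries, and the `resolveVisionModel` helper are all assumptions and are not taken from the PR.

```typescript
// Hypothetical entry shape; field names are guesses based on the summary.
interface VisionModelEntry {
  id: string
  name: string
  description: string
  tags: string[]
  recommendedFor: string[]
}

// Placeholder entries, not real model IDs from the PR.
const VISION_MODEL_CATALOG_SKETCH: VisionModelEntry[] = [
  {
    id: 'example-vision-small',
    name: 'Example Vision Small',
    description: 'Placeholder entry for a lightweight vision model.',
    tags: ['fast'],
    recommendedFor: ['screen description'],
  },
  {
    id: 'example-vision-large',
    name: 'Example Vision Large',
    description: 'Placeholder entry for a larger vision model.',
    tags: ['accurate'],
    recommendedFor: ['dense text on screen'],
  },
]

// Listed models come from the catalog; anything else is treated as custom.
function resolveVisionModel(id: string): VisionModelEntry {
  const listed = VISION_MODEL_CATALOG_SKETCH.find(entry => entry.id === id)
  if (listed)
    return listed

  return { id, name: id, description: 'Custom model', tags: ['custom'], recommendedFor: [] }
}

console.log(resolveVisionModel('example-vision-small').name)
console.log(resolveVisionModel('my-local-model').tags)
```

Falling back to a synthesized entry is one way to let the settings page render a custom model ID with the same component as a listed one.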


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

github-actions · 2026-01-08T19:27:28Z

⏳ Approval required for deploying to Cloudflare Workers (Preview) for stage-web.

Name	Link
🔭 Waiting for approval	For maintainers, approve here

Hey, @nekomeowww, @sumimakito, @luoling8192, @LemonNekoGH, kindly take some time to review and approve this deployment when you are available. Thank you! 🙏

gemini-code-assist

Code Review

This pull request introduces a new 'vision' feature, complete with a screen capture devtool, settings pages for provider and model configuration, and a new Pinia store for state management. My review focuses on enhancing the robustness and correctness of this new functionality. I've identified a critical conflict with an existing feature, proposed improvements for error handling and resource management, and pointed out a state mutation anti-pattern and a logic bug within the new store.

gemini-code-assist · 2026-01-08T19:29:55Z

apps/stage-tamagotchi/src/main/windows/settings/rpc/index.electron.ts

+    session.defaultSession.setDisplayMediaRequestHandler((_request, callback) => {
+      desktopCapturer.getSources({ types: ['screen'] }).then((sources) => {
+        // Grant access to the first screen found.
+        callback({ video: sources[0], audio: 'loopback' })
+      })
+    }, { useSystemPicker: false })


There are a couple of issues with the implementation of setDisplayMediaRequestHandler:

Critical Conflict: As noted in the TODO on line 40, this implementation overwrites any existing handler. This creates a direct conflict with the beat-sync feature, meaning only one can work at a time. This is a critical architectural issue that could lead to unpredictable behavior and should be resolved by implementing a centralized manager for this handler before this feature is enabled.

Missing Error Handling: The call to desktopCapturer.getSources() lacks proper error handling. If the promise rejects or if no screen sources are found, it will lead to an unhandled promise rejection or a runtime error when accessing sources[0]. You should add a .catch() block and check if the sources array is not empty.

Here is a suggestion to make it more robust:

session.defaultSession.setDisplayMediaRequestHandler((_request, callback) => {
  desktopCapturer.getSources({ types: ['screen'] })
    .then((sources) => {
      if (sources?.length > 0) {
        // Grant access to the first screen found.
        callback({ video: sources[0], audio: 'loopback' })
      }
      else {
        console.error('No screen sources found.')
        callback({}) // Deny the request
      }
    })
    .catch((error) => {
      console.error('Failed to get desktop sources:', error)
      callback({}) // Deny the request
    })
}, { useSystemPicker: false })
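The "centralized manager" mentioned in the first point is not spelled out in the review. One possible shape, as a minimal sketch: a single dispatcher owns the handler slot, features register a resolver under a name, and a feature "arms" the dispatcher just before it calls `getDisplayMedia`. Everything here (`createDisplayMediaManager`, `register`, `arm`, the `SessionLike` stand-in type) is hypothetical and not part of Electron or this PR; denying with `callback({})` simply mirrors the suggestion above.

```typescript
// Simplified stand-ins for the Electron types involved (assumptions).
type DisplayMediaStreams = { video?: unknown; audio?: unknown }
type DisplayMediaCallback = (streams: DisplayMediaStreams) => void
type DisplayMediaHandler = (request: unknown, callback: DisplayMediaCallback) => void
type DisplayMediaResolver = () => Promise<DisplayMediaStreams>

interface SessionLike {
  setDisplayMediaRequestHandler: (handler: DisplayMediaHandler) => void
}

function createDisplayMediaManager(session: SessionLike) {
  const resolvers = new Map<string, DisplayMediaResolver>()
  let armedFeature: string | undefined

  // Install the handler once; features never touch the session directly.
  session.setDisplayMediaRequestHandler((_request, callback) => {
    const resolver = armedFeature === undefined ? undefined : resolvers.get(armedFeature)
    armedFeature = undefined // each arm() covers exactly one request

    if (!resolver) {
      callback({}) // no feature claimed this request, so deny it
      return
    }

    resolver().then(callback, (error: unknown) => {
      console.error('Failed to resolve display media:', error)
      callback({}) // deny on failure
    })
  })

  return {
    // Associate a feature name with the logic that picks its streams.
    register(feature: string, resolver: DisplayMediaResolver): void {
      resolvers.set(feature, resolver)
    },
    // Declare which feature the next getDisplayMedia() call belongs to.
    arm(feature: string): void {
      armedFeature = feature
    },
  }
}
```

With this layout, the vision module and beat-sync would each call `register` once at startup and `arm` from their own invoke handlers, so neither overwrites the other.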

gemini-code-assist · 2026-01-08T19:29:55Z

packages/stage-pages/src/pages/settings/modules/vision.vue

+          <RadioCardManySelect
+            v-model="activeModel"
+            v-model:search-query="modelSearchQuery"
+            :items="providerModels.sort((a, b) => a.id === activeModel ? -1 : b.id === activeModel ? 1 : 0)"


You are sorting the providerModels array in-place. Since providerModels is a computed property that returns an array from the Pinia store state, this is a direct mutation of the state from a component, which is an anti-pattern and can lead to unpredictable behavior. You should create a shallow copy of the array before sorting it.

:items="[...providerModels].sort((a, b) => a.id === activeModel ? -1 : b.id === activeModel ? 1 : 0)"
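As a standalone illustration of why the copy matters: `Array.prototype.sort` reorders the array it is called on, so sorting a spread copy is what keeps the source untouched. The `ModelInfo` type and `withActiveFirst` helper below are illustrative names, not code from the PR.

```typescript
interface ModelInfo {
  id: string
  name: string
}

// Returns a new array with the active model first; the input is not modified.
function withActiveFirst(models: readonly ModelInfo[], activeId: string): ModelInfo[] {
  return [...models].sort((a, b) =>
    a.id === activeId ? -1 : b.id === activeId ? 1 : 0)
}

// Stand-in for the array that would live in the store.
const storeModels: ModelInfo[] = [
  { id: 'a', name: 'Model A' },
  { id: 'b', name: 'Model B' },
  { id: 'c', name: 'Model C' },
]

const sorted = withActiveFirst(storeModels, 'c')
console.log(sorted.map(m => m.id)) // active model 'c' is first
console.log(storeModels.map(m => m.id)) // original order is preserved
```

Moving the sort into a computed property instead of the template would also avoid re-sorting on every render, though that is a separate concern from the mutation.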

gemini-code-assist · 2026-01-08T19:29:55Z

packages/stage-ui/src/stores/modules/vision/store.ts

+  async function loadModelsForProvider(provider: string) {
+    if (provider && providerMetadata.value?.capabilities.listModels !== undefined) {
+      await providersStore.fetchModelsForProvider(provider)
+    }
+  }


This function incorrectly uses providerMetadata.value, which is a computed property based on activeProvider.value. However, the function receives a provider argument that might be different. This will lead to incorrect behavior when trying to load models for a provider that is not currently active. You should get the metadata for the provider passed as an argument.

Suggested change

-  async function loadModelsForProvider(provider: string) {
-    if (provider && providerMetadata.value?.capabilities.listModels !== undefined) {
-      await providersStore.fetchModelsForProvider(provider)
-    }
-  }
+  async function loadModelsForProvider(provider: string) {
+    const metadata = providersStore.providerMetadata[provider]
+    if (provider && metadata?.capabilities.listModels !== undefined) {
+      await providersStore.fetchModelsForProvider(provider)
+    }
+  }
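The difference between the two versions can be reproduced without Pinia. In the sketch below, the registry, the provider IDs, and both helper functions are made up for illustration; only the pattern (consulting the active provider versus the provider passed in) comes from the review comment.

```typescript
interface ProviderMetadata {
  capabilities: { listModels?: () => Promise<string[]> }
}

// Hypothetical registry keyed by provider id.
const providerMetadataById: Record<string, ProviderMetadata | undefined> = {
  'provider-with-listing': { capabilities: { listModels: async () => ['m1'] } },
  'provider-without-listing': { capabilities: {} },
}

// Stand-in for the store's activeProvider.
const activeProvider = 'provider-without-listing'

// Mirrors the original: the argument is ignored for the capability check.
function canListModelsViaActive(_provider: string): boolean {
  return providerMetadataById[activeProvider]?.capabilities.listModels !== undefined
}

// Mirrors the suggestion: the capability check uses the argument.
function canListModelsViaArgument(provider: string): boolean {
  return providerMetadataById[provider]?.capabilities.listModels !== undefined
}

console.log(canListModelsViaActive('provider-with-listing')) // false
console.log(canListModelsViaArgument('provider-with-listing')) // true
```

The first call reports `false` even though the requested provider can list models, because the check looked at whichever provider happened to be active.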

gemini-code-assist · 2026-01-08T19:29:55Z

packages/stage-ui/src/stores/modules/vision/store.ts

+  async function getModelsForProvider(provider: string) {
+    if (provider && providerMetadata.value?.capabilities.listModels !== undefined) {
+      return providersStore.getModelsForProvider(provider)
+    }
+
+    return []
+  }


Similar to loadModelsForProvider, this function incorrectly uses providerMetadata.value, which depends on activeProvider.value. It should use the provider argument to get the correct metadata to check for capabilities. This can lead to bugs where models for a non-active provider cannot be retrieved.

Suggested change

-  async function getModelsForProvider(provider: string) {
-    if (provider && providerMetadata.value?.capabilities.listModels !== undefined) {
-      return providersStore.getModelsForProvider(provider)
-    }
-
-    return []
-  }
+  async function getModelsForProvider(provider: string) {
+    const metadata = providersStore.providerMetadata[provider]
+    if (provider && metadata?.capabilities.listModels !== undefined) {
+      return providersStore.getModelsForProvider(provider)
+    }
+
+    return []
+  }

gemini-code-assist · 2026-01-08T19:29:55Z

apps/stage-tamagotchi/src/renderer/pages/devtools/vision.vue

+async function captureFrame() {
+  errorMessage.value = ''
+  screenshotDataUrl.value = ''
+  isCapturing.value = true
+
+  try {
+    await prepareScreenSourceSelection()
+
+    const stream = await navigator.mediaDevices.getDisplayMedia({ video: true, audio: false })
+    const video = videoRef.value
+
+    if (!video) {
+      stream.getTracks().forEach(track => track.stop())
+      throw new Error('Video element not ready')
+    }
+
+    video.srcObject = stream
+    await video.play()
+
+    await new Promise<void>((resolve) => {
+      if (video.readyState >= 2) {
+        resolve()
+        return
+      }
+
+      video.onloadedmetadata = () => resolve()
+    })
+
+    const canvas = document.createElement('canvas')
+    canvas.width = video.videoWidth
+    canvas.height = video.videoHeight
+
+    const ctx = canvas.getContext('2d')
+    if (!ctx)
+      throw new Error('Failed to create canvas context')
+
+    ctx.drawImage(video, 0, 0, canvas.width, canvas.height)
+    screenshotDataUrl.value = canvas.toDataURL('image/png')
+
+    stream.getTracks().forEach(track => track.stop())
+    video.pause()
+    video.srcObject = null
+  }
+  catch (error) {
+    errorMessage.value = `Failed to capture screen: ${errorMessageFrom(error)}`
+  }
+  finally {
+    isCapturing.value = false
+  }
+}


The captureFrame function has duplicated logic for cleaning up the MediaStream. The stream tracks are stopped in two different places. This can be simplified by using a finally block to ensure resources are always released, making the code cleaner and more robust.

async function captureFrame() {
  errorMessage.value = ''
  screenshotDataUrl.value = ''
  isCapturing.value = true

  let stream: MediaStream | null = null

  try {
    await prepareScreenSourceSelection()

    stream = await navigator.mediaDevices.getDisplayMedia({ video: true, audio: false })
    const video = videoRef.value

    if (!video) {
      throw new Error('Video element not ready')
    }

    video.srcObject = stream
    await video.play()

    await new Promise<void>((resolve) => {
      if (video.readyState >= 2) {
        resolve()
        return
      }

      video.onloadedmetadata = () => resolve()
    })

    const canvas = document.createElement('canvas')
    canvas.width = video.videoWidth
    canvas.height = video.videoHeight

    const ctx = canvas.getContext('2d')
    if (!ctx)
      throw new Error('Failed to create canvas context')

    ctx.drawImage(video, 0, 0, canvas.width, canvas.height)
    screenshotDataUrl.value = canvas.toDataURL('image/png')

    video.pause()
    video.srcObject = null
  }
  catch (error) {
    errorMessage.value = `Failed to capture screen: ${errorMessageFrom(error)}`
  }
  finally {
    stream?.getTracks().forEach(track => track.stop())
    isCapturing.value = false
  }
}
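The cleanup pattern in this suggestion can also be factored into a small helper so that it is testable outside the browser. The sketch below is an assumption-laden illustration: `withTracks` and the `Stoppable` interface are hypothetical names, and a real caller would pass the tracks of a `MediaStream` obtained from `getDisplayMedia`.

```typescript
// Anything with a stop() method, e.g. a MediaStreamTrack.
interface Stoppable {
  stop: () => void
}

// Acquires tracks, runs `use`, and always stops the tracks afterwards,
// whether `use` resolves or throws.
async function withTracks<T>(
  acquire: () => Promise<Stoppable[]>,
  use: (tracks: Stoppable[]) => Promise<T>,
): Promise<T> {
  let tracks: Stoppable[] = []

  try {
    tracks = await acquire()
    return await use(tracks)
  }
  finally {
    // Runs on success and on failure; a no-op if acquire() itself failed.
    tracks.forEach(track => track.stop())
  }
}
```

A capture function built on this helper would only contain the drawing logic, and any early `throw` (such as the missing video element) would still release the screen capture.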

sumimakito · 2026-01-09T18:25:14Z

Update: Currently working on bringing the screen capturing features to our repo with some code from electron-audio-loopback.

nekomeowww marked this pull request as draft January 8, 2026 19:27

gemini-code-assist bot reviewed Jan 8, 2026

View reviewed changes

nekomeowww mentioned this pull request Jan 9, 2026

proposal: Implement Alaya memory layer with intelligent semantic retrieval #879

Open

Neko-233 mentioned this pull request Jan 10, 2026

Roadmap v0.9 #840

Open

62 tasks

sumimakito mentioned this pull request Jan 11, 2026

feat(electron-screen-capture): introduce new screen capture utilities #937

Merged

nekomeowww force-pushed the neko/dev/vision branch 2 times, most recently from e44caab to eaf2538 Compare January 12, 2026 19:35

nekomeowww changed the title ~~[DO NOT MERGE] feat(stage-tamagotchi): vision~~ feat(stage-tamagotchi): vision Jan 12, 2026

nekomeowww force-pushed the neko/dev/vision branch 2 times, most recently from d4b379c to 1cbec33 Compare January 14, 2026 04:02

nekomeowww mentioned this pull request Jan 15, 2026

feat: add messaging and vision modules with UI components and stores #802

Closed

nekomeowww force-pushed the neko/dev/vision branch from bb5b955 to 86a5ace Compare January 16, 2026 05:50

nekomeowww added 3 commits January 18, 2026 03:52

feat(stage-tamagotchi): vision

fb54e1a

chore: updated

5c5b370

refactor: drop resettable

9ee260e

nekomeowww force-pushed the neko/dev/vision branch from fb4ddef to 9ee260e Compare January 17, 2026 19:56

[autofix.ci] apply automated fixes

f25f68d

nekomeowww wants to merge 4 commits into main from neko/dev/vision
