Conversation
Summary of ChangesHello @nekomeowww, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request lays the groundwork for a new 'Vision' module within the application, providing the foundational capabilities for screen content analysis. It integrates system-level screen capture functionality with a user-facing interface for selecting and configuring vision models, and a developer tool for testing screen frame acquisition. The changes span across the Electron main process for core functionality, the renderer process for user interaction, and the shared UI components for state management and model cataloging. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
⏳ Approval required for deploying to Cloudflare Workers (Preview) for stage-web.
Hey, @nekomeowww, @sumimakito, @luoling8192, @LemonNekoGH, kindly take some time to review and approve this deployment when you are available. Thank you! 🙏 |
There was a problem hiding this comment.
Code Review
This pull request introduces a new 'vision' feature, complete with a screen capture devtool, settings pages for provider and model configuration, and a new Pinia store for state management. My review focuses on enhancing the robustness and correctness of this new functionality. I've identified a critical conflict with an existing feature, proposed improvements for error handling and resource management, and pointed out a state mutation anti-pattern and a logic bug within the new store.
| session.defaultSession.setDisplayMediaRequestHandler((_request, callback) => { | ||
| desktopCapturer.getSources({ types: ['screen'] }).then((sources) => { | ||
| // Grant access to the first screen found. | ||
| callback({ video: sources[0], audio: 'loopback' }) | ||
| }) | ||
| }, { useSystemPicker: false }) |
There was a problem hiding this comment.
There are a couple of issues with the implementation of setDisplayMediaRequestHandler:
-
Critical Conflict: As noted in the
TODOon line 40, this implementation overwrites any existing handler. This creates a direct conflict with thebeat-syncfeature, meaning only one can work at a time. This is a critical architectural issue that could lead to unpredictable behavior and should be resolved by implementing a centralized manager for this handler before this feature is enabled. -
Missing Error Handling: The call to
desktopCapturer.getSources()lacks proper error handling. If the promise rejects or if no screen sources are found, it will lead to an unhandled promise rejection or a runtime error when accessingsources[0]. You should add a.catch()block and check if thesourcesarray is not empty.
Here is a suggestion to make it more robust:
session.defaultSession.setDisplayMediaRequestHandler((_request, callback) => {
desktopCapturer.getSources({ types: ['screen'] })
.then((sources) => {
if (sources?.length > 0) {
// Grant access to the first screen found.
callback({ video: sources[0], audio: 'loopback' })
}
else {
console.error('No screen sources found.')
callback({}) // Deny the request
}
})
.catch((error) => {
console.error('Failed to get desktop sources:', error)
callback({}) // Deny the request
})
}, { useSystemPicker: false })| <RadioCardManySelect | ||
| v-model="activeModel" | ||
| v-model:search-query="modelSearchQuery" | ||
| :items="providerModels.sort((a, b) => a.id === activeModel ? -1 : b.id === activeModel ? 1 : 0)" |
There was a problem hiding this comment.
You are sorting the providerModels array in-place. Since providerModels is a computed property that returns an array from the Pinia store state, this is a direct mutation of the state from a component, which is an anti-pattern and can lead to unpredictable behavior. You should create a shallow copy of the array before sorting it.
:items="[...providerModels].sort((a, b) => a.id === activeModel ? -1 : b.id === activeModel ? 1 : 0)"
| async function loadModelsForProvider(provider: string) { | ||
| if (provider && providerMetadata.value?.capabilities.listModels !== undefined) { | ||
| await providersStore.fetchModelsForProvider(provider) | ||
| } | ||
| } |
There was a problem hiding this comment.
This function incorrectly uses providerMetadata.value, which is a computed property based on activeProvider.value. However, the function receives a provider argument that might be different. This will lead to incorrect behavior when trying to load models for a provider that is not currently active. You should get the metadata for the provider passed as an argument.
| async function loadModelsForProvider(provider: string) { | |
| if (provider && providerMetadata.value?.capabilities.listModels !== undefined) { | |
| await providersStore.fetchModelsForProvider(provider) | |
| } | |
| } | |
| async function loadModelsForProvider(provider: string) { | |
| const metadata = providersStore.providerMetadata[provider] | |
| if (provider && metadata?.capabilities.listModels !== undefined) { | |
| await providersStore.fetchModelsForProvider(provider) | |
| } | |
| } |
| async function getModelsForProvider(provider: string) { | ||
| if (provider && providerMetadata.value?.capabilities.listModels !== undefined) { | ||
| return providersStore.getModelsForProvider(provider) | ||
| } | ||
|
|
||
| return [] | ||
| } |
There was a problem hiding this comment.
Similar to loadModelsForProvider, this function incorrectly uses providerMetadata.value, which depends on activeProvider.value. It should use the provider argument to get the correct metadata to check for capabilities. This can lead to bugs where models for a non-active provider cannot be retrieved.
| async function getModelsForProvider(provider: string) { | |
| if (provider && providerMetadata.value?.capabilities.listModels !== undefined) { | |
| return providersStore.getModelsForProvider(provider) | |
| } | |
| return [] | |
| } | |
| async function getModelsForProvider(provider: string) { | |
| const metadata = providersStore.providerMetadata[provider] | |
| if (provider && metadata?.capabilities.listModels !== undefined) { | |
| return providersStore.getModelsForProvider(provider) | |
| } | |
| return [] | |
| } |
| async function captureFrame() { | ||
| errorMessage.value = '' | ||
| screenshotDataUrl.value = '' | ||
| isCapturing.value = true | ||
|
|
||
| try { | ||
| await prepareScreenSourceSelection() | ||
|
|
||
| const stream = await navigator.mediaDevices.getDisplayMedia({ video: true, audio: false }) | ||
| const video = videoRef.value | ||
|
|
||
| if (!video) { | ||
| stream.getTracks().forEach(track => track.stop()) | ||
| throw new Error('Video element not ready') | ||
| } | ||
|
|
||
| video.srcObject = stream | ||
| await video.play() | ||
|
|
||
| await new Promise<void>((resolve) => { | ||
| if (video.readyState >= 2) { | ||
| resolve() | ||
| return | ||
| } | ||
|
|
||
| video.onloadedmetadata = () => resolve() | ||
| }) | ||
|
|
||
| const canvas = document.createElement('canvas') | ||
| canvas.width = video.videoWidth | ||
| canvas.height = video.videoHeight | ||
|
|
||
| const ctx = canvas.getContext('2d') | ||
| if (!ctx) | ||
| throw new Error('Failed to create canvas context') | ||
|
|
||
| ctx.drawImage(video, 0, 0, canvas.width, canvas.height) | ||
| screenshotDataUrl.value = canvas.toDataURL('image/png') | ||
|
|
||
| stream.getTracks().forEach(track => track.stop()) | ||
| video.pause() | ||
| video.srcObject = null | ||
| } | ||
| catch (error) { | ||
| errorMessage.value = `Failed to capture screen: ${errorMessageFrom(error)}` | ||
| } | ||
| finally { | ||
| isCapturing.value = false | ||
| } | ||
| } |
There was a problem hiding this comment.
The captureFrame function has duplicated logic for cleaning up the MediaStream. The stream tracks are stopped in two different places. This can be simplified by using a finally block to ensure resources are always released, making the code cleaner and more robust.
async function captureFrame() {
errorMessage.value = ''
screenshotDataUrl.value = ''
isCapturing.value = true
let stream: MediaStream | null = null
try {
await prepareScreenSourceSelection()
stream = await navigator.mediaDevices.getDisplayMedia({ video: true, audio: false })
const video = videoRef.value
if (!video) {
throw new Error('Video element not ready')
}
video.srcObject = stream
await video.play()
await new Promise<void>((resolve) => {
if (video.readyState >= 2) {
resolve()
return
}
video.onloadedmetadata = () => resolve()
})
const canvas = document.createElement('canvas')
canvas.width = video.videoWidth
canvas.height = video.videoHeight
const ctx = canvas.getContext('2d')
if (!ctx)
throw new Error('Failed to create canvas context')
ctx.drawImage(video, 0, 0, canvas.width, canvas.height)
screenshotDataUrl.value = canvas.toDataURL('image/png')
video.pause()
video.srcObject = null
}
catch (error) {
errorMessage.value = `Failed to capture screen: ${errorMessageFrom(error)}`
}
finally {
stream?.getTracks().forEach(track => track.stop())
isCapturing.value = false
}
}
|
Update: Currently working on bringing the screen capturing features to our repo with some code from electron-audio-loopback. |
e44caab to
eaf2538
Compare
d4b379c to
1cbec33
Compare
bb5b955 to
86a5ace
Compare
fb4ddef to
9ee260e
Compare
Description
Depends on #941
Linked Issues
Additional Context