Conversation
- Add screenshot checkbox to text input panel with Camera icon - Implement screenshot capture using Electron's desktopCapturer API - Update message format to support multimodal content (text + image) - Add screenshot parameter throughout the processing pipeline - Update LLM API calls to handle both string and array content formats - Support OpenAI-compatible vision API format with base64 images - Expose desktopCapturer API in preload script for renderer access Fixes #217
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughThe changes add screenshot capture and multimodal support end-to-end: renderer UI captures screenshots, preload exposes desktopCapturer, IPC threads screenshot through tipc, LLM layers accept structured (text+image) message content and update token estimation and prompt construction for non-string content. Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant UI as TextInputPanel
participant Panel as panel.tsx
participant Preload as preload
participant Main as main/tipc
participant Agent as Agent Processing
participant LLM as LLM API
User->>UI: Enable "Include screenshot" + enter text
UI->>Preload: api.desktopCapturer.getSources()
Preload-->>UI: Screenshot DataURL
User->>UI: Submit (text + screenshot)
UI->>Panel: onSubmit(text, screenshot)
Panel->>Main: createTextInput/createMcpTextInput({text, screenshot})
Main->>Agent: processWithAgentMode(text, conversationId, screenshot)
Agent->>Agent: Transform history -> [{type:'text', text}, {type:'image_url', image_url}]
Agent->>LLM: Call LLM with multimodal message.content (array)
LLM-->>Agent: Response
Agent-->>Panel: Result
Panel-->>UI: Clear input & screenshot
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const captureScreenshot = async () => { | ||
| setIsCapturingScreenshot(true) | ||
| try { | ||
| // Use Electron's desktopCapturer API to capture screenshot | ||
| const sources = await (window as any).electron.desktopCapturer.getSources({ | ||
| types: ['screen'], |
There was a problem hiding this comment.
Wire screenshot API to the wrong global
The screenshot checkbox calls window.electron.desktopCapturer.getSources(...), but the preload script only exposes desktopCapturer on the custom electronAPI object (contextBridge.exposeInMainWorld("electronAPI", api)). window.electron still points to the default Electron toolkit API and has no desktopCapturer, so the call will throw and screenshots are never captured. Expose the capturer on window.electron or update the renderer to use window.electronAPI.desktopCapturer.
Useful? React with 👍 / 👎.
| const handleSubmit = () => { | ||
| if (text.trim() && !isProcessing) { | ||
| onSubmit(text.trim()) | ||
| onSubmit(text.trim(), screenshot || undefined) | ||
| setText("") |
There was a problem hiding this comment.
Send screenshots even after user unchecks option
handleSubmit always forwards screenshot || undefined to onSubmit and then clears the state afterwards. The checkbox only toggles includeScreenshot; it never clears screenshot when the user turns the option off. If a screenshot was captured and the user unchecks the box before sending, the image is still submitted despite the UI indicating otherwise, which is a privacy risk. Either gate the submission on includeScreenshot or reset screenshot when the checkbox is unchecked.
Useful? React with 👍 / 👎.
|
augment review |
apps/desktop/src/preload/index.ts
Outdated
| @@ -10,6 +10,10 @@ const api = { | |||
| getOAuthStatus: (serverName: string) => ipcRenderer.invoke('getOAuthStatus', serverName), | |||
| revokeOAuthTokens: (serverName: string) => ipcRenderer.invoke('revokeOAuthTokens', serverName), | |||
| testMCPServer: (serverName: string, config: any) => ipcRenderer.invoke('testMCPServer', { serverName, config }), | |||
| // Screenshot API | |||
| desktopCapturer: { | |||
There was a problem hiding this comment.
The desktopCapturer API is exposed under window.electronAPI, but the renderer uses window.electron.desktopCapturer; this mismatch will make screenshot capture undefined at runtime. Consider aligning the exposure with the usage (related to the usage in text-input-panel.tsx).
🤖 React with 👍 or 👎 to let us know if the comment was useful, or 🚀 if it prevented an incident/outage.
| setIsCapturingScreenshot(true) | ||
| try { | ||
| // Use Electron's desktopCapturer API to capture screenshot | ||
| const sources = await (window as any).electron.desktopCapturer.getSources({ |
There was a problem hiding this comment.
This references window.electron.desktopCapturer, but preload exposes desktopCapturer on window.electronAPI, so this will be undefined at runtime. Related to the exposure in src/preload/index.ts.
🤖 React with 👍 or 👎 to let us know if the comment was useful, or 🚀 if it prevented an incident/outage.
| @@ -570,7 +570,10 @@ async function makeOpenAICompatibleCall( | |||
| } | |||
|
|
|||
| const model = getModel(providerId, "mcp") | |||
| const estimatedTokens = Math.ceil(messages.reduce((sum, msg) => sum + msg.content.length, 0) / 4) | |||
| const estimatedTokens = Math.ceil(messages.reduce((sum, msg) => { | |||
There was a problem hiding this comment.
With multimodal messages (array content), the request logging in makeAPICallAttempt that truncates message content via substring assumes string content and will throw in debug mode. Consider making the debug logging robust to array content to avoid runtime errors when isDebugLLM() is true.
🤖 React with 👍 or 👎 to let us know if the comment was useful, or 🚀 if it prevented an incident/outage.
| }) | ||
|
|
||
| if (sources && sources.length > 0) { | ||
| // Get the first screen (primary display) |
There was a problem hiding this comment.
Selecting sources[0] as the "primary display" isn’t guaranteed by desktopCapturer; source ordering can vary across platforms. This can capture the wrong screen on multi‑monitor setups (also applies to other locations if replicated).
🤖 React with 👍 or 👎 to let us know if the comment was useful, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/main/llm-fetch.ts (1)
827-835: Handle multimodal responses without.trim()runtime errorsUpstream changes now allow
messageObj.contentto be an array (e.g.,{ type: "text" },{ type: "image_url" }). This line still assumes a string and calls.trim(), so the first multimodal assistant reply will crash withTypeError: rawContent.trim is not a function, aborting the whole agent run. Normalize array content before trimming.- let content: string | undefined = (messageObj.content ?? "").trim() + const rawContent = messageObj.content + let content: string | undefined + + if (Array.isArray(rawContent)) { + content = rawContent + .map((part: any) => { + if (typeof part?.text === "string") return part.text + if (typeof part?.image_url?.url === "string") return `[image:${part.image_url.url}]` + return "" + }) + .filter(Boolean) + .join("\n") + .trim() + } else { + content = (rawContent ?? "").trim() + }
🧹 Nitpick comments (2)
src/renderer/src/pages/panel.tsx (2)
135-156: Consider adding client-side validation for screenshot data.The mutation accepts and forwards the screenshot to the backend without validation. Adding client-side checks would improve UX by catching issues earlier and providing clearer error messages.
Consider adding validation before the backend call:
const textInputMutation = useMutation({ mutationFn: async ({ text, screenshot }: { text: string; screenshot?: string }) => { + // Validate screenshot if provided + if (screenshot) { + if (!screenshot.startsWith('data:image/')) { + throw new Error('Invalid screenshot format. Please try capturing again.') + } + const sizeInMB = (screenshot.length * 0.75) / (1024 * 1024) + if (sizeInMB > 10) { + throw new Error(`Screenshot is too large (${sizeInMB.toFixed(1)}MB). Maximum size is 10MB.`) + } + } await tipcClient.createTextInput({ text, screenshot }) },
158-193: Apply the same validation to MCP text input mutation.Similar to
textInputMutation, this mutation should validate the screenshot parameter before sending to the backend to catch errors early and provide better UX.Apply the same validation pattern suggested for
textInputMutationat lines 135-156.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
src/main/llm-fetch.ts(6 hunks)src/main/llm.ts(5 hunks)src/main/tipc.ts(5 hunks)src/preload/index.d.ts(1 hunks)src/preload/index.ts(2 hunks)src/renderer/src/components/text-input-panel.tsx(4 hunks)src/renderer/src/pages/panel.tsx(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/main/tipc.ts (1)
src/main/config.ts (1)
configStore(142-142)
src/renderer/src/pages/panel.tsx (1)
src/renderer/src/lib/tipc-client.ts (1)
tipcClient(12-14)
🔇 Additional comments (3)
src/main/tipc.ts (2)
619-688: Screenshot parameter properly threaded through agent mode flow.The implementation correctly forwards the screenshot parameter through the MCP text input flow to
processWithAgentMode. The screenshot will be included in the conversation history and multimodal LLM prompts as described in the PR objectives.Note: Ensure screenshot validation is added as suggested in the earlier comment on
processWithAgentMode.
566-617: Screenshot parameter accepted but unused in createTextInput function.The
createTextInputfunction accepts ascreenshotparameter (line 569) but never uses it anywhere in the function body. The screenshot is passed to this function whenmcpToolsEnabledis false (line 630), but it's silently discarded.Screenshots are only processed in the MCP flow via
processWithAgentMode, where they're sent to LLMs as multimodal content. The fallback text-input path doesn't persist or process screenshots despite accepting them in the input type definition.Recommendation: Either remove
screenshotfrom thecreateTextInputinput type since it's not used, or document this limitation. Alternatively, if screenshots should be stored for non-MCP mode, add ascreenshotfield toRecordingHistoryItemand persist them.src/renderer/src/pages/panel.tsx (1)
352-375: Screenshot parameter properly propagated through submission flow.The implementation correctly forwards the screenshot parameter to both MCP and standard text input mutations, ensuring it reaches the backend processing layer. The conversation message creation appropriately includes only the text, keeping the screenshot as context for the LLM rather than display content.
src/main/tipc.ts
Outdated
| async function processWithAgentMode( | ||
| text: string, | ||
| conversationId?: string, | ||
| screenshot?: string, | ||
| ): Promise<string> { |
There was a problem hiding this comment.
Add validation for screenshot parameter.
The screenshot parameter accepts base64-encoded image data that can be 2-3MB for the configured 1920x1080 resolution. Without validation, this could lead to memory issues or API payload size limit violations downstream.
Consider adding validation before processing:
async function processWithAgentMode(
text: string,
conversationId?: string,
screenshot?: string,
): Promise<string> {
+ // Validate screenshot if provided
+ if (screenshot) {
+ // Check format (should be data URL)
+ if (!screenshot.startsWith('data:image/')) {
+ throw new Error('Invalid screenshot format')
+ }
+ // Check size (e.g., 10MB limit)
+ const sizeInMB = (screenshot.length * 0.75) / (1024 * 1024) // approximate base64 size
+ if (sizeInMB > 10) {
+ throw new Error(`Screenshot too large: ${sizeInMB.toFixed(1)}MB (max 10MB)`)
+ }
+ }
+
const config = configStore.get()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async function processWithAgentMode( | |
| text: string, | |
| conversationId?: string, | |
| screenshot?: string, | |
| ): Promise<string> { | |
| async function processWithAgentMode( | |
| text: string, | |
| conversationId?: string, | |
| screenshot?: string, | |
| ): Promise<string> { | |
| // Validate screenshot if provided | |
| if (screenshot) { | |
| // Check format (should be data URL) | |
| if (!screenshot.startsWith('data:image/')) { | |
| throw new Error('Invalid screenshot format') | |
| } | |
| // Check size (e.g., 10MB limit) | |
| const sizeInMB = (screenshot.length * 0.75) / (1024 * 1024) // approximate base64 size | |
| if (sizeInMB > 10) { | |
| throw new Error(`Screenshot too large: ${sizeInMB.toFixed(1)}MB (max 10MB)`) | |
| } | |
| } | |
| const config = configStore.get() |
apps/desktop/src/preload/index.ts
Outdated
| // Screenshot API | ||
| desktopCapturer: { | ||
| getSources: (options: any) => desktopCapturer.getSources(options) | ||
| } |
There was a problem hiding this comment.
Expose desktopCapturer on the object the renderer actually uses
contextBridge.exposeInMainWorld("electron", electronAPI) still publishes the unmodified toolkit API, so the newly added screenshot helper only exists on window.electronAPI.desktopCapturer. The renderer code calls window.electron.desktopCapturer.getSources(...), which will be undefined at runtime—screenshot capture never fires and throws. Please merge the desktopCapturer proxy into the object you expose as window.electron so the preload and renderer stay in sync.
const api = {
@@
- desktopCapturer: {
- getSources: (options: any) => desktopCapturer.getSources(options)
- }
+ desktopCapturer: {
+ getSources: (options: any) => desktopCapturer.getSources(options)
+ }
}
+const electronWithDesktopCapturer = {
+ ...electronAPI,
+ desktopCapturer: api.desktopCapturer,
+}
+
if (process.contextIsolated) {
try {
- contextBridge.exposeInMainWorld("electron", electronAPI)
+ contextBridge.exposeInMainWorld("electron", electronWithDesktopCapturer)
contextBridge.exposeInMainWorld("electronAPI", api)
@@
- window.electron = electronAPI
+ window.electron = electronWithDesktopCapturer
window.electronAPI = api
}Committable suggestion skipped: line range outside the PR's diff.
- Only send screenshot when includeScreenshot is true - Clear screenshot data when user unchecks the box - Prevents unintended screenshot transmission - Respects user privacy expectations
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
src/renderer/src/components/text-input-panel.tsx (3)
68-68: Still using wrong API path - critical runtime error.Past reviews flagged that
window.electron.desktopCaptureris undefined; the preload script exposes it onwindow.electronAPI.desktopCapturer. This will throw at runtime and prevent screenshot capture.Apply this diff:
- const sources = await (window as any).electron.desktopCapturer.getSources({ + const sources = await (window as any).electronAPI.desktopCapturer.getSources({
74-75: Primary display assumption unreliable on multi-monitor setups.
desktopCapturerdoesn't guarantee source ordering, sosources[0]may capture a secondary screen. Consider filtering for the primary display or documenting this limitation.
96-103: Missing dependency in useEffect.The effect references
screenshotin line 97 and 99 but doesn't include it in the dependency array. React's exhaustive-deps rule flags this. Addscreenshotto the dependencies or use a callback to avoid potential stale closure issues.- }, [includeScreenshot]) + }, [includeScreenshot, screenshot])Note: You may also need to add
isCapturingScreenshotas a guard to prevent re-capture loops.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/renderer/src/components/text-input-panel.tsx(4 hunks)
🔇 Additional comments (3)
src/renderer/src/components/text-input-panel.tsx (3)
85-93: Privacy concern properly addressed.The submission now correctly respects the
includeScreenshottoggle and only sends the screenshot when both the checkbox is checked AND a screenshot exists. This resolves the privacy issue flagged in previous reviews.
96-103: Screenshot clearing properly implemented.The effect correctly clears the screenshot when the user unchecks the box, addressing the privacy concern from previous reviews.
193-212: UI implementation looks good.The screenshot checkbox, status indicators, and disabled states are properly implemented with clear visual feedback.
| } catch (error) { | ||
| console.error('Failed to capture screenshot:', error) | ||
| } finally { | ||
| setIsCapturingScreenshot(false) | ||
| } |
There was a problem hiding this comment.
Provide user feedback when screenshot capture fails.
Errors are logged but users see no indication that capture failed. The checkbox stays checked and shows "Capturing..." indefinitely if an error occurs, creating a confusing state.
Consider showing a brief error message or toast notification when capture fails.
🤖 Prompt for AI Agents
In src/renderer/src/components/text-input-panel.tsx around lines 78 to 82, the
catch block only logs the error so the UI stays in a "Capturing..." state and
the checkbox remains checked; update the catch to set the capturing state back
to false (or ensure finally still runs) and surface a brief user-facing error
(e.g., trigger an existing toast/notification or set a local error message state
that displays a short inline message) so users know the capture failed; keep the
console.error for debugging but call the toast/notification helper (or setError)
with a friendly message like "Screenshot failed" so the checkbox and label are
restored and the user receives feedback.
|
augment review |
Resolves merge conflicts from monorepo restructure: - Fixed content conflict in llm-fetch.ts (combined multimodal content type with sessionId/onRetryProgress) - Applied screenshot/multimodal changes to new file locations: - apps/desktop/src/main/llm.ts: Added screenshot parameter and multimodal message handling - apps/desktop/src/main/tipc.ts: Added screenshot parameter to processWithAgentMode and router - apps/desktop/src/renderer/src/pages/panel.tsx: Added screenshot to mutations and handleTextSubmit - Removed old src/ files that were deleted in main but modified in PR Fixes #671
|
augment review |
🤖 Augment PR SummarySummary: Adds optional screenshot capture so users can send multimodal (text + image) prompts to vision-capable models. Changes:
Technical Notes: Screenshots are sent as 🤖 Was this summary useful? React with 👍 or 👎 |
| // Clear screenshot when user unchecks the box | ||
| setScreenshot(null) | ||
| } | ||
| }, [includeScreenshot]) |
There was a problem hiding this comment.
Because this effect only depends on includeScreenshot, an in-flight captureScreenshot() can still call setScreenshot(...) after the user unchecks the box, leaving a “captured” image in state while the toggle is off. Consider making the capture lifecycle resilient to that race so the UI state can’t re-populate a screenshot after opt-out.
🤖 Was this useful? React with 👍 or 👎
apps/desktop/src/main/llm-fetch.ts
Outdated
| // Convert messages to Gemini format | ||
| const prompt = messages.map((m) => `${m.role}: ${m.content}`).join("\n\n") | ||
| const prompt = messages.map((m) => { | ||
| const content = typeof m.content === 'string' ? m.content : JSON.stringify(m.content) |
There was a problem hiding this comment.
For Gemini, JSON.stringify(m.content) will embed the full screenshot data URL into the text prompt, which is likely to blow past payload/token limits and isn’t Gemini’s supported multimodal format. Consider stripping image parts (or mapping them to Gemini-native parts) when the provider isn’t OpenAI-compatible vision.
🤖 Was this useful? React with 👍 or 👎
In Electron 31+, desktopCapturer is only available in the main process. This was causing the preload script to fail to load, breaking the entire UI because window.electron.ipcRenderer was undefined. Changes: - Remove desktopCapturer import from preload script - Add IPC handler 'getScreenSources' in main process - Update text-input-panel to use new IPC-based API - Update type definitions for the new API
- Add captureWantedRef to prevent setting screenshot state after user unchecks - Add screenshotError state to show user feedback when capture fails - Uncheck the checkbox automatically when capture fails - Add proper dependencies to useEffect for React best practices Addresses PR review comments about race conditions and missing error feedback.
- Add extractTextFromContent helper to avoid embedding full image data URLs - Replace image parts with '[image attached]' placeholder for Gemini - Prevents payload/token limit issues when screenshots are in messages Addresses PR review comment about Gemini multimodal content handling.
- Validate screenshot format (must be data:image/ data URL) - Validate screenshot size (maximum 10MB) - Throw descriptive errors for invalid screenshots Addresses PR review comment about screenshot validation.
|
augment review |
| // Pass existingSessionId to reuse the session if found | ||
| // When fromTile=true, start snoozed so the floating panel doesn't appear | ||
| processWithAgentMode(input.text, conversationId, existingSessionId, input.fromTile ?? false) | ||
| processWithAgentMode(input.text, conversationId, existingSessionId, input.fromTile ?? false, input.screenshot) |
There was a problem hiding this comment.
This is a fire-and-forget promise chain without a .catch(), so the new screenshot validation errors (and other failures inside processWithAgentMode) can surface as unhandled promise rejections. Consider ensuring failures are caught/logged (and ideally surfaced to the UI) to avoid silent session failures.
🤖 Was this useful? React with 👍 or 👎
| */ | ||
| async function makeLLMCallAttempt( | ||
| messages: Array<{ role: string; content: string }>, | ||
| messages: Array<{ role: string; content: string | any[] }>, |
There was a problem hiding this comment.
makeLLMCallAttempt now allows content to be an array, but the debug log inside this function still does messages[...].content.substring(...) which will throw when isDebugLLM() is enabled and a screenshot is attached. The debug/logging path should tolerate non-string content values.
🤖 Was this useful? React with 👍 or 👎
- Fix macOS Screen Recording permission detection (empty sources array) - Add helper functions to handle multimodal content (string or array) - Add detail: high to image_url for xAI/Grok vision support - Improve error handling in text-input-panel for screenshot errors
… monitor selection - Add screenshot support for voice/recording input (passes through triggerMcpRecording and createMcpRecording) - Add 'Always include screenshot' setting in Settings > Shortcuts > Text Input - Add preview button (eye icon) next to screenshot checkbox to preview before sending - Add monitor/display selection in Settings > Screenshot section - Screenshot capture now uses configured display with fallback to primary
|
augment review |
|
This pull request is too large for Augment to review. The PR exceeds the maximum size limit of 100000 tokens (approximately 400000 characters) for automated code review. Please consider breaking this PR into smaller, more focused changes. |
|
augment review |
|
This pull request is too large for Augment to review. The PR exceeds the maximum size limit of 100000 tokens (approximately 400000 characters) for automated code review. Please consider breaking this PR into smaller, more focused changes. |
|
augment review |
| onSubmit(text.trim(), screenshotToSend) | ||
| setText("") | ||
| setScreenshot(null) | ||
| setIncludeScreenshot(false) |
There was a problem hiding this comment.
handleSubmit() always does setIncludeScreenshot(false), which means the alwaysIncludeScreenshot setting won’t persist across submissions (since the sync effect only reruns when alwaysIncludeScreenshot changes). Consider resetting includeScreenshot back to alwaysIncludeScreenshot after submit so the config is consistently honored.
🤖 Was this useful? React with 👍 or 👎
apps/desktop/src/main/window.ts
Outdated
| }) | ||
|
|
||
| if (sources.length === 0) { | ||
| logApp('[captureScreenshotFromMain] No screen sources available') |
There was a problem hiding this comment.
On macOS, desktopCapturer.getSources() can return an empty array when Screen Recording permission isn’t granted; captureScreenshotFromMain() currently just logs and returns undefined, so screenshotForVoiceCommands can silently do nothing. Consider surfacing a clearer error path/message (similar to the IPC handler) so users know they need to grant permission and restart.
🤖 Was this useful? React with 👍 or 👎
- Reset includeScreenshot to alwaysIncludeScreenshot config value after submit (instead of always resetting to false) - Improve captureScreenshotFromMain error logging for macOS permission issues to help users understand they need to grant Screen Recording permission
|
augment review |
| const mapConversationToMessages = ( | ||
| addSummaryPrompt: boolean = false | ||
| ): Array<{ role: "user" | "assistant"; content: string }> => { | ||
| ): Array<{ role: "user" | "assistant"; content: string | any[] }> => { |
There was a problem hiding this comment.
mapConversationToMessages now emits content as string | any[] for screenshot messages, but downstream context-budgeting (shrinkMessagesForLLM / LLMMessage in context-budget.ts) still assumes content is a string and uses string-only operations. This can break agent runs when a screenshot is attached and context reduction is enabled; consider normalizing multimodal content before context reduction (or extending the reducer to handle multimodal arrays).
🤖 Was this useful? React with 👍 or 👎
| logApp("IPC main registered") | ||
|
|
||
| // Register desktopCapturer handler (available only in main process in Electron 31+) | ||
| ipcMain.handle('getScreenSources', async (_event, options: { types: ('screen' | 'window')[], thumbnailSize?: { width: number, height: number } }) => { |
There was a problem hiding this comment.
getScreenSources forwards renderer-supplied options directly into desktopCapturer.getSources; consider validating/clamping types and thumbnailSize so a renderer bug (or injected script) can’t request unexpected capture modes or extreme sizes. Even if the UI currently uses only screen at 1920×1080, tightening this IPC surface helps prevent accidental DoS/privacy regressions.
🤖 Was this useful? React with 👍 or 👎
- Fix captureScreenshotFromMain to throw errors instead of silently returning undefined - On macOS, throws clear error about Screen Recording permission - Callers now properly handle errors with try-catch - Fix context-budget.ts to handle multimodal content (string | any[]) - Add getContentLength() helper for multimodal content length calculation - Add getContentAsString() helper for extracting text from multimodal content - Update LLMMessage type to support multimodal content - Update all content operations to use helper functions - Add input validation to getScreenSources IPC handler - Filter types to only allow 'screen' (not 'window') for privacy - Clamp thumbnailSize to reasonable bounds (100-4096 pixels) - Set sensible defaults if options not provided
|
augment review |
|
augment review |
| <DialogHeader> | ||
| <DialogTitle>Screenshot Preview</DialogTitle> | ||
| <DialogDescription> | ||
| This is what will be sent with your message |
There was a problem hiding this comment.
The preview dialog says “This is what will be sent with your message”, but handlePreviewScreenshot() captures a fresh image into previewImage without updating the screenshot state used by submit, so the preview can diverge from the actual payload (potential privacy surprise). Consider ensuring the preview reflects the same screenshot that will be submitted.
🤖 Was this useful? React with 👍 or 👎
| if (!content) return null | ||
|
|
||
| // Handle multimodal content (text + screenshot) | ||
| if (entry.screenshot) { |
There was a problem hiding this comment.
Because screenshots are stored on conversationHistory entries and emitted into LLM messages, follow-up turns will resend all prior screenshots in the conversation. This can grow request payloads quickly (even with a per-image cap) and may hit provider/request-size limits on longer sessions; consider how/when older screenshots should be dropped or summarized.
🤖 Was this useful? React with 👍 or 👎
Address PR review comment about preview divergence. The preview dialog now uses the existing screenshot state instead of capturing a fresh one, ensuring what users see matches what will actually be sent with their message. - Use existing screenshot state for preview if available - Only capture fresh screenshot when none exists - Update screenshot state after capture to keep preview in sync
Address PR review comment about consistent error logging. The catch handler for the fire-and-forget processWithAgentMode call now uses logApp for consistency with other error logging in the tipc module.
|
augment review |
Summary
This PR implements screenshot capture functionality for multimodal input, allowing users to include screenshots with their text messages when using vision-capable models.
Changes
UI Components
Backend Implementation
desktopCapturerAPIdesktopCapturerAPI in preload script for renderer accesscreateTextInputandcreateMcpTextInputTIPC proceduresprocessWithAgentModefunctionprocessTranscriptWithAgentModefunctionMultimodal Message Format
{ "role": "user", "content": [ { "type": "text", "text": "What's in this image?" }, { "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } } ] }Type Safety
llm-fetch.tsandllm.tsTesting
Fixes
Closes #217
Notes
Summary by CodeRabbit
New Features
Chores