
Adding VLM support #25

Open
sanchitmonga22 wants to merge 9 commits into main from VLM

@sanchitmonga22
Collaborator

No description provided.

sanchitmonga22 and others added 9 commits March 13, 2026 00:42
Implement a screencapture CLI wrapper, active-window detection, terminal PID skip logic, app tracking via polling, and a visual overlay mode driven by a subprocess helper (rcli_overlay) that communicates over stdin/stdout pipes.
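The terminal PID skip and the screencapture wrapper can be illustrated with a minimal C++ sketch. It assumes the window list is already available front to back; `WindowInfo`, `pick_target_window`, and `build_screencapture_args` are hypothetical names, not taken from this PR. The only real interface used is the macOS `screencapture` tool with its `-x` (no sound) and `-l<windowid>` (capture one window) flags.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical window record; the real code presumably fills this from the
// window server. Only the fields needed for the skip logic are shown.
struct WindowInfo {
    int owner_pid;
    unsigned window_id;
};

// Return the frontmost window that is not owned by the terminal hosting RCLI.
// Assumes `windows` is ordered front to back.
std::optional<WindowInfo> pick_target_window(const std::vector<WindowInfo>& windows,
                                             int terminal_pid) {
    for (const auto& w : windows) {
        if (w.owner_pid != terminal_pid) {
            return w;
        }
    }
    return std::nullopt;  // Only terminal windows are on screen.
}

// Build the argument vector for macOS `screencapture`:
//   -x            do not play the shutter sound
//   -l<windowid>  capture only the window with this ID
std::vector<std::string> build_screencapture_args(unsigned window_id,
                                                  const std::string& out_path) {
    return {"screencapture", "-x", "-l" + std::to_string(window_id), out_path};
}
```

Building an argument vector instead of a shell string avoids quoting problems when the output path contains spaces; the vector can then be handed to `posix_spawn` or a similar call.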

The overlay helper runs as a separate process with its own AppKit event loop and shows a draggable, resizable, green-bordered transparent overlay window. It communicates with the parent RCLI process via a stdin/stdout pipe protocol.
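The PR does not spell out the pipe protocol, so the following is only a sketch of what a line-based exchange could look like on the parent side. The `frame <x> <y> <w> <h>` reply format and the names `OverlayFrame`, `parse_frame_reply`, and `capture_rect_arg` are assumptions made for illustration. The `-R<x,y,w,h>` flag is the real `screencapture` option for capturing a screen rectangle.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Overlay window geometry in screen coordinates (hypothetical type).
struct OverlayFrame {
    int x;
    int y;
    int width;
    int height;
};

// Parse one reply line of the assumed form "frame <x> <y> <w> <h>".
// Returns std::nullopt for any other line or for a degenerate rectangle.
std::optional<OverlayFrame> parse_frame_reply(const std::string& line) {
    std::istringstream in(line);
    std::string tag;
    OverlayFrame f{};
    if (!(in >> tag) || tag != "frame") {
        return std::nullopt;
    }
    if (!(in >> f.x >> f.y >> f.width >> f.height)) {
        return std::nullopt;
    }
    if (f.width <= 0 || f.height <= 0) {
        return std::nullopt;
    }
    return f;
}

// Turn the overlay frame into a `screencapture -R<x,y,w,h>` argument.
std::string capture_rect_arg(const OverlayFrame& f) {
    return "-R" + std::to_string(f.x) + "," + std::to_string(f.y) + "," +
           std::to_string(f.width) + "," + std::to_string(f.height);
}
```

A text protocol with one message per line keeps the helper easy to debug by hand, since it can be driven directly from a shell.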

Add screen_capture.mm to the rcli library sources, add rcli_overlay as a standalone executable target, and link the CoreGraphics framework.

Detect screen-related voice intents via keyword combinations, capture the screen (overlay or behind-terminal), analyze the capture with the VLM, and speak the response using sentence-level streaming TTS for low time to first audio (TTFA).
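Keyword-combination intent detection can be sketched as requiring one action phrase and one target word in the same utterance. The word lists and the name `is_screen_intent` below are hypothetical; the PR does not list the actual keywords.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Lowercase a copy of the input so matching is case-insensitive (ASCII only).
static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// True if `text` contains at least one of `words` as a substring.
static bool contains_any(const std::string& text,
                         const std::vector<std::string>& words) {
    return std::any_of(words.begin(), words.end(), [&](const std::string& w) {
        return text.find(w) != std::string::npos;
    });
}

// An utterance counts as a screen intent only when an action phrase and a
// target word both appear. Both lists are illustrative assumptions.
bool is_screen_intent(const std::string& utterance) {
    static const std::vector<std::string> actions = {
        "look at", "what's on", "what is on", "describe", "read"};
    static const std::vector<std::string> targets = {"screen", "window", "display"};
    const std::string text = to_lower(utterance);
    return contains_any(text, actions) && contains_any(text, targets);
}
```

Requiring a combination rather than a single keyword keeps a request such as "read me the news" from triggering a capture. Substring matching is deliberately simple here and will also match inside longer words.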

The [S] key toggles the visual overlay on and off, the status bar shows the active state, and screen/visual text commands trigger capture plus VLM analysis. Switch from rcli_speak to rcli_speak_streaming for lower TTFA.
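Sentence-level streaming lowers TTFA because speech can start as soon as the first sentence is complete instead of waiting for the whole VLM response. A minimal sketch of the buffering step follows, assuming the model output arrives in arbitrary text chunks. `SentenceStreamer` is a hypothetical class, and the callback stands in for whatever `rcli_speak_streaming` actually takes, since its signature is not shown in this PR.

```cpp
#include <functional>
#include <string>
#include <utility>

// Buffers streamed text and emits each sentence as soon as it is complete.
class SentenceStreamer {
public:
    explicit SentenceStreamer(std::function<void(const std::string&)> on_sentence)
        : on_sentence_(std::move(on_sentence)) {}

    // Append a chunk of model output; emit a sentence at '.', '!' or '?'.
    void feed(const std::string& chunk) {
        for (char c : chunk) {
            buffer_.push_back(c);
            if (c == '.' || c == '!' || c == '?') {
                flush();
            }
        }
    }

    // Emit any trailing text that never received a terminator.
    void finish() { flush(); }

private:
    // Emit the buffer without leading whitespace, skipping blank buffers.
    void flush() {
        const auto first = buffer_.find_first_not_of(" \t\n");
        if (first != std::string::npos) {
            on_sentence_(buffer_.substr(first));
        }
        buffer_.clear();
    }

    std::function<void(const std::string&)> on_sentence_;
    std::string buffer_;
};
```

Splitting on every terminator is naive: decimals such as "3.14" and abbreviations will be cut early, so a production splitter would need extra rules.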

2 participants