AILens is an on-demand desktop assistant that explains what’s on your screen when you ask. It does not automate actions or run continuously.
It combines screenshot capture, OCR, app/window metadata, optional push-to-talk (PTT) voice input, and model analysis to explain what’s visible and suggest neutral next steps.
AILens is built for moments where switching tabs, searching manually, or re-explaining context is slow. Instead of asking a model in the abstract, you can ask with immediate visual context from your live desktop.
AILens:
- does not monitor your screen continuously
- does not click buttons, type, trade, or execute actions on your behalf
- does not provide financial/legal/medical advice
Typical uses:
- Explain unfamiliar UI, forms, dashboards, charts, dialogs, or error messages
- Summarize what changed on screen and what to do next
- Interpret chat/email tone and intent from visible text
- Compare options shown on screen (plans, prices, compatibility choices)
- Use push-to-talk for hands-free “what am I looking at?” workflows
Who it’s for:
- Support and operations teams triaging tools quickly
- Developers and power users debugging UI-heavy workflows
- Anyone juggling multiple apps who needs fast contextual answers
How it works:
- Capture the active window (or full screen) via hotkey or button.
- Extract visible text + metadata and assemble model context.
- Ask a follow-up in chat (typed or voice transcript).
- Receive a grounded response with optional voice playback.
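The context-assembly step can be pictured as merging window metadata, OCR text, and the user’s question into one text block that accompanies the screenshot. This is a minimal sketch, not AILens’s actual code: the `CaptureContext` shape, the `buildContextText` function, and the `MAX_OCR_CHARS` limit are all hypothetical.

```typescript
// Hypothetical shape of the data gathered for one capture.
interface CaptureContext {
  captureMode: "active-window" | "full-screen";
  appName: string | null; // from active-window metadata, if available
  windowTitle: string | null;
  ocrText: string; // text extracted from the screenshot by OCR
}

// Hypothetical cap on OCR text so prompts stay small for local models.
const MAX_OCR_CHARS = 4000;

// Merge metadata, OCR text, and the user's question into one prompt string.
function buildContextText(ctx: CaptureContext, question: string): string {
  const lines: string[] = [`Capture mode: ${ctx.captureMode}`];
  if (ctx.appName) lines.push(`Active app: ${ctx.appName}`);
  if (ctx.windowTitle) lines.push(`Window title: ${ctx.windowTitle}`);

  const ocr = ctx.ocrText.trim();
  if (ocr.length === 0) {
    lines.push("Visible text (OCR): (none detected)");
  } else {
    // Mark truncation explicitly so the model knows text was cut off.
    const clipped =
      ocr.length > MAX_OCR_CHARS ? `${ocr.slice(0, MAX_OCR_CHARS)}\n[...truncated]` : ocr;
    lines.push("Visible text (OCR):", clipped);
  }

  lines.push("", `User question: ${question.trim()}`);
  return lines.join("\n");
}
```

Truncating OCR text is one way to keep requests small for local models with short context windows; the screenshot itself still carries the full visual detail.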
Features:
- Chat window with message history, prompt input, Send, and Capture + Ask
- Separate Settings window:
  - OpenAI Base URL
  - API key
  - Model name
  - Capture mode (Active Window or Full Screen; default: Active Window)
- Global hotkey:
  - Windows/Linux: Ctrl+Shift+Space
  - macOS: Cmd+Shift+Space
- IPC-only architecture (renderer never calls OpenAI directly)
- OpenAI Responses API call (vision-ready)
- Packaging with electron-builder
- Non-secret settings are stored in electron-store.
- API key storage:
  - Tries the OS keychain via keytar (optional dependency).
  - Falls back to local encrypted storage using Electron safeStorage when available.
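The keychain-first, safeStorage-second order can be expressed as a small fallback chain. This is a hedged sketch: `SecretBackend` and `saveApiKey` are hypothetical, and the backends are synchronous for brevity. A real keychain backend would wrap keytar’s `setPassword(service, account, password)`, which returns a Promise, and the fallback would check `safeStorage.isEncryptionAvailable()` before calling `safeStorage.encryptString()`.

```typescript
// Hypothetical storage backend. In the app, one would wrap keytar (OS keychain)
// and another Electron safeStorage; these are not the real library APIs.
interface SecretBackend {
  name: string;
  isAvailable(): boolean;
  save(apiKey: string): void;
}

// Try backends in priority order; return the name of the one that stored the key.
function saveApiKey(apiKey: string, backends: SecretBackend[]): string {
  for (const backend of backends) {
    if (!backend.isAvailable()) continue;
    try {
      backend.save(apiKey);
      return backend.name;
    } catch {
      // A backend can be present but fail at runtime (e.g. a locked keychain); try the next one.
    }
  }
  throw new Error("No secure storage is available for the API key");
}
```

Returning the backend name lets the Settings window tell the user where the key ended up, which matters when keytar is missing and the fallback is silently used.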
AILens can optionally fetch external snippets for context (for example, game items, error messages, or UI terms).
- This is opt-in via Settings.
- Queries are derived from on-screen text and your prompt.
- If disabled, AILens stays local except for your chosen LLM/transcription provider.
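How snippet queries are built is not specified here. As one hypothetical approach, the sketch below returns `null` when the feature is off, so nothing is sent, and otherwise keeps a few distinctive words from the prompt and then from the on-screen text. The setting name, stopword list, and term limit are all assumptions.

```typescript
// Hypothetical settings flag; the real option name may differ.
interface SnippetSettings {
  externalSnippetsEnabled: boolean;
}

// Illustrative stopword list, not exhaustive.
const STOPWORDS = new Set([
  "the", "and", "for", "with", "this", "that", "what", "how", "does", "are", "you", "can",
]);

// Build a short search query, or return null so nothing leaves the machine when disabled.
function deriveSnippetQuery(
  settings: SnippetSettings,
  prompt: string,
  ocrText: string,
  maxTerms = 6,
): string | null {
  if (!settings.externalSnippetsEnabled) return null;
  const terms: string[] = [];
  const seen = new Set<string>();
  // Prompt words come first so the user's intent leads the query.
  for (const source of [prompt, ocrText]) {
    for (const raw of source.split(/\s+/)) {
      // Strip leading/trailing punctuation (ASCII letters and digits only, for simplicity).
      const word = raw.replace(/^[^A-Za-z0-9]+|[^A-Za-z0-9]+$/g, "");
      const key = word.toLowerCase();
      if (word.length < 3 || STOPWORDS.has(key) || seen.has(key)) continue;
      seen.add(key);
      terms.push(word);
      if (terms.length >= maxTerms) return terms.join(" ");
    }
  }
  return terms.length > 0 ? terms.join(" ") : null;
}
```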
Cross-platform exact active-window image capture is not uniformly reliable from Electron alone. AILens uses a pragmatic method:
- Capture the full screen.
- Get the active window’s bounds (via get-windows).
- Crop the screenshot to those bounds.
If active-window bounds are unavailable due to OS or privacy limitations, AILens falls back to a full-screen capture and returns a note indicating the fallback.
Minor crop misalignment caused by display DPI scaling is expected on multi-monitor setups.
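The crop step is mostly coordinate arithmetic: translate the window bounds into the captured display’s coordinate space, scale by the display’s DPI factor, and clamp to the image. This sketch assumes the window bounds are in DIPs (global coordinates) and the full-screen image is in physical pixels; the units that get-windows actually reports may differ by OS. `computeCropRect` is a hypothetical helper.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

type CropResult =
  | { mode: "active-window"; rect: Rect }
  | { mode: "full-screen"; note: string };

// Map window bounds (assumed DIPs, global coordinates) onto a full-screen
// capture of one display (assumed physical pixels).
function computeCropRect(
  windowBounds: Rect | null,
  displayBounds: Rect, // display position and size in DIPs
  scaleFactor: number, // display DPI scale, e.g. 1.5 for 150%
  imageWidth: number,
  imageHeight: number,
): CropResult {
  if (!windowBounds) {
    return { mode: "full-screen", note: "Active-window bounds unavailable; using full screen." };
  }
  // Translate to display-local coordinates, then scale each edge to physical pixels.
  const left = Math.round((windowBounds.x - displayBounds.x) * scaleFactor);
  const top = Math.round((windowBounds.y - displayBounds.y) * scaleFactor);
  const right = Math.round((windowBounds.x + windowBounds.width - displayBounds.x) * scaleFactor);
  const bottom = Math.round((windowBounds.y + windowBounds.height - displayBounds.y) * scaleFactor);

  // Clamp to the image so windows that are partly off-screen still crop safely.
  const x0 = Math.max(0, Math.min(imageWidth, left));
  const y0 = Math.max(0, Math.min(imageHeight, top));
  const x1 = Math.max(0, Math.min(imageWidth, right));
  const y1 = Math.max(0, Math.min(imageHeight, bottom));

  if (x1 - x0 < 1 || y1 - y0 < 1) {
    return { mode: "full-screen", note: "Active window is not on the captured display; using full screen." };
  }
  return { mode: "active-window", rect: { x: x0, y: y0, width: x1 - x0, height: y1 - y0 } };
}
```

In Electron, `screen.getDisplayMatching(rect)` returns a display with `bounds` and `scaleFactor`, and the resulting rectangle could be passed to `nativeImage.crop()`. Rounding each edge, rather than width and height separately, keeps rounding error from accumulating at fractional scale factors such as 1.25 or 1.5.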
For debugging capture quality, use Open Last Screenshot (and Open Last Zoom Screenshot) in Settings.
For detailed local setup (Ollama + whisper.cpp + troubleshooting), see LOCAL_SETUP.md.
npm install
npm run dev

This starts:
- Vite dev server for React renderer
- TypeScript watch for Electron main/preload
- Electron with auto-reload
npm run start

This builds main + renderer, then launches Electron from the compiled output.
npm run build

Output artifacts are generated under release/.
Scripts:
- npm run dev: full development loop (renderer + Electron)
- npm run electron:dev: Electron-side dev loop (expects the renderer on port 5173)
- npm run build: build renderer/main and package the app
- npm run capture:once: one-shot capture from CLI (saves the image, opens it, exits)
You can run AILens in a scriptable one-shot capture mode:
electron dist-electron/main/main.js --capture-active-window

Optional flags:
- --out <path>: output PNG path (default: debug/cli-capture.png in the app’s userData directory)
- --no-open: save without opening the image viewer
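A parser for these one-shot capture flags might look like the following. It is a hypothetical sketch, not the app’s actual argument handling: unrecognized arguments (such as the executable, the script path, or Chromium switches) are ignored, and `--out` requires a following value.

```typescript
interface CliCaptureOptions {
  captureActiveWindow: boolean;
  outPath: string | null; // null means "use the default under userData"
  open: boolean;
}

// Parse the documented one-shot capture flags from an argv array.
function parseCaptureArgs(argv: string[]): CliCaptureOptions {
  const opts: CliCaptureOptions = { captureActiveWindow: false, outPath: null, open: true };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--capture-active-window") {
      opts.captureActiveWindow = true;
    } else if (arg === "--no-open") {
      opts.open = false;
    } else if (arg === "--out") {
      const value: string | undefined = argv[i + 1];
      if (value === undefined || value.startsWith("--")) {
        throw new Error("--out requires a path argument");
      }
      opts.outPath = value;
      i++; // skip the consumed path value
    }
    // Anything else is ignored.
  }
  return opts;
}
```

Passing `process.argv` directly works because the executable and script path are simply skipped as unknown arguments.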
- Open Settings from the app menu: File → Settings
- View About information from the app menu: Help → About AILens
AILens also creates a system tray icon with quick actions:
- Show Chat
- Settings
- About AILens
- Quit
Clicking the tray icon focuses/restores the Chat window.
On Windows, closing the Chat window minimizes AILens to the tray and shows a one-time balloon tip. Use Quit AILens from the menu or tray to exit fully.
In Settings, you can configure the capture hotkey (a global shortcut).
- Windows/Linux example: Control+Shift+Space
- macOS example: Command+Shift+Space
The Chat Capture + Ask button tooltip shows the currently active hotkey.
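Electron global shortcuts use its accelerator syntax (modifiers such as `Control`, `Command`, `Shift`, and `Alt` joined with `+` to a key). The sketch below picks the platform default listed above and does a light sanity check on user-entered values before registration. Both helpers are hypothetical, and requiring at least one modifier is an assumed policy rather than an Electron rule.

```typescript
// Modifier names accepted in Electron accelerator strings.
const ACCELERATOR_MODIFIERS = new Set([
  "Command", "Cmd", "Control", "Ctrl", "CommandOrControl", "CmdOrCtrl",
  "Alt", "Option", "AltGr", "Shift", "Super", "Meta",
]);

// Platform default for the capture hotkey, given Node's process.platform value.
function defaultCaptureHotkey(platform: string): string {
  return platform === "darwin" ? "Command+Shift+Space" : "Control+Shift+Space";
}

// Return an error message for an unusable accelerator, or null if it looks valid.
// Requiring a modifier is an assumed policy, not an Electron requirement.
function checkAccelerator(accelerator: string): string | null {
  const parts = accelerator.split("+").map((p) => p.trim());
  if (parts.some((p) => p.length === 0)) return "Empty segment in shortcut";
  const key = parts[parts.length - 1];
  const modifiers = parts.slice(0, -1);
  if (modifiers.length === 0) return "Add at least one modifier (e.g. Control or Command)";
  if (ACCELERATOR_MODIFIERS.has(key)) return "The last segment must be a key, not a modifier";
  const unknown = modifiers.find((m) => !ACCELERATOR_MODIFIERS.has(m));
  return unknown ? `Unknown modifier: ${unknown}` : null;
}
```

`globalShortcut.register()` returns `false` when the combination cannot be registered (for example, when another app already holds it), which is a natural point to prompt the user to pick a different shortcut.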
AILens supports hold-to-record voice capture for assistant prompts:
- Enable Push-to-Talk in Settings → General.
- Hold the configured PTT hotkey (default on Windows: RightAlt).
- Release the key to stop recording, transcribe the speech, then run capture + analysis using that transcript.
If transcription returns empty text, AILens falls back to the prompt "Explain this screen."
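Hold-to-record needs one guard that is easy to miss: while a key is held, the OS typically sends repeated key-down events, so only the first should start recording. The sketch below shows that state handling plus the empty-transcript fallback. `PushToTalk` and its callbacks are hypothetical, and transcription is shown as synchronous for brevity even though a real implementation would be asynchronous.

```typescript
const FALLBACK_PROMPT = "Explain this screen.";

// Use the transcript if it has any text, otherwise the fallback prompt.
function resolvePrompt(transcript: string | null): string {
  const text = (transcript ?? "").trim();
  return text.length > 0 ? text : FALLBACK_PROMPT;
}

// Hypothetical hold-to-record controller driven by key-down/key-up events.
class PushToTalk {
  private recording = false;
  private readonly startRecording: () => void;
  private readonly stopAndTranscribe: () => string | null;
  private readonly captureAndAsk: (prompt: string) => void;

  constructor(
    startRecording: () => void,
    stopAndTranscribe: () => string | null,
    captureAndAsk: (prompt: string) => void,
  ) {
    this.startRecording = startRecording;
    this.stopAndTranscribe = stopAndTranscribe;
    this.captureAndAsk = captureAndAsk;
  }

  keyDown(): void {
    if (this.recording) return; // ignore auto-repeat while the key is held
    this.recording = true;
    this.startRecording();
  }

  keyUp(): void {
    if (!this.recording) return; // release without a matching press
    this.recording = false;
    this.captureAndAsk(resolvePrompt(this.stopAndTranscribe()));
  }
}
```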
- The Auto speak assistant replies option is in Settings → General.
- Cancel in chat cancels the active request and also stops active voice playback.
- Stop Voice immediately stops text-to-speech playback.
You can run transcription locally with whisper.cpp:
- Build or download whisper.cpp and get whisper-cli (whisper-cli.exe on Windows).
- Download a GGML model file (for example, ggml-base.en.bin).
- In Settings → General:
  - Enable Push-to-Talk
  - Set Transcription provider to Local whisper.cpp (CLI)
  - Set whisper.cpp binary path (you can use the ... button to browse)
  - Set whisper.cpp model path (you can use the ... button to browse)
Notes:
- This keeps transcription fully local.
- If transcription fails, chat falls back to screenshot analysis with the prompt "Explain this screen."
- On startup, AILens validates the whisper paths when whisper.cpp is selected and shows a clear status message if any path is invalid.
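The startup path check and the whisper-cli invocation could be sketched as follows. The helper names are hypothetical; `exists` is injected so the check can wrap `fs.existsSync` in the app or a stub in tests. The flags used (`-m` for the model, `-f` for the input file, `-nt` to omit timestamps) are standard whisper-cli options, but check `whisper-cli --help` for your build.

```typescript
interface WhisperConfig {
  binaryPath: string;
  modelPath: string;
}

// Return human-readable problems with the configured paths (empty array = OK).
function validateWhisperConfig(cfg: WhisperConfig, exists: (path: string) => boolean): string[] {
  const problems: string[] = [];
  if (!cfg.binaryPath.trim()) problems.push("whisper.cpp binary path is not set");
  else if (!exists(cfg.binaryPath)) problems.push(`whisper.cpp binary not found: ${cfg.binaryPath}`);
  if (!cfg.modelPath.trim()) problems.push("whisper.cpp model path is not set");
  else if (!exists(cfg.modelPath)) problems.push(`whisper.cpp model not found: ${cfg.modelPath}`);
  return problems;
}

// Arguments for one whisper-cli run over a recorded WAV file:
// -m model path, -f input audio, -nt print the transcript without timestamps.
function whisperArgs(cfg: WhisperConfig, wavPath: string): string[] {
  return ["-m", cfg.modelPath, "-f", wavPath, "-nt"];
}
```

The main process could then call `execFile(cfg.binaryPath, whisperArgs(cfg, wavPath), callback)` from `node:child_process` and read the transcript from stdout. whisper.cpp has historically expected 16 kHz WAV input, so recorded audio may need resampling first.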
The About dialog shows app/runtime details including app version, Electron, Chromium, and Node.js versions.
If you are starting from a local folder:
git init
git remote add origin https://github.com/rmorgan1973/ailens
git add .
git commit -m "Initial commit"
git push -u origin master

To use custom app icons in packaged builds, add:
- build/icon.ico (Windows)
- build/icon.icns (macOS)
These paths are already configured in package.json build settings.
This app uses the OpenAI Responses API shape, and Ollama 0.16+ supports a compatible /v1/responses endpoint.
Make sure Ollama is running locally (default: http://127.0.0.1:11434).
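Using the Responses API shape means that a screenshot question travels as one user message with a text part and an image part. The sketch below builds such a body and extracts the reply text from a raw JSON response. The field names (`input`, `input_text`, `input_image`, `image_url`, `instructions`, `output`, `output_text`) follow OpenAI’s published Responses API; the helper names are hypothetical, and AILens’s real request may set additional fields.

```typescript
// Content parts for a user message in the Responses API.
type InputPart =
  | { type: "input_text"; text: string }
  | { type: "input_image"; image_url: string };

interface ResponsesRequest {
  model: string;
  instructions?: string; // optional system-style guidance
  input: { role: "user"; content: InputPart[] }[];
}

// Build a request body that sends the prompt text and a PNG screenshot together.
function buildResponsesRequest(
  model: string,
  promptText: string,
  pngBase64: string,
  instructions?: string,
): ResponsesRequest {
  const body: ResponsesRequest = {
    model,
    input: [
      {
        role: "user",
        content: [
          { type: "input_text", text: promptText },
          // The image is inlined as a base64 data URL.
          { type: "input_image", image_url: `data:image/png;base64,${pngBase64}` },
        ],
      },
    ],
  };
  if (instructions) body.instructions = instructions;
  return body;
}

// Only the fields this sketch reads from a raw JSON response.
interface ResponsesResult {
  output?: { type: string; content?: { type: string; text?: string }[] }[];
}

// Concatenate every output_text part found in "message" output items.
function extractOutputText(res: ResponsesResult): string {
  const parts: string[] = [];
  for (const item of res.output ?? []) {
    if (item.type !== "message") continue;
    for (const part of item.content ?? []) {
      if (part.type === "output_text" && typeof part.text === "string") parts.push(part.text);
    }
  }
  return parts.join("");
}
```

The main process would POST `JSON.stringify(body)` to `<Base URL>/responses` with an `Authorization: Bearer <API key>` header. With Ollama, the same body goes to `http://127.0.0.1:11434/v1/responses`, provided the chosen model accepts images.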
Recommended for this MVP:
ollama pull gemma3:4b

Optional secondary model:

ollama pull llava:7b

Settings for local Ollama:
- OpenAI Base URL: http://127.0.0.1:11434/v1
- API Key: any non-empty value (for example, ollama)
- Model name: gemma3:4b

To check that Ollama is reachable:

curl http://127.0.0.1:11434/v1/models

If captures fail on macOS, confirm that Screen Recording permission is enabled.
To capture screen content, macOS requires Screen Recording permission:
- System Settings → Privacy & Security → Screen Recording
- Enable permission for your app (or Electron during local dev)
Without this, capture may fail or return blank/limited images.
Microphone access is also required for Push-to-Talk:
- System Settings → Privacy & Security → Microphone
- Enable access for AILens/Electron
If microphone access is denied or no input device exists, PTT will show a clear chat error and manual typing still works.
- Blank capture on macOS: enable Screen Recording permission.
- PTT not recording: enable Microphone permission and confirm an input device exists.
- Hotkey does not fire: choose a different global shortcut (some apps reserve combos).
- OCR seems empty: try Full Screen mode or increase zoom-crop size.
- Ollama errors: confirm that the Base URL ends with /v1 and that the model supports images.
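To catch the /v1 mistake early, a settings screen could normalize the Base URL before saving. This hypothetical helper appends /v1 to a bare host URL, strips trailing slashes, and rejects any other path; endpoint URLs are then built by appending `responses` or `models`.

```typescript
type BaseUrlResult = { ok: true; url: string } | { ok: false; error: string };

// Normalize a user-entered Base URL to an OpenAI-compatible ".../v1" root.
function normalizeBaseUrl(input: string): BaseUrlResult {
  let parsed: URL;
  try {
    parsed = new URL(input.trim());
  } catch {
    return { ok: false, error: `Not a valid URL: ${input.trim()}` };
  }
  const path = parsed.pathname.replace(/\/+$/, ""); // drop trailing slashes
  if (path === "") {
    parsed.pathname = "/v1"; // bare host such as http://127.0.0.1:11434
  } else if (path.endsWith("/v1")) {
    parsed.pathname = path;
  } else {
    return { ok: false, error: `Base URL should end with /v1 (got ${parsed.pathname})` };
  }
  return { ok: true, url: parsed.toString() };
}

// Build an endpoint URL from a normalized Base URL.
function endpointUrl(baseUrl: string, name: "responses" | "models"): string {
  return `${baseUrl}/${name}`;
}
```

Auto-appending /v1 only for a bare host is a deliberate choice in this sketch: it fixes the most common omission without guessing at other paths such as /api.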
IPC channels:
- settings:get / settings:save
- capture:active-window
- openai:ask-with-image
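Under the IPC-only architecture, these channels form the whole contract between the renderer and the main process. The sketch below types them and uses a small in-memory router as a stand-in for `ipcMain.handle` and `ipcRenderer.invoke`, so it runs without Electron. The payload types are hypothetical, since their real shapes are not documented here.

```typescript
// Hypothetical payload types; the real shapes are not documented here.
interface Settings {
  baseUrl: string;
  model: string;
  captureMode: "active-window" | "full-screen";
}
interface CaptureResult {
  pngBase64: string;
  note?: string; // e.g. set when falling back to full screen
}

// One entry per channel: argument tuple and resolved result type.
interface IpcContract {
  "settings:get": { args: []; result: Settings };
  "settings:save": { args: [settings: Settings]; result: void };
  "capture:active-window": { args: []; result: CaptureResult };
  "openai:ask-with-image": { args: [prompt: string]; result: string };
}
type Channel = keyof IpcContract;
type Handler<C extends Channel> = (...args: IpcContract[C]["args"]) => Promise<IpcContract[C]["result"]>;
type AnyHandler = (...args: unknown[]) => Promise<unknown>;

// In-memory stand-in for ipcMain.handle / ipcRenderer.invoke so the contract
// can be exercised without Electron.
class IpcRouter {
  private readonly handlers = new Map<Channel, AnyHandler>();

  handle<C extends Channel>(channel: C, handler: Handler<C>): void {
    this.handlers.set(channel, handler as unknown as AnyHandler);
  }

  invoke<C extends Channel>(channel: C, ...args: IpcContract[C]["args"]): Promise<IpcContract[C]["result"]> {
    const handler = this.handlers.get(channel);
    if (!handler) return Promise.reject(new Error(`No handler registered for ${channel}`));
    return handler(...args) as Promise<IpcContract[C]["result"]>;
  }
}
```

In an Electron app, the main process would register each handler with `ipcMain.handle(channel, (event, ...args) => ...)` (Electron passes the event first), and the preload script would expose only thin wrappers via `contextBridge.exposeInMainWorld`, keeping the API key and network access out of the renderer.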