rmorgan1973/ailens

AILens (Electron + React + TypeScript)

AILens is an on-demand desktop assistant that explains what’s on your screen when you ask. It does not automate actions or run continuously.

It combines screenshot capture, OCR, app/window metadata, optional voice input (PTT), and model analysis to explain what’s visible and suggest neutral next steps.

Purpose

AILens is built for moments where switching tabs, searching manually, or re-explaining context is slow. Instead of asking a model in the abstract, you can ask with immediate visual context from your live desktop.

Non-goals (important)

AILens:

  • does not monitor your screen continuously
  • does not click buttons, type, trade, or execute actions on your behalf
  • does not provide financial/legal/medical advice

Typical use cases

  • Explain unfamiliar UI, forms, dashboards, charts, dialogs, or error messages
  • Summarize what changed on screen and what to do next
  • Interpret chat/email tone and intent from visible text
  • Compare options shown on screen (plans, prices, compatibility choices)
  • Use push-to-talk for hands-free “what am I looking at?” workflows

Who this helps

  • Support and operations teams triaging tools quickly
  • Developers and power users debugging UI-heavy workflows
  • Anyone juggling multiple apps who needs fast contextual answers

How it works (quick flow)

  1. Capture active window (or full screen) via hotkey or button.
  2. Extract visible text + metadata and assemble model context.
  3. Ask a follow-up in chat (typed or voice transcript).
  4. Receive a grounded response with optional voice playback.
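The flow above boils down to one context-assembly step before the model call. The sketch below is illustrative only: the `CaptureResult` shape and `buildContext` helper are hypothetical names, not the app's actual code.

```typescript
// Hypothetical shape of one capture result (field names are illustrative).
interface CaptureResult {
  appName: string;      // from active-window metadata
  windowTitle: string;
  ocrText: string;      // text extracted from the screenshot
  imageBase64: string;  // PNG screenshot, base64-encoded
}

// Assemble a vision-style message list from a capture plus the user's prompt.
function buildContext(capture: CaptureResult, prompt: string) {
  return [
    {
      role: "system",
      content:
        "You explain what is visible on the user's screen. " +
        "Suggest neutral next steps; do not perform actions.",
    },
    {
      role: "user",
      content: [
        { type: "input_text", text: `App: ${capture.appName} - ${capture.windowTitle}` },
        { type: "input_text", text: `Visible text:\n${capture.ocrText}` },
        { type: "input_image", image_url: `data:image/png;base64,${capture.imageBase64}` },
        { type: "input_text", text: prompt },
      ],
    },
  ];
}
```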

MVP included

  • Chat window with message history, prompt input, Send, and Capture + Ask
  • Separate Settings window:
    • OpenAI Base URL
    • API key
    • Model name
    • Capture mode (Active Window or Full Screen, default Active Window)
  • Global hotkey:
    • Windows/Linux: Ctrl+Shift+Space
    • macOS: Cmd+Shift+Space
  • IPC-only architecture (renderer never calls OpenAI directly)
  • OpenAI Responses API call (vision-ready)
  • Packaging with electron-builder

Security / settings storage

  • Non-secret settings are stored in electron-store.
  • API key storage:
    1. Tries OS keychain via keytar (optional dependency).
    2. Falls back to local encrypted storage using Electron safeStorage when available.
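The fallback order above can be sketched as a provider chain. The `SecretStore` interface and function names here are hypothetical; in the real app the two backends would be keytar and Electron's safeStorage.

```typescript
// Hypothetical interface over a secret backend (keytar, safeStorage, ...).
interface SecretStore {
  isAvailable(): boolean;
  save(key: string, value: string): void;
}

// Try each backend in order; return the name of the first one that accepts the secret.
function saveApiKey(stores: Array<[string, SecretStore]>, apiKey: string): string {
  for (const [name, store] of stores) {
    if (!store.isAvailable()) continue;
    try {
      store.save("ailens.apiKey", apiKey);
      return name;
    } catch {
      // Backend failed at runtime (e.g. locked keychain): fall through to the next.
    }
  }
  throw new Error("No secret storage backend available");
}
```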

Optional web lookup

AILens can optionally fetch external snippets for context (for example game items, error messages, or UI terms).

  • This is opt-in via Settings.
  • Queries are derived from on-screen text and your prompt.
  • If disabled, AILens stays local except for your chosen LLM/transcription provider.

Active-window capture implementation note

Cross-platform exact active-window image capture is not uniformly reliable from Electron alone. AILens uses a pragmatic method:

  1. Capture full screen.
  2. Get active-window bounds (get-windows).
  3. Crop screenshot to those bounds.
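Step 3 reduces to rectangle math: scale the reported window bounds by the display's pixel ratio and clamp them to the screenshot. The helper below is an illustrative sketch (the function name and `Rect` type are not from the app's code).

```typescript
interface Rect { x: number; y: number; width: number; height: number; }

// Convert window bounds (in device-independent points) to screenshot pixels,
// clamping to the screenshot so DPI rounding never yields an out-of-range crop.
function cropRect(bounds: Rect, shot: { width: number; height: number }, scale: number): Rect {
  const x = Math.max(0, Math.round(bounds.x * scale));
  const y = Math.max(0, Math.round(bounds.y * scale));
  return {
    x,
    y,
    width: Math.min(shot.width - x, Math.round(bounds.width * scale)),
    height: Math.min(shot.height - y, Math.round(bounds.height * scale)),
  };
}
```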

If active-window bounds are unavailable due to OS or privacy limitations, AILens falls back to full-screen capture and includes a note in the result.

Minor variation due to display DPI scaling is expected across multi-monitor setups.

For debugging capture quality, use Open Last Screenshot (and Open Last Zoom Screenshot) in Settings.

Run locally

For detailed local setup (Ollama + whisper.cpp + troubleshooting), see LOCAL_SETUP.md.

Install

npm install

Development run

npm run dev

This starts:

  • Vite dev server for React renderer
  • TypeScript watch for Electron main/preload
  • Electron with auto-reload

Production-style smoke test

npm run start

This builds main + renderer, then launches Electron from compiled output.

Package build

npm run build

Output artifacts are generated under release/.

Scripts

  • npm run dev - full development loop (renderer + Electron)
  • npm run electron:dev - Electron-side dev loop (expects renderer on port 5173)
  • npm run build - build renderer/main and package app
  • npm run capture:once - one-shot capture from CLI (saves image, opens it, exits)

CLI capture mode

You can run AILens in a scriptable one-shot capture mode:

electron dist-electron/main/main.js --capture-active-window

Optional flags:

  • --out <path>: output PNG path (default: debug/cli-capture.png under the app's userData directory)
  • --no-open: save without opening the image viewer
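Parsing these flags is straightforward; the sketch below is a hypothetical helper, not the app's actual parser.

```typescript
interface CliOptions { out?: string; open: boolean; capture: boolean; }

// Parse the subset of CLI flags documented above from process.argv-style input.
function parseCaptureArgs(argv: string[]): CliOptions {
  const opts: CliOptions = { open: true, capture: false };
  for (let i = 0; i < argv.length; i++) {
    switch (argv[i]) {
      case "--capture-active-window": opts.capture = true; break;
      case "--out": opts.out = argv[++i]; break;  // next token is the path
      case "--no-open": opts.open = false; break;
    }
  }
  return opts;
}
```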

App menu

  • Open Settings from app menu: File → Settings
  • View About information from app menu: Help → About AILens

Tray menu

AILens also creates a system tray icon with quick actions:

  • Show Chat
  • Settings
  • About AILens
  • Quit

Clicking the tray icon focuses/restores the Chat window.

On Windows, closing the Chat window minimizes AILens to tray and shows a one-time balloon tip. Use Quit AILens from menu/tray to fully exit.

Hotkey customization

In Settings, you can configure Capture hotkey (global shortcut).

  • Windows/Linux example: Control+Shift+Space
  • macOS example: Command+Shift+Space

The Chat Capture + Ask button tooltip shows the currently active hotkey.
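Hotkey strings follow Electron's `Modifier+Key` accelerator format. Below is a sketch of a plausibility check for user-entered hotkeys; the helper is hypothetical, and Electron's own `globalShortcut.register` remains the final authority on validity.

```typescript
// Modifier names recognized by Electron accelerators.
const MODIFIERS = new Set([
  "Command", "Cmd", "Control", "Ctrl", "CommandOrControl",
  "CmdOrCtrl", "Alt", "Option", "AltGr", "Shift", "Super", "Meta",
]);

// A plausible accelerator has zero or more modifiers and exactly one key code.
function looksLikeAccelerator(accel: string): boolean {
  const parts = accel.split("+").map((p) => p.trim());
  if (parts.some((p) => p.length === 0)) return false; // e.g. trailing "+"
  const keys = parts.filter((p) => !MODIFIERS.has(p));
  return keys.length === 1;
}
```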

Push-to-talk (PTT)

AILens supports hold-to-record voice capture for assistant prompts:

  • Enable Push-to-Talk in Settings → General.
  • Hold the configured PTT hotkey (default on Windows: RightAlt).
  • Release the key to stop recording; AILens transcribes the speech, then runs capture + analysis using that transcript.

If transcription returns empty text, AILens falls back to: Explain this screen.

Voice output controls

  • Auto speak assistant replies is configured in Settings → General.
  • Cancel in chat cancels the active request and also stops active voice playback.
  • Stop Voice immediately stops text-to-speech playback.

Local whisper.cpp transcription

You can run transcription locally with whisper.cpp:

  1. Build/download whisper.cpp and get whisper-cli (whisper-cli.exe on Windows).
  2. Download a GGML model file (for example ggml-base.en.bin).
  3. In Settings → General:
    • Enable Push-to-Talk
    • Set Transcription provider to Local whisper.cpp (CLI)
    • Set whisper.cpp binary path (you can use the ... button to browse)
    • Set whisper.cpp model path (you can use the ... button to browse)

Notes:

  • This keeps transcription fully local.
  • If transcription fails, chat falls back to screenshot analysis with Explain this screen.
  • On startup, AILens validates whisper paths when whisper.cpp is selected and shows a clear status message if paths are invalid.
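The CLI invocation can be sketched as an argument list. The flag names below are the common whisper.cpp ones (`-m` for model, `-f` for input file, `-l` for language, `-nt` to suppress timestamps); treat the exact set as an assumption to check against your whisper-cli build.

```typescript
// Build the argument vector for spawning whisper-cli on a recorded WAV file.
// Flags are assumptions based on common whisper.cpp builds; verify with
// `whisper-cli --help` on your binary.
function buildWhisperArgs(modelPath: string, wavPath: string, language = "en"): string[] {
  return [
    "-m", modelPath,   // GGML model file, e.g. ggml-base.en.bin
    "-f", wavPath,     // 16 kHz mono WAV input
    "-l", language,    // spoken language hint
    "-nt",             // no timestamps: plain text output is easier to parse
  ];
}
```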

The About dialog shows app/runtime details including app version, Electron, Chromium, and Node.js versions.

GitHub push (optional)

If you are starting from a local folder:

git init
git remote add origin https://github.com/rmorgan1973/ailens
git add .
git commit -m "Initial commit"
git push -u origin master

App icon files

To use custom app icons in packaged builds, add:

  • build/icon.ico (Windows)
  • build/icon.icns (macOS)

These paths are already configured in package.json build settings.

Local testing with Ollama

This app uses the OpenAI Responses API shape, and Ollama 0.16+ supports a compatible /v1/responses endpoint.

1) Install / start Ollama

Make sure Ollama is running locally (default: http://127.0.0.1:11434).

2) Pull a vision-capable model

Recommended for this MVP:

ollama pull gemma3:4b

Optional secondary model:

ollama pull llava:7b

3) Configure AILens Settings

  • OpenAI Base URL: http://127.0.0.1:11434/v1
  • API Key: any non-empty value (for local Ollama, e.g. ollama)
  • Model name: gemma3:4b
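With those settings, requests take the OpenAI Responses API shape. The payload below is illustrative (field values are examples, and the base64 data is elided):

```json
{
  "model": "gemma3:4b",
  "input": [
    {
      "role": "user",
      "content": [
        { "type": "input_text", "text": "Explain this screen." },
        { "type": "input_image", "image_url": "data:image/png;base64,..." }
      ]
    }
  ]
}
```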

4) Quick connectivity check

curl http://127.0.0.1:11434/v1/models

If captures fail on macOS, confirm Screen Recording permission is enabled.

macOS permissions

To capture screen content, macOS requires Screen Recording permission:

  • System Settings → Privacy & Security → Screen Recording
  • Enable permission for your app (or Electron during local dev)

Without this, capture may fail or return blank/limited images.

Microphone access is also required for Push-to-Talk:

  • System Settings → Privacy & Security → Microphone
  • Enable access for AILens/Electron

If microphone access is denied or no input device exists, PTT will show a clear chat error and manual typing still works.

Quick troubleshooting

  • Blank capture on macOS: enable Screen Recording permission.
  • PTT not recording: enable Microphone permission and confirm an input device exists.
  • Hotkey does not fire: choose a different global shortcut (some apps reserve combos).
  • OCR seems empty: try Full Screen mode or increase zoom-crop size.
  • Ollama errors: confirm Base URL ends with /v1 and model supports images.

IPC channels

  • settings:get / settings:save
  • capture:active-window
  • openai:ask-with-image
