rmorgan1973/ailens

AILens (Electron + React + TypeScript)

AILens is an on-demand desktop assistant that explains what’s on your screen when you ask. It does not automate actions or run continuously.

It combines screenshot capture, OCR, app/window metadata, optional voice input (PTT), and model analysis to explain what’s visible and suggest neutral next steps.

Purpose

AILens is built for moments where switching tabs, searching manually, or re-explaining context is slow. Instead of asking a model in the abstract, you can ask with immediate visual context from your live desktop.

Non-goals (important)

AILens:

  • does not monitor your screen continuously
  • does not click buttons, type, trade, or execute actions on your behalf
  • does not provide financial/legal/medical advice

Typical use cases

  • Explain unfamiliar UI, forms, dashboards, charts, dialogs, or error messages
  • Summarize what changed on screen and what to do next
  • Interpret chat/email tone and intent from visible text
  • Compare options shown on screen (plans, prices, compatibility choices)
  • Use push-to-talk for hands-free “what am I looking at?” workflows

Who this helps

  • Support and operations teams triaging tools quickly
  • Developers and power users debugging UI-heavy workflows
  • Anyone juggling multiple apps who needs fast contextual answers

How it works (quick flow)

  1. Capture active window (or full screen) via hotkey or button.
  2. Extract visible text + metadata and assemble model context.
  3. Ask a follow-up in chat (typed or voice transcript).
  4. Receive a grounded response with optional voice playback.
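The flow above boils down to one context-assembly step before the model call. The sketch below is illustrative only: the `CaptureResult` shape and `buildContext` helper are hypothetical names, not the app's actual code.

```typescript
// Hypothetical shape of one capture result (field names are illustrative).
interface CaptureResult {
  appName: string;      // from active-window metadata
  windowTitle: string;
  ocrText: string;      // text extracted from the screenshot
  imageBase64: string;  // PNG screenshot, base64-encoded
}

// Assemble a vision-style message list from a capture plus the user's prompt.
function buildContext(capture: CaptureResult, prompt: string) {
  return [
    {
      role: "system",
      content:
        "You explain what is visible on the user's screen. " +
        "Suggest neutral next steps; do not perform actions.",
    },
    {
      role: "user",
      content: [
        { type: "input_text", text: `App: ${capture.appName} - ${capture.windowTitle}` },
        { type: "input_text", text: `Visible text:\n${capture.ocrText}` },
        { type: "input_image", image_url: `data:image/png;base64,${capture.imageBase64}` },
        { type: "input_text", text: prompt },
      ],
    },
  ];
}
```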

MVP included

  • Chat window with message history, prompt input, Send, and Capture + Ask
  • Separate Settings window:
    • OpenAI Base URL
    • API key
    • Model name
    • Capture mode (Active Window or Full Screen, default Active Window)
  • Global hotkey:
    • Windows/Linux: Ctrl+Shift+Space
    • macOS: Cmd+Shift+Space
  • IPC-only architecture (renderer never calls OpenAI directly)
  • OpenAI Responses API call (vision-ready)
  • Packaging with electron-builder

Security / settings storage

  • Non-secret settings are stored in electron-store.
  • API key storage:
    1. Tries OS keychain via keytar (optional dependency).
    2. Falls back to local encrypted storage using Electron safeStorage when available.
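The fallback order above can be sketched as a provider chain. The `SecretStore` interface and function names here are hypothetical; in the real app the two backends would be keytar and Electron's safeStorage.

```typescript
// Hypothetical interface over a secret backend (keytar, safeStorage, ...).
interface SecretStore {
  isAvailable(): boolean;
  save(key: string, value: string): void;
}

// Try each backend in order; return the name of the first one that accepts the secret.
function saveApiKey(stores: Array<[string, SecretStore]>, apiKey: string): string {
  for (const [name, store] of stores) {
    if (!store.isAvailable()) continue;
    try {
      store.save("ailens.apiKey", apiKey);
      return name;
    } catch {
      // Backend failed at runtime (e.g. locked keychain): fall through to the next.
    }
  }
  throw new Error("No secret storage backend available");
}
```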

Optional web lookup

AILens can optionally fetch external snippets for context (for example game items, error messages, or UI terms).

  • This is opt-in via Settings.
  • Queries are derived from on-screen text and your prompt.
  • If disabled, AILens stays local except for your chosen LLM/transcription provider.

Active-window capture implementation note

Cross-platform exact active-window image capture is not uniformly reliable from Electron alone. AILens uses a pragmatic method:

  1. Capture full screen.
  2. Get active-window bounds (get-windows).
  3. Crop screenshot to those bounds.
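Step 3 reduces to rectangle math: scale the reported window bounds by the display's pixel ratio and clamp them to the screenshot. The helper below is an illustrative sketch (the function name and `Rect` type are not from the app's code).

```typescript
interface Rect { x: number; y: number; width: number; height: number; }

// Convert window bounds (in device-independent points) to screenshot pixels,
// clamping to the screenshot so DPI rounding never yields an out-of-range crop.
function cropRect(bounds: Rect, shot: { width: number; height: number }, scale: number): Rect {
  const x = Math.max(0, Math.round(bounds.x * scale));
  const y = Math.max(0, Math.round(bounds.y * scale));
  return {
    x,
    y,
    width: Math.min(shot.width - x, Math.round(bounds.width * scale)),
    height: Math.min(shot.height - y, Math.round(bounds.height * scale)),
  };
}
```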

If active-window bounds are unavailable due to OS or privacy limitations, AILens falls back to full-screen capture and includes a note in the result.

Minor variation due to display DPI scaling is expected across multi-monitor setups.

For debugging capture quality, use Open Last Screenshot (and Open Last Zoom Screenshot) in Settings.

Run locally

For detailed local setup (Ollama + whisper.cpp + troubleshooting), see LOCAL_SETUP.md.

Install

npm install

Development run

npm run dev

This starts:

  • Vite dev server for React renderer
  • TypeScript watch for Electron main/preload
  • Electron with auto-reload

Production-style smoke test

npm run start

This builds main + renderer, then launches Electron from compiled output.

Package build

npm run build

Output artifacts are generated under release/.

Scripts

  • npm run dev - full development loop (renderer + Electron)
  • npm run electron:dev - Electron-side dev loop (expects renderer on port 5173)
  • npm run build - build renderer/main and package app
  • npm run capture:once - one-shot capture from CLI (saves image, opens it, exits)

CLI capture mode

You can run AILens in a scriptable one-shot capture mode:

electron dist-electron/main/main.js --capture-active-window

Optional flags:

  • --out <path>: output PNG path (default: debug/cli-capture.png under the app's userData directory)
  • --no-open: save without opening the image viewer
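Parsing these flags is straightforward; the sketch below is a hypothetical helper, not the app's actual parser.

```typescript
interface CliOptions { out?: string; open: boolean; capture: boolean; }

// Parse the subset of CLI flags documented above from process.argv-style input.
function parseCaptureArgs(argv: string[]): CliOptions {
  const opts: CliOptions = { open: true, capture: false };
  for (let i = 0; i < argv.length; i++) {
    switch (argv[i]) {
      case "--capture-active-window": opts.capture = true; break;
      case "--out": opts.out = argv[++i]; break;  // next token is the path
      case "--no-open": opts.open = false; break;
    }
  }
  return opts;
}
```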

App menu

  • Open Settings from app menu: File → Settings
  • View About information from app menu: Help → About AILens

Tray menu

AILens also creates a system tray icon with quick actions:

  • Show Chat
  • Settings
  • About AILens
  • Quit

Clicking the tray icon focuses/restores the Chat window.

On Windows, closing the Chat window minimizes AILens to tray and shows a one-time balloon tip. Use Quit AILens from menu/tray to fully exit.

Hotkey customization

In Settings, you can configure Capture hotkey (global shortcut).

  • Windows/Linux example: Control+Shift+Space
  • macOS example: Command+Shift+Space

The Chat Capture + Ask button tooltip shows the currently active hotkey.
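Hotkey strings follow Electron's `Modifier+Key` accelerator format. Below is a sketch of a plausibility check for user-entered hotkeys; the helper is hypothetical, and Electron's own `globalShortcut.register` remains the final authority on validity.

```typescript
// Modifier names recognized by Electron accelerators.
const MODIFIERS = new Set([
  "Command", "Cmd", "Control", "Ctrl", "CommandOrControl",
  "CmdOrCtrl", "Alt", "Option", "AltGr", "Shift", "Super", "Meta",
]);

// A plausible accelerator has zero or more modifiers and exactly one key code.
function looksLikeAccelerator(accel: string): boolean {
  const parts = accel.split("+").map((p) => p.trim());
  if (parts.some((p) => p.length === 0)) return false; // e.g. trailing "+"
  const keys = parts.filter((p) => !MODIFIERS.has(p));
  return keys.length === 1;
}
```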

Push-to-talk (PTT)

AILens supports hold-to-record voice capture for assistant prompts:

  • Enable Push-to-Talk in Settings → General.
  • Hold the configured PTT hotkey (default on Windows: RightAlt).
  • Release the key to stop recording; AILens transcribes the speech, then runs capture + analysis using that transcript.

If transcription returns empty text, AILens falls back to: Explain this screen.

Voice output controls

  • Auto speak assistant replies is configured in Settings → General.
  • Cancel in chat cancels the active request and also stops active voice playback.
  • Stop Voice immediately stops text-to-speech playback.

Local whisper.cpp transcription

You can run transcription locally with whisper.cpp:

  1. Build/download whisper.cpp and get whisper-cli (whisper-cli.exe on Windows).
  2. Download a GGML model file (for example ggml-base.en.bin).
  3. In Settings → General:
    • Enable Push-to-Talk
    • Set Transcription provider to Local whisper.cpp (CLI)
    • Set whisper.cpp binary path (you can use the ... button to browse)
    • Set whisper.cpp model path (you can use the ... button to browse)

Notes:

  • This keeps transcription fully local.
  • If transcription fails, chat falls back to screenshot analysis with Explain this screen.
  • On startup, AILens validates whisper paths when whisper.cpp is selected and shows a clear status message if paths are invalid.
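The CLI invocation can be sketched as an argument list. The flag names below are the common whisper.cpp ones (`-m` for model, `-f` for input file, `-l` for language, `-nt` to suppress timestamps); treat the exact set as an assumption to check against your whisper-cli build.

```typescript
// Build the argument vector for spawning whisper-cli on a recorded WAV file.
// Flags are assumptions based on common whisper.cpp builds; verify with
// `whisper-cli --help` on your binary.
function buildWhisperArgs(modelPath: string, wavPath: string, language = "en"): string[] {
  return [
    "-m", modelPath,   // GGML model file, e.g. ggml-base.en.bin
    "-f", wavPath,     // 16 kHz mono WAV input
    "-l", language,    // spoken language hint
    "-nt",             // no timestamps: plain text output is easier to parse
  ];
}
```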

The About dialog shows app/runtime details including app version, Electron, Chromium, and Node.js versions.

GitHub push (optional)

If you are starting from a local folder:

git init
git remote add origin https://github.com/rmorgan1973/ailens
git add .
git commit -m "Initial commit"
git push -u origin master

App icon files

To use custom app icons in packaged builds, add:

  • build/icon.ico (Windows)
  • build/icon.icns (macOS)

These paths are already configured in package.json build settings.

Local testing with Ollama

This app uses the OpenAI Responses API shape, and Ollama 0.16+ supports a compatible /v1/responses endpoint.

1) Install / start Ollama

Make sure Ollama is running locally (default: http://127.0.0.1:11434).

2) Pull a vision-capable model

Recommended for this MVP:

ollama pull gemma3:4b

Optional secondary model:

ollama pull llava:7b

3) Configure AILens Settings

  • OpenAI Base URL: http://127.0.0.1:11434/v1
  • API Key: any non-empty value (for local Ollama, e.g. ollama)
  • Model name: gemma3:4b
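With those settings, requests take the OpenAI Responses API shape. The payload below is illustrative (field values are examples, and the base64 data is elided):

```json
{
  "model": "gemma3:4b",
  "input": [
    {
      "role": "user",
      "content": [
        { "type": "input_text", "text": "Explain this screen." },
        { "type": "input_image", "image_url": "data:image/png;base64,..." }
      ]
    }
  ]
}
```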

4) Quick connectivity check

curl http://127.0.0.1:11434/v1/models

If captures fail on macOS, confirm Screen Recording permission is enabled.

macOS permissions

To capture screen content, macOS requires Screen Recording permission:

  • System Settings → Privacy & Security → Screen Recording
  • Enable permission for your app (or Electron during local dev)

Without this, capture may fail or return blank/limited images.

Microphone access is also required for Push-to-Talk:

  • System Settings → Privacy & Security → Microphone
  • Enable access for AILens/Electron

If microphone access is denied or no input device exists, PTT will show a clear chat error and manual typing still works.

Quick troubleshooting

  • Blank capture on macOS: enable Screen Recording permission.
  • PTT not recording: enable Microphone permission and confirm an input device exists.
  • Hotkey does not fire: choose a different global shortcut (some apps reserve combos).
  • OCR seems empty: try Full Screen mode or increase zoom-crop size.
  • Ollama errors: confirm Base URL ends with /v1 and model supports images.

IPC channels

  • settings:get / settings:save
  • capture:active-window
  • openai:ask-with-image
