AIPromptBridge

AIPromptBridge is a Windows desktop application that brings AI assistance to your fingertips. Use global hotkeys to edit text using AI, capture and analyze audio or screen content, and chat with models, all from a lightweight system tray app.

✨ Features

🎯 TextEditTool

Press Ctrl+Space anywhere to invoke AI on selected text:

Understand - Explain, Generate Summaries, or Keypoints
Edit - Proofread (✏️), Rewrite (📝), or make it Casual (😎)
Q&A - Use the second input box in the popup to ask any question about the text
Compare - Use the 🔀 Compare button to compare selected text with another text selection
Custom prompts - Define and group your own actions in the Prompt Editor

Works in any application: browsers, IDEs, Notepad, Word, everywhere.

📸 Screen Snip (SnipTool)

Press Ctrl+Alt+X to capture a region of your screen and analyze it with AI:

OCR - Extract Text or OCR to Markdown for clean formatting
Analysis - Describe, Summarize, or Explain Code
Data - Extract Data to tables, Transcribe handwriting, or Smart Cleanup notes
Compare - Compare Images to analyze differences between two screenshots
Response Modes - Show the result in a Chat Window, Copy to Clipboard, or Type directly into the active field

🎤 Audio Analyzer

Press Ctrl+Alt+A to record and analyze audio:

Record - Capture microphone input or system audio (loopback)
Transcribe - High-fidelity transcription with timestamps and speaker identification
Analyze - Summarize meetings, extract key points, or analyze tone
Controls - Visual level meter, compression settings (Opus/MP3), and preview
Integration - Send audio directly to chat context for follow-up questions

🔊 Text-to-Speech (TTS)

Convert text into expressive speech using Gemini TTS models:

30 Voices - Choose from 30 prebuilt voices with distinct styles (Bright, Firm, Upbeat, etc.)
AI Director - Automatically generates style instructions for expressive, nuanced speech
Two Models - Flash (fast) and Pro (quality) TTS model options
Multi-Speaker - Support for up to 2 speakers with individual voice assignment
Playback - Built-in audio preview with play/pause and seek controls
Export - Save generated audio as WAV files
Entry Points - 🔊 button in popups, [T] terminal key, hotkey Ctrl+Alt+T, and system tray menu
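
As a rough illustration of the Export step, the sketch below wraps raw PCM samples in a WAV container using Python's standard `wave` module. The 24 kHz, 16-bit, mono format and the helper names `save_wav` and `wav_duration_seconds` are assumptions made for this example; the app's own export code may work differently.

```python
import wave


def save_wav(path, pcm_bytes, sample_rate=24000, channels=1, sample_width=2):
    """Write raw little-endian PCM bytes to a WAV file (hypothetical helper)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()
```

Calling `save_wav("speech.wav", pcm_bytes)` with the audio bytes returned by a TTS request would produce a file that any standard player can open, provided the assumed sample format matches the data.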

💬 Chat Interface

Lightweight chat windows with:

Streaming responses (real-time typing)
Markdown rendering
Session history (browse and restore)
Multi-theme UI with 7 color schemes

🎨 Theme System

The app supports 7 distinct themes, each with Dark and Light variants. Appearance options include:

7 themes: Catppuccin, Dracula, Nord, Gruvbox, OneDark, Minimal, High Contrast
Dark/Light modes: Each theme has both variants
System detection: Auto-switches based on Windows theme
Live preview: See theme changes instantly in Settings

🔄 Robust Backend

Multi-provider support - Google Gemini, OpenRouter, custom endpoints
Automatic key rotation - Switches API keys on rate-limit (429) and auth (401, 403) errors
Smart retry logic - Handles errors gracefully with configurable delays
Empty response detection - Automatically retries with next key
Streaming support - Real-time responses
Batch Processing - Async processing for large workloads (Gemini Batch API)
Attachment Manager - Efficient external storage for session images, audio, and files
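
The retry behavior described in this list might look something like the sketch below: a request is retried after a configurable delay, and an empty response is treated as a failure. The names `call_with_retry` and `EmptyResponseError`, and the default values, are hypothetical and not taken from the project's source.

```python
import time


class EmptyResponseError(Exception):
    """Raised when a request succeeds but returns no content."""


def call_with_retry(request_fn, max_attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Call request_fn until it returns a non-empty result or attempts run out."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = request_fn()
            if not result:
                raise EmptyResponseError("empty response")
            return result
        except Exception as error:
            last_error = error
            if attempt < max_attempts:
                sleep(delay_seconds)  # wait before the next attempt
    raise last_error
```

The `sleep` parameter is injectable only so the delay can be skipped in tests; a real caller would leave it at the default.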

🧰 Tools System (Not accessible in No Console mode)

The File Processor tool enables bulk operations:

Batch Processing: Process folders of Images, Audio, Code, Text, or PDFs
Audio Optimization: Reduce file size (mono, sample rate) for efficient AI processing
Configurable: On-demand tools_config.json creation
Smart Handling:
- Large Files: Auto-switches to Gemini Files API or Chunking logic
- Checkpoints: Resume interrupted jobs or retry failures
- Interactive Mode: Pause (P), Stop (S), or Abort (Esc) during processing
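
To give a feel for how checkpoint-based resume can work, here is a minimal sketch that records finished files in a JSON file and skips them on the next run. The checkpoint layout (`done` and `failed` lists) and all function names are assumptions for illustration, not the File Processor's actual format.

```python
import json
from pathlib import Path


def load_checkpoint(path):
    """Load saved progress, or start fresh if no checkpoint exists."""
    checkpoint = Path(path)
    if checkpoint.exists():
        return json.loads(checkpoint.read_text(encoding="utf-8"))
    return {"done": [], "failed": []}


def save_checkpoint(path, state):
    Path(path).write_text(json.dumps(state, indent=2), encoding="utf-8")


def process_with_checkpoint(files, process_fn, checkpoint_path):
    """Process files, skipping those finished earlier and recording failures."""
    done = set(load_checkpoint(checkpoint_path)["done"])
    failed = []
    for name in files:
        if name in done:
            continue  # finished in an earlier run
        try:
            process_fn(name)
            done.add(name)
        except Exception:
            failed.append(name)  # not marked done, so it is retried next run
        # Save after every file so an interruption loses at most one item.
        save_checkpoint(checkpoint_path, {"done": sorted(done), "failed": failed})
    return {"done": sorted(done), "failed": failed}
```

Running the function a second time with the same checkpoint path retries only the files that failed or were never reached.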

The TTS Processor tool enables batch text-to-speech generation:

Text Splitting: Lines, paragraphs, sentences, or whole file modes
Voice Selection: 30 prebuilt Gemini voices with single or multi-speaker support
Style Instructions: Manual, default, no style, or AI Director (single/per-segment)
AI Director: Auto-generates expressive style instructions for nuanced speech
Output Modes: Individual WAV files per segment or merged into single file
Checkpoints: Full resume support with failure retry
Interactive Mode: Pause (P), Stop (S) during generation
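
The four text-splitting modes could be approximated as follows. This is a simplified sketch: the `split_text` name, the mode strings, and the regular expressions are assumptions, and the sentence rule in particular is naive (it splits after `.`, `!`, or `?` followed by whitespace).

```python
import re


def split_text(text, mode="lines"):
    """Split text into TTS segments by line, paragraph, sentence, or not at all."""
    if mode == "whole":
        segments = [text]
    elif mode == "lines":
        segments = text.splitlines()
    elif mode == "paragraphs":
        segments = re.split(r"\n\s*\n", text)  # blank line separates paragraphs
    elif mode == "sentences":
        segments = re.split(r"(?<=[.!?])\s+", text)
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Drop empty segments and surrounding whitespace.
    return [segment.strip() for segment in segments if segment.strip()]
```

Each returned segment would then be sent as a separate TTS request, with the results saved individually or merged.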

🚀 Quick Start

Download (Recommended)

Download AIPromptBridge.zip from GitHub Releases
Extract and run AIPromptBridge.exe (use AIPromptBridge-NoConsole.exe to hide console)
On first launch, the Settings window opens automatically on the API Keys tab. Enter an API key (see Getting API Keys), optionally give it a name, and click Add
Optionally select a provider and configure its endpoint URL or models in the Provider tab, then click Save
The app starts minimized to system tray

From Source (Alternative)

git clone https://github.com/zaxx-q/AIPromptBridge.git
cd AIPromptBridge
pip install -r requirements.txt
python main.py

📋 Usage

System Tray

Right-click the tray icon for:

Toggle Console (or double-click the tray icon) - Toggle console visibility (not shown in No Console mode)
Session Browser - View chat history
Direct Chat - Open text input popup (Ctrl+Space)
Screen Snip - Trigger screen capture (Ctrl+Alt+X)
Audio Analyzer - Open audio tool (Ctrl+Alt+A)
TTS - Open Text-to-Speech window (Ctrl+Alt+T)
Settings - Open GUI settings editor
Prompt Editor - Edit TextEditTool prompts
Edit config.ini - Open configuration file (only visible with --show-console arg)
Edit prompts.json - Open prompts file (only visible with --show-console arg)
Restart - Restart the application
Quit - Exit completely

TextEditTool

Select text in any application
Press Ctrl+Space
Choose an action (Proofread, Rewrite, etc.)
The selected text is replaced in place, or the result opens in a chat window

Without selection: Opens a quick input bar for direct questions.

SnipTool (Screen Snipping)

Press Ctrl+Alt+X
Click and drag to select a screen region
Choose an action (Describe, Extract Text, etc.) or ask a question
Results open in a chat window with the image attached. They can also be copied to the clipboard or typed directly into the active field.

Audio Tool

Press Ctrl+Alt+A to open the Audio Analyzer
Select input device (Microphone or System Audio)
Click Record to capture audio
Choose an action (Transcribe, Analyze, etc.)
Results are streamed to a chat window or displayed in the result panel

API Endpoints

Note: These endpoints are largely deprecated.

ShareX Users: ShareX 19.0.1+ now has a native "Analyze image" feature.
Desktop Users: The built-in SnipTool (Ctrl+Alt+X) offers better integration.

Endpoints allow HTTP POST access (disabled by default). See ShareX Setup Guide if needed.

Console Commands

When console is visible, press these keys:

Key	Action
`S`	Open session browser (Sessions)
`A`	Open Audio Analyzer
`T`	Open TTSTool window (Text-to-Speech)
`X`	Open Tools menu
`L`	List recent saved sessions
`I`	Show system info (Status)
`K`	Toggle thinking mode
`P`	Switch AI provider
`M`	List available models (Use `?N` for details, e.g., `?1`)
`R`	Toggle streaming mode
`G`	Open Settings window
`W`	Open prompt editor
`H`	Show help

⚙️ Configuration

AIPromptBridge features a comprehensive GUI for all configuration needs, making it easy to manage settings without touching configuration files.

🎛️ Settings Window

Access via System Tray > Settings. This window manages the core application configuration (config.ini):

API Keys: Manage keys for Google Gemini, OpenRouter, and Custom providers.
Providers: Select default models and configure endpoint URLs.
Tools: Configure hotkeys and behavior for TextEditTool, SnipTool, and AudioTool.
Theme: Switch between 7 themes and toggle Dark/Light modes.
System: Configure server host/port and startup options.

✏️ Prompt Editor

Access via System Tray > Prompt Editor. This window lets you customize how the AI responds (prompts.json):

Actions: Create, edit, and organize actions for Text, Snip, and Audio tools.
Modifiers: Customize the modifier bar buttons (e.g., "Shorter", "Professional").
Playground: Test your prompts in real-time with text, images, or audio before saving.
Hot-Reload: Changes apply immediately without restarting the app.

📂 Manual Configuration

For advanced users, configuration files are stored in the application root:

config.ini: Core settings and API keys.
prompts.json: AI system prompts and tool configurations.
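
If you script against the configuration file, a read-only check along these lines may help. This is a sketch under assumptions: the `[config]` section name and the `read_bool_setting` helper are hypothetical, and the real config.ini layout may not be parseable by the standard `configparser` module, so inspect your own file first.

```python
import configparser

# Sample content for illustration only; the section name is an assumption.
SAMPLE_INI = """
[config]
thinking_enabled = false
"""


def read_bool_setting(ini_text, key, section="config", default=False):
    """Return a boolean setting, or the default if the section or key is missing."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getboolean(section, key, fallback=default)


print(read_bool_setting(SAMPLE_INI, "thinking_enabled", default=True))
```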

💡 Tips

For Faster Responses

Use non-reasoning models (e.g., gemini-2.0-flash instead of gemini-2.5-pro)
Disable thinking mode: Press K in console or set thinking_enabled = false
Keep streaming enabled so responses feel faster

For Better Results

Enable thinking mode for complex tasks
Use specific prompts in TextEditTool
Add context when asking questions

Getting API Keys

Google Gemini (Recommended): Get a free API key from Google AI Studio
OpenRouter: Get an API key from OpenRouter
OpenAI: Get an API key from OpenAI Platform

API Key Management

Add multiple API keys (one per line) for automatic rotation
If one key hits rate limits, the next one is used automatically
The system tracks exhausted keys and skips them
Keys rotate on: 429 (rate limit), 401/402/403 (auth or payment errors), empty responses
Security: You can also provide keys via Environment Variables instead of config.ini:
- GEMINI_API_KEY
- OPENROUTER_API_KEY
- CUSTOM_API_KEY
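
The rotation rules above can be pictured with a small sketch. It is not the app's implementation: the `KeyRotator` class, the `load_keys` helper, and the choice to place the environment-variable key ahead of configured keys are all assumptions made for illustration.

```python
import os

# Status codes that trigger a switch to the next key, per the list above.
ROTATE_STATUS_CODES = {429, 401, 402, 403}


def load_keys(config_keys, env_var="GEMINI_API_KEY", environ=None):
    """Combine an optional environment-variable key with configured keys."""
    environ = os.environ if environ is None else environ
    env_key = environ.get(env_var)
    keys = [env_key] if env_key else []
    keys += [key for key in config_keys if key and key not in keys]
    return keys


class KeyRotator:
    """Tracks exhausted keys and hands out the next usable one."""

    def __init__(self, keys):
        self.keys = list(keys)
        self.exhausted = set()

    def current(self):
        for key in self.keys:
            if key not in self.exhausted:
                return key
        return None  # every key is exhausted

    def report(self, status_code, body):
        """Record a response; rotate on listed status codes or an empty body."""
        if status_code in ROTATE_STATUS_CODES or not body:
            key = self.current()
            if key is not None:
                self.exhausted.add(key)
        return self.current()
```

After each request, passing the status code and response body to `report` returns the key to use next, or `None` once all keys have been skipped.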

🔧 Command Line Options

AIPromptBridge.exe --no-tray          # No tray icon
AIPromptBridge.exe --show-console     # Keep the console visible at startup and enable debug logs
AIPromptBridge.exe --no-wt            # Skip Windows Terminal detection and redirection (handled by launcher)
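
For readers running from source, the three flags above could be parsed with a standard `argparse` setup like the one sketched here. Only the flag names come from this README; the `build_parser` function and the help strings are illustrative, and the project's real argument handling may differ.

```python
import argparse


def build_parser():
    """Build a parser for the documented command line flags (illustrative)."""
    parser = argparse.ArgumentParser(prog="AIPromptBridge")
    parser.add_argument("--no-tray", action="store_true",
                        help="run without a tray icon")
    parser.add_argument("--show-console", action="store_true",
                        help="keep the console visible and enable debug logs")
    parser.add_argument("--no-wt", action="store_true",
                        help="skip Windows Terminal detection and redirection")
    return parser


args = build_parser().parse_args(["--no-tray", "--show-console"])
print(args.no_tray, args.show_console, args.no_wt)
```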

💡 Console View: For the best console experience (including full-color emoji support), Windows Terminal is highly recommended. AIPromptBridge attempts to relaunch in Windows Terminal automatically if it is detected.

📖 Documentation

Project Structure - File organization
Architecture - Technical details
ShareX Setup - Screenshot integration

📝 Requirements

Windows 10/11 (uses Windows-specific APIs for tray, console, snipping, and audio capture mechanisms)
Windows Terminal (Highly recommended for better console view and colors)
Python 3.13+ (if running from source)
FFmpeg (Required for audio compression and conversion features)
- Download FFmpeg
- Install Guide - Ensure it is added to your system PATH
API keys for at least one provider (Google Gemini recommended)
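
To confirm that FFmpeg is reachable from your PATH, a quick check like the one below can help. The second function sketches the kind of command that reduces audio to mono at a lower sample rate; the 16 kHz default and both function names are assumptions, not the app's actual settings.

```python
import shutil


def find_ffmpeg():
    """Return the full path to ffmpeg if it is on PATH, otherwise None."""
    return shutil.which("ffmpeg")


def build_downmix_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that converts audio to mono at sample_rate Hz."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", str(sample_rate), dst]


if find_ffmpeg() is None:
    print("FFmpeg not found on PATH; audio compression features will not work.")
else:
    print("FFmpeg found at", find_ffmpeg())
```

The command list can be passed to `subprocess.run` once FFmpeg is confirmed to be installed.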

📄 License

MIT License

Attribution & Third-Party Licenses

This project uses Twemoji graphics, licensed under CC-BY 4.0.
