AI Image OCR Plugin

A plugin for Obsidian that extracts text from images using OCR powered by AI image recognition.

This is a simple plugin for extremely accurate and reliable text and handwriting recognition in images.

AI models are often far more effective at text extraction than traditional tools such as Tesseract.

Wiki

Visit the Plugin Wiki for detailed documentation.

Supported Models

Tip

The Gemini 2.5 Flash free tier (no credit card required)
has a rate limit of 250 RPD (requests per day).
Flash-Lite allows up to 1,000 RPD.
For most users, Gemini is the recommended model family,
as it is fast, highly accurate, and free to use.
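
To make the daily limits above concrete, here is a minimal sketch of how a caller might pace a batch and stop before exceeding a quota. The helper names (`minIntervalSeconds`, `DailyQuota`) are hypothetical and not part of the plugin, and the limits are simply the figures quoted in this tip, which may change.

```typescript
// Hypothetical helpers for illustration only; not part of the plugin.
const SECONDS_PER_DAY = 24 * 60 * 60;

/** Even spacing (in seconds) that spreads a full day's quota across 24 hours. */
function minIntervalSeconds(requestsPerDay: number): number {
  if (requestsPerDay <= 0) {
    throw new Error("requestsPerDay must be positive");
  }
  return SECONDS_PER_DAY / requestsPerDay;
}

/** Counts requests per day and refuses once the daily limit is reached. */
class DailyQuota {
  private readonly limit: number;
  private day = "";
  private used = 0;

  constructor(limit: number) {
    this.limit = limit;
  }

  /** Records one request for `day` (any caller-chosen day key); false when the quota is spent. */
  tryConsume(day: string): boolean {
    if (day !== this.day) {
      this.day = day; // a new day key resets the counter
      this.used = 0;
    }
    if (this.used >= this.limit) {
      return false;
    }
    this.used += 1;
    return true;
  }
}
```

With the quoted limits, `minIntervalSeconds(250)` works out to 345.6 seconds between requests and `minIntervalSeconds(1000)` to 86.4 seconds.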

OpenAI Models

GPT-4o (`gpt-4o`)

A powerful model for text extraction
Not free, but very inexpensive — see Pricing
Requires OpenAI API key
See Notes for API access requirements

GPT-4o Mini (`gpt-4o-mini`)

Lower cost and latency than GPT-4o
Slightly reduced accuracy
Requires OpenAI API key

GPT-4.1 (`gpt-4.1`)

Successor to GPT-4, optimized for production use
Requires GPT-4 API access and billing
See Pricing

GPT-4.1 Mini (`gpt-4.1-mini`)

Lightweight version of GPT-4.1
Faster and more affordable, with slightly reduced capabilities

GPT-4.1 Nano (`gpt-4.1-nano`)

Extremely low-latency and low-cost version of GPT-4.1
Suitable for fast, low-resource scenarios

Google Gemini Models

Gemini 2.5 Flash (`gemini-2.5-flash`)

A fast and efficient model for text extraction
Free tier available with generous rate limits — see Rate Limits
Requires Google API key

Gemini 2.5 Flash-Lite Preview (`gemini-2.5-flash-lite-preview-06-17`)

Lightweight version of Gemini Flash
Free tier with especially generous limits
Useful for large volumes of low-latency OCR
Requires Google API key

Gemini 2.5 Pro (`gemini-2.5-pro`)

Slower but extremely accurate model for OCR
Requires paid tier access — see Pricing
Requires Google API key

Local Models

Ollama

Run models like llava, llava:13b, or bakllava entirely on your machine
No internet required
Must have Ollama installed and running
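
For readers curious what a local request looks like, the following is a rough sketch of the JSON body that Ollama's `/api/generate` endpoint accepts for a vision model. It is an illustration only and not necessarily how the plugin talks to Ollama; the function name `buildOllamaRequest` and the prompt text are assumptions.

```typescript
// Illustrative sketch; the plugin's own request code may differ.
// Body shape for a non-streaming call to Ollama's /api/generate endpoint.
interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  images: string[]; // base64-encoded image data, without a "data:" prefix
  stream: boolean;
}

/** Builds the request body, stripping a data-URL prefix if one is present. */
function buildOllamaRequest(
  model: string,
  prompt: string,
  imageBase64: string
): OllamaGenerateRequest {
  const comma = imageBase64.indexOf(",");
  const isDataUrl = imageBase64.indexOf("data:") === 0 && comma !== -1;
  const raw = isDataUrl ? imageBase64.slice(comma + 1) : imageBase64;
  return { model: model, prompt: prompt, images: [raw], stream: false };
}
```

The resulting object would be sent as JSON in a POST to the local Ollama server (by default `http://localhost:11434/api/generate`).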

LM Studio

Compatible with local models that support the OpenAI Chat Completions API
Requires LM Studio to be installed and running
Works with any vision-capable model that accepts base64 image input

Custom OpenAI-Compatible Providers

Bring-your-own endpoint support for any service that follows the OpenAI-compatible Chat Completions API
Allows integration with services like:
- DeepInfra
- Fireworks.ai
- Together.ai
- Groq
- Custom self-hosted APIs
Specify the full endpoint URL, model ID, and API key (if required)

Note

Custom providers are untested. Whether one works depends on how closely it follows the OpenAI API. You must enter the correct endpoint address and model ID and, where applicable, a valid API key.
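
As a compatibility reference, this is a minimal sketch of the request an OpenAI-compatible Chat Completions endpoint generally expects for an image: a user message with a text part and an `image_url` part carrying a base64 data URL. The function names and the default MIME type are assumptions for illustration, not the plugin's actual code.

```typescript
// Illustrative sketch of an OpenAI-compatible Chat Completions request body.
interface TextPart {
  type: "text";
  text: string;
}
interface ImagePart {
  type: "image_url";
  image_url: { url: string };
}
interface ChatRequest {
  model: string;
  messages: { role: "user"; content: (TextPart | ImagePart)[] }[];
}

/** Builds a single-turn request that asks the model to read one image. */
function buildChatRequest(
  model: string,
  prompt: string,
  imageBase64: string,
  mimeType: string = "image/png" // assumed default
): ChatRequest {
  return {
    model: model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: `data:${mimeType};base64,${imageBase64}` } },
        ],
      },
    ],
  };
}

/** Adds a Bearer token only when the provider requires an API key. */
function buildHeaders(apiKey?: string): { [name: string]: string } {
  const headers: { [name: string]: string } = { "Content-Type": "application/json" };
  if (apiKey) {
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  return headers;
}
```

A provider that rejects this shape, or that does not accept image parts for the chosen model, is unlikely to work as a custom provider.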

Features

Extract text from images directly into your Obsidian notes
Supports multiple AI models — cloud and local
Use local models via Ollama or LM Studio (no API key or billing required)
Add your own OpenAI-compatible provider and model ID
Works with common image formats (PNG, JPG, WEBP, etc.)
Clean, markdown-formatted output
Use custom prompt text or stick with the default
Choose where to send extracted text:
- Replace image embed
- Insert at cursor
- Create or append to another note
Header and footer template creation with {{placeholder}} support
File/folder naming template creation with {{placeholder}} support
Use {{image.image}} to embed source image in extracted output header/footer
Extract from embedded images or via OS-native file/folder pickers

Note

Support for {{placeholder}} options is still being tested. Unexpected behavior may occur.
Refer to the Wiki for available placeholders. Please report any placeholder issues or suggestions on GitHub.
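
To illustrate how `{{placeholder}}` substitution typically works, here is a small sketch. It is not the plugin's implementation; `renderTemplate` and the placeholder name `image.name` are hypothetical, and the real placeholder list is in the Wiki.

```typescript
// Hypothetical sketch of {{placeholder}} substitution; not the plugin's code.
/** Replaces each {{key}} with its value; unknown placeholders are left untouched. */
function renderTemplate(template: string, values: { [key: string]: string }): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match
  );
}
```

Leaving unknown placeholders in the output, rather than deleting them, makes typos in a template easy to spot.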

Installation

Install via Obsidian Community Plugin Browser

Note

This option is not yet available.

Open Obsidian settings.
Under "Community plugins", ensure "Safe mode" is disabled.
Click "Browse" to open the Community Plugin Browser.
Search for "AI Image OCR".
Click "Install" to download the plugin.

Install via BRAT

If you have the BRAT plugin installed, you can install this plugin using the BRAT plugin manager:

Open the BRAT plugin settings.
Click Add beta plugin.
Enter https://github.com/rootiest/obsidian-ai-image-ocr in the Repository URL field.
(Optionally) Check the Enable after installing the plugin checkbox to enable the plugin immediately after installation.
Click Add plugin.

Manual Installation

Clone this repository to your vault plugins directory:

git clone https://github.com/rootiest/obsidian-ai-image-ocr.git \
  .obsidian/plugins/obsidian-ai-image-ocr

Or download the plugin archive and extract to your plugins directory.

Configuration

Choose a model provider (OpenAI, Gemini, Ollama, etc.)
Select a model ID (e.g. gpt-4o, llava:13b, etc.)
If using a cloud model, enter the corresponding API key
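
The steps above amount to a small settings object with a few consistency rules. A sketch of those rules follows; the type and field names (`OcrSettings`, `provider`, `modelId`, `apiKey`, `endpointUrl`) are assumptions for illustration and may not match the plugin's actual settings.

```typescript
// Hypothetical settings shape; field names are illustrative assumptions.
type Provider = "openai" | "gemini" | "ollama" | "lmstudio" | "custom";

interface OcrSettings {
  provider: Provider;
  modelId: string;
  apiKey?: string; // needed for cloud providers, optional otherwise
  endpointUrl?: string; // needed for custom providers
}

/** Returns a list of human-readable problems; an empty list means the settings look usable. */
function validateSettings(settings: OcrSettings): string[] {
  const problems: string[] = [];
  if (settings.modelId.trim() === "") {
    problems.push("A model ID is required.");
  }
  const isCloud = settings.provider === "openai" || settings.provider === "gemini";
  if (isCloud && !settings.apiKey) {
    problems.push("Cloud providers require an API key.");
  }
  if (settings.provider === "custom" && !settings.endpointUrl) {
    problems.push("Custom providers require a full endpoint URL.");
  }
  return problems;
}
```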

Several additional optional configuration options are available for customizing the output behavior.

{{placeholder}} options are detailed in the wiki.

Usage

Open An Image For Extraction

Use the command palette (Ctrl+P) and search for "Extract text from image".
Select an image file.
Text will be extracted and inserted per your configuration.

Extract Text From An Embedded Image

Place your cursor below the embedded image.
Use the "Extract Text from Embedded Image" command.
The nearest image above the cursor will be used as the source.
The embed will be replaced by the extracted text.
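
The "nearest image above the cursor" rule can be sketched as a backward scan over the note's lines. This is an illustration under assumptions, not the plugin's code: `findNearestEmbedAbove` is a hypothetical name, the scan starts on the cursor line itself, and the pattern matches any `![[...]]` or `![...](...)` embed without checking that the target is an image.

```typescript
// Hypothetical sketch; the plugin's actual matching rules may differ.
// Matches ![[target]] or ![[target|size]] (group 1) and ![alt](target) (group 2).
const IMAGE_EMBED = /!\[\[([^\]|]+)(?:\|[^\]]*)?\]\]|!\[[^\]]*\]\(([^)\s]+)[^)]*\)/;

interface EmbedMatch {
  line: number; // zero-based line index of the embed
  target: string; // file name, path, or URL inside the embed
}

/** Scans upward from the cursor line and returns the first embed found, or null. */
function findNearestEmbedAbove(lines: string[], cursorLine: number): EmbedMatch | null {
  const start = Math.min(cursorLine, lines.length - 1);
  for (let i = start; i >= 0; i--) {
    const m = IMAGE_EMBED.exec(lines[i]);
    if (m) {
      return { line: i, target: m[1] !== undefined ? m[1] : m[2] };
    }
  }
  return null;
}
```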

Select A Folder For Extraction

Use the command palette (Ctrl+P) and search for "Extract text from image folder".
Select a directory which contains images.
Text will be extracted from each image and inserted per your configuration.
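
Folder extraction implies picking out the image files in a directory listing. A minimal sketch of that filtering step is shown here; the extension list and the function names are assumptions for illustration, and the plugin may recognize a different set of formats.

```typescript
// Hypothetical sketch; the extension list is an assumption, not the plugin's list.
const IMAGE_EXTENSIONS = ["png", "jpg", "jpeg", "webp", "gif", "bmp"];

/** True when the file name ends in a known image extension (case-insensitive). */
function isImageFile(name: string): boolean {
  const dot = name.lastIndexOf(".");
  if (dot <= 0) {
    return false; // no extension, or a dotfile such as ".png"
  }
  return IMAGE_EXTENSIONS.indexOf(name.slice(dot + 1).toLowerCase()) !== -1;
}

/** Keeps only image files and returns them in a stable, sorted order. */
function selectImages(names: string[]): string[] {
  return names.filter(isImageFile).sort();
}
```

Sorting the names keeps the order of extracted text predictable from one run to the next.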

Tip

See the Token Limits Wiki for tips on making the most of your token budget when extracting from batches of images.

Notes

Tip

You can select an image embed in your note to use it as the source and replace it with the extracted text.

Note

When using OpenAI, you must use a user or service account key (not an sk-proj key).

Requirements

Internet connection (unless using a local model)
For OpenAI/Gemini: API key
For local models: Ollama or LM Studio installed and running

🚧 Roadmap

The following features are under consideration for future releases of the plugin:

Extend Placeholder Support

Add created/modified placeholders for images.
- Support moment.js formatting of image placeholders.
Add other {{placeholder}} options.

Reverse Placeholder Support

Support using a keyword to indicate where extracted text should be placed in a note.

Note

These goals are exploratory and may evolve based on user feedback and API capabilities. Have a suggestion? Open an issue or discussion on GitHub!

🔐 Privacy

The AI Image OCR Plugin does not collect or store any personal data, images, or extracted text. A proxy server may be used in specific cases to retrieve external images securely. Basic proxy request metadata may be temporarily logged for debugging, but is automatically removed within 7 days.

For full details, see the Privacy & Anonymity Wiki.

License

MIT

Built with ❤️ for Obsidian. Inspired by the limitations of traditional OCR.
