
Releases: rootiest/obsidian-ai-image-ocr

0.9.1 Fixes for Obsidian plugin standards

25 Aug 15:33
7d1e01c


Update settings to use Obsidian APIs and fix type casting

  • Use ctx?.file?.path directly instead of casting to any.
  • Change heading "Single image extraction" to "Batch image extraction".
  • Use new Setting(containerEl).setName().setHeading() for section headings.
  • Import moment from obsidian instead of accessing (window as any).moment.
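The typed-access change above can be illustrated with a small sketch. The interfaces below are hypothetical stand-ins for the Obsidian types; the point is the pattern: optional chaining (`ctx?.file?.path`) instead of casting to `any`.

```typescript
// Hypothetical stand-ins for the Obsidian context/file types.
interface TFileLike {
  path: string;
}
interface ContextLike {
  file?: TFileLike;
}

// Before: (ctx as any).file.path — throws if `ctx` or `file` is missing.
// After: ctx?.file?.path — safely yields undefined instead.
function getSourcePath(ctx?: ContextLike): string | undefined {
  return ctx?.file?.path;
}

console.log(getSourcePath({ file: { path: "notes/scan.png" } })); // "notes/scan.png"
console.log(getSourcePath(undefined)); // undefined
```

Optional chaining keeps the compiler's type checking intact, which is exactly what the `any` cast was defeating.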

What's Changed

Full Changelog: 0.9.0...0.9.1

0.9.0

25 Aug 15:20
3fa3de8


🚀 Version 0.9.0 — Embed Source Images

This release adds the ability to embed the source image in the output template, along with a debug mode for troubleshooting.


New Features

  • Allows embedding source image in output template
  • Implements a debug mode for extended console output

Fixes

  • Improved formatting across the plugin:
    • Follows Obsidian plugin guidelines (sentence case, etc.)
  • Uses Obsidian methods and standards:
    • Follows Obsidian plugin guidelines
    • Eliminates the need for a hosted CORS proxy

⚠️ Known Limitations

  • Token limits are not currently checked — large batches may cause errors or fail silently.
    • This is particularly relevant with OpenAI models

Full Changelog: 0.8.0...0.9.0

0.8.0 Batch Image Extraction and Enhanced Templating

05 Jul 23:02
09b277f


🚀 Version 0.8.0 — Batch Image Extraction and Enhanced Templating

This release introduces the first working implementation of batch image extraction for the AI Image OCR plugin.


New Features

  • Folder-based image selection
  • Processes all valid images in a folder and sends them as a single API request
  • Basic output directly to the current note
  • Output behavior options:
    • One note per image
    • One combined note with custom separator text
    • Output inline in current note with custom separator text
  • Improved file filtering (skips non-images, corrupt files, etc.)
  • Separate templating options for batched and single image outputs:
    • Customizable header/footer for the entire batch output
    • Customizable header/footer for each image within a batch
    • Distinct file naming and path rules for batched vs. single-image output
    • Added footer setting for single-image extraction
  • Enhanced output templating with dynamic placeholders:
    • General: {{model.name}}, {{provider.name}}, etc.
    • Image metadata: {{image.name}}, {{image.dimensions}}, etc.
    • Embed metadata: {{embed.altText}}, {{embed.url}}, etc.
  • Added Wiki for detailed documentation
  • Updated the README to reflect the new features and reference the Wiki

Tip

Refer to the Templating Guide in the Wiki for a full list of supported placeholders.
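To make the placeholder syntax concrete, here is an illustrative renderer, not the plugin's actual implementation. The placeholder names come from the release notes; the `renderTemplate` function and its behavior for unknown keys are assumptions for the sketch.

```typescript
// Illustrative sketch of expanding {{dotted.placeholder}} templates
// from a context object. Unknown placeholders are left untouched here;
// the plugin's real behavior may differ.
type Context = Record<string, unknown>;

function renderTemplate(template: string, ctx: Context): string {
  return template.replace(/\{\{([\w.]+)\}\}/g, (match, key: string) => {
    // Walk the dotted path, e.g. "image.name" -> ctx.image.name
    const value = key.split(".").reduce<unknown>(
      (obj, part) => (obj as Context | undefined)?.[part],
      ctx,
    );
    return value === undefined ? match : String(value);
  });
}

const output = renderTemplate(
  "Extracted from {{image.name}} via {{provider.name}} ({{model.name}})",
  {
    image: { name: "receipt.png" },
    provider: { name: "Gemini" },
    model: { name: "gemini-2.5-flash" },
  },
);
console.log(output); // Extracted from receipt.png via Gemini (gemini-2.5-flash)
```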


Fixes

  • Major refactor to improve reliability of all functions
  • Improved formatting across the plugin:
    • Follows Obsidian plugin guidelines (sentence case, etc.)
  • Fixed typos in settings descriptions

⚠️ Known Limitations

  • Token limits are not currently checked — large batches may cause errors or fail silently.

What's Changed

Full Changelog: 0.7.0...0.8.0

0.7.0 Local and Custom Models

02 Jul 02:12
4cfed6c


This release implements several new model options and custom prompt text.
The code base has been refactored to aid in future maintenance and updates.
This release also rolls out some GitHub features for user interaction and issue reporting.

Please read the notes at the end of this release text.

Features

  • Adds support for Ollama local models
  • Adds support for LM Studio local models
  • Adds support for any custom OpenAI-compatible models (local or remote)
  • Adds custom prompt text option

Fixes

  • Several improvements to settings page style and content

Development Updates

  • Refactored the code base into multiple files for readability and easier maintenance
  • Added descriptive comments to all functions and classes
  • Removed a few unused helper functions

Repository Updates

  • The Issues Tracker now has templates for Bug Reports and Feature Requests.
  • Discussions page:
    • Q&A for user questions and support
    • Show and Tell for sharing tips and tricks such as provider/model configurations or prompt text
    • General for other general topics
    • Announcements and Polls will also be posted here
    • New releases will be posted in the Announcements for release-specific discussion.
  • The Readme now contains a Roadmap section
    • This section lists potential future enhancements and features
    • The items listed are prospective; no roadmap features are guaranteed.
    • Features may be removed without implementation or held in the roadmap indefinitely
    • Features will not necessarily be implemented in the order they are listed

Submitting Provider Requests

  1. Please try using the Custom OpenAI-compatible provider option first.
  2. Verify your API endpoint, API key, and Model ID are correct.
  3. Confirm the model being used works with image input (Vision and/or multi-modal support)
  4. Verify that the image you wish to extract from works on other providers
    You may use the free Gemini model as a control for this step
  5. If possible, identify the part of the API that differs from the OpenAI API Reference
    We use the OpenAI REST API to make requests.
  6. Create a Feature Request in the Issues Tracker with the following details:
    • The website and name of the provider you are trying to use
    • The API endpoint
    • The model you are using
    • Any error messages returned (Check the Obsidian Dev Console)
    • Describe the issue and steps you have tried to fix it
    • If possible, include a link to the API docs and/or any tips on what differs from the standard API
    • If you have submitted or found a PR related to this issue, please include a link to it
    • Feel free to include any other relevant information

Please note that I cannot test all providers and models.
If your provider or model does not offer a free API then I will be unable to test it and likely unable to support it.
Please still submit a Request. Exceptions or other arrangements can be made in some cases.

If we compile enough data on providers and models, I will add a wiki with a database of user reports on compatibility, syntax, and other details such as the best models or prompt text to use.

Notes

  • Discussions and Issues: Please use the Issues Tracker only for Feature Requests and Bug Reports.
    The Discussions page is available for user support, questions, or other topics.
  • Local models require a user-provided local model service. See Ollama or LM Studio for more details.
  • Custom providers must be compatible with the OpenAI API. Several examples are listed in the README.
    Users are responsible for correctly entering the API endpoint and model ID.
  • Some custom providers (particularly remote providers) require an API key. If yours does not, leave that field blank.
  • Models must support Vision and/or multi-modal input to parse images.
  • Not all models that support images are trained to recognize text in images. YMMV.
  • In general, larger models like OpenAI and Gemini tend to perform better at this task than open-source or local models.

Full Changelog: 0.5.0...0.7.0

0.6.0-beta.3 Custom Providers

01 Jul 17:20
8f263a6


Pre-release

Warning: This is a pre-release!

This release has not been fully tested and you may encounter bugs or other issues.

This pre-release adds an option to use any custom OpenAI-compatible model.
This adds support for local or remote providers which comply precisely with the OpenAI API structure.

Features

  • Custom Providers:
    You can now connect your plugin to any OpenAI-compatible endpoints for image extraction.
    The following settings options are now available:
    • Provider: Custom OpenAI-compatible
    • AI Endpoint: This is where you enter the full endpoint address (including the path, e.g. /v1/chat/completions)
    • Model ID: This is the model you wish to use on that provider
    • API Key: This is the API key for the provider (leave empty when no key is required)
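For reference, this is roughly what an OpenAI-compatible chat-completions request with an image looks like. The release notes don't show the plugin's internal request code, so this is a sketch of the standard format the plugin targets; the model ID and prompt text are placeholders.

```typescript
// Sketch of the request body an OpenAI-compatible endpoint expects for
// image input. Images travel inline as a base64 data URL inside the
// message content array, not as a separate upload.
interface TextPart {
  type: "text";
  text: string;
}
interface ImagePart {
  type: "image_url";
  image_url: { url: string };
}

function buildOcrRequest(modelId: string, prompt: string, base64Png: string) {
  return {
    model: modelId,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt } as TextPart,
          {
            type: "image_url",
            image_url: { url: `data:image/png;base64,${base64Png}` },
          } as ImagePart,
        ],
      },
    ],
  };
}

// This object would be POSTed to the endpoint you configured,
// e.g. https://example.com/v1/chat/completions (placeholder URL).
const body = buildOcrRequest("my-model", "Extract all text from this image.", "iVBORw0KGgo=");
console.log(body.messages[0].role); // "user"
```

If your provider deviates from this shape (particularly in how image attachments are encoded), that is usually why the Custom OpenAI-compatible option fails.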

Notes

  • Custom provider support is dependent on strict compatibility with the OpenAI API format.
  • If your provider has variations (particularly with image attachment formats) then it may not work with this option.
  • Model must have Vision and/or multi-modal support. Image attachments will fail on unsupported models.
  • Not all vision models are trained for text recognition (even if they perform well at describing an image)
  • Some models may return hallucinated text rather than a clear failure when they are unable to process the image.
  • I cannot test all providers and models. If your preferred provider doesn't work, let me know and I will look into it.
    • Understand that for testing-cost reasons I cannot implement tailored support for providers who don't have a free API.

Full Changelog: 0.5.0...0.6.0-beta.3

0.6.0-beta.2 LMStudio Local Models

29 Jun 20:53
c46e81b


Pre-release

Warning: This is a pre-release!

This release has not been fully tested and you may encounter bugs or other issues.
Please read the release notes of the previous beta release as well.

This release adds support for LMStudio with mostly the same functions and behavior as the Ollama integration.

Features

  • LMStudio Integration:
    You can now connect your plugin to local LMStudio server endpoints for image extraction.
    The following settings options are now available:
    • Provider: LMStudio (Local)
    • LMStudio Server Url: This is where you enter the endpoint address and port
    • LMStudio Model Name: This is the model you wish to use (you must download and install it in advance)

Notes

  • LMStudio is now supported. However there are several caveats:
    • All of the caveats mentioned in the previous beta release regarding Ollama also apply here.
    • Additionally, LMStudio models sometimes appear to use different prompt formats than what is supported.
    • Preliminary testing shows the current format works with google/gemma-3-4b and qwen/qwen2.5-vl-7b
    • Models must have Vision listed in their capabilities and use the same prompt format as the above models. (most of them should)
    • If a model fails via the plugin but works in the LMStudio UI, please let me know and I will try to look into it.

Beta Notes

  • Beta warning: This is an early release of LMStudio support. Some rough edges and surprise gremlins may appear!
  • Your feedback is super valuable! Please report bugs, unexpected behavior, or even weird vibes.
  • I may not be able to test every model so reports on which models perform well/poorly are very helpful!

Full Changelog: 0.5.0...0.6.0-beta.2

0.6.0-beta.1 Beta Test: Ollama Support

29 Jun 05:01
ee3d6af


Pre-release

Warning: This is a pre-release!

This release has not been fully tested and you may encounter bugs or other issues.

This plugin does not spawn an Ollama server for you and it does not download models for you.
It allows you to use an Ollama server you have already set up.

Please understand that local Ollama models may not (and often don't) perform as well as the cloud providers.
Of the few models I've tried so far, I've had the best results with llama3.2-vision.
However, I have not tested all of the vision-capable models, and there may be others that perform better.

Features

  • Ollama Integration:
    You can now connect your plugin to local Ollama endpoints for image extraction.
    The following settings options are now available:
    • Provider: Ollama (Local)
    • Ollama Server Url: This is where you enter the endpoint address and port
    • Ollama Model Name: This is the model you wish to use (you must download and install it in advance)
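For context on what the plugin sends to your server, Ollama's documented /api/chat format carries base64 images in an `images` array alongside the prompt. The sketch below follows that format; the model name and server URL are placeholders, and the plugin's internal request code may differ.

```typescript
// Sketch of a vision request in Ollama's /api/chat format.
function buildOllamaChatRequest(model: string, prompt: string, base64Images: string[]) {
  return {
    model,
    stream: false,
    messages: [
      {
        role: "user",
        content: prompt,
        images: base64Images, // raw base64 strings, no data: URL prefix
      },
    ],
  };
}

// Would be POSTed to your configured server, e.g. http://localhost:11434/api/chat
const req = buildOllamaChatRequest(
  "llama3.2-vision",
  "Extract all text from this image.",
  ["iVBORw0KGgo="],
);
console.log(req.messages[0].images.length); // 1
```

Note that unlike the OpenAI format, Ollama takes bare base64 rather than a `data:` URL, which is one reason non-vision or non-compliant models fail silently.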

Notes

  • Ollama is now supported. However there are several caveats:
    • This plugin does not provide or initialize an Ollama server.
    • You are required to install, initialize, and host the server on your own.
    • ONLY "multi-modal" or "vision" models are supported. Other models are unable to parse images in prompts.
    • Some models perform better than others.
    • Not all vision models are trained for text recognition (even if they perform well at describing an image)
    • Beware that a model may not respond with "Unable to find text in the image" when it fails.
    • Models will occasionally return hallucinated text if they fail to locate the text or don't support vision.

Beta Notes

  • Beta warning: This is an early release of Ollama support. Some rough edges and surprise gremlins may appear!
  • Your feedback is super valuable! Please report bugs, unexpected behavior, or even weird vibes.
  • I may not be able to test every model so reports on which models perform well/poorly are very helpful!

Full Changelog: 0.5.0...0.6.0-beta.1

0.5.0 New OpenAI Models

28 Jun 18:25
eacda13


This release adds additional OpenAI models.

Features

  • Add new models:
    • OpenAI GPT 4o-mini: Lower cost and latency than GPT-4o
    • OpenAI GPT 4.1: Successor to GPT-4, optimized for production use
    • OpenAI GPT 4.1-mini: Lightweight version of GPT-4.1
    • OpenAI GPT 4.1-nano: Extremely low-latency and low-cost version of GPT-4.1

Style

  • Make text of notification during extraction more descriptive.

Full Changelog: 0.4.5...0.5.0

0.4.5 New Gemini Models

28 Jun 17:29
46cfc51


This release adds additional Gemini models.

Features

  • Add new models:
    • Gemini Flash-Lite: This model has increased rate limits on the free tier: 1,000 requests per day.
    • Gemini Flash Pro: This model is slower and only available to paid tiers but could potentially produce more accurate results.

Style

  • Add short model descriptions to settings page.

Full Changelog: 0.4.0...0.4.5

0.4.0 Free Tier FTW

25 Jun 21:54
721c024


This release upgrades the Gemini model to Flash 2.5 as the Flash 1.5 model is being deprecated by Google.

Features

  • Replace Gemini Flash 1.5 with Gemini Flash 2.5
  • Gemini Flash 2.5's free tier currently allows 250 requests per day,
    effectively making it FREE to use with this plugin (within that limit)!
  • The free tier does not require any payment provider (no credit card/etc required)
  • All you need is a Google Account to get an API key from Google.

Full Changelog: 0.3.6...0.4.0