
Releases: rootiest/obsidian-ai-image-ocr

0.9.1 Fixes for Obsidian plugin standards

25 Aug 15:33
7d1e01c


Update settings to use Obsidian APIs and fix type casting

  • Use ctx?.file?.path directly instead of casting to any.
  • Change heading "Single image extraction" to "Batch image extraction".
  • Use new Setting(containerEl).setName().setHeading() for section headings.
  • Import moment from obsidian instead of accessing (window as any).moment.
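The typed-access change above can be illustrated with a small sketch. The interfaces below are hypothetical stand-ins for the Obsidian types; the point is the pattern: optional chaining (`ctx?.file?.path`) instead of casting to `any`.

```typescript
// Hypothetical stand-ins for the Obsidian context/file types.
interface TFileLike {
  path: string;
}
interface ContextLike {
  file?: TFileLike;
}

// Before: (ctx as any).file.path — throws if `ctx` or `file` is missing.
// After: ctx?.file?.path — safely yields undefined instead.
function getSourcePath(ctx?: ContextLike): string | undefined {
  return ctx?.file?.path;
}

console.log(getSourcePath({ file: { path: "notes/scan.png" } })); // "notes/scan.png"
console.log(getSourcePath(undefined)); // undefined
```

Optional chaining keeps the compiler's type checking intact, which is exactly what the `any` cast was defeating.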

What's Changed

Full Changelog: 0.9.0...0.9.1

0.9.0

25 Aug 15:20
3fa3de8


🚀 Version 0.9.0 — Embed Source Images

This release adds the ability to embed the source image in the output template, along with a debug mode for troubleshooting.


New Features

  • Allows embedding source image in output template
  • Implements a debug mode for extended console output

Fixes

  • Improved formatting across the plugin:
    • Follows Obsidian plugin guidelines (sentence case, etc.)
  • Uses Obsidian methods and standards:
    • Follows Obsidian plugin guidelines
    • Eliminates the need for a hosted CORS proxy

⚠️ Known Limitations

  • Token limits are not currently checked — large batches may cause errors or fail silently.
    • This is particularly relevant with OpenAI models

Full Changelog: 0.8.0...0.9.0

0.8.0 Batch Image Extraction and Enhanced Templating

05 Jul 23:02
09b277f


🚀 Version 0.8.0 — Batch Image Extraction and Enhanced Templating

This release introduces the first working implementation of batch image extraction for the AI Image OCR plugin.


New Features

  • Folder-based image selection
  • Processes all valid images in a folder and sends them as a single API request
  • Basic output directly to the current note
  • Output behavior options:
    • One note per image
    • One combined note with custom separator text
    • Output inline in current note with custom separator text
  • Improved file filtering (skips non-images, corrupt files, etc.)
  • Separate templating options for batched and single image outputs:
    • Customizable header/footer for the entire batch output
    • Customizable header/footer for each image within a batch
    • Distinct file naming and path rules for batched vs. single-image output
    • Added footer setting for single-image extraction
  • Enhanced output templating with dynamic placeholders:
    • General: {{model.name}}, {{provider.name}}, etc.
    • Image metadata: {{image.name}}, {{image.dimensions}}, etc.
    • Embed metadata: {{embed.altText}}, {{embed.url}}, etc.
  • Added Wiki for detailed documentation
  • Updated the README to reflect the new features and reference the Wiki

Tip

Refer to the Templating Guide in the Wiki for a full list of supported placeholders.
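To make the placeholder syntax concrete, here is an illustrative renderer, not the plugin's actual implementation. The placeholder names come from the release notes; the `renderTemplate` function and its behavior for unknown keys are assumptions for the sketch.

```typescript
// Illustrative sketch of expanding {{dotted.placeholder}} templates
// from a context object. Unknown placeholders are left untouched here;
// the plugin's real behavior may differ.
type Context = Record<string, unknown>;

function renderTemplate(template: string, ctx: Context): string {
  return template.replace(/\{\{([\w.]+)\}\}/g, (match, key: string) => {
    // Walk the dotted path, e.g. "image.name" -> ctx.image.name
    const value = key.split(".").reduce<unknown>(
      (obj, part) => (obj as Context | undefined)?.[part],
      ctx,
    );
    return value === undefined ? match : String(value);
  });
}

const output = renderTemplate(
  "Extracted from {{image.name}} via {{provider.name}} ({{model.name}})",
  {
    image: { name: "receipt.png" },
    provider: { name: "Gemini" },
    model: { name: "gemini-2.5-flash" },
  },
);
console.log(output); // Extracted from receipt.png via Gemini (gemini-2.5-flash)
```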


Fixes

  • Major refactor to improve reliability of all functions
  • Improved formatting across the plugin:
    • Follows Obsidian plugin guidelines (sentence case, etc.)
  • Fixed typos in settings descriptions

⚠️ Known Limitations

  • Token limits are not currently checked — large batches may cause errors or fail silently.

What's Changed

Full Changelog: 0.7.0...0.8.0

0.7.0 Local and Custom Models

02 Jul 02:12
4cfed6c


This release implements several new model options and custom prompt text.
The code base has been refactored to aid in future maintenance and updates.
This release also rolls out some GitHub features for user interaction and issue reporting.

Please read the notes at the end of this release text.

Features

  • Adds support for Ollama local models
  • Adds support for LM Studio local models
  • Adds support for any custom OpenAI-compatible models (local or remote)
  • Adds custom prompt text option

Fixes

  • Several improvements to settings page style and content

Development Updates

  • Refactored the code base into multiple files for readability and easier maintenance
  • Added descriptive comments to all functions and classes
  • Removed a few unused helper functions

Repository Updates

  • The Issues Tracker now has templates for Bug Reports and Feature Requests.
  • Discussions page:
    • Q&A for user questions and support
    • Show and Tell for sharing tips and tricks such as provider/model configurations or prompt text
    • General for other general topics
    • Announcements and Polls will also be posted here
    • New releases will be posted in the Announcements for release-specific discussion.
  • The Readme now contains a Roadmap section
    • This section lists potential future enhancements and features
    • The items listed are prospective; no roadmap features are guaranteed.
    • Features may be removed without implementation or held in the roadmap indefinitely
    • Features will not necessarily be implemented in the order they are listed

Submitting Provider Requests

  1. Please try using the Custom OpenAI-compatible provider option first.
  2. Verify your API endpoint, API key, and Model ID are correct.
  3. Confirm the model being used works with image input (Vision and/or multi-modal support)
  4. Verify that the image you wish to extract from works on other providers
    You may use the free Gemini model as a control for this step
  5. If possible, identify the part of the API that differs from the OpenAI API Reference
    We use the OpenAI REST API to make requests.
  6. Create a Feature Request in the Issues Tracker with the following details:
    • The website and name of the provider you are trying to use
    • The API endpoint
    • The model you are using
    • Any error messages returned (Check the Obsidian Dev Console)
    • Describe the issue and steps you have tried to fix it
    • If possible, include a link to the API docs and/or any tips on what differs from the standard API
    • If you have submitted or found a PR related to this issue, please include a link to it
    • Feel free to include any other relevant information

Please note that I cannot test all providers and models.
If your provider or model does not offer a free API then I will be unable to test it and likely unable to support it.
Please still submit a Request. Exceptions or other arrangements can be made in some cases.

If we compile enough data on providers and models, I will add a wiki with a database of user reports on compatibility, syntax, and other details such as the best models or prompt text to use.

Notes

  • Discussions and Issues: Please use the Issues Tracker only for Feature Requests and Bug Reports.
    The Discussions page is available for user support, questions, or other topics.
  • Local models require a user-provided local model service. See Ollama or LM Studio for more details.
  • Custom providers must be compatible with the OpenAI API. Several examples are listed in the README.
    Users are responsible for correctly entering the API endpoint and model ID.
  • Some custom providers (particularly remote providers) require an API key. If yours does not, leave that field blank.
  • Models must support Vision and/or multi-modal input to parse images.
  • Not all models that support images are trained to recognize text in images. YMMV.
  • In general, larger models like OpenAI and Gemini tend to perform better at this task than open-source or local models.

Full Changelog: 0.5.0...0.7.0

0.6.0-beta.3 Custom Providers

01 Jul 17:20
8f263a6


Pre-release

Warning: This is a pre-release!

This release has not been fully tested and you may encounter bugs or other issues.

This pre-release adds an option to use any custom OpenAI-compatible model.
This adds support for local or remote providers which comply precisely with the OpenAI API structure.

Features

  • Custom Providers:
    You can now connect your plugin to any OpenAI-compatible endpoints for image extraction.
    The following settings options are now available:
    • Provider: Custom OpenAI-compatible
    • AI Endpoint: This is where you enter the full endpoint address (including the path, e.g. /v1/chat/completions)
    • Model ID: This is the model you wish to use on that provider
    • API Key: This is the API key for the provider (leave empty when no key is required)
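For reference, this is roughly what an OpenAI-compatible chat-completions request with an image looks like. The release notes don't show the plugin's internal request code, so this is a sketch of the standard format the plugin targets; the model ID and prompt text are placeholders.

```typescript
// Sketch of the request body an OpenAI-compatible endpoint expects for
// image input. Images travel inline as a base64 data URL inside the
// message content array, not as a separate upload.
interface TextPart {
  type: "text";
  text: string;
}
interface ImagePart {
  type: "image_url";
  image_url: { url: string };
}

function buildOcrRequest(modelId: string, prompt: string, base64Png: string) {
  return {
    model: modelId,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt } as TextPart,
          {
            type: "image_url",
            image_url: { url: `data:image/png;base64,${base64Png}` },
          } as ImagePart,
        ],
      },
    ],
  };
}

// This object would be POSTed to the endpoint you configured,
// e.g. https://example.com/v1/chat/completions (placeholder URL).
const body = buildOcrRequest("my-model", "Extract all text from this image.", "iVBORw0KGgo=");
console.log(body.messages[0].role); // "user"
```

If your provider deviates from this shape (particularly in how image attachments are encoded), that is usually why the Custom OpenAI-compatible option fails.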

Notes

  • Custom provider support is dependent on strict compatibility with the OpenAI API format.
  • If your provider has variations (particularly with image attachment formats) then it may not work with this option.
  • Model must have Vision and/or multi-modal support. Image attachments will fail on unsupported models.
  • Not all vision models are trained for text recognition (even if they perform well at describing an image)
  • Some models may return hallucinated text rather than a clear failure when they are unable to process the image.
  • I cannot test all providers and models. If your preferred provider doesn't work, let me know and I will look into it.
    • Understand that for testing-cost reasons I cannot implement tailored support for providers who don't have a free API.

Full Changelog: 0.5.0...0.6.0-beta.3

0.6.0-beta.2 LMStudio Local Models

29 Jun 20:53
c46e81b


Pre-release

Warning: This is a pre-release!

This release has not been fully tested and you may encounter bugs or other issues.
Please read the release notes of the previous beta release as well.

This release adds support for LMStudio with mostly the same functions and behavior as the Ollama integration.

Features

  • LMStudio Integration:
    You can now connect your plugin to local LMStudio server endpoints for image extraction.
    The following settings options are now available:
    • Provider: LMStudio (Local)
    • LMStudio Server Url: This is where you enter the endpoint address and port
    • LMStudio Model Name: This is the model you wish to use (you must download and install it in advance)

Notes

  • LMStudio is now supported. However there are several caveats:
    • All of the caveats mentioned in the previous beta release regarding Ollama also apply here.
    • Additionally, LMStudio models sometimes appear to use different prompt formats than what is supported.
    • Preliminary testing shows the current format works with google/gemma-3-4b and qwen/qwen2.5-vl-7b
    • Models must have Vision listed in their capabilities and use the same prompt format as the above models. (most of them should)
    • If a model fails via the plugin but works in the LMStudio UI, please let me know and I will try to look into it.

Beta Notes

  • Beta warning: This is an early release of LMStudio support. Some rough edges and surprise gremlins may appear!
  • Your feedback is super valuable! Please report bugs, unexpected behavior, or even weird vibes.
  • I may not be able to test every model so reports on which models perform well/poorly are very helpful!

Full Changelog: 0.5.0...0.6.0-beta.2

0.6.0-beta.1 Beta Test: Ollama Support

29 Jun 05:01
ee3d6af


Pre-release

Warning: This is a pre-release!

This release has not been fully tested and you may encounter bugs or other issues.

This plugin does not spawn an Ollama server for you and it does not download models for you.
It allows you to use an Ollama server you have already set up.

Please understand that local Ollama models may not (and often don't) perform as well as the cloud providers.
Of the few models I've tried so far, I've had the best results with llama3.2-vision.
However, I have not tested all of the vision-capable models, and there may be others that perform better.

Features

  • Ollama Integration:
    You can now connect your plugin to local Ollama endpoints for image extraction.
    The following settings options are now available:
    • Provider: Ollama (Local)
    • Ollama Server Url: This is where you enter the endpoint address and port
    • Ollama Model Name: This is the model you wish to use (you must download and install it in advance)
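For context on what the plugin sends to your server, Ollama's documented /api/chat format carries base64 images in an `images` array alongside the prompt. The sketch below follows that format; the model name and server URL are placeholders, and the plugin's internal request code may differ.

```typescript
// Sketch of a vision request in Ollama's /api/chat format.
function buildOllamaChatRequest(model: string, prompt: string, base64Images: string[]) {
  return {
    model,
    stream: false,
    messages: [
      {
        role: "user",
        content: prompt,
        images: base64Images, // raw base64 strings, no data: URL prefix
      },
    ],
  };
}

// Would be POSTed to your configured server, e.g. http://localhost:11434/api/chat
const req = buildOllamaChatRequest(
  "llama3.2-vision",
  "Extract all text from this image.",
  ["iVBORw0KGgo="],
);
console.log(req.messages[0].images.length); // 1
```

Note that unlike the OpenAI format, Ollama takes bare base64 rather than a `data:` URL, which is one reason non-vision or non-compliant models fail silently.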

Notes

  • Ollama is now supported. However there are several caveats:
    • This plugin does not provide or initialize an Ollama server.
    • You are required to install, initialize, and host the server on your own.
    • ONLY "multi-modal" or "vision" models are supported. Other models are unable to parse images in prompts.
    • Some models perform better than others.
    • Not all vision models are trained for text recognition (even if they perform well at describing an image)
    • Beware that a model may not respond with "Unable to find text in the image" when it fails.
    • Models will occasionally return hallucinated text if they fail to locate the text or don't support vision.

Beta Notes

  • Beta warning: This is an early release of Ollama support. Some rough edges and surprise gremlins may appear!
  • Your feedback is super valuable! Please report bugs, unexpected behavior, or even weird vibes.
  • I may not be able to test every model so reports on which models perform well/poorly are very helpful!

Full Changelog: 0.5.0...0.6.0-beta.1

0.5.0 New OpenAI Models

28 Jun 18:25
eacda13


This release adds additional OpenAI models.

Features

  • Add new models:
    • OpenAI GPT 4o-mini: Lower cost and latency than GPT-4o
    • OpenAI GPT 4.1: Successor to GPT-4, optimized for production use
    • OpenAI GPT 4.1-mini: Lightweight version of GPT-4.1
    • OpenAI GPT 4.1-nano: Extremely low-latency and low-cost version of GPT-4.1

Style

  • Make text of notification during extraction more descriptive.

Full Changelog: 0.4.5...0.5.0

0.4.5 New Gemini Models

28 Jun 17:29
46cfc51


This release adds additional Gemini models.

Features

  • Add new models:
    • Gemini Flash-Lite: This model has increased rate limits on the free tier: 1,000 requests per day.
    • Gemini Flash Pro: This model is slower and only available to paid tiers but could potentially produce more accurate results.

Style

  • Add short model descriptions to settings page.

Full Changelog: 0.4.0...0.4.5

0.4.0 Free Tier FTW

25 Jun 21:54
721c024


This release upgrades the Gemini model to Flash 2.5 as the Flash 1.5 model is being deprecated by Google.

Features

  • Replace Gemini Flash 1.5 with Gemini Flash 2.5
  • Gemini Flash 2.5's free tier currently allows 250 requests per day,
    effectively making it FREE to use with this plugin (within that limit)!
  • The free tier does not require any payment provider (no credit card/etc required)
  • All you need is a Google Account to get an API key from Google.

Full Changelog: 0.3.6...0.4.0