jabberjabberjabber edited this page Mar 11, 2025 · 8 revisions

How It Works

LLMImageIndexer processes images using a combination of local file system operations, image metadata handling, and AI-powered analysis. Here's a detailed breakdown of the process:

Image Discovery

  • The tool recursively scans the specified directory (unless the "no crawl" option is set) for supported image file formats.
  • Supported formats include JPEG, PNG, GIF, TIFF, BMP, WEBP, HEIF, and various RAW formats (ARW, CR2, DNG, NEF, ORF, PEF, RAF, RW2, SRW, etc.).
  • If supported files are found in a directory, they are added to the queue and the number of files added is displayed in the GUI.
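
The discovery step can be sketched as follows. This is a minimal illustration, not the tool's actual code: the extension set is an assumption built from the formats named above, and `find_images` and its `no_crawl` parameter are hypothetical names.

```python
from pathlib import Path

# Assumed extension list, derived from the formats named above.
# The tool's real list may differ.
SUPPORTED_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff", ".bmp", ".webp",
    ".heif", ".arw", ".cr2", ".dng", ".nef", ".orf", ".pef", ".raf",
    ".rw2", ".srw",
}

def find_images(root, no_crawl=False):
    """Return supported image files under root (hypothetical helper)."""
    root = Path(root)
    # With no_crawl set, only the top-level directory is scanned.
    candidates = root.iterdir() if no_crawl else root.rglob("*")
    return sorted(
        p for p in candidates
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Calling `len(find_images(folder))` would give the count that the GUI reports for the queue.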

Metadata Extraction

  • ExifTool is used to verify and extract metadata from each image file.
  • Extracted fields include those containing descriptions or captions, and keywords.
  • Metadata is also checked for a unique file identifier and a status marker indicating previous processing with the indexing tool.
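
As a rough sketch of this step, the snippet below builds an ExifTool read command and parses its JSON output. The tag names come from this page; the function names and the returned dictionary layout are hypothetical, and the tool's actual invocation may differ. `-use MWG` is needed because ExifTool does not load the MWG composite tags by default.

```python
import json
import subprocess

def build_read_command(path):
    # -use MWG loads the MWG composite tags; -json prints a JSON array.
    return [
        "exiftool", "-use", "MWG", "-json",
        "-MWG:Keywords", "-MWG:Description",
        "-XMP:Identifier", "-XMP:Status",
        str(path),
    ]

def parse_metadata(exiftool_json):
    """Turn ExifTool's JSON output for one file into a small dict."""
    record = json.loads(exiftool_json)[0]
    keywords = record.get("Keywords", [])
    if not isinstance(keywords, list):
        # A single keyword is printed as a scalar rather than a list.
        keywords = [keywords]
    return {
        "keywords": [str(k) for k in keywords],
        "caption": record.get("Description"),
        "identifier": record.get("Identifier"),
        "status": record.get("Status"),
    }

def read_metadata(path):
    # Requires exiftool to be installed and on the PATH.
    result = subprocess.run(
        build_read_command(path), capture_output=True, text=True, check=True
    )
    return parse_metadata(result.stdout)
```

A file whose parsed `status` is already set can then be skipped or reprocessed according to the selected options.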

Image Preprocessing

  • Image preprocessing and encoding takes place in the KoboldAPI-Python library.
  • The stored images are not modified. This tool only writes to the image metadata; all processing occurs in memory.
  • Images are scaled to fit common patch size multiples using bicubic resizing.
  • The images are converted to JPEG at quality 95 using RGB and encoded as base64 strings.
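
The size calculation can be illustrated with the standard library alone. This is only a sketch: the patch size of 14 and the maximum side of 1024 are assumed values, not taken from KoboldAPI-Python, and both function names are hypothetical.

```python
import base64

def fit_to_patch_multiple(width, height, patch_size=14, max_side=1024):
    """Scale dimensions down to max_side, then snap each to a patch multiple."""
    scale = min(1.0, max_side / max(width, height))

    def snap(value):
        # Round to the nearest multiple, but never below one patch.
        return max(patch_size, round(value * scale / patch_size) * patch_size)

    return snap(width), snap(height)

def to_base64(jpeg_bytes):
    """Encode already-compressed JPEG bytes as a base64 string."""
    return base64.b64encode(jpeg_bytes).decode("ascii")
```

With an imaging library such as Pillow, the resize itself would be a bicubic `resize` to the computed size, followed by an RGB JPEG save at quality 95 into an in-memory buffer, so the file on disk is never touched.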

LLM Querying

  • Communication with the LLM is handled by the KoboldAPI-Python library via the KoboldCPP API (default: http://localhost:5001).
  • The library determines which instruction prompt template to use by asking the API for the name of the running model and parsing it.
  • It sends a POST request to the /api/v1/generate endpoint with the base64-encoded image as a single item in a list along with an instruction prompt.
  • If a detailed caption is requested, it sends two queries for each image: one for a caption and one for keywords.
  • If a short caption is requested, it sends a single query for both the caption and the keywords.
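
A request of this kind might look like the sketch below. The endpoint and default URL come from this page; the payload field names (`prompt`, `images`, `max_length`) and the response shape (`results[0].text`) reflect my understanding of the KoboldCPP API and should be checked against its documentation. The function names and the `max_length` value are hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:5001"  # default from this page

def build_generate_payload(prompt, image_b64, max_length=250):
    return {
        "prompt": prompt,
        "images": [image_b64],  # the image is sent as a single-item list
        "max_length": max_length,  # assumed value, for illustration only
    }

def generate(prompt, image_b64, api_url=API_URL):
    """POST one query to KoboldCPP and return the generated text."""
    body = json.dumps(build_generate_payload(prompt, image_b64)).encode("utf-8")
    request = urllib.request.Request(
        f"{api_url}/api/v1/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # Assumed response shape: {"results": [{"text": "..."}]}
    return reply["results"][0]["text"]
```

For a detailed caption, `generate` would be called twice per image with different prompts; for a short caption, once.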

AI Response Processing

  • The AI's response is cleaned and parsed for a valid caption, keywords, or both.
  • If a detailed caption has been requested, the caption response is written directly into the caption field with no attempt at parsing.
  • The keyword response must contain a "Keywords" entry. If it does not, it is rejected and the query is tried again, unless the option to fail without retry is selected.
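
The "Keywords" check could be sketched like this, assuming the model is asked to reply with a JSON object. The reply format and the function name are assumptions; the page only states that a "Keywords" entry is required.

```python
import json
import re

def parse_keyword_response(text):
    """Return the list under "Keywords", or None if the reply is rejected."""
    # Models often wrap JSON in prose or code fences, so grab the outermost braces.
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    keywords = data.get("Keywords")
    if not isinstance(keywords, list):
        return None
    return [str(k) for k in keywords]
```

A `None` result corresponds to a rejected response, which triggers a retry unless the option to fail without retry is selected.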

Keyword Handling

  • If processing of an image fails, the image is marked as failed in the metadata status and the tool moves on to the next file.
  • Files marked as failed can be reprocessed by running the processor again with the option to reprocess failed files. If this is done then previously successful files will be skipped, so you can run it on a directory without worrying about reprocessing every file again.
  • The parsed keywords are run through a filter which does the following for each keyword entry:
    1. Splits unhyphenated compound words on internal capitals (GrandCanyon becomes Grand Canyon)
    2. Ensures a total of 2 words or fewer unless middle word is 'and'/'or' ('bread and butter' is allowed but 'bread with butter' is discarded)
    3. Counts a hyphen between alphanumeric chars as two words ('close-up' is counted as if it were 'close up', not 'closeup')
    4. Ensures they do not start with 3 or more digits ('3D' is allowed; '2024' is discarded)
    5. Each word must be 2 or more chars ('3D videos' is allowed; '3 videos' is discarded)
    6. Removes all non-alphanumeric except spaces and valid hyphens ('tall_man' or 'tall.man' becomes 'tall man')
    7. Checks against words that models commonly use when confused, such as the words in the instructions
    8. Keywords are always converted to lowercase
  • An error may result if the AI's response contains characters in an encoding the processor cannot handle, such as emoji.
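
The filter rules above can be approximated in a few lines. This is a sketch under assumptions: the order in which the rules are applied, the ASCII-only definition of "alphanumeric", and the contents of `BLOCKLIST` are all guesses, and the function names are hypothetical.

```python
import re

# Hypothetical stand-in for the tool's list of "confused model" words.
BLOCKLIST = {"keyword", "keywords", "caption", "description"}

def normalize_keyword(keyword):
    """Return the cleaned keyword, or None if it should be discarded."""
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", keyword)   # rule 1: split on internal capitals
    text = re.sub(r"[^A-Za-z0-9 -]", " ", text)           # rule 6: strip other characters
    # Rule 6, continued: drop hyphens that are not between alphanumerics.
    text = re.sub(r"(?<![A-Za-z0-9])-|-(?![A-Za-z0-9])", " ", text)
    text = " ".join(text.split()).lower()                 # rule 8: lowercase
    if not text:
        return None
    words = re.split(r"[ -]", text)                       # rule 3: a hyphen separates words
    if len(words) > 2 and not (len(words) == 3 and words[1] in ("and", "or")):
        return None                                       # rule 2: word count limit
    if re.match(r"\d{3}", text):
        return None                                       # rule 4: no 3+ leading digits
    if any(len(word) < 2 for word in words):
        return None                                       # rule 5: minimum word length
    if any(word in BLOCKLIST for word in words):
        return None                                       # rule 7: blocked words
    return text

def filter_keywords(keywords):
    cleaned = (normalize_keyword(k) for k in keywords)
    return [k for k in cleaned if k is not None]
```

For example, `filter_keywords(["GrandCanyon", "bread with butter", "tall_man"])` keeps the first and last entries, in cleaned form, and discards the middle one.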

Metadata Updating

  • The tool updates the image metadata with the new keywords using ExifTool
  • Unless "Do not make backups" is specified, ExifTool will make a copy of the image and store it as filename.extension_original
  • The tool does not delete metadata from fields it does not write to, so existing data in those fields remains after processing. This may result in redundant metadata in separate fields. The tool's purpose is to add data to the appropriate fields; it is not a metadata management system, so use a different tool if you need to modify your metadata in other ways.
  • Keywords are placed in MWG:Keywords and captions are placed in MWG:Description using ExifTool. MWG is not an actual storage location: these are composite tags, and ExifTool determines the appropriate underlying locations based on several criteria. See the ExifTool documentation for details.
  • XMP:Identifier and XMP:Status are used to track the processing status of each file. Because the status is stored in the file itself, you can rename or move files and they will not be reprocessed when you run the tool again on their new location.
  • If the "pretend mode" option is set, no actual changes are made to the files
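
A write command along these lines would cover the behavior described above. It is a sketch, not the tool's code: the function and parameter names are hypothetical. ExifTool creates the `_original` backup by default, and `-overwrite_original` suppresses it.

```python
import subprocess

def build_write_command(path, keywords, caption, identifier, status,
                        make_backup=True):
    command = ["exiftool", "-use", "MWG"]
    # One assignment per keyword; together they replace the existing list.
    command += [f"-MWG:Keywords={keyword}" for keyword in keywords]
    command.append(f"-MWG:Description={caption}")
    command.append(f"-XMP:Identifier={identifier}")
    command.append(f"-XMP:Status={status}")
    if not make_backup:
        # Corresponds to the "Do not make backups" option.
        command.append("-overwrite_original")
    command.append(str(path))
    return command

def write_metadata(path, keywords, caption, identifier, status,
                   make_backup=True, pretend=False):
    command = build_write_command(
        path, keywords, caption, identifier, status, make_backup
    )
    if not pretend:
        # Pretend mode skips the only step that touches the file.
        subprocess.run(command, check=True)
    return command
```

Returning the command in pretend mode makes it easy to log what would have been written.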

Error Handling and Retries

  • If a file fails processing, it is retried once unless the "quick fail" option is selected
  • When processing fails it is usually because the model does not give back data that can be parsed as keywords, and a second shot will get a valid generation
  • If a file fails again, it's marked as "failed" in the database, and the failure is reported in the GUI
  • Failed files can be processed again along with any unprocessed files by checking the appropriate box
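
The retry policy amounts to a small loop. In this sketch, `process_once` is a hypothetical callable that returns parsed results, or `None` when the model's output cannot be parsed; the status strings are also illustrative.

```python
def process_with_retry(process_once, quick_fail=False):
    """Try once, then once more unless quick_fail is set."""
    attempts = 1 if quick_fail else 2
    for _ in range(attempts):
        result = process_once()
        if result is not None:
            return "success", result
    # Every allowed attempt failed: the file gets the "failed" status.
    return "failed", None
```

Because a second generation usually parses correctly, a single retry recovers most failures without slowing down a run much.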

Keyword Post-Processing

Post processing of keywords is an ongoing project. The current implementation is available at this repo.

GUI Feedback

The GUI provides real-time updates on processing status, including:

  • Number of files processed and remaining
  • Processing time per image and average processing time
  • Keywords and captions written to image
  • The captions and keywords shown are the metadata that will be in the image after processing. They may include previous keywords and captions if you chose to append or update them; if no new caption was written and one already existed, the old caption is shown.
  • Any errors or warnings encountered

Settings

Press the help button in Settings.

Notes

  • The current prompt has been extensively tested for effectiveness but any custom prompt can be entered into the prompt field if desired
  • The tool will write to MWG:Keywords, MWG:Description, XMP:Identifier and XMP:Status. For information about metadata tags, see the ExifTool documentation
  • This tool is meant to do one thing, and to do it well. Do not expect it to do anything but generate keywords and a caption for a directory of image files
  • That said, ideas for features or fixes will be considered
  • This tool is under active development, and the field of AI and machine learning is progressing at an incredibly fast pace. If you rely on the tool to act a specific way, keep that version and use it. Do not expect this tool to behave consistently across revisions. Test each update before using it on critical files
  • Always back up files before operating on them with this tool. The backups made by the tool should not be relied on

Troubleshooting

  • See this guide for help choosing models and projectors.
  • If you encounter issues with ExifTool, ensure it's properly installed and accessible in your system PATH.
  • Make sure KoboldCPP is running and the API URL in the GUI matches the KoboldCPP endpoint.
  • Check the output area in the GUI for error messages and warnings
  • If you see files being added to the queue and then removed without being processed, you are running the tool on files that have already been processed. To re-process them, check 'reprocess all files'
  • If the tool freezes or fails on a large number of files, check them for corruption or bad metadata. A file that opens in an image viewer can still contain bad data that prevents this tool from processing it
  • If processing takes a considerable amount of time, make sure your computer is fast enough. You should have a Mac with unified memory or a dedicated GPU with at least 8GB of VRAM to get decent speeds (up to 10 seconds per image). Without that, speeds will vary, and older machines may take more than a minute per image!
  • If the output window says it finished but it didn't do anything, make sure the folder you pointed it to exists.
  • Make sure you are always using the latest version of KoboldCPP. It gets updated very frequently (multiple times per month).
  • If the downloaded model is not to your liking, is too slow, or is too big, you can choose any model you like! Find a gguf and a matching projector, run koboldcpp.exe and load them, and then run llmii-no-kobold.bat
  • If you are using your own model and are getting strange results, make sure that the projector matches (Llama-3.1-8b must have a matching Llama-3.1-8b-mmproj with it, for example)
  • If you are using your own model and are still getting strange results, make sure that the name of the gguf matches at least partially the name of the base model it was trained on (llava-v1.6-34b.gguf should be named llava-vicuna-v1.6-34b.gguf for example). This is because the prompt templates are chosen automatically using the model's filename!
  • On macOS or Linux, if you get a "permission denied" error when running KoboldCPP, make sure you've made the binary executable with chmod +x.
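
The first two checks in this list can be automated with a short script. This is a sketch: the function names are hypothetical, and the `/api/v1/model` endpoint used for the reachability test is my understanding of the KoboldCPP API rather than something stated on this page.

```python
import shutil
import urllib.request

def exiftool_available(executable="exiftool"):
    """True if the executable can be found on the system PATH."""
    return shutil.which(executable) is not None

def api_reachable(api_url="http://localhost:5001", timeout=3):
    """True if the KoboldCPP API answers at api_url."""
    try:
        with urllib.request.urlopen(
            f"{api_url}/api/v1/model", timeout=timeout
        ) as response:
            return response.status == 200
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a malformed URL.
        return False

if __name__ == "__main__":
    print("ExifTool on PATH:", exiftool_available())
    print("KoboldCPP reachable:", api_reachable())
```

If either line prints `False`, fix that item before looking into model or projector problems.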