jabberjabberjabber edited this page Mar 4, 2025 · 8 revisions

How It Works

LLMImageIndexer processes images using a combination of local file system operations, image metadata handling, and AI-powered analysis. Here's a detailed breakdown of the process:

Image Discovery

  • The tool recursively scans the specified directory (unless the "no crawl" option is set) for supported image file formats.
  • Supported formats include JPEG, PNG, GIF, TIFF, BMP, WEBP, HEIF, and various RAW formats (ARW, CR2, DNG, NEF, ORF, PEF, RAF, RW2, SRW, etc.).
  • If supported files are found in a directory, they are added to the queue and the number of files added is displayed in the GUI.
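
A minimal sketch of the discovery step, assuming a simple extension match. The function name `find_images`, the `crawl` parameter, and the exact extension set are illustrative assumptions, not the tool's actual code:

```python
from pathlib import Path

# Assumed mapping of the formats listed above to file extensions (not exhaustive).
SUPPORTED_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff", ".bmp", ".webp",
    ".heif", ".heic", ".arw", ".cr2", ".dng", ".nef", ".orf", ".pef",
    ".raf", ".rw2", ".srw",
}

def find_images(root, crawl=True):
    """Return supported image files under root, recursing unless crawl is False."""
    root = Path(root)
    candidates = root.rglob("*") if crawl else root.glob("*")
    return sorted(
        p for p in candidates
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Calling `find_images(folder, crawl=False)` corresponds to the "no crawl" option: only files directly inside `folder` are considered.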

Metadata Extraction

  • ExifTool is used to verify and extract metadata from each image file.
  • Extracted fields include those containing descriptions or captions, and keywords.
  • Metadata is also checked for a unique file identifier and a status marker indicating previous processing by the indexing tool.
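
As a rough illustration, ExifTool can emit JSON (`-j`) with group-prefixed tag names (`-G`), which is easy to parse. The helper names and the specific caption and keyword tags checked here are assumptions for the sketch; the tool itself may read different fields:

```python
import json

# Assumed tag names as they appear in `exiftool -j -G` output.
KEYWORD_TAGS = ("IPTC:Keywords", "XMP:Subject")
CAPTION_TAGS = ("XMP:Description", "IPTC:Caption-Abstract", "EXIF:ImageDescription")

def read_command(path):
    """Build an ExifTool read command: JSON output with group names."""
    return ["exiftool", "-j", "-G", str(path)]

def _as_list(value):
    # ExifTool emits a single list item as a bare scalar, so normalize to a list.
    if value is None:
        return []
    return [str(v) for v in value] if isinstance(value, list) else [str(value)]

def parse_exiftool_json(output):
    """Extract identifier, status, caption and keywords from ExifTool JSON text."""
    records = json.loads(output)
    if not records:
        return None  # nothing readable was returned for this file
    tags = records[0]
    keywords = []
    for tag in KEYWORD_TAGS:
        for kw in _as_list(tags.get(tag)):
            if kw not in keywords:
                keywords.append(kw)
    caption = next((tags[t] for t in CAPTION_TAGS if tags.get(t)), None)
    return {
        "identifier": tags.get("XMP:Identifier"),
        "status": tags.get("XMP:Status"),
        "caption": caption,
        "keywords": keywords,
    }
```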

Database Management

As of the March 2025 update, this tool no longer uses a database to track file processing. File processing status is now contained inside the individual file metadata in order to more easily allow file movement and name changes. There is a new option which will infer the file status from the presence of existing metadata and update the files with the new status marker if needed.
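
The status logic described above can be sketched as a pure function over the extracted metadata. The dictionary keys, the function names, and the option names are hypothetical; only the 'success' and 'failed' status values come from this page:

```python
def infer_status(metadata):
    """Infer a status for files that carry an identifier and keywords but no marker."""
    if metadata.get("status"):
        return metadata["status"]
    if metadata.get("identifier") and metadata.get("keywords"):
        return "success"
    return None

def should_process(metadata, reprocess_all=False, reprocess_failed=False, infer=False):
    """Decide whether a file should be queued, based only on its own metadata."""
    status = infer_status(metadata) if infer else metadata.get("status")
    if reprocess_all:
        return True
    if status == "success":
        return False
    if status == "failed":
        return reprocess_failed
    return True  # never processed before
```

Because the decision depends only on data stored in the file, renaming or moving the file does not change the outcome.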

Image Preprocessing

  • Image preprocessing and encoding takes place in the KoboldAPI-Python library.
  • The stored images are not modified. This tool only writes to the image metadata; all processing occurs in memory.
  • Images are scaled to a maximum dimension of 560 pixels, at common patch size multiples, using bicubic resizing. For captioning and keywording, it is almost never beneficial to process images through a visual language model at a larger resolution.
  • The images are converted to JPEG at quality 95 using RGB and encoded as base64 strings.
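
To make the sizing rule concrete, here is a sketch of how a target size could be computed and how encoded bytes become a base64 string. The patch size of 28, the rounding strategy, and the function names are assumptions; the actual logic lives in the KoboldAPI-Python library and may differ. The bicubic resize and JPEG encoding themselves would be done by an imaging library and are not shown:

```python
import base64

def target_size(width, height, max_dim=560, patch=28):
    """Scale so the longest side is at most max_dim, snapping to patch multiples."""
    scale = min(1.0, max_dim / max(width, height))

    def snap(value):
        # Round the scaled dimension to the nearest multiple of the patch size,
        # keeping it between one patch and max_dim.
        snapped = round(value * scale / patch) * patch
        return max(patch, min(max_dim, snapped))

    return snap(width), snap(height)

def to_base64(jpeg_bytes):
    """Encode already-compressed JPEG bytes as an ASCII base64 string."""
    return base64.b64encode(jpeg_bytes).decode("ascii")
```

Under these assumptions a 4000x3000 photo maps to 560x420, which is 20 by 15 patches.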

LLM Querying

  • Communication with the LLM is handled by the KoboldAPI-Python library via the KoboldCPP API (default: http://localhost:5001).
  • It determines the instruction prompt template to use by asking for the running model and parsing out the name.
  • It sends a POST request to the /api/v1/generate endpoint with the base64-encoded image as a single item in a list along with an instruction prompt.
  • If a detailed caption is requested it sends two queries for each image, one for a caption and one for keywords
  • If a short caption is requested it sends one query for a caption and keywords
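
A hedged sketch of the request construction, using only the standard library. The `prompt` and `images` payload fields and the Bearer-style password header reflect my understanding of the KoboldCpp API and should be checked against its documentation; the template table and its names are purely hypothetical:

```python
import json
import urllib.request

# Hypothetical mapping from model-name fragments to prompt template names.
TEMPLATES = {"llama-3": "llama3", "vicuna": "vicuna", "mistral": "mistral"}

def pick_template(model_result, default="alpaca"):
    """Choose a template by matching fragments of the reported model name."""
    name = model_result.split("/")[-1].lower()
    for fragment, template in TEMPLATES.items():
        if fragment in name:
            return template
    return default

def build_generate_request(api_url, prompt, image_b64, password=None):
    """Build (but do not send) a POST request for /api/v1/generate."""
    payload = {"prompt": prompt, "images": [image_b64]}  # image as a one-item list
    headers = {"Content-Type": "application/json"}
    if password:
        headers["Authorization"] = f"Bearer {password}"  # assumed auth scheme
    return urllib.request.Request(
        api_url.rstrip("/") + "/api/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`. Note how a model named `llava-v1.6-34b` falls through to the default template here, while `llava-vicuna-v1.6-34b` matches, which is why the model filename matters.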

AI Response Processing

  • The AI's response is cleaned and parsed for valid caption and/or keywords
  • If a detailed caption has been requested it will dump the response directly into the caption field with no attempted parsing
  • The keyword response must contain a "Keywords" entry. If it does not, the response is rejected and the query is retried, unless the option to fail without retry is selected
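
The validation step might look like the following, assuming the model is asked to answer with a JSON object. The JSON format, the "Caption" key, and the function name are assumptions; only the required "Keywords" entry is taken from this page:

```python
import json
import re

def parse_keyword_response(text):
    """Return caption and keywords from a model reply, or None if it is unusable."""
    # Models often wrap JSON in extra chatter, so grab the outermost braces.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    keywords = data.get("Keywords")
    if not isinstance(keywords, list) or not keywords:
        return None  # no usable "Keywords" entry: reject the response
    return {"keywords": [str(k) for k in keywords], "caption": data.get("Caption")}
```

A `None` result is what would trigger the retry described above.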

Keyword Handling

  • If processing ultimately fails, the image is marked as failed in the metadata status and the tool moves on to the next file.
  • Files marked as failed can be reprocessed by running the processor again with the option to reprocess failed files. If this is done then previously successful files will be skipped, so you can run it on a directory without worrying about reprocessing every file again.
  • The parsed keywords are run through a filter which does the following for each keyword entry:
    1. Splits unhyphenated compound words on internal capitals (GrandCanyon becomes Grand Canyon)
    2. Ensures a total of 2 words or fewer unless middle word is 'and'/'or' ('bread and butter' is allowed but 'bread with butter' is discarded)
    3. Counts a hyphen between alphanumeric chars as two words ('close-up' is counted as if it were 'close up', not 'closeup')
    4. Ensures they do not start with 3 or more digits ('3D' is allowed; '2024' is discarded)
    5. Each word must be 2 or more chars ('3D videos' is allowed; '3 videos' is discarded)
    6. Removes all non-alphanumeric except spaces and valid hyphens ('tall_man' or 'tall.man' becomes 'tall man')
    7. Checks against words that models commonly use when confused, such as the words in the instructions
    8. Keywords are always converted to lowercase
  • An error may result if the AI attempts to use a character encoding not compatible with the processor, like an emoji
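
The eight rules above can be approximated with a few regular expressions. This is an independent sketch, not the tool's implementation: the banned-word list is a made-up placeholder, and the order in which the rules are applied is my own choice:

```python
import re

# Placeholder list; the real tool checks against its own set of words.
BANNED_WORDS = {"keyword", "keywords", "caption", "json"}

def normalize_keyword(raw, banned=BANNED_WORDS):
    """Return the cleaned keyword, or None if it should be discarded."""
    # Rule 1: split compound words on internal capitals (GrandCanyon -> Grand Canyon).
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", raw.strip())
    # Rule 8: lowercase.
    text = text.lower()
    # Rule 6: drop everything except alphanumerics, spaces and hyphens...
    text = re.sub(r"[^\w\s-]|_", " ", text)
    # ...and keep a hyphen only when it sits between two alphanumeric characters.
    text = re.sub(r"(?<![^\W_])-|-(?![^\W_])", " ", text)
    text = " ".join(text.split())
    if not text:
        return None
    # Rule 4: must not start with three or more digits.
    if re.match(r"\d{3,}", text):
        return None
    # Rule 3: a hyphen counts as a word boundary.
    words = re.split(r"[ -]", text)
    # Rule 5: every word needs at least two characters.
    if any(len(word) < 2 for word in words):
        return None
    # Rule 2: at most two words, or three with 'and'/'or' in the middle.
    if len(words) > 2 and not (len(words) == 3 and words[1] in ("and", "or")):
        return None
    # Rule 7: reject words the model tends to echo back from the instructions.
    if any(word in banned for word in words):
        return None
    return text

def filter_keywords(raw_keywords):
    """Apply normalize_keyword to each entry, dropping rejects and duplicates."""
    cleaned = []
    for raw in raw_keywords:
        keyword = normalize_keyword(raw)
        if keyword and keyword not in cleaned:
            cleaned.append(keyword)
    return cleaned
```

One side effect of this ordering is that stray symbols such as emoji are stripped by rule 6 before the other checks run.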

Metadata Updating

  • The tool updates the image metadata with the new keywords using ExifTool
  • Unless "Do not make backups" is specified, ExifTool will make a copy of the image and store it as filename.extension_original
  • The tool will not delete metadata from fields it does not write to. If data exists in fields which are not modified by the tool, the data will still be there after processing. This may result in redundant metadata in separate fields. This tool's purpose is to add data to the appropriate fields and it is not meant to be a metadata management system, so if you have a need to modify your metadata in other ways please use a different tool
  • Keywords are placed in MWG:Keywords and captions are placed in MWG:Description using ExifTool. MWG is not itself a field stored in the file; ExifTool determines the appropriate underlying locations depending on different criteria. Please see the ExifTool documentation for more details
  • XMP:Identifier and XMP:Status are used to track the processing status of the files. This allows you to rename or move the files without having to reprocess them if you want to run the tool again and the files are in the location you are processing
  • If the "pretend mode" option is set, no actual changes are made to the files
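
For illustration, a write command along these lines could be assembled as shown below. The function names and parameters are hypothetical, and whether the tool calls ExifTool this way is an assumption; `-use MWG`, the `+=` list operator, and `-overwrite_original` are standard ExifTool options:

```python
import subprocess

def build_write_args(path, keywords, caption, identifier, status,
                     append_keywords=False, make_backup=True):
    """Assemble an ExifTool command that writes keywords, caption and status."""
    args = ["exiftool", "-use", "MWG"]  # load the MWG composite tags
    operator = "+=" if append_keywords else "="  # += adds to an existing list
    args += [f"-MWG:Keywords{operator}{kw}" for kw in keywords]
    if caption is not None:
        args.append(f"-MWG:Description={caption}")
    args += [f"-XMP:Identifier={identifier}", f"-XMP:Status={status}"]
    if not make_backup:
        # Without this flag ExifTool keeps a copy named filename.extension_original.
        args.append("-overwrite_original")
    args.append(str(path))
    return args

def write_metadata(args, pretend=False):
    """Run the command, or do nothing at all in pretend mode."""
    if pretend:
        return None
    return subprocess.run(args, capture_output=True, text=True, check=False)
```

Passing the arguments as a list avoids shell quoting problems with captions that contain spaces or punctuation.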

Error Handling and Retries

  • If a file fails processing, it is retried once unless the "quick fail" option is selected
  • When processing fails it is usually because the model does not give back data that can be parsed as keywords, and a second shot will get a valid generation
  • If a file fails again, it's marked as "failed" in the file's metadata status, and the failure is reported in the GUI
  • Failed files can be processed again along with any unprocessed files by checking the appropriate box
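
The retry policy amounts to a short loop. In this sketch `generate` and `parse` are hypothetical callables standing in for the LLM query and the response parser, with `parse` returning `None` for an unusable response:

```python
def process_with_retry(generate, parse, quick_fail=False):
    """Try once, retry once on a parse failure unless quick_fail is set."""
    attempts = 1 if quick_fail else 2
    for _ in range(attempts):
        result = parse(generate())
        if result is not None:
            return "success", result
    return "failed", None
```

The returned status string is what would be written to the file's status marker.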

Keyword Post-Processing

Post processing of keywords is an ongoing project. The current implementation is available at this repo.

GUI Feedback

The GUI provides real-time updates on processing status, including:

  • Number of files processed and remaining
  • Processing time per image and average processing time
  • Keywords and captions written to image
  • The captions and keywords shown are the metadata that will be in the image. They may include previous keywords and captions if the options to append or update them were chosen; if no caption was written and one already existed, you will see the old caption
  • Any errors or warnings encountered

Settings

  • API URL: The address for the KoboldCpp API server
  • Password: Only needed if you set a password via KoboldCpp, used to access the API
  • System Instruction: This will be whatever the model is trained to use. Best not to mess with it unless you know what you are doing
  • Caption Instruction: Tells the model how to create a detailed caption. Set to whatever you like, but the default works fine
  • Generate detailed caption: Will use a generation to create a caption, and another generation to create keywords. You end up with a much more detailed caption at the expense of twice the compute time. Usually not worth it
  • Generate short caption: the default. Caption is generated along with keywords
  • No caption: Use this only if you don't want to overwrite an existing caption. It does not save any compute time
  • Don't crawl subdirectories: Will only look for images in the directory you specify, and will not go into any others inside it
  • Reprocess all files again: Regardless of previous processing status, reprocess all images. This is useful if you want to add more keywords with a second processing step by using it along with the "Add to existing keywords" option. Best results if a different model is used for each processing pass
  • Reprocess failed files: does what it says
  • If file has UUID, mark status: This will look for a UUID in the file which was set by the tool. If it finds one, it will see if there are keywords in the metadata and, if so, mark the file status as 'success'. This allows you to run it on files previously processed by an older version that used a database for marking status, without having to reprocess every file again. Once the file has the status set, it will be just like any other file processed by the new version of the tool
  • No file checking: This will skip the file verification step. Only use this if you are having a problem with valid files being skipped. It may cause the indexer to freeze if files with errors are encountered
  • Pretend mode / Dry run: Lets you see what output you would get from the LLM without actually writing to any files
  • Quick fail: If any kind of error occurs parsing the data from the LLM, don't bother retrying it and mark the file failed and move on. Use this if you are in a hurry
  • Add new keywords to existing keywords: Will append the generated keywords to any existing keywords. If this isn't checked and there are keywords in the field that exiftool writes the new keywords to, they will be overwritten
  • Add new caption to existing caption with : If a caption is generated and a caption already exists in the field exiftool writes the caption to, it will wrap the generated caption with and and append it to the end of the existing one
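
The options above could be gathered into a single configuration object, sketched here as a dataclass. All field names are invented for illustration; the default URL and the short-caption default come from this page, and the remaining checkbox defaults are assumed to be off:

```python
from dataclasses import dataclass

CAPTION_MODES = ("detailed", "short", "none")

@dataclass
class IndexerSettings:
    api_url: str = "http://localhost:5001"
    password: str = ""
    caption_mode: str = "short"  # "Generate short caption" is the default
    no_crawl: bool = False
    reprocess_all: bool = False
    reprocess_failed: bool = False
    mark_status_from_uuid: bool = False
    skip_file_check: bool = False
    dry_run: bool = False
    quick_fail: bool = False
    append_keywords: bool = False
    append_caption: bool = False

    def __post_init__(self):
        if self.caption_mode not in CAPTION_MODES:
            raise ValueError(f"caption_mode must be one of {CAPTION_MODES}")

    def queries_per_image(self):
        # A detailed caption costs a second generation; "none" saves nothing.
        return 2 if self.caption_mode == "detailed" else 1
```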

Notes

  • The current prompt has been extensively tested for effectiveness but any custom prompt can be entered into the prompt field if desired
  • Custom prompts are not saved between sessions
  • The tool will write to MWG:Keywords, MWG:Description, XMP:Identifier and XMP:Status. For information about metadata tags, see the ExifTool documentation
  • This tool is meant to do one thing, and to do it well. Do not expect it to do anything but generate keywords and a caption for a directory of image files
  • That said, ideas for features or fixes will be considered
  • This tool is under active development and the field of AI and machine learning is progressing at an incredibly fast pace. If you rely on the tool to act a specific way, keep that version and use it. Do not expect this tool to behave consistently over the course of revisions. Test each update before use on critical files
  • Always back up files before operating on them with this tool. The backups made by the tool should not be relied on

Troubleshooting

  • See this guide for help choosing models and projectors.
  • If you encounter issues with ExifTool, ensure it's properly installed and accessible in your system PATH.
  • Make sure KoboldCPP is running and the API URL in the GUI matches the KoboldCPP endpoint.
  • Check the output area in the GUI for error messages and warnings
  • If you see files being added to the queue and then being removed without being processed, you are running it on already processed files. To re-process them, check 'reprocess all files'
  • If the tool freezes or fails on a large number of files, check them for corruption or bad metadata. Just because they will open in an image viewer does not mean they do not have bad data in them which prevents operation by this tool
  • If it takes a considerable amount of time, make sure your computer is fast enough. You should have a Mac with unified memory or a dedicated GPU in your system with at least 8GB of VRAM to get decent speeds (up to 10 seconds per image). Without that, your speeds will vary but may take more than a minute per image on older machines!
  • If the output window says it finished but it didn't do anything, make sure the folder you pointed it to exists.
  • Make sure you are always using the latest version of KoboldCPP. It gets updated very frequently (multiple times per month).
  • If the downloaded model is not to your liking, is too slow, or is too big, you can choose any model you like! Find a gguf and matching projector, run koboldcpp.exe and load them, and then run llmii-no-kobold.bat
  • If you are using your own model and are getting strange results, make sure that the projector matches (Llama-3.1-8b must have a matching Llama-3.1-8b-mmproj with it, for example)
  • If you are using your own model and are still getting strange results, make sure that the name of the gguf matches at least partially the name of the base model it was trained on (llava-v1.6-34b.gguf should be named llava-vicuna-v1.6-34b.gguf for example). This is because the prompt templates are chosen automatically using the model's filename!
  • On macOS or Linux, if you get a "permission denied" error when running KoboldCPP, make sure you've made the binary executable with chmod +x.
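
Two of the checks above, ExifTool being on the PATH and the target folder existing, are easy to script before starting a run. This is a standalone sketch with hypothetical names, not part of the tool:

```python
import os
import shutil

def preflight(image_dir, exiftool_name="exiftool"):
    """Return a list of human-readable problems found before processing."""
    problems = []
    if shutil.which(exiftool_name) is None:
        problems.append(f"{exiftool_name} was not found on the system PATH")
    if not os.path.isdir(image_dir):
        problems.append(f"folder does not exist: {image_dir}")
    return problems
```

An empty list means both checks passed; it does not confirm that KoboldCPP is running or that the API URL is correct.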