Image Classification Thresholds and Logic #2072

virtualarchitectures · 2025-08-13T15:48:00Z


Hi, I'm trying to use the image classification logic in Python. I first tested using the command line as per the documentation:

docling --enrich-picture-classes FILE

Doing this, I can see that Docling adds the identified class before the base64-encoded image in the Markdown output. Using the example code for Python ( https://docling-project.github.io/docling/usage/enrichments/#picture-classification ), I can access the annotations and see the scores for each class using something like the following:

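from docling_core.types.doc import PictureItem

# docling_doc is the converted document, e.g. result.document from the converter
for element, _level in docling_doc.iterate_items():
    if isinstance(element, PictureItem):
        print(element.annotations)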

However, I'm not sure what thresholds or logic Docling uses to select the preferred class from the scores provided. I looked at the classifier code on GitHub, but I couldn't find the logic for selecting the preferred category: https://github.com/docling-project/docling/blob/main/docling/models/document_picture_classifier.py

I did search the wider codebase but didn't spot what I was looking for. Could you please point me to the part of the code where selection of the preferred class is implemented so I can replicate this in my Python script?
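
To make the question concrete, here is a minimal sketch of the kind of selection I would like to replicate. It assumes the preferred class is simply the one with the highest score, optionally subject to a cutoff. That is my guess, not something I have confirmed in the Docling source. `ScoredClass`, `pick_top_class`, and `min_confidence` are hypothetical names for illustration, and the `class_name` / `confidence` fields are assumed to mirror the entries shown in the printed annotations.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ScoredClass:
    """Hypothetical stand-in for one scored class in a picture annotation."""
    class_name: str
    confidence: float


def pick_top_class(
    predicted_classes: Sequence[ScoredClass],
    min_confidence: float = 0.0,
) -> Optional[ScoredClass]:
    """Return the highest-confidence class, or None if nothing reaches the cutoff.

    Assumption: "preferred class" means arg-max over confidence. The cutoff
    is an illustrative parameter, not a documented Docling setting.
    """
    if not predicted_classes:
        return None
    best = max(predicted_classes, key=lambda c: c.confidence)
    return best if best.confidence >= min_confidence else None


# Made-up scores, only to show the call shape.
scores = [
    ScoredClass("bar_chart", 0.91),
    ScoredClass("line_chart", 0.06),
    ScoredClass("logo", 0.03),
]
best = pick_top_class(scores)
print(best.class_name if best else "no class")
```

If Docling applies a different rule (for example a fixed threshold or a tie-breaking order), that is the part I would need to swap into `pick_top_class`.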

Replies: 0 comments

Select a reply

Uh oh!

Image Classification Thresholds and Logic #2072

Uh oh!

virtualarchitectures Aug 13, 2025

Replies: 0 comments

virtualarchitectures
Aug 13, 2025