[BUG CLIENT]: Batch Annotation OCR Doesn't Work #246

@Arminkhayati

Description

Python -VV

Python 3.10.16 | packaged by Anaconda, Inc. | (main, Dec 11 2024, 16:19:12) [MSC v.1929 64 bit (AMD64)]

Pip Freeze

aiofiles==24.1.0
aiohappyeyeballs==2.6.1
aiohttp==3.11.16       
aiosignal==1.3.2       
aiosqlite==0.21.0
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
anyio==4.9.0
argon2-cffi @ file:///opt/conda/conda-bld/argon2-cffi_1645000214183/work
argon2-cffi-bindings @ file:///C:/b/abs_f11axiliot/croot/argon2-cffi-bindings_1736182463870/work
asgiref==3.8.1
asttokens @ file:///C:/b/abs_9662ywy9fp/croot/asttokens_1743630464377/work
async-lru @ file:///C:/b/abs_e0hjkvwwb5/croot/async-lru_1699554572212/work
async-timeout==4.0.3
atomic-agents==1.1.3
attrs @ file:///C:/b/abs_89hmquz5ga/croot/attrs_1734533130810/work
babel @ file:///C:/b/abs_ffzt1bmjth/croot/babel_1737454394148/work
backoff==2.2.1
banks==2.1.2
bcrypt==4.3.0
beautifulsoup4 @ file:///C:/b/abs_d5wytg_p0w/croot/beautifulsoup4-split_1718029833749/work
bleach @ file:///C:/b/abs_925i9psm3u/croot/bleach_1732292896852/work
Brotli @ file:///C:/b/abs_c415aux9ra/croot/brotli-split_1736182803933/work
build==1.2.2.post1
cachetools==5.5.2
certifi @ file:///C:/b/abs_8a944p1_gn/croot/certifi_1738623753421/work/certifi
cffi @ file:///C:/b/abs_29_b57if3f/croot/cffi_1736184144340/work
chardet==3.0.4
charset-normalizer @ file:///croot/charset-normalizer_1721748349566/work
chroma-hnswlib==0.7.6
chromadb==0.6.3
click==8.1.8
cobble==0.1.4
colorama @ file:///C:/b/abs_a9ozq0l032/croot/colorama_1672387194846/work
coloredlogs==15.0.1
comm @ file:///C:/b/abs_67a8058udb/croot/comm_1709322909844/work
contourpy==1.3.2
cryptography==44.0.2
cycler==0.12.1
dataclasses-json==0.6.7
debugpy @ file:///C:/b/abs_bf9oo2vhxp/croot/debugpy_1736269476451/work
decorator @ file:///opt/conda/conda-bld/decorator_1643638310831/work
defusedxml @ file:///tmp/build/80754af9/defusedxml_1615228127516/work
Deprecated==1.2.18
dill==0.4.0
dirtyjson==1.0.8
distro==1.9.0
docling==2.31.0
docling-core==2.30.0
docling-ibm-models==3.4.3
docling-parse==4.0.1
docstring_parser==0.16
durationpy==0.9
easyocr==1.7.2
effdet==0.4.1
emoji==2.14.1
et_xmlfile==2.0.0
eval_type_backport==0.2.2
exceptiongroup @ file:///C:/b/abs_c5h1o1_b5b/croot/exceptiongroup_1706031441653/work
executing @ file:///opt/conda/conda-bld/executing_1646925071911/work
fastapi==0.115.9
fastjsonschema @ file:///C:/b/abs_4ev90296ly/croot/python-fastjsonschema_1731939386416/work
filelock==3.13.1
filetype==1.2.0
fire==0.7.0
flatbuffers==25.2.10
fonttools==4.57.0
frozenlist==1.5.0
fsspec==2024.6.1
gitdb==4.0.12
GitPython==3.1.44
google-api-core==2.24.2
google-auth==2.39.0
google-cloud-vision==3.10.1
googleapis-common-protos==1.70.0
greenlet==3.1.1
griffe==1.7.3
grpcio==1.72.0rc1
grpcio-status==1.72.0rc1
h11==0.16.0
h2==3.2.0
hpack==3.0.0
hstspreload==2025.1.1
html5lib==1.1
httpcore==1.0.9
httptools==0.6.4
httpx==0.28.1
httpx-sse==0.4.0
huggingface-hub==0.30.2
humanfriendly==10.0
hyperframe==5.2.0
idna==2.10
imageio==2.37.0
importlib_metadata==8.6.1
importlib_resources==6.5.2
iniconfig==2.1.0
instructor==1.8.3
ipykernel @ file:///C:/b/abs_6c9ggygp01/croot/ipykernel_1737660720620/work
ipython @ file:///C:/b/abs_8eyhzleyrk/croot/ipython_1734548134403/work
ipywidgets @ file:///C:/b/abs_f03en2fv37/croot/ipywidgets_1733504604628/work
jedi @ file:///C:/b/abs_3a2kbnlclc/croot/jedi_1733987412687/work
Jinja2==3.1.4
jiter==0.8.2
joblib==1.4.2
json5 @ file:///C:/b/abs_743lprxrv5/croot/json5_1730786818336/work
jsonlines==3.1.0
jsonpatch==1.33
jsonpointer==3.0.0
jsonref==1.1.0
jsonschema @ file:///C:/b/abs_394_t6__xq/croot/jsonschema_1728486718320/work
jsonschema-specifications @ file:///C:/b/abs_0brvm6vryw/croot/jsonschema-specifications_1699032417323/work
jupyter @ file:///C:/b/abs_32aohix0i1/croot/jupyter_1737645821205/work
jupyter-console @ file:///C:/b/abs_82xaa6i2y4/croot/jupyter_console_1680000189372/work
jupyter-events @ file:///C:/b/abs_9cm3qlticu/croot/jupyter_events_1741184612840/work
jupyter-lsp @ file:///C:/b/abs_ecle3em9d4/croot/jupyter-lsp-meta_1699978291372/work
jupyter_client @ file:///C:/b/abs_149bw133if/croot/jupyter_client_1737570986926/work
jupyter_core @ file:///C:/b/abs_beftpbuevw/croot/jupyter_core_1718818307097/work
jupyter_server @ file:///C:/b/abs_dd442s1uya/croot/jupyter_server_1741206396661/work
jupyter_server_terminals @ file:///C:/b/abs_adjrm9dtns/croot/jupyter_server_terminals_1744706714294/work
jupyterlab @ file:///C:/b/abs_adgv7wgabm/croot/jupyterlab_1737555444636/work
jupyterlab_pygments @ file:///C:/b/abs_d5alfet8m6/croot/jupyterlab_pygments_1741124274578/work
jupyterlab_server @ file:///C:/b/abs_fdi5r_tpjc/croot/jupyterlab_server_1725865372811/work
jupyterlab_widgets @ file:///C:/b/abs_70q5kcqvoa/croot/jupyterlab_widgets_1733441048419/work
kiwisolver==1.4.8
kubernetes==32.0.1
langchain==0.3.23
langchain-chroma==0.2.2
langchain-community==0.3.21
langchain-core==0.3.59
langchain-huggingface==0.1.2
langchain-ollama==0.3.2
langchain-openai==0.3.16
langchain-text-splitters==0.3.8
langdetect==1.0.9
langsmith==0.3.31
latex2mathml==3.78.0
lazy_loader==0.4
linkify-it-py==2.0.3
llama-cloud==0.1.19
llama-cloud-services==0.6.22
llama-index==0.12.35
llama-index-agent-openai==0.4.7
llama-index-cli==0.4.1
llama-index-core==0.12.35
llama-index-embeddings-openai==0.3.1
llama-index-indices-managed-llama-cloud==0.6.11
llama-index-llms-ollama==0.5.4
llama-index-llms-openai==0.3.38
llama-index-multi-modal-llms-openai==0.4.3
llama-index-program-openai==0.3.1
llama-index-question-gen-openai==0.3.0
llama-index-readers-file==0.4.7
llama-index-readers-llama-parse==0.4.0
llama-parse==0.6.22
lxml==5.3.2
mammoth==1.9.0
markdown-it-py==3.0.0
marko==2.1.3
MarkupSafe==2.1.5
marshmallow==3.26.1
matplotlib==3.10.1
matplotlib-inline @ file:///C:/ci/matplotlib-inline_1661934094726/work
mcp==1.9.3
mdit-py-plugins==0.4.2
mdurl==0.1.2
mistralai==1.9.1
mistune @ file:///C:/b/abs_77yql3poyz/croot/mistune_1741124004410/work
mmh3==5.1.0
monotonic==1.6
mpire==2.10.2
mpmath==1.3.0
multidict==6.4.3
multiprocess==0.70.18
mypy-extensions==1.0.0
natsort==8.4.0
nbclient @ file:///C:/b/abs_a09c4t3h8x/croot/nbclient_1741124030330/work
nbconvert @ file:///C:/b/abs_c27_60dzt8/croot/nbconvert-meta_1741191385337/work
nbformat @ file:///C:/b/abs_c2jkw46etm/croot/nbformat_1728050303821/work
nest-asyncio @ file:///C:/b/abs_65d6lblmoi/croot/nest-asyncio_1708532721305/work
networkx==3.3
ninja==1.11.1.4
nltk==3.9.1
notebook @ file:///C:/b/abs_b552cuftgu/croot/notebook_1738159967315/work
notebook_shim @ file:///C:/b/abs_9ctyfgpncn/croot/notebook-shim_1741707829491/work
numpy==1.26.4
oauthlib==3.2.2
olefile==0.47
ollama==0.4.8
omegaconf==2.3.0
onnx==1.17.0
onnxruntime==1.21.0
openai==1.77.0
opencv-python==4.11.0.86
opencv-python-headless==4.11.0.86
openparse==0.7.0
openpyxl==3.1.5
opentelemetry-api==1.32.0
opentelemetry-exporter-otlp-proto-common==1.32.0
opentelemetry-exporter-otlp-proto-grpc==1.32.0
opentelemetry-instrumentation==0.53b0
opentelemetry-instrumentation-asgi==0.53b0
opentelemetry-instrumentation-fastapi==0.53b0
opentelemetry-proto==1.32.0
opentelemetry-sdk==1.32.0
opentelemetry-semantic-conventions==0.53b0
opentelemetry-util-http==0.53b0
orjson==3.10.16
overrides @ file:///C:/b/abs_cfh89c8yf4/croot/overrides_1699371165349/work
packaging @ file:///C:/b/abs_3by6s2fa66/croot/packaging_1734472138782/work
pandas==2.2.3
pandocfilters @ file:///opt/conda/conda-bld/pandocfilters_1643405455980/work
parso @ file:///C:/b/abs_834b4mj92b/croot/parso_1733963322289/work
pdf2docx==0.5.8
pdf2image==1.17.0
pdfminer.six==20250327
pdfplumber==0.11.6
pi_heif==0.22.0
pikepdf==9.7.0
pillow==11.0.0
platformdirs @ file:///C:/b/abs_ddh15014or/croot/platformdirs_1744273060660/work
pluggy==1.5.0
posthog==3.24.1
prometheus_client @ file:///C:/b/abs_8b175q_ub8/croot/prometheus_client_1744271638821/work
prompt-toolkit @ file:///C:/b/abs_68uwr58ed1/croot/prompt-toolkit_1704404394082/work
propcache==0.3.1
proto-plus==1.26.1
protobuf==6.31.0rc2
psutil @ file:///C:/b/abs_b5gv3mn55h/croot/psutil_1736371546320/work
pure-eval @ file:///opt/conda/conda-bld/pure_eval_1646925070566/work
pyasn1==0.6.1
pyasn1_modules==0.4.2
pyclipper==1.3.0.post6
pycocotools==2.0.8
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pycryptodome==3.22.0
pydantic==2.11.3
pydantic-settings==2.8.1
pydantic_core==2.33.1
pyfiglet==1.0.3
Pygments @ file:///C:/b/abs_e4bg5vh5j_/croot/pygments_1744667628203/work
pylatexenc==2.10
PyMuPDF==1.25.5
pyparsing==3.2.3
pypdf==5.4.0
PyPDF2==3.0.1
pypdfium2==4.30.1
PyPika==0.48.9
pyproject_hooks==1.2.0
PyQt6==6.7.1
PyQt6_sip @ file:///C:/b/abs_f47pmp6xmn/croot/pyqt-split_1744804513582/work/pyqt_sip
pyreadline3==3.5.4
PySocks @ file:///C:/ci_310/pysocks_1642089375450/work
pytesseract==0.3.13
pytest==8.4.1
python-bidi==0.6.6
python-dateutil @ file:///C:/b/abs_3au_koqnbs/croot/python-dateutil_1716495777160/work
python-docx==1.1.2
python-dotenv==1.1.0
python-iso639==2025.2.18
python-json-logger @ file:///C:/b/abs_0cm_mnox0z/croot/python-json-logger_1734370042436/work
python-magic==0.4.27
python-multipart==0.0.20
python-oxmsg==0.0.2
python-pptx==1.0.2
pytz==2025.2
pywin32==308
pywinpty @ file:///C:/b/abs_883wh7sts8/croot/pywinpty_1741871674963/work
PyYAML @ file:///C:/b/abs_14xkfs39bx/croot/pyyaml_1728657968772/work
pyzmq @ file:///C:/b/abs_f3yte6j5yn/croot/pyzmq_1734711069724/work
qtconsole @ file:///C:/b/abs_077eiqc5gw/croot/qtconsole_1744633984381/work
QtPy @ file:///C:/b/abs_derqu__3p8/croot/qtpy_1700144907661/work
RapidFuzz==3.13.0
referencing @ file:///C:/b/abs_09f4hj6adf/croot/referencing_1699012097448/work
regex==2024.11.6
requests @ file:///C:/b/abs_c3508vg8ez/croot/requests_1731000584867/work
requests-oauthlib==2.0.0
requests-toolbelt==1.0.0
rfc3339-validator @ file:///C:/b/abs_ddfmseb_vm/croot/rfc3339-validator_1683077054906/work
rfc3986==1.5.0
rfc3986-validator @ file:///C:/b/abs_6e9azihr8o/croot/rfc3986-validator_1683059049737/work
rich==13.9.4
rpds-py @ file:///C:/b/abs_0c6z5kcdb6/croot/rpds-py_1736545465023/work
rsa==4.9
rtree==1.4.0
safetensors==0.5.3
scikit-image==0.25.2
scikit-learn==1.6.1
scipy==1.15.2
semchunk==2.2.2
Send2Trash @ file:///C:/b/abs_7e73ol18dl/croot/send2trash_1736542724140/work
sentence-transformers==4.0.2
shapely==2.1.0
shellingham==1.5.4
sip @ file:///C:/b/abs_5cto136kse/croot/sip_1738856220313/work
six @ file:///C:/b/abs_149wuyuo1o/croot/six_1744271521515/work
smmap==5.0.2
sniffio @ file:///C:/b/abs_3akdewudo_/croot/sniffio_1705431337396/work
soupsieve @ file:///C:/b/abs_bbsvy9t4pl/croot/soupsieve_1696347611357/work
SQLAlchemy==2.0.40
sse-starlette==2.3.6
stack-data @ file:///opt/conda/conda-bld/stack_data_1646927590127/work
starlette==0.45.3
striprtf==0.0.26
sympy==1.13.1
tabulate==0.9.0
tenacity==9.1.2
termcolor==3.1.0
terminado @ file:///C:/b/abs_25nakickad/croot/terminado_1671751845491/work
textual==0.89.1
threadpoolctl==3.6.0
tifffile==2025.3.30
tiktoken==0.9.0
timm==1.0.15
tinycss2 @ file:///C:/b/abs_df38owi5ma/croot/tinycss2_1738337725183/work
tokenizers==0.21.1
tomli @ file:///C:/Windows/TEMP/abs_ac109f85-a7b3-4b4d-bcfd-52622eceddf0hy332ojo/croots/recipe/tomli_1657175513137/work
torch==2.6.0+cu118
torchaudio==2.6.0+cu118
torchvision==0.21.0+cu118
tornado @ file:///C:/b/abs_7cyu943ybx/croot/tornado_1733960510898/work
tqdm==4.67.1
traitlets @ file:///C:/b/abs_bfsnoxl4pq/croot/traitlets_1718227069245/work
transformers==4.51.3
typer==0.15.2
typing-inspect==0.9.0
typing-inspection==0.4.0
typing_extensions @ file:///C:/b/abs_0ffjxtihug/croot/typing_extensions_1734714875646/work
tzdata==2025.2
uc-micro-py==1.0.3
unstructured==0.17.2
unstructured-client==0.32.3
unstructured-inference==0.8.10
unstructured.pytesseract==0.3.15
urllib3 @ file:///C:/b/abs_7bst06lizn/croot/urllib3_1737133657081/work
uvicorn==0.34.1
watchfiles==1.0.5
wcwidth @ file:///Users/ktietz/demo/mc3/conda-bld/wcwidth_1629357192024/work
webencodings==0.5.1
websocket-client @ file:///C:/b/abs_5dmnxxoci9/croot/websocket-client_1715878351319/work
websockets==15.0.1
widgetsnbextension @ file:///C:/b/abs_df0o5rkt3b/croot/widgetsnbextension_1733439593530/work
win-inet-pton @ file:///C:/ci_310/win_inet_pton_1642658466512/work
wrapt==1.17.2
XlsxWriter==3.2.3
yarl==1.19.0
zipp==3.21.0
zstandard==0.23.0

Reproduction Steps

Hi,
I am trying to use batch annotation OCR.
Below is the function I use to create an entry for an image_url and write it to a .jsonl file.
I then upload this file and create a batch job.
After that, I wait for the SUCCESS status and download the result. This approach works perfectly for plain OCR, but I also need annotations for my image OCR results. The documentation has no instructions on how to define document_annotation_format for batch OCR on images.

from typing import Any, Dict

def create_batch_entry(image_url: str, index: int) -> Dict[str, Any]:

    entry = {
        "custom_id": str(index),
        "body": {
            "document": {
                "type": "image_url",
                "image_url": image_url
            },
            "include_image_base64": False,
            "document_annotation_format": response_format_from_pydantic_model(MistralDocument)
        }
    }
    return entry

A method from a class in my code:

    def create_batch_file(self) -> str:
        """
        Load and encode images to base64 and save them in a .jsonl file.

        :return: Path to the .jsonl file containing encoded image data.
        raises:
         ValueError: If the save directory does not exist or if no images are found.
         ValueError: No images found in the specified directory.
        """
        _ = self.pdf_to_images()
        self.jsonl_file_path = os.path.join(self.save_dir, f"{self.file_name}.jsonl")
        if not os.path.isdir(self.save_dir):
            raise ValueError(f"The directory {self.save_dir} does not exist.")
        # list all images in the directory
        images_list = sorted(Path(self.save_dir).glob("*.jpg"))
        if not images_list:
            raise ValueError("No images found in the specified directory.")

        # if the jsonl file already exists, return it
        if os.path.exists(self.jsonl_file_path):
            return self.jsonl_file_path

        with open(self.jsonl_file_path, "w", encoding="utf-8") as jsonl_file:
            for index, image_file in enumerate(images_list):
                with Image.open(image_file) as img:
                    encoded_data = encode_image_data(img)
                    image_url = f"data:image/jpeg;base64,{encoded_data}"
                    entry = create_batch_entry(image_url, index)

                    jsonl_file.write(json.dumps(entry) + "\n")

        return self.jsonl_file_path
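
The encode_image_data helper called above is not shown in this issue. Purely as an illustration, a helper of this kind could look like the sketch below. It assumes the image is available as raw JPEG bytes rather than a PIL image, and encode_image_bytes and to_jpeg_data_url are hypothetical names that are not taken from the code above or from the SDK.

```python
import base64


def encode_image_bytes(raw: bytes) -> str:
    """Hypothetical helper: base64-encode raw image bytes as ASCII text."""
    return base64.b64encode(raw).decode("ascii")


def to_jpeg_data_url(raw: bytes) -> str:
    """Wrap the base64 text in the data-URL form used for image_url above."""
    return f"data:image/jpeg;base64,{encode_image_bytes(raw)}"


# The first four bytes of a typical JPEG file, used here as dummy input.
sample = b"\xff\xd8\xff\xe0"
data_url = to_jpeg_data_url(sample)
```

The resulting string is plain text, so it can be placed directly in the "image_url" field of a batch entry and passed through json.dumps.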

The response_format_from_pydantic_model function returns a ResponseFormat object, which results in:

TypeError: Object of type ResponseFormat is not JSON serializable

in the method above when I call json.dumps on the output of create_batch_entry.
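
The TypeError can be reproduced without the SDK, because json.dumps rejects any object it has no encoder for. The sketch below uses FakeResponseFormat, a hypothetical dataclass that only stands in for the real ResponseFormat class, to show the failure and the effect of converting the object to plain dicts first.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class FakeResponseFormat:
    """Hypothetical stand-in for the SDK's ResponseFormat; not the real class."""
    type: str = "json_schema"
    json_schema: Dict[str, Any] = field(default_factory=dict)


fmt = FakeResponseFormat(json_schema={"name": "MistralDocument", "strict": True})

# Passing the object itself triggers the same class of error as in the report.
try:
    json.dumps({"document_annotation_format": fmt})
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Converting to plain dicts first makes the entry serializable.
serialized = json.dumps({"document_annotation_format": asdict(fmt)})
```

For a pydantic v2 model, model_dump(mode="json") plays the role that asdict plays here. This only addresses the serialization step; whether the API then accepts the serialized body is a separate question.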

Even calling the .model_dump() method on the ResponseFormat object in create_batch_entry doesn't solve the problem. It results in an HTTP error response from the API call:

SDKError: API error occurred: Status 422
{"detail": "Invalid file format.", "description": "Found 3 errors in this file. You can view supported formats here: https://docs.mistral.ai/capabilities/batch.", "errors": [{"message": "36 validation errors for BatchMessages\nbody.ChatCompletionRequest.messages\n  Field required [type=missing, input_value={'document': {'type': 'im... None, 'agent_id': None}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.10/v/missing\nbody.ChatCompletionRequest.document\n  Extra inputs are not permitted [type=extra_forbidden, input_value={'type': 'image_url', 'im...PDCFQAAAAASUVORK5CYII='}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.10/v/extra_forbidden\nbody.ChatCompletionRequest.include_image_base64\n  Extra inputs are not permitted [type=extra_forbidden, input_value=False, input_type=bool]\n    For further information visit https://errors.pydantic.dev/2.10/v/extra_forbidden\nbody.ChatCompletionRequest.document_annotation_format\n  Extra inputs are not permitted [type=extra_forbidden, input_value={'type': 'json_schema', '...: None, 'strict': True}}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.10/v/extra_forbidden\nbody.ClassificationRequest.input\n  Field required [type=missing, input_value={'document': {'type': 'im... None, 'agent_id': None}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.10/v/missing\nbody.ClassificationRequest.document\n  ...
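
The validation message is easier to read once its entries are grouped by the request schema they refer to. Below is a minimal sketch, assuming the layout seen in the response above: location lines that start with "body." followed by indented detail lines. failing_fields is a hypothetical helper, and the sample string is an excerpt of the message with the URL lines left out.

```python
from typing import Dict, List

# Excerpt of the "message" field from the 422 response (URL lines omitted).
message = (
    "36 validation errors for BatchMessages\n"
    "body.ChatCompletionRequest.messages\n"
    "  Field required [type=missing, input_value={'document': {'type': 'im... None, 'agent_id': None}, input_type=dict]\n"
    "body.ChatCompletionRequest.document\n"
    "  Extra inputs are not permitted [type=extra_forbidden, input_value={'type': 'image_url', 'im...PDCFQAAAAASUVORK5CYII='}, input_type=dict]\n"
    "body.ClassificationRequest.input\n"
    "  Field required [type=missing, input_value={'document': {'type': 'im... None, 'agent_id': None}, input_type=dict]\n"
)


def failing_fields(text: str) -> Dict[str, List[str]]:
    """Group the failing field paths by the schema named after 'body.'."""
    grouped: Dict[str, List[str]] = {}
    for line in text.splitlines():
        # Only location lines start with "body."; detail lines are indented.
        if not line.startswith("body."):
            continue
        _, schema, *path = line.split(".")
        grouped.setdefault(schema, []).append(".".join(path))
    return grouped


fields = failing_fields(message)
```

In this excerpt, the body is checked against ChatCompletionRequest and ClassificationRequest. The rest of the 36 errors are truncated in the response, so the full set of schemas tried is not visible here.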

My question is: how can I run annotated OCR in batch mode using MistralOCR?
I need the MistralDocument model to be generated for each image in my batch.

Expected Behavior

I expected this approach to return a response for each image in the batch, containing the markdown text and the annotation I want, the same as single-image annotated OCR.

Additional Context

No response

Suggested Solutions

No response
