
Conversation

@gabe-l-hart
Collaborator

Description

This PR adds GGUF conversion support for https://huggingface.co/ibm-granite/granite-docling-258M. Once converted, the model runs to completion; however, the results are quite bad (see Outstanding Questions below).

Partially addresses #16110

Testing

# Convert language model
python convert_hf_to_gguf.py ~/models/ibm-granite/granite-docling-258M/

# Convert MMProj
python convert_hf_to_gguf.py ~/models/ibm-granite/granite-docling-258M/ --mmproj

# Run a sample
./bin/llama-mtmd-cli -m ~/models/ibm-granite/granite-docling-258M/granite-docling-258M-F16.gguf --image ~/Pictures/sample-doc.png --mmproj ~/models/ibm-granite/mmproj-granite-docling-258M -p "<__media__>Convert this page to markdown." --verbose -ngl 99

Outstanding Questions

With this PR, the model does convert and all of the math runs correctly, but in comparing with the results from transformers (implemented here), the output is significantly worse, to the point of being unusable. Digging through, it seems to largely boil down to the implementation of clip_image_preprocess compared to Idefics3ImageProcessor.preprocess. In the transformers implementation, do_resize and do_image_splitting default to True, resulting in a grid of sub-images with appropriate image boundary tokens post-tokenization (the input to the LLM). The corresponding output from mtmd_tokenize simply pads the image to square and resizes it to the configured image_size, resulting in a single image in the input token sequence.
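
For reference, here's a small sketch of the difference using the transformers API only (not mtmd); the exact patch count depends on the input image, so the shapes in the comments are just indicative:

from transformers import AutoProcessor
from transformers.image_utils import load_image

processor = AutoProcessor.from_pretrained("ibm-granite/granite-docling-258M")
image = load_image("sample-doc.png")

# With splitting enabled (the transformers default), pixel_values holds a grid of
# sub-images plus one global image; with it disabled there is only the single
# resized image, which is roughly what mtmd_tokenize produces today.
split = processor.image_processor.preprocess([image], do_image_splitting=True, return_tensors="pt")
single = processor.image_processor.preprocess([image], do_image_splitting=False, return_tensors="pt")
print(split["pixel_values"].shape)   # (1, N + 1, 3, H, W) for N sub-images
print(single["pixel_values"].shape)  # (1, 1, 3, H, W)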

This likely relates to some of the follow-on discussion in #13050 (starting with #13050 (comment)) since GraniteDocling is very similar to SmolVLM. I'll dig further on that issue, but my current thinking is that the best course of action will be to merge this as-is, then add a follow-on PR that supports patch-based preprocessing for idefics3.

@gabe-l-hart
Collaborator Author

cc @ryan-mangeno since I know you were looking at this

also cc @ngxson since you're the source-of-truth for all-things mtmd

@github-actions github-actions bot added the python python script changes label Sep 19, 2025
@gabe-l-hart gabe-l-hart requested review from CISC and ngxson September 19, 2025 17:16
@CISC
Collaborator

CISC commented Sep 19, 2025

You cannot use the refact pre-tokenizer: Granite Docling sets clean_up_tokenization_spaces to false, while refact has it set to true (the default), and refact also has individual_digits enabled.

So, you should use the gpt2 regex and set clean_spaces = false.
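
A quick way to sanity-check that setting from the HF config (just an illustrative snippet, not part of the fix):

from transformers import AutoTokenizer

# clean_up_tokenization_spaces is read from tokenizer_config.json; for
# granite-docling it should come back False, unlike refact's True
tok = AutoTokenizer.from_pretrained("ibm-granite/granite-docling-258M")
print(tok.clean_up_tokenization_spaces)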

@ryan-mangeno
Contributor

I've tried some testing on it using the F16 model, and I have a question: I hit this assert

Assertion failed: (!isnan(sumf) && !isinf(sumf)), function ggml_vec_dot_f16, file vec.cpp, line 328.

when I set -ngl lower than 26; with anything higher or equal, I get seemingly endless responses and buffer allocation fails. Should we be trying to get it to work with -ngl 1?

@gabe-l-hart
Collaborator Author

Good catch @CISC. I'll make that fix.

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>
@gabe-l-hart gabe-l-hart force-pushed the gabe-l-hart/GraniteDocling branch from cbc4bc2 to 64e10f5 Compare September 19, 2025 19:19
@gabe-l-hart
Collaborator Author

Ok, I've made the pre-tokenizer fix using trillion, which maps to the gpt-2 pre-tokenizer here and sets clean_spaces = false here.

Unfortunately, the generation results are still just as bad.

@CISC
Collaborator

CISC commented Sep 19, 2025

Ok, I've made the pre-tokenizer fix using trillion, which maps to the gpt-2 pre-tokenizer here and sets clean_spaces = false here.

It's better if you give it a new name (like granite-docling), move it from pre_computed_hashes to models, and just add it here:

tokenizer_pre == "trillion") {
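
For the convert-script side, a rough sketch of what that could look like in convert_hf_to_gguf_update.py (the TOKENIZER_TYPE stand-in and the exact entry shape are assumptions modeled on the existing entries, not a verified diff):

from enum import IntEnum, auto

# Stand-in for the TOKENIZER_TYPE enum defined in convert_hf_to_gguf_update.py
class TOKENIZER_TYPE(IntEnum):
    SPM = auto()
    BPE = auto()
    WPM = auto()
    UGM = auto()

# Hypothetical new entry in the `models` list, giving the tokenizer its own
# pre-tokenizer name instead of reusing "trillion"
models = [
    {"name": "granite-docling", "tokt": TOKENIZER_TYPE.BPE,
     "repo": "https://huggingface.co/ibm-granite/granite-docling-258M"},
]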

@gabe-l-hart
Collaborator Author

I've now confirmed with some pretty janky hacking that with proper preprocessing the results look much better. Here's what I did:

Extract patches from transformers preprocessing

from transformers import AutoProcessor
from transformers.image_utils import load_image
import torch
from transformers.image_transforms import to_pil_image

model_path = "/Users/ghart/models/ibm-granite/granite-docling-258M"
processor = AutoProcessor.from_pretrained(model_path)
image = load_image("sample.png")

# pixel_values comes back as (batch, num_patches, channels, height, width)
res = processor.image_processor.preprocess([image], return_row_col_info=True, return_tensors="pt")
px = res["pixel_values"]
n_imgs = px.shape[1]
for i in range(n_imgs):
    # Pull out one (C, H, W) patch, clamp the normalized values at zero so
    # to_pil_image doesn't choke on negatives, and save each patch as its own PNG
    patch_img = px[0, i, :, :].reshape(px.shape[2], px.shape[3], px.shape[4])
    pil_patch_img = to_pil_image(torch.max(patch_img, torch.zeros_like(patch_img)))
    pil_patch_img.save(f"patch_{i}.png")

Manually create prompt for patches

There are 13 patches: 12 sub-images arranged in 3 rows of 4 columns, plus a single global image resized to 512 x 512 (a helper for building this prompt is sketched after the example).

"<row_1_col_1><__media__><row_1_col_2><__media__><row_1_col_3><__media__><row_1_col_4><__media__>
<row_2_col_1><__media__><row_2_col_2><__media__><row_2_col_3><__media__><row_2_col_4><__media__>
<row_3_col_1><__media__><row_3_col_2><__media__><row_3_col_3><__media__><row_3_col_4><__media__>

<global-img><__media__>Convert this page to markdown.<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>"
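
For anyone reproducing this, here's a small helper (hypothetical, not part of the PR) that builds the same prompt from the row/column counts that return_row_col_info=True reports:

def build_patch_prompt(n_rows: int, n_cols: int, question: str) -> str:
    # One <row_R_col_C><__media__> pair per patch, row by row, then the global image
    rows = []
    for r in range(1, n_rows + 1):
        rows.append("".join(f"<row_{r}_col_{c}><__media__>" for c in range(1, n_cols + 1)))
    grid = "\n".join(rows)
    return (
        f"{grid}\n\n<global-img><__media__>{question}<|end_of_text|>\n"
        "<|start_of_role|>assistant<|end_of_role|>"
    )

print(build_patch_prompt(3, 4, "Convert this page to markdown."))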

Call llama-mtmd-cli

./bin/llama-mtmd-cli -m ~/models/ibm-granite/granite-docling-258M/granite-docling-258M-F16.gguf --mmproj ~/models/ibm-granite/mmproj-granite-docling-258M -p "<row_1_col_1><__media__><row_1_col_2><__media__><row_1_col_3><__media__><row_1_col_4><__media__>
<row_2_col_1><__media__><row_2_col_2><__media__><row_2_col_3><__media__><row_2_col_4><__media__>
<row_3_col_1><__media__><row_3_col_2><__media__><row_3_col_3><__media__><row_3_col_4><__media__>

<global-img><__media__>Convert this page to markdown." --verbose -ngl 99 --image /Users/ghart/models/ibm-granite/granite-docling-258M/patch_0.png --image /Users/ghart/models/ibm-granite/granite-docling-258M/patch_1.png --image /Users/ghart/models/ibm-granite/granite-docling-258M/patch_10.png --image /Users/ghart/models/ibm-granite/granite-docling-258M/patch_11.png --image /Users/ghart/models/ibm-granite/granite-docling-258M/patch_12.png --image /Users/ghart/models/ibm-granite/granite-docling-258M/patch_2.png --image /Users/ghart/models/ibm-granite/granite-docling-258M/patch_3.png --image /Users/ghart/models/ibm-granite/granite-docling-258M/patch_4.png --image /Users/ghart/models/ibm-granite/granite-docling-258M/patch_5.png --image /Users/ghart/models/ibm-granite/granite-docling-258M/patch_6.png --image /Users/ghart/models/ibm-granite/granite-docling-258M/patch_7.png --image /Users/ghart/models/ibm-granite/granite-docling-258M/patch_8.png --image /Users/ghart/models/ibm-granite/granite-docling-258M/patch_9.png

@gabe-l-hart
Collaborator Author

It's better if you give it a new name (like granite-docling), move it from pre_computed_hashes to models, and just add it here:

That's totally fair. I was trying to see if I could get away without changing the C++ side to avoid needing to bump versions in downstream projects (ollama et al), but given the performance without changes to the preprocessing, I think that's a somewhat moot point.

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>
Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>
@CISC
Collaborator

CISC commented Sep 19, 2025

You will hopefully try to improve it in this PR?

@gabe-l-hart
Collaborator Author

Yeah, I think it probably makes sense to do it in one PR, since the results currently are garbage and it might require some additional GGUF KVs to indicate patching.

@CISC
Collaborator

CISC commented Sep 19, 2025

Yeah, I think it probably makes sense to do it in one PR, since the results currently are garbage and it might require some additional GGUF KVs to indicate patching.

Great, just re-request review when you're ready. :)

@gabe-l-hart gabe-l-hart marked this pull request as draft September 19, 2025 21:03
@vonjackustc

It's better if you give it a new name (like granite-docling), move it from pre_computed_hashes to models, and just add it here:

That's totally fair. I was trying to see if I could get away without changing the C++ side to avoid needing to bump versions in downstream projects (ollama et al), but given the performance without changes to the preprocessing, I think that's a somewhat moot point.

Can you change the Jinja template to support patches?

@gabe-l-hart
Copy link
Collaborator Author

Since this was based on a branch in the repo before the refactor of contributor roles, I can't actually update this branch. I've opened a new PR instead with the overhaul to the preprocessing included: #16206

@CISC would you mind killing this branch in the core repo since I can't?

@CISC CISC deleted the gabe-l-hart/GraniteDocling branch September 23, 2025 20:58