
model : support vision LiquidAI LFM2-VL family #15347


Merged: 7 commits into ggml-org:master from tarek/lfm2vl on Aug 16, 2025

Conversation

@tdakhran (Contributor) commented on Aug 15, 2025:

This PR is based on ngxson#28. Huge thanks to @ngxson for the bootstrap!

Add support for LFM2-VL vision models from LiquidAI.
Checkpoints are available on HF.

LFM2-VL is a dynamic-image-resolution model, and this PR implements support for dynamic resolution.
It uses SigLIP2 NaFlex, which interpolates the positional embeddings to match the input patch grid; this is implemented in the resize_position_embeddings function.
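
For intuition, here is a minimal PyTorch sketch of that kind of positional-embedding interpolation. The real implementation is C++/ggml inside the graph build; the shapes and the bilinear mode below are assumptions for illustration, not the exact code:

```python
import torch
import torch.nn.functional as F

def resize_position_embeddings(pos_embed: torch.Tensor,
                               tgt_h: int, tgt_w: int) -> torch.Tensor:
    """Resample a learned [S*S, D] position-embedding table to a
    tgt_h x tgt_w patch grid, returning [tgt_h*tgt_w, D]."""
    s = int(pos_embed.shape[0] ** 0.5)  # source table is an s x s grid
    d = pos_embed.shape[1]
    grid = pos_embed.reshape(1, s, s, d).permute(0, 3, 1, 2)  # [1, D, s, s]
    grid = F.interpolate(grid, size=(tgt_h, tgt_w),
                         mode="bilinear", align_corners=False)
    return grid.permute(0, 2, 3, 1).reshape(tgt_h * tgt_w, d)
```

The point is that a single fixed-size learned table can serve whatever patch grid the dynamic-resolution preprocessor produces.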
The preprocessor calculates the optimal image size (smart resize), then resizes and pads the input image.
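
As a rough sketch of what such a smart-resize step computes (the patch size and pixel budgets below are illustrative placeholders, not LFM2-VL's actual constants):

```python
import math

def smart_resize(h: int, w: int, patch: int = 16,
                 min_pixels: int = 256 * 256,
                 max_pixels: int = 512 * 512) -> tuple[int, int]:
    """Pick a target size that roughly preserves the aspect ratio, snaps
    both sides to a multiple of the patch size, and stays within a pixel
    budget."""
    nh = max(patch, round(h / patch) * patch)
    nw = max(patch, round(w / patch) * patch)
    if nh * nw > max_pixels:    # too many pixels: scale down, floor to patch
        scale = math.sqrt(h * w / max_pixels)
        nh = max(patch, math.floor(h / scale / patch) * patch)
        nw = max(patch, math.floor(w / scale / patch) * patch)
    elif nh * nw < min_pixels:  # too few pixels: scale up, ceil to patch
        scale = math.sqrt(min_pixels / (h * w))
        nh = math.ceil(h * scale / patch) * patch
        nw = math.ceil(w * scale / patch) * patch
    return nh, nw
```

The input image is then resized to the computed size and padded so it tiles evenly into patches.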

Tested with all combinations of the following parameters:

  • backends: CPU, CUDA
  • image resolutions: 256x256, 277x512, 512x277, 512x384, 512x512
  • quantization (backbone/mmproj): F32/F32, Q4_0/Q8_0
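
(For reference, runs like the one below come from the multimodal CLI, something along the lines of `llama-mtmd-cli -m LFM2-VL-450M-Q4_0.gguf --mmproj <mmproj>.gguf --image <image> -p "describe image in one sentence"`; the exact file names here are placeholders, not the paths used in testing.)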

Sample output for the image below, with the prompt `describe image in one sentence`:

[image: lena]

main: loading model: /data/playground/vlm2/LFM2-VL-450M/LFM2-VL-450M-Q4_0.gguf
encoding image slice...
image slice encoded in 117 ms
decoding image batch 1/1, n_tokens_batch = 64
image decoded (batch 1/1) in 49 ms

The image features a woman wearing a stylish hat adorned with blue feathers, set against a warm, orange-toned background.


llama_perf_context_print:        load time =     198.18 ms
llama_perf_context_print: prompt eval time =     202.74 ms /    78 tokens (    2.60 ms per token,   384.73 tokens per second)
llama_perf_context_print:        eval time =     163.87 ms /    27 runs   (    6.07 ms per token,   164.76 tokens per second)
llama_perf_context_print:       total time =     440.29 ms /   105 tokens

Test log with timings.

The github-actions bot added the examples and python (python script changes) labels on Aug 15, 2025.
@ngxson (Collaborator) left a comment:


Very cool! Thanks for taking the time to test it.

I'll deploy a GGUF on ggml-org for testing purposes.

We can merge after you resolve @CISC's comment.

@tdakhran requested a review from @CISC on August 16, 2025 at 17:34.
@CISC (Collaborator) left a comment:


I just learned the hard way that CUDA IM2COL doesn't support BF16. :)

@CISC added the hot (Something that is hot) label on Aug 16, 2025.
@CISC merged commit 65349f2 into ggml-org:master on Aug 16, 2025; 51 checks passed.
@tdakhran deleted the tarek/lfm2vl branch on August 16, 2025 at 21:59.