
model : support vision LiquidAI LFM2-VL family #15347


Merged: 7 commits into ggml-org:master from tarek/lfm2vl on Aug 16, 2025

Conversation

@tdakhran (Contributor) commented on Aug 15, 2025:

This PR is based on ngxson#28. Huge thanks to @ngxson for the bootstrap!

Add support for LFM2-VL vision models from LiquidAI.
Checkpoints are available on HF.

LFM2-VL is a dynamic-image-resolution model, and this PR implements support for dynamic resolution.
It uses SigLIP2 NaFlex, which interpolates the positional embeddings to match the input patch grid; this is implemented in the resize_position_embeddings function.
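
For intuition, here is a minimal PyTorch sketch of that kind of positional-embedding interpolation. The real implementation is C++/ggml inside the graph build; the shapes and the bilinear mode below are assumptions for illustration, not the exact code:

```python
import torch
import torch.nn.functional as F

def resize_position_embeddings(pos_embed: torch.Tensor,
                               tgt_h: int, tgt_w: int) -> torch.Tensor:
    """Resample a learned [S*S, D] position-embedding table to a
    tgt_h x tgt_w patch grid, returning [tgt_h*tgt_w, D]."""
    s = int(pos_embed.shape[0] ** 0.5)  # source table is an s x s grid
    d = pos_embed.shape[1]
    grid = pos_embed.reshape(1, s, s, d).permute(0, 3, 1, 2)  # [1, D, s, s]
    grid = F.interpolate(grid, size=(tgt_h, tgt_w),
                         mode="bilinear", align_corners=False)
    return grid.permute(0, 2, 3, 1).reshape(tgt_h * tgt_w, d)
```

The point is that a single fixed-size learned table can serve whatever patch grid the dynamic-resolution preprocessor produces.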
The preprocessor calculates the optimal image size (smart resize), then resizes and pads the input image.
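
As a rough sketch of what such a smart-resize step computes (the patch size and pixel budgets below are illustrative placeholders, not LFM2-VL's actual constants):

```python
import math

def smart_resize(h: int, w: int, patch: int = 16,
                 min_pixels: int = 256 * 256,
                 max_pixels: int = 512 * 512) -> tuple[int, int]:
    """Pick a target size that roughly preserves the aspect ratio, snaps
    both sides to a multiple of the patch size, and stays within a pixel
    budget."""
    nh = max(patch, round(h / patch) * patch)
    nw = max(patch, round(w / patch) * patch)
    if nh * nw > max_pixels:    # too many pixels: scale down, floor to patch
        scale = math.sqrt(h * w / max_pixels)
        nh = max(patch, math.floor(h / scale / patch) * patch)
        nw = max(patch, math.floor(w / scale / patch) * patch)
    elif nh * nw < min_pixels:  # too few pixels: scale up, ceil to patch
        scale = math.sqrt(min_pixels / (h * w))
        nh = math.ceil(h * scale / patch) * patch
        nw = math.ceil(w * scale / patch) * patch
    return nh, nw
```

The input image is then resized to the computed size and padded so it tiles evenly into patches.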

Tested with all combinations of the following parameters:

  • backends: CPU, CUDA
  • image resolutions: 256x256, 277x512, 512x277, 512x384, 512x512
  • quantization (backbone/mmproj): F32/F32, Q4_0/Q8_0
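
(For reference, runs like the one below come from the multimodal CLI, something along the lines of `llama-mtmd-cli -m LFM2-VL-450M-Q4_0.gguf --mmproj <mmproj>.gguf --image <image> -p "describe image in one sentence"`; the exact file names here are placeholders, not the paths used in testing.)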

Sample output for the image below, with the prompt `describe image in one sentence`:

[image: lena]

main: loading model: /data/playground/vlm2/LFM2-VL-450M/LFM2-VL-450M-Q4_0.gguf
encoding image slice...
image slice encoded in 117 ms
decoding image batch 1/1, n_tokens_batch = 64
image decoded (batch 1/1) in 49 ms

The image features a woman wearing a stylish hat adorned with blue feathers, set against a warm, orange-toned background.


llama_perf_context_print:        load time =     198.18 ms
llama_perf_context_print: prompt eval time =     202.74 ms /    78 tokens (    2.60 ms per token,   384.73 tokens per second)
llama_perf_context_print:        eval time =     163.87 ms /    27 runs   (    6.07 ms per token,   164.76 tokens per second)
llama_perf_context_print:       total time =     440.29 ms /   105 tokens

Test log with timings.

The github-actions bot added the examples and python (python script changes) labels on Aug 15, 2025.
@ngxson (Collaborator) left a comment:


Very cool! Thanks for taking the time to test it.

I'll deploy a GGUF on ggml-org for testing purposes.

We can merge after you resolve @CISC's comment.

@tdakhran requested a review from @CISC on August 16, 2025 at 17:34.
@CISC (Collaborator) left a comment:


I just learned the hard way that CUDA IM2COL doesn't support BF16. :)

@CISC added the hot (Something that is hot) label on Aug 16, 2025.
@CISC merged commit 65349f2 into ggml-org:master on Aug 16, 2025; 51 checks passed.
@tdakhran deleted the tarek/lfm2vl branch on August 16, 2025 at 21:59.