model : support vision LiquidAI LFM2-VL family #15347
Conversation
Very cool! Thanks for taking the time to test it.
I'll deploy a GGUF on ggml-org for testing purposes.
We can merge after you resolve @CISC's comment.
Co-authored-by: Sigbjørn Skjæret <[email protected]>
I just learned the hard way that CUDA IM2COL doesn't support BF16. :)
This PR is based on ngxson#28. Huge thanks to @ngxson for the bootstrap!
Add support for LFM2-VL vision models from LiquidAI.
Checkpoints are available on HF
LFM2-VL is a dynamic-image-resolution model, and support for dynamic resolution is implemented. It uses SigLIP2 NaFlex, which interpolates the positional embeddings; this is implemented in the `resize_position_embeddings` function. The preprocessor calculates the optimal image size (smart resize), then resizes and pads the input image.
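As a rough illustration of what such positional-embedding resizing amounts to, here is a minimal pure-Python sketch of bilinear interpolation over a grid of embedding vectors. The function name, shapes, and plain-list representation are my own assumptions for clarity, not the actual implementation in this PR:

```python
def resize_pos_embed(grid, new_h, new_w):
    """Bilinearly interpolate a 2D grid of embedding vectors.

    grid: list of rows, each row a list of embedding vectors (lists of floats).
    Returns a new_h x new_w grid of interpolated vectors.
    """
    old_h, old_w = len(grid), len(grid[0])
    dim = len(grid[0][0])
    out = []
    for i in range(new_h):
        # Map the target row index back into source-grid coordinates.
        y = i * (old_h - 1) / max(new_h - 1, 1)
        y0 = int(y)
        y1 = min(y0 + 1, old_h - 1)
        fy = y - y0
        row = []
        for j in range(new_w):
            x = j * (old_w - 1) / max(new_w - 1, 1)
            x0 = int(x)
            x1 = min(x0 + 1, old_w - 1)
            fx = x - x0
            # Blend the four surrounding embeddings per dimension.
            vec = [
                (grid[y0][x0][d] * (1 - fx) + grid[y0][x1][d] * fx) * (1 - fy)
                + (grid[y1][x0][d] * (1 - fx) + grid[y1][x1][d] * fx) * fy
                for d in range(dim)
            ]
            row.append(vec)
        out.append(row)
    return out
```

This is the standard trick for adapting a fixed-size learned positional-embedding grid to an arbitrary patch grid at inference time.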
Tested with all combinations of the following parameters:
- Backend: CPU, CUDA
- Image size: 256x256, 277x512, 512x277, 512x384, 512x512
- Precision: F32/F32, Q4_0/Q8_0
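The matrix above is a full cross product; a small sketch just to show the combination count being covered (labels are mine):

```python
from itertools import product

backends = ["CPU", "CUDA"]
sizes = ["256x256", "277x512", "512x277", "512x384", "512x512"]
precisions = ["F32/F32", "Q4_0/Q8_0"]

# Every backend x size x precision combination tested in this PR.
combos = list(product(backends, sizes, precisions))
print(len(combos))  # 2 * 5 * 2 = 20 runs
```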
Sample output for the image below with prompt `describe image in one sentence`:
test log with timings.
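For reference, a run like the above can be reproduced with llama.cpp's multimodal CLI. The flags follow current `llama-mtmd-cli` usage; the GGUF file names below are hypothetical placeholders:

```shell
MODEL="LFM2-VL-1.6B-Q4_0.gguf"       # hypothetical model filename
MMPROJ="mmproj-LFM2-VL-1.6B.gguf"    # hypothetical vision-projector filename
CMD="llama-mtmd-cli -m $MODEL --mmproj $MMPROJ --image test.png -p \"describe image in one sentence\""
if command -v llama-mtmd-cli >/dev/null 2>&1; then
    eval "$CMD"
else
    # Binary not built/installed yet: just show the command.
    echo "dry run: $CMD"
fi
```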