Labels: enhancement (New feature or request)
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Other model implementations support passing raw embeddings via llama_batch::embd as an alternative to llama_batch::token. This is not the case for Gemma 3n: passing embeddings triggers a GGML_ABORT with an unimplemented TODO: support embd input. This should be fixed for feature parity.
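For reference, a rough sketch of the caller-side usage this request is about, assuming the current llama_batch C API (llama_batch_init with a non-zero embd size, then llama_decode). The model path is a placeholder and some entry-point names may differ between llama.cpp versions:

```cpp
// Sketch only: feed externally computed embeddings via llama_batch::embd.
// For most models this decodes normally; for Gemma 3n it currently hits the
// GGML_ABORT ("TODO: support embd input") mentioned above.
#include "llama.h"

#include <algorithm>
#include <vector>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("gemma-3n.gguf", mparams); // placeholder path

    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_init_from_model(model, cparams);

    const int n_embd   = llama_model_n_embd(model); // model embedding width
    const int n_tokens = 4;                         // number of embedding vectors to feed

    // A non-zero second argument makes llama_batch_init allocate batch.embd
    // (n_tokens * n_embd floats) instead of batch.token.
    llama_batch batch = llama_batch_init(n_tokens, n_embd, 1);
    batch.n_tokens = n_tokens;

    // Embeddings produced outside llama.cpp (reference tokenizer + embedding
    // lookup, external image/audio encoder, ...); zeros are just a stand-in.
    std::vector<float> ext_embd(n_tokens * n_embd, 0.0f);

    for (int i = 0; i < n_tokens; ++i) {
        std::copy(ext_embd.begin() +  i      * n_embd,
                  ext_embd.begin() + (i + 1) * n_embd,
                  batch.embd + i * n_embd);
        batch.pos[i]       = i;
        batch.n_seq_id[i]  = 1;
        batch.seq_id[i][0] = 0;
        batch.logits[i]    = (i == n_tokens - 1); // only request logits for the last position
    }

    // With Gemma 3n this currently aborts instead of decoding.
    if (llama_decode(ctx, batch) != 0) {
        // handle the error
    }

    llama_batch_free(batch);
    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```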
Motivation
All (or most?) other models support inputting embeddings directly. Being able to pass precomputed embeddings yields advantages such as
- using a reference tokenizer instead of llama.cpp's
- using an external tool to generate image/audio embeddings and feeding those embeddings to llama.cpp
- flexibility and consistency with other model implementations
Possible Implementation
See https://github.com/huggingface/transformers/blob/7aa888b7fa477d13153ffbfe107dfbd6c696014a/src/transformers/models/gemma3n/modular_gemma3n.py#L2053, https://github.com/huggingface/transformers/blob/7aa888b7fa477d13153ffbfe107dfbd6c696014a/src/transformers/models/gemma3n/modular_gemma3n.py#L1984 and related code