
Conversation

@corebonts
Contributor

@corebonts corebonts commented Mar 15, 2025

(Partially?) resolves #711

I still want to test it further, but I thought an initial review would be great.

The commit is based on the llama.cpp implementation.
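For context, the error behind #711 ("unknown model architecture: 'gemma3'") means the loader has no entry for the gemma3 architecture string. A minimal sketch of the kind of wiring involved, assuming llama.cpp-style names (the exact enum and table names in the vendored sources may differ):

#include <map>

// Sketch only: register the new architecture so GGUF files tagged "gemma3"
// are recognized instead of failing with "unknown model architecture".
enum llm_arch {
    // ... existing architectures ...
    LLM_ARCH_GEMMA,
    LLM_ARCH_GEMMA2,
    LLM_ARCH_GEMMA3,   // new entry for Gemma 3
};

static const std::map<llm_arch, const char *> LLM_ARCH_NAMES = {
    // ...
    { LLM_ARCH_GEMMA3, "gemma3" },   // architecture string read from GGUF metadata
};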

TODO:

  • Test 1B
  • Test 4B
  • Test 12B
  • Test 27B
  • Test image input (I don't think it's working)

@corebonts corebonts changed the title Draft: Initial support for Gemma 3 models Initial support for Gemma 3 models Mar 15, 2025
@corebonts
Contributor Author

@cjpais What is needed for image support? Also, what is the easiest way to debug or test it? Or should we just go with the text only for now?

@corebonts
Contributor Author

corebonts commented Mar 16, 2025

I think I should build on the granite support feature branch because that also introduces the attention scale param.
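(For reference, the attention-scale parameter mentioned here replaces the usual 1/sqrt(head_dim) factor applied to the QK^T logits. A rough sketch of how such an hparam is typically read and applied, where the key name LLM_KV_ATTENTION_SCALE and the field f_attention_scale are assumptions about the vendored llama.cpp code:)

// Sketch only: read an optional attention-scale override from GGUF metadata
// and fall back to the conventional scaled-dot-product factor otherwise.
float f_attention_scale = 0.0f;
ml.get_key(LLM_KV_ATTENTION_SCALE, f_attention_scale, /*required=*/false);

const float kq_scale = f_attention_scale != 0.0f
    ? f_attention_scale                      // model-specific override
    : 1.0f/sqrtf((float) n_embd_head);       // default 1/sqrt(head_dim)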

@cjpais
Collaborator

cjpais commented Mar 16, 2025

Sweet, thank you so much. I will take a look at it tomorrow/Tuesday. I suspect the granite branch will be merged in a day or two, so we will probably wait for that before this comes in. If it's easier to rebase on it now, great; if it's easier later, we can do that.

@cjpais
Collaborator

cjpais commented Mar 17, 2025

Largely this looks good to me.

I will probably do another once-over tomorrow (mostly verifying build_gemma3 and the hparams in llm_load_hparams against llama.cpp).

I've tested it on the 4 sizes and it works.

@corebonts if you don't mind rebasing it, that would be great; otherwise I can resolve it tomorrow afternoon.

@corebonts
Contributor Author

On it

Tested only on text-to-text.
@cjpais
Collaborator

cjpais commented Mar 18, 2025

Thanks @corebonts, I added the n_swa_pattern hparam which was added in llama.cpp #12373 for slightly easier readability between the two repos. It looks like this will be missing some RoPE scaling from upstream as far as I can tell, but this is probably another indication of needing a sync with upstream.
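(For Gemma 3 the reported layout is five sliding-window layers followed by one global-attention layer, so n_swa_pattern would be 6. A minimal sketch of how a per-layer check might look; the helper name and exact indexing here are assumptions, not the merged code:)

// Sketch only: with n_swa_pattern == 6, layers 0..4 use the sliding-window
// (local) mask and layer 5 uses full attention, then the pattern repeats.
static bool is_sliding_window_layer(uint32_t il, uint32_t n_swa_pattern) {
    if (n_swa_pattern == 0) {
        return false;                        // pattern disabled: all layers global
    }
    return (il + 1) % n_swa_pattern != 0;    // every n-th layer is global
}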

@cjpais cjpais merged commit f0d65b6 into mozilla-ai:main Mar 18, 2025
1 check passed
@corebonts
Contributor Author

@cjpais Could you tell me why n_swa is no longer needed? I also see it in the original llama.cpp code, and according to other Gemma 3 implementations it should be 1024 (or 512 for the 1B model), like here: https://github.com/google/gemma_pytorch/blob/014acb7ac4563a5f77c76d7ff98f31b568c16508/gemma/config.py#L230

And it's also mentioned in the technical report:
https://www.reddit.com/r/LocalLLaMA/comments/1j9drfk/gemma_3_technical_report/

@cjpais
Collaborator

cjpais commented Mar 18, 2025

@corebonts it's there! It's set by the call below, from what I can tell. I tested it, and it matches the code currently on the main branch of llama.cpp:

ml.get_key(LLM_KV_ATTENTION_SLIDING_WINDOW, hparams.n_swa);
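(As a rough illustration of what the loaded n_swa value does: with a window of 1024, a query position can only attend to the most recent 1024 key positions. A minimal sketch of the mask condition, not the actual masking code:)

// Sketch only: key position j is visible to query position i when it is
// causal (j <= i) and falls inside the sliding window of size n_swa.
static bool swa_can_attend(int64_t i, int64_t j, uint32_t n_swa) {
    return j <= i && (i - j) < (int64_t) n_swa;
}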

@jart
Collaborator

jart commented Mar 23, 2025

A+++



Development

Successfully merging this pull request may close these issues.

Bug: error loading model: error loading model architecture: unknown model architecture: 'gemma3'
