Replies: 1 comment
-
@v-dicicco, sorry for the vagueness. It does work with the non-linearized one. The issue was with how the modeling code was written: it doesn't play well with bitsandbytes (LoRA/QLoRA), which leads to smaller VRAM savings. However, if you're doing FFT, that's not a concern for you! One note: we haven't tested Llama 4 for a while, so we're not sure whether something has broken in the meantime. If you want to give it a try, maybe use the Docker images that were built at the time: https://hub.docker.com/r/axolotlai/axolotl-cloud/tags?name=20250511
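In case it helps, here's a rough sketch of what a full-fine-tune config pointed at the non-linearized checkpoint could look like. The dataset, batch sizes, and DeepSpeed config path below are placeholders (not from this thread), the keys are just the standard Axolotl config fields, and Llama 4-specific options may have changed since, so treat it as a starting point rather than a tested config:

```yaml
# Sketch of an FFT config against the non-linearized HF checkpoint.
# Placeholder values; adjust dataset, batch sizes, and parallelism to your setup.
base_model: meta-llama/Llama-4-Scout-17B-16E-Instruct

# No adapter and no bitsandbytes quantization: plain full fine-tuning,
# which sidesteps the LoRA/QLoRA + bitsandbytes issue mentioned above.
load_in_8bit: false
load_in_4bit: false

datasets:
  - path: tatsu-lab/alpaca   # placeholder dataset
    type: alpaca
output_dir: ./outputs/llama4-scout-fft

sequence_len: 4096
sample_packing: true
micro_batch_size: 1
gradient_accumulation_steps: 8
num_epochs: 1
learning_rate: 2e-5
optimizer: adamw_torch
lr_scheduler: cosine

bf16: true
flash_attention: true
gradient_checkpointing: true

# A 17B x 16E MoE won't fit on a single GPU for FFT; shard it with
# DeepSpeed ZeRO-3 (or FSDP). Path assumes the configs shipped in the repo.
deepspeed: deepspeed_configs/zero3_bf16.json
```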
-
Hi,
I found the state of Llama 4 support in Axolotl a bit confusing and would like to confirm: does it support the HF model that is NOT linearized, e.g. meta-llama/Llama-4-Scout-17B-16E-Instruct?
I see that all the examples use the linearized version, and the README says "[...] See examples to start training your own Llama 4 models with Axolotl's linearized version!" I'm not sure whether the repo just lacks such examples (because you'd need more GPUs) or whether there is an actual incompatibility. Thanks!