
@makaveli10

  • Fix geglu backward
  • Add geglu_back test
  • Add support for using the default chat-template from the model being fine-tuned, which now supports Gemma as well. This allows instruction finetuning to run without a Jinja chat-template, while still working with one when provided.

- Fix the CPU implementation: it now correctly computes gelu_backward(gate, grad) instead of splitting the computation across two halves
- Update the Vulkan shader to match the corrected implementation with proper gelu_backward
- Add a test for the geglu_back op

The previous implementation incorrectly assumed geglu_back operated on concatenated
tensors and split them. The correct implementation computes the GELU backward pass
element-wise on the gate values.
- Add auto-detection for Gemma format (<start_of_turn>model\n...<end_of_turn>)
- Falls back to ChatML format for other models
- Uses the model's default chat-template, i.e. no Jinja chat-template is needed

This enables instruction finetuning on any model.
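
As an illustration of the detection described above, here is a minimal sketch assuming the check is a simple substring search on the template string for Gemma's turn markers; the enum and function names are hypothetical, not taken from the patch:

```cpp
// Hypothetical sketch of the format fallback logic.
#include <string>

enum class chat_format { GEMMA, CHATML };

// Detect Gemma-style templates by their turn markers
// (<start_of_turn>model\n...<end_of_turn>); otherwise fall back to ChatML.
chat_format detect_chat_format(const std::string & tmpl) {
    if (tmpl.find("<start_of_turn>") != std::string::npos &&
        tmpl.find("<end_of_turn>")   != std::string::npos) {
        return chat_format::GEMMA;
    }
    return chat_format::CHATML;
}
```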
@gianni-cor merged commit 10fd931 into tetherto:temp-latest-finetuning on Nov 20, 2025
36 of 47 checks passed