Conversation

@matkle (Contributor) commented Oct 16, 2025

The current implementation of `swiglu` computes a variant of the standard SwiGLU activation by adding a bias term (i.e., returning `out_gelu * (a_linear + 1)` instead of `out_gelu * a_linear`), and optionally applies input clipping and sigmoid input scaling.

This PR makes the bias addition optional (defaulting to `True` for backward compatibility) and introduces `standard_swiglu`, which computes the standard SwiGLU activation without bias addition, input clipping, or sigmoid input scaling.
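For illustration, here is a minimal PyTorch sketch of the two behaviors described above. This is not the kernel's actual code: the function names are hypothetical, the optional input clipping and sigmoid input scaling are omitted, and the gating term (called `out_gelu` in the description) is assumed to be the SiLU/Swish gate used in standard SwiGLU.

```python
import torch

def swiglu_with_bias(a_gate: torch.Tensor, a_linear: torch.Tensor,
                     add_bias: bool = True) -> torch.Tensor:
    # Hypothetical reference for the behavior described in this PR.
    # The gating term is assumed to be SiLU(a_gate); the real kernel may
    # also apply input clipping and sigmoid input scaling, omitted here.
    out_gelu = a_gate * torch.sigmoid(a_gate)
    if add_bias:
        # Current default: the biased variant, out_gelu * (a_linear + 1).
        return out_gelu * (a_linear + 1)
    # add_bias=False: the standard SwiGLU product.
    return out_gelu * a_linear

def standard_swiglu(a_gate: torch.Tensor, a_linear: torch.Tensor) -> torch.Tensor:
    # Standard SwiGLU: no bias addition, clipping, or sigmoid scaling.
    return swiglu_with_bias(a_gate, a_linear, add_bias=False)
```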

@matkle requested a review from @ptillet as a code owner on October 16, 2025 at 15:50.