
Conversation

@nvchenghaoz

Description

As the title says, this PR adds support for the soft logit capping introduced in the Gemma 2 paper (https://arxiv.org/pdf/2408.00118). Soft logit capping is applied to the attention scores before the softmax op:

`LOGIT_CAP * tanh(attn / LOGIT_CAP)`
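
For reference, a minimal PyTorch sketch of the operation (the function name and example cap value are illustrative, not from this PR; the Gemma 2 paper reports a cap of 50.0 for attention logits):

```python
import torch

def soft_cap_logits(attn_scores: torch.Tensor, logit_cap: float) -> torch.Tensor:
    # Smoothly bounds every attention score to the range (-logit_cap, logit_cap)
    # while remaining differentiable, unlike a hard clamp.
    return logit_cap * torch.tanh(attn_scores / logit_cap)

# Example: scores far outside the cap are squashed toward +/- logit_cap
# before the softmax is applied.
scores = torch.tensor([[-200.0, 0.0, 200.0]])
capped = soft_cap_logits(scores, logit_cap=50.0)  # example value; see Gemma 2 paper
probs = torch.softmax(capped, dim=-1)
```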

Test Coverage

Added two new tests covering soft logit capping: `test_gqa_op_with_logit_cap` and `test_flashinfer_attention_op_context_with_logit_cap`.

@nvchenghaoz nvchenghaoz self-assigned this Jun 16, 2025
@nvchenghaoz nvchenghaoz changed the base branch from main to feat/ad-2025-06-24 June 24, 2025 23:41
@nvchenghaoz nvchenghaoz enabled auto-merge (squash) June 27, 2025 20:47
