In the LitGPT code, I think they called it gk for "gate at step k" (whereas the paper writes it as "alpha at step t", i.e. $\alpha_t$).

But if you look at gk.float().exp() later in the code, I think that is what corresponds to the paper's $\alpha_t$.
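Written out (my notation, mirroring the variable names in the snippet below), the layer first computes a log-space decay

$$\log \alpha_t = -\exp(A_{\log}) \cdot \operatorname{softplus}(W_\alpha x_t + b_{\Delta t}) \le 0,$$

so that $\alpha_t = \exp(\log \alpha_t) \in (0, 1)$, which is exactly what gk.float().exp() produces from gk.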

In my code I am calling it alpha:

    # log-space decay: softplus(...) > 0 and -exp(A_log) < 0, so this "alpha" is < 0
    alpha = -self.A_log.exp().view(1, 1, -1) * F.softplus(
        self.W_alpha(x) + self.dt_bias
    )

But this is more of a "pre-alpha": it is really the log of alpha. The actual alpha only enters the recurrence later, in

    S = S * a_t.exp()
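In other words, assuming the usual gated form of the recurrence (a sketch; the exact update term depends on the method),

$$S_t = e^{a_t} \odot S_{t-1} + (\text{update term at } t),$$

the decay factor actually multiplying the state is $e^{a_t} = \alpha_t$, while a_t itself lives in log space.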

Maybe, to make this clearer, I could rename it as follows?

    alpha_log = -self.A_log.exp().view(1, 1, -1) * F.softplus(self.W_alpha(x) + self.dt_bias)
    alpha = alpha_log.exp()
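
As a quick sanity check that the renamed quantities behave as expected, here is a standalone sketch (made-up shapes and randomly initialized weights, not the real layer):

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    B, T, H = 2, 4, 8                # batch, sequence length, hidden size (made up)
    x = torch.randn(B, T, H)

    A_log = torch.zeros(H)           # stands in for the layer's learnable parameters
    W_alpha = torch.nn.Linear(H, H)
    dt_bias = torch.zeros(H)

    alpha_log = -A_log.exp().view(1, 1, -1) * F.softplus(W_alpha(x) + dt_bias)
    alpha = alpha_log.exp()

    # softplus(...) > 0 and -exp(A_log) < 0, so alpha_log < 0 and alpha in (0, 1)
    assert (alpha_log < 0).all()
    assert ((alpha > 0) & (alpha < 1)).all()

With the alpha_log/alpha naming, it is immediately clear that alpha is a proper per-feature forgetting gate in (0, 1).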
