manman-ren
Contributor

Summary: mark N_CTX constexpr

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

@meta-cla meta-cla bot added the cla signed label Oct 9, 2025
@manman-ren manman-ren changed the title from "Add vectorization and fadd2_reduce" to "[autoWS][FA] Add vectorization and fadd2_reduce" Oct 9, 2025
@manman-ren manman-ren requested review from neildhar, njriasan and htyu and removed request for neildhar October 9, 2025 20:23
else:
    m_ij = tl.maximum(m_i, tl.max(qk, 1) * qk_scale)
    qk = qk * qk_scale - m_ij[:, None]
if VECT_MUL == 2 or VECT_MUL == 3:
Contributor

Any reason not to test the bits directly?

Suggested change
- if VECT_MUL == 2 or VECT_MUL == 3:
+ if VECT_MUL & 2:
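
For reference, a quick check of when the two conditions agree. This is only a sketch: it assumes VECT_MUL is a small non-negative bit mask with values 0 through 3, which the thread does not state, and the helper names are hypothetical. Outside that range the bit test also matches other values that have bit 1 set.

```python
# Assumption: VECT_MUL is a 2-bit flag set (values 0..3); the thread does not say so.
def original_check(vect_mul: int) -> bool:
    # Condition as written in the PR.
    return vect_mul == 2 or vect_mul == 3


def bit_check(vect_mul: int) -> bool:
    # Condition proposed in the review: test bit 1 directly.
    return bool(vect_mul & 2)


# The two checks agree on every value in 0..3.
agree_0_to_3 = all(original_check(v) == bit_check(v) for v in range(4))

# Values in 0..7 where the checks differ (only relevant if VECT_MUL can exceed 3).
disagree_up_to_7 = [v for v in range(8) if original_check(v) != bit_check(v)]

print(agree_0_to_3)
print(disagree_up_to_7)
```

If VECT_MUL may ever take a value above 3, the equality form is the stricter of the two.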

p0, p1 = p.reshape([PM, 2, PN // 2]).permute(0, 2, 1).split()
l_ij0, l_ij1 = tl.reduce((p0, p1), axis=1, combine_fn=_reduce_fadd2)
l_i0 = l_i0 * alpha + l_ij0
l_i1 = l_i1 * alpha + l_ij1
Contributor

Can we sum them after doing the reduction so we can keep the same interface? (and keep the differences localised to this part of the program)

Would that also save a register?

Contributor Author

You mean computing "l_ij = l_ij0 + l_ij1" and then a single "l_i = l_i * alpha + l_ij"? That is a good point.
The trade-off is one fewer addition inside the loop with the current version vs. the register pressure of keeping l_i1.
I basically copied this from Gluon, and it is also what the dp version implements.
We can potentially clean this up if the alternative is better.
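
To make the two options concrete, here is a plain-Python sketch for a single row (no Triton involved). It assumes the two partial sums are added together once after the loop and that the accumulators start at 0; neither detail is confirmed in the thread. The function names and the random data are hypothetical.

```python
import math
import random


def split_halves(p_row):
    # For one row, reshape([PM, 2, PN // 2]).permute(0, 2, 1).split() yields
    # the first half of the columns (p0) and the second half (p1).
    half = len(p_row) // 2
    return p_row[:half], p_row[half:]


def two_accumulators(blocks, alphas):
    # Variant in the PR: keep l_i0 and l_i1 live across loop iterations.
    l_i0, l_i1 = 0.0, 0.0  # assumed initial values
    for p_row, alpha in zip(blocks, alphas):
        p0, p1 = split_halves(p_row)
        l_i0 = l_i0 * alpha + sum(p0)
        l_i1 = l_i1 * alpha + sum(p1)
    return l_i0 + l_i1  # assumed: combined once, after the loop


def one_accumulator(blocks, alphas):
    # Variant from the review: add the partial sums right after the reduction.
    l_i = 0.0  # assumed initial value
    for p_row, alpha in zip(blocks, alphas):
        p0, p1 = split_halves(p_row)
        l_ij = sum(p0) + sum(p1)  # the extra addition inside the loop
        l_i = l_i * alpha + l_ij
    return l_i


random.seed(0)
blocks = [[random.random() for _ in range(8)] for _ in range(5)]
alphas = [random.uniform(0.1, 1.0) for _ in range(5)]

result_two = two_accumulators(blocks, alphas)
result_one = one_accumulator(blocks, alphas)
print(math.isclose(result_two, result_one, rel_tol=1e-9))
```

The two variants agree up to floating-point rounding, so the choice comes down to the cost of one addition per iteration against one extra live accumulator.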

2 participants