For multi-layer models, each layer still needs the hidden states of all tokens as input to the next layer, so it may not be worth changing all the kernels for this feature, especially since it would also cause Triton to compile each kernel multiple times.
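To make the first point concrete, here is a minimal sketch built on a hypothetical scalar toy recurrence, h_t = decay * h_{t-1} + x_t. It is not one of the kernels discussed here, and `recurrent_layer`, `recurrent_layer_last`, and `stack_forward` are illustrative names only. Because every layer consumes the whole output sequence of the layer below it, only the topmost layer could safely use a "final output only" variant.

```python
from typing import List


def recurrent_layer(xs: List[float], decay: float) -> List[float]:
    """Hypothetical toy layer: h_t = decay * h_{t-1} + x_t, emitting h_t at every step."""
    h = 0.0
    outputs = []
    for x in xs:
        h = decay * h + x
        outputs.append(h)
    return outputs


def recurrent_layer_last(xs: List[float], decay: float) -> float:
    """Same toy recurrence, but returns only the final state ("last token only" mode)."""
    h = 0.0
    for x in xs:
        h = decay * h + x
    return h


def stack_forward(xs: List[float], decays: List[float]) -> float:
    # Every layer except the last must emit all T outputs,
    # because the next layer consumes the whole sequence.
    for d in decays[:-1]:
        xs = recurrent_layer(xs, d)
    # Only the final layer can use the cheaper last-only variant.
    return recurrent_layer_last(xs, decays[-1])


if __name__ == "__main__":
    print(stack_forward([1.0, 2.0, 3.0], [0.5, 0.5]))  # 5.75
```

With inputs `[1.0, 2.0, 3.0]` and decays `[0.5, 0.5]`, layer 1 emits `[1.0, 2.5, 4.25]` and the stack returns `5.75`. Feeding layer 2 only the last output of layer 1 (`4.25`) would instead give `4.25`, which is why the intermediate layers cannot drop their per-token outputs.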

Replies: 1 comment, 3 replies

@Fadelis98

@zhiyuan1i

Answer selected by Fadelis98
@Fadelis98

Category: Q&A
Labels: none
Participants: 3