-
Yes, "static" computation graphs have nice benefits, especially for the GPU support idea, so we should support it.
-
I have created #8366 which addresses this.
-
Is there a way to avoid re-creating the graph in every tick?
I was thinking about this and from my limited understanding we could:
- change `ggml_rope()` to take `n_past` as a tensor, so we can change its value without re-creating the graph
- add a `ggml_shift()` operation, which would push everything to the left. It would be applied to the KV cache after computing the result: shifting everything by one makes space for the new data, which is then copied in with `ggml_cpy`. I'm not sure how this would work for `N > 1`, but the idea is that it moves data out of the context, just like `F.pad(xn, (0, 0, 1, -1))` does in PyTorch (rough sketch below)

I think this could simplify the codebase and maybe even be useful for GPU support? #915
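A rough PyTorch sketch of the data movement I mean (the names `kv`, `new_row`, `n_ctx`, `n_embd` are made up for illustration; this is not ggml code, just the same shift expressed with existing PyTorch ops):

```python
# Sketch of the "shift + copy" KV-cache update, using PyTorch instead of the
# proposed ggml_shift()/ggml_cpy pair. All names here are hypothetical.
import torch
import torch.nn.functional as F

n_ctx, n_embd = 8, 4
kv = torch.arange(n_ctx * n_embd, dtype=torch.float32).reshape(n_ctx, n_embd)

# F.pad with (0, 0, 1, -1) pads one zero row at the start of the time axis and
# trims one row at the end: every entry moves by one slot and one entry drops
# out of the context, which is the "move data out" effect described above.
shifted = F.pad(kv, (0, 0, 1, -1))
assert torch.equal(shifted[1:], kv[:-1])

# A rolling KV cache that discards the *oldest* entry would shift the other
# way, then copy the new K/V row into the freed slot (the ggml_cpy step):
new_row = torch.full((1, n_embd), 42.0)   # hypothetical new K/V data
kv = torch.cat([kv[1:], new_row], dim=0)  # drop oldest, append newest
```

Either way the cache tensor keeps a fixed shape, so the graph that reads it would not need to be rebuilt every tick.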
Also, I'm not really sure what scratch is needed for; it looks like a fast arena allocator meant to reduce the impact of this periodic graph re-creation, so maybe scratch wouldn't be necessary either.
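If that reading is right, scratch behaves roughly like a bump allocator: one preallocated buffer, allocations just advance an offset, and everything is "freed" at once by resetting the offset before the next evaluation. A toy sketch (hypothetical `ScratchArena` class, not ggml's actual implementation):

```python
# Toy bump/arena allocator, only to illustrate what a "scratch" buffer does.
class ScratchArena:
    def __init__(self, size: int):
        self.buf = bytearray(size)  # one up-front allocation
        self.offset = 0

    def alloc(self, nbytes: int) -> memoryview:
        # Allocation is just advancing an offset into the preallocated buffer.
        if self.offset + nbytes > len(self.buf):
            raise MemoryError("scratch buffer exhausted")
        view = memoryview(self.buf)[self.offset:self.offset + nbytes]
        self.offset += nbytes
        return view

    def reset(self) -> None:
        # All intermediate results from the previous tick are dropped at once.
        self.offset = 0
```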
BTW: there are models which are specifically built on this "shift" operation. So there's another motivation for a new op.
https://github.com/lucidrains/token-shift-gpt