CPU optimized kernel #10
base: cpu_fused_kernel
Signed-off-by: jiqing-feng <[email protected]>
This kernel depends only on PyTorch, which BNB already requires.
I don't think libtorch is a problem. The concern should be ABI compatibility: you build against version x, but what happens when it runs with version y?
Yes, the BNB maintainer also raised this point, so he recommended that I put this implementation in kernel-community. We can then pull the kernels into BNB, which should fix the problem of building and running against different versions.
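To make the "build with version x, run with version y" concern concrete, here is a minimal sketch of a guard that compares the PyTorch version a binary was built against with the version found at runtime. The helper names `major_minor` and `same_torch_series` are hypothetical, and treating a matching major.minor pair as compatible is an assumption made for illustration, not a guarantee given by PyTorch.

```python
def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.8.0+cpu'."""
    # Drop any local build suffix ('+cpu', '+cu128', ...) before parsing.
    release = version.split("+", 1)[0]
    major, minor = release.split(".")[:2]
    return int(major), int(minor)


def same_torch_series(built_with: str, running_with: str) -> bool:
    """Return True when both versions share the same major.minor series.

    Assumption: a matching major.minor pair is used here as a rough proxy
    for ABI compatibility. This is a hypothetical policy for illustration.
    """
    return major_minor(built_with) == major_minor(running_with)
```

At import time, such a check could be called with the version recorded during the build and `torch.__version__`, so that a mismatch produces a clear error instead of a crash inside the native library.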
Introduce BRGEMM to accelerate TTFT by up to 10x; the speedup increases with input length.
Build commands:
    python -c "import torch; print(torch.utils.cmake_prefix_path)"
The output should look like:
    /opt/venv/lib/python3.12/site-packages/torch/share/cmake
Then run:
    cmake -DCOMPUTE_BACKEND=cpu -DCMAKE_PREFIX_PATH=/opt/venv/lib/python3.12/site-packages/torch/share/cmake -S . && make
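The two manual steps above can be combined so the prefix path does not have to be copied by hand. The following is a minimal sketch, not part of this PR: the helper names `cmake_configure_args` and `configure_and_build` are hypothetical, and it assumes `cmake` and `make` are on `PATH` and that it runs against the repository root.

```python
import subprocess


def cmake_configure_args(prefix_path: str) -> list[str]:
    """Build the cmake configure command for the CPU backend."""
    return [
        "cmake",
        "-DCOMPUTE_BACKEND=cpu",
        f"-DCMAKE_PREFIX_PATH={prefix_path}",
        "-S",
        ".",
    ]


def configure_and_build(repo_root: str = ".") -> None:
    """Configure with the installed PyTorch's CMake files, then run make."""
    # Imported lazily so cmake_configure_args works without PyTorch installed.
    import torch

    args = cmake_configure_args(torch.utils.cmake_prefix_path)
    subprocess.run(args, check=True, cwd=repo_root)
    subprocess.run(["make"], check=True, cwd=repo_root)
```

Calling `configure_and_build()` from the repository root should be equivalent to running the two commands by hand, with the prefix path taken from whichever PyTorch is installed in the active environment.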