Hi, the kernels are awesome. Supporting prefill and generation in the same round is a great feature, and it is reasonable to expect better performance from it.
However, since most inference/serving frameworks are Python-based, the C++-only architecture limits wider adoption of the project. Is there any plan to wrap it with pybind11 so that the kernels can be used from PyTorch?