
v0.2.2


Released by @github-actions on 26 Oct 03:12 · 146 commits to main since this release

What's Changed

  • kernel: added flash infer attention impl by @guocuimi in #327
  • refactor: flatten block tables to 1d tensor by @guocuimi in #328
  • kernel: added script to generate instantiation for flashinfer kernels by @guocuimi in #329
  • refactor: move flash attn and flash infer into attention folder by @guocuimi in #330
  • kernel: port flash infer handler + wrapper logics by @guocuimi in #331
  • ut: added unittests for flash infer kernels by @guocuimi in #332
  • refactor: replaced last_page_len with kv_indptr for flash infer kernel by @guocuimi in #333
  • feat: added pass-in alibi slopes support for flash infer kernel by @guocuimi in #334
  • refactor: move paged kv related logic into paged_kv_t by @guocuimi in #335
  • ut: added fp8 kv unittests for flash infer kernel by @guocuimi in #336
  • ci: added pip cache to avoid redownloading by @guocuimi in #337
  • upgrade PyTorch to 2.4.1 by @guocuimi in #341
  • ci: run package test in docker by @guocuimi in #345
  • ci: build cuda 12.4 for scalellm cpp images by @guocuimi in #346
  • upgrade PyTorch to 2.5.0 by @guocuimi in #347
  • ut: add more tests for different warp layout by @guocuimi in #340
  • misc: attention kernel refactoring by @guocuimi in #339
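
Several of the changes above (#328, #333, #335) revolve around representing paged KV metadata as flat arrays. As a rough illustration of the idea, here is a minimal sketch of flattening a ragged 2-D block table into a 1-D page-index array plus a CSR-style `kv_indptr` offset array, the kind of layout FlashInfer-style paged-KV kernels consume. The function and variable names are illustrative, not ScaleLLM's actual API.

```python
def flatten_block_tables(block_tables):
    """Flatten per-sequence page-id lists into (kv_indices, kv_indptr).

    block_tables: list of lists; block_tables[i] holds the KV-cache
    page ids owned by sequence i (lists may have different lengths).

    Returns:
      kv_indices: 1-D list of all page ids, concatenated per sequence.
      kv_indptr:  prefix offsets, so sequence i's pages live at
                  kv_indices[kv_indptr[i]:kv_indptr[i + 1]].
    """
    kv_indices = []
    kv_indptr = [0]
    for pages in block_tables:
        kv_indices.extend(pages)
        kv_indptr.append(len(kv_indices))
    return kv_indices, kv_indptr
```

With offsets available via `kv_indptr`, a kernel no longer needs a separate per-sequence length array (such as `last_page_len`), since each sequence's page count is just `kv_indptr[i + 1] - kv_indptr[i]` — which is the spirit of the refactor in #333.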

Full Changelog: v0.2.1...v0.2.2