@k50112113

A new file, fwd_decode_splitk_kvcache-tunning.py, duplicated from 06-attention-decode.py, is added to benchmark and optimize the forward decode kernel. The script adds heuristics for handling large-M edge cases. It also provides an option to bypass do_bench() for runtime estimation so that the script can be run under rocprof, which gives more accurate timings for short-running kernels. The script is also ready to compare against CK once their kernels are implemented. For now the CK function call points to an empty kernel, which is why running pytest reports 1 failed case.
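The do_bench() bypass described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `time_kernel`, `simple_bench`, and the `bench_fn` parameter are hypothetical names, and `simple_bench` is a wall-clock stand-in so the sketch runs without a GPU. In the real script, a helper such as `triton.testing.do_bench` would presumably take the place of `simple_bench`.

```python
import time
from typing import Callable, Optional


def time_kernel(
    launch: Callable[[], None],
    bench_fn: Optional[Callable[[Callable[[], None]], float]] = None,
) -> Optional[float]:
    """Time `launch` with a do_bench-style helper, or launch it exactly once.

    `bench_fn` is any callable that takes the launcher and returns a time
    in milliseconds. Passing None is the bypass path (hypothetical API).
    """
    if bench_fn is None:
        # Bypass path: a single launch with no warm-up or repeat loop, so an
        # external profiler such as rocprof records only this one kernel run.
        launch()
        return None
    return bench_fn(launch)


def simple_bench(launch: Callable[[], None], warmup: int = 2, rep: int = 5) -> float:
    """Hypothetical stand-in for a benchmark helper: mean wall-clock ms per call."""
    for _ in range(warmup):
        launch()
    start = time.perf_counter()
    for _ in range(rep):
        launch()
    return (time.perf_counter() - start) * 1e3 / rep
```

With the bypass enabled, the script would be started under the profiler and the kernel time read from the profiler output instead of from the script itself.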

  • I am not making a trivial change, such as fixing a typo in a comment.

  • I have written a PR description following these
    rules.

  • I have run pre-commit run --from-ref origin/main --to-ref HEAD.

  • Select one of the following.

    • I have added tests.
      • /test for lit tests
      • /unittest for C++ tests
      • /python/test for end-to-end tests
    • This PR does not need a test because it only adds a script for tuning the forward decode kernel.
  • Select one of the following.

    • I have not added any lit tests.
    • The lit tests I have added follow these best practices,
      including the "tests should be minimal" section. (Usually running Python code
      and using the instructions it generates is not minimal.)
