Commit c5103dd
Fix ci (#262)
* add pack_gqa template for bwd
* support pack_gqa for tile_scheduler
* finish mainloop
* support bwd_epilogue for pack_gqa
* fix test_flex_flash_attn
* format packgqa for ffa bwd
* add packgqa_swapab bench
* format
* removed useless argument from exps/grpcoll test
* updated generate_inst script to use re to extract the kernel function signatures
* speed up magi_attn_comm building by skipping building when instantiations have not changed
* added native_grpcoll_split_alignment envvar with checking
* impl _preprocess_args_for_split_alignment and added pragma: no cover for all __repr__
* minor fixed logging and repr
* minor fixed comments
* minor fixed comments
* minor fixed comments
* refactored buffer to extract the common output view out
* refactored buffer to add split alignment to view
* refactored buffer to add split alignment to view for lse
* implemented test intranode with split alignment
* minor updated test intranode
* updated test_intranode_grpcoll
* minor fixed comments
* added temp debug code to let benchmark meet the split alignment
* raised up kNumTMABytesPerWarp to 216KB to support larger token
* implemented split_alignment for internode
* fixed a bytes count bug for internode; forbid pass_padded_out_buffer with split_alignment > 1
* updated benchmark settings
* Support per split token in static solver (#228)
* Modify the static solver so that each segment of input_split_size is divisible by the same number
* modify chunk logic in static solver
* Dyn solver split alignment (#230)
* add merge_with_split_alignment method in AttnRanges
* support split alignment in dynamic solver
* Relax INT_MAX buffer size limit for internode (#229)
* relaxed the buffer size up to INT_MAX limit for internode
* tested over INT_MAX buffer size in exp/grpcoll tests
* minor fixed
* added docstring for config funcs
* added minimum num bytes check for native grpcoll
* fixed tma bytes and num warps for internode cache notify kernel
* raised up default num_rdma_bytes
* further fixed internode cache notify kernel for group reduce
* removed the temp debug code to make benchmark mask split-aligned
* add dynamic_solver_vis (#231)
* Dynamic split alignment (#233)
* added num_heads_q,kv,group to comm meta for dynamic solver; added separate split alignment for kv/qo
* added num_heads_q/kv to comm meta for dynamic solver
* supported split alignment varying from dtype
* added native_grpcoll_split_alignment to test_pipeline/test_pipeline_sdpa
* tested through dynamic split alignment for pipeline ut; added world size offset for seed
* added some comments
* added MAGI_ATTENTION_NATIVE_GRPCOLL_SPLIT_ALIGNMENT to docs
* updated the docs for MAGI_ATTENTION_AUTO_RANGE_MERGE
* build cp-bench docker image
* Update API for num_heads and head_dim (#236)
* updated and polished api for required num_heads_q, num_heads_kv, head_dim
* adjusted the calls in ut for updated APIs
* adjusted the calls in examples for updated APIs
* adjusted the calls in exps for updated APIs
* adjusted the calls in docs and readme for updated APIs, as well as deleting the magi_attn_varlen_dipatch and magi_attn_flex_dispatch deprecated APIs
* minor updated tests/test_api/test_interface.py
* minor updated benchmark dockerfile
* Support auto split alignment (#241)
* added head dim to comm meta
* supported auto split alignment w/o varying from dtypes
* minor updated repr and utils
* added strategy for calc_split_alignment
* hotfix switch envvars in bench
* speed up epilogue and avoid reading uninitialized memory
* polish code
* enhance packgqa
* code refactor and bug fix
* code refactor and bug fix
* code refactor and bug fix
* add ut
* fixed get_a2av_perm_idx kernel
* supported get_a2av_perm_idx for 32 nodes
* polish code
* polish code
* fix ci
* remove cuda12 build
* remove cuda12 build
* fix ci
---------
Co-authored-by: shw <shw20010329@163.com>
Co-authored-by: Strivin0311 <hyp@smail.nju.edu.cn>
Co-authored-by: WT1W <100120067+WT1W@users.noreply.github.com>
Co-authored-by: lijinnn <31332658+lijinnn@users.noreply.github.com>
Co-authored-by: Big-TRex <1910960034@qq.com>

1 parent 525c00b commit c5103dd
1 file changed, +2 -2 lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 74 | 74 | |
| 75 | 75 | |
| 76 | 76 | |
| 77 | | - |
| | 77 | + |
| 78 | 78 | |
| 79 | 79 | |
| 80 | 80 | |
| 81 | 81 | |
| 82 | | - |
| | 82 | + |
| 83 | 83 | |
| 84 | 84 | |
| 85 | 85 | |