
Commit c5103dd

littsk, hanwen-sun, Strivin0311, WT1W, and lijinnn authored
Fix ci (#262)
* add pack_gqa template for bwd
* support pack_gqa for tile_scheduler
* finish mainloop
* support bwd_epilogue for pack_gqa
* fix test_flex_flash_attn
* format packgqa for ffa bwd
* add packgqa_swapab bench
* format
* removed useless argument from exps/grpcoll test
* updated generate_inst script to use re to extract the kernel function signatures
* speed up magi_attn_comm building by skipping the build when instantiations have not changed
* added native_grpcoll_split_alignment envvar with checking
* impl _preprocess_args_for_split_alignment and added pragma: no cover for all __repr__
* minor fixes to logging and repr
* minor fixes to comments
* minor fixes to comments
* minor fixes to comments
* refactored buffer to extract the common output view out
* refactored buffer to add split alignment to view
* refactored buffer to add split alignment to view for lse
* implemented test intranode with split alignment
* minor updates to test intranode
* updated test_intranode_grpcoll
* minor fixes to comments
* added temp debug code to let benchmark meet the split alignment
* raised kNumTMABytesPerWarp to 216KB to support larger tokens
* implemented split_alignment for internode
* fixed a bytes count bug for internode; forbid pass_padded_out_buffer with split_alignment > 1
* updated benchmark settings
* Support per split token in static solver (#228)
* modify the static solver so that each segment of input_split_size is divisible by the same number
* modify chunk logic in static solver
* Dyn solver split alignment (#230)
* add merge_with_split_alignment method in AttnRanges
* support split alignment in dynamic solver
* Relax INT_MAX buffer size limit for internode (#229)
* relaxed the buffer size up to the INT_MAX limit for internode
* tested buffer sizes over INT_MAX in exp/grpcoll tests
* minor fixes
* added docstrings for config funcs
* added minimum num bytes check for native grpcoll
* fixed tma bytes and num warps for internode cache notify kernel
* raised default num_rdma_bytes
* further fixed internode cache notify kernel for group reduce
* removed the temp debug code that made the benchmark mask split-aligned
* add dynamic_solver_vis (#231)
* Dynamic split alignment (#233)
* added num_heads_q,kv,group to comm meta for dynamic solver; added separate split alignment for kv/qo
* added num_heads_q/kv to comm meta for dynamic solver
* supported split alignment varying by dtype
* added native_grpcoll_split_alignment to test_pipeline/test_pipeline_sdpa
* tested dynamic split alignment through the pipeline ut; added world size offset for seed
* added some comments
* added MAGI_ATTENTION_NATIVE_GRPCOLL_SPLIT_ALIGNMENT to docs
* updated the docs for MAGI_ATTENTION_AUTO_RANGE_MERGE
* build cp-bench docker image
* Update API for num_heads and head_dim (#236)
* updated and polished the API for required num_heads_q, num_heads_kv, head_dim
* adjusted the calls in ut for the updated APIs
* adjusted the calls in examples for the updated APIs
* adjusted the calls in exps for the updated APIs
* adjusted the calls in docs and readme for the updated APIs, as well as deleting the deprecated magi_attn_varlen_dispatch and magi_attn_flex_dispatch APIs
* minor updates to tests/test_api/test_interface.py
* minor updates to the benchmark dockerfile
* Support auto split alignment (#241)
* added head dim to comm meta
* supported auto split alignment w/o varying by dtype
* minor updates to repr and utils
* added strategy for calc_split_alignment
* hotfix: switch envvars in bench
* speed up epilogue and avoid reading uninitialized memory
* polish code
* enhance packgqa
* code refactor and bug fix
* code refactor and bug fix
* code refactor and bug fix
* add ut
* fixed get_a2av_perm_idx kernel
* supported get_a2av_perm_idx for 32 nodes
* polish code
* polish code
* fix ci
* remove cuda12 build
* remove cuda12 build
* fix ci

--------

Co-authored-by: shw <shw20010329@163.com>
Co-authored-by: Strivin0311 <hyp@smail.nju.edu.cn>
Co-authored-by: WT1W <100120067+WT1W@users.noreply.github.com>
Co-authored-by: lijinnn <31332658+lijinnn@users.noreply.github.com>
Co-authored-by: Big-TRex <1910960034@qq.com>
1 parent 525c00b commit c5103dd

File tree

1 file changed: +2 −2 lines changed


.github/workflows/build_test.yaml

Lines changed: 2 additions & 2 deletions
@@ -74,12 +74,12 @@ jobs:
         echo "is MagiAttention csrc modified: ${{ steps.filter.outputs.MagiAttentionCsrc }}"

   test_MagiAttention_ngc2510_cuda13:
-    needs: [detect_changes, install_MagiAttention_ngc2505_cuda12]
+    needs: [detect_changes]
     if: |
       always() &&
       (
         needs.detect_changes.outputs.MagiAttention == 'true' &&
-        (needs.detect_changes.outputs.MagiAttentionCsrc != 'true' || needs.install_MagiAttention_ngc2505_cuda12.result == 'success')
+        (needs.detect_changes.outputs.MagiAttentionCsrc != 'true')
       )
     environment: ${{ (github.event_name == 'pull_request_target') && 'ci-internal' || '' }}
     runs-on: [self-hosted]
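For reference, a minimal sketch of what this job header looks like after the change (indentation and the rest of the job body are assumed from the hunk above, not taken from the full workflow file); the comments summarize how the gate now behaves:

  test_MagiAttention_ngc2510_cuda13:
    # the job now depends only on change detection; the removed cuda12
    # install job is no longer a prerequisite
    needs: [detect_changes]
    # gate: run when MagiAttention changed and its csrc did not;
    # the cuda12 install result is no longer part of the condition
    if: |
      always() &&
      (
        needs.detect_changes.outputs.MagiAttention == 'true' &&
        (needs.detect_changes.outputs.MagiAttentionCsrc != 'true')
      )
    environment: ${{ (github.event_name == 'pull_request_target') && 'ci-internal' || '' }}
    runs-on: [self-hosted]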

0 commit comments
