Fix ci (#262)

littsk · hanwen-sun · Strivin0311 · web-flow · commit c5103dd07546 · 2026-02-13T15:01:33.000+08:00
* add pack_gqa template for bwd
* support pack_gqa for tile_scheduler
* finish mainloop
* support bwd_epilogue for pack_gqa
* fix test_flex_flash_attn
* format packgqa for ffa bwd
* add packgqa_swapab bench
* format
* removed useless argument from exps/grpcoll test
* updated generate_inst script to use re to extract the kernel function signatures
* speed up magi_attn_comm building by skipping building when instantiations have not changed
* added native_grpcoll_split_alignment envvar with checking
* impl _preprocess_args_for_split_alignment and added pragma: no cover for all __repr__
* minor fixed logging and repr
* minor fixed comments
* minor fixed comments
* minor fixed comments
* refactored buffer to extract the common output view out
* refactored buffer to add split alignment to view
* refactored buffer to add split alignment to view for lse
* implemented test intranode with split alignment
* minor updated test intranode
* updated test_intranode_grpcoll
* minor fixed comments
* added temp debug code to let benchmark meet the split alignment
* raised up kNumTMABytesPerWarp to 216KB to support larger token
* implemented split_alignment for internode
* fixed a bytes count bug for internode; forbid pass_padded_out_buffer with split_alignment > 1
* updated benchmark settings
* Support per split token in static solver (#228)
* Modify the static solver so that each segment of input_split_size is divisible by the same number
* modify chunk logic in static solver
* Dyn solver split alignment (#230)
* add merge_with_split_alignment method in AttnRanges
* support split alignment in dynamic solver
* Relax INT_MAX buffer size limit for internode (#229)
* relaxed the buffer size up to INT_MAX limit for internode
* tested over INT_MAX buffer size in exp/grpcoll tests
* minor fixed
* added docstring for config funcs
* added minimum num bytes check for native grpcoll
* fixed tma bytes and num warps for internode cache notify kernel
* raised up default num_rdma_bytes
* further fixed internode cache notify kernel for group reduce
* removed the temp debug code to make benchmark mask split-aligned
* add dynamic_solver_vis (#231)
* Dynamic split alignment (#233)
* added num_heads_q,kv,group to comm meta for dynamic solver; added separate split alignment for kv/qo
* added num_heads_q/kv to comm meta for dynamic solver
* supported split alignment varying from dtype
* added native_grpcoll_split_alignment to test_pipeline/test_pipeline_sdpa
* tested through dynamic split alignment for pipeline ut; added world size offset for seed
* added some comments
* added MAGI_ATTENTION_NATIVE_GRPCOLL_SPLIT_ALIGNMENT to docs
* updated the docs for MAGI_ATTENTION_AUTO_RANGE_MERGE
* build cp-bench docker image
* Update API for num_heads and head_dim (#236)
* updated and polished api for required num_heads_q, num_heads_kv, head_dim
* adjusted the calls in ut for updated APIs
* adjusted the calls in examples for updated APIs
* adjusted the calls in exps for updated APIs
* adjusted the calls in docs and readme for updated APIs, as well as deleting the magi_attn_varlen_dispatch and magi_attn_flex_dispatch deprecated APIs
* minor updated tests/test_api/test_interface.py
* minor updated benchmark dockerfile
* Support auto split alignment (#241)
* added head dim to comm meta
* supported auto split alignment w/o varying from dtypes
* minor updated repr and utils
* added strategy for calc_split_alignment
* hotfix switch envvars in bench
* speed up epilogue and avoid read uninitialized memory
* polish code
* enhance packgqa
* code refactor and bug fix
* code refactor and bug fix
* code refactor and bug fix
* add ut
* fixed get_a2av_perm_idx kernel
* supported get_a2av_perm_idx for 32 nodes
* polish code
* polish code
* fix ci
* remove cuda12 build
* remove cuda12 build
* fix ci

---------

Co-authored-by: shw <shw20010329@163.com>
Co-authored-by: Strivin0311 <hyp@smail.nju.edu.cn>
Co-authored-by: WT1W <100120067+WT1W@users.noreply.github.com>
Co-authored-by: lijinnn <31332658+lijinnn@users.noreply.github.com>
Co-authored-by: Big-TRex <1910960034@qq.com>

diff --git a/.github/workflows/build_test.yaml b/.github/workflows/build_test.yaml
@@ -74,12 +74,12 @@ jobs:
           echo "is MagiAttention csrc modified: ${{ steps.filter.outputs.MagiAttentionCsrc }}"
 
   test_MagiAttention_ngc2510_cuda13:
-    needs: [detect_changes, install_MagiAttention_ngc2505_cuda12]
+    needs: [detect_changes]
     if: |
       always() &&
       (
         needs.detect_changes.outputs.MagiAttention == 'true' &&
-        (needs.detect_changes.outputs.MagiAttentionCsrc != 'true' || needs.install_MagiAttention_ngc2505_cuda12.result == 'success')
+        (needs.detect_changes.outputs.MagiAttentionCsrc != 'true')
       )
     environment: ${{ (github.event_name == 'pull_request_target') && 'ci-internal' || '' }}
     runs-on: [self-hosted]
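
To make the effect of the hunk above easier to review, here is a minimal Python sketch that models the job's `if:` expression before and after the change as plain booleans. The function names `should_run_before` and `should_run_after` are hypothetical and exist only for this illustration; the sketch assumes `always()` evaluates to true and that each `== 'true'` / `== 'success'` comparison can be represented by a single boolean. It does not model any other GitHub Actions behavior.

```python
from itertools import product


def should_run_before(magi_changed: bool, csrc_changed: bool, cuda12_install_ok: bool) -> bool:
    """Old condition: MagiAttention changed AND (csrc unchanged OR cuda12 install succeeded)."""
    return magi_changed and ((not csrc_changed) or cuda12_install_ok)


def should_run_after(magi_changed: bool, csrc_changed: bool) -> bool:
    """New condition: MagiAttention changed AND csrc unchanged."""
    return magi_changed and (not csrc_changed)


if __name__ == "__main__":
    # Enumerate every input combination and show where the two conditions differ.
    for magi, csrc, cuda12_ok in product([False, True], repeat=3):
        old = should_run_before(magi, csrc, cuda12_ok)
        new = should_run_after(magi, csrc)
        marker = "  <- differs" if old != new else ""
        print(f"magi={magi!s:5} csrc={csrc!s:5} cuda12_ok={cuda12_ok!s:5} old={old!s:5} new={new!s:5}{marker}")
```

Under these assumptions the two conditions differ in exactly one case: MagiAttention and its csrc both changed and the former cuda12 install job succeeded. The old expression ran the test job there, while the new expression as written skips it.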
