llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp (3 additions, 1 deletion)

@@ -2508,7 +2508,9 @@ bool SchedGroup::canAddSU(SUnit &SU) const {
++E;

// Return true if all of the bundled MIs can be added to this group.
-  return std::all_of(B, E, [this](MachineInstr &MI) { return canAddMI(MI); });
+  return std::all_of(B, E, [this](MachineInstr &MI) {
+    return (MI.isMetaInstruction()) || canAddMI(MI);
Contributor:

Suggested change:
- return (MI.isMetaInstruction()) || canAddMI(MI);
+ return MI.isMetaInstruction() || canAddMI(MI);
Contributor:

The basic change LGTM, though the code structure here is now unnecessarily confusing and could be cleaned up in a follow-up: canAddMI() already tries to handle isMetaInstruction() internally, and this code also pre-computes the bounds of the bundle before running the all_of. You could remove the isMetaInstruction() check inside canAddMI() and instead check canAddMI() directly during the initial walk through the bundle above.
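For readers outside the LLVM tree, the predicate under review can be modeled with a toy sketch. Everything here is a hypothetical stand-in (the `Inst` struct, its `Meta` and `Addable` fields, and `canAddBundle`), not LLVM API; `Meta` plays the role of `MachineInstr::isMetaInstruction()` and `Addable` the role of what `canAddMI()` would answer for this group:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-in for a bundled machine instruction (hypothetical, not LLVM API).
struct Inst {
  bool Meta;     // models MachineInstr::isMetaInstruction()
  bool Addable;  // models what canAddMI() would decide for this SchedGroup
};

// Mirrors the patched lambda in canAddSU(): a bundle is admissible when every
// bundled instruction is either a meta instruction or individually addable.
bool canAddBundle(const std::vector<Inst> &Bundle) {
  return std::all_of(Bundle.begin(), Bundle.end(), [](const Inst &MI) {
    return MI.Meta || MI.Addable;
  });
}
```

The point of the change is visible in the predicate: a meta instruction inside a bundle can never block the bundle from joining a group.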

@yoonseoch (Contributor, Author), Aug 8, 2025:

I see, will do so. Initially I did not remove isMetaInstruction() inside canAddMI() because that could lead to edges between two sched_barriers, which I thought was unintentional (lines 2501, 2502).

@yoonseoch (Contributor, Author), Aug 8, 2025:

Making canAddMI() return true for a meta-instruction turned out to require more intrusive changes to existing lit tests: I got about 14 lit-test failures in CodeGen/AMDGPU, mostly on tests of iglp_opt, sched-barrier, and sched-group-barrier.

For example, in llvm.amdgcn.sched.group.barrier.ll:

// sequence after making canAddMI return true on a meta-instruction

define amdgpu_kernel void @test_sched_group_barrier_pipeline_alternating_READ_VALU_WRITE(ptr addrspace(1) noalias %in,
; GCN-LABEL: test_sched_group_barrier_pipeline_alternating_READ_VALU_WRITE:
; GCN:       ; %bb.0:
; GCN-NEXT:    s_load_dwordx4 s[0:3], s[4:5], 0x24
; GCN-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
; GCN-NEXT:    v_lshlrev_b32_e32 v16, 7, v0
; GCN-NEXT:    s_waitcnt lgkmcnt(0)
; GCN-NEXT:    global_load_dwordx4 v[12:15], v16, s[0:1] offset:32
; GCN-NEXT:    ; kill: killed $sgpr0_sgpr1
; GCN-NEXT:    global_load_dwordx4 v[0:3], v16, s[0:1]
; GCN-NEXT:    ; sched_group_barrier mask(0x00000020) size(1) SyncID(0)
; GCN-NEXT:    ; sched_group_barrier mask(0x00000002) size(2) SyncID(0)
; GCN-NEXT:    ; sched_group_barrier mask(0x00000040) size(1) SyncID(0)
; GCN-NEXT:    ; sched_group_barrier mask(0x00000020) size(1) SyncID(0)
; GCN-NEXT:    s_waitcnt vmcnt(1)
; GCN-NEXT:    v_mul_lo_u32 v13, v13, v13
; GCN-NEXT:    v_mul_lo_u32 v12, v12, v12
; GCN-NEXT:    v_mul_lo_u32 v15, v15, v15
; GCN-NEXT:    v_mul_lo_u32 v14, v14, v14
; GCN-NEXT:    s_waitcnt vmcnt(0)
; GCN-NEXT:    v_mul_lo_u32 v3, v3, v3
; GCN-NEXT:    v_mul_lo_u32 v2, v2, v2
; GCN-NEXT:    global_store_dwordx4 v16, v[12:15], s[2:3] offset:32
; GCN-NEXT:    global_load_dwordx4 v[4:7], v16, s[0:1] offset:48
; GCN-NEXT:    v_mul_lo_u32 v1, v1, v1

// original sequence in the test

define amdgpu_kernel void @test_sched_group_barrier_pipeline_alternating_READ_VALU_WRITE(ptr addrspace(1) noalias %in
; GCN-LABEL: test_sched_group_barrier_pipeline_alternating_READ_VALU_WRITE:
; GCN:       ; %bb.0:
; GCN-NEXT:    s_load_dwordx4 s[0:3], s[4:5], 0x24
; GCN-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
; GCN-NEXT:    v_lshlrev_b32_e32 v16, 7, v0
; GCN-NEXT:    ; kill: killed $sgpr0_sgpr1
; GCN-NEXT:    s_waitcnt lgkmcnt(0)
; GCN-NEXT:    global_load_dwordx4 v[12:15], v16, s[0:1] offset:32
; GCN-NEXT:    ; sched_group_barrier mask(0x00000020) size(1) SyncID(0)
; GCN-NEXT:    ; sched_group_barrier mask(0x00000002) size(2) SyncID(0)
; GCN-NEXT:    s_waitcnt vmcnt(0)
; GCN-NEXT:    v_mul_lo_u32 v13, v13, v13
; GCN-NEXT:    v_mul_lo_u32 v12, v12, v12
; GCN-NEXT:    v_mul_lo_u32 v15, v15, v15
; GCN-NEXT:    v_mul_lo_u32 v14, v14, v14
; GCN-NEXT:    global_store_dwordx4 v16, v[12:15], s[2:3] offset:32
; GCN-NEXT:    global_load_dwordx4 v[0:3], v16, s[0:1]
; GCN-NEXT:    ; sched_group_barrier mask(0x00000040) size(1) SyncID(0)
; GCN-NEXT:    ; sched_group_barrier mask(0x00000020) size(1) SyncID(0)
; GCN-NEXT:    s_waitcnt vmcnt(0)
; GCN-NEXT:    v_mul_lo_u32 v3, v3, v3
; GCN-NEXT:    v_mul_lo_u32 v2, v2, v2
; GCN-NEXT:    v_mul_lo_u32 v1, v1, v1

I think keeping the original logic of canAddMI() is safer, given that it is also used for a single, non-bundled MI.
I can still simplify the code by moving the check into the initial scan of the bundle range, avoiding the second pass done by the all_of.
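The single-pass shape the author describes (folding the admissibility check into the walk over the bundle, instead of computing bounds first and running all_of afterwards) might look like the following toy sketch. As before, `Inst`, `Meta`, `Addable`, and `canAddBundleOnePass` are hypothetical stand-ins, not LLVM code:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a bundled machine instruction (not LLVM API).
struct Inst {
  bool Meta;     // models MachineInstr::isMetaInstruction()
  bool Addable;  // models canAddMI() for this group
};

// Single pass over the bundle: bail out as soon as a non-meta instruction
// cannot be added, rather than scanning once for bounds and again in all_of.
bool canAddBundleOnePass(const std::vector<Inst> &Bundle) {
  for (const Inst &MI : Bundle) {
    if (MI.Meta)
      continue; // meta instructions never block bundling
    if (!MI.Addable)
      return false;
  }
  return true;
}
```

Besides removing the redundant scan, this shape also short-circuits on the first failure, which all_of does as well; the difference is purely structural.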

});
}

void SchedGroup::initSchedGroup() {