Commit 3d01255

fix invalid instruction error in 2shot ar kernel on Ampere
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
1 parent 08755a8 commit 3d01255

1 file changed: +6 -0 lines changed

cpp/tensorrt_llm/kernels/communicationKernels/allReduceFusionKernels.cu

Lines changed: 6 additions & 0 deletions
@@ -134,11 +134,17 @@ public:
         // corresponding CTA has not been launched.
         for (int flag_idx = blockIdx.x; flag_idx < kBarrierFlagCount; flag_idx += gridDim.x)
         {
+#if (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900))
             asm volatile(
                 "st.global.relaxed.sys.b32 [%1], %0;" ::"r"(m_flag_value), "l"(m_target_flag + flag_idx * NRanks));
+#else
+            st_flag(m_target_flag + flag_idx * NRanks, m_flag_value);
+#endif
         }
+#if (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900))
         // Single release fence
         asm volatile("fence.release.sys;");
+#endif
 
         while (ld_flag(m_current_flag) == prev_flag(m_flag_value))
         {
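
The guard pattern in this hunk can be sketched in isolation. The block below is a minimal illustration, not the code in allReduceFusionKernels.cu: `SKETCH_HD`, `publish_flag`, and `publish_all` are hypothetical names, and the fallback branch uses a plain volatile store as a stand-in for `st_flag`, whose definition does not appear in the diff. The stand-in provides no release ordering of its own, so it only shows how the preprocessor selects a path. The sketch assumes that `__CUDA_ARCH__` is undefined during host compilation, so the same source builds with an ordinary C++ compiler and takes the fallback branch there.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro (not from the commit): adds CUDA qualifiers only under nvcc.
#if defined(__CUDACC__)
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

// Hypothetical helper mirroring the per-flag store in the diff.
SKETCH_HD inline void publish_flag(uint32_t* addr, uint32_t value)
{
#if (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900))
    // Device pass for sm_90 and newer: the relaxed system-scope store from the diff.
    asm volatile("st.global.relaxed.sys.b32 [%1], %0;" ::"r"(value), "l"(addr));
#else
    // Host pass, or device pass below sm_90: stand-in for the commit's st_flag().
    // A volatile store is an assumption made for illustration only.
    *reinterpret_cast<volatile uint32_t*>(addr) = value;
#endif
}

// Hypothetical helper mirroring the strided loop and the guarded fence.
// start/stride play the role of blockIdx.x/gridDim.x in the kernel.
SKETCH_HD inline void publish_all(
    uint32_t* flags, int count, int nranks, int start, int stride, uint32_t value)
{
    assert(stride > 0 && nranks > 0);
    for (int flag_idx = start; flag_idx < count; flag_idx += stride)
    {
        publish_flag(flags + flag_idx * nranks, value);
    }
#if (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900))
    // Single release fence after the relaxed stores, as in the diff.
    asm volatile("fence.release.sys;");
#endif
}
```

Calling `publish_all(flags, 4, 2, 0, 1, 7u)` on the host writes `7` to every second element of `flags` through the fallback branch. Because the guard is resolved at compile time, a binary built for several architectures carries the PTX path only in its sm_90 and newer device code, which matches the commit's intent of keeping those instructions out of Ampere builds.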