Commit dda7658

[https://nvbugs/5655885][fix] fix invalid instruction error in 2shot ar kernel on Ampere (#9394)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
1 parent 7588029 commit dda7658

1 file changed, +6 -0 lines changed

cpp/tensorrt_llm/kernels/communicationKernels/allReduceFusionKernels.cu

Lines changed: 6 additions & 0 deletions
@@ -137,11 +137,17 @@ public:
         // corresponding CTA has not been launched.
         for (int flag_idx = blockIdx.x; flag_idx < kBarrierFlagCount; flag_idx += gridDim.x)
         {
+#if (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900))
             asm volatile(
                 "st.global.relaxed.sys.b32 [%1], %0;" ::"r"(m_flag_value), "l"(m_target_flag + flag_idx * NRanks));
+#else
+            st_flag(m_target_flag + flag_idx * NRanks, m_flag_value);
+#endif
         }
+#if (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900))
         // Single release fence
         asm volatile("fence.release.sys;");
+#endif
 
         while (ld_flag(m_current_flag) == prev_flag(m_flag_value))
         {
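
The guard added in this hunk is a compile-time dispatch on `__CUDA_ARCH__`. The following is a minimal host-compilable sketch of that pattern only. It is not the kernel code: `SKETCH_MIN_ARCH` and `selected_flag_store_path` are hypothetical names introduced for illustration, and the returned strings merely label the two branches seen in the diff.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical constant mirroring the threshold used by the guard in the diff.
#define SKETCH_MIN_ARCH 900

// Reports which branch the preprocessor selected for this compilation pass.
// nvcc defines __CUDA_ARCH__ only while compiling device code, so a plain
// host compiler (or nvcc's host pass) always takes the #else branch.
inline const char* selected_flag_store_path()
{
#if (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= SKETCH_MIN_ARCH))
    return "relaxed store + release fence";
#else
    return "st_flag fallback";
#endif
}
```

Because the choice is made by the preprocessor, the instructions inside the `>= 900` branch are never emitted when the code is compiled for an older target, which is presumably how the change avoids the invalid instruction error named in the commit title.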
