Commit e3dd7a7

Really dumb division bug fixed.
All tests now pass except summation, which fails to meet tolerance.
1 parent (88fd3d6); commit e3dd7a7

1 file changed (+5, -4 lines)

src/gpuarray_reduction.c

Lines changed: 5 additions & 4 deletions
@@ -2386,10 +2386,11 @@ static void reduxGenSrcAppendPhase1 (GpuReduction* gr){
 " if(misalignL && doFinish && LID_0 < D){\n"
 " SETREDUXSTATE(accV, accI, wdL[(GID_0+0)*D+LID_0], waL[(GID_0+0)*D+LID_0]);\n"
 " \n"
-" for(k=-1; /* Starting with the first block to our left... */\n"
-" (start +0)/B == /* Is our write target the same as that of */\n"
-" (start+k*V+V-1)/B; /* the target k blocks to our left? */\n"
-" k--){ /* Try moving one more to the left. */\n"
+" /* vvv-- NOTA BENE: The +B hack is REALLY NECESSARY, since C division is rounding to zero: (-1)/B == (B-1)/B for B>1. */\n"
+" for(k=-1; /* Starting with the first block to our left... */\n"
+" (start +B)/B == /* Is our write target the same as that of */\n"
+" (start+k*V+V-1+B)/B; /* the target k blocks to our left? */\n"
+" k--){ /* Try moving one more to the left. */\n"
 " REDUX(accV, accI, wdR[(GID_0+k)*D+LID_0], waR[(GID_0+k)*D+LID_0]);\n"
 " }\n"
 " \n");
