1 parent 641276a commit d06d087
ggml/src/ggml-metal/ggml-metal.m
@@ -3022,6 +3022,13 @@ static bool ggml_metal_encode_node(
  const int64_t shmem_size = d_state / 32;
  GGML_ASSERT(shmem_size * 32 == d_state);

+ // The final simd_sum won't work if the number of simd groups is
+ // larger than the size of a single simd group. If this case is
+ // hit at some point, the logic in the second simd_sum could be
+ // expanded to handle this with one more sequential simd_sum to
+ // collapse simd group sums another time.
+ GGML_ASSERT(shmem_size <= 32);
+
  // One thread pre element in d_state
  GGML_ASSERT(d_state <= (int64_t)pipeline.maxTotalThreadsPerThreadgroup);
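
The new assertion can be illustrated with a small CPU model of a two-stage reduction. This is only a sketch: the Metal kernel itself is not shown in this diff, so the model assumes (as the comment implies) that each SIMD group writes one partial sum to shared memory and that a single SIMD group then sums those partials, one per lane. `two_stage_sum`, `SIMD_WIDTH`, and `MAX_GROUPS` are hypothetical names introduced for illustration and are not part of ggml.

```c
#include <assert.h>
#include <stdint.h>

#define SIMD_WIDTH 32   /* assumed SIMD-group size, matching the literal 32 in the diff */
#define MAX_GROUPS 128  /* arbitrary cap for this model only */

/* CPU stand-in for a two-stage threadgroup reduction (hypothetical helper).
 * values[] holds one element per thread, d_state elements in total. */
int64_t two_stage_sum(const int64_t *values, int64_t d_state) {
    assert(d_state % SIMD_WIDTH == 0);              /* mirrors shmem_size * 32 == d_state */
    const int64_t n_groups = d_state / SIMD_WIDTH;  /* shmem_size in the diff */
    assert(n_groups <= MAX_GROUPS);

    int64_t shared[MAX_GROUPS] = {0};

    /* Stage 1: each SIMD group reduces its own 32 lanes into one partial sum. */
    for (int64_t g = 0; g < n_groups; ++g) {
        int64_t s = 0;
        for (int64_t lane = 0; lane < SIMD_WIDTH; ++lane) {
            s += values[g * SIMD_WIDTH + lane];
        }
        shared[g] = s;
    }

    /* Stage 2: a single SIMD group loads one partial per lane and sums them.
     * Only SIMD_WIDTH lanes exist, so partials at index >= 32 are never read. */
    const int64_t readable = n_groups < SIMD_WIDTH ? n_groups : SIMD_WIDTH;
    int64_t total = 0;
    for (int64_t lane = 0; lane < readable; ++lane) {
        total += shared[lane];
    }
    return total;
}
```

In this model, an input of all ones with `d_state = 1024` (32 groups) sums correctly to 1024, while `d_state = 2048` (64 groups) also returns 1024 because half of the partial sums are silently dropped. That silent truncation is the case `GGML_ASSERT(shmem_size <= 32)` guards against.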