Conversation

@muggle-stack
Contributor

Fix: SpaceMit IME backend array out-of-bounds access

Description

This PR fixes a critical bug in the SpaceMit IME (Intelligent Matrix Engine) backend that causes an out-of-bounds array access during the quantization phase, leading to undefined behavior and potential crashes.

Problem

Root Cause:
The task-to-batch index mapping was computed incorrectly: the code divided compute_idx by block_size_m instead of per_gemm_block_count_m, yielding gemm_idx values that exceed the bounds of the qnbitgemm_args array.

Example scenario:

batch_feature = 1          // qnbitgemm_args array has only 1 element (index 0)
gemm_m = 30
block_size_m = 4
per_gemm_block_count_m = div_round_up(30, 4) = 8
task_count = 1 * 8 = 8     // compute_idx ranges from 0 to 7

// BUGGY calculation:
compute_idx = 4
gemm_idx = 4 / 4 = 1       // Out of bounds! (array size is 1)

// CORRECT calculation:
gemm_idx = 4 / 8 = 0       // Valid index
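
For completeness, here is a minimal standalone sketch (an illustration only, assuming the parameter values above; div_round_up is a local stand-in, not the actual ime.cpp helper) that enumerates every task and flags where the buggy divisor walks out of bounds:

#include <cstdio>

// Local stand-in for the rounding helper used in the example above.
static int div_round_up(int a, int b) { return (a + b - 1) / b; }

int main() {
    const int batch_feature = 1;    // qnbitgemm_args would have exactly 1 element
    const int gemm_m        = 30;
    const int block_size_m  = 4;
    const int per_gemm_block_count_m = div_round_up(gemm_m, block_size_m);  // 8
    const int task_count    = batch_feature * per_gemm_block_count_m;       // 8

    for (int compute_idx = 0; compute_idx < task_count; ++compute_idx) {
        const int buggy = compute_idx / block_size_m;            // wrong divisor
        const int fixed = compute_idx / per_gemm_block_count_m;  // correct divisor
        std::printf("task %d: buggy gemm_idx=%d%s, fixed gemm_idx=%d\n",
                    compute_idx, buggy,
                    buggy >= batch_feature ? " (OUT OF BOUNDS)" : "",
                    fixed);
    }
    return 0;
}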

Impact:

  • Accessing qnbitgemm_args[gemm_idx] with invalid index reads uninitialized memory
  • Can result in invalid pointer values (e.g., 0x20451)
  • Causes SIGBUS errors when dereferencing invalid pointers
  • May appear to work in some configurations due to:
    • Lucky runtime parameters (e.g., gemm_m being a multiple of 4)
    • CPU affinity masking the issue when threads run on IME1-capable cores

Solution

Fix the task assignment calculation to properly map tasks to batches:

// Correct mapping: task -> batch -> block within batch
int32_t gemm_idx          = compute_idx / per_gemm_block_count_m;  // which batch entry
int32_t block_idx_in_gemm = compute_idx % per_gemm_block_count_m;  // which block inside that batch
int32_t m_idx             = block_idx_in_gemm * block_size_m;      // starting row of that block

This ensures:

  • All tasks belonging to the same batch map to the same gemm_idx
  • For batch_feature=1, every task maps to gemm_idx=0
  • m_idx correctly ranges over the blocks within each batch (a defensive check is sketched below)
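
One way to make a regression of this mapping fail fast, instead of silently reading uninitialized memory, would be a bounds assertion next to the computation. This guard is hypothetical and not part of the PR; it assumes the variable names from the snippet above and ggml's GGML_ASSERT macro:

// Hypothetical guard (not in this PR): abort immediately if the mapping
// ever produces an index outside the qnbitgemm_args array.
GGML_ASSERT(gemm_idx >= 0 && gemm_idx < batch_feature);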

Testing

Tested on SpaceMit K1 RISC-V64 board with:

  • Model: qwen2.5:0.5b (Q4_0 quantization)
  • Configuration: 4 threads, gemm_m=30, batch_feature=1
  • Before fix: Immediate SIGBUS crash with invalid pointer 0x20451
  • After fix: Model runs successfully, inference completes normally

Files Changed

  • ggml/src/ggml-cpu/spacemit/ime.cpp: Fix task-to-batch index calculation (lines 488-490)

Related Issues

This bug was discovered while integrating the SpaceMit backend into Ollama, where the Go runtime's thread scheduling exposed the out-of-bounds access more readily than in llama.cpp's native threading model.


Verification:

# Build for SpaceMit
cmake -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_CPU_RISCV64_SPACEMIT=ON \
    -DGGML_RVV=ON \
    -DGGML_RV_ZFH=ON \
    -DRISCV64_SPACEMIT_IME_SPEC=RISCV64_SPACEMIT_IME1
make -C build -j8

# Run with matrix size that triggers the bug
./build/bin/llama-cli -m model.gguf -t 4

Fix incorrect task-to-batch index calculation in the quantization phase.

The bug caused out-of-bounds access to qnbitgemm_args array when
compute_idx exceeded per_gemm_block_count_m, leading to invalid
pointer dereferences and SIGBUS errors.

Correctly map tasks to batches by dividing compute_idx by
per_gemm_block_count_m instead of block_size_m.

Example:
  batch_feature=1, gemm_m=30, block_size_m=4
  per_gemm_block_count_m = 8, task_count = 8

  Old: gemm_idx = 4/4 = 1 (out of bounds)   New: gemm_idx = 4/8 = 0 (correct)

Tested on SpaceMit K1 RISC-V64 with qwen2.5:0.5b model.
github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Oct 17, 2025
@ggerganov
Member

cc @alex-spacemit

@alex-spacemit
Collaborator
Well, there were some mistakes during the code naming standardization. Thank you.

@muggle-stack
Contributor Author

Well, there were some mistakes during the code naming standardization. Thank you.

Thanks for the review, Boss Alex.
@ggerganov Please help merge this PR when convenient. 🙏

@ggerganov ggerganov merged commit 342c728 into ggml-org:master Oct 17, 2025
69 of 70 checks passed
pwilkin pushed a commit to pwilkin/llama.cpp that referenced this pull request Oct 23, 2025
…org#16629)


Co-authored-by: muggle <[email protected]>