Name and Version
llama-cli version 6710 (74b8fc1)
Also tested with:
- Official xcframework (latest release from GitHub)
- Homebrew llama.cpp version 6710 and last 2 builds
Both exhibit the same crash.
Operating systems
Mac
GGML backends
Metal, CPU
Hardware
Apple M2 Ultra
Models
Base Model:
- Name: Hermes-3-Llama-3.2-3B
- Quantization: Q4_0
- Size: 1.8GB
- Source: https://huggingface.co/NousResearch/Hermes-3-Llama-3.2-3B-GGUF
- File: Hermes-3-Llama-3.2-3B_q4_0.gguf
LoRA Adapter:
- Name: Hermes-3-Llama-3.2-3B_adapter
- Size: 93MB
- Format: GGUF
- Trained and tested on Windows (works fine with llama-cli there)
Problem description & steps to reproduce
LoRA adapter inference crashes on Mac with a graph size assertion error, but works perfectly on Windows. Base model inference and LoRA adapter loading both work fine on Mac - only inference with an active LoRA adapter crashes. (A minimal API-level reproduction sketch follows the lists below.)
Steps to reproduce:

1. Test the base model (this works):

llama-cli -m Hermes-3-Llama-3.2-3B_q4_0.gguf -n 20 -ngl 99 -c 2048 -b 256

Result: works perfectly, generates text.

2. Test with the LoRA adapter (this crashes):

llama-cli -m Hermes-3-Llama-3.2-3B_q4_0.gguf --lora gandalf_Hermes-3-Llama-3.2-3B_adapter.gguf -n 20 -ngl 99 -c 2048 -b 256

Result: crashes immediately with GGML_ASSERT.
Workarounds attempted (all failed):
- Reduced context: -c 512, -c 1024
- Reduced batch: -b 128
- CPU only: -ngl 0
- Limited threads: -t 4
- Different prompts
- Latest xcframework (just updated)
- Homebrew build (v6710)

Results by environment:
- Windows: Base + LoRA works perfectly
- Mac (xcframework): Base works, LoRA crashes
- Mac (Homebrew): Base works, LoRA crashes
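
For completeness, here is a minimal sketch of the equivalent C API call sequence (untested as a standalone file on my side; function names follow recent llama.h, while older headers spell the adapter calls llama_lora_adapter_init / llama_lora_adapter_set). The stack trace in the log below shows the abort already firing during the warmup llama_decode inside common_init_from_params, so a single-token decode should be enough to trigger it:

// Minimal API-level call sequence (sketch, assuming recent llama.h naming).
#include "llama.h"
#include <stdio.h>

int main(void) {
    llama_backend_init();

    struct llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99;  // matches -ngl 99 (crash also occurs with 0)

    struct llama_model * model =
        llama_model_load_from_file("Hermes-3-Llama-3.2-3B_q4_0.gguf", mparams);
    if (model == NULL) { fprintf(stderr, "model load failed\n"); return 1; }

    struct llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx   = 2048;     // matches -c 2048
    cparams.n_batch = 256;      // matches -b 256

    struct llama_context * ctx = llama_init_from_model(model, cparams);

    // Adapter loading succeeds on Mac ...
    struct llama_adapter_lora * adapter =
        llama_adapter_lora_init(model, "Hermes-3-Llama-3.2-3B_adapter.gguf");
    if (adapter == NULL) { fprintf(stderr, "adapter load failed\n"); return 1; }
    llama_set_adapter_lora(ctx, adapter, 1.0f);

    // ... but the first decode with the adapter active aborts in
    // ggml_backend_sched_alloc_graph with the GGML_ASSERT shown below.
    llama_token bos = llama_vocab_bos(llama_model_get_vocab(model));
    llama_decode(ctx, llama_batch_get_one(&bos, 1));

    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}

This mirrors the failing llama-cli invocation minus sampling, which never gets a chance to run because the abort happens on the first graph allocation.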
First Bad Commit
Unable to determine - the issue is present in the latest release and the two previous builds tested.
Relevant log output
Command executed:
llama-cli -m models/hermes3/base/Hermes-3-Llama-3.2-3B_q4_0.gguf --lora models/hermes3/adapters/Hermes-3-Llama-3.2-3B_adapter.gguf -p "Tell me about wizards." -n 20 -ngl 99 -c 2048 -b 256
Output:
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.006 sec
ggml_metal_device_init: GPU name: Apple M2 Ultra
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 115448.73 MB
build: 6710 (74b8fc17) with Apple clang version 17.0.0 (clang-1700.3.19.1) for arm64-apple-darwin25.0.0
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device Metal (Apple M2 Ultra) (unknown id) - 110100 MiB free
[... model loading succeeds ...]
llama_lora_adapter_init_impl: applying lora adapter from 'models/hermes3/adapters/gandalf_Hermes-3-Llama-3.2-3B_adapter.gguf'
[... LoRA loading succeeds ...]
Process 40788 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x000000018d35142c libsystem_kernel.dylib`__wait4 + 8
Stack trace:
frame #0: libsystem_kernel.dylib`__wait4 + 8
frame #1: libggml-base.dylib`ggml_abort + 156
frame #2: libggml-base.dylib`ggml_backend_sched_alloc_graph + 464
frame #3: libllama.dylib`llama_context::process_ubatch + 516
frame #4: libllama.dylib`llama_context::decode + 1148
frame #5: libllama.dylib`llama_decode + 20
frame #6: llama-cli`common_init_from_params + 2168
frame #7: llama-cli`main + 636
Error message:
/Users/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-backend.cpp:1718: GGML_ASSERT((int)sched->hash_set.size >= graph->n_nodes + graph->n_leafs) failed
Exit: signal SIGABRT (abort)
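
Reading the failing assertion: it checks that the scheduler's preallocated hash set is at least as large as graph->n_nodes + graph->n_leafs. My (unconfirmed) interpretation is that the compute graph built while the LoRA adapter is active contains more nodes than the scheduler reserved when the context was created, while the base-model graph stays within that budget - which would match base inference working and only LoRA inference aborting.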