Bug: RoPE Cache lobotomizes GLM on my setup #893

@kooshi

What happened?

Problem: GLM-4.6 output is complete gibberish, mostly spaces and punctuation. GLM-4.5-Air outputs words but quickly devolves into repetition.
Adding --no-rope-cache fixes this.
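
For context on what --no-rope-cache toggles: a RoPE cache precomputes the sin/cos rotation table once per context instead of recomputing angles every token. Below is a minimal sketch of that idea (hypothetical names and layout, not ik_llama.cpp's actual code); if such a table were built with the wrong frequency base, position offset, or dtype for a given model, attention would degrade into exactly this kind of gibberish.

// Illustrative sketch of a precomputed RoPE sin/cos cache (hypothetical
// names, not the project's real implementation). For every position p and
// rotary dimension pair i, the cache stores sin/cos of p * base^(-2i/d).
#include <cmath>
#include <cstddef>
#include <vector>

struct RopeCache {
    int   head_dim;              // rotary dimension d
    float freq_base;             // e.g. 10000.0f, model-dependent
    std::vector<float> sin_tab;  // [n_pos * head_dim / 2]
    std::vector<float> cos_tab;

    void build(int n_pos) {
        const int half = head_dim / 2;
        sin_tab.resize((size_t) n_pos * half);
        cos_tab.resize((size_t) n_pos * half);
        for (int p = 0; p < n_pos; ++p) {
            for (int i = 0; i < half; ++i) {
                // angle = p / freq_base^(2i/d)
                const float angle = p * std::pow(freq_base, -2.0f * i / head_dim);
                sin_tab[(size_t) p * half + i] = std::sin(angle);
                cos_tab[(size_t) p * half + i] = std::cos(angle);
            }
        }
    }

    // Rotate one (x0, x1) pair for position p, dimension pair i.
    void apply(int p, int i, float & x0, float & x1) const {
        const int   half = head_dim / 2;
        const float s = sin_tab[(size_t) p * half + i];
        const float c = cos_tab[(size_t) p * half + i];
        const float r0 = x0 * c - x1 * s;
        const float r1 = x0 * s + x1 * c;
        x0 = r0;
        x1 = r1;
    }
};

With --no-rope-cache the angles would presumably be recomputed on the fly each token, which sidesteps any mis-built or mis-indexed table.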

4.6 Model: https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/IQ3_KS
4.5 Air Model: https://huggingface.co/ubergarm/GLM-4.5-Air-GGUF/tree/main/IQ4_KSS

nvidia-smi:

$ nvidia-smi 
Mon Nov  3 14:47:11 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:04:00.0  On |                  N/A |
|  0%   51C    P5             42W /  420W |   22531MiB /  24576MiB |     11%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 5090        Off |   00000000:2B:00.0 Off |                  N/A |
|  0%   43C    P8             22W /  450W |   28411MiB /  32607MiB |     16%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 3090        Off |   00000000:2C:00.0 Off |                  N/A |
|  0%   29C    P8             23W /  420W |   18631MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

build config:

cmake -B build \
  -DCMAKE_CXX_FLAGS="-march=native -mtune=native -O3" \
  -DCMAKE_C_FLAGS="-march=native -mtune=native -O3" \
  -DGGML_NATIVE=ON \
  -DGGML_SCHED_MAX_COPIES=1 \
  -DGGML_CUDA=ON \
  -DBUILD_SHARED_LIBS=OFF \
  -DCMAKE_BUILD_TYPE=Release \
  -DGGML_CUDA_FA_ALL_QUANTS=ON \
  -DGGML_CUDA_F16=ON \
  -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 \
  -DCMAKE_CUDA_ARCHITECTURES="86;120"
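
-DCMAKE_CUDA_ARCHITECTURES="86;120" should cover both the Ampere 3090s (sm_86) and the Blackwell 5090 (sm_120). Since the problem appears on this mixed-arch box, it may be worth confirming each device reports the expected compute capability; a throwaway check using the CUDA runtime API (assumptions: nvcc on PATH, file saved as cc_check.cu):

// Standalone sanity check: print each CUDA device's compute capability
// so it can be compared against the arch list passed to CMake.
// Build with: nvcc -o cc_check cc_check.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess || n == 0) {
        std::fprintf(stderr, "no CUDA devices visible\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // prop.major/minor map to "sm_<major><minor>", e.g. 8.6 -> 86, 12.0 -> 120
        std::printf("device %d: %s (sm_%d%d)\n", i, prop.name, prop.major, prop.minor);
    }
    return 0;
}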

4.6 command:

llama-server \
  --no-display-prompt \
  --verbosity 0 \
  -mla 2 \
  -amb 512 \
  --port 9999 \
  --predict -1 \
  --n-gpu-layers 1000 \
  --main-gpu 0 \
  --parallel 1 \
  --no-warmup \
  --jinja \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --temp 0.6 \
  --min-p 0.1 \
  --presence-penalty 1.5 \
  --alias glm \
  -rtr \
  -b 512 \
  -c 100000 \
  --override-tensor "\.(([13467][0-9])|(2[0-7])|(5[012])|(8[01]))\..*exps=CPU" \
  --override-tensor "blk\.(1?|2|9)[0-9]\.=CUDA0" \
  --override-tensor "blk\.(3|4|5)[0-9]\.=CUDA1" \
  --override-tensor "blk\.(6|7|8)[0-9]\.=CUDA2" \
  --model /mnt/store/ai/GLM4.6/GLM-4.6-IQ3_KS-00001-of-00004.gguf
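
The --override-tensor patterns above decide which expert tensors stay on CPU and how blocks split across the three GPUs. A quick harness to check which block indices each pattern actually captures (a sketch assuming the part before '=' is applied with std::regex_search against tensor names, as in llama.cpp-style override-tensor handling; the tensor name used here is illustrative):

// Print the block indices matched by each --override-tensor pattern,
// scanning indices 0-99 against an illustrative expert-tensor name.
#include <cstdio>
#include <regex>
#include <string>

int main() {
    const std::string patterns[] = {
        R"(\.(([13467][0-9])|(2[0-7])|(5[012])|(8[01]))\..*exps)",  // -> CPU
        R"(blk\.(1?|2|9)[0-9]\.)",                                  // -> CUDA0
        R"(blk\.(3|4|5)[0-9]\.)",                                   // -> CUDA1
        R"(blk\.(6|7|8)[0-9]\.)",                                   // -> CUDA2
    };
    const char * targets[] = { "CPU", "CUDA0", "CUDA1", "CUDA2" };

    for (int p = 0; p < 4; ++p) {
        const std::regex re(patterns[p]);
        std::printf("%-5s:", targets[p]);
        for (int blk = 0; blk < 100; ++blk) {
            const std::string name = "blk." + std::to_string(blk) + ".ffn_gate_exps.weight";
            if (std::regex_search(name, re)) {
                std::printf(" %d", blk);
            }
        }
        std::printf("\n");
    }
    return 0;
}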

Name and Version

$ build/bin/llama-server --version
version: 3946 (1cfd198)
built with cc (GCC) 15.2.1 20250813 for x86_64-pc-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output
