What happened?
I'm not sure whether this is a parameter exclusive to DeepSeek. Either way, I disabled FA (flash attention), and context creation then requested a 7734 MiB compute buffer on device 0, which failed with an out-of-memory error:
llama_new_context_with_model: n_ctx = 20000
llama_new_context_with_model: n_batch = 1024
llama_new_context_with_model: n_ubatch = 1024
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: mla_attn = 0
llama_new_context_with_model: attn_max_b = 512
llama_new_context_with_model: fused_moe = 1
llama_new_context_with_model: fused_up_gate = 1
llama_new_context_with_model: ser = -1, 0
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 1093.75 MiB
llama_kv_cache_init: CUDA1 KV buffer size = 781.25 MiB
llama_kv_cache_init: CUDA2 KV buffer size = 703.12 MiB
llama_kv_cache_init: CUDA3 KV buffer size = 781.25 MiB
llama_kv_cache_init: CUDA4 KV buffer size = 781.25 MiB
llama_kv_cache_init: CUDA5 KV buffer size = 703.12 MiB
llama_kv_cache_init: CUDA6 KV buffer size = 781.25 MiB
llama_kv_cache_init: CUDA7 KV buffer size = 781.25 MiB
llama_kv_cache_init: CUDA8 KV buffer size = 312.50 MiB
llama_kv_cache_init: CUDA9 KV buffer size = 312.50 MiB
llama_kv_cache_init: CUDA10 KV buffer size = 234.38 MiB
llama_new_context_with_model: KV self size = 7265.62 MiB, K (f16): 3632.81 MiB, V (f16): 3632.81 MiB
llama_new_context_with_model: CUDA_Host output buffer size = 1.16 MiB
llama_new_context_with_model: pipeline parallelism enabled (n_copies=1)
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 7734.13 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 8109821952
llama_new_context_with_model: failed to allocate compute buffers
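
As a rough sanity check on these numbers (a back-of-envelope sketch, not how the allocator actually reserves memory): the per-token KV footprint follows directly from the logged KV self size, and without FA the KQ attention scores are materialized in f32, so the attention scratch scales with n_ctx * n_ubatch * n_head. The head count of 96 below is an assumption for illustration, not something taken from the log.

```python
# Rough sanity check of the logged sizes (pure arithmetic, no llama.cpp APIs).
MiB = 1024 * 1024

n_ctx    = 20000  # from the log
n_ubatch = 1024   # from the log

# KV cache: 7265.62 MiB total in f16, split evenly between K and V.
kv_total_bytes = 7265.62 * MiB
k_elems_per_token = kv_total_bytes / 2 / n_ctx / 2  # half is K, 2 bytes per f16
print(f"K elements per token: {k_elems_per_token:,.0f}")  # ~95,232 = n_layer * n_embd_k_gqa

# Without FA the KQ attention scores are stored in f32, so the attention
# scratch scales with n_ctx * n_ubatch * n_head.
# ASSUMPTION: n_head = 96 (not in the log; substitute your model's head count).
n_head = 96
kq_bytes = n_ctx * n_ubatch * n_head * 4  # 4 bytes per f32 score
print(f"KQ scores for one layer: {kq_bytes / MiB:,.0f} MiB")  # ~7,500 MiB
```

Under that assumption the single-layer KQ tensor alone is ~7,500 MiB, the same order as the 7734.13 MiB the allocator tries to reserve. With FA enabled the scores are computed in tiles rather than materialized at full n_ctx * n_ubatch size, which would explain why the compute buffer is so much smaller when flash_attn = 1.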
Name and Version
What operating system are you seeing the problem on?
Linux
Relevant log output