Bug: attention max batch option doesn't work for GLM 4.5 #755

@ghost

Description

What happened?

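I'm not sure whether this parameter is exclusive to DeepSeek. I disabled FA (flash attention) for GLM 4.5, and even with attn_max_b = 512 a compute buffer of more than 7 GB (7734.13 MiB on device 0) was requested, so the allocation failed with an out-of-memory error: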

llama_new_context_with_model: n_ctx      = 20000
llama_new_context_with_model: n_batch    = 1024
llama_new_context_with_model: n_ubatch   = 1024
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: mla_attn   = 0
llama_new_context_with_model: attn_max_b = 512
llama_new_context_with_model: fused_moe  = 1
llama_new_context_with_model: fused_up_gate = 1
llama_new_context_with_model: ser        = -1, 0
llama_new_context_with_model: freq_base  = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      CUDA0 KV buffer size =  1093.75 MiB
llama_kv_cache_init:      CUDA1 KV buffer size =   781.25 MiB
llama_kv_cache_init:      CUDA2 KV buffer size =   703.12 MiB
llama_kv_cache_init:      CUDA3 KV buffer size =   781.25 MiB
llama_kv_cache_init:      CUDA4 KV buffer size =   781.25 MiB
llama_kv_cache_init:      CUDA5 KV buffer size =   703.12 MiB
llama_kv_cache_init:      CUDA6 KV buffer size =   781.25 MiB
llama_kv_cache_init:      CUDA7 KV buffer size =   781.25 MiB
llama_kv_cache_init:      CUDA8 KV buffer size =   312.50 MiB
llama_kv_cache_init:      CUDA9 KV buffer size =   312.50 MiB
llama_kv_cache_init:     CUDA10 KV buffer size =   234.38 MiB
llama_new_context_with_model: KV self size  = 7265.62 MiB, K (f16): 3632.81 MiB, V (f16): 3632.81 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     1.16 MiB
llama_new_context_with_model: pipeline parallelism enabled (n_copies=1)
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 7734.13 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 8109821952
llama_new_context_with_model: failed to allocate compute buffers
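
To make the numbers in the log easier to compare, here is a small sketch that only re-does the arithmetic on values copied from the output above. It sums the per-device KV buffers and converts the failed allocation from bytes to MiB. The variable names are made up for illustration and are not part of llama.cpp.

```python
# Values copied verbatim from the log above; names are illustrative only.
MIB = 1024 * 1024

# Per-device KV buffer sizes in MiB (CUDA0 .. CUDA10).
kv_buffers_mib = [
    1093.75, 781.25, 703.12, 781.25, 781.25, 703.12,
    781.25, 781.25, 312.50, 312.50, 234.38,
]

# Size of the CUDA0 compute buffer that failed to allocate, in bytes.
failed_alloc_bytes = 8109821952

kv_total_mib = round(sum(kv_buffers_mib), 2)
failed_alloc_mib = round(failed_alloc_bytes / MIB, 2)

print(f"KV cache total:        {kv_total_mib:.2f} MiB")
print(f"Failed compute buffer: {failed_alloc_mib:.2f} MiB")
```

The KV total agrees with the reported "KV self size" of 7265.62 MiB, and the failed request converts to the 7734.13 MiB shown in the cudaMalloc error. The single compute buffer requested on device 0 is therefore larger than the KV cache spread across all 11 devices.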

Name and Version

3433c7b

What operating system are you seeing the problem on?

Linux
