Misc. bug: llama-fit-params stuck in infinite loop #18337

@jettoblack

Description

Name and Version

version: 7524 (5ee4e43)
built with GNU 13.3.0 for Linux x86_64

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

Other (Please specify in the next section)

Command line

llama-fit-params -hf unsloth/GLM-4.7-GGUF:IQ2_XXS -c 65536

Problem description & steps to reproduce

Reproduced on a host with 4x RTX 3090 24GB, 1x RTX 5060 Ti 16GB, and 64GB of host RAM.

Output without --verbose; this is where it gets stuck and stops logging:

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 5 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 2: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 3: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 4: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes
common_download_file_single_online: using cached file: /home/jasonl/.cache/llama.cpp/unsloth_GLM-4.7-GGUF_UD-IQ2_XXS_GLM-4.7-UD-IQ2_XXS-00001-of-00003.gguf
common_download_file_single_online: using cached file: /home/jasonl/.cache/llama.cpp/unsloth_GLM-4.7-GGUF_UD-IQ2_XXS_GLM-4.7-UD-IQ2_XXS-00002-of-00003.gguf
common_download_file_single_online: using cached file: /home/jasonl/.cache/llama.cpp/unsloth_GLM-4.7-GGUF_UD-IQ2_XXS_GLM-4.7-UD-IQ2_XXS-00003-of-00003.gguf
build: 7524 (5ee4e43f2) with GNU 13.3.0 for Linux x86_64
llama_params_fit_impl: projected memory use with initial parameters [MiB]:
llama_params_fit_impl:   - CUDA0 (NVIDIA GeForce RTX 3090)   :  24124 total,  27958 used,   4099 deficit
llama_params_fit_impl:   - CUDA1 (NVIDIA GeForce RTX 3090)   :  24124 total,  30479 used,   6620 deficit
llama_params_fit_impl:   - CUDA2 (NVIDIA GeForce RTX 3090)   :  24124 total,  29214 used,   5355 deficit
llama_params_fit_impl:   - CUDA3 (NVIDIA GeForce RTX 3090)   :  24115 total,  28974 used,   5131 deficit
llama_params_fit_impl:   - CUDA4 (NVIDIA GeForce RTX 5060 Ti):  15848 total,  18177 used,   2468 deficit
llama_params_fit_impl: projected to use 134804 MiB of device memory vs. 112335 MiB of free device memory
llama_params_fit_impl: cannot fulfill margin of 1024 MiB on all devices, need to use 28796 MiB less in total
llama_params_fit_impl: context size set by user to 65536 -> no change
llama_params_fit_impl: with only dense weights in device memory there is a total surplus of 71665 MiB
llama_params_fit_impl: filling dense-only layers back-to-front:
llama_params_fit_impl:   - CUDA4 (NVIDIA GeForce RTX 5060 Ti): 41 layers,  14446 MiB used,   1262 MiB free
llama_params_fit_impl:   - CUDA3 (NVIDIA GeForce RTX 3090)   : 53 layers,  18698 MiB used,   5143 MiB free
llama_params_fit_impl:   - CUDA2 (NVIDIA GeForce RTX 3090)   :  0 layers,      0 MiB used,  23859 MiB free
llama_params_fit_impl:   - CUDA1 (NVIDIA GeForce RTX 3090)   :  0 layers,      0 MiB used,  23859 MiB free
llama_params_fit_impl:   - CUDA0 (NVIDIA GeForce RTX 3090)   :  0 layers,   1032 MiB used,  22826 MiB free
llama_params_fit_impl: converting dense-only layers to full layers and filling them front-to-back with overflow to next device/system memory:
llama_params_fit_impl:   - CUDA0 (NVIDIA GeForce RTX 3090)   : 18 layers ( 1 overflowing),  22833 MiB used,   1025 MiB free
llama_params_fit_impl:   - CUDA1 (NVIDIA GeForce RTX 3090)   : 15 layers ( 1 overflowing),  22430 MiB used,   1428 MiB free
llama_params_fit_impl:   - CUDA2 (NVIDIA GeForce RTX 3090)   : 15 layers ( 1 overflowing),  22657 MiB used,   1201 MiB free
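To make the suspected failure mode concrete, here is a minimal, purely hypothetical C++ sketch of the kind of iterative fit loop the log above suggests (project per-device use, compute deficits against a margin, shave use until everything fits). This is not the real llama_params_fit_impl: the step size, the loop structure, and the exact deficit formula are all assumptions, and the device numbers are only loosely taken from the log. The point it illustrates is that if the reduction step ever stops making progress, a loop of this shape spins forever, which would match the observed hang.

```cpp
// Hypothetical sketch only -- NOT the actual llama_params_fit_impl.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

struct device_mem {
    int64_t free_mib; // free memory reported for the device (MiB)
    int64_t used_mib; // projected use with the current layer split (MiB)
};

int main() {
    const int64_t margin_mib = 1024; // per-device margin, as in the log above
    // Rough per-device numbers taken from the log (free, projected use):
    std::vector<device_mem> devs = {
        {24124, 27958}, {24124, 30479}, {24124, 29214}, {24115, 28974}, {15848, 18177},
    };

    int iter = 0;
    for (;;) {
        // A device is in deficit if projected use exceeds free memory minus margin.
        int64_t total_deficit = 0;
        for (const auto & d : devs) {
            total_deficit += std::max<int64_t>(0, d.used_mib - (d.free_mib - margin_mib));
        }
        if (total_deficit == 0) {
            break; // everything fits with the requested margin
        }

        // Shave projected use off the most overcommitted device. If this step
        // ever rounds to zero (or re-adds elsewhere what it removed here),
        // total_deficit stops shrinking and, without the guard below, the loop
        // never exits -- the failure mode consistent with the reported hang.
        auto worst = std::max_element(devs.begin(), devs.end(),
            [](const device_mem & a, const device_mem & b) {
                return a.used_mib - a.free_mib < b.used_mib - b.free_mib;
            });
        worst->used_mib -= std::min<int64_t>(512, worst->used_mib); // hypothetical step

        if (++iter > 1000) { // progress guard a robust implementation needs
            std::fprintf(stderr, "fit did not converge after %d iterations\n", iter);
            return 1;
        }
    }
    std::printf("converged after %d iterations\n", iter);
    return 0;
}
```

If the real loop has a similar shape, the question is which reduction step stops making progress for this particular 5-GPU split once the per-device 1024 MiB margin cannot be met.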

First Bad Commit

No response

Relevant log output

When running with --verbose, the output log grows seemingly forever, or at least for as long as I was willing to let it run. Here's ~200 MB worth, zipped:

[fit.txt.gz](https://github.com/user-attachments/files/24321672/fit.txt.gz)
