Conversation

@ggerganov

While working on the new Encoder-Decoder context, I noticed that the following use case crashes on master:

make -j && lldb -- ./bin/llama-cli -m ../models/google-t5-small/ggml-model-f16.gguf -p 'Translate from English to German: The house is wonderful.' -dev none

0.00.117.532 I load_tensors: loading model tensors, this can take a while... (mmap = true)
0.00.119.228 I load_tensors: offloading 6 repeating layers to GPU
0.00.119.229 I load_tensors: offloading output layer to GPU
0.00.119.229 I load_tensors: offloaded 7/7 layers to GPU
0.00.119.231 I load_tensors:  CPU_AARCH64 model buffer size =     0.00 MiB
0.00.119.231 I load_tensors:   CPU_Mapped model buffer size =   115.44 MiB
Process 69539 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000100c972fc libggml-cpu.dylib`ggml_backend_cpu_aarch64_buffer_set_tensor(buffer=0x0000600003f5c150, tensor=0x0000000100e88020, data=0x00000001059efa60, offset=0, size=512) at ggml-cpu-aarch64.cpp:4150:41
   4147	   GGML_ASSERT(size == ggml_nbytes(tensor));
   4148	
   4149	   auto tensor_traits = (ggml::cpu::aarch64::tensor_traits_base *) tensor->extra;
-> 4150	   auto OK            = tensor_traits->repack(tensor, data, size);
   4151	
   4152	   GGML_ASSERT(OK == 0);
   4153	   GGML_UNUSED(buffer);
Target 0: (llama-cli) stopped.
(lldb) print *tensor
(ggml_tensor) {
  type = GGML_TYPE_F16
  buffer = 0x0000600003f5c150
  ne = ([0] = 8, [1] = 32, [2] = 1, [3] = 1)
  nb = ([0] = 2, [1] = 16, [2] = 512, [3] = 512)
  op = GGML_OP_NONE
  op_params = {
    [0] = 0
    [1] = 0
    [2] = 0
    [3] = 0
    [4] = 0
    [5] = 0
    [6] = 0
    [7] = 0
    [8] = 0
    [9] = 0
    [10] = 0
    [11] = 0
    [12] = 0
    [13] = 0
    [14] = 0
    [15] = 0
  }
  flags = 0
  src = {
    [0] = nullptr
    [1] = nullptr
    [2] = nullptr
    [3] = nullptr
    [4] = nullptr
    [5] = nullptr
    [6] = nullptr
    [7] = nullptr
    [8] = nullptr
    [9] = nullptr
  }
  view_src = nullptr
  view_offs = 0
  data = 0x0000000100e98000
  name = "dec.blk.0.cross_attn_rel_b.weight"
  extra = 0x0000000000000000
  padding = ""
}

Note the extra = 0x0000000000000000 field in the dump above: ggml_backend_cpu_aarch64_buffer_set_tensor dereferences tensor->extra at ggml-cpu-aarch64.cpp:4150, which is the null-pointer access that stops the process. The current logic tries to assign the LLM_TENSOR_DEC_CROSS_ATTN_REL_B tensor to the AARCH64 buffer type because its tensor-info op is set to GGML_OP_NONE:

    // this tensor is loaded for T5, but never used
    {LLM_TENSOR_DEC_CROSS_ATTN_REL_B,       {LLM_TENSOR_LAYER_REPEATING, GGML_OP_NONE}},
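For contrast, entries in this table normally pair a tensor with the operation that consumes it, which is what the buffer-type selection probes device support with. Two illustrative entries (their exact form here is an assumption about the table's shape, not quoted from the source):

    {LLM_TENSOR_TOKEN_EMBD,                 {LLM_TENSOR_LAYER_INPUT,     GGML_OP_GET_ROWS}},
    {LLM_TENSOR_ATTN_Q,                     {LLM_TENSOR_LAYER_REPEATING, GGML_OP_MUL_MAT}},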

With the patch in this PR, all such tensors will now be assigned to the host buffer type and a warning will be printed:

0.00.127.598 W tensor dec.blk.0.cross_attn_rel_b.weight has no operation assigned, using host buffer
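A minimal sketch of the selection step with this patch applied, assuming a helper of roughly this shape; the function name and signature are illustrative, and only ggml_backend_dev_host_buffer_type is an actual ggml API call:

#include <cstdio>

#include "ggml.h"
#include "ggml-backend.h"

// illustrative helper: pick a buffer type for a model weight; with the patch,
// weights whose op is GGML_OP_NONE are parked in a host buffer instead of
// being claimed by an "extra" buffer type such as CPU_AARCH64
static ggml_backend_buffer_type_t select_weight_buft_sketch(
        const ggml_tensor *        tensor,
        ggml_op                    op,
        ggml_backend_dev_t         dev,
        ggml_backend_buffer_type_t default_buft) {
    if (op == GGML_OP_NONE) {
        // nothing ever consumes this weight, so there is no op to probe
        // device support with; warn and fall back to the device's host buffer
        fprintf(stderr, "W tensor %s has no operation assigned, using host buffer\n", tensor->name);
        ggml_backend_buffer_type_t host_buft = ggml_backend_dev_host_buffer_type(dev);
        return host_buft != nullptr ? host_buft : default_buft;
    }
    // normal path: probe candidate buffer types with the tensor's actual op
    return default_buft;
}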

ggerganov requested a review from slaren February 21, 2025 15:48

slaren commented Feb 21, 2025

We could skip loading unused tensors entirely:

diff --git a/src/llama-model.cpp b/src/llama-model.cpp
index 0f4b62c43..55796dd7d 100644
--- a/src/llama-model.cpp
+++ b/src/llama-model.cpp
@@ -1424,6 +1424,12 @@ bool llama_model::load_tensors(llama_model_loader & ml) {
                 throw std::runtime_error(format("missing tensor info mapping for %s", tn.str().c_str()));
             }

+            // skip unused tensors
+            if (info.op == GGML_OP_NONE) {
+                LLAMA_LOG_WARN("model has unused tensor %s\n", tn.str().c_str());
+                return nullptr;
+            }
+
             // tensors with "bias" suffix are always used with GGML_OP_ADD
             ggml_op op;
             bool bias = tn.suffix != nullptr && strcmp(tn.suffix, "bias") == 0;

Might also need to increase ml.n_created, so that the skipped tensors are not reported as missing at the end of loading.
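For reference, a sketch of that accounting folded into the hunk above; whether the merged commit does exactly this is an assumption, and ml.n_created is taken from the comment rather than verified against the source:

+            // skip unused tensors
+            if (info.op == GGML_OP_NONE) {
+                LLAMA_LOG_WARN("model has unused tensor %s\n", tn.str().c_str());
+
+                // count the skipped tensor as handled, so the loader's final
+                // created-vs-expected tensor check does not fail
+                ml.n_created++;
+
+                return nullptr;
+            }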

ggerganov merged commit 51f311e into master Feb 21, 2025
52 checks passed
ggerganov deleted the gg/enc-dev-fix branch February 21, 2025 16:33
ggerganov changed the title from "llama : assign unknown/unused tensors to host buffer type" to "llama : skip loading unused tensors" Feb 21, 2025