Refactor llama-model.cpp #16252

pwilkin · 2025-09-25T13:42:08Z

@ggerganov I know you said you were planning to do it, but honestly it's been a nightmare working on all the model implementations with the huge llama-model.cpp, so I wanted to just get the "easy" albeit tedious part out of the way. Moved all llm_build_* definitions to their separate class files in src/models/

CISC · 2025-09-25T14:02:19Z

This is a nightmare to review and rebase until merged though, also you seem to have pushed the same changes to your Qwen3-Next PR?

pwilkin · 2025-09-25T14:48:32Z

@CISC Ye, need them for working there, but I'll revert once I'm done (unless this is done first and I can merge on top).

I know it's a nightmare, I already kind of went through it when I asked an LLM to automate some tasks and it proceeded merrily ripping out methods just because some classes didn't inherit from llm_graph_context :>

If you want, I can write a script that runs tree-sitter on the original definitions in llama-model.cpp vs the new classes and shows any differences to verify that nothing was accidentally lost.

jacekpoplawski · 2025-09-25T15:23:26Z

Maybe it would be easier to refactor that partially, just a subset of the models?

ngxson · 2025-09-29T03:50:49Z

This seems to be a good change, just have some other ideas:

I think for now all models can share one single .h file. The main .cpp implementation can be split into smaller files as proposed here
Our naming convention uses - instead of _. Maybe just src/models/(model-name).cpp is enough, no need the llm_build prefix. For example: src/models/gpt2.cpp

pwilkin · 2025-10-24T13:41:53Z

All right, I've done the changes suggested by @ngxson - renamed the files to just .cpp and merged all the header files into one models.h. I've also merged all the current changes from master.

Could we possibly move this forward somehow? This isn't going to get any easier any time soon... @ggerganov ?

ggerganov · 2025-10-24T13:47:47Z

Yes, thanks for the help. I will take a look soon at the Qwen 3 Next PR and also this reorganization here. Likely over the weekend.

ngxson · 2025-10-24T21:39:00Z

The diff is too big so I had a look at some files under your fork instead.

It looks good overall, just a nitpick stuff: Seems like many files starts with this pattern:

(maybe one empty line)
#include "models.h"
(some empty line)
...

While it's cleaner to be:

#include "models.h"
(one empty line)
...

Should be a real quick regex replace to change

pwilkin · 2025-10-24T22:10:21Z

@ngxson aye, done.

NeoZhangJianyu · 2025-10-28T01:38:59Z

@pwilkin
There are 117 changed files in this PR.
You need to rebase this PR to show clear code change.

Thank you!

pwilkin · 2025-10-28T17:05:33Z

@pwilkin There are 117 changed files in this PR. You need to rebase this PR to show clear code change.

Thank you!

Sorry, I don't understand.

Yeah, there are 117 changed files in this PR because it's a refactoring - it extracts the builders from llama-model.cpp to separate files.

I don't think the exact architecture of the PR matters - GitHub will always show the final changes in the PR.

0cc4m · 2025-10-28T17:12:38Z

No, you have changes that don't belong here, from PRs that have been merged recently. This happens occasionally, not sure what exactly causes it. Rebasing your branch should fix it.

pwilkin · 2025-10-28T17:15:29Z

No, you have changes that don't belong here, from PRs that have been merged recently. This happens occasionally, not sure what exactly causes it. Rebasing your branch should fix it.

Oh, that. Yeah, that's some weird byproduct of merging the current master. I'll try to rebase instead.

pwilkin · 2025-10-28T17:45:53Z

@NeoZhangJianyu @0cc4m did a clean rebase

pwilkin · 2025-10-29T12:02:58Z

@ggerganov @CISC turns out that formatting those calls is exactly the perfect mindless task to do during a Teams call at work ;)

Should be good now.

CISC · 2025-10-29T12:15:21Z

@ggerganov @CISC turns out that formatting those calls is exactly the perfect mindless task to do during a Teams call at work ;)

Should be good now.

Quickly glanced through them, looks like you missed a few (I take offense sir! :D ):

bailingmoe2
grovemoe
llada

CISC

Is it possible to create a merged diff of all the models so we can verify that nothing got left out?

pwilkin · 2025-10-29T13:12:13Z

Is it possible to create a merged diff of all the models so we can verify that nothing got left out?

Working on it.

pwilkin · 2025-10-29T15:19:36Z

Aight, was a bit annoying getting an LLM to generate all the diffs properly but I think this is it.
llm_build.patch

CISC · 2025-10-29T18:56:44Z

Aight, was a bit annoying getting an LLM to generate all the diffs properly but I think this is it.

I'm not sure that is reliable, who knows what it did, it even messed up the diff several places. :)

I think the only way to get a fairly usable diff is to run both through clang-format and merge all the files in the same order as the original.

pwilkin · 2025-10-29T21:50:23Z

I think the only way to get a fairly usable diff is to run both through clang-format and merge all the files in the same order as the original.

That won't do it as well, you're going to run into a nesting issue. Since the refactoring splits code into header and definition, you'll have level 2 bracketing (class -> method) vs level 1 bracketing (class::method)

pwilkin · 2025-10-31T17:41:51Z

@CISC @ggerganov I fixed the script to properly report diffs, removed all the extra semicolons it reported. We're down to the following, which I think are innocent:

--- reference/llm_build_bitnet::llm_graph_context (clang-formatted)
+++ src/models/llm_build_bitnet::llm_graph_context (clang-formatted)
@@ -68,7 +68,7 @@
       if (model.layers[il].bo) {
         cur = ggml_add(ctx0, cur, model.layers[il].bo);
       }
-      cb(cur, "attn_o_out", il);
+      cb(cur, "attn_out", il);
     }
     if (il == n_layer - 1 && inp_out_ids) {
       cur = ggml_get_rows(ctx0, cur, inp_out_ids);

--- reference/llm_build_lfm2::build_attn_block (clang-formatted)
+++ src/models/llm_build_lfm2::build_attn_block (clang-formatted)
@@ -1,7 +1,7 @@
 {
   GGML_ASSERT(hparams.n_embd_v_gqa(il) == hparams.n_embd_k_gqa(il));
-  auto const n_embd_head = hparams.n_embd_head_v;
-  auto const n_head_kv = hparams.n_head_kv(il);
+  const auto n_embd_head = hparams.n_embd_head_v;
+  const auto n_head_kv = hparams.n_head_kv(il);
   auto *q = build_lora_mm(model.layers[il].wq, cur);
   cb(q, "model.layers.{}.self_attn.q_proj", il);
   auto *k = build_lora_mm(model.layers[il].wk, cur);

--- reference/llm_build_lfm2::build_shortconv_block (clang-formatted)
+++ src/models/llm_build_lfm2::build_shortconv_block (clang-formatted)
@@ -15,7 +15,7 @@
   cb(bcx, "model.layers.{}.conv.in_proj", il);
   constexpr auto n_chunks = 3;
   GGML_ASSERT(bcx->ne[0] % n_chunks == 0);
-  auto const chunk_size = bcx->ne[0] / n_chunks;
+  const auto chunk_size = bcx->ne[0] / n_chunks;
   auto *b =
       ggml_view_3d(ctx0, bcx, chunk_size, bcx->ne[1], bcx->ne[2], bcx->nb[1],
                    bcx->nb[2], 0 * chunk_size * ggml_element_size(bcx));

I think this is ready to merge, let's do it before another model gets added to llama-model.cpp and I have to rebase again :P

CISC · 2025-10-31T20:26:27Z

I think this is ready to merge, let's do it before another model gets added to llama-model.cpp and I have to rebase again :P

Ooops, too late. :P

Mind fixing the last little nits while you're at it, then we'll merge.

pwilkin · 2025-10-31T21:36:09Z

Mind fixing the last little nits while you're at it, then we'll merge.

Meaning those three diffs I just posted? I think all of them are correct fixes in context, so I'd keep them tbh. But if you want I can add them too, just for completeness' sake (though reverting the "attn_o_out" seems weird).

pwilkin · 2025-10-31T21:40:20Z

Minimax is in.

CISC · 2025-10-31T22:27:21Z

Mind fixing the last little nits while you're at it, then we'll merge.

Meaning those three diffs I just posted? I think all of them are correct fixes in context, so I'd keep them tbh. But if you want I can add them too, just for completeness' sake (though reverting the "attn_o_out" seems weird).

Yeah, just because this is a refactor and not changing more than necessary, but nvm.

src/models/models.h

Co-authored-by: Sigbjørn Skjæret <[email protected]>

CISC · 2025-11-01T09:50:46Z

@pwilkin Great job BTW, this was a massive undertaking!

pwilkin · 2025-11-01T10:59:14Z

@pwilkin Great job BTW, this was a massive undertaking!

Yeah I think this was the hardest one of the split tasks :)

pwilkin requested review from CISC and ggerganov as code owners September 25, 2025 13:42

ngxson mentioned this pull request Sep 29, 2025

Model: Qwen3 Next #16095

Merged

pwilkin requested review from 0cc4m and slaren as code owners October 25, 2025 16:50

github-actions bot added testing Everything test related Nvidia GPU Issues specific to Nvidia GPUs Vulkan Issues specific to the Vulkan backend python python script changes ggml changes relating to the ggml tensor library for machine learning labels Oct 25, 2025

pwilkin requested a review from JohannesGaessler as a code owner October 27, 2025 12:33

github-actions bot added the SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language label Oct 27, 2025

pwilkin force-pushed the llama-cpp-refactor branch from 0d6329d to dfd84ad Compare October 28, 2025 17:44

CISC removed testing Everything test related Nvidia GPU Issues specific to Nvidia GPUs Vulkan Issues specific to the Vulkan backend python python script changes labels Oct 28, 2025

pwilkin force-pushed the llama-cpp-refactor branch 2 times, most recently from 7f82b27 to 803221e Compare October 29, 2025 11:21

CISC approved these changes Oct 29, 2025

View reviewed changes

pwilkin force-pushed the llama-cpp-refactor branch from 39bca8b to c8c8fc7 Compare October 31, 2025 13:08

DajanaV mentioned this pull request Oct 31, 2025

UPSTREAM PR #16252: Refactor llama-model.cpp auroralabs-loci/llama.cpp#19

Closed

pwilkin added 8 commits October 31, 2025 22:39

Sqashed: llama-model.cpp refactoring

0f72889

Fix formatting of attn / ffn / ffn_moe calls

75dc8f6

Fix import regression / unify spacing in models.h

11522ce

totally DID NOT miss those!

857188c

Add missing qwen3vl(moe) models

f447101

Add missing new .cpp files to build

fd0921f

Remove extra semicolons

bafee3b

Editor checker

a81302f

pwilkin force-pushed the llama-cpp-refactor branch from 934128c to a81302f Compare October 31, 2025 21:39

CISC reviewed Oct 31, 2025

View reviewed changes

src/models/models.h Show resolved Hide resolved

Update src/models/models.h [no ci]

7786d4e

Co-authored-by: Sigbjørn Skjæret <[email protected]>

CISC merged commit bea0452 into ggml-org:master Oct 31, 2025
1 check passed

Refactor llama-model.cpp #16252

Refactor llama-model.cpp #16252

Uh oh!

Conversation

pwilkin commented Sep 25, 2025

Uh oh!

CISC commented Sep 25, 2025

Uh oh!

pwilkin commented Sep 25, 2025

Uh oh!

jacekpoplawski commented Sep 25, 2025

Uh oh!

ngxson commented Sep 29, 2025

Uh oh!

pwilkin commented Oct 24, 2025

Uh oh!

ggerganov commented Oct 24, 2025

Uh oh!

ngxson commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pwilkin commented Oct 24, 2025

Uh oh!

NeoZhangJianyu commented Oct 28, 2025

Uh oh!

pwilkin commented Oct 28, 2025

Uh oh!

0cc4m commented Oct 28, 2025

Uh oh!

pwilkin commented Oct 28, 2025

Uh oh!

pwilkin commented Oct 28, 2025

Uh oh!

pwilkin commented Oct 29, 2025

Uh oh!

CISC commented Oct 29, 2025

Uh oh!

CISC left a comment

Choose a reason for hiding this comment

Uh oh!

pwilkin commented Oct 29, 2025

Uh oh!

pwilkin commented Oct 29, 2025

Uh oh!

CISC commented Oct 29, 2025

Uh oh!

pwilkin commented Oct 29, 2025

Uh oh!

pwilkin commented Oct 31, 2025

Uh oh!

CISC commented Oct 31, 2025

Uh oh!

pwilkin commented Oct 31, 2025

Uh oh!

pwilkin commented Oct 31, 2025

Uh oh!

CISC commented Oct 31, 2025

Uh oh!

Uh oh!

Uh oh!

CISC commented Nov 1, 2025

Uh oh!

pwilkin commented Nov 1, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

ngxson commented Oct 24, 2025 •

edited

Loading