Conversation

@pwilkin (Collaborator) commented Sep 25, 2025

@ggerganov I know you said you were planning to do it, but honestly it's been a nightmare working on all the model implementations with the huge llama-model.cpp, so I wanted to just get the "easy", albeit tedious, part out of the way. I've moved all llm_build_* definitions to separate class files in src/models/.

@CISC (Collaborator) commented Sep 25, 2025

This is going to be a nightmare to review, and to rebase until it's merged, though. Also, you seem to have pushed the same changes to your Qwen3-Next PR?

@pwilkin (Collaborator, Author) commented Sep 25, 2025

@CISC Yeah, I need them for working there, but I'll revert once I'm done (unless this PR is merged first and I can merge on top).

I know it's a nightmare; I already kind of went through it myself when I asked an LLM to automate some of the tasks and it proceeded to merrily rip out methods just because some classes didn't inherit from llm_graph_context :>

If you want, I can write a script that runs tree-sitter on the original definitions in llama-model.cpp vs the new classes and shows any differences to verify that nothing was accidentally lost.
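For example, a minimal sketch of such a check in Python (a sketch only: it assumes the tree_sitter and tree_sitter_cpp packages with the 0.22+ API, and the pre-refactor snapshot path is illustrative):

from pathlib import Path
import re

import tree_sitter_cpp
from tree_sitter import Language, Parser

# Parse C++ with the tree-sitter grammar (tree_sitter >= 0.22 API).
parser = Parser(Language(tree_sitter_cpp.language()))

def collect_builders(path: Path) -> dict[str, str]:
    # Map each llm_build_* definition in the file to its
    # whitespace-normalized source text.
    tree = parser.parse(path.read_bytes())
    found = {}
    stack = [tree.root_node]
    while stack:
        node = stack.pop()
        if node.type == "function_definition":
            text = node.text.decode()
            m = re.search(r"llm_build_\w+", text)
            if m:
                found[m.group(0)] = re.sub(r"\s+", " ", text)
        stack.extend(node.children)
    return found

# "old/llama-model.cpp" stands in for a pre-refactor snapshot of the file.
old = collect_builders(Path("old/llama-model.cpp"))
new: dict[str, str] = {}
for f in Path("src/models").glob("*.cpp"):
    new.update(collect_builders(f))

for name in sorted(old):
    if name not in new:
        print(f"MISSING: {name}")
    elif new[name] != old[name]:
        print(f"CHANGED: {name}")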

@jacekpoplawski (Contributor)

Maybe it would be easier to refactor that partially, just a subset of the models?

ngxson mentioned this pull request on Sep 29, 2025.

@ngxson (Collaborator) commented Sep 29, 2025

This seems to be a good change; I just have some other ideas:

  1. I think for now all models can share one single .h file. The main .cpp implementation can be split into smaller files as proposed here.
  2. Our naming convention uses - instead of _. Maybe just src/models/(model-name).cpp is enough; there's no need for the llm_build prefix. For example: src/models/gpt2.cpp

@pwilkin (Collaborator, Author) commented Oct 24, 2025

All right, I've made the changes suggested by @ngxson: renamed the files to just .cpp and merged all the header files into one models.h. I've also merged in all the current changes from master.

Could we possibly move this forward somehow? This isn't going to get any easier any time soon... @ggerganov?

@ggerganov (Member)

Yes, thanks for the help. I will take a look soon at the Qwen3-Next PR and also at this reorganization here. Likely over the weekend.

@ngxson (Collaborator) commented Oct 24, 2025

The diff is too big, so I had a look at some files in your fork instead.

It looks good overall, just one nitpick: many files seem to start with this pattern:

(maybe one empty line)
#include "models.h"
(some empty lines)
...

while it would be cleaner as:

#include "models.h"
(one empty line)
...

Should be a quick regex replace to fix.
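Something along these lines, for instance (a sketch in Python rather than a bare regex; it assumes the files live under src/models/):

import re
from pathlib import Path

for f in Path("src/models").glob("*.cpp"):
    text = f.read_text()
    # Drop any blank lines before the first include...
    text = re.sub(r'\A\n+', '', text)
    # ...and collapse the blank run after the include to exactly one line.
    text = re.sub(r'(#include "models\.h"\n)\n+', r'\1\n', text)
    f.write_text(text)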

@pwilkin (Collaborator, Author) commented Oct 24, 2025

@ngxson aye, done.

pwilkin requested review from 0cc4m and slaren as code owners on October 25, 2025.
@NeoZhangJianyu (Collaborator)

@pwilkin
There are 117 changed files in this PR.
You need to rebase this PR to show a clean code change.

Thank you!

@pwilkin (Collaborator, Author) commented Oct 28, 2025

> @pwilkin There are 117 changed files in this PR. You need to rebase this PR to show a clean code change.
>
> Thank you!

Sorry, I don't understand.

Yeah, there are 117 changed files in this PR because it's a refactoring: it extracts the builders from llama-model.cpp into separate files.

I don't think the exact structure of the PR matters; GitHub will always show the final changes in the PR.

@0cc4m (Collaborator) commented Oct 28, 2025

No, you have changes that don't belong here, from PRs that have been merged recently. This happens occasionally, not sure what exactly causes it. Rebasing your branch should fix it.

@pwilkin (Collaborator, Author) commented Oct 28, 2025

> No, you have changes that don't belong here, from PRs that have been merged recently. This happens occasionally, not sure what exactly causes it. Rebasing your branch should fix it.

Oh, that. Yeah, that's some weird byproduct of merging the current master. I'll try to rebase instead.

pwilkin force-pushed the llama-cpp-refactor branch from 0d6329d to dfd84ad on October 28, 2025.

@pwilkin (Collaborator, Author) commented Oct 28, 2025

@NeoZhangJianyu @0cc4m did a clean rebase

CISC added the model and refactoring labels and removed the testing, Nvidia GPU, Vulkan, python, ggml, and SYCL labels on Oct 28, 2025.
@JohannesGaessler (Collaborator)

> merging the current master

My recommendation would be to always rebase and/or cherry-pick instead of merging.
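For example (a sketch; the upstream remote name is an assumption):

# Replay the branch's own commits on top of the latest upstream master,
# instead of pulling master in via a merge commit.
git fetch upstream
git rebase upstream/master
git push --force-with-lease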

@ggerganov (Member)

Should be OK to merge this now?

@CISC (Collaborator) commented Oct 28, 2025

> Should be OK to merge this now?

There are some whitespace inconsistencies (for example in models.h, most prototypes are clustered together, but some are separated). Also, I'm not sure I like the new formatting of the build_xxx parameters; the old clusterings were easier to parse by sight.

Review comment on lines +94 to +97:
ggml_tensor * moe_out = build_moe_ffn(
cur, model.layers[il].ffn_gate_inp, model.layers[il].ffn_up_exps, model.layers[il].ffn_gate_exps,
model.layers[il].ffn_down_exps, model.layers[il].ffn_exp_probs_b, n_expert, n_expert_used, LLM_FFN_SILU,
true, false, 0.0, LLAMA_EXPERT_GATING_FUNC_TYPE_SOFTMAX, il);

A Member replied:

Yes, the formatting of these calls should be preserved.
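For reference, a sketch of the kind of clustered, one-argument-per-line layout being referred to (the exact original formatting is an assumption):

ggml_tensor * moe_out = build_moe_ffn(cur,
        model.layers[il].ffn_gate_inp,
        model.layers[il].ffn_up_exps,
        model.layers[il].ffn_gate_exps,
        model.layers[il].ffn_down_exps,
        model.layers[il].ffn_exp_probs_b,
        n_expert, n_expert_used,
        LLM_FFN_SILU, true,
        false, 0.0,
        LLAMA_EXPERT_GATING_FUNC_TYPE_SOFTMAX,
        il);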
