@gabriellarson
Contributor

This pull request adds initial support for Apple's new DiffuCoder model.

I'll be uploading the F16 gguf to https://huggingface.co/gabriellarson/DiffuCoder-7B-cpGRPO-GGUF

Comment on lines +3160 to +3170
# Read merges.txt
merges_file = self.dir_model / "merges.txt"
merges = []
with open(merges_file, 'r', encoding='utf-8') as f:
    for i, line in enumerate(f):
        line = line.strip()
        if i == 0 and line.startswith("#version:"):
            continue
        if not line:
            continue
        merges.append(line)
Collaborator

Why is this needed? SpecialVocab should be able to handle this.
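
For context, the snippet under review amounts to the standalone helper sketched below. The function name `read_merges` and the `dir_model` parameter are hypothetical, introduced only for illustration. The reviewer's point is that this parsing duplicates work that gguf-py's `SpecialVocab` is expected to do already (to my knowledge the converter scripts construct it with `load_merges=True`), so the sketch shows what would be replaced, not a recommended approach.

```python
from pathlib import Path


def read_merges(dir_model: Path) -> list[str]:
    """Return the BPE merge rules from dir_model/merges.txt, one string per rule.

    Hypothetical helper mirroring the snippet under review; not part of llama.cpp.
    """
    merges: list[str] = []
    with open(dir_model / "merges.txt", "r", encoding="utf-8") as f:
        for i, line in enumerate(f):
            line = line.strip()
            # The first line is often a "#version: ..." header, not a merge rule.
            if i == 0 and line.startswith("#version:"):
                continue
            # Skip blank lines.
            if not line:
                continue
            merges.append(line)
    return merges
```

If the generic vocab loader already covers `merges.txt`, a model-specific copy of this loop is redundant, which is presumably why the reviewer asks for a justification.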

            layer.ffn_up_exps = create_tensor(tn(LLM_TENSOR_FFN_UP_EXPS, "weight", i), { n_embd, n_ff_exp, n_expert}, 0);
        }
    } break;
case LLM_ARCH_DREAM:
Collaborator

This looks (unsurprisingly) identical to Qwen2, I don't think it deserves the duplication.

@CISC
Collaborator

CISC commented Jul 2, 2025

Also really puzzled by the fact that it's just a finetuned Qwen2.5 model while at the same time being based on Dream which claims to be a diffusion model?!

@github-actions bot added the "python" (python script changes) label Jul 2, 2025
@jacekpoplawski
Contributor

> Also really puzzled by the fact that it's just a finetuned Qwen2.5 model while at the same time being based on Dream which claims to be a diffusion model?!

According to the paper (PDF), they started from Qwen, but it's not just a finetune.

@gabriellarson
Contributor Author

Yeah, I was up too late and skimmed through the DiffuCoder paper too fast. The Hugging Face model tree and section 3 of the paper made it seem like a normal Qwen2.5 finetune. I'll close this request for now.
