@Inf1delis (Contributor) commented Dec 14, 2024

Self-reported review complexity:

  • Medium

The PR adds support for DeepSeek MoE v1 models (Base and Instruct) and for the new GigaChat models (Base and Instruct). Since GigaChat is based on the DeepSeek MoE v1 architecture, the GigaChat-specific changes are limited to the tokenizer.

@github-actions bot added the testing (Everything test related) and python (python script changes) labels Dec 14, 2024
@Inf1delis (Contributor, Author)

@ggerganov Hi! I think this PR is ready. Could you take a look?

@ggerganov (Member) left a comment

Minor suggestions: move the new DS code so that it is located before the DS2 code.



@Model.register("DeepseekForCausalLM")
class DeepseekModel(Model):
Member

Move this before DeepseekV2Model above.

src/llama.cpp (outdated)
}
}
} break;
case LLM_ARCH_DEEPSEEK:
Member

Move this before case LLM_ARCH_DEEPSEEK2 above.

@Inf1delis requested a review from @ggerganov December 15, 2024 13:29
@Inf1delis (Contributor, Author)

Thank you for your suggestions! I hadn't noticed that.
The changes have been made: the new DS code is now placed before DeepseekV2Model and before case LLM_ARCH_DEEPSEEK2.

@ggerganov merged commit a097415 into ggml-org:master Dec 15, 2024
1 check passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024
* Add deepseek v1 arch & gigachat template

* improve template code

* add readme

* delete comments

* remove comment

* fix format

* lint llama.cpp

* fix order of deepseek and deepseek2, move gigachat template to the end of func

* fix order of deepseek and deepseek2 in constants; mark shared exp as deepseek arch need

* remove comments

* move deepseek above deepseek2

* change placement of gigachat chat template
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025