Commit d590cd4
model : Granite MoE shared (ggml-org#13269)
* feat: Add GGUF conversion for granitemoeshared
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <[email protected]>
* feat: hparam and arch plumbing for granitemoeshared
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <[email protected]>
* fix: Split MoE fused tensors for shared experts in conversion
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <[email protected]>
* feat: First WIP cut at model arch in cpp
The hparam and architecture plumbing should be correct, but the
implementation of the shared experts seems to still be broken.
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <[email protected]>
* fix: Cleaner (maybe more correct?) splitting for gate/up
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <[email protected]>
* fix: Fix the input to the shared experts
I had missed that the shared experts take their input from _before_ the standard
MoE layer, and was feeding the output of the MoE to the shared experts instead.
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <[email protected]>
* fix: Avoid architecture-specific checks for Granite MoE Shared
This is a cleaner way that will allow more flexibility in architecture
strings going forward.
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <[email protected]>
* refactor: Split granite architectures out of llm_build_llama
This helps de-clutter the llama-family graph construction and allows
granite to diverge further (in preparation for Granite 4).
NOTE: I removed the granite scale factors from llm_build_deci because they
appear to only be there as copy-paste from llm_build_llama. The HF config
does not seem to set those values:
https://huggingface.co/Deci/DeciLM-7B/blob/main/config.json
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <[email protected]>
* fix: Fix compiler warning about uninitialized inp_pos
This should not have been reachable, but it warns on some compilers.
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <[email protected]>
* fix: Consolidate GraniteMoEShared into GraniteMoE for conversion
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <[email protected]>
* fix: Consolidate GraniteMoEShared into GraniteMoE on the c++ side
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <[email protected]>
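The conversion fixes above concern splitting a fused shared-expert tensor into separate gate and up tensors. The following is a minimal sketch of that idea, not the code from this commit. It assumes the fused weight stacks the gate rows above the up rows, and the names `split_fused_gate_up` and `shared_intermediate_size` are hypothetical, chosen for illustration only.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_fused_gate_up(fused: Matrix, shared_intermediate_size: int) -> Tuple[Matrix, Matrix]:
    """Split a fused [2 * ffn_dim, hidden] weight into (gate, up) halves.

    Assumption (not confirmed by the commit): rows [0, ffn_dim) hold the
    gate projection and rows [ffn_dim, 2 * ffn_dim) hold the up projection.
    """
    ffn_dim = shared_intermediate_size
    if len(fused) != 2 * ffn_dim:
        # Fail loudly rather than silently producing mis-sized tensors.
        raise ValueError(f"expected {2 * ffn_dim} rows, got {len(fused)}")
    gate = [row[:] for row in fused[:ffn_dim]]  # copy the first half
    up = [row[:] for row in fused[ffn_dim:]]    # copy the second half
    return gate, up


# Toy fused weight with ffn_dim = 2 and hidden size = 2.
fused = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
gate, up = split_fused_gate_up(fused, shared_intermediate_size=2)
```

Checking the row count before slicing catches a wrong intermediate size at conversion time instead of at model load.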
---------
Signed-off-by: Gabe Goodhart <[email protected]>
1 parent 1e2809b commit d590cd4
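The input fix described in the message (shared experts read the layer input, not the routed MoE output) can be illustrated with a small sketch. This is not the C++ graph code from the commit. It assumes the routed and shared outputs are summed, and it treats the shared expert as optional so the caller decides from a hyperparameter rather than an architecture string. All names here are hypothetical.

```python
from typing import Callable, List, Optional

Vector = List[float]
Layer = Callable[[Vector], Vector]


def ffn_with_shared_expert(x: Vector, routed_moe: Layer,
                           shared_expert: Optional[Layer] = None) -> Vector:
    """Feed-forward block with an optional shared expert.

    Both branches read the same input x. Assumption: the two outputs are
    combined by element-wise addition.
    """
    moe_out = routed_moe(x)
    if shared_expert is None:
        # No shared expert configured: plain MoE behaviour.
        return moe_out
    shared_out = shared_expert(x)  # correct: x, not moe_out
    return [a + b for a, b in zip(moe_out, shared_out)]


# Toy stand-ins for the two branches.
def toy_routed_moe(v: Vector) -> Vector:
    return [2.0 * a for a in v]


def toy_shared_expert(v: Vector) -> Vector:
    return [a + 1.0 for a in v]


x = [1.0, 2.0]
with_shared = ffn_with_shared_expert(x, toy_routed_moe, toy_shared_expert)
without_shared = ffn_with_shared_expert(x, toy_routed_moe)
```

Passing `shared_expert=None` reproduces the plain MoE path, which mirrors how a single consolidated architecture can serve models with and without shared experts.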
File tree
5 files changed, +235 -35 lines changed
- gguf-py/gguf
- src