Commit 8a1ea3e
feat: Add support for dense FFN in GraniteMoeHybrid
This was already partially supported via reusing the granite ffn builder,
and there may be models that leverage this architecture going forward. The
naming is a bit odd, but in the transformers version, it reuses the same
model class and simply has zero regular experts and a single shared expert
(which is the same as a single dense FFN).
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <[email protected]>
1 parent 257d436 commit 8a1ea3e
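The equivalence described in the commit message (zero regular experts plus a single shared expert behaves like a single dense FFN) can be illustrated with a minimal sketch. This is not code from the commit: the helper names `gated_ffn` and `moe_with_shared_expert` are hypothetical, and the SiLU-gated FFN form is an assumption made only for illustration.

```python
import math


def silu(v):
    # SiLU activation: v * sigmoid(v)
    return v / (1.0 + math.exp(-v))


def matvec(w, x):
    # Multiply a matrix (list of rows) by a vector
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def gated_ffn(x, w_gate, w_up, w_down):
    # Hypothetical gated dense FFN: down(silu(gate(x)) * up(x))
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def moe_with_shared_expert(x, routed_experts, router_weights, shared_expert):
    # Hypothetical MoE block: weighted sum over routed experts,
    # plus a shared expert that is always applied
    out = [0.0] * len(x)
    for weight, expert in zip(router_weights, routed_experts):
        y = gated_ffn(x, *expert)
        out = [o + weight * yi for o, yi in zip(out, y)]
    shared = gated_ffn(x, *shared_expert)
    return [o + s for o, s in zip(out, shared)]


# Toy weights (illustrative values only)
x = [0.5, -1.0]
shared = (
    [[0.1, 0.2], [0.3, -0.4], [0.5, 0.6]],   # w_gate: 3 x 2
    [[-0.2, 0.1], [0.4, 0.3], [0.0, -0.5]],  # w_up:   3 x 2
    [[0.7, -0.1, 0.2], [0.0, 0.3, -0.6]],    # w_down: 2 x 3
)

dense_out = gated_ffn(x, *shared)
# Zero routed experts: the routed sum is empty, so only the shared expert contributes
moe_out = moe_with_shared_expert(x, [], [], shared)
print(moe_out == dense_out)
```

With an empty list of routed experts the loop body never runs, so the block's output is exactly the shared expert's output, which is why a dense FFN builder can be reused for this configuration.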
2 files changed, +20 -3 lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 6538-6540 | 6538-6540 | |
| | 6541-6542 | + |
| 6541-6544 | 6543-6546 | |
| | 6547-6557 | + |
| 6545 | 6558 | |
| 6546-6547 | | - |
| | 6559 | + |
| 6548-6550 | 6560-6562 | |
| ... | ... | |
| 6569-6571 | 6581-6583 | |
| 6572 | | - |
| | 6584 | + |
| 6573-6575 | 6585-6587 | |

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 2190-2192 | 2190-2192 | |
| | 2193 | + |
| 2193-2199 | 2194-2200 | |
| | 2201-2204 | + |
| 2200-2202 | 2205-2207 | |