Commit 614c6e6
fix: Use Q8_0 for all embedding quantizations for granite and granitemoe
At lower precision levels, the models can manifest numerical instability,
especially with batch size > 1. This shows up as nondeterministic stopping
when index 0 (the EOG token) has a seemingly uninitialized large value in
the logits.
Branch: GraniteEmbedQuant
Signed-off-by: Gabe Goodhart <[email protected]>
1 parent 3edfa7d commit 614c6e6
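For readers unfamiliar with why the choice of embedding precision matters, the following is a minimal sketch, not the code changed in this commit. It assumes a simplified symmetric per-block scheme: blocks of 32 values share one scale and store signed integer codes, which mirrors the general shape of Q8_0. The 4-bit variant is only an illustrative stand-in for "lower precision" and does not reproduce any actual lower-bit format, and the sketch keeps the scale as a Python float rather than a half-precision value. All function names are hypothetical. It shows only that round-trip error grows as the bit width shrinks; it does not reproduce the batch-size-dependent instability described above.

```python
import random

BLOCK = 32  # assumed block size: each block of 32 values shares one scale


def quantize_block(values, bits):
    """Symmetric per-block quantization: one scale plus signed integer codes."""
    qmax = (1 << (bits - 1)) - 1  # 127 for 8 bits, 7 for 4 bits
    amax = max(abs(v) for v in values)
    scale = amax / qmax if amax > 0 else 0.0
    codes = [round(v / scale) if scale else 0 for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate values from the scale and integer codes."""
    return [scale * c for c in codes]


def max_roundtrip_error(values, bits):
    """Largest absolute error after quantizing and dequantizing block by block."""
    worst = 0.0
    for i in range(0, len(values), BLOCK):
        block = values[i:i + BLOCK]
        scale, codes = quantize_block(block, bits)
        restored = dequantize_block(scale, codes)
        worst = max(worst, max(abs(a - b) for a, b in zip(block, restored)))
    return worst


# Hypothetical embedding row: random values stand in for real weights.
random.seed(0)
row = [random.gauss(0.0, 1.0) for _ in range(4 * BLOCK)]
err8 = max_roundtrip_error(row, 8)
err4 = max_roundtrip_error(row, 4)
print(f"8-bit max round-trip error: {err8:.5f}")
print(f"4-bit max round-trip error: {err4:.5f}")
```

In this scheme the per-value error is bounded by half the block scale, so moving from 4-bit to 8-bit codes shrinks the bound by roughly a factor of 18 (127 / 7). That is one plausible reason to keep a sensitive tensor such as the token embedding at higher precision.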
1 file changed: 5 additions, 2 deletions
(Diff content not preserved. Hunk 1: original line 158 replaced by one line. Hunk 2: original line 174 replaced by four lines, new lines 174-177.)