Commit 39802c7
committed
Add soft capping to reversible embedding layer (#1718)
Forgetting the final output soft-cap is a really easy mistake,
and worse, outputs will still look plausible for generations without
the softcap, just with worse actual results.
Adding this to our reversible embedding layer will be much more robust.
As long as you use the layer to compute logits over the vocab, you can
no longer forget the soft-cap.1 parent 8499c92 commit 39802c7
File tree
4 files changed
+21
-14
lines changed- keras_nlp/src
- layers/modeling
- models/gemma
4 files changed
+21
-14
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
50 | 50 | | |
51 | 51 | | |
52 | 52 | | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
53 | 57 | | |
54 | 58 | | |
55 | 59 | | |
| |||
91 | 95 | | |
92 | 96 | | |
93 | 97 | | |
| 98 | + | |
94 | 99 | | |
95 | 100 | | |
96 | 101 | | |
| |||
104 | 109 | | |
105 | 110 | | |
106 | 111 | | |
| 112 | + | |
107 | 113 | | |
108 | 114 | | |
109 | 115 | | |
| |||
125 | 131 | | |
126 | 132 | | |
127 | 133 | | |
128 | | - | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
129 | 140 | | |
130 | 141 | | |
131 | 142 | | |
| |||
135 | 146 | | |
136 | 147 | | |
137 | 148 | | |
| 149 | + | |
138 | 150 | | |
139 | 151 | | |
140 | 152 | | |
| |||
Lines changed: 7 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
39 | 39 | | |
40 | 40 | | |
41 | 41 | | |
| 42 | + | |
42 | 43 | | |
43 | 44 | | |
44 | 45 | | |
| |||
80 | 81 | | |
81 | 82 | | |
82 | 83 | | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
83 | 90 | | |
84 | 91 | | |
85 | 92 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
132 | 132 | | |
133 | 133 | | |
134 | 134 | | |
| 135 | + | |
135 | 136 | | |
136 | 137 | | |
137 | 138 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
227 | 227 | | |
228 | 228 | | |
229 | 229 | | |
230 | | - | |
231 | | - | |
232 | | - | |
233 | | - | |
234 | | - | |
235 | | - | |
236 | | - | |
237 | 230 | | |
238 | 231 | | |
239 | 232 | | |
| |||
445 | 438 | | |
446 | 439 | | |
447 | 440 | | |
448 | | - | |
449 | | - | |
450 | | - | |
451 | | - | |
452 | | - | |
453 | | - | |
454 | 441 | | |
455 | 442 | | |
456 | 443 | | |
| |||
0 commit comments