Commit 33f7f48
Fix: fix ignore_index not being applied in JSD distillation loss (#974)
## Summary
Fix the `ignore_index` parameter not being applied in
`LigerFusedLinearJSDLoss`.
The parameter was accepted but never used in `distillation_loss_fn`,
so all tokens (including padding and prompt tokens) were included in
the loss computation.
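To make the effect concrete, here is a minimal pure-Python sketch (not the library's code) that compares a mean JSD over all tokens with one that skips ignored tokens. The helper names `jsd` and `mean_jsd`, the sentinel value `-100`, and the toy distributions are assumptions made for illustration only.

```python
import math

IGNORE_INDEX = -100  # assumed sentinel, following the common PyTorch convention


def jsd(p, q, beta=0.5):
    """Generalized Jensen-Shannon divergence between two discrete distributions."""
    m = [(1 - beta) * pi + beta * qi for pi, qi in zip(p, q)]

    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return (1 - beta) * kl(p, m) + beta * kl(q, m)


def mean_jsd(student, teacher, labels, ignore_index=None):
    """Average per-token JSD; tokens whose label equals ignore_index are skipped.

    With ignore_index=None nothing is skipped, which mimics the behaviour
    described above where the parameter had no effect.
    """
    per_token = [jsd(s, t) for s, t in zip(student, teacher)]
    keep = [label != ignore_index for label in labels]
    num_valid = max(sum(keep), 1)  # avoid dividing by zero when everything is ignored
    return sum(loss for loss, k in zip(per_token, keep) if k) / num_valid


# Toy data: the middle token is padding, and the two models disagree strongly on it.
student = [[0.5, 0.5], [0.9, 0.1], [0.9, 0.1]]
teacher = [[0.5, 0.5], [0.1, 0.9], [0.5, 0.5]]
labels = [7, IGNORE_INDEX, 3]

unmasked = mean_jsd(student, teacher, labels)              # padding token counted
masked = mean_jsd(student, teacher, labels, IGNORE_INDEX)  # padding token skipped
print(f"unmasked={unmasked:.4f} masked={masked:.4f}")
```

In this toy case the padding token both adds its own divergence to the sum and inflates the denominator, so the two averages differ even though the real tokens are identical.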
### Changes
- Change `reduction='sum'` to `reduction='none'` so the loss can be masked per token
- Use `masked_fill` to apply the mask while preserving dtype (prevents bf16 → fp32 promotion)
- Add `clamp_min(1)` to prevent NaN when all tokens are ignored
- Normalize by `num_valid_tokens` instead of `full_target.shape[0]`
- Add comprehensive `ignore_index` tests
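The changes listed above can be sketched in PyTorch roughly as follows. This is a simplified illustration under stated assumptions, not the code from this commit: the function name `masked_jsd_loss`, its signature, and the flat `(num_tokens, vocab_size)` input shape are hypothetical, and temperature scaling and chunking are left out.

```python
import math

import torch
import torch.nn.functional as F


def masked_jsd_loss(student_logits, teacher_logits, target, ignore_index=-100, beta=0.5):
    """Hypothetical sketch: JSD averaged over tokens whose target != ignore_index.

    Assumes 0 < beta < 1 and logits of shape (num_tokens, vocab_size).
    """
    student_lp = F.log_softmax(student_logits, dim=-1)
    teacher_lp = F.log_softmax(teacher_logits, dim=-1)

    # Log of the mixture M = (1 - beta) * P_student + beta * P_teacher.
    log_m = torch.logsumexp(
        torch.stack([student_lp + math.log(1 - beta), teacher_lp + math.log(beta)]),
        dim=0,
    )

    # reduction="none" keeps one value per (token, vocab entry) so a mask can be applied.
    student_kl = F.kl_div(log_m, student_lp, reduction="none", log_target=True)
    teacher_kl = F.kl_div(log_m, teacher_lp, reduction="none", log_target=True)
    loss = (1 - beta) * student_kl + beta * teacher_kl

    keep = target != ignore_index                      # shape (num_tokens,)
    loss = loss.masked_fill(~keep.unsqueeze(-1), 0.0)  # result has the same dtype as loss
    num_valid = keep.sum().clamp_min(1)                # never divide by zero
    return loss.sum() / num_valid


torch.manual_seed(0)
student_logits = torch.randn(4, 5)
teacher_logits = torch.randn(4, 5)
target = torch.tensor([1, -100, 2, -100])
print(masked_jsd_loss(student_logits, teacher_logits, target))
```

Filling masked positions with a Python scalar leaves the tensor's dtype unchanged, and clamping the valid-token count to at least 1 means a batch in which every token is ignored yields a loss of 0 rather than NaN.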
## Testing Done
- Hardware Type: H100
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [x] run `make test-convergence` to ensure convergence
---------
Co-authored-by: Sunghyun Cho <[email protected]>
1 parent fe1ea95 commit 33f7f48
File tree
4 files changed, +59 -19 lines changed
- src/liger_kernel/chunked_loss
- test
  - chunked_loss