Commit ffc4ab7
Feat/dflash training improvements (#463)
* feat(dflash): add random anchor sampling, loss decay, and sync with upstream
- Add random anchor sampling for block construction (paper Sec 4.2)
- Add exponential loss decay weighting (paper Sec 4.2, Eq.4, Appendix A.3.1)
- Sync with upstream: store dflash_config in config.json, read mask_token_id from the config, and decouple target_layer_ids
- Align training hyperparams with paper (lr=6e-4, warmup=0.04, epochs=6, max_length=3072)
- Fix auto_map and saved model file name for HuggingFace compatibility
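The two training changes in this commit (random anchor sampling and exponential loss decay) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SpecForge implementation. `sample_anchors`, `decay_weights`, and `weighted_block_loss` are hypothetical helpers. Anchors are assumed to be drawn from loss-bearing positions so that blocks do not overlap, and the decay is assumed to take the form w_k = exp(-(k-1)/γ) over predicted offsets k = 1..B-1. The exact weighting is the one in the paper's Eq. 4, which is not reproduced here.

```python
import math
import random
from typing import List


def sample_anchors(candidates: List[int], num_blocks: int, block_size: int,
                   seq_len: int, rng: random.Random) -> List[int]:
    """Randomly pick up to `num_blocks` non-overlapping anchor positions (hypothetical helper)."""
    # Keep only anchors whose whole block fits inside the sequence.
    pool = [p for p in candidates if p + block_size <= seq_len]
    rng.shuffle(pool)
    chosen: List[int] = []
    for p in pool:
        # Assumption: blocks must not overlap, so anchors are >= block_size apart.
        if all(abs(p - q) >= block_size for q in chosen):
            chosen.append(p)
            if len(chosen) == num_blocks:
                break
    return sorted(chosen)


def decay_weights(block_size: int, gamma: float) -> List[float]:
    """Assumed decay w_k = exp(-(k - 1) / gamma) for offsets k = 1..block_size-1.

    Offset 0 is the anchor, which is given to the draft model and not predicted.
    """
    return [math.exp(-(k - 1) / gamma) for k in range(1, block_size)]


def weighted_block_loss(token_losses: List[float], weights: List[float]) -> float:
    """Weighted mean of the per-offset losses of one block."""
    return sum(w * l for w, l in zip(weights, token_losses)) / sum(weights)


rng = random.Random(0)
anchors = sample_anchors(list(range(10, 100)), num_blocks=4, block_size=8, seq_len=100, rng=rng)
weights = decay_weights(block_size=8, gamma=4.0)
print(anchors)
print([round(w, 3) for w in weights])
```

Down-weighting later offsets reflects that a prediction deep in a block only matters if every earlier draft token in that block was accepted.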
* fix(dflash): per-sample anchor sampling with padding block isolation
- Sample anchors independently per batch sample (max strategy)
- Mark padding blocks with block_id=-1 for attention isolation
- Padding blocks excluded from both attention and loss computation
- Use absolute positions from gather_idx for position encoding
- Per-sample block_ids throughout: attention mask, loss mask, noise input
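A rough sketch of per-sample anchors with padding-block isolation, using Python lists rather than tensors. Assumptions: "max strategy" is read as padding every sample to the largest anchor count in the batch; each block attends only within itself (attention to the target model's context features is omitted); and the anchor token at offset 0 carries no loss. `pad_anchors`, `expand_blocks`, `attention_mask`, and `loss_mask` are hypothetical names, not SpecForge functions.

```python
from typing import List, Tuple

PAD_BLOCK_ID = -1  # padding blocks are tagged -1, as described in the commit


def pad_anchors(per_sample: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad each sample's anchors to the batch maximum; padded slots get block id -1."""
    max_blocks = max(len(a) for a in per_sample)
    anchors, block_ids = [], []
    for a in per_sample:
        n_pad = max_blocks - len(a)
        anchors.append(a + [0] * n_pad)  # dummy anchor for padding blocks
        block_ids.append(list(range(len(a))) + [PAD_BLOCK_ID] * n_pad)
    return anchors, block_ids


def expand_blocks(anchors: List[int], block_ids: List[int],
                  block_size: int) -> Tuple[List[int], List[int]]:
    """Per-token block ids and absolute positions (gather_idx) for one sample."""
    token_bids, gather_idx = [], []
    for anchor, bid in zip(anchors, block_ids):
        for k in range(block_size):
            token_bids.append(bid)
            gather_idx.append(anchor + k)  # absolute position used for position encoding
    return token_bids, gather_idx


def attention_mask(token_bids: List[int]) -> List[List[bool]]:
    """mask[i][j] is True when query i may attend key j: same block, not a padding block.

    Padding rows end up all False; a real softmax kernel must handle such rows
    (for example by zeroing their output) to avoid NaNs.
    """
    return [[bi == bj and bi != PAD_BLOCK_ID for bj in token_bids] for bi in token_bids]


def loss_mask(token_bids: List[int], block_size: int) -> List[int]:
    """1 for predicted positions of real blocks; 0 for anchors and padding blocks."""
    return [0 if bid == PAD_BLOCK_ID or i % block_size == 0 else 1
            for i, bid in enumerate(token_bids)]


anchors, bids = pad_anchors([[5, 20], [7]])
tb, gi = expand_blocks(anchors[1], bids[1], block_size=3)
print(tb, gi, loss_mask(tb, 3))
```

Keeping the padded batch shape uniform while tagging padding with -1 lets one rule (same block id and not -1) drive the attention mask, and the same tag zeroes the loss.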
* fix(dflash): align acceptance rate metric with inference and trust_remote_code kwarg
- Use loss_mask to exclude prompt tokens from the acc calculation, measuring only completion/assistant blocks to match inference behavior
- Replace token-level accuracy with cumprod-based acceptance length
- Clean up debug prints and redundant comments
- Add trust_remote_code kwarg

1 parent 6c27152, commit ffc4ab7
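The difference between token-level accuracy and a cumprod-based acceptance length can be illustrated with a small sketch. Assumed here: `correct[b][k]` is 1 when the draft's prediction at offset k of block b matches the target, and `block_mask` is the loss_mask restricted to that block. The accepted length is the number of leading correct predictions, which a running product (`itertools.accumulate` with `operator.mul`) yields. `acceptance_length` is a hypothetical helper, not the SpecForge function.

```python
import operator
from itertools import accumulate
from typing import List


def acceptance_length(correct: List[List[int]], block_mask: List[List[int]]) -> float:
    """Mean number of leading correct draft tokens per completion block (hypothetical helper)."""
    total, n_blocks = 0, 0
    for corr, mask in zip(correct, block_mask):
        if not any(mask):
            continue  # prompt-only or padding block: not measured
        hits = [c * m for c, m in zip(corr, mask)]  # assumption: masked positions count as misses
        run = list(accumulate(hits, operator.mul))  # running product: 1 until the first miss
        total += sum(run)
        n_blocks += 1
    return total / n_blocks if n_blocks else 0.0


correct = [[1, 1, 0, 1], [1, 1, 1, 1], [0, 1, 1, 1]]
block_mask = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
print(acceptance_length(correct, block_mask))  # prints 3.0
```

On the two measured blocks, token accuracy is 7/8, but the acceptance length is (2 + 4) / 2 = 3.0: the miss at offset 2 of the first block invalidates the later hit, just as it would during speculative verification.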
8 files changed, +351 / -103 lines

Directories touched:
- configs
- examples
- scripts
- specforge
- core
- modeling
- draft
- target