Commit b7758b8
Switch llama3 and qwen3 model configs from sdpa/causal to flex/block_causal
With packed datasets, sdpa/causal allows cross-document attention
leakage and uses sequential positions across document boundaries.
flex/block_causal isolates documents in attention and enables
per-document RoPE position IDs.
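
As an illustration of the difference described above, here is a minimal pure-Python sketch, not the repository's implementation. It assumes documents in a packed sequence are delimited by an EOS token; the function names (`document_ids`, `causal_mask`, `block_causal_mask`, `per_document_positions`) and the token values are hypothetical and chosen only for this example.

```python
from typing import List


def document_ids(tokens: List[int], eos_id: int) -> List[int]:
    """Label each token with the index of the document it belongs to.

    Assumption: an EOS token closes its document and counts as that
    document's last token.
    """
    ids, doc = [], 0
    for tok in tokens:
        ids.append(doc)
        if tok == eos_id:
            doc += 1
    return ids


def causal_mask(n: int) -> List[List[bool]]:
    """Plain causal mask: query q may attend to every key kv <= q."""
    return [[kv <= q for kv in range(n)] for q in range(n)]


def block_causal_mask(doc_ids: List[int]) -> List[List[bool]]:
    """Causal mask restricted to keys in the same document as the query."""
    n = len(doc_ids)
    return [
        [kv <= q and doc_ids[kv] == doc_ids[q] for kv in range(n)]
        for q in range(n)
    ]


def per_document_positions(doc_ids: List[int]) -> List[int]:
    """Position IDs that restart at 0 at the start of each document."""
    positions, prev, pos = [], None, 0
    for d in doc_ids:
        pos = 0 if d != prev else pos + 1
        positions.append(pos)
        prev = d
    return positions


# Three packed documents; 0 is the (assumed) EOS token.
packed = [5, 6, 0, 7, 8, 9, 0, 4]
ids = document_ids(packed, eos_id=0)

print(ids)                             # document index per token
print(per_document_positions(ids))     # positions restart per document
print(causal_mask(len(packed))[3][0])  # token 3 can see token 0 (leakage)
print(block_causal_mask(ids)[3][0])    # token 3 cannot see token 0
```

With a plain causal mask, the first token of the second document attends to every token of the first, and positions simply run 0 through 7. The block-causal variant masks those cross-document pairs and restarts positions at each boundary. The predicate inside `block_causal_mask` has the same shape as a mask function taking query and key indices, which is how such a rule would typically be expressed for PyTorch's FlexAttention.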
llama4, deepseek_v3, and gpt_oss already used flex/block_causal.

1 parent 5ddb317, commit b7758b8
2 files changed, +32 -14 lines changed