Commit 87187b1
[chunked loss] align teacher and student logit shape (#634)
## Summary
In rare cases, the teacher and student models have different vocab sizes even though their vocabularies are actually the same (Qwen models, for example). When that happens, we pad the student's logits along the vocab dimension to match the shape of the teacher's logits.
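Here is a minimal sketch of the padding idea. It is written in plain Python so it runs without PyTorch. `pad_student_logits` and its `fill_value` parameter are hypothetical names chosen for illustration; this is not the helper added in this commit, whose code is not shown here. The sketch assumes that the teacher's extra vocab slots sit at the end of the vocab dimension, so right-padding lines the shapes up. The fill value is also an assumption. Zero keeps every logit finite, while a very negative value would give the padded slots close to zero probability after a softmax.

```python
from typing import List, Sequence


def pad_student_logits(
    student_logits: Sequence[Sequence[float]],
    teacher_vocab_size: int,
    fill_value: float = 0.0,  # assumption: the fill value is a design choice, not taken from the PR
) -> List[List[float]]:
    """Right-pad each row of student logits along the vocab dim to teacher_vocab_size.

    Hypothetical helper for illustration only.
    """
    padded = []
    for row in student_logits:
        pad_size = teacher_vocab_size - len(row)
        if pad_size < 0:
            # Padding can only grow the student side; a larger student vocab
            # would need truncation or a different alignment strategy.
            raise ValueError("student vocab is larger than teacher vocab")
        # Append fill_value entries so this row has teacher_vocab_size logits.
        padded.append(list(row) + [fill_value] * pad_size)
    return padded


# Example: a student with a 3-token vocab aligned to a teacher with 5 tokens.
print(pad_student_logits([[1.0, 2.0, 3.0]], teacher_vocab_size=5))
```

For tensors, PyTorch can do the same right-padding of the last (vocab) dimension with `torch.nn.functional.pad(student_logits, (0, pad_size), value=fill_value)`.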
## Testing Done
`make test`
- Hardware Type: <BLANK>
- [ ] run `make test` to ensure correctness
- [ ] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence

Parent commit: 3a5845b
1 file changed: 15 additions, 0 deletions.
Diff: 15 lines added after line 117 of the original file (new lines 118–132).