Commit baa7938
refactor: improve gradient computation structure in FactorTransferDistillationStrategy
Refactored ComputeGradient to compute combined gradients first, then apply
(1.0 - _factorWeight) scaling exactly once per element. This eliminates the
separate final loop and makes the logic clearer.
Changes:
- Compute softGrad = temperature-scaled soft difference (without factorWeight)
- Compute hardGrad = studentProbs - trueLabels
- Form combined = Alpha * hardGrad + (1 - Alpha) * softGrad
- Multiply final combined by (1.0 - _factorWeight) before assigning to gradient[i]
- Handles both trueLabels != null and trueLabels == null cases cleanly
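
The steps above can be sketched in Python as follows. This is an illustration only, not the repository's code: the softmax helper, the `T^2` scaling of the soft term, the fallback to the soft gradient alone when no labels are given, and all parameter names and defaults are assumptions, since the diff body is not shown here.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def compute_gradient(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    true_labels: Optional[Sequence[float]] = None,
    *,
    alpha: float = 0.5,          # hypothetical default for Alpha
    temperature: float = 2.0,    # hypothetical default
    factor_weight: float = 0.3,  # hypothetical default for _factorWeight
) -> List[float]:
    student_soft = softmax(student_logits, temperature)
    teacher_soft = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits)
    scale = 1.0 - factor_weight

    gradient = []
    for i in range(len(student_logits)):
        # Assumption: the soft difference is scaled by T^2, with no factor weight yet.
        soft_grad = (student_soft[i] - teacher_soft[i]) * temperature * temperature
        if true_labels is not None:
            hard_grad = student_probs[i] - true_labels[i]
            combined = alpha * hard_grad + (1.0 - alpha) * soft_grad
        else:
            # Assumption: without labels, only the soft term contributes.
            combined = soft_grad
        # (1 - factor_weight) is applied exactly once, in this single assignment.
        gradient.append(scale * combined)
    return gradient
```

Because the scaling happens in one place, doubling `1 - factor_weight` doubles every gradient element, with or without labels, and no second pass over the array is needed.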
This ensures (1.0 - _factorWeight) is applied exactly once per gradient element
in a single assignment, improving clarity and efficiency.
1 parent 5402416 commit baa7938
1 file changed in src/KnowledgeDistillation/Strategies, with 22 additions and 12 deletions.