Commit 214ff18
fix: restore classical KD signal in variational distillation strategy
Remove incorrect (1.0 - variationalWeight) multiplication that was suppressing
the classical distillation loss and gradient instead of adding variational terms.
This fixes a critical issue:
- When variationalWeight = 1.0, the loss and gradient collapsed to zero
- When variationalWeight < 1.0, the classical KD terms were merely scaled down, with no variational contribution added in their place
Changes:
- ComputeLoss: Remove (1.0 - variationalWeight) scaling from softLoss and combinedLoss
- ComputeGradient: Remove (1.0 - variationalWeight) scaling from soft and hard gradients
Note: This is a baseline fix. Full variational integration (adding variational loss/gradient
weighted by variationalWeight) requires latent representations (mean, logVar) which are not
available in current ComputeLoss/ComputeGradient signatures.
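The shape of the bug and the fix can be illustrated with a minimal sketch. This is Python pseudocode with hypothetical names (the actual strategy's signatures, field names, and language may differ); the `full_variational_loss` variant sketches what the Note above describes as future work, assuming a standard Gaussian KL term over the latent `mean`/`logVar`:

```python
import math


def compute_loss_before(soft_loss, hard_loss, alpha, variational_weight):
    """Buggy version: the (1 - variationalWeight) factor suppresses the
    classical KD signal instead of making room for a variational term."""
    combined = alpha * soft_loss + (1.0 - alpha) * hard_loss
    # Collapses to zero when variational_weight == 1.0
    return (1.0 - variational_weight) * combined


def compute_loss_after(soft_loss, hard_loss, alpha):
    """Baseline fix: the classical KD loss is kept intact."""
    return alpha * soft_loss + (1.0 - alpha) * hard_loss


def full_variational_loss(soft_loss, hard_loss, alpha,
                          variational_weight, mean, log_var):
    """Hypothetical full integration: classical KD plus a KL divergence
    term weighted by variational_weight. Requires latent mean/log-variance,
    which the current ComputeLoss signature does not expose."""
    classical = alpha * soft_loss + (1.0 - alpha) * hard_loss
    kl = -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                    for m, lv in zip(mean, log_var))
    return classical + variational_weight * kl
```

With `variational_weight = 1.0`, `compute_loss_before` returns 0 regardless of the inputs, which is the collapse described above; `compute_loss_after` preserves the classical signal, and the full variant reduces to it when the latent matches the standard normal prior (mean 0, log-variance 0).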
Resolves coderabbitai review comments on lines 63-85 and 87-118.

1 parent 1dc2f47 · commit 214ff18
File tree
1 file changed (+5 −5) — src/KnowledgeDistillation/Strategies
Lines changed: 5 additions & 5 deletions
[Diff content not captured; modified lines: 71, 81, 100, 112-113]