fix: resolve SelfDistillationTrainer cached prediction alignment with shuffled batches
CRITICAL FIX: The previous implementation used array indices to look up cached teacher
predictions, but TrainBatch receives batch-local indices (0 to batchSize-1), not global
dataset indices. After data shuffling, this caused the student to learn from incorrect
teacher outputs, making self-distillation ineffective or harmful.
Changes:
- Changed _cachedTeacherPredictions from Vector<Vector<T>> to Dictionary<Vector<T>, Vector<T>>
- Keyed the dictionary with ReferenceEqualityComparer so each input instance maps to its own cached prediction
- Changed GetTeacherPredictions to look up by input reference (not index) so shuffled data is handled correctly
- Updated EMA blending logic to work with the dictionary structure
This ensures that regardless of data shuffling, each input sample is matched with its
correct cached teacher prediction from the previous generation.
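The lookup described above can be illustrated with a minimal sketch. It is not the trainer's actual code: Sample, TeacherCacheSketch, BuildCache, and PredictionsFor are hypothetical names, a record stands in for Vector<T>, and a double stands in for the teacher's output vector. It assumes .NET 5 or later, where ReferenceEqualityComparer is available.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Vector<T>. A record compares by value,
// so two samples with the same contents are "equal" by default.
public sealed record Sample(double Value);

public static class TeacherCacheSketch
{
    // Build the cache once per generation, keyed by object identity.
    public static Dictionary<Sample, double> BuildCache(
        IEnumerable<Sample> dataset, Func<Sample, double> teacher)
    {
        // ReferenceEqualityComparer ignores Equals/GetHashCode overrides,
        // so equal-valued but distinct instances get separate entries.
        var cache = new Dictionary<Sample, double>(ReferenceEqualityComparer.Instance);
        foreach (var sample in dataset)
        {
            cache[sample] = teacher(sample);
        }
        return cache;
    }

    // Look up each batch element by reference; its position in the
    // batch plays no role, so shuffling cannot misalign predictions.
    public static double[] PredictionsFor(
        IReadOnlyList<Sample> batch, Dictionary<Sample, double> cache)
    {
        var result = new double[batch.Count];
        for (int i = 0; i < batch.Count; i++)
        {
            result[i] = cache[batch[i]];
        }
        return result;
    }
}
```

One caveat of this design, under the same assumptions: the cache only matches the exact instances it was built from, so a data pipeline that copies or re-creates inputs between epochs would miss every lookup.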
Addresses: Code review comment from @coderabbitai on SelfDistillationTrainer.cs:114-124