Commit ff414c6
feat: add distillation-specific checkpoint management
Add dedicated checkpoint management for knowledge distillation training without
polluting IFullModel with additional interface requirements.
## Design Decision
**Question**: Should we add ICheckpointableModel to IFullModel?
**Answer**: No - create a specialized checkpoint manager instead
**Reasoning**:
1. IFullModel already inherits 7 interfaces; adding more would violate the Interface Segregation Principle
2. Not all models support checkpointing (stateless models, non-serializable state)
3. IModelSerializer already provides similar file-based functionality
4. Distillation-specific needs (curriculum state, teacher/student pairs) don't belong in IFullModel
## New Components
### DistillationCheckpointConfig
Configuration for checkpoint management:
- Save frequency (every N epochs or batches)
- Keep best N checkpoints (automatic pruning)
- Which models to save (teacher, student, or both)
- Curriculum state preservation
- Metric tracking (validation loss, accuracy, etc.)
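
A minimal sketch of what such a configuration class could look like. Only `CheckpointDirectory`, `SaveEveryEpochs`, `KeepBestN`, and `SaveStudent` appear in the usage example in this commit; the remaining property names are illustrative assumptions:

```csharp
// Hypothetical shape of DistillationCheckpointConfig; properties beyond
// those in the usage example are assumptions, not the actual API.
public class DistillationCheckpointConfig
{
    public string CheckpointDirectory { get; set; } = "./checkpoints";

    // Save frequency: every N epochs, and optionally every N batches
    // within an epoch (0 disables batch-level checkpointing).
    public int SaveEveryEpochs { get; set; } = 1;
    public int SaveEveryBatches { get; set; } = 0;

    // Keep only the best N checkpoints by the tracked metric.
    public int KeepBestN { get; set; } = 3;

    // Which models to persist.
    public bool SaveStudent { get; set; } = true;
    public bool SaveTeacher { get; set; } = false;

    // Preserve curriculum learning progress alongside model weights.
    public bool SaveCurriculumState { get; set; } = true;

    // Metric used to rank checkpoints, and its direction.
    public string BestMetricName { get; set; } = "validation_loss";
    public bool LowerIsBetter { get; set; } = true;
}
```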
### DistillationCheckpointManager<T>
Handles checkpoint operations:
- Automatic saving based on schedule
- Best checkpoint selection by metric
- Multi-stage distillation support (student → teacher)
- Curriculum learning state management
- Training resumption after interruptions
- Batch-level checkpointing for long epochs
### Key Features
1. **Automatic Checkpoint Pruning**
- Keep only best N checkpoints based on validation metric
- Save disk space while preserving important models
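
The pruning pass might be implemented roughly as follows (all names here - `CheckpointRecord`, `_config`, `PruneCheckpoints` - are assumptions for illustration, not the actual implementation):

```csharp
// Illustrative pruning pass: after each save, rank checkpoints by the
// tracked metric and delete everything beyond the best N.
private void PruneCheckpoints(List<CheckpointRecord> records)
{
    var ordered = _config.LowerIsBetter
        ? records.OrderBy(r => r.Metric)
        : records.OrderByDescending(r => r.Metric);

    // Materialize first so we can safely remove from the source list.
    var stale = ordered.Skip(_config.KeepBestN).ToList();
    foreach (var record in stale)
    {
        File.Delete(record.FilePath);   // reclaim disk space
        records.Remove(record);
    }
}
```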
2. **Multi-Stage Distillation**
- Save student from Stage 1
- Load as teacher for Stage 2
- Support progressive compression (large → medium → small)
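
A sketch of the multi-stage flow, assuming the `LoadBestCheckpoint` API from the usage example below; the model variables and per-stage configs are hypothetical:

```csharp
// Stage 1: distill large teacher into a medium student,
// checkpointing as usual.
var stage1 = new DistillationCheckpointManager<double>(stage1Config);
// ... train mediumStudent against largeTeacher ...
stage1.LoadBestCheckpoint(mediumStudent);

// Stage 2: the best medium model now acts as the teacher
// for a small student (progressive compression).
var stage2 = new DistillationCheckpointManager<double>(stage2Config);
// ... train smallStudent against mediumStudent ...
```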
3. **Curriculum State Preservation**
- Save curriculum progress with checkpoint
- Resume from correct curriculum stage
- Don't restart from "easy samples"
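
The curriculum state bundled into each checkpoint might carry fields like these (names are assumptions for illustration):

```csharp
// Hypothetical curriculum state saved alongside model weights, so a
// resumed run continues at the correct difficulty stage instead of
// restarting from "easy samples".
public class CurriculumState
{
    public int CurrentStage { get; set; }        // e.g. 0 = easy, 2 = hard
    public int SamplesSeenInStage { get; set; }
    public double DifficultyThreshold { get; set; }
}
```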
4. **Flexible Checkpointing**
- Epoch-based or batch-based
- Stream-based via ICheckpointableModel
- File-based via IModelSerializer
- Custom metrics for best checkpoint selection
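
The stream-based path presumably revolves around an interface of roughly this shape (the actual members of `ICheckpointableModel` may differ; this is an assumed sketch):

```csharp
// Assumed shape of the stream-based checkpointing interface.
public interface ICheckpointableModel
{
    void SaveState(Stream destination);
    void LoadState(Stream source);
}
```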
## Usage Example
```csharp
var config = new DistillationCheckpointConfig
{
    CheckpointDirectory = "./checkpoints",
    SaveEveryEpochs = 5,
    KeepBestN = 3,
    SaveStudent = true
};

var manager = new DistillationCheckpointManager<double>(config);

for (int epoch = 0; epoch < 100; epoch++)
{
    // Train...
    double validationLoss = Evaluate(student);

    manager.SaveCheckpointIfNeeded(
        epoch: epoch,
        student: student,
        metrics: new Dictionary<string, double> { { "validation_loss", validationLoss } }
    );
}

// Load the best checkpoint
manager.LoadBestCheckpoint(student);
```
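
Resuming after an interruption could follow a pattern like this. `TryLoadLatestCheckpoint` is a hypothetical method name used for illustration; the commit only documents that resumption is supported:

```csharp
// Hypothetical resumption pattern: restore the latest checkpoint
// (weights plus curriculum state) and continue from the next epoch.
bool resumed = manager.TryLoadLatestCheckpoint(student, out int lastEpoch);
int startEpoch = resumed ? lastEpoch + 1 : 0;

for (int epoch = startEpoch; epoch < 100; epoch++)
{
    // Train...
}
```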
## Documentation
Added CHECKPOINTING_GUIDE.md with:
- Why checkpointing matters for distillation
- Architecture overview
- 5 detailed usage examples
- Best practices
- Common patterns (early stopping, multi-stage, resumption)
## Benefits
- ✅ Separation of concerns (checkpointing separate from model interface)
- ✅ Distillation-specific features (curriculum state, teacher/student pairs)
- ✅ Production-ready (automatic pruning, resumption, multi-stage support)
- ✅ Flexible (works with any ICheckpointableModel)
- ✅ Doesn't pollute IFullModel interface
Resolves checkpoint management requirements for knowledge distillation.
File tree: src/KnowledgeDistillation (3 files changed, +1077 / -0 lines)