# Commitment Cost Scheduling for VectorQuantizerEMA

This document describes the commitment cost scheduling feature added to the `VectorQuantizerEMA` class in `foldtree2/src/quantizers.py`.

## Overview

The commitment cost is a crucial hyperparameter in vector quantization that controls the balance between:
- **Encoder commitment**: How much the encoder should commit to mapping inputs close to codebook entries
- **Codebook flexibility**: How much the codebook can adapt to the encoded representations

A **warmup schedule** for the commitment cost can improve training stability and final performance by:
1. Starting with a low commitment cost to allow the codebook to initialize properly
2. Gradually increasing it to the target value to encourage encoder commitment
3. Using a smooth schedule (cosine or linear) to avoid training instability

## New Parameters

### `use_commitment_scheduling` (default: False)
**Boolean flag to enable or disable commitment cost scheduling.**

- **True**: Enable scheduling with warmup from `commitment_start` to `commitment_end`
- **False**: Use constant `commitment_cost` throughout training (original behavior)

This flag makes it easy to turn scheduling on/off without changing other parameters.

### `commitment_warmup_steps` (default: 5000)
The number of training steps over which the commitment cost will be scheduled from `commitment_start` to `commitment_end`.

**Default**: 5000 steps is chosen as a reasonable default that:
- Allows sufficient warmup for most training scenarios
- Represents ~100 epochs with batch_size=20 on a 1000-sample dataset (50 steps per epoch)
- Can be adjusted based on your dataset size and training regime (see the calculation sketch below)

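A quick way to choose `commitment_warmup_steps` is to convert a target number of warmup epochs into optimizer steps. A minimal sketch; `n_samples`, `batch_size`, and `warmup_epochs` are placeholder values for your own setup, not parameters of the class:

```python
import math

# Placeholder values - substitute your own dataset and training setup
n_samples = 1000      # samples in the training set
batch_size = 20       # samples per optimizer step
warmup_epochs = 100   # how many epochs the warmup should cover

steps_per_epoch = math.ceil(n_samples / batch_size)        # 50
commitment_warmup_steps = steps_per_epoch * warmup_epochs  # 5000
print(commitment_warmup_steps)
```
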
### `commitment_schedule` (default: 'cosine')
The type of schedule to use for the commitment cost warmup:
- **'cosine'**: Smooth cosine annealing from start to end (recommended)
- **'linear'**: Linear interpolation from start to end
- **'none'**: No scheduling, use the final value immediately

### `commitment_start` (default: 0.1)
The initial commitment cost value at the beginning of training.

**Why 0.1?** Starting with a lower value (compared to typical final values like 0.25-1.0) allows:
- The codebook to initialize without over-committing the encoder
- More exploration in the early training phase
- Smoother convergence

### `commitment_end` (default: None, uses `commitment_cost`)
The final commitment cost value after warmup completes. If not specified, the `commitment_cost` parameter is used.

## Usage Examples

### Default Behavior (No Scheduling - Original Behavior)

```python
from foldtree2.src.quantizers import VectorQuantizerEMA

# Default: scheduling is disabled, uses constant commitment cost
quantizer = VectorQuantizerEMA(
    num_embeddings=512,
    embedding_dim=128,
    commitment_cost=0.25,
    # use_commitment_scheduling=False,  # Default - scheduling disabled
)
```

### Enable Scheduling with Defaults

```python
# Enable scheduling with recommended defaults
quantizer = VectorQuantizerEMA(
    num_embeddings=512,
    embedding_dim=128,
    commitment_cost=0.25,            # This becomes commitment_end
    use_commitment_scheduling=True,  # Enable scheduling
    # commitment_warmup_steps=5000,  # Default
    # commitment_schedule='cosine',  # Default
    # commitment_start=0.1,          # Default
)
```

### Custom Warmup Schedule

```python
# Longer warmup with higher final commitment cost
quantizer = VectorQuantizerEMA(
    num_embeddings=512,
    embedding_dim=128,
    commitment_cost=0.5,
    use_commitment_scheduling=True,  # Enable scheduling
    commitment_warmup_steps=10000,   # Longer warmup
    commitment_schedule='cosine',
    commitment_start=0.05,           # Start even lower
)
```

### Linear Schedule

```python
# Linear interpolation instead of cosine
quantizer = VectorQuantizerEMA(
    num_embeddings=512,
    embedding_dim=128,
    commitment_cost=0.25,
    use_commitment_scheduling=True,  # Enable scheduling
    commitment_warmup_steps=5000,
    commitment_schedule='linear',
    commitment_start=0.1,
)
```

### Disable Scheduling (Original Behavior)

```python
# Two ways to disable scheduling:

# Method 1: Simply don't set the flag (default is False)
quantizer = VectorQuantizerEMA(
    num_embeddings=512,
    embedding_dim=128,
    commitment_cost=0.25,
)

# Method 2: Explicitly disable
quantizer = VectorQuantizerEMA(
    num_embeddings=512,
    embedding_dim=128,
    commitment_cost=0.25,
    use_commitment_scheduling=False,  # Explicitly disabled
)
```

### Custom Start and End Values

```python
# Fine control over start and end values
quantizer = VectorQuantizerEMA(
    num_embeddings=512,
    embedding_dim=128,
    commitment_cost=0.8,             # Ignored if commitment_end is set
    use_commitment_scheduling=True,  # Enable scheduling
    commitment_warmup_steps=8000,
    commitment_schedule='cosine',
    commitment_start=0.05,
    commitment_end=0.6,              # Explicitly set final value
)
```

## Monitoring During Training

### Get Current Commitment Cost

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
step = 0

# During training loop
for epoch in range(num_epochs):
    for batch in dataloader:
        z, vq_loss = quantizer(batch)
        step += 1

        # Get current commitment cost for logging
        current_cost = quantizer.get_commitment_cost()

        # Log to tensorboard or print
        writer.add_scalar('VQ/commitment_cost', current_cost, step)
        print(f"Step {step}, Commitment Cost: {current_cost:.4f}")
```

### Reset Schedule (if needed)

```python
# Reset the schedule to start warmup from the beginning
quantizer.reset_commitment_schedule()

# Useful if you want to:
# - Restart training with a new warmup
# - Switch phases in multi-stage training (see the sketch below)
```
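
A minimal sketch of a two-phase setup where each phase gets its own warmup; `pretrain_loader` and `finetune_loader` are hypothetical placeholders for your own data loaders:

```python
# Phase 1: commitment cost warms up from commitment_start to commitment_end
for batch in pretrain_loader:
    z, vq_loss = quantizer(batch)

# Phase 2: restart the warmup from the beginning before fine-tuning
quantizer.reset_commitment_schedule()
for batch in finetune_loader:
    z, vq_loss = quantizer(batch)
```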

## Integration with Training Scripts

### Example: learn_monodecoder.py

```python
# In encoder initialization
encoder = ft2.mk1_Encoder(
    in_channels=ndim,
    hidden_channels=[hidden_size, hidden_size],
    out_channels=args.embedding_dim,
    metadata={'edge_types': [('res','contactPoints','res')]},
    num_embeddings=args.num_embeddings,
    commitment_cost=0.9,              # Final target commitment cost
    # Enable scheduling
    use_commitment_scheduling=True,
    commitment_warmup_steps=5000,     # 5000 steps warmup
    commitment_schedule='cosine',     # Smooth cosine schedule
    commitment_start=0.1,             # Start well below the 0.9 final value
    edge_dim=1,
    encoder_hidden=hidden_size,
    EMA=args.EMA,
    nheads=8,
    dropout_p=0.01,
    reset_codes=False,
    flavor='transformer',
    fftin=True
)

# In training loop - commitment cost updates automatically
step = 0
for epoch in range(epochs):
    for batch in train_loader:
        z, vq_loss = encoder(batch)
        step += 1

        # Optional: Log commitment cost
        if step % 100 == 0:
            current_cost = encoder.vq_layer.get_commitment_cost()
            writer.add_scalar('VQ/commitment_cost', current_cost, step)
```

## Schedule Visualization

### Cosine Schedule (Recommended)
```
Commitment Cost
    |
1.0 |                     ▄▄▀▀▀
0.9 |                  ▄▀▀
0.8 |                ▄▀
0.7 |              ▄▀
0.6 |             ▄▀
0.5 |           ▄▀
0.4 |         ▄▀
0.3 |       ▄▀
0.2 |    ▄▄▀
0.1 |▄▄▀▀
    |___________________________
     0    1k   2k   3k   4k   5k  steps
```

**Benefits of Cosine**:
- Gentle initial increase that keeps commitment low while the codebook initializes
- Gradual approach to the final value for stability
- Smooth, continuous change in the cost with no abrupt jumps

### Linear Schedule
```
Commitment Cost
    |
1.0 |                        ▄▀
0.9 |                     ▄▀
0.8 |                  ▄▀
0.7 |               ▄▀
0.6 |            ▄▀
0.5 |          ▄▀
0.4 |       ▄▀
0.3 |    ▄▀
0.2 |  ▄▀
0.1 |▄▀
    |___________________________
     0    1k   2k   3k   4k   5k  steps
```

## Recommended Settings by Dataset Size

### Small Dataset (< 1000 samples)
```python
use_commitment_scheduling=True
commitment_warmup_steps=2000   # Shorter warmup
commitment_start=0.1
commitment_cost=0.25           # Moderate final value
commitment_schedule='cosine'
```

### Medium Dataset (1000-10000 samples)
```python
use_commitment_scheduling=True
commitment_warmup_steps=5000   # Default - good balance
commitment_start=0.1
commitment_cost=0.5            # Higher commitment
commitment_schedule='cosine'
```

### Large Dataset (> 10000 samples)
```python
use_commitment_scheduling=True
commitment_warmup_steps=10000  # Longer warmup
commitment_start=0.05          # Start lower
commitment_cost=0.8            # Strong commitment
commitment_schedule='cosine'
```

## Troubleshooting

### Codebook Collapse (Many Unused Codes)
- **Increase** `commitment_warmup_steps` (e.g., 10000)
- **Decrease** `commitment_start` (e.g., 0.05)
- Use `commitment_schedule='cosine'` for a smoother warmup

### Poor Reconstruction Quality
- **Decrease** `commitment_start` to allow more encoder flexibility early on
- **Increase** the final `commitment_cost` for stronger commitment
- Monitor the commitment cost curve and ensure it reaches the final value

### Training Instability
- **Increase** `commitment_warmup_steps` for a more gradual change
- Use `commitment_schedule='cosine'` instead of `'linear'`
- Start with a lower `commitment_start` value

## Technical Details

### Schedule Formulas

**Cosine Schedule**:
```
c = 0.5 * (1 + cos(π * t / T))
commitment_cost = end + (start - end) * c
```
where:
- `t` = current training step (held at `T` once warmup completes)
- `T` = total warmup steps
- `start` = initial commitment cost
- `end` = final commitment cost

**Linear Schedule**:
```
commitment_cost = start + (end - start) * (t / T)
```
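
The two formulas can be checked with a small standalone sketch. This mirrors the formulas above for illustration; it is not the library implementation:

```python
import math

def scheduled_commitment_cost(t, T, start, end, schedule='cosine'):
    """Commitment cost at step t for a warmup lasting T steps."""
    t = min(t, T)  # hold the final value once warmup completes
    if schedule == 'cosine':
        c = 0.5 * (1 + math.cos(math.pi * t / T))
        return end + (start - end) * c
    if schedule == 'linear':
        return start + (end - start) * (t / T)
    return end  # 'none': use the final value immediately

# Cosine vs. linear for the defaults (start=0.1, end=0.25, T=5000)
for t in (0, 1250, 2500, 5000):
    cos_val = scheduled_commitment_cost(t, 5000, 0.1, 0.25, 'cosine')
    lin_val = scheduled_commitment_cost(t, 5000, 0.1, 0.25, 'linear')
    print(f"step {t:>4}: cosine={cos_val:.3f}  linear={lin_val:.3f}")
```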

### Automatic Updates

The commitment cost is updated automatically during the forward pass when both `self.training == True` and `self.use_commitment_scheduling == True`:

```python
def forward(self, x):
    if self.training and self.use_commitment_scheduling:
        self.update_commitment_cost()
        self.current_step += 1
    # ... rest of forward pass
```

This ensures the schedule progresses with each training batch only when scheduling is enabled.
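
Because the update is gated on `self.training`, switching the module to eval mode freezes both the commitment cost and the step counter. A short usage sketch; `val_batch` is a hypothetical placeholder for your validation data:

```python
import torch

quantizer.eval()                        # self.training becomes False
with torch.no_grad():
    z, vq_loss = quantizer(val_batch)   # schedule does not advance here
quantizer.train()                       # scheduling resumes on the next training batch
```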

## References

The commitment cost scheduling is inspired by:
- Warmup strategies in transformer training (learning rate warmup)
- Cosine annealing schedules for training stability
- Best practices in VQ-VAE training

## Backward Compatibility

The new parameters have sensible defaults that maintain the **exact** original behavior:
- **`use_commitment_scheduling=False` by default**: scheduling is strictly opt-in
- When scheduling is disabled, the constant `commitment_cost` is used throughout training
- No changes to existing code are needed - fully backward compatible

To enable the new scheduling feature, simply add:
```python
use_commitment_scheduling=True
```

All other scheduling parameters have reasonable defaults:
- Default warmup of 5000 steps
- Cosine schedule for smooth, stable training
- Starting value of 0.1, which allows the codebook to initialize properly