Commit e779927

dmoi committed: mixed precision and update yaml
1 parent 2ea0689 commit e779927

30 files changed: +10299 -953 lines changed

COMMITMENT_COST_SCHEDULING.md

Lines changed: 362 additions & 0 deletions

# Commitment Cost Scheduling for VectorQuantizerEMA

This document describes the commitment cost scheduling feature added to the `VectorQuantizerEMA` class in `foldtree2/src/quantizers.py`.

## Overview

The commitment cost is a crucial hyperparameter in vector quantization that controls the balance between:

- **Encoder commitment**: How much the encoder should commit to mapping inputs close to codebook entries
- **Codebook flexibility**: How much the codebook can adapt to the encoded representations

A **warmup schedule** for the commitment cost can improve training stability and final performance by:

1. Starting with a low commitment cost to allow the codebook to initialize properly
2. Gradually increasing it to the target value to encourage encoder commitment
3. Using a smooth schedule (cosine or linear) to avoid training instability
## New Parameters

### `use_commitment_scheduling` (default: False)

**Boolean flag to enable or disable commitment cost scheduling.**

- **True**: Enable scheduling with warmup from `commitment_start` to `commitment_end`
- **False**: Use a constant `commitment_cost` throughout training (original behavior)

This flag makes it easy to turn scheduling on/off without changing other parameters.

### `commitment_warmup_steps` (default: 5000)

The number of training steps over which the commitment cost is scheduled from `commitment_start` to `commitment_end`.

**Default**: 5000 steps is chosen as a reasonable default that:
- Allows sufficient warmup for most training scenarios
- Corresponds to roughly 100 epochs with batch_size=20 on a 1000-sample dataset (50 steps per epoch)
- Can be adjusted based on your dataset size and training regime

### `commitment_schedule` (default: 'cosine')

The type of schedule to use for the commitment cost warmup:
- **'cosine'**: Smooth cosine annealing from start to end (recommended)
- **'linear'**: Linear interpolation from start to end
- **'none'**: No scheduling, use the final value immediately

### `commitment_start` (default: 0.1)

The initial commitment cost value at the beginning of training.

**Why 0.1?** Starting with a lower value (compared to typical final values of 0.25-1.0) allows:
- The codebook to initialize without over-committing the encoder
- More exploration in the early training phase
- Smoother convergence

### `commitment_end` (default: None, uses `commitment_cost`)

The final commitment cost value after warmup completes. If not specified, the `commitment_cost` parameter is used.
## Usage Examples

### Default Behavior (No Scheduling - Original Behavior)

```python
from foldtree2.src.quantizers import VectorQuantizerEMA

# Default: scheduling is disabled, uses constant commitment cost
quantizer = VectorQuantizerEMA(
    num_embeddings=512,
    embedding_dim=128,
    commitment_cost=0.25,
    # use_commitment_scheduling=False,  # Default - scheduling disabled
)
```

### Enable Scheduling with Defaults

```python
# Enable scheduling with recommended defaults
quantizer = VectorQuantizerEMA(
    num_embeddings=512,
    embedding_dim=128,
    commitment_cost=0.25,              # This becomes commitment_end
    use_commitment_scheduling=True,    # Enable scheduling
    # commitment_warmup_steps=5000,    # Default
    # commitment_schedule='cosine',    # Default
    # commitment_start=0.1,            # Default
)
```

### Custom Warmup Schedule

```python
# Longer warmup with higher final commitment cost
quantizer = VectorQuantizerEMA(
    num_embeddings=512,
    embedding_dim=128,
    commitment_cost=0.5,
    use_commitment_scheduling=True,    # Enable scheduling
    commitment_warmup_steps=10000,     # Longer warmup
    commitment_schedule='cosine',
    commitment_start=0.05,             # Start even lower
)
```

### Linear Schedule

```python
# Linear interpolation instead of cosine
quantizer = VectorQuantizerEMA(
    num_embeddings=512,
    embedding_dim=128,
    commitment_cost=0.25,
    use_commitment_scheduling=True,    # Enable scheduling
    commitment_warmup_steps=5000,
    commitment_schedule='linear',
    commitment_start=0.1,
)
```

### Disable Scheduling (Original Behavior)

```python
# Two ways to disable scheduling:

# Method 1: Simply don't set the flag (default is False)
quantizer = VectorQuantizerEMA(
    num_embeddings=512,
    embedding_dim=128,
    commitment_cost=0.25,
)

# Method 2: Explicitly disable
quantizer = VectorQuantizerEMA(
    num_embeddings=512,
    embedding_dim=128,
    commitment_cost=0.25,
    use_commitment_scheduling=False,   # Explicitly disabled
)
```

### Custom Start and End Values

```python
# Fine control over start and end values
quantizer = VectorQuantizerEMA(
    num_embeddings=512,
    embedding_dim=128,
    commitment_cost=0.8,               # Ignored if commitment_end is set
    use_commitment_scheduling=True,    # Enable scheduling
    commitment_warmup_steps=8000,
    commitment_schedule='cosine',
    commitment_start=0.05,
    commitment_end=0.6,                # Explicitly set final value
)
```
## Monitoring During Training

### Get Current Commitment Cost

```python
# During the training loop (assumes `quantizer`, `dataloader`, and a
# torch.utils.tensorboard SummaryWriter named `writer` already exist)
step = 0
for epoch in range(num_epochs):
    for batch in dataloader:
        z, vq_loss = quantizer(batch)

        # Get current commitment cost for logging
        current_cost = quantizer.get_commitment_cost()

        # Log to tensorboard or print
        writer.add_scalar('VQ/commitment_cost', current_cost, step)
        print(f"Step {step}, Commitment Cost: {current_cost:.4f}")
        step += 1
```

### Reset Schedule (if needed)

```python
# Reset the schedule to start warmup from the beginning
quantizer.reset_commitment_schedule()

# Useful if you want to:
# - Restart training with a new warmup
# - Switch phases in multi-stage training
```
## Integration with Training Scripts

### Example: learn_monodecoder.py

```python
# In encoder initialization
encoder = ft2.mk1_Encoder(
    in_channels=ndim,
    hidden_channels=[hidden_size, hidden_size],
    out_channels=args.embedding_dim,
    metadata={'edge_types': [('res', 'contactPoints', 'res')]},
    num_embeddings=args.num_embeddings,
    commitment_cost=0.9,               # Final target commitment cost
    # Enable scheduling
    use_commitment_scheduling=True,
    commitment_warmup_steps=5000,      # 5000 steps warmup
    commitment_schedule='cosine',      # Smooth cosine schedule
    commitment_start=0.1,              # Start well below the final value
    edge_dim=1,
    encoder_hidden=hidden_size,
    EMA=args.EMA,
    nheads=8,
    dropout_p=0.01,
    reset_codes=False,
    flavor='transformer',
    fftin=True
)

# In the training loop - the commitment cost updates automatically
step = 0
for epoch in range(epochs):
    for batch in train_loader:
        z, vq_loss = encoder(batch)

        # Optional: log the commitment cost
        if step % 100 == 0:
            current_cost = encoder.vq_layer.get_commitment_cost()
            writer.add_scalar('VQ/commitment_cost', current_cost, step)
        step += 1
```
## Schedule Visualization

### Cosine Schedule (Recommended)
```
Commitment Cost
1.0 |                  ▄▄▀▀▀▀▀▀
0.9 |                ▄▀
0.8 |               ▄▀
0.7 |              ▄▀
0.6 |             ▄▀
0.5 |            ▄▀
0.4 |           ▄▀
0.3 |          ▄▀
0.2 |        ▄▀
0.1 |▄▄▄▄▄▀▀▀
    |_________________________
    0    1k   2k   3k   4k   5k  steps
```

**Benefits of Cosine**:
- Gentle initial increase while the codebook is still initializing
- Slow approach to the final value for stability at the end of warmup
- Smooth changes throughout, with no abrupt jumps

### Linear Schedule
```
Commitment Cost
1.0 |                       ▄▀
0.9 |                     ▄▀
0.8 |                  ▄▀
0.7 |                ▄▀
0.6 |             ▄▀
0.5 |           ▄▀
0.4 |         ▄▀
0.3 |      ▄▀
0.2 |    ▄▀
0.1 |▄▀
    |_________________________
    0    1k   2k   3k   4k   5k  steps
```
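
Beyond the ASCII sketches, both curves can be plotted directly from the formulas given under Technical Details below. This is a standalone sketch that only assumes NumPy and Matplotlib; it is independent of the quantizer implementation:

```python
import numpy as np
import matplotlib.pyplot as plt

start, end, T = 0.1, 1.0, 5000          # warmup from 0.1 to 1.0 over 5000 steps
t = np.arange(T + 1)

# Cosine warmup: end + (start - end) * 0.5 * (1 + cos(pi * t / T))
cosine = end + (start - end) * 0.5 * (1 + np.cos(np.pi * t / T))
# Linear warmup: start + (end - start) * (t / T)
linear = start + (end - start) * (t / T)

plt.plot(t, cosine, label='cosine')
plt.plot(t, linear, label='linear')
plt.xlabel('training step')
plt.ylabel('commitment cost')
plt.legend()
plt.show()
```
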
## Recommended Settings by Dataset Size

### Small Dataset (< 1000 samples)
```python
use_commitment_scheduling=True
commitment_warmup_steps=2000   # Shorter warmup
commitment_start=0.1
commitment_cost=0.25           # Moderate final value
commitment_schedule='cosine'
```

### Medium Dataset (1000-10000 samples)
```python
use_commitment_scheduling=True
commitment_warmup_steps=5000   # Default - good balance
commitment_start=0.1
commitment_cost=0.5            # Higher commitment
commitment_schedule='cosine'
```

### Large Dataset (> 10000 samples)
```python
use_commitment_scheduling=True
commitment_warmup_steps=10000  # Longer warmup
commitment_start=0.05          # Start lower
commitment_cost=0.8            # Strong commitment
commitment_schedule='cosine'
```
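
If you prefer to think in warmup epochs rather than steps, a small helper can convert one into the other. This is a hypothetical convenience function (not part of `quantizers.py`); the 100-epoch example matches the default discussed under `commitment_warmup_steps`:

```python
import math

def warmup_steps_for_epochs(num_samples: int, batch_size: int, warmup_epochs: int) -> int:
    """Convert a warmup length in epochs into the equivalent number of training steps."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * warmup_epochs

# Example: 1000 samples, batch_size=20, 100 warmup epochs -> 5000 steps
print(warmup_steps_for_epochs(1000, 20, 100))
```
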
## Troubleshooting

### Codebook Collapse (Many Unused Codes)
- **Increase** `commitment_warmup_steps` (e.g., 10000)
- **Decrease** `commitment_start` (e.g., 0.05)
- Use `commitment_schedule='cosine'` for a smoother warmup
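
As one concrete combination of these adjustments (reusing the toy codebook sizes from the usage examples above; the scheduling values are simply the suggestions from the bullets, not tuned settings):

```python
# Illustrative settings when many codes go unused (codebook collapse)
quantizer = VectorQuantizerEMA(
    num_embeddings=512,
    embedding_dim=128,
    commitment_cost=0.25,
    use_commitment_scheduling=True,
    commitment_warmup_steps=10000,   # longer warmup
    commitment_start=0.05,           # lower starting commitment cost
    commitment_schedule='cosine',    # smoother warmup
)
```
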
### Poor Reconstruction Quality
- **Decrease** `commitment_start` to allow more encoder flexibility early on
- **Increase** the final `commitment_cost` for stronger commitment
- Monitor the commitment cost curve and make sure it reaches the final value

### Training Instability
- **Increase** `commitment_warmup_steps` for a more gradual change
- Use `commitment_schedule='cosine'` instead of `'linear'`
- Start with a lower `commitment_start` value
## Technical Details

### Schedule Formulas

**Cosine Schedule**:
```python
c = 0.5 * (1 + cos(π * t / T))
commitment_cost = end + (start - end) * c
```
where:
- `t` = current training step
- `T` = total warmup steps
- `start` = initial commitment cost
- `end` = final commitment cost

**Linear Schedule**:
```python
commitment_cost = start + (end - start) * (t / T)
```
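
A minimal, self-contained sketch of these formulas as a plain function (independent of the actual `VectorQuantizerEMA` implementation; clamping steps past the warmup horizon to the final value is an assumption about the intended behavior):

```python
import math

def scheduled_commitment_cost(t: int, T: int, start: float, end: float,
                              schedule: str = 'cosine') -> float:
    """Commitment cost at step t of a T-step warmup from `start` to `end`."""
    if schedule == 'none' or T <= 0:
        return end
    frac = min(max(t / T, 0.0), 1.0)     # clamp to [0, 1] once warmup is over
    if schedule == 'cosine':
        c = 0.5 * (1 + math.cos(math.pi * frac))
        return end + (start - end) * c
    if schedule == 'linear':
        return start + (end - start) * frac
    raise ValueError(f"unknown schedule: {schedule}")

# A few values of the default warmup (0.1 -> 0.25 over 5000 steps)
for step in (0, 1250, 2500, 3750, 5000, 10000):
    print(step, round(scheduled_commitment_cost(step, 5000, 0.1, 0.25), 4))
```
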
### Automatic Updates

The commitment cost is updated automatically during the forward pass when both `self.training == True` and `self.use_commitment_scheduling == True`:

```python
def forward(self, x):
    if self.training and self.use_commitment_scheduling:
        self.update_commitment_cost()
        self.current_step += 1
    # ... rest of forward pass
```

This ensures the schedule progresses with each training batch only when scheduling is enabled.
## References

The commitment cost scheduling is inspired by:
- Warmup strategies in transformer training (learning rate warmup)
- Cosine annealing schedules for training stability
- Best practices in VQ-VAE training

## Backward Compatibility

The new parameters have sensible defaults that preserve the **exact** original behavior:
- **`use_commitment_scheduling=False` by default**: scheduling is opt-in, not enabled by default
- When scheduling is disabled, the constant `commitment_cost` is used throughout training
- No changes to existing code are needed - fully backward compatible

To enable the new scheduling feature, simply add:
```python
use_commitment_scheduling=True
```

All other scheduling parameters have reasonable defaults:
- Default warmup of 5000 steps
- Cosine schedule for smooth, stable training
- Starting at 0.1 allows proper codebook initialization
