|
1 | 1 | # Callbacks |
2 | 2 |
|
3 | | -Megatron Bridge provides a lightweight callback system for injecting custom logic into the training and evaluation loop without modifying framework code. This is ideal for: |
4 | | - |
5 | | -- Proprietary integrations |
6 | | -- Custom logging and metrics tracking |
7 | | -- Company-specific monitoring |
8 | | -- Infrastructure heartbeats |
9 | | -- Experiment tracking |
| 3 | +Megatron Bridge provides a lightweight callback system for injecting custom logic into the training and evaluation loop without modifying framework code. This is ideal for proprietary integrations or custom logging and metrics tracking. |
10 | 4 |
|
11 | 5 | ## Quick Start |
12 | 6 |
|
@@ -48,10 +42,10 @@ def log_step(context): |
48 | 42 | if context.loss_dict: |
49 | 43 | print(f"Step {step}: {context.loss_dict}") |
50 | 44 |
|
51 | | -manager = CallbackManager() |
52 | | -manager.register("on_train_step_end", log_step) |
| 45 | +callback_manager = CallbackManager() |
| 46 | +callback_manager.register("on_train_step_end", log_step) |
53 | 47 |
|
54 | | -pretrain(config, forward_step_func, callbacks=manager) |
| 48 | +pretrain(config, forward_step_func, callbacks=callback_manager) |
55 | 49 | ``` |
56 | 50 |
|
57 | 51 | ### Mixing Both Patterns |
@@ -136,7 +130,7 @@ class StepCounterCallback(Callback): |
136 | 130 |
|
137 | 131 | ## Distributed Training |
138 | 132 |
|
139 | | -Callbacks fire on **all ranks** without framework-level synchronization. If your callback should only run on specific ranks, add rank guards: |
| 133 | +Callbacks fire on **all ranks** without framework-level synchronization. If your callback should only run on specific ranks, add guards: |
140 | 134 |
|
141 | 135 | ```python |
142 | 136 | import torch.distributed as dist |
@@ -199,8 +193,6 @@ The callback system follows these principles: |
199 | 193 |
|
200 | 194 | 3. **Safety**: Callbacks receive framework state but modifying it is at the user's own risk. The framework makes no guarantees about the effects of modifications. |
201 | 195 |
|
202 | | -4. **Simplicity**: No priority ordering, no control flow back to the training loop, no framework-level exception handling. Callbacks are purely additive. |
203 | | - |
204 | 196 | ## Examples |
205 | 197 |
|
206 | 198 | ### Proprietary Metrics |
|