Commit b23f6f3

Create UPDATE_0.2.0.md
1 parent 2c983cc commit b23f6f3


docs/UPDATE_0.2.0.md

Lines changed: 79 additions & 0 deletions

# Update Log 0.2.0

## What's New

### 1. Added an `Optimizer Manager` to support various optimizer algorithms.

Before 0.2.0, the `optimizer` was strongly coupled to the "loss scaler". As a result, users could not use multiple optimizers at the same time when training a model in fp16.

**======= Before 0.2.0 =======**

```python
for iteration in range(1000):
    # zero grad
    optimizer.zero_grad()

    # ...
    # loss scale and backward
    loss = optimizer.loss_scale(loss)
    loss.backward()

    # optimizer step
    bmtrain.optim_step(optimizer, lr_scheduler)
```

The `bmtrain.optim_step` allows only one `optimizer` and at most one `lr_scheduler`, which cannot handle more complex scenarios.

**======= After 0.2.0 =======**

```python
# create a new instance of the optimizer manager
optim_manager = bmtrain.optim.OptimManager(loss_scale=1024)
# let optim_manager handle the optimizer and (optionally) its corresponding lr_scheduler
optim_manager.add_optimizer(optimizer, lr_scheduler)
# add_optimizer can be called multiple times to add other optimizers

for iteration in range(1000):
    # zero grad
    optim_manager.zero_grad()  # calls zero_grad for every managed optimizer

    # ...
    # loss scale and backward
    optim_manager.backward(loss)

    # optimizer step
    optim_manager.step()
```

Starting from BMTrain 0.2.0, we provide `OptimManager` to manage optimizers and loss scales.
`OptimManager` supports managing multiple optimizers and lr_schedulers at the same time, and allows setting the loss scale independently.
`OptimManager` can also manage PyTorch native optimizers such as SGD and AdamW.
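
For example, two parameter groups can be trained with two native optimizers under a single manager. The sketch below is illustrative rather than part of the release: `model`, `compute_loss`, and `batch` are placeholders, and only `OptimManager`, `add_optimizer`, `zero_grad`, `backward`, and `step` come from the notes above (the `lr_scheduler` argument of `add_optimizer` is optional).

```python
import torch
import bmtrain as bmt

# two native PyTorch optimizers for different parts of the model
# (`model.body` and `model.embedding` are placeholder attribute names)
adamw = torch.optim.AdamW(model.body.parameters(), lr=1e-4)
sgd = torch.optim.SGD(model.embedding.parameters(), lr=1e-2)

# one manager owns the loss scale for both optimizers
optim_manager = bmt.optim.OptimManager(loss_scale=1024)
optim_manager.add_optimizer(adamw)  # lr_scheduler omitted: it is optional
optim_manager.add_optimizer(sgd)

for iteration in range(1000):
    optim_manager.zero_grad()          # zero_grad for every managed optimizer
    loss = compute_loss(model, batch)  # placeholder forward pass
    optim_manager.backward(loss)       # scale the loss, then backward
    optim_manager.step()               # step every optimizer (and scheduler, if any)
```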

### 2. Pipeline Parallelism

In this version, BMTrain has added a new kind of parallel algorithm: pipeline parallelism.
To enable pipeline parallelism, one line of code needs to be modified.

**======= ZeRO =======**
```python
layers = bmt.TransformerBlockList([
    # ...
])
```

**======= Pipeline =======**
```python
layers = bmt.PipelineTransformerBlockList([
    # ...
])
```

Replacing `TransformerBlockList` with `PipelineTransformerBlockList` switches the parallel algorithm from ZeRO to pipeline parallelism.
The number of stages in the pipeline can be set by passing the `pipe_size` parameter to `bmtrain.init_distributed`.
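
As a rough sketch (only `pipe_size` and `PipelineTransformerBlockList` are taken from the notes above; `seed`, `num_layers`, and `make_block` are assumed placeholders), a 4-stage pipeline could be set up like this:

```python
import bmtrain as bmt

# partition the model into 4 pipeline stages
bmt.init_distributed(seed=0, pipe_size=4)

num_layers = 24
# `make_block()` is a placeholder for constructing one transformer layer
layers = bmt.PipelineTransformerBlockList([
    make_block() for _ in range(num_layers)
])
```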

### 3. Others

* Supports BF16.
* Tensors recorded by the inspector now support backward propagation (see the sketch after this list).
* Adds new tests.
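
A minimal sketch of how this might look, assuming the `bmt.inspect.inspect_tensor` context manager and `bmt.inspect.record_tensor` helper shown in BMTrain's examples (the model, batch, loss, and tensor name are placeholders, and in real code `record_tensor` is usually called inside a module's forward):

```python
import bmtrain as bmt

with bmt.inspect.inspect_tensor() as inspector:
    hidden = model(batch)                        # placeholder forward pass
    bmt.inspect.record_tensor(hidden, "hidden")  # record an intermediate tensor
    loss = hidden.float().mean()                 # placeholder loss
    loss.backward()                              # gradients flow through the recorded tensor

# inspect the recorded tensors after forward and backward
summary = inspector.get_summary()
bmt.print_rank(bmt.inspect.format_summary(summary))
```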
