Commit 8cdeec0

add some examples and check the spelling
1 parent 6c1e961 commit 8cdeec0


asset/basic_training.md

Lines changed: 22 additions & 17 deletions
@@ -1,6 +1,6 @@
# Basic Training
## config
-You may want to load your own configurations in equivalent ways:
+You may want to load your configurations in equivalent ways:
* cmd
* config files
* yaml
@@ -26,33 +26,33 @@ You can also modify configurations through the local files:
python run_textbox.py ... --config_files <config-file-one> <config-file-two>
```

-Every config file is an additional yaml files like:
+Every config file is an additional yaml file like:

```yaml
efficient_methods: ['prompt-tuning']
```
-It's suitable for **a large number of** modifications or **long-term** modification with cmd like:
+It's suitable for **a large number of** modifications or **long-term** modifications with cmd like:
* ``efficient_methods``
* ``efficient_kwargs``
* ...

### yaml

-The original configurations are in the yaml files. You can check the values there, but it's not recommended to modify the files except for **permanently** modification the dataset. These files are in the path ``textbox\properties``:
+The original configurations are in the yaml files. You can check the values there, but it's not recommended to modify the files except for **permanent** modification of the dataset. These files are in the path ``textbox\properties``:
* ``overall.yaml``
* ``dataset\*.yaml``
* ``model\*yaml``


## trainer

-You can choose optimizer and scheduler through `optimizer=<optimizer-name>` and `scheduler=<scheduler-name>`. We provide a wrapper around **pytorch optimizer**, which means parameters like `epsilon` or `warmup_steps` can be specified with keyword dictionaries `optimizer_kwargs={'epsilon': ... }` and `scheduler_kwargs={'warmup_steps': ... }`. See [pytorch optimizer](https://pytorch.org/docs/stable/optim.html#algorithms) and scheduler for a complete tutorial. <!-- TODO -->
+You can choose an optimizer and scheduler through `optimizer=<optimizer-name>` and `scheduler=<scheduler-name>`. We provide a wrapper around **pytorch optimizer**, which means parameters like `epsilon` or `warmup_steps` can be specified with keyword dictionaries `optimizer_kwargs={'epsilon': ... }` and `scheduler_kwargs={'warmup_steps': ... }`. See [pytorch optimizer](https://pytorch.org/docs/stable/optim.html#algorithms) and scheduler for a complete tutorial. <!-- TODO -->
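As a sketch of the keyword-dictionary style described above, the same settings can also be written in a config file. The optimizer and scheduler names and the concrete values below are illustrative assumptions, not library defaults:

```yaml
# illustrative values only; pick the optimizer/scheduler you actually need
optimizer: adamw
optimizer_kwargs: {'epsilon': 1.0e-6}
scheduler: linear
scheduler_kwargs: {'warmup_steps': 100}
```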

-Validation frequency is introduced to validate the model **at each specific batch-steps or epochs**. Specify `valid_strategy` (either `'step'` or `'epoch'`) and `valid_steps=<int>` to adjust the pace. Specifically, traditional train-validate paradigm is a special case with `valid_strategy=epoch` and `valid_steps=1`.
+Validation frequency is introduced to validate the model **at specific batch steps or epochs**. Specify `valid_strategy` (either `'step'` or `'epoch'`) and `valid_steps=<int>` to adjust the pace. Specifically, the traditional train-validate paradigm is a special case with `valid_strategy=epoch` and `valid_steps=1`.
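For example, validating every 500 training steps rather than once per epoch could be configured as below (500 is only an illustrative value):

```yaml
# validate every 500 batch steps; valid_strategy: epoch with valid_steps: 1
# reproduces the traditional once-per-epoch behaviour
valid_strategy: step
valid_steps: 500
```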

-`max_save=<int>` indicates **the maximal amount of saved files** (checkpoint and generated corpus during evaluation). `-1`: save every file, `0`: do not save any file, `1`: only save the file with best score, and `n`: save both the best and the last $n−1$ files.
+`max_save=<int>` indicates **the maximal number of saved files** (checkpoint and generated corpus during evaluation). `-1`: save every file, `0`: do not save any file, `1`: only save the file with the best score, and `n`: save both the best and the last $n−1$ files.
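For instance, keeping the best checkpoint plus the most recent one would follow the `n` rule above:

```yaml
# n = 2: keep the best file and the last (n - 1) = 1 file
max_save: 2
```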

-According to ``metrics_for_best_model``, thr score of current checkpoint will be calculated, and evaluatin metrics specified with ``metrics``([full list](evaluation.md)) will be chosen. **Early stopping** can be configured with `stopping_steps=<int>` and score of every checkpoint.
+According to ``metrics_for_best_model``, the score of the current checkpoint will be calculated, and the evaluation metrics specified with ``metrics`` ([full list](evaluation.md)) will be chosen. **Early stopping** can be configured with `stopping_steps=<int>` and the score of every checkpoint.


```bash
@@ -61,35 +61,40 @@ python run_textbox.py ... --stopping_steps=8 \\
--metrics=\[\'rouge\'\]
```

-You can resume from a **previous checkpoint** through ``model_path=<checkpoint_path>``.When you want to restrore **all trainer parameters** like optimizer and start_epoch, you can set ``resume_training=True``. Otherwise, only **model and tokenizer** will be loaded.
+You can resume from a **previous checkpoint** through ``model_path=<checkpoint_path>``. When you want to restore **all trainer parameters** like optimizer and start_epoch, you can set ``resume_training=True``. Otherwise, only **model and tokenizer** will be loaded. The script below will resume training from the checkpoint in the path ``saved/BART-samsum-2022-Dec-18_20-57-47/checkpoint_best``:

-Other commonly used parameters includes `epochs=<int>` and `max_steps=<int>` (indicating maximum iteration of epochs and batch steps, if you set `max_steps`, `epochs` will be invalid), `learning_rate=<float>`, `train_batch_size=<int>`, `weight_decay=<bool>`, and `grad_clip=<bool>`.
+```bash
+python run_textbox.py --model_path=saved/BART-samsum-2022-Dec-18_20-57-47/checkpoint_best \\
+--resume_training=True
+```
+
+Other commonly used parameters include `epochs=<int>` and `max_steps=<int>` (indicating the maximum number of epochs and batch steps; if you set `max_steps`, `epochs` will be ignored), `learning_rate=<float>`, `train_batch_size=<int>`, `weight_decay=<bool>`, and `grad_clip=<bool>`.
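As a rough sketch, these settings could all live in one config file; every value below is an arbitrary illustration rather than a recommended default:

```yaml
# illustrative values only
epochs: 50
learning_rate: 3.0e-5
train_batch_size: 16
weight_decay: true
grad_clip: true
```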

### Partial Experiment

-You can run partial experiment with `do_train`, `do_valid`, `do_test`. You can test your pipeline and debug with `quick_test=<amount-of-data-to-load>` to load just a few examples.
+You can run a partial experiment with `do_train`, `do_valid` and `do_test`. You can test your pipeline and debug with `quick_test=<amount-of-data-to-load>` to load just a few examples.

-The following script loads the trained model from path `example` and conducts generation and evaluation without training and evaluation.
+The following script loads the trained model from a local path and conducts generation and evaluation without training and validation.
```bash
-python run_textbox.py ... --do_train=False --do_valid=False \\
---model_path=example --quick_test=16
+python run_textbox.py --model_path=saved/BART-samsum-2022-Dec-18_20-57-47/checkpoint_best \\
+--do_train=False --do_valid=False
```
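The `quick_test` option mentioned above can be combined with the same flags for a fast debugging pass; as a sketch (16 simply mirrors the sample size used in the earlier version of this example):

```yaml
# load only 16 examples and skip training and validation
do_train: false
do_valid: false
quick_test: 16
```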

## wandb

-If you are running your code in jupyter environments, you may want to login by simply setting an environment variable (your key may be stored in plain text):
+If you are running your code in jupyter environments, you may want to log in by simply setting an environment variable (your key may be stored in plain text):

```python
%env WANDB_API_KEY=<your-key>
```
Here you can set wandb with `wandb`.

-If you are debugging your model, you may want to **disable W&B** with `--wandb=disabled` and **none of the metrics** will be recorded.You can also disable **sync only** with `--wandb=offline` and enable it again with `--wandb=online` to upload to the cloud. Meanwhile, the parameter can be configured in the yaml file like:
+If you are debugging your model, you may want to **disable W&B** with `--wandb=disabled`, and **none of the metrics** will be recorded. You can also disable **sync only** with `--wandb=offline` and enable it again with `--wandb=online` to upload to the cloud. Meanwhile, the parameter can be configured in the yaml file like:

```yaml
wandb: online
```

The local files can be uploaded by executing `wandb sync` in the command line.

-After configuration, you can throttle wandb prompts by defining environment variable `export WANDB_SILENT=false`. For more information, see [documentation](docs.wandb.ai).
+After configuration, you can silence wandb prompts by defining the environment variable `export WANDB_SILENT=true`. For more information, see the [documentation](https://docs.wandb.ai).
