   # Monitoring configurations (e.g., WandB, TensorBoard or MLFlow)
   ...
 service:
   # Services to use
@@ -48,10 +48,12 @@ log:
   ...
 ```

-Each of these sections will be explained in detail below.
+Each of these sections will be explained in detail below. For additional details about specific parameters not covered here, please refer to the [source code](https://github.com/modelscope/Trinity-RFT/blob/main/trinity/common/config.py).

-```{note}
-For additional details about specific parameters not covered here, please refer to the [source code](https://github.com/modelscope/Trinity-RFT/blob/main/trinity/common/config.py).
+```{tip}
+Trinity-RFT uses [OmegaConf](https://omegaconf.readthedocs.io/en/latest/) to load YAML configuration files.
+It supports some advanced features like [variable interpolation](https://omegaconf.readthedocs.io/en/latest/usage.html#variable-interpolation) and [environment variable substitution](https://omegaconf.readthedocs.io/en/latest/custom_resolvers.html#oc-env).
+Users can use these features to simplify configuration.
 ```

---
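
The two OmegaConf features named in the tip can be combined in a single file. The fragment below is a minimal sketch rather than an excerpt from the Trinity-RFT sources: it reuses the `checkpoint_root_dir`, `model_path`, and `critic_model_path` keys that appear elsewhere in this diff, and the fallback value `checkpoints` is a made-up placeholder.

```yaml
# Illustrative fragment only; the fallback value below is a placeholder.
project: Trinity-RFT
name: example

# Environment variable substitution. The optional second argument is a
# fallback that OmegaConf uses when CHECKPOINT_ROOT_DIR is not set.
checkpoint_root_dir: ${oc.env:CHECKPOINT_ROOT_DIR,checkpoints}

model:
  # No fallback here, so MODEL_PATH has to be exported before launching.
  model_path: ${oc.env:MODEL_PATH}
  # Variable interpolation: reuse another key by its dotted path.
  critic_model_path: ${model.model_path}
```

OmegaConf resolves interpolations lazily, when a value is read, so an unset variable without a fallback should surface as an error at access time rather than when the file is parsed.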
@@ -64,7 +66,7 @@ These are general settings that apply to the entire experiment.
 project: Trinity-RFT
 name: example
 mode: both
-checkpoint_root_dir: /PATH/TO/CHECKPOINT
+checkpoint_root_dir: ${oc.env:CHECKPOINT_ROOT_DIR}  # CHECKPOINT_ROOT_DIR is an environment variable set in advance
 ```

 - `project`: The name of the project.
@@ -115,13 +117,25 @@ Used to log training metrics during execution.
 ```yaml
 monitor:
   monitor_type: wandb
+  monitor_args:
+    base_url: http://localhost:8080
+    api_key: your_api_key
   enable_ray_timeline: False
 ```

 - `monitor_type`: Type of monitoring system. Options:
   - `wandb`: Logs to [Weights & Biases](https://docs.wandb.ai/quickstart/). Requires logging in and setting `WANDB_API_KEY`. Project and run names match the `project` and `name` fields in global configs.
   - `tensorboard`: Logs to [TensorBoard](https://www.tensorflow.org/tensorboard). Files are saved under `<checkpoint_root_dir>/<project>/<name>/monitor/tensorboard`.
-- `enable_ray_timeline`: Whether to export the ray timeline. If set to `True`, a `timeline.json` file will be exported to `<checkpoint_root_dir>/<project>/<name>/monitor`. You can view the timeline file in Chrome at [chrome://tracing](chrome://tracing).
+  - `mlflow`: Logs to [MLFlow](https://mlflow.org/). If [MLFlow authentication](https://mlflow.org/docs/latest/ml/auth/) is set up, set `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` as environment variables before running.
+- `monitor_args`: Dictionary of arguments for monitor initialization.
+  - For `wandb`:
+    - `base_url`: Overrides `WANDB_BASE_URL` if set.
+    - `api_key`: Overrides `WANDB_API_KEY` if set.
+  - For `mlflow`:
+    - `uri`: The URI of your MLFlow instance. Strongly recommended to set; defaults to `http://localhost:5000`.
+    - `username`: Overrides `MLFLOW_TRACKING_USERNAME` if set.
+    - `password`: Overrides `MLFLOW_TRACKING_PASSWORD` if set.
+- `enable_ray_timeline`: If `True`, exports a `timeline.json` file to `<checkpoint_root_dir>/<project>/<name>/monitor`. Viewable in Chrome at [chrome://tracing](chrome://tracing).

---
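
The YAML example in this hunk covers only `wandb`. For comparison, an `mlflow` setup built from the `monitor_args` keys listed above might look like the fragment below. This is a sketch under assumptions: the nesting mirrors the `wandb` example, and the environment variable names `MY_MLFLOW_USER` and `MY_MLFLOW_PASSWORD` are hypothetical, chosen only to show credentials being kept out of the file.

```yaml
monitor:
  monitor_type: mlflow
  monitor_args:
    # The documented default, written out explicitly as recommended.
    uri: http://localhost:5000
    # Hypothetical variable names; any variables exported before launch would do.
    username: ${oc.env:MY_MLFLOW_USER}
    password: ${oc.env:MY_MLFLOW_PASSWORD}
  enable_ray_timeline: False
```

If `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` are already exported, the `username` and `password` entries can presumably be left out, since the description above presents them only as overrides.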
@@ -131,8 +145,8 @@ Defines the model paths and token limits.
 ```yaml
 model:
-  model_path: /PATH/TO/MODEL/
-  critic_model_path: ''
+  model_path: ${oc.env:MODEL_PATH}  # MODEL_PATH is an environment variable set in advance
+  critic_model_path: ${model.model_path}  # use the value of model.model_path
   max_response_tokens: 16384
   max_model_len: 20480
 ```
@@ -174,10 +188,6 @@ buffer:
       ...
     eval_tasksets:
       ...
-
-  explorer_output:
-    ...
-
   trainer_input:
     experience_buffer:
       ...
@@ -255,41 +265,6 @@ The configuration for each task dataset is defined as follows:
 - `default_reward_fn_type`: Reward function used during exploration. If not specified, the `buffer.default_reward_fn_type` is used.
 - `workflow_args`: A dictionary of arguments used to supplement dataset-level parameters.

-
-### Explorer Output
-
-In [`explore` mode](#global-configuration), since there is no trainer, users can configure an experience buffer via `buffer.explorer_output`, rather than using `buffer.trainer_input`, which will be introduced in the next section.
-
-```{note}
-For `both` and `train` modes, users should use `buffer.trainer_input.experience_buffer` instead of `buffer.explorer_output`.
-```
-
-```yaml
-buffer:
-  ...
-  explorer_output:
-    name: countdown_buffer
-    storage_type: queue
-    path: sqlite:///countdown_buffer.db
-    wrap_in_ray: True
-    max_read_timeout: 1800
-```
-
-- `name`: The name of the experience buffer. This name will be used as the Ray actor's name, so it must be unique.
-- `storage_type`: The storage type for the experience buffer.
-  - `queue`: Experience data is stored in a queue. This storage type is recommended for most use cases.
-  - `sql`: Experience data is stored in a SQL database. If your database only supports local access (e.g., SQLite), set `wrap_in_ray` to `True` to wrap the database in a Ray actor, enabling remote access from other nodes.
-  - `file`: Experience data is stored in a JSON file. This storage type should be used only for debugging purposes in `explore` mode.
-- `path`: The path to the experience buffer.
-  - For `queue` storage type, this field is optional. You can specify a SQLite database or JSON file path here to back up the queue data.
-  - For `file` storage type, the path points to the directory containing the dataset files.
-  - For `sql` storage type, the path points to the SQLite database file.
-- `wrap_in_ray`: Whether to wrap the experience buffer in a Ray actor. Only take effect when `storage_type` is `sql` or `file`. The `queue` storage always uses a Ray actor.
-- `max_read_timeout`: The maximum waiting time (in seconds) to read new experience data. If exceeded, an incomplete batch will be returned directly. Only take effect when `storage_type` is `queue`. Default is 1800 seconds (30 minutes).
-- `use_priority_queue`: Only take effect when `storage_type` is `queue`. If set to `True`, the queue will be a priority queue, which allows for prioritizing certain experiences over others. Default is `False`.
-- `reuse_cooldown_time`: Only take effect when `storage_type` is `queue` and `use_priority_queue` is `True`. If set, it specifies the cooldown time (in seconds) for reusing experiences. If not specified, the default value is `None`, meaning experiences can not be reused.
-
-
 ### Trainer Input

 Defines the experience buffer and optional SFT warm-up dataset.
@@ -314,7 +289,19 @@ buffer:
     sft_warmup_steps: 0
 ```

-- `experience_buffer`: Experience buffer used by the trainer, which is logically equivalent to `buffer.explorer_output`.
+- `experience_buffer`: The input of the Trainer and also the output of the Explorer. This field is required even in `explore` mode.
+  - `name`: The name of the experience buffer. This name will be used as the Ray actor's name, so it must be unique.
+  - `storage_type`: The storage type for the experience buffer.
+    - `queue`: Experience data is stored in a queue. This storage type is recommended for most use cases.
+    - `sql`: Experience data is stored in a SQL database.
+    - `file`: Experience data is stored in a JSON file. This storage type should be used only for debugging purposes in `explore` mode.
+  - `path`: The path to the experience buffer.
+    - For `queue` storage type, this field is optional. You can specify a SQLite database or JSON file path here to back up the queue data.
+    - For `file` storage type, the path points to the directory containing the dataset files.
+    - For `sql` storage type, the path points to the SQLite database file.
+  - `max_read_timeout`: The maximum waiting time (in seconds) to read new experience data. If exceeded, an incomplete batch is returned immediately. Only takes effect when `storage_type` is `queue`. Default is 1800 seconds (30 minutes).
+  - `use_priority_queue`: Only takes effect when `storage_type` is `queue`. If set to `True`, the queue becomes a priority queue, which allows certain experiences to be prioritized over others. Default is `False`.
+  - `reuse_cooldown_time`: Only takes effect when `storage_type` is `queue` and `use_priority_queue` is `True`. If set, it specifies the cooldown time (in seconds) before an experience can be reused. The default is `None`, meaning experiences cannot be reused.
 - `sft_warmup_dataset`: Optional dataset used for pre-training (SFT warmup).
 - `sft_warmup_steps`: Number of steps to use SFT warm-up before RL begins.
docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
2 additions & 2 deletions
@@ -447,7 +447,7 @@ class OPMDPolicyLossFn(PolicyLossFn):
 The above steps implement the components needed for the algorithm, but these components are scattered and need to be configured in multiple places to take effect.

-To simplify configuration, Trinity-RFT provides {class}`trinity.algorithm.AlgorithmType` to describe a complete algorithm and registers it in {object}`trinity.algorithm.ALGORITHM_TYPE`, enabling one-click configuration.
+To simplify configuration, Trinity-RFT provides {class}`trinity.algorithm.AlgorithmType` to describe a complete algorithm and registers it in {class}`trinity.algorithm.ALGORITHM_TYPE`, enabling one-click configuration.

 The `AlgorithmType` class includes the following attributes and methods:
@@ -473,7 +473,7 @@ class OPMDAlgorithm(AlgorithmType):