docs/sphinx_doc/source/tutorial/trinity_configs.md
7 additions & 6 deletions
@@ -105,6 +105,10 @@ buffer:
       format:
         prompt_key: 'question'
         response_key: 'answer'
+      rollout_args:
+        repeat_times: 1
+        temperature: 1.0
+        logprobs: 0
     eval_tasksets: []
     default_workflow_type: 'math_workflow'
     default_reward_fn_type: 'countdown_reward'
@@ -123,6 +127,9 @@ buffer:
 - `buffer.explorer_input.taskset.path`: The path to the taskset.
 - `buffer.explorer_input.taskset.split`: The split name of the taskset used for training. Default is `train`.
 - `buffer.explorer_input.taskset.format`: The format of the taskset. It includes `prompt_key`, `response_key`, `workflow_key` and `reward_fn_key`.
+- `buffer.explorer_input.taskset.rollout_args.repeat_times`: The number of times to repeat each task, used for GRPO-like algorithms. Default is `1`.
+- `buffer.explorer_input.taskset.rollout_args.temperature`: The temperature used in vLLM. Default is `1.0`.
+- `buffer.explorer_input.taskset.rollout_args.logprobs`: The logprobs used in vLLM. Default is `0`.
 - `buffer.explorer_input.eval_tasksets`: The configuration of the eval tasksets. It is a list of tasksets which will be used for evaluation. And it is empty by default.
 - `buffer.explorer_input.default_workflow_type`: The default workflow type for `taskset` and `eval_tasksets`.
 - `buffer.explorer_input.default_reward_fn_type`: The default reward function type for `taskset` and `eval_tasksets`.
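
The new `rollout_args` keys and their documented defaults can be illustrated with a short Python sketch. This is not Trinity-RFT code: `resolve_rollout_args` and `ROLLOUT_DEFAULTS` are hypothetical names, and the config is assumed to be already parsed from YAML into nested dicts. Only the key paths and default values come from the diff above.

```python
from typing import Any, Dict

# Defaults as documented for buffer.explorer_input.taskset.rollout_args.
ROLLOUT_DEFAULTS: Dict[str, Any] = {
    "repeat_times": 1,
    "temperature": 1.0,
    "logprobs": 0,
}


def resolve_rollout_args(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the taskset rollout args with the documented defaults filled in.

    Hypothetical helper: `config` is assumed to be the parsed YAML config
    represented as nested dicts.
    """
    taskset = config.get("buffer", {}).get("explorer_input", {}).get("taskset", {})
    user_args = taskset.get("rollout_args") or {}
    # Values set by the user override the defaults.
    return {**ROLLOUT_DEFAULTS, **user_args}


config = {
    "buffer": {"explorer_input": {"taskset": {"rollout_args": {"repeat_times": 8}}}}
}
resolved = resolve_rollout_args(config)
print(resolved)  # {'repeat_times': 8, 'temperature': 1.0, 'logprobs': 0}
```

A config that omits `rollout_args` entirely resolves to the three defaults unchanged.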
@@ -145,10 +152,7 @@ explorer:
   enable_prefix_caching: false
   enforce_eager: true
   dtype: bfloat16
-  temperature: 1.0
   seed: 42
-  logprobs: 0
-  repeat_times: 5
   use_ray: false
   backend: 'nccl'
   max_pending_requests: 32
@@ -162,10 +166,7 @@ explorer:
 - `explorer.enable_prefix_caching`: Whether to enable prefix caching. Default is `False`.
 - `explorer.enforce_eager`: Whether to enforce eager mode. Default is `True`.
 - `explorer.dtype`: The data type used in vLLM. Default is `bfloat16`.
-- `explorer.temperature`: The temperature used in vLLM. Default is `1.0`.
 - `explorer.seed`: The seed used in vLLM. Default is `42`.
-- `explorer.logprobs`: The logprobs used in vLLM. Default is `0`.
-- `explorer.repeat_times`: The number of times to repeat each task, used for GRPO-like algorithms. Default is `5`.
 - `explorer.use_ray`: Whether to use Ray. Default is `False`.
 - `explorer.backend`: The backend used in vLLM. Default is `nccl`.
 - `explorer.max_pending_requests`: The maximum number of pending requests. Default is `32`.
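
Taken together, the hunks move `temperature`, `logprobs` and `repeat_times` from `explorer` to `buffer.explorer_input.taskset.rollout_args`. A rough sketch of how an existing config might be migrated is shown below. It assumes the config is held as nested dicts; `migrate_rollout_args` and `MOVED_KEYS` are hypothetical names and not part of the project.

```python
import copy
from typing import Any, Dict

# Keys removed from `explorer` and added under `rollout_args` in this diff.
MOVED_KEYS = ("temperature", "logprobs", "repeat_times")


def migrate_rollout_args(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with the legacy explorer.* sampling keys moved
    to buffer.explorer_input.taskset.rollout_args (hypothetical helper)."""
    new_config = copy.deepcopy(config)  # leave the caller's config untouched
    explorer = new_config.get("explorer", {})
    taskset = (
        new_config.setdefault("buffer", {})
        .setdefault("explorer_input", {})
        .setdefault("taskset", {})
    )
    rollout_args = taskset.setdefault("rollout_args", {})
    for key in MOVED_KEYS:
        if key in explorer:
            value = explorer.pop(key)
            # A value already set in the new location takes precedence.
            rollout_args.setdefault(key, value)
    return new_config


old = {"explorer": {"seed": 42, "temperature": 0.7, "repeat_times": 5}}
new = migrate_rollout_args(old)
print(new["explorer"])
print(new["buffer"]["explorer_input"]["taskset"]["rollout_args"])
```

Note that the documented default for `repeat_times` changes from `5` to `1`, so a config that relied on the old default may need to set the value explicitly under `rollout_args`.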