You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/sphinx_doc/source/tutorial/example_data_functionalities.md
+3-2Lines changed: 3 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -8,7 +8,8 @@ In this example, you will learn how to apply the data processor of Trinity-RFT t
8
8
2. how to configure the data processor
9
9
3. what the data processor can do
10
10
11
-
Before getting started, you need to prepare the main environment of Trinity-RFT according to the [installation section of the README file](../main.md).
11
+
Before getting started, you need to prepare the main environment of Trinity-RFT according to the [installation section of the README file](../main.md),
12
+
and store the base url and api key in the environment variables `OPENAI_BASE_URL` and `OPENAI_API_KEY` for some agentic or API-model usages if necessary.
12
13
13
14
### Data Preparation
14
15
@@ -218,7 +219,7 @@ Here you can set the input/output buffers for the experience pipeline, and some
218
219
+ `input_buffers`: the input buffers for the experience pipeline. It usually loads from the explorer output buffer, so we need to specify the `explorer_output` in the `buffer` config, and here we only need to specify the name that is aligned with the `explorer_output`. It allows multiple input buffers, but for now, we only need to specify one.
219
220
+ `output_buffer`: the output buffer for the experience pipeline. It usually writes results to the input buffer of trainer, so we only need to the specify the buffer name that is aligned with the `trainer_input` in the `buffer` config.
220
221
+ `format`: some dataset format config items, which are used to map original data field names to unified ones. Here we only need to specify the field name to store the original reward information.
221
-
+ `reward_shaping`: the method to reshap the reward. Usually we use some stats computed by operators in Data-Juicer as new reward items. It's a list that allows multiple methods to reshape rewards. Each item in the list has the following config items:
222
+
+ `reward_shaping`: the method to reshape the reward. Usually we use some stats computed by operators in Data-Juicer as new reward items. It's a list that allows multiple methods to reshape rewards. Each item in the list has the following config items:
222
223
+ `stats_key`: which stats to use as the new reward item.
223
224
+ `op_type`: the operator to apply the new reward item to the original reward. For now, ["ADD", "SUB", "MUL", "DIV"] are supported.
0 commit comments