More details on dataset downloading can be found in the [ModelScope](https://modelscope.cn/docs/datasets/download) or [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli#download-a-dataset-or-a-space) documentation.
A dataset downloaded from ModelScope may lack the `dtype` field, which causes an error when the dataset is loaded. To resolve this issue, delete the `dataset_infos.json` file and run the experiment again.
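As a minimal sketch, the fix amounts to removing the stale metadata file so the schema is re-inferred on the next load (the dataset directory below is a placeholder; point it at your download location):

```python
import os

# Placeholder path; replace with your downloaded dataset directory
dataset_dir = "/path/to/dataset"
info_file = os.path.join(dataset_dir, "dataset_infos.json")

# Delete the stale metadata so the schema is re-inferred on the next load
if os.path.exists(info_file):
    os.remove(info_file)
```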
## Step 2: Set up Configuration and Run Experiment
```yaml
model:
  max_prompt_tokens: 4096
  max_response_tokens: 16384
  min_response_tokens: 1
  enable_prompt_truncation: true
```
- `model_path`: Path to the model being trained.
- `max_response_tokens`: Maximum number of tokens allowed in generated responses. Only for `chat` and `generate` methods in `InferenceModel`.
- `max_prompt_tokens`: Maximum number of tokens allowed in prompts. Only for `chat` and `generate` methods in `InferenceModel`.
- `min_response_tokens`: Minimum number of tokens allowed in generated responses. Only for `chat` and `generate` methods in `InferenceModel`. Default is `1`. It must be less than `max_response_tokens`.
- `enable_prompt_truncation`: Whether to truncate the prompt. Default is `true`. If set to `true`, the prompt will be truncated to `max_prompt_tokens` tokens; if set to `false`, the prompt is left untouched, with the risk that the combined prompt and response length exceeds `max_model_len`.
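To make the truncation semantics concrete, here is a small sketch of the length bookkeeping implied by these settings. This is illustrative only, not Trinity's actual implementation, and the `max_model_len` value is an assumed example:

```python
def response_budget(prompt_len, max_prompt_tokens, max_model_len,
                    enable_prompt_truncation=True):
    """Return the effective prompt length and remaining response budget."""
    if enable_prompt_truncation:
        # Prompt is clipped to max_prompt_tokens before generation
        prompt_len = min(prompt_len, max_prompt_tokens)
    # Whatever remains of the context window is available for the response
    return prompt_len, max_model_len - prompt_len

# With truncation enabled, a 6000-token prompt is clipped to 4096
print(response_budget(6000, 4096, 20480))         # (4096, 16384)
# With truncation disabled, an oversized prompt shrinks the response budget
print(response_budget(6000, 4096, 20480, False))  # (6000, 14480)
```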
```{tip}
If you are using the OpenAI API provided by Explorer, only `max_model_len` takes effect; the values of `max_response_tokens`, `max_prompt_tokens`, and `min_response_tokens` are ignored. When `max_tokens` is not specified explicitly, each API call can generate up to `max_model_len - prompt_length` tokens, so make sure the prompt length is less than `max_model_len` when using the API.
```
This example demonstrates GRPO on the [Frozen Lake](https://gymnasium.farama.org/environments/toy_text/frozen_lake/) task. Note that this task has only been tested with Qwen2.5 Instruct models.
## Data and Environment Preparation
After setting up the basic environment following the [installation guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_installation.html), install the additional dependencies by running the following command:
```bash
pip install gymnasium[toy_text]
```
Then, we prepare the dataset by running the following command:
```bash
cd examples/grpo_frozen_lake
python get_frozen_lake_data.py
```
This command saves the dataset to a local directory, written as `/path/to/frozenlake` below, and prints the path of the dataset. Afterwards, make sure to set the environment variable `TRINITY_TASKSET_PATH` to that path:
```bash
export TRINITY_TASKSET_PATH=/path/to/frozenlake
```
## Workflow Configuration and Training
We use a concatenated multi-turn workflow, `FrozenLakeWorkflow`, to solve the Frozen Lake task. For each rollout, the multi-turn interaction between the agent and the environment feedback is stored in a single `Experience` object.
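As an illustrative sketch (not Trinity's actual `Experience` class or API), flattening a multi-turn rollout into one training sample might look like the following, with made-up messages:

```python
# Hypothetical sketch: flatten one multi-turn rollout into a single sequence.
# Names and message contents are illustrative, not Trinity's actual API.
def concat_rollout(turns):
    """turns: list of (agent_message, env_feedback) pairs for one rollout."""
    segments = []
    for agent_msg, env_feedback in turns:
        segments.append(agent_msg)
        segments.append(env_feedback)
    # All turns are joined into one sequence, so the whole episode is
    # stored and trained on as a single experience.
    return "\n".join(segments)

rollout = [("Move: right", "You slid to square (0, 1)."),
           ("Move: down", "You reached the goal!")]
print(concat_rollout(rollout))
```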
The specific configuration is located in [`frozen_lake.yaml`](frozen_lake.yaml).
To run this example, you can use the following command:
```bash
trinity run --config examples/grpo_frozen_lake/frozen_lake.yaml
```
## Results
We show the results with a Qwen2.5-3B-Instruct model below. The figures demonstrate that both the reward and the test score increase over training steps.