# Monitor Metrics Reference

This document provides an overview of the metric categories used in Trinity-RFT for tracking exploration, evaluation, and training progress.
When you set `monitor.detailed_stats` to `True`, you will get detailed statistics including the mean, std, min, and max, e.g., `eval/dummy/accuracy/mean@2/mean=0.83`, `eval/dummy/accuracy/mean@2/std=0.062`, `eval/dummy/accuracy/mean@2/max=0.9`, and `eval/dummy/accuracy/mean@2/min=0.75`.
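As an illustration, the detailed statistics above could be aggregated as follows. This is a minimal sketch, not Trinity-RFT's actual implementation; in particular, whether the reported std is the sample or population standard deviation is an assumption here.

```python
import statistics


def detailed_stats(name: str, values: list[float]) -> dict[str, float]:
    """Expand repeated eval scores into mean/std/min/max entries.

    Emits keys such as ``eval/dummy/accuracy/mean@2/mean``, where ``@2``
    records how many repeats were aggregated.
    """
    prefix = f"{name}/mean@{len(values)}"
    return {
        f"{prefix}/mean": statistics.mean(values),
        # Population std is an assumption; the real variant may differ.
        f"{prefix}/std": statistics.pstdev(values),
        f"{prefix}/max": max(values),
        f"{prefix}/min": min(values),
    }


stats = detailed_stats("eval/dummy/accuracy", [0.75, 0.9])
# stats["eval/dummy/accuracy/mean@2/mean"] is approximately 0.825
```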
#### Time Metrics (`time/`)

Time metrics measure execution duration for various operations throughout the rollout process.

**Format**: `time/{operation_name}`

**Examples**:
- `time/eval`: Time from the start of submitting evaluation tasks to the end of the evaluation phase; this duration includes both evaluation tasks and some rollout tasks.
- `time/wait_explore_step`: Time to wait for one rollout step to complete.

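A `time/{operation_name}` metric can be recorded with a simple context manager, sketched below. The `record_time` helper and `metrics` dictionary are hypothetical names for illustration, not part of the Trinity-RFT API.

```python
import time
from contextlib import contextmanager

# Hypothetical sink for collected metrics.
metrics: dict[str, float] = {}


@contextmanager
def record_time(operation_name: str):
    """Record wall-clock duration (in seconds) under ``time/{operation_name}``."""
    start = time.perf_counter()
    try:
        yield
    finally:
        metrics[f"time/{operation_name}"] = time.perf_counter() - start


with record_time("eval"):
    time.sleep(0.01)  # stand-in for running evaluation tasks

# metrics now contains "time/eval" with a small positive duration in seconds
```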
**Note**:
- Time measuring can be inaccurate due to the asynchronous nature of the rollout process, but it is still useful for monitoring the overall training progress.
- Time metrics are reported in seconds unless otherwise specified.
- Some training operations also report per-token timing metrics with the prefix `timing_per_token_ms/` (e.g., `timing_per_token_ms/update_actor`, `timing_per_token_ms/update_critic`, `timing_per_token_ms/adv`, `timing_per_token_ms/values`). These metrics normalize execution time by the number of tokens processed, providing efficiency measurements independent of batch size.

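The per-token normalization described in the last note can be sketched as follows; the helper name is illustrative, not the actual implementation.

```python
def timing_per_token_ms(seconds: float, num_tokens: int) -> float:
    """Normalize an operation's duration by the number of tokens processed.

    Converts seconds to milliseconds and divides by the token count, giving
    an efficiency measurement independent of batch size.
    """
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return seconds * 1000.0 / num_tokens


# e.g. updating the actor for 2.5 s over 10,000 tokens (illustrative numbers):
metric = {"timing_per_token_ms/update_actor": timing_per_token_ms(2.5, 10_000)}
# -> 0.25 ms per token
```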
### Trainer Metrics

This category includes metrics that track the training dynamics of the policy (actor) model (`actor/`) and the value function (critic) model (`critic/`), as well as some performance metrics (`perf/`, `global_seqlen/`, `response_length/`, `prompt_length/`, `time/`). These metrics are adapted from [veRL](https://github.com/volcengine/verl). Interested users can refer to the [veRL documentation](https://verl.readthedocs.io/en/latest/index.html) for more details.
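Since all metrics share a `{category}/{name}` naming scheme, a logger or dashboard can group them by their leading prefix. A minimal sketch (the metric names and values below are illustrative only):

```python
from collections import defaultdict


def group_by_category(metrics: dict[str, float]) -> dict[str, dict[str, float]]:
    """Group flat metric keys like ``actor/entropy`` by their leading prefix."""
    grouped: dict[str, dict[str, float]] = defaultdict(dict)
    for key, value in metrics.items():
        category, _, rest = key.partition("/")
        grouped[category][rest] = value
    return dict(grouped)


grouped = group_by_category({
    "actor/entropy": 0.42,   # illustrative values, not real output
    "critic/vf_loss": 0.10,
    "perf/mfu": 0.35,
})
# -> {"actor": {"entropy": 0.42}, "critic": {"vf_loss": 0.10}, "perf": {"mfu": 0.35}}
```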