|
30 | 30 | - Building Custom RMs: tutorial/building_rm/custom_reward.md |
31 | 31 | - Rubric as Rewards: tutorial/building_rm/autorubric.md |
32 | 32 |
|
33 | | - - Evaluating RM: |
34 | | - - Overview: tutorial/evaluation/overview.md |
35 | | - - RMB: tutorial/evaluation/rmb.md |
36 | | - - RM-Bench: tutorial/evaluation/rmbench.md |
37 | | - - JudgeBench: tutorial/evaluation/judgebench.md |
38 | | - - RewardBench2: tutorial/evaluation/rewardbench2.md |
39 | | - - Conflict Detector: tutorial/evaluation/conflict_detector.md |
| 33 | + - Training RM: |
| 34 | + - Overview: tutorial/training_rm/overview.md |
| 35 | + - Training Bradley-Terry RM: tutorial/training_rm/bradley_terry_rm.md |
| 36 | + - Training RM with SFT: tutorial/training_rm/sft_rm.md |
| 37 | + - Training RM with RL: tutorial/training_rm/training_rm.md |
40 | 38 |
|
41 | 39 | - Using RM: |
42 | 40 | - RM Server: tutorial/rm_serving/rm_server.md |
|
45 | 43 | - Post Training with RM: tutorial/rm_application/post_training.md |
46 | 44 | - Best of N: tutorial/rm_application/best_of_n.md |
47 | 45 |
|
48 | | - - Training RM: |
49 | | - - Overview: tutorial/training_rm/overview.md |
50 | | - - Training Bradley-Terry RM: tutorial/training_rm/bradley_terry_rm.md |
51 | | - - Training RM with SFT: tutorial/training_rm/sft_rm.md |
52 | | - - Training RM with RL: tutorial/training_rm/training_rm.md |
| 46 | + |
| 47 | + - Evaluating RM: |
| 48 | + - Overview: tutorial/evaluation/overview.md |
| 49 | + - RMB: tutorial/evaluation/rmb.md |
| 50 | + - RM-Bench: tutorial/evaluation/rmbench.md |
| 51 | + - JudgeBench: tutorial/evaluation/judgebench.md |
| 52 | + - RewardBench2: tutorial/evaluation/rewardbench2.md |
| 53 | + - Conflict Detector: tutorial/evaluation/conflict_detector.md |
53 | 54 |
|
54 | 55 | - Data: |
55 | 56 | - Overview: tutorial/data/pipeline.md |
|