
The training of the policy model #25

@RewindL


Thanks for your remarkable open-source work. The self-training part of the README says:

> Regarding Llama3-8B-Instruct and Mistral-7B: MetaMATH, we use the default repo of [MAmmoTH](https://github.com/TIGER-AI-Lab/MAmmoTH) to train the policy model and evaluate.
> Regarding SciGLM-6B, we use the default repo of [SciGLM](https://github.com/THUDM/SciGLM) to train the policy model and evaluate.

Since the collected SFT data and models are available, why did you choose these two repos for SFT instead of writing training code like PRM/train_mistral.py in your repo? Is this just for convenience, or do those repos have performance advantages for specific model/dataset combinations?
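
For concreteness, this is the kind of standalone SFT script I had in mind, in the spirit of PRM/train_mistral.py. It's only a minimal sketch: the base model name, data file, field names, and hyperparameters below are my own assumptions, not your actual setup.

```python
# Minimal SFT sketch (illustrative only; not the repo's actual training code).
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "mistralai/Mistral-7B-v0.1"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Assumed JSONL file with "instruction" / "output" fields.
dataset = load_dataset("json", data_files="sft_data.jsonl", split="train")

def tokenize(example):
    # Concatenate prompt and response into a single causal-LM sequence.
    text = example["instruction"] + "\n" + example["output"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="sft_out",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=2,
        learning_rate=2e-5,  # assumed; typical for 7B-scale SFT
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    # mlm=False gives standard next-token (causal LM) labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```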
