
The training of the policy model #25

@RewindL


Thanks for your remarkable open-source work. The self-training part of the README says:

> Regarding Llama3-8B-Instruct and Mistral-7B: MetaMATH, we use the default repo of [MAmmoTH](https://github.com/TIGER-AI-Lab/MAmmoTH) to train the policy model and evaluate.
> Regarding SciGLM-6B, we use the default repo of [SciGLM](https://github.com/THUDM/SciGLM) to train the policy model and evaluate.

Since the collected SFT data and models are available, why did you choose these two repos for SFT instead of writing training code like PRM/train_mistral.py in your repo? Is this just for convenience, or do those repos have performance advantages for specific model/dataset combinations?
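
For concreteness, this is the kind of standalone SFT script I had in mind, in the spirit of PRM/train_mistral.py. It's only a minimal sketch: the base model name, data file, field names, and hyperparameters below are my own assumptions, not your actual setup.

```python
# Minimal SFT sketch (illustrative only; not the repo's actual training code).
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "mistralai/Mistral-7B-v0.1"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Assumed JSONL file with "instruction" / "output" fields.
dataset = load_dataset("json", data_files="sft_data.jsonl", split="train")

def tokenize(example):
    # Concatenate prompt and response into a single causal-LM sequence.
    text = example["instruction"] + "\n" + example["output"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="sft_out",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=2,
        learning_rate=2e-5,  # assumed; typical for 7B-scale SFT
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    # mlm=False gives standard next-token (causal LM) labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```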
