
Problem about training #21

@1563245379


When I train the projector with the command below, my GSM8K accuracy is only about 76%, which is significantly lower than what I get when evaluating the released weights. Are the final saved weights taken from the last epoch, or from the epoch with the best score on the validation set?

CUDA_VISIBLE_DEVICES=0 python train_softcot.py \
--large_model_id Qwen/Qwen2.5-7B-Instruct \
--small_model_id Qwen/Qwen2.5-1.5B-Instruct \
--output_name [Output Name] \
--batch_size 4 \
--task_name gsm8k \
--num_thought_tokens 32 \
--n_epochs 10
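To make the question concrete, here is a minimal sketch of the two checkpoint-selection policies I mean. It is hypothetical: `select_checkpoint_epoch` and the per-epoch validation-accuracy dictionary are illustrative names I made up, not code taken from train_softcot.py, and I don't know which policy the script actually uses.

```python
from typing import Dict


def select_checkpoint_epoch(val_acc_by_epoch: Dict[int, float], strategy: str = "best") -> int:
    """Return the epoch whose weights would be kept (hypothetical helper).

    strategy="last": keep whatever the final epoch produced.
    strategy="best": keep the epoch with the highest validation accuracy;
    ties go to the earliest such epoch.
    """
    if not val_acc_by_epoch:
        raise ValueError("no epochs recorded")
    if strategy == "last":
        # The final epoch is simply the largest epoch index.
        return max(val_acc_by_epoch)
    if strategy == "best":
        # Iterate epochs in ascending order; max() returns the first maximal
        # element, so the earliest epoch wins on a tie.
        return max(sorted(val_acc_by_epoch), key=lambda e: val_acc_by_epoch[e])
    raise ValueError(f"unknown strategy: {strategy!r}")
```

If validation accuracy peaks before the last of the 10 epochs, the two policies keep different weights, which could explain part of the gap.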
