Problem with training #21

When I train the projector with the command below, the gsm8k accuracy I get (only about 76%) is significantly lower than what I get when testing with the released weights. Are the final saved weights taken from the last epoch, or from the epoch with the best performance on the validation set?

CUDA_VISIBLE_DEVICES=0 python train_softcot.py \
    --large_model_id Qwen/Qwen2.5-7B-Instruct \
    --small_model_id Qwen/Qwen2.5-1.5B-Instruct \
    --output_name [Output Name] \
    --batch_size 4 \
    --task_name gsm8k \
    --num_thought_tokens 32 \
    --n_epochs 10
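
To make the question concrete: the two checkpoint policies can keep weights from different epochs of the same training run. The Python sketch below is hypothetical. The function name `select_checkpoint_epoch` and the `policy` values are illustrative only and are not taken from train_softcot.py, whose actual saving behavior is what this issue is asking about.

```python
"""Hypothetical sketch: which epoch's weights end up saved under two common
checkpoint policies. Nothing here is taken from train_softcot.py."""

from typing import Sequence


def select_checkpoint_epoch(val_accuracies: Sequence[float], policy: str) -> int:
    """Return the 1-indexed epoch whose weights would be kept on disk.

    policy="last": the checkpoint is overwritten every epoch, so the final epoch wins.
    policy="best": the checkpoint is written only when validation accuracy strictly
    improves, so ties keep the earlier epoch.
    """
    if not val_accuracies:
        raise ValueError("need at least one epoch of validation results")
    if policy == "last":
        return len(val_accuracies)
    if policy == "best":
        best_epoch, best_acc = 1, val_accuracies[0]
        # Walk the remaining epochs and remember the first strict improvement.
        for epoch, acc in enumerate(val_accuracies[1:], start=2):
            if acc > best_acc:
                best_epoch, best_acc = epoch, acc
        return best_epoch
    raise ValueError(f"unknown policy: {policy!r}")
```

If validation accuracy peaks before the final epoch, the two policies keep different weights, so knowing which one the script uses matters when comparing against the released checkpoint.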
