About the reproduction of the experiment #55

@hj-work

Description

Hello,

I have a question regarding experimental reproducibility. After uncommenting the random-seed section in train.py and training on a single GPU, I still observed inconsistent training-loss trajectories and varying model performance across runs, possibly due to remaining sources of nondeterminism such as the DataLoader configuration:
dataloader = DataLoader(dataset, batch_size=args.batch_size, shuffle=True, num_workers=8)
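For what it's worth, a minimal, self-contained sketch of seeding every RNG that can affect such a run is below. `TensorDataset(torch.arange(...))` is just a stand-in for Kvasir-SEG, and the seed value and batch size are arbitrary; the repository's actual train.py will differ, so this only illustrates the mechanism.

```python
import random
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_everything(seed: int) -> None:
    random.seed(seed)                         # Python RNG (e.g. augmentations)
    np.random.seed(seed)                      # NumPy RNG
    torch.manual_seed(seed)                   # CPU RNG (weight init, dropout)
    torch.cuda.manual_seed_all(seed)          # GPU RNGs (no-op without CUDA)
    torch.backends.cudnn.deterministic = True # prefer deterministic kernels
    torch.backends.cudnn.benchmark = False    # disable autotuned kernel search

def seed_worker(worker_id: int) -> None:
    # Derive a per-worker seed so num_workers > 0 stays reproducible.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

seed_everything(42)
dataset = TensorDataset(torch.arange(32).float())  # stand-in dataset
g = torch.Generator()
g.manual_seed(42)  # this generator controls the shuffle order
loader = DataLoader(dataset, batch_size=8, shuffle=True,
                    num_workers=2, worker_init_fn=seed_worker, generator=g)
first_batch = next(iter(loader))[0]
```

With the `generator` and `worker_init_fn` arguments set, two runs built from the same seed produce the same shuffle order; without them, a `shuffle=True` loader draws from a global RNG and batch order can drift between runs even when the model seed is fixed.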

I trained and tested the model on the Kvasir-SEG dataset for polyp segmentation; the highest test performance I achieved was mDice 0.925, slightly below the 0.928 reported in the paper.

To ensure a fair comparison:
1. Should I conduct multiple training runs and report the average performance?
2. Alternatively, is selecting the best-performing run acceptable?
3. Would a single training/evaluation run be considered valid?
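In case it helps frame option 1, here is a sketch of the mean-and-std reporting protocol; the seed-to-mDice scores below are made-up placeholders, not results from this repository.

```python
from statistics import mean, stdev

# Hypothetical best test mDice from five independent runs, keyed by seed.
run_scores = {0: 0.925, 1: 0.921, 2: 0.927, 3: 0.924, 4: 0.926}

mu = mean(run_scores.values())
sigma = stdev(run_scores.values())  # sample standard deviation
print(f"mDice over {len(run_scores)} seeds: {mu:.4f} +/- {sigma:.4f}")
```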

Apologies if this is too basic – I'm new to this area. Appreciate your guidance!
