About the reproduction of the experiment #55
Description
Hello,
I have a question regarding experimental reproducibility. After uncommenting the random-seed section in train.py and training on a single GPU, I observed inconsistent training-loss trajectories and varying model performance across runs, possibly due to non-determinism in the DataLoader configuration:
dataloader = DataLoader(dataset, batch_size=args.batch_size, shuffle=True, num_workers=8)
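For context, this is the kind of full seeding I have been using to try to pin down the shuffle order and worker RNG state. It is only a sketch based on general PyTorch practice, not code from this repo; the helper names `seed_everything`, `seed_worker`, and `make_loader` are my own:

```python
import random
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_everything(seed: int = 42) -> None:
    # Seed every RNG a typical PyTorch training loop touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)                    # seeds CPU and all CUDA devices
    torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False     # disable kernel autotuning

def seed_worker(worker_id: int) -> None:
    # Derive a per-worker seed; without this, NumPy/random state inside
    # DataLoader workers is not reproducible across runs.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

def make_loader(dataset, batch_size: int, num_workers: int = 8, seed: int = 42):
    g = torch.Generator()
    g.manual_seed(seed)  # fixes the shuffle order across runs
    return DataLoader(dataset, batch_size=batch_size, shuffle=True,
                      num_workers=num_workers, worker_init_fn=seed_worker,
                      generator=g)

# Sanity check: two loaders built with the same seed yield the same first batch.
seed_everything(42)
ds = TensorDataset(torch.arange(100).float().unsqueeze(1))
b1 = next(iter(make_loader(ds, 16, num_workers=0)))[0]
b2 = next(iter(make_loader(ds, 16, num_workers=0)))[0]
assert torch.equal(b1, b2)
```

Even with all of this set, I understand some CUDA ops remain non-deterministic, so I am unsure how much run-to-run variance is expected.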
I trained and tested the model on the Kvasir-SEG dataset for polyp segmentation; the highest test performance I achieved was mDice 0.925, slightly below the 0.928 reported in the paper.
To ensure fair comparison:
Should I conduct multiple training runs and report the average performance?
Alternatively, is selecting the best-performing run acceptable?
Would performing only a single training/evaluation run be considered valid?
Apologies if these questions are basic – I'm new to this area. I'd appreciate your guidance!