While reproducing the model on the COCO dataset, I observed a significant gap in bit accuracy during the pretraining phase.
Performance Discrepancy
After the first training phase:
- My implementation achieves only ~0.8 bit_acc (single-watermark case)
- The paper reports 0.95 bit_acc (single-watermark case)
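For reference, here is how I am computing bit accuracy, as the fraction of decoded bits that match the embedded message (my assumption of the metric; the message values below are made up for illustration):

```python
import numpy as np

# Hypothetical 16-bit message: ground truth vs. decoded bits.
target = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
preds  = np.array([1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1])

# Bit accuracy = mean agreement between predicted and true bits.
bit_acc = (preds == target).mean()
print(bit_acc)  # 14 of 16 bits match -> 0.875
```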
Calculation Method Conflict
I found conflicting thresholding logic in the codebase when computing bit accuracy:
In msg_predict_inference (IPython notebook):
preds = preds > 0.5 # Explicit 0.5 threshold
In bit_accuracy_inference (training metrics):
preds = preds > threshold # Default threshold=0.0
This discrepancy directly changes the binary predictions, and therefore the reported bit accuracy. Which threshold is canonical?
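One possible reconciliation (my assumption, not confirmed from the codebase): the two rules are equivalent if `msg_predict_inference` thresholds sigmoid probabilities while `bit_accuracy_inference` thresholds raw logits, since sigmoid(0) = 0.5. A minimal NumPy sketch with made-up decoder outputs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw decoder outputs (logits), not values from the repo.
logits = np.array([-2.3, -0.1, 0.4, 1.7])

# bit_accuracy_inference-style rule: threshold raw logits at 0.0.
bits_from_logits = logits > 0.0

# msg_predict_inference-style rule: threshold probabilities at 0.5.
bits_from_probs = sigmoid(logits) > 0.5

# sigmoid(0.0) == 0.5, so both rules yield identical bits
# whenever each is applied in its own space.
assert np.array_equal(bits_from_logits, bits_from_probs)
```

If the training metric instead applies `threshold=0.0` to outputs that have already passed through a sigmoid, every bit would be predicted as 1, which could explain a depressed bit_acc. Could you confirm which space each function expects?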