Description
Training the model on the dataset with mscnn_train.py for 48 hours appears to show it converging, although I understand that further training (i.e. the full 100k steps) may be required.
However, when running mscnn_eval.py on a snapshot, the following output is observed:
```
time: 10.645102999999999 loss_value: inf counting:172.0000040 predict:4326801291354836713957490688.0000000 diff:-4326801291354836713957490688.0000000
./mscnn_eval.py:108: RuntimeWarning: overflow encountered in multiply
  sum_all_mse += sum_ab * sum_ab
```
As you can see, the predicted value and the loss value make no sense. Is the evaluation routine broken?
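The overflow warning looks like a symptom rather than the cause: the prediction is already about 4.3e27, and squaring that difference exceeds the float32 range (max about 3.4e38). The sketch below reproduces this with the values from the log, assuming `sum_ab` in mscnn_eval.py is a float32 value (its dtype is not shown in this issue); the `squared` helper is hypothetical and only exists to capture the warning.

```python
import warnings

import numpy as np

# Values copied from the broken evaluation log above.
counting = np.float32(172.0000040)
predict = np.float32(4326801291354836713957490688.0)
diff = counting - predict  # about -4.33e27, still representable in float32


def squared(x):
    """Square x and return the result plus any warning messages raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = x * x
    return result, [str(w.message) for w in caught]


sq32, warn32 = squared(diff)              # float32: overflows to inf and warns
sq64, warn64 = squared(np.float64(diff))  # float64: about 1.9e55, no warning

print(sq32, warn32)
print(sq64, warn64)
```

Switching the accumulator to float64 would silence the warning, but the prediction itself would still be meaningless, so the real problem lies upstream of line 108.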
After editing mscnn_eval.py to use the batch-normalization (BN) version of the inference method (the one mscnn_train.py uses):
```python
# predict_op = mscnn.inference(images)   # original: graph without batch norm
predict_op = mscnn.inference_bn(images)  # same BN graph that mscnn_train.py builds
```
The results now seem more sensible:
```
time: 0.2461100000000016 loss_value: 18.351429 counting:370.9999983 predict:327.6858826 diff:43.3141174
time: 0.24157399999999996 loss_value: 13.386545 counting:501.9999339 predict:500.1551819 diff:1.8447571
time: 0.21929399999999788 loss_value: 24.295664 counting:1067.9998121 predict:985.3113403 diff:82.6884155
time: 0.2419879999999992 loss_value: 37.284615 counting:320.9999975 predict:236.7593231 diff:84.2406769
```
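To summarize those four lines as aggregate metrics, here is a small sketch computing mean absolute error (MAE) and root-mean-squared error (RMSE), which counting evaluations commonly report. It only uses the quoted (counting, predict) pairs, so the numbers are illustrative rather than a full-dataset evaluation; whether mscnn_eval.py's `sum_all_mse` is ultimately reported as MSE or RMSE is an assumption here. Python floats are 64-bit, so this accumulation does not hit the float32 overflow from the first run.

```python
import math

# (ground-truth count, predicted count) pairs copied from the log above.
results = [
    (370.9999983, 327.6858826),
    (501.9999339, 500.1551819),
    (1067.9998121, 985.3113403),
    (320.9999975, 236.7593231),
]


def count_errors(pairs):
    """Return (MAE, RMSE) over (true, predicted) count pairs."""
    diffs = [true - pred for true, pred in pairs]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


mae, rmse = count_errors(results)
print(f"MAE={mae:.2f} RMSE={rmse:.2f}")
```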
Is this the correct thing to do?
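For intuition on why the graph choice matters so much, below is a toy numpy sketch, not the MSCNN architecture: the depth, width, weights and statistics are all made up. Weights trained through BN layers can have an arbitrary scale, because each BN layer renormalizes its input; restoring them into a graph that drops those layers lets the scale compound layer by layer. `collect_bn_stats` is a hypothetical stand-in for the moving averages BN accumulates during training, and `forward(..., bn_stats)` applies them in inference mode, which is how BN is normally run at evaluation time. Whether `mscnn.inference_bn` uses moving averages rather than per-batch statistics at evaluation time is not visible from this issue and is worth checking.

```python
import numpy as np

rng = np.random.default_rng(0)
DEPTH, WIDTH = 4, 64
# Hypothetical weights: scaled so activations grow roughly tenfold per layer
# unless each layer's output is renormalized.
weights = [rng.normal(0.0, 2.0, size=(WIDTH, WIDTH)) for _ in range(DEPTH)]


def collect_bn_stats(x, eps=1e-3):
    """Record per-layer mean/variance, standing in for BN moving averages."""
    stats, h = [], x
    for w in weights:
        h = h @ w
        mean, var = h.mean(axis=0), h.var(axis=0)
        stats.append((mean, var))
        h = np.maximum((h - mean) / np.sqrt(var + eps), 0.0)  # BN then ReLU
    return stats


def forward(x, bn_stats=None, eps=1e-3):
    """Run the stack; apply inference-mode BN (gamma=1, beta=0) if stats given."""
    h = x
    for i, w in enumerate(weights):
        h = h @ w
        if bn_stats is not None:
            mean, var = bn_stats[i]
            h = (h - mean) / np.sqrt(var + eps)
        h = np.maximum(h, 0.0)  # ReLU
    return h


stats = collect_bn_stats(rng.normal(size=(256, WIDTH)))
test_batch = rng.normal(size=(8, WIDTH))

with_bn = forward(test_batch, stats)  # graph matching training
without_bn = forward(test_batch)      # same weights, BN layers dropped
print(np.abs(with_bn).mean(), np.abs(without_bn).mean())
```

In this toy setup the output without BN is several orders of magnitude larger than with it; a real network with more layers can drift much further, which is consistent with the first run's prediction of about 4.3e27.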