Loss doesn't go down lower than 0.7 for market1501 dataset #91

@mazatov

Description

Trying to train on market1501 as a proof of concept before modifying anything. The training loss quickly goes down to 0.7 and then stays there indefinitely. So far I haven't changed anything in the script, except setting batch_p = 16 because I was running out of memory on my computer. Any ideas on what I might be doing wrong?
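For reference, with PK sampling each batch draws batch_p identities and batch_k images per identity, so the effective batch size is P × K; halving batch_p is what reduced the memory footprint. A quick sketch (batch_k = 4 is an assumed value here, not something stated above — check your own config):

```python
# PK sampling: a batch contains P identities with K images each,
# so the number of images per batch is simply P * K.
batch_p = 16  # reduced from the default to fit in GPU memory
batch_k = 4   # assumed per-identity image count; check your config
batch_size = batch_p * batch_k
print(batch_size)  # 64
```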

Colocations handled automatically by placer.
2020-02-25 16:08:37,651 [WARNING] tensorflow: From C:\Users\mazat\Anaconda3\envs\tf_gpu\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2020-02-25 16:08:37,665 [WARNING] tensorflow: From C:\Users\mazat\Anaconda3\envs\tf_gpu\lib\site-packages\tensorflow\python\ops\math_grad.py:102: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
2020-02-25 16:08:46,499 [INFO] tensorflow: experiments\market_train\checkpoint-0 is not in all_model_checkpoint_paths. Manually adding it.
2020-02-25 16:09:02,141 [INFO] train: Starting training from iteration 0.
2020-02-25 16:09:09,892 [INFO] train: iter:     1, loss min|avg|max: 0.865|4.333|17.520, batch-p@3: 8.33%, ETA: 2 days, 5:48:35 (7.75s/it)
2020-02-25 16:09:10,247 [INFO] train: iter:     2, loss min|avg|max: 0.872|2.155|11.571, batch-p@3: 13.02%, ETA: 2:25:25 (0.35s/it)
2020-02-25 16:09:10,604 [INFO] train: iter:     3, loss min|avg|max: 0.886|2.971|11.878, batch-p@3: 6.25%, ETA: 2:27:55 (0.36s/it)
2020-02-25 16:09:10,963 [INFO] train: iter:     4, loss min|avg|max: 0.878|1.741| 5.409, batch-p@3: 11.46%, ETA: 2:28:15 (0.36s/it)
2020-02-25 16:09:11,322 [INFO] train: iter:     5, loss min|avg|max: 0.803|1.908| 6.864, batch-p@3: 12.50%, ETA: 2:28:21 (0.36s/it)
2020-02-25 16:09:11,678 [INFO] train: iter:     6, loss min|avg|max: 0.820|1.692| 5.620, batch-p@3: 9.38%, ETA: 2:27:01 (0.35s/it)
2020-02-25 16:09:12,030 [INFO] train: iter:     7, loss min|avg|max: 0.854|1.858| 8.326, batch-p@3: 7.29%, ETA: 2:25:25 (0.35s/it)
2020-02-25 16:09:12,385 [INFO] train: iter:     8, loss min|avg|max: 0.845|1.461| 3.330, batch-p@3: 9.90%, ETA: 2:26:38 (0.35s/it)
2020-02-25 16:09:12,744 [INFO] train: iter:     9, loss min|avg|max: 0.802|1.675| 8.456, batch-p@3: 5.21%, ETA: 2:28:42 (0.36s/it)
2020-02-25 16:09:13,104 [INFO] train: iter:    10, loss min|avg|max: 0.798|1.475| 3.716, batch-p@3: 7.29%, ETA: 2:29:01 (0.36s/it)
2020-02-25 16:09:13,455 [INFO] train: iter:    11, loss min|avg|max: 0.764|1.357| 2.830, batch-p@3: 10.94%, ETA: 2:25:00 (0.35s/it)
2020-02-25 16:09:13,817 [INFO] train: iter:    12, loss min|avg|max: 0.786|1.250| 3.432, batch-p@3: 9.90%, ETA: 2:28:41 (0.36s/it)
2020-02-25 16:09:14,172 [INFO] train: iter:    13, loss min|avg|max: 0.668|1.259| 2.927, batch-p@3: 7.81%, ETA: 2:26:36 (0.35s/it)
2020-02-25 16:09:14,519 [INFO] train: iter:    14, loss min|avg|max: 0.643|1.094| 4.106, batch-p@3: 18.75%, ETA: 2:23:16 (0.34s/it)
2020-02-25 16:09:14,878 [INFO] train: iter:    15, loss min|avg|max: 0.785|1.343| 4.559, batch-p@3: 15.10%, ETA: 2:28:03 (0.36s/it)
2020-02-25 16:09:15,234 [INFO] train: iter:    16, loss min|avg|max: 0.731|1.216| 4.446, batch-p@3: 11.98%, ETA: 2:26:59 (0.35s/it)
2020-02-25 16:09:15,592 [INFO] train: iter:    17, loss min|avg|max: 0.766|1.130| 4.649, batch-p@3: 10.94%, ETA: 2:28:17 (0.36s/it)
2020-02-25 16:09:15,947 [INFO] train: iter:    18, loss min|avg|max: 0.748|1.143| 2.784, batch-p@3: 14.06%, ETA: 2:26:52 (0.35s/it)
2020-02-25 16:09:16,304 [INFO] train: iter:    19, loss min|avg|max: 0.704|1.067| 2.621, batch-p@3: 9.38%, ETA: 2:27:26 (0.35s/it)
2020-02-25 16:09:16,690 [INFO] train: iter:    20, loss min|avg|max: 0.783|1.122| 2.887, batch-p@3: 8.85%, ETA: 2:38:55 (0.38s/it)
2020-02-25 16:09:17,052 [INFO] train: iter:    21, loss min|avg|max: 0.754|1.062| 3.799, batch-p@3: 8.33%, ETA: 2:29:52 (0.36s/it)
2020-02-25 16:09:17,414 [INFO] train: iter:    22, loss min|avg|max: 0.748|1.123| 1.990, batch-p@3: 10.94%, ETA: 2:29:28 (0.36s/it)
2020-02-25 16:09:17,795 [INFO] train: iter:    23, loss min|avg|max: 0.736|0.985| 1.747, batch-p@3: 11.46%, ETA: 2:37:45 (0.38s/it)
2020-02-25 16:09:18,155 [INFO] train: iter:    24, loss min|avg|max: 0.742|1.086| 6.032, batch-p@3: 11.98%, ETA: 2:28:37 (0.36s/it)
2020-02-25 16:09:18,518 [INFO] train: iter:    25, loss min|avg|max: 0.719|1.022| 1.805, batch-p@3: 9.90%, ETA: 2:29:44 (0.36s/it)
2020-02-25 16:09:18,871 [INFO] train: iter:    26, loss min|avg|max: 0.741|1.071| 2.763, batch-p@3: 10.94%, ETA: 2:26:08 (0.35s/it)
2020-02-25 16:09:19,227 [INFO] train: iter:    27, loss min|avg|max: 0.715|0.953| 2.764, batch-p@3: 11.46%, ETA: 2:26:58 (0.35s/it)
2020-02-25 16:09:19,585 [INFO] train: iter:    28, loss min|avg|max: 0.711|0.932| 2.323, batch-p@3: 11.98%, ETA: 2:27:46 (0.36s/it)
2020-02-25 16:09:19,941 [INFO] train: iter:    29, loss min|avg|max: 0.740|1.007| 2.782, batch-p@3: 8.85%, ETA: 2:26:56 (0.35s/it)
2020-02-25 16:09:20,295 [INFO] train: iter:    30, loss min|avg|max: 0.736|0.973| 3.527, batch-p@3: 16.15%, ETA: 2:26:17 (0.35s/it)
2020-02-25 16:09:20,651 [INFO] train: iter:    31, loss min|avg|max: 0.720|0.993| 2.995, batch-p@3: 16.15%, ETA: 2:26:59 (0.35s/it)
2020-02-25 16:09:21,009 [INFO] train: iter:    32, loss min|avg|max: 0.728|1.068| 3.389, batch-p@3: 14.58%, ETA: 2:27:42 (0.35s/it)
2020-02-25 16:09:21,364 [INFO] train: iter:    33, loss min|avg|max: 0.735|0.901| 1.411, batch-p@3: 13.02%, ETA: 2:27:19 (0.35s/it)
2020-02-25 16:09:21,719 [INFO] train: iter:    34, loss min|avg|max: 0.728|0.884| 1.307, batch-p@3: 6.77%, ETA: 2:26:16 (0.35s/it)
2020-02-25 16:09:22,080 [INFO] train: iter:    35, loss min|avg|max: 0.718|0.905| 1.174, batch-p@3: 8.85%, ETA: 2:28:59 (0.36s/it)
2020-02-25 16:09:22,439 [INFO] train: iter:    36, loss min|avg|max: 0.720|0.931| 1.618, batch-p@3: 12.50%, ETA: 2:27:29 (0.35s/it)
2020-02-25 16:09:22,795 [INFO] train: iter:    37, loss min|avg|max: 0.731|0.982| 3.842, batch-p@3: 12.50%, ETA: 2:27:17 (0.35s/it)
2020-02-25 16:09:23,154 [INFO] train: iter:    38, loss min|avg|max: 0.734|0.879| 2.909, batch-p@3: 14.06%, ETA: 2:28:08 (0.36s/it)
2020-02-25 16:09:23,507 [INFO] train: iter:    39, loss min|avg|max: 0.699|0.912| 2.311, batch-p@3: 15.10%, ETA: 2:25:40 (0.35s/it)
2020-02-25 16:09:23,884 [INFO] train: iter:    40, loss min|avg|max: 0.697|0.910| 1.829, batch-p@3: 15.62%, ETA: 2:35:33 (0.37s/it)
2020-02-25 16:09:24,287 [INFO] train: iter:    41, loss min|avg|max: 0.721|0.864| 1.853, batch-p@3: 16.67%, ETA: 2:46:49 (0.40s/it)
2020-02-25 16:09:24,664 [INFO] train: iter:    42, loss min|avg|max: 0.729|0.861| 1.211, batch-p@3: 11.46%, ETA: 2:35:34 (0.37s/it)
2020-02-25 16:09:25,047 [INFO] train: iter:    43, loss min|avg|max: 0.727|0.911| 1.790, batch-p@3: 8.33%, ETA: 2:38:28 (0.38s/it)
2020-02-25 16:09:25,417 [INFO] train: iter:    44, loss min|avg|max: 0.715|0.839| 1.201, batch-p@3: 17.19%, ETA: 2:33:04 (0.37s/it)
2020-02-25 16:09:25,793 [INFO] train: iter:    45, loss min|avg|max: 0.755|0.902| 1.494, batch-p@3: 10.94%, ETA: 2:35:27 (0.37s/it)
2020-02-25 16:09:26,165 [INFO] train: iter:    46, loss min|avg|max: 0.700|0.866| 1.410, batch-p@3: 11.46%, ETA: 2:34:07 (0.37s/it)
2020-02-25 16:09:26,531 [INFO] train: iter:    47, loss min|avg|max: 0.714|0.811| 1.563, batch-p@3: 12.50%, ETA: 2:31:06 (0.36s/it)
2020-02-25 16:09:26,892 [INFO] train: iter:    48, loss min|avg|max: 0.650|0.792| 1.180, batch-p@3: 13.54%, ETA: 2:28:54 (0.36s/it)
2020-02-25 16:09:27,245 [INFO] train: iter:    49, loss min|avg|max: 0.685|0.848| 1.405, batch-p@3: 11.98%, ETA: 2:25:37 (0.35s/it)
2020-02-25 16:09:27,605 [INFO] train: iter:    50, loss min|avg|max: 0.710|0.901| 1.537, batch-p@3: 10.42%, ETA: 2:28:03 (0.36s/it)
2020-02-25 16:09:27,960 [INFO] train: iter:    51, loss min|avg|max: 0.720|0.840| 1.334, batch-p@3: 11.46%, ETA: 2:26:00 (0.35s/it)
2020-02-25 16:09:28,316 [INFO] train: iter:    52, loss min|avg|max: 0.705|0.815| 1.043, batch-p@3: 18.23%, ETA: 2:27:15 (0.35s/it)
2020-02-25 16:09:28,674 [INFO] train: iter:    53, loss min|avg|max: 0.711|0.847| 1.636, batch-p@3: 14.58%, ETA: 2:27:22 (0.35s/it)
2020-02-25 16:09:29,034 [INFO] train: iter:    54, loss min|avg|max: 0.733|0.883| 1.860, batch-p@3: 4.69%, ETA: 2:28:26 (0.36s/it)
2020-02-25 16:09:29,405 [INFO] train: iter:    55, loss min|avg|max: 0.704|0.830| 1.334, batch-p@3: 11.46%, ETA: 2:33:08 (0.37s/it)
2020-02-25 16:09:29,771 [INFO] train: iter:    56, loss min|avg|max: 0.734|0.841| 1.506, batch-p@3: 9.90%, ETA: 2:30:49 (0.36s/it)
2020-02-25 16:09:30,126 [INFO] train: iter:    57, loss min|avg|max: 0.707|0.845| 1.865, batch-p@3: 7.81%, ETA: 2:26:48 (0.35s/it)
2020-02-25 16:09:30,497 [INFO] train: iter:    58, loss min|avg|max: 0.718|0.854| 1.185, batch-p@3: 11.98%, ETA: 2:32:59 (0.37s/it)
2020-02-25 16:09:30,863 [INFO] train: iter:    59, loss min|avg|max: 0.718|0.800| 1.581, batch-p@3: 8.85%, ETA: 2:30:54 (0.36s/it)
2020-02-25 16:09:31,252 [INFO] train: iter:    60, loss min|avg|max: 0.728|0.820| 1.343, batch-p@3: 7.29%, ETA: 2:33:10 (0.37s/it)
2020-02-25 16:09:31,607 [INFO] train: iter:    61, loss min|avg|max: 0.725|0.796| 1.221, batch-p@3: 7.81%, ETA: 2:27:21 (0.35s/it)
2020-02-25 16:09:31,963 [INFO] train: iter:    62, loss min|avg|max: 0.704|0.768| 1.018, batch-p@3: 13.02%, ETA: 2:27:07 (0.35s/it)
2020-02-25 16:09:32,317 [INFO] train: iter:    63, loss min|avg|max: 0.686|0.807| 1.369, batch-p@3: 16.15%, ETA: 2:26:21 (0.35s/it)
2020-02-25 16:09:32,675 [INFO] train: iter:    64, loss min|avg|max: 0.724|0.827| 1.182, batch-p@3: 6.25%, ETA: 2:27:33 (0.36s/it)
2020-02-25 16:09:33,036 [INFO] train: iter:    65, loss min|avg|max: 0.719|0.785| 1.116, batch-p@3: 8.85%, ETA: 2:29:19 (0.36s/it)
2020-02-25 16:09:33,414 [INFO] train: iter:    66, loss min|avg|max: 0.712|0.801| 1.183, batch-p@3: 11.98%, ETA: 2:36:10 (0.38s/it)
2020-02-25 16:09:33,814 [INFO] train: iter:    67, loss min|avg|max: 0.723|0.800| 1.365, batch-p@3: 8.85%, ETA: 2:45:05 (0.40s/it)
2020-02-25 16:09:34,181 [INFO] train: iter:    68, loss min|avg|max: 0.703|0.781| 1.249, batch-p@3: 12.50%, ETA: 2:31:15 (0.36s/it)
2020-02-25 16:09:34,551 [INFO] train: iter:    69, loss min|avg|max: 0.711|0.800| 1.218, batch-p@3: 13.54%, ETA: 2:32:55 (0.37s/it)
2020-02-25 16:09:34,920 [INFO] train: iter:    70, loss min|avg|max: 0.722|0.809| 1.138, batch-p@3: 11.98%, ETA: 2:32:59 (0.37s/it)
2020-02-25 16:09:35,296 [INFO] train: iter:    71, loss min|avg|max: 0.716|0.796| 1.108, batch-p@3: 10.94%, ETA: 2:35:18 (0.37s/it)
2020-02-25 16:09:35,665 [INFO] train: iter:    72, loss min|avg|max: 0.688|0.784| 1.115, batch-p@3: 16.15%, ETA: 2:32:34 (0.37s/it)
2020-02-25 16:09:36,024 [INFO] train: iter:    73, loss min|avg|max: 0.719|0.800| 1.315, batch-p@3: 10.94%, ETA: 2:28:35 (0.36s/it)
2020-02-25 16:09:36,380 [INFO] train: iter:    74, loss min|avg|max: 0.714|0.792| 1.027, batch-p@3: 14.06%, ETA: 2:27:07 (0.35s/it)
2020-02-25 16:09:36,751 [INFO] train: iter:    75, loss min|avg|max: 0.710|0.778| 1.118, batch-p@3: 9.90%, ETA: 2:33:15 (0.37s/it)
2020-02-25 16:09:37,138 [INFO] train: iter:    76, loss min|avg|max: 0.699|0.776| 1.313, batch-p@3: 11.98%, ETA: 2:40:23 (0.39s/it)
2020-02-25 16:09:37,504 [INFO] train: iter:    77, loss min|avg|max: 0.721|0.809| 1.144, batch-p@3: 12.50%, ETA: 2:30:51 (0.36s/it)
2020-02-25 16:09:37,866 [INFO] train: iter:    78, loss min|avg|max: 0.729|0.801| 0.942, batch-p@3: 11.46%, ETA: 2:29:15 (0.36s/it)
2020-02-25 16:09:38,225 [INFO] train: iter:    79, loss min|avg|max: 0.697|0.782| 0.967, batch-p@3: 15.10%, ETA: 2:28:17 (0.36s/it)
2020-02-25 16:09:38,586 [INFO] train: iter:    80, loss min|avg|max: 0.710|0.815| 1.793, batch-p@3: 9.38%, ETA: 2:29:01 (0.36s/it)
2020-02-25 16:09:38,941 [INFO] train: iter:    81, loss min|avg|max: 0.707|0.826| 1.555, batch-p@3: 10.94%, ETA: 2:26:38 (0.35s/it)

And lots of iterations later:

2020-02-25 19:33:32,859 [INFO] train: iter:  8898, loss min|avg|max: 0.415|0.687| 1.047, batch-p@3: 56.25%, ETA: 1:37:59 (0.37s/it)
2020-02-25 19:33:33,222 [INFO] train: iter:  8899, loss min|avg|max: 0.656|0.787| 1.747, batch-p@3: 55.73%, ETA: 1:36:35 (0.36s/it)
2020-02-25 19:33:33,579 [INFO] train: iter:  8900, loss min|avg|max: 0.691|0.737| 1.346, batch-p@3: 54.17%, ETA: 1:34:59 (0.35s/it)
2020-02-25 19:33:33,940 [INFO] train: iter:  8901, loss min|avg|max: 0.691|0.826| 2.345, batch-p@3: 42.19%, ETA: 1:36:04 (0.36s/it)
2020-02-25 19:33:34,302 [INFO] train: iter:  8902, loss min|avg|max: 0.638|0.717| 0.965, batch-p@3: 58.33%, ETA: 1:36:22 (0.36s/it)
2020-02-25 19:33:34,670 [INFO] train: iter:  8903, loss min|avg|max: 0.353|0.677| 0.728, batch-p@3: 53.65%, ETA: 1:37:55 (0.36s/it)
2020-02-25 19:33:35,031 [INFO] train: iter:  8904, loss min|avg|max: 0.691|0.745| 1.200, batch-p@3: 54.69%, ETA: 1:36:09 (0.36s/it)
2020-02-25 19:33:35,397 [INFO] train: iter:  8905, loss min|avg|max: 0.693|0.777| 3.160, batch-p@3: 43.23%, ETA: 1:37:22 (0.36s/it)
2020-02-25 19:33:35,757 [INFO] train: iter:  8906, loss min|avg|max: 0.692|0.925| 4.062, batch-p@3: 37.50%, ETA: 1:36:02 (0.36s/it)
2020-02-25 19:33:36,120 [INFO] train: iter:  8907, loss min|avg|max: 0.692|0.804| 1.711, batch-p@3: 46.88%, ETA: 1:36:18 (0.36s/it)
2020-02-25 19:33:36,483 [INFO] train: iter:  8908, loss min|avg|max: 0.378|0.677| 0.697, batch-p@3: 72.40%, ETA: 1:36:34 (0.36s/it)
2020-02-25 19:33:36,878 [INFO] train: iter:  8909, loss min|avg|max: 0.693|0.744| 3.838, batch-p@3: 57.81%, ETA: 1:45:06 (0.39s/it)
2020-02-25 19:33:37,249 [INFO] train: iter:  8910, loss min|avg|max: 0.689|0.846| 3.782, batch-p@3: 59.38%, ETA: 1:38:25 (0.37s/it)
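One thing worth noting in these logs: the per-batch minimum loss keeps bottoming out around 0.69, and if this is the soft-margin (softplus) variant of the triplet loss, that is exactly its floor of ln(2) ≈ 0.693 — softplus(x) = log(1 + eˣ) evaluates to ln 2 at x = 0 and never reaches 0 for any finite margin. A minimal check, assuming the soft-margin formulation:

```python
import math

def soft_margin_triplet(d_pos, d_neg):
    # Soft-margin triplet loss: softplus(d_pos - d_neg) = log(1 + exp(d_pos - d_neg)).
    # Unlike the hard-margin hinge, this never reaches exactly 0.
    return math.log1p(math.exp(d_pos - d_neg))

# When positive and negative distances are equal, the loss sits at ln(2):
print(round(soft_margin_triplet(1.0, 1.0), 3))  # 0.693

# Even with the negative pushed well past the positive, it only decays toward 0:
print(round(soft_margin_triplet(0.2, 3.0), 3))  # 0.059
```

So an average loss parked near 0.7 with a minimum of ≈ 0.693 does not by itself mean training has stalled — batch-p@3 rising from roughly 10% early on to 50–70% here suggests the embedding is still improving; evaluating mAP / rank-1 on the test split would confirm.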
