Accuracy in STN drops rapidly during training #12082

rohitrango · 2018-08-08T10:36:31Z

rohitrango
Aug 8, 2018

Description

While training a Spatial Transformer Network on a modified MNIST dataset (40*40 images with rotated, scaled and translated images), the accuracy increases and then drops suddenly. I have tried different architectures, and the custom initialisation as well, but the problem persists. A baseline architecture without STNs gets ~96% val. accuracy on the dataset.

Environment info

Here is the output of `diagnose.py'

----------Python Info----------
('Version      :', '2.7.12')
('Compiler     :', 'GCC 5.4.0 20160609')
('Build        :', ('default', 'Dec  4 2017 14:50:18'))
('Arch         :', ('64bit', 'ELF'))
------------Pip Info-----------
('Version      :', '18.0')
('Directory    :', '/usr/local/lib/python2.7/dist-packages/pip')
----------MXNet Info-----------
('Version      :', '1.2.1')
('Directory    :', '/usr/local/lib/python2.7/dist-packages/mxnet')
('Commit Hash   :', '106391a1f0ee012b1ea38764d711e76774ce77e1')
----------System Info----------
('Platform     :', 'Linux-4.13.0-36-generic-x86_64-with-Ubuntu-16.04-xenial')
('system       :', 'Linux')
('node         :', 'MagentaEye')
('release      :', '4.13.0-36-generic')
('version      :', '#40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018')
----------Hardware Info----------
('machine      :', 'x86_64')
('processor    :', 'x86_64')
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 94
Model name:            Intel(R) Core(TM) i3-6100 CPU @ 3.70GHz
Stepping:              3
CPU MHz:               3700.000
CPU max MHz:           3700.0000
CPU min MHz:           800.0000
BogoMIPS:              7392.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              3072K
NUMA node0 CPU(s):     0-3
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti retpoline intel_pt rsb_ctxsw tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0015 sec, LOAD: 1.2692 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0012 sec, LOAD: 1.2239 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.1208 sec, LOAD: 1.7882 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, <urlopen error ('_ssl.c:574: The handshake operation timed out',)>, DNS finished in 0.0552489757538 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.2040 sec, LOAD: 1.3690 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.4650 sec, LOAD: 1.0381 sec.

I'm using Python mxnet-cu92==1.2.1.post1

Training log:

WARNING:root:Already bound, ignoring bind()
/usr/local/lib/python2.7/dist-packages/scipy/ndimage/interpolation.py:583: UserWarning: From scipy 0.13.0, the output shape of zoom() is calculated with round() instead of int() - for these inputs the size of the returned array has changed.
  "the returned array has changed.", UserWarning)
[13:56:24] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
INFO:root:Epoch[0] Batch [100]	Speed: 313.16 samples/sec	cross-entropy=2.185727	accuracy=0.205644
INFO:root:Epoch[0] Batch [200]	Speed: 325.85 samples/sec	cross-entropy=1.729788	accuracy=0.385500
INFO:root:Epoch[0] Batch [300]	Speed: 321.29 samples/sec	cross-entropy=1.357834	accuracy=0.536600
INFO:root:Epoch[0] Batch [400]	Speed: 335.16 samples/sec	cross-entropy=1.131152	accuracy=0.627700
INFO:root:Epoch[0] Batch [500]	Speed: 301.69 samples/sec	cross-entropy=0.815976	accuracy=0.734200
INFO:root:Epoch[0] Train-cross-entropy=0.817999
INFO:root:Epoch[0] Train-accuracy=0.738061
INFO:root:Epoch[0] Time cost=185.885
INFO:root:Epoch[0] Validation-cross-entropy=0.779935
INFO:root:Epoch[0] Validation-accuracy=0.747576
INFO:root:Epoch[1] Batch [100]	Speed: 317.79 samples/sec	cross-entropy=0.676048	accuracy=0.784356
INFO:root:Epoch[1] Batch [200]	Speed: 332.12 samples/sec	cross-entropy=0.643721	accuracy=0.797900
INFO:root:Epoch[1] Batch [300]	Speed: 304.31 samples/sec	cross-entropy=0.446212	accuracy=0.855900
INFO:root:Epoch[1] Batch [400]	Speed: 314.56 samples/sec	cross-entropy=0.410928	accuracy=0.869100
INFO:root:Epoch[1] Batch [500]	Speed: 324.77 samples/sec	cross-entropy=0.488531	accuracy=0.842300
INFO:root:Epoch[1] Train-cross-entropy=0.404039
INFO:root:Epoch[1] Train-accuracy=0.875816
INFO:root:Epoch[1] Time cost=189.229
INFO:root:Epoch[1] Validation-cross-entropy=0.959124
INFO:root:Epoch[1] Validation-accuracy=0.694444
.INFO:root:Epoch[2] Batch [100]	Speed: 313.64 samples/sec	cross-entropy=0.401807	accuracy=0.875743
INFO:root:Epoch[2] Batch [200]	Speed: 306.60 samples/sec	cross-entropy=1.228554	accuracy=0.539900
INFO:root:Epoch[2] Batch [300]	Speed: 325.30 samples/sec	cross-entropy=2.303548	accuracy=0.113300
^[^KINFO:root:Epoch[2] Batch [400]	Speed: 307.10 samples/sec	cross-entropy=2.301380	accuracy=0.116400
INFO:root:Epoch[2] Batch [500]	Speed: 314.42 samples/sec	cross-entropy=2.302255	accuracy=0.107400
INFO:root:Epoch[2] Train-cross-entropy=2.301545
INFO:root:Epoch[2] Train-accuracy=0.111429
INFO:root:Epoch[2] Time cost=189.642
INFO:root:Epoch[2] Validation-cross-entropy=2.301050
INFO:root:Epoch[2] Validation-accuracy=0.113434
INFO:root:Epoch[3] Batch [100]	Speed: 305.17 samples/sec	cross-entropy=2.300846	accuracy=0.113564
INFO:root:Epoch[3] Batch [200]	Speed: 319.70 samples/sec	cross-entropy=2.301263	accuracy=0.114000
INFO:root:Epoch[3] Batch [300]	Speed: 317.78 samples/sec	cross-entropy=2.301824	accuracy=0.110700
INFO:root:Epoch[3] Batch [400]	Speed: 337.87 samples/sec	cross-entropy=2.300997	accuracy=0.114900
INFO:root:Epoch[3] Batch [500]	Speed: 318.91 samples/sec	cross-entropy=2.301557	accuracy=0.109200
INFO:root:Epoch[3] Train-cross-entropy=2.301959
INFO:root:Epoch[3] Train-accuracy=0.110816
INFO:root:Epoch[3] Time cost=185.790
INFO:root:Epoch[3] Validation-cross-entropy=2.300995
INFO:root:Epoch[3] Validation-accuracy=0.113737

Minimum reproducible example

This is the localization net that I'm using:

def localization_net(image):
    # 40 * 40    
    data = conv_net(image, kernel=(3, 3), num_filter=16, name_num=1)
    data = conv_net(data, kernel=(3, 3), num_filter=16, name_num=2)
    # 36 * 36
    data = sym.Pooling(data, kernel=(2, 2), stride=(2, 2), pool_type='max', name="maxpool1")
    # 18 * 18
    data = conv_net(data, kernel=(3, 3), num_filter=32, name_num=3)
    data = conv_net(data, kernel=(3, 3), num_filter=32, name_num=4)
    data = conv_net(data, kernel=(3, 3), num_filter=32, name_num=5)    
    # 12 * 12
    data = sym.Pooling(data, kernel=(2, 2), stride=(2, 2), pool_type='max', name="maxpool2")
    # 6 * 6
    data = sym.Flatten(data)
    # 36*32
    data = sym.FullyConnected(data, num_hidden=500, name="fc1")
    data = sym.relu(data)
    data = sym.FullyConnected(data, num_hidden=100, name="fc2")
    data = sym.relu(data)
    data = sym.FullyConnected(data, num_hidden=6, name="fc3")
    return data

Here is the main model (an STN followed by a normal feedforward net):

def STN_example(image, batch_size, output_dims, out_name):
    theta = localization_net(image)
    img_crop = mx.sym.SpatialTransformer(image, theta, (24, 24), 'affine', 'bilinear')
    # 24*24
    # Run a small convnet on this
    data = conv_net(img_crop, kernel=(3, 3), num_filter=32, name_num="11")
    data = conv_net(data, kernel=(3 ,3), num_filter=32, name_num="12")
    # 20*20
    data = conv_net(data, kernel=(5, 5), num_filter=64, name_num="15")
    # 16*16
    data= sym.Pooling(data, kernel=(2, 2), stride=(2, 2), pool_type="max", name="maxpool_0")
    # 8*8
    data = sym.Flatten(data, name="flatten")
    data = sym.relu(sym.FullyConnected(data, num_hidden=1000), name="fc_0")
    data = sym.Dropout(data, p=0.2, name="dropout1")
    data = sym.relu(sym.FullyConnected(data, num_hidden=500), name="fc_1")
    data = sym.Dropout(data, p=0.2, name="dropout1")
    data = sym.FullyConnected(data, num_hidden=output_dims, name="fc_2")   
    loss = sym.SoftmaxOutput(data=data, name=out_name)
    return loss

And this is my training loop:

batch_size = 100
train_iter = MNISTIter(batch_size)
val_iter = MNISTIter(batch_size, train=False)
sym = model(batch_size)

mod = mx.mod.Module(sym, context=mx.gpu(0), data_names=['image'], label_names=['softmax_label'])
print("Module:", mod)
mod.bind(data_shapes=train_iter.provide_data, label_shapes=train_iter.provide_label)
mod.fit(train_iter, 
              eval_data=val_iter,
              optimizer='sgd',
              optimizer_params={'learning_rate': 0.05},
              eval_metric=['ce','acc'],
              batch_end_callback = mx.callback.Speedometer(batch_size, 100), 
              num_epoch=20,
              initializer=mx.init.Normal(0.01),
              allow_missing=True,
              arg_params={'fc3_weights': mx.nd.zeros((6, 100)) , 'fc3_bias': mx.nd.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])}
              )

I have initialised the weights and biases of the last regression layer to get the identity transform.

Steps to reproduce

Run the training loop with the network.

What have you tried to solve it?

A myriad of initialisations : normal, uniform, Xavier all with different params, different optimizers - Adam, SGD, a variety of learning rates and weight decays.
Wrote a custom STN with some inspiration from this repo: https://github.com/kevinzakka/spatial-transformer-network.
Applied a sigmoid/tanh activation at the last layer of localization network as well.

The same problem happens every time. What should I do? Is it a problem with the STN or the localisation network?

lanking520 · 2018-08-08T20:54:54Z

lanking520
Aug 8, 2018
Collaborator

Hi @rohitrango , thanks for your issue. @ThomasDelteil as ML guru to help you out. Also please try to send this issue to https://discuss.mxnet.io/ as more discussion on Accuracy and performance will be there.

@mxnet-label-bot could you please add [question, performance, python] here?

0 replies

rohitrango · 2018-08-08T21:06:29Z

rohitrango
Aug 8, 2018
Author

Thank you @lanking520, I will post it there as well.

0 replies

tlatlbtle · 2019-03-13T08:40:39Z

tlatlbtle
Mar 13, 2019

mark. I want know that why add tanh to the last layer make STN works well.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Accuracy in STN drops rapidly during training #12082

Uh oh!

{{title}}

Uh oh!

Replies: 3 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Accuracy in STN drops rapidly during training #12082

Uh oh!

rohitrango Aug 8, 2018

Description

Environment info

Training log:

Minimum reproducible example

Steps to reproduce

What have you tried to solve it?

Replies: 3 comments

Uh oh!

lanking520 Aug 8, 2018 Collaborator

Uh oh!

rohitrango Aug 8, 2018 Author

Uh oh!

tlatlbtle Mar 13, 2019

rohitrango
Aug 8, 2018

lanking520
Aug 8, 2018
Collaborator

rohitrango
Aug 8, 2018
Author

tlatlbtle
Mar 13, 2019