Description
When I try to train the network with:
th main.lua -weights <path/to/downloaded_weights/model_snapshot_7scenes.t7> -dataset_src_path </path/to/7Scenes>
(without -do_evaluation), I run into a problem.
Here is the error:
{
val_batch_size : 40
beta1 : 0.9
do_evaluation : false
use_dropout : false
dataset_src_path : "/data/code/camera-relocalisation/7Scenes"
gamma : 0.001
image_size : 224
epoch_number : 1
weights : "/data/code/camera-relocalisation/downloaded_weights/model_snapshot_7scenes.t7"
train_batch_size : 64
validation_dataset_size : 10402
max_epoch : 250
dataset_name : "7-Scenes"
nGPU : 1
momentum : 0.9
logs : "./logs/7scenes.log"
beta : 1
manualSeed : 333
learning_rate : 0.1
beta2 : 0.999
model_zoo_path : "./pretrained_models"
precomputed_data_path : "./data"
results_filename : "./results/7scenes_res.bin"
snapshot_dir : "./snapshots"
GPU : 1
weight_decay : 1e-05
power : 0.5
training_dataset_size : 39999
}
this is a test for load_training_data
==> Training GT labels have been loaded successfully
==> Validation GT labels have been loaded successfully
==> loading model from pretained weights from file: /data/code/camera-relocalisation/downloaded_weights/model_snapshot_7scenes.t7
==> configuring optimizer
==> number of batches: 624
==> learning rate: 0.1
==> Number of parameters in the model: 22350215
==> online epoch # 1 [batchSize = 64]
==> time taken to randomize input training data: 2.7921199798584 ms
/torch/install/bin/luajit: /torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
In 1 module of nn.ParallelTable:
In 2 module of nn.Sequential:
/torch/install/share/lua/5.1/nn/THNN.lua:110: input_ and gradOutput_ shapes do not match: input_ [2 x 64 x 112 x 112], gradOutput_ [64 x 64 x 112 x 112] at /torch/extra/cunn/lib/THCUNN/generic/BatchNormalization.cu:74
stack traceback:
[C]: in function 'v'
/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'BatchNormalization_backward'
/torch/install/share/lua/5.1/nn/BatchNormalization.lua:154: in function </torch/install/share/lua/5.1/nn/BatchNormalization.lua:140>
[C]: in function 'xpcall'
/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/torch/install/share/lua/5.1/nn/Sequential.lua:70: in function </torch/install/share/lua/5.1/nn/Sequential.lua:63>
[C]: in function 'xpcall'
/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/torch/install/share/lua/5.1/nn/ParallelTable.lua:27: in function 'accGradParameters'
/torch/install/share/lua/5.1/nn/Module.lua:32: in function </torch/install/share/lua/5.1/nn/Module.lua:29>
[C]: in function 'xpcall'
/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/torch/install/share/lua/5.1/nn/Sequential.lua:88: in function 'backward'
/data/code/camera-relocalisation/cnn_part/train.lua:68: in function 'opfunc'
/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
/data/code/camera-relocalisation/cnn_part/train.lua:72: in function 'train'
main.lua:97: in main chunk
[C]: in function 'dofile'
/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/torch/install/share/lua/5.1/nn/Sequential.lua:88: in function 'backward'
/data/code/camera-relocalisation/cnn_part/train.lua:68: in function 'opfunc'
/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
/data/code/camera-relocalisation/cnn_part/train.lua:72: in function 'train'
main.lua:97: in main chunk
[C]: in function 'dofile'
/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
I think the problem is located in this part of train.lua:
for t,v in ipairs(indices) do
    xlua.progress(t, #indices)
    local mini_batch_info = make_training_minibatch(v)
    local mini_batch_data = mini_batch_info.data:cuda()
    local orientation_gt = mini_batch_info.quaternion_labels:cuda()
    local translation_gt = mini_batch_info.translation_labels:cuda()
    cutorch.synchronize()
    collectgarbage()
    feval = function(x)
        if x ~= parameters then parameters:copy(x) end
        model:zeroGradParameters()
        local outputs = model:forward({mini_batch_data[{{}, 1, {}, {}, {}}], mini_batch_data[{{}, 2, {}, {}, {}}]})
        local err = criterion:forward(outputs, {translation_gt, orientation_gt})
        meter_train_t:add(criterion.weights[1] * criterion.criterions[1].output)
        meter_train_q:add(criterion.weights[2] * criterion.criterions[2].output)
        local df_do = criterion:backward(outputs, {translation_gt, orientation_gt})
        model:backward(mini_batch_data, df_do)
        return err, gradParameters
    end
    optim.adam(feval, parameters, optimState)
============================================
In particular, when I comment out optim.adam(feval, parameters, optimState), training runs without this error.
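I also noticed that model:forward is called with a table of two 4-D slices, while model:backward gets the raw 5-D mini_batch_data tensor. I am not sure this is the cause, but I believe nn modules expect backward to receive exactly the same input that forward did. Here is a minimal sketch of what I mean with nn.ParallelTable (the Linear layers and shapes are toy values I made up, not from this repo):

require 'nn'

-- Minimal sketch (toy shapes, not the repo's model): backward must receive
-- the same input that the preceding forward received. nn.ParallelTable
-- routes input[i] to its i-th branch, so a {tensor, tensor} table and a
-- single stacked tensor get sliced very differently.
local branches = nn.ParallelTable()
branches:add(nn.Linear(4, 2))
branches:add(nn.Linear(4, 2))

local a = torch.randn(3, 4)  -- stands in for mini_batch_data[{{}, 1, {}, {}, {}}]
local b = torch.randn(3, 4)  -- stands in for mini_batch_data[{{}, 2, {}, {}, {}}]
local out = branches:forward({a, b})
local gradOut = {torch.randn(3, 2), torch.randn(3, 2)}

-- Matches the forward call, so the shapes seen by each branch agree:
branches:backward({a, b}, gradOut)

-- Passing one stacked tensor here instead would make input[1] a slice along
-- the wrong dimension, which looks a lot like the [2 x 64 x 112 x 112] vs
-- [64 x 64 x 112 x 112] mismatch in my stack trace.

So maybe train.lua:68 should be model:backward({mini_batch_data[{{}, 1, {}, {}, {}}], mini_batch_data[{{}, 2, {}, {}, {}}]}, df_do) instead, but I have not verified this.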
I don't know what is going on. Could you please help me?
Thanks in advance!