Check whether bitmaps for the same number of nodes but different instances give different outputs
Give the neural net some past values and see if it predicts correctly
relu vs sigmoid in the first layer
Modify the distiller class to handle different sets of training data for teacher and student
Only the n_truth has to be replaced in teacher's predictions
Train student (online vs offline)
Try regularization techniques
Fitting to training data aggressively in the online scenario
Documentation + cleaning up repo
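
Sketch for the "relu vs sigmoid in the first layer" item: a minimal pure-Python forward pass through one dense layer, run once with each activation so the outputs can be compared side by side. The helper `first_layer` and the toy inputs, weights, and biases are hypothetical and are not taken from the repo; the real comparison would swap the activation in the actual model and compare validation metrics.

```python
import math


def relu(x):
    # Passes positive values through, clamps negatives to zero.
    return max(0.0, x)


def sigmoid(x):
    # Squashes any real value into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def first_layer(inputs, weights, biases, activation):
    # Dense layer: one output per weight row, activation applied per unit.
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical toy values, chosen so one unit is positive and one negative.
inputs = [1.0, -2.0]
weights = [[0.5, -0.25], [-1.0, 0.5]]
biases = [0.0, 0.0]

relu_out = first_layer(inputs, weights, biases, relu)
sigmoid_out = first_layer(inputs, weights, biases, sigmoid)

print("relu:   ", relu_out)
print("sigmoid:", sigmoid_out)
```

With relu the negative pre-activation is zeroed out, while sigmoid keeps a small non-zero value for it; that difference in the first layer is what the comparison would be probing.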