- Transition from the magrittr pipe (`%>%`) to the base R pipe (`|>`).
- To help avoid numeric overflow in the loss functions:
  - Tensors are stored as 64-bit floats instead of 32-bit.
  - Starting values are now drawn from a Gaussian distribution (instead of a uniform one) with a smaller standard deviation.
  - The results always contain the initial results, to use as a fallback if there is overflow during the first epoch.
  - `brulee_mlp()` has two additional parameters, `grad_value_clip` and `grad_norm_clip`, that clip the gradients by value and by norm, respectively, to help prevent overflow.
  - The warning was changed to "Early stopping occurred at epoch {X} due to numerical overflow of the loss function."
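
The two kinds of gradient clipping can be illustrated with two small base R helpers. `clip_grad_value()` and `clip_grad_norm()` are hypothetical. They are not part of brulee, and this is a simplified sketch, not brulee's implementation. For instance, real implementations typically compute a single norm across all of a model's gradients.

```r
# Hypothetical helpers illustrating gradient clipping; NOT part of brulee's API.

# Clip by value: clamp every gradient element to [-clip_value, clip_value].
clip_grad_value <- function(grad, clip_value) {
  pmin(pmax(grad, -clip_value), clip_value)
}

# Clip by norm: if the Euclidean (L2) norm of the gradient exceeds max_norm,
# rescale the whole vector so its norm equals max_norm (direction is kept).
clip_grad_norm <- function(grad, max_norm) {
  total_norm <- sqrt(sum(grad^2))
  if (total_norm > max_norm) {
    grad <- grad * (max_norm / total_norm)
  }
  grad
}

clip_grad_value(c(30, -40, 0.5), 5)
#> [1]  5.0 -5.0  0.5
clip_grad_norm(c(3, 4), 1)
#> [1] 0.6 0.8
```

Clipping by value limits each element independently and can change the gradient's direction. Clipping by norm only shrinks the step size and preserves the direction.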
 
- Several new SGD optimizers were added: `"ADAMw"`, `"Adadelta"`, `"Adagrad"`, and `"RMSprop"`.
- Mixture parameter values other than zero cannot be used with several optimizers, since those optimizers require a pure L2 penalty.
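
The role of the mixture parameter can be shown with a small worked penalty. The sketch assumes a common elastic net parameterization in which `mixture` is the proportion of L1 (lasso) penalty. `elastic_net_penalty()` is a hypothetical helper, and brulee's internal scaling may differ. With `mixture = 0`, only the L2 (ridge) term remains.

```r
# Hypothetical helper: an elastic net penalty, assuming `mixture` is the
# proportion of L1 penalty. Not brulee's internal code; scaling may differ.
elastic_net_penalty <- function(weights, penalty, mixture) {
  l1 <- sum(abs(weights))  # lasso term
  l2 <- sum(weights^2)     # ridge term
  penalty * (mixture * l1 + (1 - mixture) * l2)
}

w <- c(1, -2, 3)
# mixture = 0: pure L2 penalty, 0.1 * 14
elastic_net_penalty(w, penalty = 0.1, mixture = 0)
#> [1] 1.4
# mixture = 0.5: half L1, half L2, 0.1 * (0.5 * 6 + 0.5 * 14)
elastic_net_penalty(w, penalty = 0.1, mixture = 0.5)
#> [1] 1
```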