This package is really useful for learning-rate updaters. I'm using a variant of the Adam scheme here for SGD.
I think it is unnecessary to store the \rho_i^t terms as vectors; shouldn't these just be Float64s?
Also, a pedantic point: I'm not sure why they are called \rho instead of \beta.
https://github.com/JuliaML/StochasticOptimization.jl/blob/master/src/paramupdaters.jl#L123-L124
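To illustrate the point: in the standard Adam formulation the decay-rate powers \beta_1^t and \beta_2^t used for bias correction are the same for every parameter, so they can be tracked as plain scalars rather than per-parameter vectors. Here is a minimal sketch (in Python with NumPy for illustration, not the package's Julia code; all names here are my own, not from the repo):

```python
import numpy as np

def adam_step(theta, grad, m, v, b1t, b2t,
              lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update. The bias-correction accumulators b1t, b2t
    (i.e. beta1^t and beta2^t) are plain scalars, shared across all
    parameters; only the moment estimates m and v are vectors."""
    m = b1 * m + (1.0 - b1) * grad        # first-moment estimate (vector)
    v = b2 * v + (1.0 - b2) * grad**2     # second-moment estimate (vector)
    b1t *= b1                             # scalar: accumulates beta1^t
    b2t *= b2                             # scalar: accumulates beta2^t
    m_hat = m / (1.0 - b1t)               # bias-corrected moments
    v_hat = v / (1.0 - b2t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v, b1t, b2t

# Minimize f(theta) = ||theta||^2; start b1t = b2t = 1.0 so the first
# update already applies the correct bias correction.
theta = np.array([1.0, -2.0])
m, v = np.zeros(2), np.zeros(2)
b1t, b2t = 1.0, 1.0
for _ in range(100):
    grad = 2.0 * theta
    theta, m, v, b1t, b2t = adam_step(theta, grad, m, v, b1t, b2t)
```

The scalar accumulators are also cheaper: updating them is two multiplications per step instead of two vector broadcasts.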