Oversized return value for get_action

The function `get_action` (for the DRL solvers) returns `rand(distribution, policy.solver.action_size)`. Since `distribution` is already multivariate, this call draws `action_size` samples and allocates and fills a square matrix with them, of which only the first sample is actually used. I think the intended return value is `rand(distribution)`. I can submit a pull request if that's preferred.
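
For illustration, here is a minimal sketch of the size difference. It assumes `distribution` behaves like a Distributions.jl multivariate distribution; the `MvNormal` and the value of `action_size` below are made-up stand-ins, not taken from the solver code.

```julia
using Distributions
using LinearAlgebra

# Hypothetical stand-ins for the solver's values (assumptions for this sketch).
action_size = 3
distribution = MvNormal(zeros(action_size), Matrix(1.0I, action_size, action_size))

# Current call: draws `action_size` samples, stored one per column.
samples = rand(distribution, action_size)
println(size(samples))  # (3, 3)

# Proposed call: draws a single sample as a vector.
action = rand(distribution)
println(size(action))   # (3,)
```

With the current call, only `samples[:, 1]` would carry the action; the remaining columns are drawn and discarded.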