The function get_action (for the DRL solvers) returns rand(distribution, policy.solver.action_size). Since distribution is already multivariate, this call allocates and fills a square matrix of samples, of which only the first is relevant or used. I think the intended return value is rand(distribution), which draws a single sample. I can submit a pull request if that's preferred.
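
To illustrate the shape difference, here is a minimal sketch. It assumes distribution is a multivariate distribution from Distributions.jl; the MvNormal and the action_size value below are hypothetical stand-ins, not the solver's actual objects.

```julia
using Distributions, LinearAlgebra

# Hypothetical stand-in for policy.solver.action_size.
action_size = 3

# Hypothetical stand-in for the policy's distribution: a standard
# multivariate normal whose dimension equals action_size.
distribution = MvNormal(zeros(action_size),
                        Matrix{Float64}(I, action_size, action_size))

# Current call: draws action_size samples, stored as the columns of a matrix.
samples = rand(distribution, action_size)

# Proposed call: draws one sample, returned as a vector.
action = rand(distribution)

println(size(samples))  # matrix dimensions
println(size(action))   # vector length
```

With these stand-ins, the first call produces an action_size × action_size matrix, while the second produces a single vector of length action_size, which is all get_action appears to need.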