Motivation
Implement PILCO (Probabilistic Inference for Learning Control), as requested in #509.
Solution
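PILCO models the system dynamics with Gaussian processes and propagates state uncertainty through the model by analytic moment matching. Because the predicted state distribution stays Gaussian in closed form, the expected cost is differentiable with respect to the policy parameters, which allows direct gradient-based policy optimization.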
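For reviewers unfamiliar with moment matching, here is a minimal 1-D sketch of the predictive-mean step only. It assumes a squared-exponential kernel with fixed (not optimized) hyperparameters. All names (`se_kernel`, `fit_gp`, `gp_mean`, `mm_mean`) are illustrative and are not this PR's API. Full PILCO also computes the predictive variance and the input-output covariance, and chains these steps over the planning horizon.

```python
import numpy as np

def se_kernel(a, b, sf2, l2):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return sf2 * np.exp(-0.5 * d**2 / l2)

def fit_gp(x, y, sf2=1.0, l2=0.5, sn2=1e-2):
    """Return beta = (K + sn2*I)^-1 y for fixed, hand-picked hyperparameters."""
    K = se_kernel(x, x, sf2, l2)
    return np.linalg.solve(K + sn2 * np.eye(len(x)), y)

def gp_mean(xs, x, beta, sf2=1.0, l2=0.5):
    """Standard GP posterior mean at deterministic test inputs xs."""
    return se_kernel(np.atleast_1d(xs), x, sf2, l2) @ beta

def mm_mean(mu, s2, x, beta, sf2=1.0, l2=0.5):
    """Exact E[m(x*)] when the test input is uncertain, x* ~ N(mu, s2).

    Integrating the SE kernel against a Gaussian widens the kernel
    (l2 -> l2 + s2) and rescales it by 1 / sqrt(1 + s2 / l2).
    """
    q = sf2 / np.sqrt(1.0 + s2 / l2) * np.exp(-0.5 * (x - mu) ** 2 / (s2 + l2))
    return q @ beta

# Toy data standing in for observed one-step state transitions (assumption).
rng = np.random.default_rng(0)
x_train = np.linspace(-3.0, 3.0, 20)
y_train = np.sin(x_train) + 0.05 * rng.standard_normal(20)
beta = fit_gp(x_train, y_train)

# Mean prediction for an uncertain input with mean 0.7 and variance 0.25.
print(mm_mean(0.7, 0.25, x_train, beta))
```

With `s2 = 0` the expression reduces to the ordinary GP posterior mean, which is a quick sanity check on any implementation of this step.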
Alternatives
NA
Additional context
Reference: Deisenroth and Rasmussen, "PILCO: A Model-Based and Data-Efficient Approach to Policy Search", ICML 2011.
Checklist