[Feature] PILCO by PSXBRosa · Pull Request #3537 · pytorch/rl

PSXBRosa · 2026-02-27T18:56:23Z

Description

This PR introduces the implementation of the PILCO (Probabilistic Inference for Learning Control) algorithm to TorchRL.

Key details of the implementation:

Gaussian Process Regression: I used the external libraries botorch and gpytorch for the GPR components. This avoids the overhead and complexity of maintaining a custom GPR implementation.
Moment Matching: I initially experimented with a Monte Carlo (MC) approach for moment matching to simplify the underlying mathematics. I could not get the MC approach to stabilize (it remains an interesting alternative), so the current working implementation relies on the analytical forms for moment matching. This aligns directly with what was done in the original PILCO paper by Deisenroth and Rasmussen.
Credits: I want to give a huge thanks and credit to @nrontsis. I used the code from their repository (https://github.com/nrontsis/PILCO) as a valuable implementation reference and to test/validate different parts of my own PyTorch adaptation.
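
To make the analytic-versus-Monte-Carlo distinction in the Moment Matching item concrete, here is a minimal one-dimensional sketch. It is not code from this PR: the function names are hypothetical and it uses only the standard library rather than PyTorch. It computes the mean and variance of sin(x) for a Gaussian input in closed form and compares them with a sampled estimate.

```python
import math
import random


def sin_moments_analytic(mean, var):
    """Exact mean and variance of sin(x) for x ~ N(mean, var)."""
    # E[sin x] = exp(-var / 2) * sin(mean)
    exp_sin = math.exp(-var / 2.0) * math.sin(mean)
    # E[sin^2 x] = (1 - exp(-2 * var) * cos(2 * mean)) / 2
    exp_sin_sq = 0.5 * (1.0 - math.exp(-2.0 * var) * math.cos(2.0 * mean))
    return exp_sin, exp_sin_sq - exp_sin ** 2


def sin_moments_monte_carlo(mean, var, num_samples=200_000, seed=0):
    """Sample-based estimate of the same two moments (hypothetical helper)."""
    rng = random.Random(seed)
    std = math.sqrt(var)
    samples = [math.sin(rng.gauss(mean, std)) for _ in range(num_samples)]
    mc_mean = sum(samples) / num_samples
    mc_var = sum((s - mc_mean) ** 2 for s in samples) / (num_samples - 1)
    return mc_mean, mc_var


analytic = sin_moments_analytic(0.3, 0.25)
sampled = sin_moments_monte_carlo(0.3, 0.25)
print("analytic:", analytic)
print("sampled: ", sampled)
```

The closed form is deterministic, whereas the sampled estimate carries noise that depends on the number of samples. That noise is one plausible source of the instability mentioned above, though this sketch does not establish it.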

Motivation and Context

PILCO is a highly sample-efficient model-based reinforcement learning algorithm, making it a valuable addition to the library's algorithm suite.

close #3513

I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

New feature (non-breaking change which adds core functionality)

Checklist

I have read the CONTRIBUTION guide (required)
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.

pytorch-bot · 2026-02-27T18:56:27Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3537

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 3 Cancelled Jobs

As of commit 5e2e9a6 with merge base 4d2c3cb:

NEW FAILURE - The following job has failed:

Continuous Benchmark (PR) / CPU Pytest benchmark (gh)
Workflow failed! Resource not accessible by integration

CANCELLED JOBS - The following jobs were cancelled. Please retry:

Continuous Benchmark (PR) / GPU Pytest benchmark (gh)
Workflow failed! Resource not accessible by integration
Libs Tests on Linux / unittests-gym (3.10, 12.8) / linux-job (gh)
Libs Tests on Linux / unittests-sklearn (3.10, 12.8) / linux-job (gh)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-02-27T18:56:34Z

⚠️ PR Title Label Error

Unknown or invalid prefix [Algorithm].

Current title: [Algorithm] PILCO

Supported Prefixes (case-sensitive)

Your PR title must start with exactly one of these prefixes:

| Prefix | Label Applied | Example |
| --- | --- | --- |
| `[BugFix]` | BugFix | `[BugFix] Fix memory leak in collector` |
| `[Feature]` | Feature | `[Feature] Add new optimizer` |
| `[Doc]` or `[Docs]` | Documentation | `[Doc] Update installation guide` |
| `[Refactor]` | Refactoring | `[Refactor] Clean up module imports` |
| `[CI]` | CI | `[CI] Fix workflow permissions` |
| `[Test]` or `[Tests]` | Tests | `[Tests] Add unit tests for buffer` |
| `[Environment]` or `[Environments]` | Environments | `[Environments] Add Gymnasium support` |
| `[Data]` | Data | `[Data] Fix replay buffer sampling` |
| `[Performance]` or `[Perf]` | Performance | `[Performance] Optimize tensor ops` |
| `[BC-Breaking]` | bc breaking | `[BC-Breaking] Remove deprecated API` |
| `[Deprecation]` | Deprecation | `[Deprecation] Mark old function` |
| `[Quality]` | Quality | `[Quality] Fix typos and add codespell` |

Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).

vmoens

Excellent first attempt!

Let's try to move most of this to core!

torchrl/modules/models/
    gp.py                    # BoTorchGPWorldModel (renamed: GPWorldModel?)
    rbf_controller.py        # RBFController

torchrl/objectives/
    pilco.py                 # SaturatingCost (the generic cost module)

sota-implementations/pilco/
    pilco.py                 # Training loop (stays here)
    utils.py                 # make_env, pendulum_cost (thin wrappers, stays here)
    config.yaml              # Config (stays here)

Missing tests:

Unit tests for RBFController moment matching (forward pass, squash_sin)
Unit tests for BoTorchGPWorldModel (fit, deterministic_forward, uncertain_forward)
At minimum, a smoke test for the full PILCO loop (see the sota-implementations CI workflow)
Numerical validation against the reference implementation (the author credits nrontsis/PILCO) if possible - ok if not
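
As an illustration of the kind of numerical check meant here, the sketch below validates a closed-form expected cost against sampling. It is only a scalar, standard-library stand-in: `expected_saturating_cost` and both test functions are hypothetical names, not the PR's `SaturatingCost` API, and the real tests would presumably operate on tensors and full covariance matrices.

```python
import math
import random


def expected_saturating_cost(mean, var, target=0.0, width=1.0):
    """E[1 - exp(-(x - target)^2 / (2 * width^2))] for x ~ N(mean, var)."""
    total_var = width ** 2 + var
    scale = width / math.sqrt(total_var)
    return 1.0 - scale * math.exp(-((mean - target) ** 2) / (2.0 * total_var))


def test_expected_cost_matches_sampling():
    # The closed form should agree with a Monte Carlo average of the cost.
    rng = random.Random(0)
    mean, var, target, width = 0.5, 0.4, 0.0, 0.8
    num_samples = 200_000
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(mean, math.sqrt(var))
        total += 1.0 - math.exp(-((x - target) ** 2) / (2.0 * width ** 2))
    closed_form = expected_saturating_cost(mean, var, target, width)
    assert abs(total / num_samples - closed_form) < 1e-2


def test_zero_variance_recovers_pointwise_cost():
    # With no input uncertainty the expectation is the cost at the mean.
    pointwise = 1.0 - math.exp(-(0.5 ** 2) / (2.0 * 0.8 ** 2))
    assert math.isclose(expected_saturating_cost(0.5, 0.0, 0.0, 0.8), pointwise)


test_expected_cost_matches_sampling()
test_zero_variance_recovers_pointwise_cost()
```

The same pattern (closed form versus sampling, plus a zero-variance limit) could be reused for the controller and world-model moment matching when a direct comparison against the reference implementation is impractical.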

There are no docs. No docstrings on any class or method beyond the one-line pendulum_cost docstring. For core components, all public methods need proper docstrings with shapes documented (especially the moment matching formulas which are dense linear algebra). Docs must be linked in docs/source/reference/...

Avoid single-letter variables unless they're indices (for i in range(...)). They are heavily used throughout the moment matching code (m, s, c, B, D, L, U, Q, t, z). These follow the paper notation, which is fine, but in core they should have comments referencing which equation in the paper each block corresponds to.

policy_for_env closure (pilco.py lines 166-200) -- this is an ad-hoc bridge between the Gaussian policy interface and a standard env that expects deterministic actions. In core, this should be a proper transform or wrapper (e.g., MeanActionSelector or similar) rather than a closure rebuilt every epoch.
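
A rough sketch of what such a wrapper could look like follows. It rests on several assumptions: `MeanActionSelector` is only the name floated above, the policy interface `policy(state_mean, state_cov) -> (action_mean, action_cov)` is assumed rather than taken from the PR, and plain Python lists stand in for tensors. A core version would presumably be a TorchRL transform or module.

```python
class MeanActionSelector:
    """Hypothetical wrapper exposing a Gaussian policy as a deterministic one."""

    def __init__(self, gaussian_policy):
        # Assumed interface: policy(state_mean, state_cov) -> (action_mean, action_cov)
        self.gaussian_policy = gaussian_policy

    def __call__(self, observation):
        # Treat the observed state as a Gaussian with zero covariance,
        # then keep only the mean of the resulting action distribution.
        dim = len(observation)
        zero_cov = [[0.0] * dim for _ in range(dim)]
        action_mean, _ = self.gaussian_policy(observation, zero_cov)
        return action_mean


def toy_linear_policy(state_mean, state_cov):
    """Toy stand-in for the controller: a linear-Gaussian policy."""
    gain = [0.5, -1.0]
    action_mean = [sum(g * x for g, x in zip(gain, state_mean))]
    # Action variance is gain^T @ state_cov @ gain.
    action_var = sum(
        gain[i] * state_cov[i][j] * gain[j] for i in range(2) for j in range(2)
    )
    return action_mean, [[action_var]]


# Built once and reused, instead of a closure rebuilt every epoch.
select_action = MeanActionSelector(toy_linear_policy)
action = select_action([2.0, 0.5])
print(action)
```

Because the wrapper holds a reference to the policy rather than copying its parameters, it keeps reflecting policy updates across epochs without being reconstructed.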

vmoens · 2026-03-01T18:08:04Z