[Feature] Added MCTSPolicyBase, MCTSPolicy, AlphaGoPolicy, AlphaStarPolicy, and MuZeroPolicy by ParamThakkar123 · Pull Request #3449 · pytorch/rl

ParamThakkar123 · 2026-02-05T04:51:18Z

Description

Describe your changes in detail.

Added MCTSPolicyBase, MCTSPolicy, AlphaGoPolicy, AlphaStarPolicy, and MuZeroPolicy to the MCTS policies.

Motivation and Context

Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
You can use the syntax close #15213 if this solves the issue #15213

This PR does not close an issue, but it is a subtask of #2357.

I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds core functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)
Example (update in the folder of examples)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

I have read the CONTRIBUTION guide (required)
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.


pytorch-bot · 2026-02-05T04:51:22Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3449

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 1 Cancelled Job

As of commit 77868ed with merge base ab49b59:

NEW FAILURES - The following jobs have failed:

Continuous Benchmark (PR) / CPU Pytest benchmark (gh)
Workflow failed! Resource not accessible by integration
Lint / python-source-and-configs / linux-job (gh)
test/test_mcts.py:21:1: E302 expected 2 blank lines, found 1
Tutorials Tests on Linux / tests (3.12, 12.4) / linux-job (gh)
RuntimeError: Command docker exec -t 7392bf4551d64c9415d10580a71db1084dc9a04d3defea34ffe5ecf8b06e4edd /exec failed with exit code 1
Unit-tests on Linux / tests-cpu (3.11) / linux-job (gh)
test/test_custom_envs.py::TestPendulum::test_pendulum_env[device1]
Unit-tests on Linux / tests-optdeps (3.12, 13.0) / linux-job (gh)
test/test_distributions.py::TestDelta::test_tanhdelta_inv_ones[device0]

CANCELLED JOB - The following job was cancelled. Please retry:

Continuous Benchmark (PR) / GPU Pytest benchmark (gh)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-02-05T04:51:27Z

⚠️ PR Title Label Error

PR title must start with a label prefix in brackets (e.g., [BugFix]).

Current title: Added MCTSPolicyBase, MCTSPolicy, AlphaGoPolicy, AlphaStarPolicy, and MuZeroPolicy

Supported Prefixes (case-sensitive)

Your PR title must start with exactly one of these prefixes:

| Prefix | Label Applied | Example |
| --- | --- | --- |
| `[BugFix]` | BugFix | `[BugFix] Fix memory leak in collector` |
| `[Feature]` | Feature | `[Feature] Add new optimizer` |
| `[Doc]` or `[Docs]` | Documentation | `[Doc] Update installation guide` |
| `[Refactor]` | Refactoring | `[Refactor] Clean up module imports` |
| `[CI]` | CI | `[CI] Fix workflow permissions` |
| `[Test]` or `[Tests]` | Tests | `[Tests] Add unit tests for buffer` |
| `[Environment]` or `[Environments]` | Environments | `[Environments] Add Gymnasium support` |
| `[Data]` | Data | `[Data] Fix replay buffer sampling` |
| `[Performance]` or `[Perf]` | Performance | `[Performance] Optimize tensor ops` |
| `[BC-Breaking]` | bc breaking | `[BC-Breaking] Remove deprecated API` |
| `[Deprecation]` | Deprecation | `[Deprecation] Mark old function` |

Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).

ParamThakkar123 · 2026-02-05T06:10:48Z

One test is currently failing. I am working on a fix.

ParamThakkar123 · 2026-02-05T09:51:28Z

All tests passed locally


vmoens

Thanks, see my comments.

torchrl/data/tensor_specs.py

vmoens · 2026-02-08T07:55:28Z

torchrl/modules/mcts/policies.py

+class AlphaStarPolicy(AlphaGoPolicy):
+    """AlphaStar-style MCTS policy with a lower exploration constant.
+
+    This policy is similar to AlphaGo but uses a smaller exploration constant (c=1.0) for potentially
+    more exploitative behavior.
+
+    Args:
+        c (float, optional): Exploration constant. Defaults to 1.0.
+        **kwargs: Additional keyword arguments passed to AlphaGoPolicy.
+    """
+    def __init__(self, *, c: float = 1.0, **kwargs) -> None:
+        super().__init__(c=c, **kwargs)
+
+class MuZeroPolicy(AlphaGoPolicy):
+    """MuZero-style MCTS policy with a specific exploration constant.
+
+    This policy implements the selection from MuZero, using PUCT with c=1.25.
+
+    Args:
+        c (float, optional): Exploration constant. Defaults to 1.25.
+        **kwargs: Additional keyword arguments passed to AlphaGoPolicy.
+    """
+    def __init__(self, *, c: float = 1.25, **kwargs) -> None:
+        super().__init__(c=c, **kwargs)


This is a 3-class hierarchy to set a single float.
Let's consider whether class methods or constants would be more appropriate (e.g., MCTSPolicy.from_alphago(), MCTSPolicy.from_muzero())?

Additionally, the naming is somewhat misleading -- AlphaStar and MuZero use substantially different MCTS variants beyond just exploration constant tuning (e.g., MuZero uses a learned model for tree expansion). I guess users might expect these classes to encapsulate those algorithmic differences.
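For illustration, the class-method alternative suggested above might look like the minimal sketch below. `MCTSPolicySketch` and `puct_score` are hypothetical stand-ins, not the PR's `MCTSPolicy`; the constants 1.0 and 1.25 are the defaults documented in the diff, and the AlphaZero-style PUCT formula is an assumption made only to show where `c` enters. `from_alphago()` is omitted because the AlphaGoPolicy default is not shown in this thread.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MCTSPolicySketch:
    """Hypothetical stand-in; not the MCTSPolicy class from this PR."""

    c: float  # exploration constant

    @classmethod
    def from_alphastar(cls) -> "MCTSPolicySketch":
        # c=1.0 is the default documented for AlphaStarPolicy in the diff.
        return cls(c=1.0)

    @classmethod
    def from_muzero(cls) -> "MCTSPolicySketch":
        # c=1.25 is the default documented for MuZeroPolicy in the diff.
        return cls(c=1.25)

    def puct_score(
        self, q: float, prior: float, parent_visits: int, child_visits: int
    ) -> float:
        # AlphaZero-style PUCT form, assumed here for illustration only.
        exploration = self.c * prior * math.sqrt(parent_visits) / (1 + child_visits)
        return q + exploration


alphastar = MCTSPolicySketch.from_alphastar()
muzero = MCTSPolicySketch.from_muzero()
```

With this shape, one class carries the selection logic and the named constructors only choose a constant, which avoids subclasses that differ by a single float.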

Oh, got it @vmoens. I will try to refine these implementations that way.

torchrl/modules/mcts/policies.py

vmoens · 2026-02-08T07:59:08Z

test/test_mcts.py

Only 3 test methods for the entire new feature:

No batched input tests -- the policy classes explicitly handle broadcasting and multi-dim masks, but none of this is tested.

No edge case tests -- e.g., all-False mask (should raise ValueError per the code), different score modules (UCB, EXP3).

No test for MCTSPolicyBase abstractness -- verify it can't be instantiated directly.

No test for custom key overrides on the specialized policies.

The existing test changes (torch.tensor() removal from assert_close calls) are an unrelated cleanup and should be a separate PR.
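A rough sketch of the shape such tests could take is shown below. The policy constructors are not visible in this thread, so `PolicyBaseSketch` and `MaskedArgmaxPolicySketch` are hypothetical stand-ins written with plain Python lists; real tests would presumably use pytest and tensors against the classes in torchrl/modules/mcts/policies.py.

```python
from abc import ABC, abstractmethod


class PolicyBaseSketch(ABC):
    """Hypothetical abstract base, standing in for MCTSPolicyBase."""

    @abstractmethod
    def select(self, scores, mask):
        """Return the index of the chosen action."""


class MaskedArgmaxPolicySketch(PolicyBaseSketch):
    """Hypothetical concrete policy: best-scoring action among legal ones."""

    def select(self, scores, mask):
        if not any(mask):
            # Mirrors the behaviour the review describes for an all-False mask.
            raise ValueError("action mask has no legal action")
        legal = [i for i, ok in enumerate(mask) if ok]
        return max(legal, key=lambda i: scores[i])


def raises(exc_type, fn, *args):
    """Tiny helper so the checks run without pytest."""
    try:
        fn(*args)
    except exc_type:
        return True
    return False


def test_base_is_abstract():
    # An ABC with an unimplemented abstract method cannot be instantiated.
    assert raises(TypeError, PolicyBaseSketch)


def test_all_false_mask_raises():
    policy = MaskedArgmaxPolicySketch()
    assert raises(ValueError, policy.select, [0.1, 0.9], [False, False])


def test_mask_excludes_best_illegal_action():
    policy = MaskedArgmaxPolicySketch()
    assert policy.select([0.1, 0.9, 0.5], [True, False, True]) == 2


def test_batched_inputs():
    policy = MaskedArgmaxPolicySketch()
    batch_scores = [[0.2, 0.8], [0.7, 0.3]]
    batch_masks = [[True, True], [False, True]]
    picks = [policy.select(s, m) for s, m in zip(batch_scores, batch_masks)]
    assert picks == [1, 1]
```

The same four cases (abstract base, all-False mask, masking of the top-scoring action, batched inputs) map onto the gaps listed in this review; the custom-key and alternative score-module cases are left out because they depend on API details not shown here.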

vmoens · 2026-02-08T07:59:22Z

torchrl/modules/mcts/scores.py

 from tensordict.nn import TensorDictModuleBase


+class _ScoreFactory:


Unrelated to this PR?

Yes, this one is related to the scores function PR, which raised some errors when run with the other policies.

ParamThakkar123 added 16 commits January 20, 2026 00:23

Fixed MultiSyncCollector set_seed and split_trajs issue

1f6f327

Merge branch 'main' of https://github.com/pytorch/rl

e2aaf6b

Revert "Fixed MultiSyncCollector set_seed and split_trajs issue"

40642d5

This reverts commit 1f6f327.

Merge branch 'main' of https://github.com/pytorch/rl

efdc89c

Merge branch 'main' of https://github.com/pytorch/rl

628f44b

Merge branch 'main' of https://github.com/pytorch/rl

a476a77

Merge branch 'main' of https://github.com/pytorch/rl

0f565c5

Merge branch 'main' of https://github.com/pytorch/rl

7fb086b

Merge branch 'main' of https://github.com/pytorch/rl

ff72793

Added Support for index_select in TensorSpec

69001ed

Merge branch 'main' of https://github.com/pytorch/rl

4ab13be

rebase

2e8face

Merge branch 'main' of https://github.com/pytorch/rl

56e1529

Merge branch 'main' of https://github.com/pytorch/rl

ba6a19f

Merge branch 'main' of https://github.com/pytorch/rl

8be545b

Added Policies to MCTS

d6f64ea

meta-cla bot added the CLA Signed label Feb 5, 2026

github-actions bot added the Documentation and Modules labels Feb 5, 2026

ParamThakkar123 changed the title ~~Added MCTSPolicyBase, MCTSPolicy, AlphaGoPolicy, AlphaStarPolicy, and MuZeroPolicy~~ [Feature] Added MCTSPolicyBase, MCTSPolicy, AlphaGoPolicy, AlphaStarPolicy, and MuZeroPolicy Feb 5, 2026

github-actions bot added the Feature label Feb 5, 2026

Updates

0252382

Merge branch 'main' of https://github.com/pytorch/rl into add/mcts-po…

f8c54e5

…licies

vmoens reviewed Feb 8, 2026


Few Edits

77868ed

ParamThakkar123 wants to merge 19 commits into pytorch:main from ParamThakkar123:add/mcts-policies
