kamil-kaczmarek
Contributor

Why are these changes needed?

Fix failing release test: learning_tests_multi_agent_cartpole_appo_multi_gpu.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Kamil Kaczmarek <[email protected]>
@kamil-kaczmarek kamil-kaczmarek self-assigned this Aug 13, 2025
@kamil-kaczmarek kamil-kaczmarek marked this pull request as ready for review August 13, 2025 06:01
@kamil-kaczmarek kamil-kaczmarek requested a review from a team as a code owner August 13, 2025 06:01
Contributor

@simonsays1980 simonsays1980 left a comment

Great PR @kamil-kaczmarek! Thanks a lot for working through this complex setup. I left some comments and would like to request another debug iteration so we better understand why this happens now and where the intended process gets interrupted.

@@ -361,6 +363,9 @@ def _sample(
# Try stepping the environment.
results = self._try_env_step(actions_for_env)
if results == ENV_STEP_FAILURE:
logging.warning(
Contributor

Nice message. Can we unify all message patterns?

Contributor Author

I didn't notice a single pattern established across RLlib. If you like this direction, I can unify the messages in this PR first, then expand and unify component by component. WDYT?
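
For illustration only, one possible shape for a shared message pattern. This is a hypothetical sketch: the helper name `format_runner_warning`, its fields, and the example message text are assumptions made up for this example, not existing RLlib code and not the message added in this PR.

```python
import logging

logger = logging.getLogger(__name__)


def format_runner_warning(component: str, event: str, action: str) -> str:
    """Build a warning in one fixed shape: '[component] event. action.'

    Hypothetical helper; not part of RLlib.
    """
    # Strip trailing periods so callers cannot produce "..".
    return f"[{component}] {event.rstrip('.')}. {action.rstrip('.')}."


# Placeholder wording, chosen only to show the pattern.
message = format_runner_warning(
    component="MultiAgentEnvRunner",
    event="Environment step failed",
    action="Resetting all environments",
)
logger.warning(message)
```

Keeping the component, the event, and the follow-up action in fixed positions would make such warnings easier to grep across components.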

@@ -346,7 +348,7 @@ def _sample(
metrics_prefix_key=(MODULE_TO_ENV_CONNECTOR,),
)
# In case all environments had been terminated `to_module` will be
- # empty and no actions are needed b/c we reset all environemnts.
+ # empty and no actions are needed b/c we reset all environments.
Contributor

I wonder why it now happens that to_module is None. Can we do another round of debugging to see where this happens? Then check the autoreset and the connector run (I know this is complex).

What should happen is: the env resets automatically; the initial observations go through the to_module connector pipeline and produce to_module, which can in turn be passed through the module and the to_env pipeline to produce an action.
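
A minimal sketch of the intended flow described above, to make the expectation concrete. All function names here (`env_to_module`, `module_forward`, `module_to_env`, `compute_actions`) are hypothetical toy stand-ins, not RLlib's actual connector or module API; the point is only that initial observations after an automatic reset should yield a non-empty to_module and therefore actions.

```python
from typing import Dict, List

Obs = Dict[str, List[float]]    # agent_id -> observation
Batch = Dict[str, List[float]]  # agent_id -> module input
Actions = Dict[str, int]        # agent_id -> action


def env_to_module(obs: Obs) -> Batch:
    """Toy stand-in for the to_module connector pipeline (assumption)."""
    # Every agent with an observation gets an entry in the batch.
    return {agent_id: list(o) for agent_id, o in obs.items()}


def module_forward(batch: Batch) -> Dict[str, float]:
    """Toy stand-in for the module forward pass: one score per agent."""
    return {agent_id: sum(x) for agent_id, x in batch.items()}


def module_to_env(outputs: Dict[str, float]) -> Actions:
    """Toy stand-in for the to_env connector pipeline (assumption)."""
    return {agent_id: int(score > 0.0) for agent_id, score in outputs.items()}


def compute_actions(obs: Obs) -> Actions:
    to_module = env_to_module(obs)
    if not to_module:
        # Empty batch: there is nothing to act on, so no actions are produced.
        return {}
    return module_to_env(module_forward(to_module))


# After an automatic reset the initial observations are available, so the
# batch is non-empty and every agent receives an action.
initial_obs = {"agent_0": [0.1, -0.2], "agent_1": [-0.3, 0.1]}
actions = compute_actions(initial_obs)
```

In this toy version the only way to end up without actions is an empty observation dict, which is the situation worth tracing in the real autoreset and connector run.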

Contributor Author

Let me investigate this more. This first happened last Thursday in the release tests. Will look into the code diff.

@sven1977 sven1977 changed the title Fix failing env step in MultiAgentEnvRunner [RLlib] Fix failing env step in MultiAgentEnvRunner. Aug 13, 2025
@ray-gardener ray-gardener bot added rllib RLlib related issues release-test release test labels Aug 14, 2025

github-actions bot commented Sep 4, 2025

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Sep 4, 2025
@kamil-kaczmarek
Contributor Author

kamil-kaczmarek commented Sep 4, 2025

unstale

@github-actions github-actions bot added unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it. and removed stale The issue is stale. It will be closed within 7 days unless there are further conversation labels Sep 4, 2025