Update openenv examples to use environment_factory #5235

Draft
sergiopaniego wants to merge 4 commits into main from update-openenv-examples

@sergiopaniego
Member

What does this PR do?

TODO:

  • Migrate notebooks
  • Update TRL-OpenEnv guide
  • Add multi-env example

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

sergiopaniego and others added 4 commits March 4, 2026 10:12
- catch.py: Format observations as readable text, normalize reward to 0-1,
  handle incomplete episodes
- echo.py: Rename step->echo and MyEchoEnv->EchoToolEnv, wrap in main()
- wordle.py: Normalize reward to 0-1, add RichProgressCallback
- sudoku.py: Fix cumulative message handling (diff-based), add board
  validation for move validity, add progress/hints/tried-moves to responses,
  add LoRA support, tune defaults for memory efficiency
- vllm_generation.py: Add </tool_call> stop token for tool calling loop
- grpo_trainer.py: Skip tool calls for environments that are done

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
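
Three of the changes listed in the commit message (normalizing rewards to 0-1, diff-based handling of cumulative messages, and skipping tool calls for finished environments) can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: `normalize_reward`, `new_messages`, `ToyEnv`, and `dispatch_tool_calls` are hypothetical names, and the `[-1, 1]` raw reward range used in the usage note is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field


def normalize_reward(raw: float, low: float, high: float) -> float:
    """Linearly rescale `raw` from [low, high] to [0, 1], clamping outliers."""
    if high <= low:
        raise ValueError("high must be greater than low")
    scaled = (raw - low) / (high - low)
    return min(1.0, max(0.0, scaled))


def new_messages(history: list[str], seen: int) -> tuple[list[str], int]:
    """Diff-based handling: return only the messages appended since the
    previous call, plus the updated count of messages already seen."""
    return history[seen:], len(history)


@dataclass
class ToyEnv:
    """Hypothetical stand-in for an environment; not an OpenEnv or TRL class."""

    done: bool = False
    calls: list[str] = field(default_factory=list)

    def call_tool(self, name: str) -> str:
        self.calls.append(name)
        return f"ran {name}"


def dispatch_tool_calls(envs: list[ToyEnv], tool_names: list[str]) -> list:
    """Run one tool call per environment; finished environments are skipped
    and get None in place of a result."""
    results = []
    for env, name in zip(envs, tool_names):
        results.append(None if env.done else env.call_tool(name))
    return results
```

For example, under the assumed range, `normalize_reward(0.0, -1.0, 1.0)` maps the midpoint to `0.5`, and `dispatch_tool_calls([ToyEnv(), ToyEnv(done=True)], ["echo", "echo"])` leaves the second environment untouched.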
