GenEnv is a novel co-training framework that simultaneously trains an Agent LLM and an Environment LLM. The key insight is that the Environment LLM learns to generate training tasks at the boundary of the Agent's capability (neither too easy nor too hard), creating an adaptive curriculum that maximizes learning efficiency.
- **Co-Training Loop**: Agent and Environment LLMs are trained alternately, each improving the other (see the toy sketch below)
- **Adaptive Curriculum**: Environment generates tasks calibrated to the Agent's current skill level
- **Boundary Learning**: Focus on tasks where the Agent has ~50% success rate for maximum gradient signal
- **Built on veRL**: Leverages the efficient veRL framework for distributed GRPO training
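A minimal, self-contained toy sketch of this alternating loop is shown below. It is illustrative only (not the actual GenEnv training code): the "agent" is a scalar skill level, the "environment" proposes task difficulties, and only tasks near a ~50% success rate are kept.

```python
import math
import random

# Toy illustration of co-training with boundary filtering (not the GenEnv implementation).
def success_prob(skill: float, difficulty: float) -> float:
    """Probability that the toy agent solves a task of the given difficulty."""
    return 1.0 / (1.0 + math.exp(4.0 * (difficulty - skill)))

agent_skill = 0.0
env_center = 0.0  # where the environment currently centers its generated tasks

for step in range(200):
    # Environment proposes a batch of tasks around its current center.
    tasks = [random.gauss(env_center, 1.0) for _ in range(32)]

    # Boundary filtering: keep tasks the agent solves roughly half the time.
    boundary = [d for d in tasks if 0.3 < success_prob(agent_skill, d) < 0.7]

    # "Train" the agent: skill moves toward the hardest tasks it can still learn from.
    if boundary:
        agent_skill += 0.05 * (max(boundary) - agent_skill)

    # "Train" the environment: re-center task generation on the agent's new skill.
    env_center += 0.1 * (agent_skill - env_center)

print(f"final agent skill: {agent_skill:.2f}, env task center: {env_center:.2f}")
```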
```bash
# Clone the repository
git clone https://github.com/Gen-Verse/GenEnv.git
cd GenEnv

# Install dependencies
pip install -r requirements.txt
```

GenEnv is built on top of veRL. Please follow veRL's installation instructions first.
This codebase provides the training framework for GenEnv. To use it for your specific task, you need to customize:
1. **Reward Function** (`genenv/utils/reward_functions.py`)
   - Replace `RewardManager.compute_reward()` with your domain-specific reward logic
   - Examples provided for math reasoning, tool calling, and action-based tasks

2. **Environment Prompt Template** (`genenv/trainer/genenv_trainer.py`)
   - Modify `_generate_new_tasks()` to customize how the Env LLM generates new tasks
   - Adjust the prompt template based on your task format

3. **Task Parsing** (`genenv/trainer/genenv_trainer.py`)
   - Update the parsing logic in `_generate_new_tasks()` to extract tasks from Env LLM outputs (a hypothetical sketch follows this list)

4. **Initial Training Data** (`configs/genenv_config.yaml`)
   - Prepare your training data in parquet format with prompts and ground truth answers
Edit `configs/genenv_config.yaml`:
```yaml
# Key paths to customize
env_model_path: /path/to/your/env/model          # Environment LLM
actor_rollout_ref.model.path: /path/to/agent     # Agent LLM
data.train_files: /path/to/train.parquet         # Training data
data.val_files: /path/to/val.parquet             # Validation data
trainer.default_local_dir: /path/to/checkpoints

# GenEnv specific parameters
genenv:
  enable: True
  filtering_k: 0.1               # Filter top/bottom 10% of prompts
  num_generations_per_prompt: 4
```
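As a rough sketch of what `filtering_k` controls (semantics assumed from the comment above; check `genenv/trainer/genenv_trainer.py` for the actual behavior), dropping the easiest and hardest fraction of prompts by the agent's empirical success rate might look like:

```python
# Assumed illustration of filtering_k: rank prompts by the agent's empirical
# success rate and drop the hardest and easiest k fraction, keeping the middle.
def filter_prompts(success_rates: dict[str, float], filtering_k: float = 0.1) -> list[str]:
    ranked = sorted(success_rates, key=success_rates.get)  # hardest first, easiest last
    n_drop = int(len(ranked) * filtering_k)
    return ranked[n_drop: len(ranked) - n_drop] if n_drop else ranked

rates = {"p1": 0.0, "p2": 0.25, "p3": 0.5, "p4": 0.75, "p5": 1.0}
print(filter_prompts(rates, filtering_k=0.2))  # -> ['p2', 'p3', 'p4']
```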
```bash
# Using the provided script
bash scripts/run_genenv.sh --model /path/to/model --env-model /path/to/env/model

# Or directly with Python
python -m genenv.train \
    genenv.enable=True \
    env_model_path=/path/to/env/model \
    actor_rollout_ref.model.path=/path/to/agent \
    data.train_files=/path/to/train.parquet \
    data.val_files=/path/to/val.parquet
```
```
GenEnv/
├── genenv/
│   ├── __init__.py
│   ├── train.py                  # Main training entry point
│   ├── trainer/
│   │   ├── __init__.py
│   │   └── genenv_trainer.py     # Core GenEnv training loop
│   └── utils/
│       ├── __init__.py
│       └── reward_functions.py   # Reward function implementations
├── configs/
│   └── genenv_config.yaml        # Training configuration
├── scripts/
│   └── run_genenv.sh             # Training launch script
├── requirements.txt
└── README.md
```
Example reward for math reasoning (boxed-answer matching):

```python
def compute_reward(self, generated_text: str, ground_truth: Any) -> float:
    pred_answer = self._extract_boxed_answer(generated_text)
    gold_answer = self._get_gold_answer(ground_truth)
    return 1.0 if pred_answer == gold_answer else 0.0
```

Example reward for tool calling:

```python
from genenv.utils import ToolCallingRewardManager

reward_fn = ToolCallingRewardManager(tokenizer=tokenizer)
# Checks if <tool_call>{"name": ..., "parameters": ...}</tool_call> matches the ground truth
```

To define your own reward, subclass `RewardManager` and override `compute_reward()`:

```python
class MyRewardManager(RewardManager):
    def compute_reward(self, generated_text: str, ground_truth: Any) -> float:
        # Your custom reward logic here
        return score
```
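For instance, a minimal exact-match reward (illustrative only, not shipped with GenEnv; the import path is assumed from this repo's layout) could look like:

```python
from typing import Any

# Import path assumed from genenv/utils/reward_functions.py in this repo.
from genenv.utils.reward_functions import RewardManager

class ExactMatchRewardManager(RewardManager):
    """Example custom reward: 1.0 if the stripped output equals the ground truth."""
    def compute_reward(self, generated_text: str, ground_truth: Any) -> float:
        return 1.0 if generated_text.strip() == str(ground_truth).strip() else 0.0
```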
Your training data should be in parquet format with at least these columns:

| Column | Description |
|---|---|
| `prompt` | The task prompt (can be a string or a list of chat messages) |
| `reward_model` | Dict containing `{"ground_truth": <answer>}` |
Example:
```python
import pandas as pd

data = [
    {
        "prompt": [{"role": "user", "content": "What is 2 + 2?"}],
        "reward_model": {"ground_truth": "4"}
    },
    # ... more examples
]
pd.DataFrame(data).to_parquet("train.parquet")
```

This project is built upon the excellent work of veRL. We thank the authors for making their code publicly available.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
If you find GenEnv useful for your research, please consider citing:
```bibtex
@misc{guo2025genenvdifficultyalignedcoevolutionllm,
      title={GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators},
      author={Jiacheng Guo and Ling Yang and Peter Chen and Qixin Xiao and Yinjie Wang and Xinzhe Juan and Jiahao Qiu and Ke Shen and Mengdi Wang},
      year={2025},
      eprint={2512.19682},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.19682},
}
```

Princeton AI Lab | Gen-Verse
