Commit 3a5d829

add general openenv recipe
1 parent 17cd39b commit 3a5d829

7 files changed, +1835 -0 lines changed

Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
# OpenEnv Implementation Summary

## 🎉 Successfully Created Centralized OpenEnv Framework

### Created Files

```
/home/kaiwu/work/kaiwu/forge/apps/openenv/
├── main.py                    # Generic main script for all tasks
├── julia_utils.py             # Julia-specific utilities
├── python_utils.py            # Python/coding utilities
├── llama3_8b_julia.yaml       # Julia training configuration
├── llama3_8b_coding.yaml      # Python coding configuration
└── README.md                  # Comprehensive documentation
```

17+
## 📝 Key Design Features
18+
19+
### 1. Single Centralized Folder
20+
- All OpenEnv-related code in one place: `/home/kaiwu/work/kaiwu/forge/apps/openenv/`
21+
- No scattered task folders needed
22+
- Easy to maintain and extend
23+
24+
### 2. Language-Specific Utils
25+
- `julia_utils.py` - All Julia task logic
26+
- `python_utils.py` - All Python task logic
27+
- Add more by creating `<lang>_utils.py` files
28+
29+
### 3. YAML Configuration with !function References
30+
31+
```yaml
32+
task:
33+
env_name: "julia"
34+
build_action: !function julia_utils.build_julia_action
35+
evaluate_response: !function julia_utils.evaluate_julia_response
36+
transform_sample: !function julia_utils.transform_julia_sample
37+
```
38+
39+
### 4. Generic Main Script
40+
- Single `main.py` that dynamically loads task-specific functions
41+
- No code changes needed when adding new languages
42+
- Works with any OpenEnv environment via AutoEnv
43+
44+
## 🚀 Usage Examples
45+
46+
### Julia Training
47+
```bash
48+
python -m apps.openenv.main --config apps/openenv/llama3_8b_julia.yaml
49+
```
50+
51+
### Python Coding Training
52+
```bash
53+
python -m apps.openenv.main --config apps/openenv/llama3_8b_coding.yaml
54+
```
55+
56+
## 🔧 Adding New Languages
57+
58+
### Example: Adding Rust Support
59+
60+
1. **Create `rust_utils.py`**:
61+
```python
62+
from envs.rust_env import RustAction
63+
64+
def build_rust_action(response: str, sample: dict) -> RustAction:
65+
code = extract_rust_code(response)
66+
return RustAction(code=code, test_code=sample.get("test", ""))
67+
68+
def evaluate_rust_response(result, response: str, sample: dict) -> float:
69+
return 1.0 if result.observation.exit_code == 0 else 0.0
70+
71+
def transform_rust_sample(sample: dict, tokenizer) -> dict | None:
72+
prompt = build_rust_prompt(sample, tokenizer)
73+
return {"request": prompt, "target": sample.get("test", ""), "task_id": sample.get("task_id", "")}
74+
75+
def extract_rust_code(response: str) -> str:
76+
# Extract from markdown...
77+
pass
78+
```
79+
80+
2. **Create YAML config** (`llama3_8b_rust.yaml`):
81+
```yaml
82+
task:
83+
env_name: "rust"
84+
build_action: !function rust_utils.build_rust_action
85+
evaluate_response: !function rust_utils.evaluate_rust_response
86+
transform_sample: !function rust_utils.transform_rust_sample
87+
88+
# ... rest of config
89+
```
90+
91+
3. **Run it**:
92+
```bash
93+
python -m apps.openenv.main --config apps/openenv/llama3_8b_rust.yaml
94+
```
95+
96+
That's it! No changes to main.py needed.
97+
98+
## 📦 Task Utils API
99+
100+
Each `<lang>_utils.py` file should implement:
101+
102+
### Required Functions
103+
104+
1. **`build_<lang>_action(response: str, sample: dict) -> Action`**
105+
- Converts model response to environment action
106+
- Example: `build_julia_action`, `build_python_action`
107+
108+
2. **`evaluate_<lang>_response(result, response: str, sample: dict) -> float`**
109+
- Evaluates execution result and returns reward (0.0 to 1.0)
110+
- Example: `evaluate_julia_response`, `evaluate_python_response`
111+
112+
3. **`transform_<lang>_sample(sample: dict, tokenizer) -> dict | None`**
113+
- Transforms raw dataset sample to training format
114+
- Returns dict with 'request', 'target', 'task_id' or None
115+
- Example: `transform_julia_sample`, `transform_python_sample`
116+
117+
### Optional Helper Functions
118+
119+
- `get_<lang>_system_prompt() -> str`: System prompt for the language
120+
- `build_<lang>_prompt(sample: dict, tokenizer) -> str`: Build formatted prompt
121+
- `extract_<lang>_code(response: str) -> str`: Extract code from markdown
122+
123+
## 🎯 Benefits
124+
125+
1. **Easy to Extend**: Add new languages by creating one utils file and one YAML
126+
2. **No Code Duplication**: Single main.py reused for all tasks
127+
3. **Clear Organization**: Language-specific logic separated into utils files
128+
4. **Simple Configuration**: YAML references make dependencies explicit
129+
5. **AutoEnv Integration**: Automatic environment/action class loading
130+
131+
## 📊 Comparison: Before vs After
132+
133+
### Before (Scattered)
134+
```
135+
apps/
136+
julia-grpo/
137+
main.py (300+ lines)
138+
config.yaml
139+
coding-grpo/
140+
main.py (similar 300+ lines)
141+
config.yaml
142+
# Lots of duplicated code!
143+
```
144+
145+
### After (Centralized)
146+
```
147+
apps/
148+
openenv/
149+
main.py (generic, 600 lines)
150+
julia_utils.py (200 lines)
151+
python_utils.py (150 lines)
152+
llama3_8b_julia.yaml
153+
llama3_8b_coding.yaml
154+
# No duplication, easy to extend!
155+
```
156+
157+
## ✅ Implementation Complete
158+
159+
All files created and validated. Ready to use!
160+
161+
- ✅ Generic main.py with dynamic function loading
162+
- ✅ Julia utils with prompt building, action creation, reward evaluation
163+
- ✅ Python utils for coding tasks
164+
- ✅ YAML configs using !function references
165+
- ✅ Comprehensive README documentation
166+
- ✅ No lint errors (except pre-existing external file issues)

apps/openenv/README.md

Lines changed: 212 additions & 0 deletions
@@ -0,0 +1,212 @@
# OpenEnv - Generic Training Framework

A centralized framework for training language models on any OpenEnv task using GRPO (Group Relative Policy Optimization).

## 📁 Folder Structure

```
apps/openenv/
├── main.py                    # Generic training script
├── julia_utils.py             # Julia task utilities
├── python_utils.py            # Python/coding task utilities
├── llama3_8b_julia.yaml       # Julia training config
└── llama3_8b_coding.yaml      # Python coding training config
```

16+
## 🎯 Key Features
17+
18+
- **Single Main Script**: One `main.py` works for all OpenEnv tasks
19+
- **Task-Specific Utils**: Language-specific logic in separate files (e.g., `julia_utils.py`, `python_utils.py`)
20+
- **YAML Configuration**: Use `!function` references to load task-specific functions
21+
- **AutoEnv Integration**: Automatic environment and action class loading
22+
- **Easy Extension**: Add new languages by creating new utils files
23+
24+
## 🚀 Usage
25+
26+
### Run Julia Training
27+
28+
```bash
29+
python -m apps.openenv.main --config apps/openenv/llama3_8b_julia.yaml
30+
```
31+
32+
### Run Python Coding Training
33+
34+
```bash
35+
python -m apps.openenv.main --config apps/openenv/llama3_8b_coding.yaml
36+
```
37+
38+
## 📝 YAML Configuration
39+
40+
Each task config needs:
41+
42+
```yaml
43+
task:
44+
env_name: "julia" # Environment name for AutoEnv
45+
build_action: !function julia_utils.build_julia_action
46+
evaluate_response: !function julia_utils.evaluate_julia_response
47+
transform_sample: !function julia_utils.transform_julia_sample
48+
```
49+
50+
The `!function` tag references functions from the utils files in the same directory.
51+
52+
## 🔧 Adding a New Language
53+
54+
To add support for a new language (e.g., Rust):
55+
56+
### 1. Create Utils File
57+
58+
Create `/home/kaiwu/work/kaiwu/forge/apps/openenv/rust_utils.py`:
59+
60+
```python
61+
from envs.rust_env import RustAction
62+
63+
def build_rust_action(response: str, sample: dict) -> RustAction:
64+
"""Build RustAction from model response."""
65+
code = extract_rust_code(response)
66+
return RustAction(code=code, test_code=sample.get("test", ""))
67+
68+
def evaluate_rust_response(result, response: str, sample: dict) -> float:
69+
"""Evaluate Rust code execution and return reward."""
70+
if result.observation.exit_code == 0:
71+
return 1.0
72+
return 0.0
73+
74+
def transform_rust_sample(sample: dict, tokenizer) -> dict | None:
75+
"""Transform dataset sample for Rust tasks."""
76+
# Build prompt using tokenizer
77+
prompt = build_rust_prompt(sample, tokenizer)
78+
return {
79+
"request": prompt,
80+
"target": sample.get("test", ""),
81+
"task_id": sample.get("task_id", ""),
82+
}
83+
84+
def extract_rust_code(response: str) -> str:
85+
"""Extract Rust code from markdown blocks."""
86+
# Implementation...
87+
pass
88+
```
89+
90+
### 2. Create YAML Config
91+
92+
Create `/home/kaiwu/work/kaiwu/forge/apps/openenv/llama3_8b_rust.yaml`:
93+
94+
```yaml
95+
task:
96+
env_name: "rust"
97+
build_action: !function rust_utils.build_rust_action
98+
evaluate_response: !function rust_utils.evaluate_rust_response
99+
transform_sample: !function rust_utils.transform_rust_sample
100+
101+
dataset:
102+
path: "path/to/rust/dataset"
103+
# ... other dataset config
104+
105+
# ... rest of config (same as other tasks)
106+
```
107+
108+
### 3. Run It
109+
110+
```bash
111+
python -m apps.openenv.main --config apps/openenv/llama3_8b_rust.yaml
112+
```
113+
114+
That's it! No changes to `main.py` needed.
115+
116+
## 📋 Task Utils API
117+
118+
Each task utils file should implement these functions:
119+
120+
### Required Functions
121+
122+
1. **`build_<lang>_action(response: str, sample: dict) -> Action`**
123+
- Builds environment action from model response
124+
- Example: `build_julia_action`, `build_python_action`
125+
126+
2. **`evaluate_<lang>_response(result, response: str, sample: dict) -> float`**
127+
- Evaluates execution result and returns reward (0.0 to 1.0)
128+
- Example: `evaluate_julia_response`, `evaluate_python_response`
129+
130+
3. **`transform_<lang>_sample(sample: dict, tokenizer) -> dict | None`**
131+
- Transforms raw dataset sample into training format
132+
- Returns dict with 'request', 'target', 'task_id' or None if invalid
133+
- Example: `transform_julia_sample`, `transform_python_sample`
134+
135+
### Optional Helper Functions
136+
137+
- **`get_<lang>_system_prompt() -> str`**: Get system prompt for the language
138+
- **`build_<lang>_prompt(sample: dict, tokenizer) -> str`**: Build formatted prompt
139+
- **`extract_<lang>_code(response: str) -> str`**: Extract code from markdown
140+
141+
## 🔍 How It Works
142+
143+
1. **Configuration Loading**: YAML config is loaded with `!function` references
144+
2. **Function Loading**: `main.py` dynamically loads functions from utils files
145+
3. **Environment Setup**: AutoEnv automatically loads correct env/action classes
146+
4. **Training Loop**: Generic GRPO loop uses task-specific functions for:
147+
- Dataset transformation
148+
- Action building
149+
- Reward evaluation
150+
151+
## 📊 Dataset Format
152+
153+
Each transformed sample should have:
154+
155+
```python
156+
{
157+
"request": str, # Formatted prompt for model
158+
"target": str, # Test code or target data
159+
"task_id": str, # Unique task identifier
160+
}
161+
```
162+
163+
## 🎓 Examples
164+
165+
### Julia Utils
166+
167+
- System prompt with strict formatting rules
168+
- Extract code from markdown blocks
169+
- Dense reward based on test pass rate
170+
- Handles Julia-specific syntax requirements
171+
172+
### Python Utils
173+
174+
- Simple system prompt for Python coding
175+
- Binary/proportional reward structure
176+
- Extracts code from markdown blocks
177+
- Works with HumanEval dataset format
178+
179+
## 🔗 Integration with OpenEnv
180+
181+
This framework uses OpenEnv's AutoEnv feature:
182+
183+
```python
184+
from envs import AutoEnv, AutoAction
185+
186+
env_class = AutoEnv.from_name("julia") # Loads JuliaEnv
187+
action_class = AutoAction.from_env("julia") # Loads JuliaAction
188+
```
189+
190+
Make sure your environment is registered in OpenEnv's registry.
191+
192+
## 🐛 Debugging
193+
194+
Enable debug logging in main.py to see:
195+
- Function loading
196+
- Environment setup
197+
- Reward calculation
198+
- Code extraction
199+
200+
Set log level in YAML:
201+
```yaml
202+
metric_logging:
203+
console:
204+
log_per_rank: True
205+
```
206+
207+
## 📚 References
208+
209+
- **GRPO Algorithm**: Grouped Relative Policy Optimization
210+
- **OpenEnv**: Generic environment framework
211+
- **AutoEnv**: Automatic environment detection
212+
- **GenericOpenEnvActor**: Docker-based environment execution
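
For readers new to GRPO, the core idea is that each response's reward is compared against the other responses sampled for the same prompt. The sketch below shows that group-relative normalization in isolation. It is an illustration of the general technique, not this framework's trainer, and the choice of population standard deviation and the `eps` value are assumptions that vary between implementations.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four responses to the same prompt, two of which passed their tests.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When every response in a group earns the same reward, all advantages are zero and the group contributes no gradient signal, which is one reason dense rewards can help.
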
