Commit 51f6138

temp-save,julia working
1 parent 3a5d829 commit 51f6138

14 files changed, +1235 -60 lines changed

apps/openenv/FIX_DOCUMENTATION.md

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
# Fix for ModuleNotFoundError: No module named 'julia_utils'

## Problem

The application was crashing with the following error when using Monarch actors:

```
ModuleNotFoundError: No module named 'julia_utils'
```

This error occurred when remote Monarch actors tried to unpickle function references that were loaded from the `julia_utils` module.

## Root Cause

The issue happened because:

1. The main process loads functions from `julia_utils` using `load_function_from_string()`
2. These functions are passed as parameters to actor classes (`GenericDatasetActor`, `GenericRewardActor`)
3. When actors are spawned as remote actors, the function objects are pickled and sent to remote processes
4. During unpickling, Python needs to import the `julia_utils` module
5. **The openenv directory wasn't in `sys.path` yet** because:
   - The unpickling happens during actor initialization (when deserializing constructor parameters)
   - The `setup()` endpoint runs AFTER actor initialization
   - Therefore, `sys.path` wasn't modified before unpickling occurred
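
The mechanism in steps 3 to 5 can be reproduced without Monarch. The sketch below is a minimal illustration, not code from this repository: `demo_utils` and `build_action` are hypothetical stand-ins for `julia_utils` and its functions, and removing the directory from `sys.path` only approximates a fresh remote process. It shows that `pickle` stores a function as a module name plus an attribute name, so unpickling fails unless that module is importable at load time.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for julia_utils: a module living outside the default sys.path.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "demo_utils.py").write_text("def build_action(x):\n    return x * 2\n")

sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
import demo_utils

payload = pickle.dumps(demo_utils.build_action)
# The payload carries only the module and function names, not the function body.
names_in_payload = b"demo_utils" in payload and b"build_action" in payload

# Approximate a fresh remote process: the module is neither loaded nor importable.
sys.path.remove(str(module_dir))
del sys.modules["demo_utils"]
importlib.invalidate_caches()

try:
    pickle.loads(payload)
    failed = False
    error_message = ""
except ModuleNotFoundError as exc:
    failed = True
    error_message = str(exc)

# Restoring the path BEFORE unpickling lets the same payload load.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
restored = pickle.loads(payload)
result = restored(21)
```

The first `pickle.loads` call fails because the payload holds only names; the second succeeds once the directory is back on `sys.path`, which is why the path has to be in place before constructor parameters are deserialized.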

## Solution

Added module-level code to `/home/kaiwu/work/kaiwu/forge/apps/openenv/main.py` that adds the openenv directory to `sys.path` BEFORE any actor definitions:

```python
# Imports the snippet relies on (main.py is assumed to have these already)
import sys
from pathlib import Path

# CRITICAL: Add openenv directory to sys.path at module level
# This ensures that when remote actors unpickle function references (e.g., julia_utils functions),
# the module can be imported successfully. This must happen BEFORE any actor definitions.
_openenv_dir = Path(__file__).parent
if str(_openenv_dir) not in sys.path:
    sys.path.insert(0, str(_openenv_dir))
```

This code runs when the module is first imported, ensuring that:
- Remote actors that import `main.py` will have the openenv directory in their `sys.path`
- Functions from `julia_utils` can be successfully unpickled in remote processes
- The fix happens early enough to prevent the ModuleNotFoundError

## Testing

Two test scripts verify the fix:

1. **test_module_import.py** - Tests basic import and pickling functionality
2. **test_monarch_actor_simulation.py** - Simulates the exact Monarch actor scenario where a remote process receives pickled functions

Both test suites pass successfully, confirming that:
- `julia_utils` can be imported after importing `main.py`
- Functions from `julia_utils` can be pickled and unpickled across process boundaries
- Remote actors can successfully deserialize function references
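
A cross-process check of this kind can be sketched with a plain subprocess standing in for the remote actor. This is an illustration under stated assumptions and not the contents of `test_monarch_actor_simulation.py`: `demo_remote_utils`, `transform`, and `run_child` are hypothetical names, and the child script mimics the fix by extending `sys.path` before it unpickles anything.

```python
import importlib
import pickle
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical module standing in for julia_utils.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "demo_remote_utils.py").write_text("def transform(x):\n    return x + 1\n")

sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
import demo_remote_utils

payload = pickle.dumps(demo_remote_utils.transform)

# Child process: optionally extend sys.path first, then unpickle from stdin.
CHILD = """
import pickle, sys
if len(sys.argv) > 1:
    sys.path.insert(0, sys.argv[1])
fn = pickle.loads(sys.stdin.buffer.read())
print(fn(41))
"""


def run_child(extra_path=None):
    """Run the child with or without the module directory on its sys.path."""
    args = [sys.executable, "-c", CHILD]
    if extra_path is not None:
        args.append(extra_path)
    # Run from the temp root so the current directory cannot supply the module.
    return subprocess.run(
        args, input=payload, capture_output=True, cwd=tempfile.gettempdir()
    )


without_fix = run_child()
with_fix = run_child(str(module_dir))
```

Here `without_fix` is expected to exit with a `ModuleNotFoundError` in its stderr, while `with_fix` prints the transformed value, mirroring the before and after behaviour described in this document.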

## Files Modified

- `/home/kaiwu/work/kaiwu/forge/apps/openenv/main.py` - Added module-level sys.path setup

## Files Added

- `/home/kaiwu/work/kaiwu/forge/apps/openenv/test_module_import.py` - Basic import/pickle tests
- `/home/kaiwu/work/kaiwu/forge/apps/openenv/test_monarch_actor_simulation.py` - Comprehensive simulation tests

## Verification

Run tests to verify the fix:
```bash
cd /home/kaiwu/work/kaiwu/forge/apps/openenv
python test_module_import.py
python test_monarch_actor_simulation.py
```

Both should show "✓ All tests passed!"

apps/openenv/julia_utils.py

Lines changed: 7 additions & 2 deletions
@@ -12,7 +12,6 @@
 import re
 from typing import Dict, Any

-from envs.julia_env import JuliaAction
 from forge.observability.metrics import record_metric, Reduce


@@ -75,7 +74,7 @@ def build_julia_prompt(sample: Dict[str, Any], tokenizer) -> str:
     return formatted_request


-def build_julia_action(response: str, sample: Dict[str, Any]) -> JuliaAction:
+def build_julia_action(response: str, sample: Dict[str, Any]):
     """
     Build JuliaAction from model response and dataset sample.

@@ -86,6 +85,12 @@ def build_julia_action(response: str, sample: Dict[str, Any]) -> JuliaAction:
     Returns:
         JuliaAction instance with core code and test code
     """
+    # Import AutoAction dynamically to avoid pickle issues
+    from envs import AutoAction
+
+    # Get JuliaAction class dynamically
+    JuliaAction = AutoAction.from_env("julia")
+
     # Extract code from markdown if present
     code = extract_julia_code(response)


apps/openenv/llama3_8b_coding.yaml

Lines changed: 14 additions & 8 deletions
@@ -16,9 +16,9 @@ rollout_threads: 1
 # Task-specific configuration
 task:
   env_name: "coding" # Used to load CodingEnv and CodingAction via AutoEnv
-  build_action: !function python_utils.build_python_action
-  evaluate_response: !function python_utils.evaluate_python_response
-  transform_sample: !function python_utils.transform_python_sample
+  build_action: !function apps.openenv.python_utils.build_python_action
+  evaluate_response: !function apps.openenv.python_utils.evaluate_python_response
+  transform_sample: !function apps.openenv.python_utils.transform_python_sample

 # Observability configuration
 metric_logging:
@@ -31,13 +31,19 @@ metric_logging:
     log_per_rank: True

 # Dataset configuration
+#dataset:
+# path: "TIGER-Lab/AceCode-87K"
+# revision: "main"
+# data_split: "train"
+# streaming: true
+# model: ${model}
 dataset:
-  path: "openai/humaneval" # HumanEval dataset from HuggingFace
+  path: "/home/kaiwu/work/kaiwu/AceCoder/train/train_rl/OpenRLHF/scripts/data/acecode_89k/acecode_hard02.json"
+  #path: "/home/kaiwu/work/kaiwu/AceCoder/train/train_rl/OpenRLHF/data/acecode_87K/acecode_87K.json"
   revision: "main"
-  data_split: "test"
+  data_split: "train"
   streaming: false
   model: ${model}
-
 # OpenEnv configuration for GenericOpenEnvActor
 openenv_config:
   docker_image: "coding-env:latest"
@@ -80,10 +86,10 @@ trainer:
   lr_scheduler:
     warmup_steps: 0
   training:
-    local_batch_size: ${batch_size}
+    local_batch_size: ${multiply:${batch_size},${group_size}}
     seq_len: ${sum:${max_req_tokens},${max_res_tokens}}
     max_norm: 1.0
-    steps: 2000
+    steps: 100
     dtype: bfloat16
     gc_freq: 1
   compile:

apps/openenv/llama3_8b_julia.yaml

Lines changed: 5 additions & 5 deletions
@@ -16,9 +16,9 @@ rollout_threads: 1
 # Task-specific configuration
 task:
   env_name: "julia" # Used to load JuliaEnv and JuliaAction via AutoEnv
-  build_action: !function julia_utils.build_julia_action
-  evaluate_response: !function julia_utils.evaluate_julia_response
-  transform_sample: !function julia_utils.transform_julia_sample
+  build_action: !function apps.openenv.julia_utils.build_julia_action
+  evaluate_response: !function apps.openenv.julia_utils.evaluate_julia_response
+  transform_sample: !function apps.openenv.julia_utils.transform_julia_sample

 # Observability configuration
 metric_logging:
@@ -81,10 +81,10 @@ trainer:
   lr_scheduler:
     warmup_steps: 0
   training:
-    local_batch_size: ${batch_size}
+    local_batch_size: ${multiply:${batch_size},${group_size}}
     seq_len: ${sum:${max_req_tokens},${max_res_tokens}}
     max_norm: 1.0
-    steps: 3000
+    steps: 100
     dtype: bfloat16
     gc_freq: 1
   compile:
