[draft] RLM Environment for SkyRL Environments #850
base: main
/gemini review
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
/gemini review
Code Review
This pull request introduces a promising RLM (Recursive Language Model) environment for SkyRL Gym, featuring a sandboxed REPL for executing Python code from model responses and supporting recursive LLM calls. The implementation is well-structured across new files for the environment logic, utilities, and testing. My review focuses on enhancing security, correctness, and maintainability. I've identified a critical sandbox escape vulnerability in the REPL execution that needs immediate attention. Additionally, I've provided suggestions to improve correctness by handling potential null values and unsafe configuration access, and to increase maintainability by adhering to Python best practices for imports and resource management. Overall, this is a strong initial draft, and addressing these points will significantly improve its robustness and security.
```python
combined = {**self.globals_dict, **self.locals_dict}
exec(code, combined, combined)

for key, value in combined.items():
    if key not in self.globals_dict and not key.startswith("_"):
        self.locals_dict[key] = value
```
There is a critical security vulnerability here. The exec function is given a combined dictionary for both globals and locals which contains a reference to self.globals_dict['__builtins__']. Malicious code can modify this __builtins__ dictionary in-place (e.g., __builtins__['open'] = some_function), bypassing the sandbox for subsequent code executions within the same step. To fix this, you should provide exec with a copy of the globals, especially the mutable __builtins__ dictionary, to prevent such modifications.
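The escape can be reproduced in isolation. This standalone sketch (hypothetical names, not the PR's code) shows a shared `__builtins__` dict being poisoned by executed code, and a per-execution copy preventing it:

```python
# Standalone sketch: a shared __builtins__ dict is mutated in place
# by sandboxed code, and the change persists for later executions.
safe_builtins = {"print": print, "open": None}
globals_dict = {"__builtins__": safe_builtins}

# Shallow-copying the outer globals does NOT protect __builtins__:
exec("__builtins__['open'] = __builtins__['print']", dict(globals_dict))
print(safe_builtins["open"] is print)  # True: the sandbox is poisoned

# Copying __builtins__ per execution keeps the shared dict intact.
safe_builtins["open"] = None
fresh = dict(globals_dict)
fresh["__builtins__"] = safe_builtins.copy()
exec("__builtins__['open'] = __builtins__['print']", fresh)
print(safe_builtins["open"] is None)  # True: original untouched
```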
Suggested fix:

```python
exec_globals = self.globals_dict.copy()
exec_globals["__builtins__"] = self.globals_dict["__builtins__"].copy()
combined = {**exec_globals, **self.locals_dict}
exec(code, combined, combined)
for key, value in combined.items():
    if key not in exec_globals and not key.startswith("_"):
        self.locals_dict[key] = value
```

Next snippet under review:

```python
openai_api_key = env_cfg.openai_api_key or os.getenv("OPENAI_API_KEY")
if openai_api_key is None:
    raise ValueError("`OPENAI_API_KEY` must be set (as parameter, in env_cfg, or as environment variable)")

base_url = env_cfg.base_url
model = env_cfg.model
init_prompt = env_cfg.init_prompt
if not base_url or not model or not init_prompt:
    raise ValueError("env_cfg must include base_url, model, and init_prompt")
```
Accessing keys from an omegaconf.DictConfig using attribute access (e.g., env_cfg.openai_api_key) will raise an exception if the key is not present, which can lead to unexpected crashes. It is safer to use the .get() method, which allows you to handle cases where a key might be missing.
Suggested change:

```diff
-openai_api_key = env_cfg.openai_api_key or os.getenv("OPENAI_API_KEY")
+openai_api_key = env_cfg.get("openai_api_key") or os.getenv("OPENAI_API_KEY")
 if openai_api_key is None:
     raise ValueError("`OPENAI_API_KEY` must be set (as parameter, in env_cfg, or as environment variable)")
-base_url = env_cfg.base_url
-model = env_cfg.model
-init_prompt = env_cfg.init_prompt
+base_url = env_cfg.get("base_url")
+model = env_cfg.get("model")
+init_prompt = env_cfg.get("init_prompt")
 if not base_url or not model or not init_prompt:
     raise ValueError("env_cfg must include base_url, model, and init_prompt")
```
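As a quick illustration of the suggested pattern, using a plain dict as a stand-in for `DictConfig` (which exposes the same `.get()` method); the helper name is hypothetical:

```python
import os

def resolve_api_key(env_cfg: dict) -> str:
    # .get() yields None for a missing key instead of raising,
    # so the explicit check below is the single failure path.
    key = env_cfg.get("openai_api_key") or os.getenv("OPENAI_API_KEY")
    if key is None:
        raise ValueError("`OPENAI_API_KEY` must be set")
    return key

os.environ["OPENAI_API_KEY"] = "sk-from-env"
print(resolve_api_key({}))                            # sk-from-env
print(resolve_api_key({"openai_api_key": "sk-cfg"}))  # sk-cfg
```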
```python
return (
    self.lm_client.chat.completions.create(
        model=used_model,
        messages=msgs,
    )
    .choices[0]
    .message.content
)
```
The llm_query function is type-hinted to return a str, but it can return None if the OpenAI API response has message.content as None. This violates the type hint and can cause errors in the code executed in the REPL. You should ensure it always returns a string, for example by using or "".
Suggested fix:

```python
response = self.lm_client.chat.completions.create(
    model=used_model,
    messages=msgs,
)
return response.choices[0].message.content or ""
```

Next snippet under review:

```python
"compile": None,
"globals": None,
"locals": None,
"open": None,
```
The open function is disabled in SAFE_BUILTINS for security, which is a good practice for a REPL. However, the RLMEnvironment.load_context method in env.py creates a file and exposes its path via the context_path variable in the REPL's scope. This creates a design contradiction: a file path is provided, but there's no way to use it from within the sandboxed code. If file access is not intended, consider removing the context_path variable to avoid confusion. If it is intended, a custom, sandboxed file-access utility should be provided instead of the disabled open.
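If file access to `context_path` is intended, one option is a read-only helper confined to the context directory. A minimal sketch (the `make_safe_reader` helper is hypothetical, not part of the PR):

```python
import os

def make_safe_reader(allowed_dir: str):
    """Return a read-only file accessor confined to allowed_dir."""
    allowed_root = os.path.realpath(allowed_dir)

    def safe_read(path: str) -> str:
        # realpath resolves symlinks and '..' before the containment check.
        real = os.path.realpath(path)
        if os.path.commonpath([real, allowed_root]) != allowed_root:
            raise PermissionError(f"access outside sandbox: {path}")
        with open(real, "r") as f:
            return f.read()

    return safe_read
```

The REPL could then expose `safe_read` in its globals instead of the disabled `open`.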
```python
elif isinstance(ctx, dict):
    total = sum(len(str(k)) + len(str(v)) for k, v in itertools.islice(ctx.items(), 1000))
    preview = [len(ctx)]
```
The preview generated for a dictionary in _context_metadata_prompt is just its length, which is inconsistent with the more detailed previews for strings and lists (which show lengths of first few items). For better introspection and consistency, consider providing a preview of the lengths of the first few items in the dictionary, similar to how it's done for lists.
Suggested change:

```diff
 elif isinstance(ctx, dict):
     total = sum(len(str(k)) + len(str(v)) for k, v in itertools.islice(ctx.items(), 1000))
-    preview = [len(ctx)]
+    preview = [len(str(k)) + len(str(v)) for k, v in itertools.islice(ctx.items(), 10)]
```
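For instance, with the suggested change a 100-key dict is previewed by the stringified lengths of its first ten items rather than by a bare `len(ctx)` (the sample dict here is illustrative):

```python
import itertools

ctx = {f"k{i}": "v" * i for i in range(100)}

# Suggested preview: combined string lengths of the first 10 key/value pairs.
preview = [len(str(k)) + len(str(v)) for k, v in itertools.islice(ctx.items(), 10)]
print(preview)  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```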
```python
stdout = execution_result.get("stdout", "") or ""
stderr = execution_result.get("stderr", "") or ""
```
The or "" is redundant in these lines. execution_result.get("stdout", "") already returns an empty string if the key is missing or its value is None. The _execute_code method also ensures that stdout and stderr are always strings.
Suggested fix:

```python
stdout = execution_result.get("stdout", "")
stderr = execution_result.get("stderr", "")
```

Next snippet under review:

```python
import io
import sys
import time
```
```python
def __del__(self):
    self.close()
```
Using __del__ for resource cleanup (like removing the temporary directory) is unreliable in Python. It's not guaranteed to be called, and the environment at the time of its execution can be unpredictable. Since you have correctly implemented a context manager (__enter__/__exit__), you should rely on that for cleanup. Relying on __del__ as a fallback is risky and should be avoided.
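A minimal sketch of the recommended pattern, assuming the REPL class owns a temp directory and already defines `close()` (class and attribute names here are illustrative, not the PR's):

```python
import shutil
import tempfile

class Repl:
    def __init__(self):
        self.tmpdir = tempfile.mkdtemp()

    def close(self):
        shutil.rmtree(self.tmpdir, ignore_errors=True)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Deterministic cleanup, even if the with-body raised.
        self.close()
        return False

with Repl() as repl:
    tmp = repl.tmpdir
# The directory is removed here without relying on __del__.
```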
```python
@ray.remote(num_cpus=1)
def run_simple_test(prompt: str):

    import dotenv
```
```python
}


def safe_import(name: str, globals=None, locals=None, fromlist=(), level: int = 0):
```
The parameter names globals and locals shadow Python's built-in functions. It's a best practice to avoid this to prevent confusion and potential bugs. Consider renaming them by appending an underscore (e.g., globals_, locals_) as recommended by PEP 8.
Suggested change:

```diff
-def safe_import(name: str, globals=None, locals=None, fromlist=(), level: int = 0):
+def safe_import(name: str, globals_=None, locals_=None, fromlist=(), level: int = 0):
```
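Beyond the rename, a guarded `__import__` replacement typically looks like the following sketch (the allowlist contents are an assumption, not from the PR):

```python
import builtins

ALLOWED_MODULES = {"math", "itertools", "json"}  # assumed allowlist

def safe_import(name, globals_=None, locals_=None, fromlist=(), level=0):
    # Renamed parameters no longer shadow the globals()/locals() builtins
    # while keeping the positional signature __import__ expects.
    if name.split(".")[0] not in ALLOWED_MODULES:
        raise ImportError(f"import of {name!r} is not allowed")
    return builtins.__import__(name, globals_, locals_, fromlist, level)

mod = safe_import("math")
print(mod.sqrt(9))  # 3.0
```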
Working draft and design of an RLM environment (see blog) for SkyRL Gym.
- Executes Python code from `<repl>` blocks in model responses
- `llm_query()` for nested LLM calls from code
- Runs for up to `max_turns` until a final answer is found
- Final answers via `FINAL(...)` or `FINAL_VAR(name)` patterns

Reference Code: rlm/rlm repo
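The final-answer patterns can be sketched as a small extractor (the regexes and helper below are hypothetical; the PR's actual matching may differ):

```python
import re

FINAL_RE = re.compile(r"FINAL\((.*?)\)", re.DOTALL)
FINAL_VAR_RE = re.compile(r"FINAL_VAR\((\w+)\)")

def extract_final(text: str, repl_locals: dict):
    # FINAL_VAR(name) resolves a variable from the REPL's locals;
    # FINAL(...) carries the answer literally.
    if m := FINAL_VAR_RE.search(text):
        return repl_locals.get(m.group(1))
    if m := FINAL_RE.search(text):
        return m.group(1)
    return None

print(extract_final("done: FINAL(42)", {}))               # 42
print(extract_final("FINAL_VAR(answer)", {"answer": 7}))  # 7
```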