[draft] RLM Environment for SkyRL Environments #850

devpatelio · 2026-01-07T03:16:10Z

Working draft and design for RLM (see blog) for SkyRL Gym.

REPL execution: Executes Python code from <repl> blocks in model responses
Recursive LLM calls: Supports llm_query() for nested LLM calls from code
Context management: Loads and exposes context data (strings, dicts, lists) to the REPL
Multi-turn loop: Iterates up to max_turns until a final answer is found
Final answer extraction: Detects FINAL(...) or FINAL_VAR(name) patterns
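
A minimal sketch of how the `<repl>` block and final-answer detection listed above could work. The regexes and the helper names `extract_repl_blocks` and `extract_final_answer` are assumptions for illustration; the PR's actual parsing code is not shown in this thread.

```python
import re
from typing import List, Optional

# Hypothetical patterns, not taken from the PR.
REPL_BLOCK = re.compile(r"<repl>(.*?)</repl>", re.DOTALL)
FINAL_VAR = re.compile(r"FINAL_VAR\(\s*([A-Za-z_]\w*)\s*\)")
FINAL = re.compile(r"FINAL\((.*)\)", re.DOTALL)


def extract_repl_blocks(response: str) -> List[str]:
    """Return the code inside every <repl>...</repl> block, in order."""
    return [block.strip() for block in REPL_BLOCK.findall(response)]


def extract_final_answer(response: str, repl_locals: dict) -> Optional[str]:
    """Return the final answer, or None if the model has not given one yet."""
    # FINAL_VAR(name) refers to a variable defined in the REPL.
    var_match = FINAL_VAR.search(response)
    if var_match:
        name = var_match.group(1)
        return str(repl_locals[name]) if name in repl_locals else None
    # FINAL(...) carries the answer inline. The greedy match runs to the
    # last ")" in the response, which keeps nested parentheses intact but
    # also swallows any parenthesised text that follows the answer.
    match = FINAL.search(response)
    return match.group(1).strip() if match else None


response = "Let me check.\n<repl>\nx = 1 + 1\n</repl>\nFINAL_VAR(x)"
blocks = extract_repl_blocks(response)
answer = extract_final_answer(response, {"x": 2})
```

A multi-turn loop would call these on each model response and stop once `extract_final_answer` returns something other than `None` or `max_turns` is reached.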

Reference Code: rlm/rlm repo

devpatelio · 2026-01-07T03:18:50Z

/gemini review

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

devpatelio · 2026-01-07T03:36:37Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a promising RLM (Recursive Language Model) environment for SkyRL Gym, featuring a sandboxed REPL for executing Python code from model responses and supporting recursive LLM calls. The implementation is well-structured across new files for the environment logic, utilities, and testing. My review focuses on enhancing security, correctness, and maintainability. I've identified a critical sandbox escape vulnerability in the REPL execution that needs immediate attention. Additionally, I've provided suggestions to improve correctness by handling potential null values and unsafe configuration access, and to increase maintainability by adhering to Python best practices for imports and resource management. Overall, this is a strong initial draft, and addressing these points will significantly improve its robustness and security.

gemini-code-assist · 2026-01-07T03:38:53Z

skyrl-gym/skyrl_gym/envs/rlm/env.py

+            combined = {**self.globals_dict, **self.locals_dict}
+            exec(code, combined, combined)
+
+            for key, value in combined.items():
+                if key not in self.globals_dict and not key.startswith("_"):
+                    self.locals_dict[key] = value


There is a critical security vulnerability here. The exec function is given a combined dictionary for both globals and locals which contains a reference to self.globals_dict['__builtins__']. Malicious code can modify this __builtins__ dictionary in-place (e.g., __builtins__['open'] = some_function), bypassing the sandbox for subsequent code executions within the same step. To fix this, you should provide exec with a copy of the globals, especially the mutable __builtins__ dictionary, to prevent such modifications.

    exec_globals = self.globals_dict.copy()
    exec_globals["__builtins__"] = self.globals_dict["__builtins__"].copy()
    combined = {**exec_globals, **self.locals_dict}
    exec(code, combined, combined)
    for key, value in combined.items():
        if key not in exec_globals and not key.startswith("_"):
            self.locals_dict[key] = value
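
The in-place mutation described above can be reproduced in isolation. This is a small sketch, assuming a restricted builtins dictionary similar in spirit to `SAFE_BUILTINS`; the names `safe_builtins` and `shared_globals` are illustrative and not from the PR.

```python
# Assumed stand-in for the environment's restricted builtins.
safe_builtins = {"print": print, "len": len, "open": None}
shared_globals = {"__builtins__": safe_builtins}

# Code a model could emit to re-enable a disabled builtin.
malicious = "__builtins__['open'] = len"

# Unsafe: a shallow merge still points at the same builtins dict,
# so the mutation survives after exec returns.
combined = {**shared_globals}
exec(malicious, combined, combined)
leaked = safe_builtins["open"] is len

# Reset the shared state for the second run.
safe_builtins["open"] = None

# Safer: give exec its own copy of the builtins dict.
exec_globals = shared_globals.copy()
exec_globals["__builtins__"] = shared_globals["__builtins__"].copy()
exec(malicious, exec_globals, exec_globals)
contained = safe_builtins["open"] is None
```

Copying only stops this particular in-place modification from persisting. It does not by itself make `exec` a secure sandbox.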

gemini-code-assist · 2026-01-07T03:38:53Z

skyrl-gym/skyrl_gym/envs/rlm/env.py

+        openai_api_key = env_cfg.openai_api_key or os.getenv("OPENAI_API_KEY")
+        if openai_api_key is None:
+            raise ValueError("`OPENAI_API_KEY` must be set (as parameter, in env_cfg, or as environment variable)")
+
+        base_url = env_cfg.base_url
+        model = env_cfg.model
+        init_prompt = env_cfg.init_prompt
+        if not base_url or not model or not init_prompt:
+            raise ValueError("env_cfg must include base_url, model, and init_prompt")


Accessing keys from an omegaconf.DictConfig using attribute access (e.g., env_cfg.openai_api_key) will raise an exception if the key is not present, which can lead to unexpected crashes. It is safer to use the .get() method, which allows you to handle cases where a key might be missing.

Suggested change

-        openai_api_key = env_cfg.openai_api_key or os.getenv("OPENAI_API_KEY")
-        if openai_api_key is None:
-            raise ValueError("`OPENAI_API_KEY` must be set (as parameter, in env_cfg, or as environment variable)")
-
-        base_url = env_cfg.base_url
-        model = env_cfg.model
-        init_prompt = env_cfg.init_prompt
-        if not base_url or not model or not init_prompt:
-            raise ValueError("env_cfg must include base_url, model, and init_prompt")
+        openai_api_key = env_cfg.get("openai_api_key") or os.getenv("OPENAI_API_KEY")
+        if openai_api_key is None:
+            raise ValueError("`OPENAI_API_KEY` must be set (as parameter, in env_cfg, or as environment variable)")
+
+        base_url = env_cfg.get("base_url")
+        model = env_cfg.get("model")
+        init_prompt = env_cfg.get("init_prompt")
+        if not base_url or not model or not init_prompt:
+            raise ValueError("env_cfg must include base_url, model, and init_prompt")

gemini-code-assist · 2026-01-07T03:38:53Z

skyrl-gym/skyrl_gym/envs/rlm/env.py

+        return (
+            self.lm_client.chat.completions.create(
+                model=used_model,
+                messages=msgs,
+            )
+            .choices[0]
+            .message.content
+        )


The llm_query function is type-hinted to return a str, but it can return None if the OpenAI API response has message.content as None. This violates the type hint and can cause errors in the code executed in the REPL. You should ensure it always returns a string, for example by using or "".

    response = self.lm_client.chat.completions.create(
        model=used_model,
        messages=msgs,
    )
    return response.choices[0].message.content or ""

gemini-code-assist · 2026-01-07T03:38:53Z

skyrl-gym/skyrl_gym/envs/rlm/utils.py

+    "compile": None,
+    "globals": None,
+    "locals": None,
+    "open": None,


The open function is disabled in SAFE_BUILTINS for security, which is a good practice for a REPL. However, the RLMEnvironment.load_context method in env.py creates a file and exposes its path via the context_path variable in the REPL's scope. This creates a design contradiction: a file path is provided, but there's no way to use it from within the sandboxed code. If file access is not intended, consider removing the context_path variable to avoid confusion. If it is intended, a custom, sandboxed file-access utility should be provided instead of the disabled open.
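
If file access is intended, one possible shape for such a utility is a read-only opener confined to a single directory. This is only a sketch: `make_safe_open` and its parameters are hypothetical names, not part of the PR, and the temporary directory stands in for wherever `load_context` writes the context file.

```python
import os
import tempfile
from pathlib import Path


def make_safe_open(allowed_root: str):
    """Build a read-only open() replacement restricted to allowed_root."""
    root = Path(allowed_root).resolve()

    def safe_open(path: str, mode: str = "r", encoding: str = "utf-8"):
        # Only plain read modes are allowed.
        if mode not in ("r", "rb"):
            raise PermissionError(f"mode {mode!r} is not allowed")
        # resolve() collapses ".." and follows symlinks before the check.
        target = Path(path).resolve()
        if target != root and root not in target.parents:
            raise PermissionError(f"{path!r} is outside the allowed directory")
        if "b" in mode:
            return open(target, mode)
        return open(target, mode, encoding=encoding)

    return safe_open


# Usage: the temp dir stands in for the directory holding context_path.
with tempfile.TemporaryDirectory() as tmp:
    context_file = os.path.join(tmp, "context.txt")
    with open(context_file, "w", encoding="utf-8") as fh:
        fh.write("hello context")

    safe_open = make_safe_open(tmp)
    with safe_open(context_file) as fh:
        content = fh.read()

    try:
        safe_open(os.path.join(tmp, "..", "outside.txt"))
        escaped = True
    except PermissionError:
        escaped = False
```

Such a function could be placed in the restricted builtins under the name `open`, so that `context_path` becomes usable without exposing the rest of the filesystem.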

gemini-code-assist · 2026-01-07T03:38:53Z

skyrl-gym/skyrl_gym/envs/rlm/env.py

+        elif isinstance(ctx, dict):
+            total = sum(len(str(k)) + len(str(v)) for k, v in itertools.islice(ctx.items(), 1000))
+            preview = [len(ctx)]


The preview generated for a dictionary in _context_metadata_prompt is just its length, which is inconsistent with the more detailed previews for strings and lists (which show lengths of first few items). For better introspection and consistency, consider providing a preview of the lengths of the first few items in the dictionary, similar to how it's done for lists.

Suggested change

-        elif isinstance(ctx, dict):
-            total = sum(len(str(k)) + len(str(v)) for k, v in itertools.islice(ctx.items(), 1000))
-            preview = [len(ctx)]
+        elif isinstance(ctx, dict):
+            total = sum(len(str(k)) + len(str(v)) for k, v in itertools.islice(ctx.items(), 1000))
+            preview = [len(str(k)) + len(str(v)) for k, v in itertools.islice(ctx.items(), 10)]

gemini-code-assist · 2026-01-07T03:38:53Z

skyrl-gym/skyrl_gym/envs/rlm/env.py

+                stdout = execution_result.get("stdout", "") or ""
+                stderr = execution_result.get("stderr", "") or ""


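The or "" is redundant in these lines, because the _execute_code method ensures that stdout and stderr are always strings. Note that execution_result.get("stdout", "") only falls back to the empty string when the key is missing; it would still return None if the key were present with a None value, so the guard only matters if that guarantee is ever relaxed.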

    stdout = execution_result.get("stdout", "")
    stderr = execution_result.get("stderr", "")
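
A quick check of how `dict.get` treats a missing key versus a key that is present with a `None` value, which is the case that decides whether `or ""` can be dropped. The dictionaries here are made-up examples, not output from `_execute_code`.

```python
result_missing = {}                # key absent
result_none = {"stdout": None}     # key present, value None

from_missing = result_missing.get("stdout", "")      # default is used
from_none = result_none.get("stdout", "")            # default is NOT used
guarded = result_none.get("stdout", "") or ""        # None coerced to ""
```

So removing `or ""` is safe only as long as the producer never stores `None` under these keys.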

gemini-code-assist · 2026-01-07T03:38:53Z

skyrl-gym/skyrl_gym/envs/rlm/env.py

+        import io
+        import sys
+        import time


Imports should generally be at the top of the file, as recommended by PEP 8. Moving io, sys, and time imports to the top level will improve code readability and consistency.

gemini-code-assist · 2026-01-07T03:38:53Z

skyrl-gym/skyrl_gym/envs/rlm/env.py

+    def __del__(self):
+        self.close()


Using __del__ for resource cleanup (like removing the temporary directory) is unreliable in Python. It's not guaranteed to be called, and the environment at the time of its execution can be unpredictable. Since you have correctly implemented a context manager (__enter__/__exit__), you should rely on that for cleanup. Relying on __del__ as a fallback is risky and should be avoided.
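
A minimal sketch of the context-manager pattern the comment recommends, with an idempotent `close()`. `TempWorkspace` is a hypothetical stand-in for the environment class; the real `RLMEnvironment` is not reproduced here.

```python
import os
import shutil
import tempfile


class TempWorkspace:
    """Hypothetical stand-in for an environment that owns a temp directory."""

    def __init__(self):
        self.temp_dir = tempfile.mkdtemp(prefix="rlm_")

    def close(self):
        # Idempotent: calling close() twice is harmless.
        if self.temp_dir is not None:
            shutil.rmtree(self.temp_dir, ignore_errors=True)
            self.temp_dir = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


with TempWorkspace() as ws:
    path = ws.temp_dir
    existed_inside = os.path.isdir(path)

removed_after = not os.path.exists(path)
```

If a safety net for callers who skip the `with` block is still wanted, `weakref.finalize` from the standard library is the mechanism usually preferred over `__del__`.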

gemini-code-assist · 2026-01-07T03:38:53Z

skyrl-gym/skyrl_gym/envs/rlm/test.py

+@ray.remote(num_cpus=1)
+def run_simple_test(prompt: str):
+
+    import dotenv


The import dotenv statement is inside a function. According to PEP 8, imports should be at the top of the file. This improves readability and makes dependencies clear. Please move this import to the top level of the module.

gemini-code-assist · 2026-01-07T03:38:53Z

skyrl-gym/skyrl_gym/envs/rlm/utils.py

+}
+
+
+def safe_import(name: str, globals=None, locals=None, fromlist=(), level: int = 0):


The parameter names globals and locals shadow Python's built-in functions. It's a best practice to avoid this to prevent confusion and potential bugs. Consider renaming them by appending an underscore (e.g., globals_, locals_) as recommended by PEP 8.

Suggested change

-def safe_import(name: str, globals=None, locals=None, fromlist=(), level: int = 0):
+def safe_import(name: str, globals_=None, locals_=None, fromlist=(), level: int = 0):
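
The rename is safe for an `__import__` replacement because CPython's `import` statement passes these arguments positionally. The sketch below shows a renamed hook working inside `exec`; the allowlist and the function body are assumptions, since the PR's `safe_import` implementation is not shown in this thread.

```python
import builtins

# Hypothetical allowlist; the PR's actual list is not shown here.
ALLOWED_MODULES = {"math", "json"}


def safe_import(name: str, globals_=None, locals_=None, fromlist=(), level: int = 0):
    """Import hook that only permits allowlisted top-level modules."""
    if name.split(".")[0] not in ALLOWED_MODULES:
        raise ImportError(f"import of {name!r} is not allowed")
    return builtins.__import__(name, globals_, locals_, fromlist, level)


# The import statement looks up __import__ in the builtins it was given.
sandbox = {"__builtins__": {"__import__": safe_import}}
exec("import math\nvalue = math.sqrt(16)", sandbox)
value = sandbox["value"]

try:
    exec("import os", sandbox)
    blocked = False
except ImportError:
    blocked = True
```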

Dev Patel added 3 commits January 7, 2026 03:09

rlm env draft

2838b49

RLM v0 implementation for SkyRL Env

dca6083

linter

f550ae5


devpatelio and others added 4 commits January 6, 2026 22:24

Update skyrl-gym/skyrl_gym/envs/rlm/utils.py

b64c48d

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update skyrl-gym/skyrl_gym/envs/rlm/test.py

77d871f

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update skyrl-gym/skyrl_gym/envs/rlm/test.py

5de1378

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

x

a7fa019

gemini-code-assist bot reviewed Jan 7, 2026

