feat: add Volcengine sandbox support by XChen-Zero · Pull Request #1160 · modelscope/evalscope

XChen-Zero · 2026-01-20T03:57:55Z

Add Volcengine (SandboxFusion) sandbox backend

This PR adds support for Volcengine Code Sandbox (SandboxFusion) as a new EvalScope sandbox backend.

Introduces sandbox_type="volcengine".
Executes code via SandboxFusion’s /run_code HTTP API.
Maps existing tools (python_executor, shell_executor) without changing benchmark logic.
Requires no changes to existing task or benchmark configurations beyond sandbox_type.
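To illustrate the request flow described above, here is a minimal sketch of a client that calls SandboxFusion's `/run_code` endpoint with only the standard library. The endpoint path comes from this PR. The request fields (`code`, `language`, `run_timeout`), the response's `run_result` section with `stdout`/`stderr`/`return_code`, and the language ids (`python`, `bash`) are assumptions based on SandboxFusion's public API and should be checked against your deployment. `TOOL_LANGUAGE` and the helper names are hypothetical and are not the PR's actual `SandboxFusionClient`.

```python
import json
import urllib.request

# Hypothetical mapping from EvalScope tool names to SandboxFusion language ids.
# The adapter in this PR may use different names or values.
TOOL_LANGUAGE = {
    "python_executor": "python",
    "shell_executor": "bash",
}


def build_run_code_payload(tool_name: str, code: str, run_timeout: float = 10.0) -> dict:
    """Build the JSON body for a /run_code request (field names are assumed)."""
    if tool_name not in TOOL_LANGUAGE:
        raise ValueError(f"unsupported tool: {tool_name}")
    return {"code": code, "language": TOOL_LANGUAGE[tool_name], "run_timeout": run_timeout}


def run_code(base_url: str, payload: dict, http_timeout: float = 30.0) -> dict:
    """POST the payload to <base_url>/run_code and return the decoded JSON response."""
    request = urllib.request.Request(
        url=base_url.rstrip("/") + "/run_code",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=http_timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_output(result: dict) -> tuple:
    """Return (stdout, stderr, return_code) from the assumed run_result section."""
    run_result = result.get("run_result") or {}
    return (
        run_result.get("stdout", ""),
        run_result.get("stderr", ""),
        run_result.get("return_code"),
    )
```

Usage against a running deployment would look like `extract_output(run_code("http://<IP>:<PORT>", build_run_code_payload("python_executor", "print(1 + 1)")))`. Because each call is a self-contained HTTP request, no sandbox state is kept between calls, which is consistent with the "stateless" description below.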

This provides a remote, stateless sandbox option and removes the dependency on local Docker or enclave-based sandboxes for code evaluation.

Tested against a Docker deployment of SandboxFusion on the multiple and livecodebench benchmarks.

gemini-code-assist · 2026-01-20T03:58:16Z

Summary of Changes

Hello @XChen-Zero, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates Volcengine Code Sandbox (SandboxFusion) into EvalScope, providing a remote and stateless environment for code execution. This significantly reduces the dependency on local Docker or enclave-based sandboxes, offering a more flexible and scalable solution for code evaluation. The changes include a new architecture for managing diverse sandbox types, ensuring compatibility with existing benchmark configurations while expanding the range of available execution environments.

Highlights

New Volcengine Sandbox Backend: Adds support for Volcengine Code Sandbox (SandboxFusion) as a remote, stateless sandbox option, enhancing EvalScope's code evaluation capabilities.
Flexible Sandbox Management: Introduces an _external_manager attribute and associated logic to handle external sandbox integrations, allowing EvalScope to work seamlessly with both local (Docker/enclave) and remote sandboxes.
Tool Mapping and Configuration: Existing tools like python_executor and shell_executor are automatically mapped to the Volcengine sandbox, requiring no changes to existing task or benchmark configurations.
Dynamic Sandbox Initialization: The sandbox initialization process is updated to dynamically differentiate between internal and external sandbox types, ensuring the correct manager and configuration are utilized.
Enhanced Argument and Configuration: Command-line arguments and configuration handling are updated to support the new volcengine sandbox type, providing clear options for users.


gemini-code-assist

Code Review

This PR adds support for Volcengine sandbox, which is a great addition for providing a remote, stateless sandbox option. The implementation is mostly solid, introducing a new SandboxFusionClient and integrating it via the SandboxMixin. The code is well-structured to handle this new external sandbox type.

My review focuses on improving maintainability, robustness, and adherence to best practices. Key areas of feedback include:

Replacing broad except Exception clauses with more specific ones.
Reducing code duplication, particularly for sandbox type aliases.
Improving the design to be more extensible by avoiding hardcoded imports of specific sandbox implementations.
Enhancing readability and logging.

Overall, this is a good contribution. Addressing the feedback will make the code more robust and easier to maintain in the long run.

gemini-code-assist · 2026-01-20T04:00:15Z

evalscope/api/mixin/sandbox_mixin.py

+        manager_class = manager_config.get('manager_class')
+        if not manager_class:
+            if self._is_external_sandbox():
+                from evalscope.sandbox.volcengine import SandboxFusionSandboxManager


Hardcoding the import of SandboxFusionSandboxManager inside SandboxMixin creates a tight coupling and violates the Open/Closed Principle. This makes it difficult to add other external sandbox types in the future without modifying this mixin. A more extensible approach would be to use a factory pattern or a registry to map sandbox_type strings to their corresponding manager classes.

gemini-code-assist · 2026-01-20T04:00:15Z

evalscope/api/mixin/sandbox_mixin.py

+            try:
+                return sandbox_type_enum(sandbox_type)
+            except Exception:


Using except Exception: is too broad and can mask unexpected errors. It's better to catch the specific exception you expect. In this case, creating an enum member from an invalid value raises ValueError.

except ValueError:
    return sandbox_type
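A self-contained sketch of the narrower handler, using a hypothetical `SandboxType` enum in place of EvalScope's real one: looking up an enum by an unknown value raises `ValueError`, so only that case falls through to the raw-string path used for external sandbox types.

```python
from enum import Enum
from typing import Union


class SandboxType(str, Enum):
    """Hypothetical stand-in for the built-in sandbox type enum."""
    DOCKER = "docker"


def resolve_sandbox_type(sandbox_type: str) -> Union[SandboxType, str]:
    """Return the enum member for known types; pass unknown strings through unchanged."""
    try:
        return SandboxType(sandbox_type)
    except ValueError:
        # Unknown value (e.g. "volcengine"): treat it as an external sandbox type.
        return sandbox_type
```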

gemini-code-assist · 2026-01-20T04:00:15Z

evalscope/api/mixin/sandbox_mixin.py

+        sandbox_type = self._task_config.sandbox_type if self._task_config else None
+        if not sandbox_type:
+            return False
+        return str(sandbox_type).lower() in {'volcengine', 'volcano', 'volc'}


The set of aliases {'volcengine', 'volcano', 'volc'} is also used in _resolve_sandbox_config on line 179 and in evalscope/config.py on line 292. Duplicating this set makes it harder to maintain and can lead to inconsistencies. Consider defining it as a constant at the module or class level and reusing it in all these places to improve maintainability.
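One way to follow this suggestion is a single module-level constant plus a predicate that every call site reuses. `VOLCENGINE_ALIASES` and `is_volcengine` are hypothetical names for illustration; a `frozenset` keeps the shared constant immutable.

```python
from typing import Optional

# Single source of truth for the Volcengine aliases (hypothetical constant name).
VOLCENGINE_ALIASES = frozenset({"volcengine", "volcano", "volc"})


def is_volcengine(sandbox_type: Optional[object]) -> bool:
    """Return True if sandbox_type names the Volcengine backend, case-insensitively."""
    if not sandbox_type:
        return False
    return str(sandbox_type).lower() in VOLCENGINE_ALIASES
```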

gemini-code-assist · 2026-01-20T04:00:15Z

evalscope/api/mixin/sandbox_mixin.py

+                return lambda tool_name, tool_input, timeout=None: sandbox.execute(  # noqa: E731
+                    tool_name, tool_input, timeout=timeout
+                )


Using lambda here requires a noqa to suppress a linting error (E731). While concise, this can be less readable. For better clarity and to adhere to style guidelines without needing noqa, consider using a nested function (def). This pattern is repeated for other executors in this method.

def executor(tool_name, tool_input, timeout=None):
    return sandbox.execute(tool_name, tool_input, timeout=timeout)
return executor

gemini-code-assist · 2026-01-20T04:00:15Z

evalscope/sandbox/volcengine.py

+                    parsed = json.loads(trimmed)
+                    if isinstance(parsed, dict):
+                        input_dict = parsed
+                except Exception:


Catching a broad Exception can hide unexpected errors. It's better to catch only the specific exception you expect. When parsing JSON, json.loads raises json.JSONDecodeError on failure.

Suggested change

-                except Exception:
+                except json.JSONDecodeError:
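A minimal sketch of input coercion with the narrower handler. It assumes, for illustration only, that a tool input is either a dict, a JSON object string, or raw code, and that raw code goes under a `code` key; `coerce_tool_input` is a hypothetical helper, not the PR's function.

```python
import json
from typing import Any, Dict


def coerce_tool_input(tool_input: Any) -> Dict[str, Any]:
    """Normalize tool_input to a dict; JSON object strings are parsed, other text becomes code."""
    if isinstance(tool_input, dict):
        return tool_input
    text = str(tool_input)
    trimmed = text.strip()
    if trimmed.startswith("{"):
        try:
            parsed = json.loads(trimmed)
        except json.JSONDecodeError:
            parsed = None  # not valid JSON: fall through and treat it as raw code
        if isinstance(parsed, dict):
            return parsed
    return {"code": text}
```

Any other exception type (for example a bug elsewhere in the function) still propagates instead of being silently converted into a code string.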

gemini-code-assist · 2026-01-20T04:00:16Z

evalscope/sandbox/volcengine.py

+        if stderr:
+            merged_output += ('' if merged_output.endswith('\n') or not merged_output else '\n') + stderr


This pattern to conditionally add a newline is repeated multiple times below. To improve readability and reduce duplication, consider extracting it into a small helper function.
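The extracted helper could look like this. `append_block` is a hypothetical name; it reproduces the original expression's behavior (add a separating newline only when the accumulated output is non-empty and does not already end with one) and also absorbs the `if stderr:` guard.

```python
def append_block(merged: str, block: str) -> str:
    """Append block to merged, adding a newline only if merged is non-empty and lacks one."""
    if not block:
        return merged
    separator = "" if not merged or merged.endswith("\n") else "\n"
    return merged + separator + block
```

Each call site then shrinks to a single line such as `merged_output = append_block(merged_output, stderr)`.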

gemini-code-assist · 2026-01-20T04:00:16Z

evalscope/sandbox/volcengine.py

+        except Exception:
+            pass


Silently swallowing all exceptions with except Exception: pass is risky, as it can hide important problems that occur during resource cleanup. At a minimum, you should log the exception as a warning so that issues can be diagnosed. Note: you will need to import and initialize a logger at the top of the file for this to work.

Suggested change

-        except Exception:
-            pass
+        except Exception as e:
+            logger.warning('Failed to close sandbox client session.', exc_info=True)

XChen-Zero · 2026-01-20T07:15:36Z

Improve sandbox language compatibility and add more test results

This PR extends the Volcengine sandbox support with configurable dataset-to-sandbox language mapping, fixing compatibility issues where dataset language identifiers differ from sandbox requirements (e.g. r → R).

What’s added

Introduced dataset_language_map in sandbox_manager_config to normalize dataset language names before execution.
Works for Volcengine Sandbox and remains compatible with existing MS sandbox implementations.
Keeps backward compatibility with existing configs.

Example

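from evalscope import TaskConfig

task_cfg = TaskConfig(
    # model, datasets and other fields omitted for brevity
    sandbox_type="volcengine",
    use_sandbox=True,
    sandbox_manager_config={
        "base_url": "http://<IP>:<PORT>",
        "dataset_language_map": {
            "r": "R",
            "d_ut": "D_ut",
            "ts": "typescript",
        },
    },
)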
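The normalization step can be sketched as a dictionary lookup that falls back to the dataset's own identifier, so existing configs without a map behave as before. This is a minimal sketch assuming exact, case-sensitive keys; `normalize_language` is a hypothetical helper, not necessarily how the PR applies `dataset_language_map`.

```python
from typing import Mapping, Optional


def normalize_language(dataset_language: str, language_map: Optional[Mapping[str, str]] = None) -> str:
    """Translate a dataset language id to the sandbox's id; unmapped ids pass through unchanged."""
    if not language_map:
        return dataset_language
    return language_map.get(dataset_language, dataset_language)
```

With the map from the example config, `"r"` becomes `"R"` and `"ts"` becomes `"typescript"`, while an unmapped id such as `"python"` is sent unchanged.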

Additional results

LiveCodeBench (release_latest)
qwen3-4b-instruct-2507: mean_acc / pass@1 = 0.4474 (1055 samples)
multiple_mbpp (multi-language)
Overall mean_acc = 0.6778 across 18 languages

These results validate correct language dispatch and stable sandbox execution across multiple datasets and languages.


Yunnglin

LGTM! Thanks for the contribution. I've updated the sandbox_mixin code and the documentation. Merging now.

Yunnglin · 2026-01-20T08:22:27Z

If possible, we also welcome PRs to ms-enclave that add native support for Volcengine's sandbox. It would make more sense for ms-enclave to support additional backends.

feat: add Volcengine sandbox support

b20de3a

gemini-code-assist bot reviewed Jan 20, 2026

XChen-Zero mentioned this pull request Jan 20, 2026

Support for Vol-Sandbox as Code Execution Environment? #1155

Closed

XChen-Zero and others added 2 commits January 20, 2026 15:14

feat: support configurable dataset-to-sandbox language mapping

d118aea

Merge branch 'modelscope:main' into feat/add_volcengine_sandbox

354f10a

XChen-Zero and others added 3 commits January 20, 2026 15:27

delete: Delete some debugging information

39ff862

Merge remote-tracking branch 'origin/feat/add_volcengine_sandbox' int…

678f8d6

…o feat/add_volcengine_sandbox

update code and doc

4140e60

Yunnglin approved these changes Jan 20, 2026

Yunnglin linked an issue Jan 20, 2026 that may be closed by this pull request

Support for Vol-Sandbox as Code Execution Environment? #1155

Closed

Yunnglin merged commit a336e04 into modelscope:main Jan 20, 2026
3 checks passed

Yunnglin merged 6 commits into modelscope:main from XChen-Zero:feat/add_volcengine_sandbox
