
Add local LCB evaluation #96

Merged

nv-alicheng merged 10 commits into main from feature/alicheng-lcb-eval on Jan 14, 2026

Conversation

@nv-alicheng
Collaborator

What does this PR do?

Adds LCB evaluation functionality via LCBServe. A follow-up PR will move LCBServe to a containerized microservice.

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

@nv-alicheng nv-alicheng requested a review from a team as a code owner January 13, 2026 23:55
@github-actions

github-actions bot commented Jan 13, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@gemini-code-assist

Summary of Changes


This pull request adds LiveCodeBench (LCB) evaluation to the SGLang framework. It introduces a dedicated service for executing generated code in isolation and an extractor for Python code in model outputs, and it integrates LCB into the existing accuracy benchmark suite, laying the groundwork for secure code-based model assessment.

Highlights

  • LiveCodeBench (LCB) Integration: Introduced functionality to evaluate models against the LiveCodeBench dataset, a benchmark for code generation tasks.
  • LCBServe Component: Added a new LCBServe module designed to run LCB evaluation in a controlled, isolated environment, with containerization planned as a follow-up (see the execution sketch after this list).
  • Python Code Extraction: Implemented a PythonCodeExtractor to extract Python code from markdown blocks in model outputs, a step LCB evaluation depends on (see the extractor sketch after this list).
  • Benchmark Suite Expansion: Integrated LCB evaluation into the existing SGLang Accuracy Suite Benchmark, alongside GPQA and AIME25, and updated related scripts and report naming conventions.
  • LCB Setup Script: Provided a setup_lcb.sh script to streamline the cloning and dependency installation process for the LiveCodeBench repository.
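
As a rough illustration of the extraction step, here is a minimal sketch of a markdown code extractor. The class name follows the PR description, but the body is assumed, not taken from the diff:

````python
import re

class PythonCodeExtractor:
    """Minimal sketch: pull Python code out of fenced markdown blocks.

    Illustrative only; the PR's actual PythonCodeExtractor may differ.
    """

    # Match ```python ... ``` fences (a bare ``` fence also matches).
    _FENCE = re.compile(r"```(?:python)?\s*\n(.*?)```", re.DOTALL)

    def extract(self, model_output: str) -> str:
        blocks = self._FENCE.findall(model_output)
        # Prefer the last block (models often restate the final solution);
        # fall back to the raw output if no fences were emitted.
        return blocks[-1].strip() if blocks else model_output.strip()
````

For example, extracting from "Here is my solution:" followed by a fenced block containing print(1) returns just print(1).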
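
Similarly, a hedged sketch of the isolated-execution shape a service like LCBServe implies: run the extracted program in a subprocess with a hard timeout and compare stdout against the expected output. The function name and test-case shape here are assumptions, not the PR's API:

```python
import subprocess
import sys

def run_test_case(code: str, stdin_text: str, expected: str,
                  timeout_s: float = 6.0) -> bool:
    """Sketch: execute untrusted code in a subprocess and check stdout.

    A production service would add real isolation (containers, resource
    limits); this only shows the subprocess-with-timeout shape.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],  # run the candidate program
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # forward the caller's timeout
        )
    except subprocess.TimeoutExpired:
        return False  # treat hangs as failed test cases
    return proc.returncode == 0 and proc.stdout.strip() == expected.strip()
```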



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for LiveCodeBench (LCB) evaluation. The changes include a new evaluation script for LCB, a setup script, the core evaluation logic in lcb_serve.py, a new PythonCodeExtractor, and a LiveCodeBenchScorer. The main benchmark runner is also updated to include LCB. My review focuses on robustness, configurability, and correctness: I've identified several hardcoded paths that should be made configurable, a bug where a timeout parameter is ignored, and some opportunities to make the implementation less brittle.
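
The timeout issue called out above is a common shape: a function accepts a timeout parameter but hardcodes a constant at the call site instead of forwarding it. A hypothetical before/after, since the review does not quote the code:

```python
import subprocess
import sys

def evaluate(code: str, timeout_s: float = 6.0) -> int:
    # Buggy shape:  subprocess.run(..., timeout=6)  # caller's value ignored
    # Fixed shape:  forward the caller's parameter, as below.
    proc = subprocess.run([sys.executable, "-c", code], timeout=timeout_s)
    return proc.returncode
```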

Collaborator

@arekay-nv arekay-nv left a comment


Great work - thanks!

@nv-alicheng nv-alicheng merged commit 72ec971 into main Jan 14, 2026
4 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jan 14, 2026
@arekay-nv arekay-nv deleted the feature/alicheng-lcb-eval branch April 2, 2026 03:05