Qwen 8B code golf RL by pawalt · Pull Request #57 · modal-labs/multinode-training-guide

pawalt · 2026-02-24T00:54:06Z

Summary

This PR introduces a new end-to-end example that trains a Qwen 8B model with Reinforcement Learning (RL) to excel at "code golf", i.e. writing correct code that is as short as possible, using Slime and Harbor on Modal.

Key features include:

- MBPP Dataset Integration: Converts the MBPP dataset to Harbor format, dynamically extracting function names for prompts.
- Custom Code Golf Reward Model: Implements a Slime custom reward model that scores completions based on correctness (verified by Harbor in Modal Sandboxes) and code brevity.
- Modal Sandbox Rollouts & Evals: Uses Modal Sandboxes for isolated, highly concurrent execution of both training rollouts and evaluation tasks.
- SGLang Serving for Evaluation: Provides an SGLang server on Modal that serves the latest checkpoint, enabling high-concurrency evaluations via a custom Harbor agent.
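The description doesn't show how function names are pulled out of MBPP, so the following is only a minimal sketch under assumptions. It reads the public MBPP fields `task_id`, `text`, and `test_list` (a list of assert strings), treats the first non-builtin function called in those asserts as the target name, and builds a prompt from it. `extract_function_name`, `build_prompt`, the prompt wording, and the example record are all hypothetical; the Harbor task layout the PR actually produces is not reproduced here.

```python
import ast
import builtins
from typing import Optional

# Names like len, set, sorted appear in many MBPP asserts but are never the target function.
_BUILTINS = set(dir(builtins))


def extract_function_name(test_list: list) -> Optional[str]:
    """Return the first non-builtin function called directly in MBPP-style asserts."""
    for test in test_list:
        for node in ast.walk(ast.parse(test)):
            # Only plain calls like foo(...); attribute calls like math.isclose(...) are skipped.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id not in _BUILTINS:
                    return node.func.id
    return None


def build_prompt(record: dict) -> str:
    """Hypothetical prompt builder for one MBPP record."""
    name = extract_function_name(record["test_list"])
    if name is None:
        raise ValueError(f"no function name found for task {record.get('task_id')}")
    return (
        f"{record['text']}\n"
        f"Write a Python function named `{name}`. "
        "Make it as short as possible while still passing the tests.\n"
        f"Example test: {record['test_list'][0]}"
    )


# A made-up record in MBPP's shape, for illustration only.
example = {
    "task_id": 0,
    "text": "Write a function to add two numbers.",
    "test_list": ["assert add_nums(2, 3) == 5"],
}
print(build_prompt(example))
```

Walking the parsed assert rather than using a regex means wrappers such as `assert set(f(x)) == set(...)` still resolve to `f`, because builtins are filtered out.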

This example demonstrates a complete pipeline for fine-tuning a code generation model with complex, custom reward functions and scalable evaluation infrastructure.
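A minimal sketch of a correctness-gated brevity reward follows. It assumes the pass/fail result has already come back from the sandboxed test run and that brevity is measured against MBPP's reference solution in UTF-8 bytes. `code_golf_reward` and its `brevity_weight` parameter are hypothetical; the PR's actual Slime reward signature and scoring formula are not shown here.

```python
def code_golf_reward(
    passed: bool,
    completion: str,
    reference: str,
    brevity_weight: float = 0.5,
) -> float:
    """Hypothetical code-golf reward: 0 for failing code, 1 plus a brevity bonus otherwise."""
    if not passed:
        # Wrong code earns nothing, however short it is.
        return 0.0
    # Measure length in UTF-8 bytes, ignoring leading/trailing whitespace.
    n = max(1, len(completion.strip().encode("utf-8")))
    ref = max(1, len(reference.strip().encode("utf-8")))
    # 0.5 at the reference length, rising toward 1 as the completion gets shorter.
    brevity = ref / (ref + n)
    return 1.0 + brevity_weight * brevity


reference_solution = "def add(a, b):\n    return a + b\n"
print(code_golf_reward(True, "add=lambda a,b:a+b", reference_solution))
```

Gating on correctness keeps the policy from collecting the brevity bonus with short but wrong code, and the `ref / (ref + n)` term stays strictly between 0 and 1 while still rewarding solutions shorter than the reference.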

Checklist

- Example is documented with comments throughout, in a Literate Programming style.
- Example does not require third-party dependencies to be installed locally.
- Example follows the style guide.
- Example pins its dependencies:
  - Example pins container images to a stable tag, not a dynamic tag like latest.
  - Example specifies a python_version for the base image, if it is used.
  - Example pins all dependencies to at least minor version, ~=x.y.z or ==x.y.
  - Example dependencies with version < 1 are pinned to patch version, ==0.y.z.
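One rough way to audit Python requirement strings against the dependency-pinning rules above is a regex check. This is a hypothetical helper (`pin_ok`), not part of the PR or the repo's tooling. It only understands plain `name==version` and `name~=version` strings; anything else, including other operators and environment markers, is reported as not pinned, and container image tags are not covered at all.

```python
import re

# Hypothetical checker: "name" (optionally with [extras]), then == or ~=, then a dotted version.
_REQ = re.compile(r"^([A-Za-z0-9_.\-\[\],]+)(==|~=)(\d+(?:\.\d+)*)$")


def pin_ok(requirement: str) -> bool:
    """Check one requirement string against the checklist's pinning rules."""
    match = _REQ.match(requirement.replace(" ", ""))
    if match is None:
        # Unpinned, or an operator such as >= that the rules don't allow.
        return False
    _name, op, version = match.groups()
    parts = [int(p) for p in version.split(".")]
    if parts[0] == 0:
        # Version < 1: must be an exact patch pin, ==0.y.z.
        return op == "==" and len(parts) == 3
    if op == "~=":
        # ~=x.y.z keeps x.y fixed; ~=x.y would allow any x.* release.
        return len(parts) >= 3
    # == needs at least x.y.
    return len(parts) >= 2


for req in ["torch==2.5.1", "numpy~=1.26", "fastapi==0.115", "requests"]:
    print(req, pin_ok(req))
```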

(Modal's internal guide page for this repo is Multi-node examples guidance.)

Outside contributors

You're great! Thanks for your contribution.

Co-authored-by: Peyton Walters <pawalt@hey.com>

cursor · 2026-02-24T00:54:08Z

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.
Learn more about Cursor Agents.

Add SLIME Harbor Modal code-golf Qwen8B example

da08989

Co-authored-by: Peyton Walters <pawalt@hey.com>

pawalt wants to merge 1 commit into main from cursor/qwen-8b-code-golf-rl-4ece

2 participants
