adds nemotron example #1511

Merged
charlesfrye merged 3 commits into main from charlesfrye/nemotron_inference
Feb 26, 2026

@charlesfrye (Collaborator) commented Feb 26, 2026

This PR adds a sample script for running NVIDIA Nemotron models on Modal using SGLang.

It runs the Nemotron 3 Nano model with FP4 quantization on a single B200 GPU and reaches about 150 tok/s at concurrency 1.
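
For a rough sense of what that throughput means for a single request, the arithmetic can be sketched as below. The 150 tok/s figure is the approximate number reported in this description; the 500-token response length is a made-up illustration, and the estimate ignores prefill time and network overhead.

```python
# Back-of-the-envelope latency from the reported decode throughput.
# TOKENS_PER_SECOND is the approximate figure from this PR (concurrency 1);
# the 500-token response below is a hypothetical example length.
TOKENS_PER_SECOND = 150.0


def inter_token_latency_ms(tokens_per_second: float) -> float:
    """Average time between generated tokens, in milliseconds."""
    return 1000.0 / tokens_per_second


def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Time to decode num_tokens at a steady rate, in seconds."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    print(f"{inter_token_latency_ms(TOKENS_PER_SECOND):.1f} ms per token")  # 6.7 ms per token
    print(f"{generation_time_s(500, TOKENS_PER_SECOND):.1f} s for 500 tokens")  # 3.3 s for 500 tokens
```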

Type of Change

  • New example for the GitHub repo
    • New example for the documentation site (Linked from a discoverable page, e.g. via the sidebar in /docs/examples)

Monitoring Checklist

  • Example is configured for testing in the synthetic monitoring system, or lambda-test: false is provided in the example frontmatter and I have gotten approval from a maintainer
    • Example is tested by executing with modal run, or an alternative cmd is provided in the example frontmatter (e.g. cmd: ["modal", "serve"])
    • Example is tested by running the cmd with no arguments, or the args are provided in the example frontmatter (e.g. args: ["--prompt", "Formula for room temperature superconductor:"])
    • Example does not require third-party dependencies besides fastapi to be installed locally (e.g. does not import requests or torch in the global scope or other code executed locally)

Documentation Site Checklist

Content

  • Example is documented with comments throughout, in a Literate Programming style
  • All media assets for the example that are rendered in the documentation site page are retrieved from modal-cdn.com

Build Stability

  • Example pins all dependencies in container images
    • Example pins container images to a stable tag like v1, not a dynamic tag like latest
    • Example specifies a python_version for the base image, if it is used
    • Example pins all dependencies to at least SemVer minor version, ~=x.y.z or ==x.y, or we expect this example to work across major versions of the dependency and are committed to maintenance across those versions
      • Example dependencies with version < 1 are pinned to patch version, ==0.y.z
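
The pinning rules in this checklist can be sketched as a small check. This is only an illustration of the rules as worded above, not a tool from the repository: the function name `is_acceptably_pinned` and the package name `examplepkg` are hypothetical, and the sketch does not model the exception for dependencies expected to work across major versions.

```python
import re

# Matches "name==version" or "name~=version"; any other operator
# (>=, <, none at all) is treated as not pinned.
PIN_RE = re.compile(
    r"^(?P<name>[A-Za-z0-9_.\-\[\]]+)(?P<op>==|~=)(?P<version>\d+(?:\.\d+)*)$"
)


def is_acceptably_pinned(requirement: str) -> bool:
    """Check one requirement string against the checklist's pinning rules."""
    match = PIN_RE.match(requirement.strip())
    if match is None:
        return False  # unpinned, or uses an operator that allows drift
    op = match["op"]
    parts = match["version"].split(".")
    if int(parts[0]) < 1:
        # Pre-1.0 dependencies must be pinned exactly to a patch version: ==0.y.z
        return op == "==" and len(parts) >= 3
    if op == "==":
        return len(parts) >= 2  # ==x.y or more specific
    # ~=x.y would still allow minor-version bumps, so require ~=x.y.z
    return len(parts) >= 3


if __name__ == "__main__":
    for req in ["examplepkg==1.2", "examplepkg~=1.2.3", "examplepkg~=1.2",
                "examplepkg==0.4.1", "examplepkg==0.4", "examplepkg>=1.0"]:
        print(req, is_acceptably_pinned(req))
```

The stricter branch for `~=` reflects how the compatible-release operator works: `~=1.2` accepts any 1.x release at or above 1.2, while `~=1.2.3` stays within 1.2.x.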

@charlesfrye requested a review from shababo February 26, 2026 04:48

@devin-ai-integration (bot, Contributor) left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

@charlesfrye (Collaborator, Author)

@prbot approve

@modal-pr-review-automation (bot) left a comment

Approved 👍. @shababo will follow up with a review.

@charlesfrye merged commit 41da872 into main Feb 26, 2026
6 checks passed
@charlesfrye deleted the charlesfrye/nemotron_inference branch February 26, 2026 04:55
@shababo (Collaborator) commented Feb 27, 2026

nice. lgtm.
