
Merge sglang model docs into cookbook #141

Open

JingwenGu0829 wants to merge 1 commit into sgl-project:main from JingwenGu0829:fix/merge-sglang-docs-to-cookbook

Conversation

JingwenGu0829 commented Feb 8, 2026

Motivation

This is a companion PR to sglang#18427 (consolidate documentation). It merges model-specific content from sglang/docs into the corresponding cookbook pages.

Detailed roadmap

This is a SITUATION 2 change as described in the issue, since the sglang docs contain a substantial amount of content that the cookbook lacks.

Model Checklist

Models listed in sglang/docs/basic_usage/popular_model_usage.rst (toctree):

  • GPT-OSS
  • MiniMax M2 / M2.1
  • Qwen3-Next
  • Qwen3-VL
  • Llama 4
  • DeepSeek V3
  • DeepSeek V3.1
  • DeepSeek R1
  • DeepSeek V3.2
  • DeepSeek OCR
  • GLM-4.5 / GLM-4.6 / GLM-4.7
  • GLM-4.5V / GLM-4.6V

Philosophy / Priorities in Merging

We followed a strict set of priorities to keep the merge predictable and minimize risk:

  1. If the cookbook already has a section covering a topic, leave it as-is, even if the sglang doc phrases it differently. The cookbook is the user-facing source of truth.

  2. Add content from the sglang doc only when the cookbook has no equivalent section. This means every addition in this PR represents information that was previously only available in docs.sglang.io.

  3. When adding content, we copy it as closely as possible from the original sglang doc — preserving commands, arguments, code examples, and wording. We do not rewrite or editorialize the technical content.

  4. Minimal structural changes during merge. The only things we change when inserting content are:

    • Section numbering — renumbered to fit the cookbook's existing #.#.# hierarchy (e.g., a new section becomes 3.4 if 3.3 already exists).
    • Section placement — inserted at the most logical position within the cookbook's existing structure (e.g., hardware notes go under Deployment, curl examples go under Tool Calling).
    • File/component renaming — only when the sglang doc covers a broader scope than the original cookbook file (e.g., Llama4-Scout.md → Llama4.md since the doc covers both Scout and Maverick).
    • Format adaptation — converting Sphinx/RST directives (```{tip}, ```{note}) to Markdown equivalents, and adjusting internal links to work within Docusaurus.
  5. Self-contained pages over cross-references. When one sglang doc covers multiple models (e.g., deepseek_v3.md covers V3, V3.1, R1), shared content (hardware table, optimizations, multi-node, FAQ) is duplicated into each cookbook page rather than centralizing it in one page and linking. This ensures a user setting up R1 doesn't have to navigate to the V3 page for basic setup info.
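
As a rough sketch of the "format adaptation" step in point 4, a converter from Sphinx/MyST admonition fences to Docusaurus admonitions might look like the following. The set of directive names handled here (tip, note, warning) is an assumption for illustration, not the full mapping used in the PR:

```python
import re

# Sketch: convert MyST/Sphinx admonition fences (e.g. a {tip} directive)
# into Docusaurus-style :::tip blocks. The directives covered here are
# an assumed subset, not the complete conversion table.
FENCE = "`" * 3  # literal triple backtick, built up so this block stays well-formed

ADMONITION_RE = re.compile(
    re.escape(FENCE) + r"\{(tip|note|warning)\}\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)

def myst_to_docusaurus(text: str) -> str:
    """Rewrite each matched admonition fence as a Docusaurus ::: block."""
    return ADMONITION_RE.sub(lambda m: f":::{m.group(1)}\n{m.group(2)}:::", text)
```

In practice this kind of conversion was presumably done by hand per page; the snippet only illustrates the shape of the transformation.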

Tags:

  • [Merge] Cookbook had content; merged missing sections from sglang doc.
  • [New] Cookbook was empty placeholder; replaced with sglang doc content, renumbered sections.
  • [Renamed] File renamed; sidebar and intro references updated.
  • [Lot] Large volume of modifications and new content; higher likelihood of mistakes, so these pages deserve careful review.

Note on DeepSeek V3/V3.1/R1: The sglang doc deepseek_v3.md covers all three jointly.
Shared content (hardware table, optimizations, multi-node, FAQ) is duplicated into each
cookbook page so that each page is self-contained — a user setting up R1 shouldn't have
to visit the V3 page for shared setup info. The same applies to the GLM-series models.

Details

GPT-OSS: [Merge]

  • Added Responses API section (4.1.1).
  • Added Built-in Tools content under 4.2.2 Tool Calling (avoids overlap).
  • Added Speculative Decoding EAGLE3 section (4.1.2).
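
To give a feel for what the new Responses API section documents, here is a minimal sketch of a request body against a locally served GPT-OSS model. The model name, port, and endpoint path are assumptions for illustration; see the cookbook page for the actual launch command and parameters:

```python
import json

# Hypothetical Responses API request body for a locally served GPT-OSS
# model. Model name, host, and port are illustrative assumptions.
url = "http://localhost:30000/v1/responses"
payload = {
    "model": "openai/gpt-oss-120b",
    "input": "Summarize speculative decoding in one sentence.",
    "stream": False,
}
body = json.dumps(payload)
# A client would POST `body` to `url` with Content-Type: application/json.
```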

MiniMax M2/M2.1: [New]

  • Replaced placeholder with sglang doc content, renumbered sections.

Qwen3-Next: [Merge]

  • Note: sglang doc qwen3.md is about Qwen3-Next only.
  • Added Mamba Radix Cache section (3.3).
  • Added EAGLE Speculative Decoding NEXTN section (3.4).

Qwen3-VL: [Merge]

  • Added multimodal server parameters and optimized launch example into 3.2.
  • Added Hardware-Specific Notes as 3.3.

Llama4: [New] [Renamed]

  • Renamed Llama4-Scout.md -> Llama4.md (covers both Scout and Maverick).
  • Replaced placeholder with sglang doc content, renumbered sections.
  • Updated sidebars.js and docs/intro.md references.
  • Deleted old Llama4-Scout.md and unused Llama4ScoutConfigGenerator/.

DeepSeek V3: [Merge] [Lot]

  • Added Hardware Requirements table (3.2).
  • Added Download Weights tip to Configuration Tips (3.3).
  • Added Multi-Node Deployment section with blog/example links (3.4).
  • Added Optimizations section (3.5): MLA, DP Attention, Multi-Node TP, Block-wise FP8, MTP.
  • Added curl examples (non-streaming + streaming) to Tool Calling (4.2.2).
  • Added FAQ section (6) for NCCL timeout.
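
The curl examples added to Tool Calling exercise the OpenAI-compatible chat completions endpoint with a tools array. As an illustrative sketch (the tool schema and prompt here are assumptions, not the exact examples merged into the page), the non-streaming request body has roughly this shape:

```python
import json

# Illustrative non-streaming tool-calling request body for the
# OpenAI-compatible endpoint. The get_weather tool schema is an
# assumption for illustration, not the cookbook's exact example.
payload = {
    "model": "deepseek-ai/DeepSeek-V3",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "stream": False,
}
body = json.dumps(payload)
```

The streaming variant differs only in setting "stream": true and consuming the response as server-sent events.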

DeepSeek V3.1: [Merge] [Lot]

  • Same shared content as V3 (hardware table, optimizations, multi-node, FAQ).
  • Added curl examples adapted for V3.1 (deepseekv31 parser, tool_chat_template_deepseekv31.jinja).
  • Model paths use deepseek-ai/DeepSeek-V3.1 where applicable.

DeepSeek R1: [Merge] [Lot]

  • Same shared content as V3/V3.1 (hardware table, optimizations, multi-node, FAQ) — numbered 3.3–3.6 since R1 has an extra "Optimal Configurations" section (3.2).
  • Added curl examples adapted for R1 (deepseekv3 parser, tool_chat_template_deepseekr1.jinja).
  • Added Thinking Budget section (4.2.3) — R1-unique, uses CustomLogitProcessor.
  • Model paths use deepseek-ai/DeepSeek-R1-0528 where applicable.
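
The Thinking Budget section is built around a CustomLogitProcessor. As a minimal sketch of the underlying idea (the token id, function signature, and masking style here are all assumptions; the real sglang processor class has a different interface), the budget can be enforced by masking every token except the end-of-thinking token once the budget is exhausted:

```python
# Sketch of a thinking-budget enforcer: once the number of generated
# "thinking" tokens reaches the budget, force the end-of-thinking token
# by masking all other logits. Token id and interface are hypothetical.
NEG_INF = float("-inf")
END_THINK_TOKEN_ID = 2  # hypothetical id of the end-of-thinking token

def apply_thinking_budget(logits, num_thinking_tokens, budget):
    """Return logits unchanged while under budget; otherwise mask all
    tokens except END_THINK_TOKEN_ID so the model must stop reasoning."""
    if num_thinking_tokens < budget:
        return logits
    return [
        logit if i == END_THINK_TOKEN_ID else NEG_INF
        for i, logit in enumerate(logits)
    ]
```

In a real server, something like this would run once per decoding step while the model is inside its thinking span.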

DeepSeek V3.2: [Merge] [Lot]

  • Added Installation section with Docker images and build from source (3.2).
  • Added Launch Examples for TP+DP, EP+DP, Pure TP, MI30x/MI35x (3.3).
  • Expanded Configuration Tips with DP attention, attention kernel choices, default configs (3.4).
  • Added Multi-token Prediction with EAGLE (3.5).
  • Added NVFP4 Checkpoint launch on Blackwell (3.6).
  • Added PD Disaggregation with Prefill/Decode/Router commands (3.7).
  • Added DSA Context Parallel (experimental): in-seq split, round-robin, PP+CP, PD+PP+CP (3.8).
  • Added benchmarks: GSM8K 20-shot, GPQA-Diamond, AIME 2025 with all 3 variants (5.2.3–5.2.5).

DeepSeek OCR: [Merge]

  • Added prompt examples (4.2) and OpenAI-compatible request example (4.3).

GLM-4.5: [Merge]

  • Added EAGLE Speculative Decoding section (3.3).
  • Added Thinking Budget section (4.2.3) using Glm4MoeThinkingBudgetLogitProcessor.

GLM-4.6: [Merge]

  • Same as GLM-4.5: EAGLE Speculative Decoding (3.3) and Thinking Budget (4.2.3).

GLM-4.7: No changes needed — sglang doc has no 4.7-specific content beyond what's already in cookbook.

GLM-4.5V: [Merge]

  • Expanded Configuration Tips (3.2) by adding Hardware-Specific Notes (3.3) and Multimodal Server Parameters (3.4), including an optimized launch example.
  • Added Thinking Budget section (4.2.4) using Glm4MoeThinkingBudgetLogitProcessor.

GLM-4.6V: [Merge]

  • Same as GLM-4.5V: Hardware-Specific Notes (3.3), Multimodal Server Parameters (3.4), and Thinking Budget (before section 5).

  Companion PR to sglang#18427 - consolidate documentation.

  Merges model-specific content from sglang/docs into cookbook pages:

    • GPT-OSS: Added Responses API, Built-in Tools, EAGLE3 speculative decoding
    • MiniMax-M2: Replaced placeholder with full sglang doc content
    • Qwen3-Next: Added Mamba Radix Cache, EAGLE NEXTN speculative decoding
    • Qwen3-VL: Added multimodal parameters, hardware-specific notes
    • Llama4: Renamed from Llama4-Scout, covers Scout and Maverick
    • DeepSeek-V3/V3.1/R1: Added hardware table, optimizations, multi-node, FAQ
    • DeepSeek-V3.2: Added installation, launch examples, DSA context parallel
    • DeepSeek-OCR: Added prompt examples, OpenAI-compatible requests
    • GLM-4.5/4.6: Added EAGLE speculative decoding, Thinking Budget
    • GLM-4.7: Parser updated to glm47
    • GLM-4.5V/4.6V: Added hardware notes, multimodal parameters, optimized launch