
Conversation

[Docs] Update Llama3/4 and GPT-OSS recipe for NVIDIA GPUs

nvpohanh (Contributor)

@gemini-code-assist bot left a comment

Summary of Changes

Hello @nvpohanh, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request updates the documentation for the Llama3/4 and GPT-OSS recipes, specifically tailored for NVIDIA GPUs. The changes focus on improving the deployment experience by simplifying Docker image acquisition, addressing NCCL initialization issues on the Blackwell architecture, and refining vLLM server launch configurations. Additionally, the GPT-OSS recipe now includes extensive guidance on performance benchmarking and tuning, giving users clearer instructions and insights into optimizing model serving for various throughput and latency requirements.

Highlights

  • Streamlined Docker Image Acquisition: The documentation now guides users to pull pre-built vLLM Docker images from a specific ECR repository instead of building them locally, which significantly simplifies setup by removing the need for local compilation and dependency management (see the pull-and-run sketch after this list).
  • Critical NCCL Version Installation: A new step installs a specific NCCL version (nvidia-nccl-cu12==2.26.2.post1) to address known long NCCL initialization times on the NVIDIA Blackwell architecture, ensuring smoother and faster model deployment (install command below).
  • Optimized vLLM Server Launch Commands: The vllm serve commands across all recipes have been updated with refined compilation configurations and flag usage, including removing deprecated environment variables and introducing conventions such as NO_PREFIX_CACHING_FLAG for more consistent and optimized performance measurements (see the launch sketch below).
  • Enhanced Performance Benchmarking and Tuning Guidance: The GPT-OSS recipe now includes comprehensive sections on performance benchmarking, detailing how to use vllm bench serve and interpret its output, along with in-depth guidance on tunable parameters and recommended configurations for balancing throughput and latency on NVIDIA GPUs (see the benchmark sketch below).
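A rough sketch of the pull-and-run workflow described above. The ECR registry path and image tag are placeholders (the recipes define the real values), and opening a shell with --entrypoint is an assumption made here so the NCCL step below can run before the server starts:

```bash
# Pull the pre-built vLLM image instead of building it locally.
# <ecr-registry> and <tag> are placeholders; use the values from the recipe.
docker pull <ecr-registry>/vllm/vllm-openai:<tag>

# Start a shell in the container with GPU access; mounting the host's
# Hugging Face cache avoids re-downloading model weights on every run.
docker run --gpus all -it --rm \
  --entrypoint /bin/bash \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  <ecr-registry>/vllm/vllm-openai:<tag>
```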
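Inside the container, the NCCL pin is a one-liner. This sketch assumes pip is the installer (a recipe may use a different one); the package and version are the ones named above:

```bash
# Pin NCCL to the version called out above to avoid the long
# initialization times seen on the Blackwell architecture.
pip install nvidia-nccl-cu12==2.26.2.post1
```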
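A minimal launch sketch. The model name and tuning values below are illustrative rather than the recipes' authoritative settings, and the recipes' compilation configs are omitted; NO_PREFIX_CACHING_FLAG is shown as a shell variable following the convention the highlights mention:

```bash
# Set the variable to disable prefix caching for reproducible benchmark
# runs; leave it empty for normal serving.
NO_PREFIX_CACHING_FLAG="--no-enable-prefix-caching"

vllm serve openai/gpt-oss-120b \
  --tensor-parallel-size 1 \
  --max-num-batched-tokens 8192 \
  ${NO_PREFIX_CACHING_FLAG}
```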
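And a benchmarking sketch against the running server. The traffic shape (input/output lengths, prompt count, concurrency) is made up for illustration; these are exactly the knobs the tuning guidance trades off:

```bash
# Benchmark the server with synthetic random prompts.
# All request-shape values below are example numbers only.
vllm bench serve \
  --model openai/gpt-oss-120b \
  --dataset-name random \
  --random-input-len 1024 \
  --random-output-len 1024 \
  --num-prompts 256 \
  --max-concurrency 64
```

The run reports request and output-token throughput plus latency percentiles (TTFT, TPOT, inter-token latency); raising concurrency generally improves throughput at the cost of per-request latency.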
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
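For instance, a hypothetical sketch (the linked instructions are authoritative for file names and supported options):

```bash
# Add a custom style guide for the reviewer to take into account.
mkdir -p .gemini
cat > .gemini/styleguide.md <<'EOF'
# Review style guide
- Python code should follow PEP-8.
- Shell snippets in docs must be copy-paste runnable.
EOF
```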

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check its output and use code with caution.

@gemini-code-assist bot left a comment

Code Review

This pull request updates the documentation for Llama 3.3, Llama 4, and GPT-OSS models to provide clearer and more up-to-date instructions for running them on NVIDIA GPUs, particularly the new Blackwell architecture. Key changes include switching from building a Docker image to pulling a pre-built one, adding a necessary step to update NCCL for Blackwell, and providing more detailed server launch commands with explanations for various parameters. The documentation for performance benchmarking and balancing throughput vs. latency has also been significantly expanded and standardized across the recipes. The changes are well-structured and greatly improve the usability and clarity of the guides. I've found a minor typo and a couple of small inconsistencies in the documentation, which I've commented on.

@nvpohanh force-pushed the dev/nvpohanh/llama-gptoss-v0.4 branch from 6ea9d90 to 658e615 on August 26, 2025 at 08:36
@nvpohanh (Contributor, Author)

/gemini review

@gemini-code-assist bot left a comment

Code Review

This pull request updates the documentation for Llama3, Llama4, and GPT-OSS models on NVIDIA GPUs. The changes include switching from building a Docker image from source to pulling a pre-built image, which simplifies the setup process. It also adds instructions for installing a newer version of NCCL required for Blackwell architecture and updates various vLLM flags and commands to align with recent changes. The documentation for GPT-OSS is significantly expanded with detailed examples, parameter explanations, and comprehensive sections on performance benchmarking and tuning.

My review identifies a couple of minor issues in OpenAI/GPT-OSS.md: a formatting inconsistency in the parameter list and a typo at the end of the file. Applying these suggestions will improve the clarity and consistency of the documentation. Overall, this is a valuable update that makes the recipes more user-friendly and informative.

@nvpohanh force-pushed the dev/nvpohanh/llama-gptoss-v0.4 branch 2 times, most recently from b5c8d62 to 5b6aa93 on September 1, 2025 at 06:12
@nvpohanh (Contributor, Author) commented on Sep 1, 2025

/gemini review

@gemini-code-assist bot left a comment

Code Review

This pull request updates the documentation for Llama and GPT-OSS models, shifting from building Docker images from source to using pre-built images and updating configurations for NVIDIA's Blackwell and Hopper GPUs. This simplifies the setup process and provides more current instructions. My review has identified a few critical errors in the provided Docker commands and some typos in the documentation that could lead to user confusion. Addressing these will improve the quality and usability of the recipes.

@nvpohanh force-pushed the dev/nvpohanh/llama-gptoss-v0.4 branch from 5b6aa93 to 0b14830 on September 1, 2025 at 08:34
@nvpohanh (Contributor, Author) commented on Sep 1, 2025

/gemini review

@gemini-code-assist bot left a comment

Code Review

This pull request updates the documentation for running Llama3, Llama4, and GPT-OSS models on NVIDIA GPUs. The changes simplify the setup process by switching to a pre-built Docker image, add necessary steps for installing updated dependencies like NCCL and FlashInfer, and refine the server launch configurations for improved performance on newer hardware architectures. The instructions are generally clear and the updates are consistent across the different model recipes. I've identified one issue in the Llama4-Scout.md recipe where a command uses an incorrect model name, which would cause it to fail. I have provided a suggestion to correct this.

@nvpohanh (Contributor, Author) commented on Sep 3, 2025

@heheda12345 Could you review this and merge this? Thanks!
