Conversation

@haic0 commented Aug 7, 2025

Update the DeepSeek-R1 performance guide for AMD GPUs.

@gemini-code-assist bot left a comment


Summary of Changes

Hello @haic0, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the documentation by adding a dedicated performance guide for running DeepSeek-R1 models on AMD MI300X GPUs. My aim is to provide clear, actionable steps for users to achieve optimal performance, including environment setup, vLLM configuration, and benchmarking. Additionally, I've updated the main project README to ensure this new guide is easily accessible, improving overall user experience and documentation clarity.

Highlights

  • New DeepSeek-R1 Performance Guide for AMD GPUs: I've introduced a new DeepSeek/AMD_GPU/README.md file. This guide provides detailed, step-by-step instructions for setting up and running DeepSeek-R1 models on AMD MI300X GPUs. It covers everything from verifying the GPU environment and locking the GPU frequency to launching the ROCm vLLM Docker container and running vLLM online serving with optimized parameters. The guide also includes tips for adjusting parameters like max-model-len, max-num-batched-tokens, and max-num-seqs to fine-tune throughput, memory, and latency, along with a sample benchmark serving script and its expected output (an illustrative command sketch follows this list).
  • Main README Update for AMD GPU Support: To ensure discoverability of the new guide, I've updated the main README.md file. A new section titled "AMD GPU Support" has been added, which includes a direct link to the DeepSeek/AMD_GPU/README.md file. This makes it easy for users to find the relevant performance information for AMD GPUs.
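To make the setup the guide describes concrete, here is a minimal sketch of what such an online-serving launch might look like. The model id, tensor-parallel degree, and flag values are illustrative assumptions, not the exact command from the guide.

```bash
# Hypothetical vLLM online-serving launch on an 8x MI300X node (values are placeholders).
# --max-model-len, --max-num-batched-tokens, and --max-num-seqs are the knobs the guide tunes.
vllm serve deepseek-ai/DeepSeek-R1 \
  --tensor-parallel-size 8 \
  --max-model-len 32768 \
  --max-num-batched-tokens 32768 \
  --max-num-seqs 1024 \
  --trust-remote-code
```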
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.
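As an illustration only (the .gemini/ folder comes from the paragraph above; the styleguide.md filename is an assumption based on the public documentation), setting this up could look like:

```bash
# Illustrative sketch: add a custom review style guide under .gemini/ at the repository root.
mkdir -p .gemini
cat > .gemini/styleguide.md <<'EOF'
# Review style guide
- Python code should follow PEP-8.
- Prefer descriptive names over abbreviations.
EOF
```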

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its suggestions are incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot left a comment


Code Review

This pull request adds a new performance guide for running DeepSeek-R1 on AMD GPUs and links to it from the main README. The new guide is detailed and provides useful commands and benchmark results. My feedback focuses on improving the formatting and readability of the new markdown file for a better user experience, and on restructuring the link in the main README to be more consistent with the existing document structure.

Comment on lines +21 to +22
### AMD GPU Support
For the user guide,kindly review the AMD-GPU repository within the model directory.

Severity: high

This new section for AMD GPU support is a bit vague and seems misplaced. Since the guide is specific to running DeepSeek models on AMD GPUs, it would be more organized if it were listed under the existing ### DeepSeek section.

However, if it's intended to be a separate section, the title and link should be more descriptive. The current text also has a typo and grammatical issue (guide,kindly).

Here is a suggestion to make it clearer while keeping it as a separate section:

Suggested change
### AMD GPU Support
For the user guide,kindly review the AMD-GPU repository within the model directory.
### DeepSeek on AMD GPU
- [DeepSeek-R1 Performance Guide for AMD GPU](DeepSeek/AMD_GPU/README.md)

@@ -0,0 +1,120 @@
## AMD GPU Installation and Benchmarking Guide
#### Support Matrix

Severity: medium

This line has trailing whitespace. Several other lines in this file also have trailing whitespace (e.g., lines 14, 37, 55, 85, 89). Please remove it to improve formatting consistency.

Suggested change
#### Support Matrix
#### Support Matrix
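If helpful, one common way to strip the trailing whitespace in bulk (an editorial aside, not part of the PR; the path is the file added in this change):

```bash
# Remove trailing whitespace from every line of the new guide (GNU sed; BSD/macOS sed needs -i '').
sed -i 's/[[:space:]]*$//' DeepSeek/AMD_GPU/README.md
```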

Comment on lines +4 to +7
##### GPU TYPE
MI300X
##### DATA TYPE
FP8

Severity: medium

The current formatting for the support matrix is a bit difficult to read and inconsistent. Using a markdown table would make this much clearer and more standard.

Suggested change
##### GPU TYPE
MI300X
##### DATA TYPE
FP8
| GPU TYPE | DATA TYPE |
|----------|-----------|
| MI300X | FP8 |

rocm-smi --setperfdeterminism 1900
```

### Step 2

Severity: medium

The heading level for 'Step 2' is ###, which is inconsistent with the other steps (e.g., 'Step 1', 'Step 3') that use ####. For consistency, please use #### for all steps.

Suggested change
### Step 2
#### Step 2
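For context, the fenced block quoted above is the guide's GPU frequency-lock step; a hedged sketch of running it and confirming the result, assuming rocm-smi is available on the host or in the container:

```bash
# Lock the GPU clock for run-to-run determinism (1900 MHz is the value the guide uses),
# then print the current status so the cap can be confirmed.
rocm-smi --setperfdeterminism 1900
rocm-smi
```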

Comment on lines +54 to +55
    pip install -U "huggingface_hub[cli]"
    huggingface-cli login

Severity: medium

The commands in this code block have unnecessary indentation, which can be confusing. It's best to remove it for clarity.

Suggested change
    pip install -U "huggingface_hub[cli]"
    huggingface-cli login
pip install -U "huggingface_hub[cli]"
huggingface-cli login
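As a possible follow-up once the login succeeds (not part of the PR; the repo id and target directory are placeholders), the weights can be pre-fetched so the first server start does not block on the download:

```bash
# Optional: pre-download the DeepSeek-R1 weights after `huggingface-cli login` succeeds.
huggingface-cli download deepseek-ai/DeepSeek-R1 --local-dir /models/DeepSeek-R1
```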


--max-num-batched-tokens=32768: Balances throughput with manageable memory/latency.

If OOM errors or sluggish performance occur → decrease max-model-len (e.g., 32k or 8k) or reduce max-num-batched-tokens (e.g., 16k or 8k).For low latency needs, consider reducing max-num-batched-tokens.To maximize throughput and you have available VRAM, keep it high—but stay aware of latency trade-offs.

Severity: medium

This paragraph is dense and hard to read. It would be better to use bullet points or line breaks to separate the different pieces of advice for tuning max-model-len and max-num-batched-tokens.

Suggested change
If OOM errors or sluggish performance occur → decrease max-model-len (e.g., 32k or 8k) or reduce max-num-batched-tokens (e.g., 16k or 8k).For low latency needs, consider reducing max-num-batched-tokens.To maximize throughput and you have available VRAM, keep it high—but stay aware of latency trade-offs.
If OOM errors or sluggish performance occur:
- Decrease `max-model-len` (e.g., to 32k or 8k).
- Reduce `max-num-batched-tokens` (e.g., to 16k or 8k).
For low latency needs, consider reducing `max-num-batched-tokens`.
To maximize throughput with available VRAM, keep `max-num-batched-tokens` high, but stay aware of latency trade-offs.
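One practical way to watch memory headroom while experimenting with these two flags (a hedged aside, assuming rocm-smi is reachable from the serving container):

```bash
# Refresh VRAM usage every 2 seconds under load, so OOM pressure is visible
# before the server actually fails.
watch -n 2 'rocm-smi --showmeminfo vram'
```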



--max-num-seqs=1024: It affects throughput vs latency trade-offs:Higher values yield better throughput (more parallel requests) but may raise memory pressure and latency.Lower values reduce GPU memory footprint and latency, at the cost of throughput.

Severity: medium

This paragraph about max-num-seqs is also quite dense. Using a list would make the trade-offs between higher and lower values much clearer to the user.

Suggested change
--max-num-seqs=1024: It affects throughput vs latency trade-offs:Higher values yield better throughput (more parallel requests) but may raise memory pressure and latency.Lower values reduce GPU memory footprint and latency, at the cost of throughput.
--max-num-seqs=1024: Affects throughput vs. latency trade-offs:
- **Higher values**: Yield better throughput (more parallel requests) but may increase memory pressure and latency.
- **Lower values**: Reduce GPU memory footprint and latency, at the cost of throughput.
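Tying the two tuning comments together, here is a hedged sketch of the kind of serving benchmark the guide's sample script performs, using vLLM's bundled benchmark_serving.py; the model id, prompt counts, sequence lengths, and request rate are placeholders rather than the guide's actual values:

```bash
# Drive the running server with synthetic requests and report throughput and latency percentiles.
python benchmarks/benchmark_serving.py \
  --backend vllm \
  --model deepseek-ai/DeepSeek-R1 \
  --dataset-name random \
  --random-input-len 1024 \
  --random-output-len 1024 \
  --num-prompts 512 \
  --request-rate inf
```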
