feat(gpu): update doc gpu #4621
Merged
bene2k1 marked this conversation as resolved.
40 changes: 40 additions & 0 deletions
pages/gpu/reference-content/understanding-nvidia-nvlink.mdx
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,40 @@ | ||
| --- | ||
| meta: | ||
| title: Understanding NVIDIA NVLink | ||
| description: This section provides information about NVIDIA NVLink | ||
| content: | ||
| h1: Understanding NVIDIA NVLink | ||
| paragraph: This section provides information about NVIDIA NVLink | ||
| tags: NVIDIA NVLink | ||
| dates: | ||
| validation: 2025-03-13 | ||
| posted: 2025-03-13 | ||
| categories: | ||
| - compute | ||
| --- | ||
|
|
||
| NVLink is NVIDIA's high-bandwidth, low-latency GPU-to-GPU interconnect with built-in resiliency features, available on Scaleway's [H100-SXM Instances](/gpu/reference-content/choosing-gpu-instance-type/#gpu-instances-and-ai-supercomputer-comparison-table). It was designed to significantly improve performance and efficiency when connecting GPUs, CPUs, and other components within the same node. | ||
| It provides much higher bandwidth (up to 900 GB/s of total NVLink bandwidth per GPU) and lower latency than PCIe Gen 4 (up to 32 GB/s per direction on an x16 link), so more data can move between GPUs in less time. | ||
|
|
||
| The high bandwidth and low latency make NVLink well suited to applications that require real-time data synchronization and processing, such as AI and HPC workloads. | ||
| NVLink provides up to 900 GB/s total bandwidth for multi-GPU I/O and shared memory accesses, roughly 7x the bandwidth of a PCIe Gen 5 x16 link (128 GB/s). | ||
| NVLink allows direct GPU-to-GPU interconnection, improving data transfer efficiency and reducing the need for CPU intervention, which can introduce bottlenecks. | ||
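The bandwidth comparison above can be checked with simple arithmetic. The following is a minimal sketch that uses only the peak figures quoted in this document. The 10 GB payload and the helper names `transfer_time_ms` and `speedup` are assumptions made for illustration, and real transfers will not reach peak bandwidth.

```python
# Peak figures quoted in this document, in GB/s. Sustained throughput
# in practice is lower than these theoretical maxima.
NVLINK4_GBPS = 900.0     # NVLink on H100, total bandwidth
PCIE5_X16_GBPS = 128.0   # PCIe 5.0 x16, bidirectional


def transfer_time_ms(size_gb: float, bandwidth_gbps: float) -> float:
    """Idealized time in milliseconds to move size_gb at peak bandwidth."""
    return size_gb / bandwidth_gbps * 1000.0


def speedup(fast_gbps: float, slow_gbps: float) -> float:
    """Ratio between two peak bandwidths."""
    return fast_gbps / slow_gbps


if __name__ == "__main__":
    payload_gb = 10.0  # hypothetical payload size, chosen for illustration
    print(f"NVLink:   {transfer_time_ms(payload_gb, NVLINK4_GBPS):.1f} ms")
    print(f"PCIe 5.0: {transfer_time_ms(payload_gb, PCIE5_X16_GBPS):.1f} ms")
    print(f"Ratio:    {speedup(NVLINK4_GBPS, PCIE5_X16_GBPS):.2f}x")
```

The ratio of the two peak figures is where the "7x" claim comes from. The times are lower bounds, since protocol overhead and access patterns reduce effective throughput on both interconnects.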
|
|
||
| NVLink supports the connection of multiple GPUs, enabling the creation of powerful multi-GPU systems capable of handling more complex and demanding workloads. | ||
| Unified Memory Access allows GPUs to access each other's memory directly without CPU mediation, which is particularly beneficial for large-scale AI and HPC workloads. | ||
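To see whether GPUs on an Instance can access each other's memory directly, one option is to build a peer-access matrix. The sketch below assumes PyTorch is installed and uses `torch.cuda.can_device_access_peer` for the live query. The helpers `peer_access_matrix` and `fully_connected` are hypothetical names introduced here. Note that peer access can also be reported over PCIe, so a positive result does not by itself prove that NVLink carries the traffic.

```python
from typing import Callable, List


def peer_access_matrix(
    num_gpus: int, can_access: Callable[[int, int], bool]
) -> List[List[bool]]:
    """matrix[i][j] is True when GPU i can directly access GPU j's memory.

    The diagonal is always False: a device is not its own peer.
    """
    return [
        [i != j and can_access(i, j) for j in range(num_gpus)]
        for i in range(num_gpus)
    ]


def fully_connected(matrix: List[List[bool]]) -> bool:
    """True when every distinct pair of GPUs reports peer access."""
    n = len(matrix)
    return all(matrix[i][j] for i in range(n) for j in range(n) if i != j)


if __name__ == "__main__":
    try:
        import torch  # assumption: PyTorch is available on the Instance
    except ImportError:
        print("PyTorch is not installed; skipping the live query.")
    else:
        count = torch.cuda.device_count()
        matrix = peer_access_matrix(count, torch.cuda.can_device_access_peer)
        for row in matrix:
            print(" ".join("Y" if ok else "." for ok in row))
        print("All pairs peer-accessible:", fully_connected(matrix))
```

Passing the query function as a parameter keeps the matrix logic testable on machines without GPUs.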
|
|
||
| ### Comparison: NVLink vs. PCIe | ||
| NVLink and PCI Express (PCIe) are both used for GPU communication, but NVLink is specifically designed to address the bandwidth and latency bottlenecks of PCIe in multi-GPU setups. | ||
|
|
||
| | Feature | NVLink 4.0 (H100-SXM) | PCIe 5.0 | | ||
| |-------------------|---------------------------|------------------------------------| | ||
| | **Use case** | High-performance computing, deep learning | General-purpose computing, graphics | | ||
| | **Bandwidth** | Up to 900 GB/s per GPU (bidirectional aggregate) | 128 GB/s (x16 bidirectional) | | ||
| | **Latency** | Lower than PCIe (sub-microsecond) | Higher compared to NVLink | | ||
| | **Communication** | Direct GPU-to-GPU | Through CPU or PCIe switch | | ||
| | **Memory sharing** | Unified memory space across GPUs | Requires CPU intervention (higher overhead) | | ||
| | **Scalability** | Multi-GPU direct connection via NVSwitch | Limited by PCIe lanes | | ||
| | **Efficiency** | Optimized for GPU workloads | More general-purpose | | ||
|
|
||
| In summary, NVLink, available on [H100-SXM Instances](/gpu/reference-content/choosing-gpu-instance-type/#gpu-instances-and-ai-supercomputer-comparison-table), is **superior** for **multi-GPU AI and HPC** workloads due to its **higher bandwidth, lower latency, and memory-sharing capabilities**, while PCIe remains essential for broader system connectivity and general computing. |