-
Hello, I am a big fan of Backend.AI's fractional GPU technology! Since I first encountered it around October 2023, I have been thinking a lot about how it is implemented. For example, suppose each of three physical GPUs (pGPUs) has 0.1 of its capacity remaining, and a requested fGPU is equivalent to 0.3 of a pGPU. I heard that the container would then see what looks like a single GPU of size 0.3, even though three physical GPUs are actually allocated to it (see the rough sketch at the end of this post). As a result, users supposedly don't need to change their training code at all; they don't even need to modify their model code to work with multiple GPUs. Is this correct?

I have heard many rumors from external sources about how the fractional GPU technology is implemented, and I figured the only way to find out whether they are true is to ask directly. Since this is Backend.AI's proprietary technology, there may be parts that are difficult to answer... but I would be really grateful for even a small response!
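To make the rumor concrete, here is a rough Python sketch of what I imagined; every name in it is made up by me, and I have no idea whether anything like this resembles the real implementation:

```python
# Purely my own guess at the rumored "merge" behavior, NOT Backend.AI code.
# merge_fractions() gathers leftover fractions from several pGPUs until the
# requested fGPU size is covered, so the container would see "one" 0.3 GPU.
from fractions import Fraction

def merge_fractions(remaining, requested):
    """Return a hypothetical mapping of pGPU index -> share taken."""
    picked = {}
    need = requested
    for idx, free in enumerate(remaining):
        if need <= 0:
            break
        take = min(free, need)
        if take > 0:
            picked[idx] = take
            need -= take
    if need > 0:
        raise RuntimeError("not enough total capacity across pGPUs")
    return picked

# Three pGPUs with 0.1 free each, and a 0.3 fGPU request:
print(merge_fractions([Fraction(1, 10)] * 3, Fraction(3, 10)))
# -> {0: Fraction(1, 10), 1: Fraction(1, 10), 2: Fraction(1, 10)}
```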
-
As you said, Backend.AI uses fGPU to allow you to use multiple pGPU allocations without changing your code.
When fGPU is enabled, Backend.AI splits large GPUs into smaller GPUs, but does not "merge" smaller GPUs into a large one. That said, it does help with automatic configuration of multi-node, multi-GPU workloads: when it splits GPUs to satisfy the resource requirements, it provides GPU-config environment variables along with homogeneously sized fractions.
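To illustrate the split-only semantics, here is a minimal Python sketch. It is only an illustration under simplified assumptions (a greedy first-fit placement); the function and variable names are invented for this example, and it is not our actual scheduler code:

```python
# Illustrative sketch only: a split-only fractional allocator. The request
# must fit inside a single pGPU's remaining capacity; leftover fractions on
# different pGPUs are never stitched together into one fGPU.
from fractions import Fraction

def allocate_fgpu(remaining, requested):
    """First-fit placement of the whole request onto one pGPU."""
    for idx, free in enumerate(remaining):
        if free >= requested:
            remaining[idx] -= requested
            return idx
    raise RuntimeError("no single pGPU can hold the requested fraction")

# Splitting works: two 0.3 fGPUs carved out of one whole pGPU.
caps = [Fraction(1)]
print(allocate_fgpu(caps, Fraction(3, 10)))  # -> 0, leaving 0.7 free
print(allocate_fgpu(caps, Fraction(3, 10)))  # -> 0, leaving 0.4 free

# Merging does not: three pGPUs with 0.1 free each cannot serve 0.3.
caps = [Fraction(1, 10)] * 3
try:
    allocate_fgpu(caps, Fraction(3, 10))
except RuntimeError as exc:
    print(exc)
```

The GPU-config environment variables mentioned above would be things like CUDA_VISIBLE_DEVICES on NVIDIA devices, which is how frameworks discover the homogeneously sized fractions without any code changes.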