Hi @dimon777, just assigning 2 device IDs on the command line is not enough. For multi-GPU benchmarking/simulations you need to make some small adjustments to the benchmark setup (`-` lines removed, `+` lines added):
#ifdef BENCHMARK
#include "info.hpp"
void main_setup() { // benchmark; required extensions in defines.hpp: BENCHMARK, optionally FP16S or FP16C
// ################################################################## define simulation box size, viscosity and volume force ###################################################################
uint mlups = 0u; {
//LBM lbm( 32u, 32u, 32u, 1.0f);
//LBM lbm( 64u, 64u, 64u, 1.0f);
//LBM lbm(128u, 128u, 128u, 1.0f);
- LBM lbm(256u, 256u, 256u, 1.0f); // default
+ //LBM lbm(256u, 256u, 256u, 1.0f); // default
//LBM lbm(384u, 384u, 384u, 1.0f);
//LBM lbm(512u, 512u, 512u, 1.0f);
- //const uint memory = 31500u; // memory occupation in MB (for multi-GPU benchmarks: make this close to as large as the GPU's VRAM capacity)
- //const uint3 lbm_N = (resolution(float3(1.0f, 1.0f, 1.0f), memory)/4u)*4u; // input: simulation box aspect ratio and VRAM occupation in MB, output: grid resolution
+ const uint memory = 1488u; // memory occupation in MB (for multi-GPU benchmarks: make this close to as large as the GPU's VRAM capacity)
+ const uint3 lbm_N = (resolution(float3(1.0f, 1.0f, 1.0f), memory)/4u)*4u; // input: simulation box aspect ratio and VRAM occupation in MB, output: grid resolution
//LBM lbm(1u*lbm_N.x, 1u*lbm_N.y, 1u*lbm_N.z, 1u, 1u, 1u, 1.0f); // 1 GPU
- //LBM lbm(2u*lbm_N.x, 1u*lbm_N.y, 1u*lbm_N.z, 2u, 1u, 1u, 1.0f); // 2 GPUs
+ LBM lbm(2u*lbm_N.x, 1u*lbm_N.y, 1u*lbm_N.z, 2u, 1u, 1u, 1.0f); // 2 GPUs
//LBM lbm(2u*lbm_N.x, 2u*lbm_N.y, 1u*lbm_N.z, 2u, 2u, 1u, 1.0f); // 4 GPUs
//LBM lbm(2u*lbm_N.x, 2u*lbm_N.y, 2u*lbm_N.z, 2u, 2u, 2u, 1.0f); // 8 GPUs
// #########################################################################################################################################################################################
- for(uint i=0u; i<1000u; i++) {
- lbm.run(10u, 1000u*10u);
+ for(uint i=0u; i<100u; i++) {
+ lbm.run(10u, 100u*10u);
mlups = max(mlups, to_uint((double)lbm.get_N()*1E-6/info.runtime_lbm_timestep_smooth));
}
} // make lbm object go out of scope to free its memory
print_info("Peak MLUPs/s = "+to_string(mlups));
#if defined(_WIN32)
wait();
#endif // Windows
} /**/
#endif // BENCHMARK

Then, recompile and run.
Thanks and kind regards,
I have a Debian 12 system with two AMD R9700 GPUs. `./make.sh test` runs fine on one GPU, but I can't run the built-in simulation on both GPUs: `$ ./bin/FluidX3D 0 1` reports an error, and the test still runs on one GPU. What am I doing wrong?