On the HiRep-CUDA branch I compiled the code twice: once with both WITH_GPU and WITH_NEW_GEOMETRY enabled and once with both disabled. The two builds print different plaquette values in the log file. The build without GPU support (CPU only, I think) gives a plaquette of about 0.2, while the build with both options enabled gives an abnormally small plaquette of about 1e-3. Is this a problem in the CUDA code in HiRep?
The input file is as follows:
// Global variables
GLB_T = 8
GLB_X = 8
GLB_Y = 8
GLB_Z = 8
MPI_BLK_T = 1
MPI_BLK_X = 1
MPI_BLK_Y = 1
MPI_BLK_Z = 1
NP_T = 1
NP_X = 1
NP_Y = 1
NP_Z = 1
N_REP = 1
rlx_level = 1
rlx_seed = 43215
//Logger levels (default = -1)
log:default = -1
log:inverter = -1
log:forcestat = 0
// Pure gauge coupling
beta = 6.1
// Bare anisotropy parameter
anisotropy = 2.464
// number of heatbath steps
nhb = 1
// number of overrelaxation steps
nor = 2
WF:make = false
//WF integrators type 0=Euler 1=3rd order Runge-Kutta 2=Adaptive 3rd order Runge-Kutta
WF:integrator = 2
// Wilson Flow time
WF:tmax = 14.0
WF:nmeas = 40
WF:eps = 0.01
WF:delta = 0.00001
WF:anisotropy = 3.0
polyakov:make = false
// Run control variables
therm = 50
run name = test_without_gpu
save freq = 100
conf dir = cnfg
gauge start = ranDoM
//separations steps between each measure
nit = 2
//gauge start = Unit
//gauge start = run1_32x8x8x8nc3b6.000000an1.000000n1
//rlx_state =
//rlx_start =
last conf = +10
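The logs from the two runs follow: first the build with WITH_GPU and WITH_NEW_GEOMETRY enabled, then the build without them.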
[READINPUT][0]Warning: input parameter [gpuID] not found in [input_file]!
[GPU_INIT][0]Initializing GPU
[GPU_INIT][0]GPU_ID = 0
[GPU_INIT][10]CUDA Capability Major/Minor version number: 8.9
[GPU_INIT][10]CUDA Driver Version / Runtime Version: 12.40 / 12.30
[GPU_INIT][10]Device: NVIDIA GeForce RTX 4090 D
[GPU_INIT][10]Total amount of global memory: 24210 MB (25386352640B)
[GPU_INIT][10]Memory Clock rate: 10501.001 Mhz
[GPU_INIT][10]Memory Bus Width: 384-bit
[GPU_INIT][10]Maximum memory pitch: 2147483647B
[GPU_INIT][10]Integrated GPU sharing Host Memory: No
[GPU_INIT][10]Support host page-locked memory mapping: Yes
[GPU_INIT][10]Total amount of shared memory per block: 49152B
[GPU_INIT][10]L2 Cache Size: 75497472B
[GPU_INIT][10]Max Texture dimension size (x,y,z): 1D=(131072), 2D=(131072,65536), 3D=(16384,16384,16384)
[GPU_INIT][10]Max Layered Texture dimension size (dim) x layers: 1D=(32768) x 2048, 2D=(32768,32768) x 2048
[GPU_INIT][10]Texture alignment: 512B
[GPU_INIT][10]Total amount of constant memory: 65536B
[GPU_INIT][10]Alignment requirement for Surfaces: Yes
[GPU_INIT][10]Multiprocessors: 114
[GPU_INIT][10]GPU Clock Speed: 2.52 GHz
[GPU_INIT][10]Total number of register per block: 65536B
[GPU_INIT][10]Warp size: 32B
[GPU_INIT][10]Maximum number of threads per block: 1024
[GPU_INIT][10]Maximum size of each dimension of a block (x,y,z): (1024,1024,64)
[GPU_INIT][10]Maximum size of each dimension of a grid (x,y,z): (2147483647,65535,65535)
[GPU_INIT][10]Concurrent copy and execution: Yes with 2 copy engine(s)
[GPU_INIT][10]Run time limit on kernels: No
[GPU_INIT][10]Concurrent kernel execution: Yes
[GPU_INIT][10]Peak Memory Bandwidth (GB/s): 1008.1
[GPU_INIT][10]Device has ECC support enabled: No
[GPU_INIT][10]Device is using TCC driver mode: No
[GPU_INIT][10]Device supports Unified Addressing (UVA): Yes
[GPU_INIT][10]Device PCI Bus ID / PCI location ID: 184 / 0
[GPU_INIT][10]Compute Mode:
[GPU_INIT][10] < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
[GPU_INIT][10]MPI implementation CUDA-aware? yes.
[GPU_INIT][0]GPU Affinity: GPU 0 has been bound to MPI Rank 0 (local 0)
[SYSTEM][0]Gauge group: SU(4)
[SYSTEM][0]Fermion representation: dim = 4
[SYSTEM][0][RepID: 0][world_size: 1]
[SYSTEM][0][MPI_ID: 0][MPI_size: 1]
[SYSTEM][0]MACROS=-DBC_T_PERIODIC -DBC_X_PERIODIC -DBC_Y_PERIODIC -DBC_Z_PERIODIC -DNDEBUG -DCHECK_SPINOR_MATCHING -DIO_FLUSH -DWITH_MPI -DWITH_GPU -DWITH_NEW_GEOMETRY -DNG=4 -DGAUGE_SUN -DREPR_FUNDAMENTAL -DREPR_NAME="REPR_FUNDAMENTAL" -D_XOPEN_SOURCE=600
[GEOMETRY][0]WARNING: NO PARALLEL DIMENSIONS SPECIFIED!!!
[GEOMETRY][0]WARNING: THE MPI CODE SHOULD NOT BE USED IN THIS CASE!!!
[GEOMETRY_INIT][0]Global size is 8x8x8x8
[GEOMETRY_INIT][0]Local size is 8x8x8x8
[GEOMETRY_INIT][0]Extended local size is 8x8x8x8
[GEOMETRY_INIT][0]The lattice borders are (0,0,0,0)
[GEOMETRY_INIT][0]Size of the bulk subblocking (2,2,2,2)
[GEOMETRY_INIT][0]MPI Blocking size (1,1,1,1)
[GEOMETRY_INIT][0]Process sign is 0
[GEOMETRY][0]Geometry variable checked
[GEOMETRY DEFINE][1]Define Lattice Geometry...
[GEOMETRY DEFINE][10]Number of parallel directions: 0
[GEOMETRY DEFINE][10]INNER + [ 0, 0, 0, 0]..[ 8, 8, 8, 8] = [ 8, 8, 8, 8] Idx=(0+2048 / 2048+2048)
[GEOMETRY DEFINE][10]Number of boxes = 1
[GEOMETRY DEFINE][1]Define Lattice Geometry... Done.
[READINPUT][0]Warning: input parameter [ranlux store] not found in [input_file]!
[READINPUT][0]Warning: input parameter [ranlux start] not found in [input_file]!
[READINPUT][0]Warning: input parameter [ranlux state] not found in [input_file]!
[SETUP_RANDOM][0]RLXD [1,43215]
[INIT][0]beta=6.100000
[INIT][0]bare anisotropy=2.464000
[INIT][0]nhb=1 nor=2
[FLOW][0]Starting a new run from a random conf!
[INIT][0]Separation between each measure=2
[BCS][0]Gauge field: PERIODIC x PERIODIC PERIODIC PERIODIC
[BCS][0]Fermion fields: PERIODIC x PERIODIC PERIODIC PERIODIC
[INIT WF][0]WF max integration time=14.000000
[INIT WF][0]WF number of measures=40
[INIT WF][0]WF initial epsilon=0.010000
[INIT WF][0]WF delta=0.000010
[INIT WF][0]WF integrator type: 2 (0=Euler 1=3rd order Runge-Kutta 2=Adaptive 3rd order Runge-Kutta)
[MAIN][0]0....20....40....60....80....100
[MAIN][0]Thermalized 50 Trajectories: [4.378251 sec]
[IO][0]Configuration [cnfg/test_with_gpu_8x8x8x8nc4b6.100000an2.464000n0] saved [0.021243 sec]
[MAIN][0]Trajectory #1...
[MAIN][0]Trajectory #1: generated in [0.092085 sec]
[MAIN][0]Plaquette 1.091667828362043180e-03
[MAIN][0]Trajectory #2...
[MAIN][0]Trajectory #2: generated in [0.087007 sec]
[MAIN][0]Plaquette 1.091667828362043180e-03
[MAIN][0]Trajectory #3...
[MAIN][0]Trajectory #3: generated in [0.087001 sec]
[MAIN][0]Plaquette 1.091667828362043180e-03
[MAIN][0]Trajectory #4...
[MAIN][0]Trajectory #4: generated in [0.087561 sec]
[MAIN][0]Plaquette 1.091667828362043397e-03
[MAIN][0]Trajectory #5...
[MAIN][0]Trajectory #5: generated in [0.088205 sec]
[MAIN][0]Plaquette 1.091667828362043397e-03
[MAIN][0]Trajectory #6...
[MAIN][0]Trajectory #6: generated in [0.086934 sec]
[MAIN][0]Plaquette 1.091667828362043397e-03
[MAIN][0]Trajectory #7...
[MAIN][0]Trajectory #7: generated in [0.087564 sec]
[MAIN][0]Plaquette 1.091667828362043397e-03
[MAIN][0]Trajectory #8...
[MAIN][0]Trajectory #8: generated in [0.087141 sec]
[MAIN][0]Plaquette 1.091667828362043397e-03
[MAIN][0]Trajectory #9...
[MAIN][0]Trajectory #9: generated in [0.087159 sec]
[MAIN][0]Plaquette 1.091667828362043397e-03
[MAIN][0]Trajectory #10...
[MAIN][0]Trajectory #10: generated in [0.087201 sec]
[MAIN][0]Plaquette 1.091667828362043397e-03
[MAIN][0]Trajectory #11...
[MAIN][0]Trajectory #11: generated in [0.087848 sec]
[MAIN][0]Plaquette 1.091667828362043830e-03
[MAIN][0]Trajectory #12...
[MAIN][0]Trajectory #12: generated in [0.087028 sec]
[MAIN][0]Plaquette 1.091667828362043830e-03
[IO][0]Configuration [cnfg/test_with_gpu_8x8x8x8nc4b6.100000an2.464000n12] saved [0.021708 sec]
[SYSTEM][0]Process finalized.
Log from the build without WITH_GPU and WITH_NEW_GEOMETRY:
[SYSTEM][0]Gauge group: SU(4)
[SYSTEM][0]Fermion representation: dim = 4
[SYSTEM][0][RepID: 0][world_size: 1]
[SYSTEM][0][MPI_ID: 0][MPI_size: 1]
[SYSTEM][0]MACROS=-DBC_T_PERIODIC -DBC_X_PERIODIC -DBC_Y_PERIODIC -DBC_Z_PERIODIC -DNDEBUG -DCHECK_SPINOR_MATCHING -DIO_FLUSH -DWITH_MPI -DNG=4 -DGAUGE_SUN -DREPR_FUNDAMENTAL -DREPR_NAME="REPR_FUNDAMENTAL" -D_XOPEN_SOURCE=600
[GEOMETRY][0]WARNING: NO PARALLEL DIMENSIONS SPECIFIED!!!
[GEOMETRY][0]WARNING: THE MPI CODE SHOULD NOT BE USED IN THIS CASE!!!
[GEOMETRY_INIT][0]Global size is 8x8x8x8
[GEOMETRY_INIT][0]Local size is 8x8x8x8
[GEOMETRY_INIT][0]Extended local size is 8x8x8x8
[GEOMETRY_INIT][0]The lattice borders are (0,0,0,0)
[GEOMETRY_INIT][0]Size of the bulk subblocking (2,2,2,2)
[GEOMETRY_INIT][0]MPI Blocking size (1,1,1,1)
[GEOMETRY_INIT][0]Process sign is 0
[GEOMETRY][0]Geometry variable checked
[GEOMETRY][0]Gauge field: size 4096 nbuffer 0
[GEOMETRY][0]Spinor field (EO): size 4096 nbuffer 0
[GEOMETRY][0]Even Spinor field: size 2048 nbuffer 0
[GEOMETRY][0]Odd Spinor field: size 2048 nbuffer 0
[READINPUT][0]Warning: input parameter [ranlux store] not found in [input_file]!
[READINPUT][0]Warning: input parameter [ranlux start] not found in [input_file]!
[READINPUT][0]Warning: input parameter [ranlux state] not found in [input_file]!
[SETUP_RANDOM][0]RLXD [1,43215]
[INIT][0]beta=6.100000
[INIT][0]bare anisotropy=2.464000
[INIT][0]nhb=1 nor=2
[FLOW][0]Starting a new run from a random conf!
[INIT][0]Separation between each measure=2
[BCS][0]Gauge field: PERIODIC x PERIODIC PERIODIC PERIODIC
[BCS][0]Fermion fields: PERIODIC x PERIODIC PERIODIC PERIODIC
[INIT WF][0]WF max integration time=14.000000
[INIT WF][0]WF number of measures=40
[INIT WF][0]WF initial epsilon=0.010000
[INIT WF][0]WF delta=0.000010
[INIT WF][0]WF integrator type: 2 (0=Euler 1=3rd order Runge-Kutta 2=Adaptive 3rd order Runge-Kutta)
[MAIN][0]0....20....40....60....80....100
[MAIN][0]Thermalized 50 Trajectories: [4.505385 sec]
[IO][0]Configuration [cnfg/test_without_gpu_8x8x8x8nc4b6.100000an2.464000n0] saved [0.021938 sec]
[MAIN][0]Trajectory #1...
[MAIN][0]Trajectory #1: generated in [0.095477 sec]
[MAIN][0]Plaquette 2.066603317285517771e-01
[MAIN][0]Trajectory #2...
[MAIN][0]Trajectory #2: generated in [0.087862 sec]
[MAIN][0]Plaquette 2.089974323752936680e-01
[MAIN][0]Trajectory #3...
[MAIN][0]Trajectory #3: generated in [0.087542 sec]
[MAIN][0]Plaquette 2.071824100709576122e-01
[MAIN][0]Trajectory #4...
[MAIN][0]Trajectory #4: generated in [0.095606 sec]
[MAIN][0]Plaquette 2.079944220967428758e-01
[MAIN][0]Trajectory #5...
[MAIN][0]Trajectory #5: generated in [0.087452 sec]
[MAIN][0]Plaquette 2.098374901582525653e-01
[MAIN][0]Trajectory #6...
[MAIN][0]Trajectory #6: generated in [0.087462 sec]
[MAIN][0]Plaquette 2.096473368572122353e-01
[MAIN][0]Trajectory #7...
[MAIN][0]Trajectory #7: generated in [0.095455 sec]
[MAIN][0]Plaquette 2.096424826397468577e-01
[MAIN][0]Trajectory #8...
[MAIN][0]Trajectory #8: generated in [0.087449 sec]
[MAIN][0]Plaquette 2.092673785469066905e-01
[MAIN][0]Trajectory #9...
[MAIN][0]Trajectory #9: generated in [0.087551 sec]
[MAIN][0]Plaquette 2.090081160849926734e-01
[MAIN][0]Trajectory #10...
[MAIN][0]Trajectory #10: generated in [0.087924 sec]
[MAIN][0]Plaquette 2.069269521540821888e-01
[MAIN][0]Trajectory #11...
[MAIN][0]Trajectory #11: generated in [0.095465 sec]
[MAIN][0]Plaquette 2.068276567919517250e-01
[MAIN][0]Trajectory #12...
[MAIN][0]Trajectory #12: generated in [0.087473 sec]
[MAIN][0]Plaquette 2.087220580233619671e-01
[IO][0]Configuration [cnfg/test_without_gpu_8x8x8x8nc4b6.100000an2.464000n12] saved [0.020612 sec]
[SYSTEM][0]Process finalized.
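To make the comparison between the two logs easier to reproduce, here is a minimal sketch that pulls the `[MAIN][0]Plaquette` lines out of a log and reports the mean and the spread (max minus min). The function names and the sample strings are only for illustration; the sample values are copied from the logs above. Besides the different magnitude, it shows that the GPU value is essentially identical from trajectory to trajectory, while the CPU value fluctuates in the third digit, which may be a useful hint when debugging.

```python
import re
import statistics

# Matches lines such as "[MAIN][0]Plaquette 2.066603317285517771e-01".
PLAQ_RE = re.compile(r"\[MAIN\]\[0\]Plaquette\s+([0-9.eE+-]+)")

def plaquettes(log_text):
    """Return the plaquette values printed on [MAIN][0]Plaquette lines."""
    return [float(m.group(1)) for m in PLAQ_RE.finditer(log_text)]

def summarize(values):
    """Return (mean, max - min) for a non-empty list of plaquette values."""
    return statistics.fmean(values), max(values) - min(values)

# Three lines copied from each log; in practice one would read the whole
# log file from disk instead of using these inline samples.
gpu_log = """\
[MAIN][0]Plaquette 1.091667828362043180e-03
[MAIN][0]Plaquette 1.091667828362043397e-03
[MAIN][0]Plaquette 1.091667828362043830e-03
"""
cpu_log = """\
[MAIN][0]Plaquette 2.066603317285517771e-01
[MAIN][0]Plaquette 2.089974323752936680e-01
[MAIN][0]Plaquette 2.071824100709576122e-01
"""

for name, text in (("gpu", gpu_log), ("cpu", cpu_log)):
    mean, spread = summarize(plaquettes(text))
    print(f"{name}: mean={mean:.6e} spread={spread:.3e}")
```

Running the same functions on the full log files would give the statistics over all 12 measured trajectories instead of the three sampled here.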
The only difference between the input files for the two cases is the run name: test_with_gpu for one and test_without_gpu for the other.
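For anyone who wants to check this on their side, a small sketch that compares two input files key by key. It assumes a simple `key = value` syntax with `//` comment lines, as in the input shown above; this is an illustrative parser, not HiRep's own input reader, and the inline strings are shortened stand-ins for the real files.

```python
def parse_input(text):
    """Parse 'key = value' lines, skipping blank lines and // comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params

def diff_inputs(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    pa, pb = parse_input(a), parse_input(b)
    return {k: (pa.get(k), pb.get(k))
            for k in sorted(pa.keys() | pb.keys())
            if pa.get(k) != pb.get(k)}

# Shortened stand-ins for the two input files (a few lines from the input above).
with_gpu = """\
beta = 6.1
anisotropy = 2.464
run name = test_with_gpu
//gauge start = Unit
gauge start = ranDoM
"""
without_gpu = with_gpu.replace("test_with_gpu", "test_without_gpu")

print(diff_inputs(with_gpu, without_gpu))
```

With the real files, the returned dictionary should contain only the `run name` entry if nothing else differs.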