-
This is an OOM (out-of-memory) error. Please reduce the number of atoms in your system or increase the number of workers in your parallel MD simulation.
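To make the "more workers" suggestion concrete, here is a rough Python sketch of how the number of locally owned atoms per MPI rank shrinks as ranks are added. It assumes an even domain decomposition and ignores ghost atoms, so real per-rank counts will be somewhat higher. The helper names `atoms_per_rank` and `min_ranks` are made up for illustration, and the 5632-atom and 1000-atom figures are the ones reported further down in this thread.

```python
import math


def atoms_per_rank(n_atoms: int, n_ranks: int) -> int:
    """Upper bound on locally owned atoms per rank for an even split (no ghost atoms)."""
    return math.ceil(n_atoms / n_ranks)


def min_ranks(n_atoms: int, max_atoms_per_rank: int) -> int:
    """Smallest rank count that keeps every rank at or below the given atom budget."""
    return math.ceil(n_atoms / max_atoms_per_rank)


if __name__ == "__main__":
    n_atoms = 5632  # system size reported in the LAMMPS log in this thread
    for ranks in (1, 2, 4, 8):
        print(f"{ranks} rank(s): about {atoms_per_rank(n_atoms, ranks)} atoms each")
    # Atom budget of 1000 per rank, taken from the user's own observation below.
    print("ranks needed for <= 1000 atoms each:", min_ranks(n_atoms, 1000))
```

Note that ranks sharing a single GPU also share its memory, so this presumably only helps when the ranks are spread over several GPUs.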
-
Okay, thank you very much.
-
Hi Dr. Wang, I ran the calculation on Bohrium's V100 (12-core, 92 GB), but the above error still occurred. In my earlier tests with se_atten, the error does not occur as long as the LAMMPS system has no more than 1000 atoms, but a system of no more than 1000 atoms is too small for the dynamics calculation we need. Thank you again!

LAMMPS (23 Jun 2022 - Update 1)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
using 1 OpenMP thread(s) per MPI task
Loaded 1 plugins from /opt/deepmd-kit-2.1.5/lib/deepmd_lmp
Reading data file ...
orthogonal box = (0 0 0) to (41.652384 41.652384 41.652384)
1 by 1 by 1 MPI processor grid
reading atoms ...
5632 atoms
reading velocities ...
5632 velocities
read_data CPU = 0.015 seconds
Summary of lammps deepmd module ...
>>> Info of deepmd-kit:
installed to: /opt/deepmd-kit-2.1.5
source: v2.1.5
source branch: HEAD
source commit: 6e3d4a62
source commit at: 2022-09-23 16:10:28 +0800
surpport model ver.:1.1
build float prec: double
build variant: cuda
build with tf inc: /opt/deepmd-kit-2.1.5/include;/opt/deepmd-kit-2.1.5/include
build with tf lib: /opt/deepmd-kit-2.1.5/lib/libtensorflow_cc.so;/opt/deepmd-kit-2.1.5/lib/libtensorflow_framework.so
set tf intra_op_parallelism_threads: 170926128
set tf inter_op_parallelism_threads: 876097845
>>> Info of lammps module:
use deepmd-kit at: /opt/deepmd-kit-2.1.5
DeePMD-kit WARNING: Environmental variable TF_INTRA_OP_PARALLELISM_THREADS is not set. Tune TF_INTRA_OP_PARALLELISM_THREADS for the best performance.
DeePMD-kit WARNING: Environmental variable TF_INTER_OP_PARALLELISM_THREADS is not set. Tune TF_INTER_OP_PARALLELISM_THREADS for the best performance.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance.
2023-06-07 11:22:16.777657: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-07 11:22:16.781387: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-06-07 11:22:16.824918: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-06-07 11:22:16.825195: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-06-07 11:22:17.421955: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-06-07 11:22:17.422210: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-06-07 11:22:17.422394: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-06-07 11:22:17.422577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 29259 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2023-06-07 11:22:17.422889: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2023-06-07 11:22:17.471963: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
>>> Info of model(s):
using 1 model(s): graph.pb
rcut in model: 6
ntypes in model: 7
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
Your simulation uses code contributions which should be cited:
- USER-DEEPMD package:
The log file lists these citations in BibTeX format.
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
Generated 0 of 21 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
update every 1 steps, delay 10 steps, check yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 7
ghost atom cutoff = 7
binsize = 3.5, bins = 12 12 12
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair deepmd, perpetual
attributes: full, newton on
pair build: full/bin/atomonly
stencil: full/bin/3d
bin: standard
Setting up Verlet run ...
Unit style : metal
Current step : 0
Time step : 0.001
2023-06-07 11:22:29.723924: W tensorflow/core/common_runtime/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.93GiB (rounded to 2076180480)requested by op attention_layer_1/c_out/MatMul
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Current allocation summary follows.
2023-06-07 11:22:29.723984: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] BFCAllocator dump for GPU_0_bfc
2023-06-07 11:22:29.724013: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (256): Total Chunks: 19, Chunks in use: 19. 4.8KiB allocated for chunks. 4.8KiB in use in bin. 200B client-requested in use in bin.
2023-06-07 11:22:29.724030: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (512): Total Chunks: 2, Chunks in use: 2. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.1KiB client-requested in use in bin.
2023-06-07 11:22:29.724046: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (1024): Total Chunks: 8, Chunks in use: 8. 8.5KiB allocated for chunks. 8.5KiB in use in bin. 8.3KiB client-requested in use in bin.
2023-06-07 11:22:29.724061: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (2048): Total Chunks: 13, Chunks in use: 13. 29.5KiB allocated for chunks. 29.5KiB in use in bin. 28.8KiB client-requested in use in bin.
2023-06-07 11:22:29.724074: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-06-07 11:22:29.724089: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (8192): Total Chunks: 1, Chunks in use: 1. 10.8KiB allocated for chunks. 10.8KiB in use in bin. 10.6KiB client-requested in use in bin.
2023-06-07 11:22:29.724100: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (16384): Total Chunks: 2, Chunks in use: 2. 63.0KiB allocated for chunks. 63.0KiB in use in bin. 63.0KiB client-requested in use in bin.
2023-06-07 11:22:29.724110: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (32768): Total Chunks: 2, Chunks in use: 2. 88.0KiB allocated for chunks. 88.0KiB in use in bin. 88.0KiB client-requested in use in bin.
2023-06-07 11:22:29.724121: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (65536): Total Chunks: 1, Chunks in use: 1. 100.0KiB allocated for chunks. 100.0KiB in use in bin. 100.0KiB client-requested in use in bin.
2023-06-07 11:22:29.724130: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (131072): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-06-07 11:22:29.724140: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (262144): Total Chunks: 13, Chunks in use: 12. 4.63MiB allocated for chunks. 4.27MiB in use in bin. 4.11MiB client-requested in use in bin.
2023-06-07 11:22:29.724149: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (524288): Total Chunks: 7, Chunks in use: 7. 5.51MiB allocated for chunks. 5.51MiB in use in bin. 5.41MiB client-requested in use in bin.
2023-06-07 11:22:29.724160: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (1048576): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-06-07 11:22:29.724169: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (2097152): Total Chunks: 1, Chunks in use: 1. 3.09MiB allocated for chunks. 3.09MiB in use in bin. 3.09MiB client-requested in use in bin.
2023-06-07 11:22:29.724179: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (4194304): Total Chunks: 12, Chunks in use: 10. 72.53MiB allocated for chunks. 61.88MiB in use in bin. 61.88MiB client-requested in use in bin.
2023-06-07 11:22:29.724190: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (8388608): Total Chunks: 1, Chunks in use: 1. 10.44MiB allocated for chunks. 10.44MiB in use in bin. 6.19MiB client-requested in use in bin.
2023-06-07 11:22:29.724201: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (16777216): Total Chunks: 4, Chunks in use: 4. 71.69MiB allocated for chunks. 71.69MiB in use in bin. 65.08MiB client-requested in use in bin.
2023-06-07 11:22:29.724211: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (33554432): Total Chunks: 1, Chunks in use: 1. 32.00MiB allocated for chunks. 32.00MiB in use in bin. 24.75MiB client-requested in use in bin.
2023-06-07 11:22:29.724221: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (67108864): Total Chunks: 1, Chunks in use: 0. 126.81MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-06-07 11:22:29.724231: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (134217728): Total Chunks: 2, Chunks in use: 1. 264.00MiB allocated for chunks. 128.00MiB in use in bin. 74.25MiB client-requested in use in bin.
2023-06-07 11:22:29.724242: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (268435456): Total Chunks: 31, Chunks in use: 26. 28.00GiB allocated for chunks. 25.35GiB in use in bin. 25.33GiB client-requested in use in bin.
2023-06-07 11:22:29.724253: I tensorflow/core/common_runtime/bfc_allocator.cc:1050] Bin for 1.93GiB was 256.00MiB, Chunk State:
2023-06-07 11:22:29.724265: I tensorflow/core/common_runtime/bfc_allocator.cc:1056] Size: 256.00MiB | Requested Size: 18.56MiB | in_use: 0 | bin_num: 20
2023-06-07 11:22:29.724277: I tensorflow/core/common_runtime/bfc_allocator.cc:1056] Size: 266.00MiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev: Size: 792.00MiB | Requested Size: 792.00MiB | in_use: 1 | bin_num: -1
2023-06-07 11:22:29.724290: I tensorflow/core/common_runtime/bfc_allocator.cc:1056] Size: 396.00MiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev: Size: 792.00MiB | Requested Size: 792.00MiB | in_use: 1 | bin_num: -1, next: Size: 1.93GiB | Requested Size: 1.93GiB | in_use: 1 | bin_num: -1
2023-06-07 11:22:29.724307: I tensorflow/core/common_runtime/bfc_allocator.cc:1056] Size: 767.00MiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev: Size: 792.00MiB | Requested Size: 792.00MiB | in_use: 1 | bin_num: -1
2023-06-07 11:22:29.724318: I tensorflow/core/common_runtime/bfc_allocator.cc:1056] Size: 1.00GiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev: Size: 792.00MiB | Requested Size: 792.00MiB | in_use: 1 | bin_num: -1
2023-06-07 11:22:29.724327: I tensorflow/core/common_runtime/bfc_allocator.cc:1063] Next region of size 13431681792
2023-06-07 11:22:29.724337: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa32c000000 of size 2076180480 next 101
2023-06-07 11:22:29.724346: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa3a7c00000 of size 830472192 next 108
2023-06-07 11:22:29.724354: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa3d9400000 of size 830472192 next 110
2023-06-07 11:22:29.724362: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] Free at 7fa40ac00000 of size 415236096 next 105
2023-06-07 11:22:29.724370: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa423800000 of size 2076180480 next 106
2023-06-07 11:22:29.724378: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa49f400000 of size 830472192 next 114
2023-06-07 11:22:29.724386: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa4d0c00000 of size 830472192 next 115
2023-06-07 11:22:29.724393: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa502400000 of size 830472192 next 116
2023-06-07 11:22:29.724402: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa533c00000 of size 934281216 next 117
2023-06-07 11:22:29.724408: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa56b700000 of size 934281216 next 118
2023-06-07 11:22:29.724415: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa5a3200000 of size 934281216 next 119
2023-06-07 11:22:29.724422: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa5dad00000 of size 830472192 next 120
2023-06-07 11:22:29.724430: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] Free at 7fa60c500000 of size 1078407936 next 18446744073709551615
2023-06-07 11:22:29.724438: I tensorflow/core/common_runtime/bfc_allocator.cc:1063] Next region of size 8589934592
2023-06-07 11:22:29.724446: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa660000000 of size 830472192 next 87
2023-06-07 11:22:29.724454: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa691800000 of size 830472192 next 88
2023-06-07 11:22:29.724461: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa6c3000000 of size 830472192 next 93
2023-06-07 11:22:29.724469: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa6f4800000 of size 830472192 next 94
2023-06-07 11:22:29.724477: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa726000000 of size 830472192 next 95
2023-06-07 11:22:29.724484: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa757800000 of size 934281216 next 96
2023-06-07 11:22:29.724492: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa78f300000 of size 934281216 next 97
2023-06-07 11:22:29.724499: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa7c6e00000 of size 934281216 next 98
2023-06-07 11:22:29.724507: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa7fe900000 of size 830472192 next 109
2023-06-07 11:22:29.724515: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] Free at 7fa830100000 of size 804257792 next 18446744073709551615
2023-06-07 11:22:29.724523: I tensorflow/core/common_runtime/bfc_allocator.cc:1063] Next region of size 4294967296
2023-06-07 11:22:29.724531: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa86a000000 of size 2076180480 next 83
2023-06-07 11:22:29.724538: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa8e5c00000 of size 2076180480 next 103
2023-06-07 11:22:29.724546: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] Free at 7fa961800000 of size 142606336 next 18446744073709551615
2023-06-07 11:22:29.724554: I tensorflow/core/common_runtime/bfc_allocator.cc:1063] Next region of size 2147483648
2023-06-07 11:22:29.724562: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa970000000 of size 1038090240 next 81
2023-06-07 11:22:29.724570: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fa9ade00000 of size 830472192 next 85
2023-06-07 11:22:29.724577: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] Free at 7fa9df600000 of size 278921216 next 18446744073709551615
2023-06-07 11:22:29.724585: I tensorflow/core/common_runtime/bfc_allocator.cc:1063] Next region of size 1073741824
2023-06-07 11:22:29.724593: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faa20000000 of size 934281216 next 80
2023-06-07 11:22:29.724598: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faa57b00000 of size 6488064 next 111
2023-06-07 11:22:29.724606: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] Free at 7faa58130000 of size 132972544 next 18446744073709551615
2023-06-07 11:22:29.724614: I tensorflow/core/common_runtime/bfc_allocator.cc:1063] Next region of size 536870912
2023-06-07 11:22:29.724622: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faa60000000 of size 536870912 next 18446744073709551615
2023-06-07 11:22:29.724629: I tensorflow/core/common_runtime/bfc_allocator.cc:1063] Next region of size 268435456
2023-06-07 11:22:29.724637: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] Free at 7faa80000000 of size 268435456 next 18446744073709551615
2023-06-07 11:22:29.724645: I tensorflow/core/common_runtime/bfc_allocator.cc:1063] Next region of size 134217728
2023-06-07 11:22:29.724653: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faa90000000 of size 19464192 next 65
2023-06-07 11:22:29.724661: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faa91290000 of size 19464192 next 78
2023-06-07 11:22:29.724669: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faa92520000 of size 6488064 next 89
2023-06-07 11:22:29.724688: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faa92b50000 of size 6488064 next 84
2023-06-07 11:22:29.724696: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faa93180000 of size 6488064 next 102
2023-06-07 11:22:29.724704: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faa937b0000 of size 6488064 next 99
2023-06-07 11:22:29.724712: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faa93de0000 of size 6488064 next 74
2023-06-07 11:22:29.724720: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faa94410000 of size 6488064 next 57
2023-06-07 11:22:29.724728: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] Free at 7faa94a40000 of size 6488064 next 59
2023-06-07 11:22:29.724735: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faa95070000 of size 6488064 next 60
2023-06-07 11:22:29.724743: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faa956a0000 of size 19464192 next 67
2023-06-07 11:22:29.724751: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faa96930000 of size 6488064 next 68
2023-06-07 11:22:29.724758: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faa96f60000 of size 6488064 next 77
2023-06-07 11:22:29.724766: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faa97590000 of size 10944512 next 18446744073709551615
2023-06-07 11:22:29.724774: I tensorflow/core/common_runtime/bfc_allocator.cc:1063] Next region of size 134217728
2023-06-07 11:22:29.724782: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faa98000000 of size 134217728 next 18446744073709551615
2023-06-07 11:22:29.724790: I tensorflow/core/common_runtime/bfc_allocator.cc:1063] Next region of size 33554432
2023-06-07 11:22:29.724797: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa0000000 of size 33554432 next 18446744073709551615
2023-06-07 11:22:29.724805: I tensorflow/core/common_runtime/bfc_allocator.cc:1063] Next region of size 16777216
2023-06-07 11:22:29.724813: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa2000000 of size 16777216 next 18446744073709551615
2023-06-07 11:22:29.724821: I tensorflow/core/common_runtime/bfc_allocator.cc:1063] Next region of size 16777216
2023-06-07 11:22:29.724829: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3000000 of size 1024 next 25
2023-06-07 11:22:29.724837: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3000400 of size 327680 next 26
2023-06-07 11:22:29.724846: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3050400 of size 1024 next 27
2023-06-07 11:22:29.724853: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3050800 of size 256 next 28
2023-06-07 11:22:29.724861: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3050900 of size 256 next 29
2023-06-07 11:22:29.724869: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3050a00 of size 327680 next 30
2023-06-07 11:22:29.724877: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa30a0a00 of size 1024 next 31
2023-06-07 11:22:29.724885: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa30a0e00 of size 327680 next 32
2023-06-07 11:22:29.724893: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa30f0e00 of size 2560 next 33
2023-06-07 11:22:29.724900: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa30f1800 of size 409600 next 34
2023-06-07 11:22:29.724906: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3155800 of size 2560 next 35
2023-06-07 11:22:29.724914: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3156200 of size 102400 next 36
2023-06-07 11:22:29.724922: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa316f200 of size 1280 next 37
2023-06-07 11:22:29.724928: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa316f700 of size 512 next 38
2023-06-07 11:22:29.724936: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa316f900 of size 11008 next 39
2023-06-07 11:22:29.724944: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3172400 of size 768 next 40
2023-06-07 11:22:29.724952: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3172700 of size 327680 next 41
2023-06-07 11:22:29.724960: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa31c2700 of size 1024 next 42
2023-06-07 11:22:29.724966: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa31c2b00 of size 256 next 43
2023-06-07 11:22:29.724971: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa31c2c00 of size 327680 next 44
2023-06-07 11:22:29.724979: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3212c00 of size 2560 next 45
2023-06-07 11:22:29.724987: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3213600 of size 32256 next 46
2023-06-07 11:22:29.725000: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa321b400 of size 32256 next 47
2023-06-07 11:22:29.725009: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3223200 of size 256 next 48
2023-06-07 11:22:29.725016: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3223300 of size 256 next 49
2023-06-07 11:22:29.725022: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3223400 of size 256 next 50
2023-06-07 11:22:29.725029: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3223500 of size 256 next 51
2023-06-07 11:22:29.725037: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3223600 of size 256 next 52
2023-06-07 11:22:29.725045: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3223700 of size 256 next 53
2023-06-07 11:22:29.725052: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3223800 of size 256 next 54
2023-06-07 11:22:29.725058: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3223900 of size 256 next 55
2023-06-07 11:22:29.725065: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3223a00 of size 256 next 56
2023-06-07 11:22:29.725071: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3223b00 of size 256 next 104
2023-06-07 11:22:29.725078: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3223c00 of size 256 next 107
2023-06-07 11:22:29.725090: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] Free at 7faaa3223d00 of size 381696 next 61
2023-06-07 11:22:29.725098: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3281000 of size 3244032 next 66
2023-06-07 11:22:29.725106: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3599000 of size 811008 next 69
2023-06-07 11:22:29.725114: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa365f000 of size 811008 next 90
2023-06-07 11:22:29.725122: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3725000 of size 811008 next 91
2023-06-07 11:22:29.725130: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa37eb000 of size 811008 next 92
2023-06-07 11:22:29.725136: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa38b1000 of size 909824 next 71
2023-06-07 11:22:29.725144: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa398f200 of size 45056 next 72
2023-06-07 11:22:29.725152: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa399a200 of size 360448 next 73
2023-06-07 11:22:29.725158: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa39f2200 of size 45056 next 75
2023-06-07 11:22:29.725166: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa39fd200 of size 811008 next 112
2023-06-07 11:22:29.725172: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faaa3ac3200 of size 811008 next 113
2023-06-07 11:22:29.725179: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] Free at 7faaa3b89200 of size 4681216 next 18446744073709551615
2023-06-07 11:22:29.725187: I tensorflow/core/common_runtime/bfc_allocator.cc:1063] Next region of size 2097152
2023-06-07 11:22:29.725193: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6400000 of size 256 next 1
2023-06-07 11:22:29.725201: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6400100 of size 1280 next 2
2023-06-07 11:22:29.725208: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6400600 of size 256 next 3
2023-06-07 11:22:29.725214: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6400700 of size 256 next 4
2023-06-07 11:22:29.725221: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6400800 of size 2048 next 5
2023-06-07 11:22:29.725229: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6401000 of size 2048 next 6
2023-06-07 11:22:29.725237: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6401800 of size 2048 next 7
2023-06-07 11:22:29.725244: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6402000 of size 460800 next 8
2023-06-07 11:22:29.725252: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6472800 of size 2048 next 9
2023-06-07 11:22:29.725260: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6473000 of size 2048 next 10
2023-06-07 11:22:29.725267: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6473800 of size 2048 next 11
2023-06-07 11:22:29.725275: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6474000 of size 460800 next 12
2023-06-07 11:22:29.725283: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac64e4800 of size 256 next 14
2023-06-07 11:22:29.725291: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac64e4900 of size 2560 next 15
2023-06-07 11:22:29.725298: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac64e5300 of size 2560 next 16
2023-06-07 11:22:29.725304: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac64e5d00 of size 327680 next 17
2023-06-07 11:22:29.725311: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6535d00 of size 1024 next 18
2023-06-07 11:22:29.725318: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6536100 of size 327680 next 19
2023-06-07 11:22:29.725326: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6586100 of size 1024 next 20
2023-06-07 11:22:29.725333: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6586500 of size 2560 next 21
2023-06-07 11:22:29.725341: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6586f00 of size 256 next 22
2023-06-07 11:22:29.725349: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6587000 of size 2560 next 23
2023-06-07 11:22:29.725356: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7faac6587a00 of size 493056 next 18446744073709551615
2023-06-07 11:22:29.725363: I tensorflow/core/common_runtime/bfc_allocator.cc:1088] Summary of in-use Chunks by size:
2023-06-07 11:22:29.725372: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 19 Chunks of size 256 totalling 4.8KiB
2023-06-07 11:22:29.725381: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 512 totalling 512B
2023-06-07 11:22:29.725389: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 768 totalling 768B
2023-06-07 11:22:29.725397: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 6 Chunks of size 1024 totalling 6.0KiB
2023-06-07 11:22:29.725405: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 2 Chunks of size 1280 totalling 2.5KiB
2023-06-07 11:22:29.725414: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 6 Chunks of size 2048 totalling 12.0KiB
2023-06-07 11:22:29.725422: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 7 Chunks of size 2560 totalling 17.5KiB
2023-06-07 11:22:29.725431: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 11008 totalling 10.8KiB
2023-06-07 11:22:29.725440: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 2 Chunks of size 32256 totalling 63.0KiB
2023-06-07 11:22:29.725448: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 2 Chunks of size 45056 totalling 88.0KiB
2023-06-07 11:22:29.725457: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 102400 totalling 100.0KiB
2023-06-07 11:22:29.725463: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 7 Chunks of size 327680 totalling 2.19MiB
2023-06-07 11:22:29.725471: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 360448 totalling 352.0KiB
2023-06-07 11:22:29.725480: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 409600 totalling 400.0KiB
2023-06-07 11:22:29.725488: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 2 Chunks of size 460800 totalling 900.0KiB
2023-06-07 11:22:29.725494: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 493056 totalling 481.5KiB
2023-06-07 11:22:29.725503: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 6 Chunks of size 811008 totalling 4.64MiB
2023-06-07 11:22:29.725510: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 909824 totalling 888.5KiB
2023-06-07 11:22:29.725518: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 3244032 totalling 3.09MiB
2023-06-07 11:22:29.725527: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 10 Chunks of size 6488064 totalling 61.88MiB
2023-06-07 11:22:29.725535: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 10944512 totalling 10.44MiB
2023-06-07 11:22:29.725542: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 16777216 totalling 16.00MiB
2023-06-07 11:22:29.725550: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 3 Chunks of size 19464192 totalling 55.69MiB
2023-06-07 11:22:29.725559: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 33554432 totalling 32.00MiB
2023-06-07 11:22:29.725565: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 134217728 totalling 128.00MiB
2023-06-07 11:22:29.725573: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 536870912 totalling 512.00MiB
2023-06-07 11:22:29.725582: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 13 Chunks of size 830472192 totalling 10.05GiB
2023-06-07 11:22:29.725589: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 7 Chunks of size 934281216 totalling 6.09GiB
2023-06-07 11:22:29.725597: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 1038090240 totalling 990.00MiB
2023-06-07 11:22:29.725608: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 4 Chunks of size 2076180480 totalling 7.73GiB
2023-06-07 11:22:29.725620: I tensorflow/core/common_runtime/bfc_allocator.cc:1095] Sum Total of in-use chunks: 25.66GiB
2023-06-07 11:22:29.725631: I tensorflow/core/common_runtime/bfc_allocator.cc:1097] total_region_allocated_bytes_: 30680756992 memory_limit_: 30680757043 available bytes: 51 curr_region_allocation_bytes_: 34359738368
2023-06-07 11:22:29.725647: I tensorflow/core/common_runtime/bfc_allocator.cc:1103] Stats:
Limit: 30680757043
InUse: 27548368640
MaxInUse: 27548368640
NumAllocs: 150
MaxAllocSize: 2076180480
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2023-06-07 11:22:29.725668: W tensorflow/core/common_runtime/bfc_allocator.cc:491] *****************************************__***************************_*****************************
2023-06-07 11:22:29.725713: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at matmul_op_impl.h:681 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[811008,320] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
RESOURCE_EXHAUSTED: 2 root error(s) found.
(0) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[811008,320] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node attention_layer_1/c_out/MatMul}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[o_energy/_45]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[811008,320] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node attention_layer_1/c_out/MatMul}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations.
0 derived errors ignored.
ERROR: DeePMD-kit Error: TensorFlow Error: RESOURCE_EXHAUSTED: 2 root error(s) found.
(0) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[811008,320] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{node attention_layer_1/c_out/MatMul}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[o_energy/_45]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[811008,320] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{node attention_layer_1/c_out/MatMul}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations.
0 derived errors ignored. (/home/conda/feedstock_root/build_artifacts/libdeepmd_1663923207577/work/source/lmp/pair_deepmd.cpp:390)
Last command: run 10000
-
A parallel MD simulation across multiple GPU workers is necessary for a system of this size.
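For anyone wondering why splitting the job across GPUs helps, here is a rough back-of-the-envelope sketch based only on the numbers in the log above. The failing tensor has shape [811008, 320] in double precision, and its first dimension divides evenly by the 5632 atoms in the system. Treating that ratio (144) as "rows per atom", and assuming each MPI rank only holds its own share of the atoms, are my assumptions and not something the log states. Ghost atoms, load imbalance, and all other tensors are ignored, and the helper names are made up for illustration.

```python
BYTES_PER_DOUBLE = 8  # "type double" in the OOM message


def tensor_bytes(rows: int, cols: int, itemsize: int = BYTES_PER_DOUBLE) -> int:
    """Size in bytes of a dense rows x cols tensor."""
    return rows * cols * itemsize


def per_rank_tensor_bytes(n_atoms: int, n_ranks: int, rows_per_atom: int, cols: int) -> int:
    """Estimate the same tensor's size when atoms are split evenly over ranks.

    Assumption: the first dimension scales with the atoms owned by a rank.
    """
    local_atoms = -(-n_atoms // n_ranks)  # ceiling division
    return tensor_bytes(local_atoms * rows_per_atom, cols)


# Values copied from the log above.
ROWS, COLS = 811008, 320
N_ATOMS = 5632

failed_alloc = tensor_bytes(ROWS, COLS)
print(failed_alloc)  # 2076180480, the same value as MaxAllocSize in the log

rows_per_atom = ROWS // N_ATOMS  # 144 (assumed to be a per-atom quantity)

for n_ranks in (1, 2, 4):
    size = per_rank_tensor_bytes(N_ATOMS, n_ranks, rows_per_atom, COLS)
    print(f"{n_ranks} rank(s): {size / 2**30:.2f} GiB per tensor of this shape")
```

The log shows several chunks of this size in use at once, so the total footprint is a multiple of the single-tensor figure. Under the assumption above, each of those tensors shrinks roughly in proportion to the number of ranks.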
-
Thank you very much for your reply. My earlier attempt with 2 GPUs (P100) failed. Following your suggestion, I reran the job in parallel on 4 GPUs (P100) and was able to run the dynamics for a system of nearly 6000 atoms with the se_atten model. Dr. Wang, could future versions of deepmd-kit reduce the number of GPUs needed to run dynamics with the se_atten model?
-
Yes, see #2532.
-
Thank you for your reply. I hope model compression for se_atten becomes available in a new version as soon as possible.
-
Please refer to https://docs.deepmodeling.com/projects/deepmd/en/master/troubleshooting/precision.html
-
Both versions 2.1.5 and 2.2.1 of DeePMD-kit produce this error; the detailed log is as follows: