LAMMPS: Resource exhausted on Tesla P100 GPU #294
Answered by jameswind
LiangMD-BGI asked this question in Q&A
-
We will have a good method for this soon. For now, you can only reduce memory usage by using a smaller network size or by running the MPI version of LAMMPS on multiple GPUs.
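For the network-size route, the knobs are the descriptor and fitting-network widths in the DeePMD-kit training input, and the model has to be retrained after changing them. The OOM tensor shape [10800,200,100] looks like (atoms of one type) x (maximum neighbors, sel) x (last embedding-layer width), so smaller "neuron" lists, and a smaller "sel" if your actual neighbor counts allow it, should shrink the largest allocations. A minimal sketch of the relevant "model" section of input.json with the se_a descriptor; the values below are illustrative only, not recommendations for this system:

"model": {
    "descriptor": {
        "type":        "se_a",
        "sel":         [150, 150],
        "rcut":        6.0,
        "neuron":      [20, 40, 80],
        "axis_neuron": 4
    },
    "fitting_net": {
        "neuron":      [120, 120, 120]
    }
}

Keep in mind that sel must stay at or above the true maximum neighbor count per type, and that a smaller network generally trades some accuracy for memory.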
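For the multi-GPU route, the usual pattern is to launch the MPI build of LAMMPS with one MPI rank per GPU, so the spatial decomposition splits the 21600 atoms across cards and each rank's DeePMD evaluation works on a smaller slice of the system. A hedged sketch of the launch line; the binary name (lmp vs. lmp_mpi), the input-script name, and the rank-to-GPU mapping are assumptions about a typical deepmd-kit build, not details confirmed in this thread:

# make two GPUs visible and start one MPI rank per card
export CUDA_VISIBLE_DEVICES=0,1
mpirun -np 2 lmp -in in.lammps

How ranks are mapped to GPUs depends on the deepmd-kit version; if both ranks end up on the same card, restricting CUDA_VISIBLE_DEVICES per rank (for example through a small wrapper script) is a common workaround. Note that every rank still loads a full copy of the model, so this splits the neighbor-list-sized tensors but not the network weights.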
-
Hello All,
I am running the GPU build of LAMMPS with a DeePMD potential on a single Tesla P100 GPU (16 GB of memory).
My system contains 21600 atoms, and I get an error saying that the memory resource is exhausted. Is there a way to reduce memory usage?
The details of the error are shown below:
2020-11-16 21:54:46.278230: W tensorflow/core/common_runtime/bfc_allocator.cc:429] _____***_****__________****___
2020-11-16 21:54:46.278259: W tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at concat_op.cc:153 : Resource exhausted: OOM when allocating tensor with shape[10800,200,100] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Resource exhausted: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[10800,200,100] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node filter_type_1/concat_4}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[o_force/_27]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[10800,200,100] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node filter_type_1/concat_4}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Many thanks,
Liang