Replies: 2 comments
Please try:
1) run nvidia-smi on your compute node to check whether a GPU is working;
2) test it using the training example, with water_smth.json as the input
file. The training time should be around 2 s per 100 batches.
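Step 1 can also be scripted. The following is only a minimal sketch, assuming nvidia-smi is on the PATH of the compute node and that Python 3.7+ is available; the helper names parse_gpu_csv and query_gpus are made up for illustration and are not part of DeePMD-kit.

```python
import shutil
import subprocess

# Query fields supported by nvidia-smi's --query-gpu option.
QUERY = "name,memory.total,memory.used"


def parse_gpu_csv(text):
    """Parse CSV output produced with --format=csv,noheader,nounits."""
    gpus = []
    for line in text.strip().splitlines():
        # Split from the right so a comma inside the GPU name is tolerated.
        parts = [p.strip() for p in line.rsplit(",", 2)]
        if len(parts) != 3:
            continue
        name, total, used = parts
        try:
            gpus.append(
                {"name": name, "total_mib": int(total), "used_mib": int(used)}
            )
        except ValueError:
            # Some fields can be reported as "[N/A]"; skip such rows.
            continue
    return gpus


def query_gpus():
    """Return the GPUs nvidia-smi reports, or [] if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No GPU reported by nvidia-smi on this node.")
    for i, gpu in enumerate(gpus):
        print(f"GPU {i}: {gpu['name']} "
              f"({gpu['used_mib']} / {gpu['total_mib']} MiB used)")
```

An empty list only means that nvidia-smi was not found or returned an error on this node; it says nothing about whether TensorFlow itself was built with GPU support.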
Best,
Linfeng
On Thu, May 30, 2019 at 11:20 PM fkxie <***@***.***> wrote:
Hi,
I want to train on my data with GPU acceleration.
When I use dp_test and dp_frz, they both abort with the following output:
2019-05-30 15:09:16.824616: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-05-30 15:09:22.765409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla V100-PCIE-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
totalMemory: 31.72GiB freeMemory: 31.31GiB
2019-05-30 15:09:23.098334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 1 with properties:
name: Tesla V100-PCIE-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
totalMemory: 31.72GiB freeMemory: 31.31GiB
2019-05-30 15:09:23.437009: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 2 with properties:
name: Tesla V100-PCIE-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
But when I run dp_train, no GPU information is printed. I think
it is still using the CPU for training, and this may be a bug.
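One way to compare the two cases is to save the output of each command to a file and look for the "Found device" lines shown above. The sketch below assumes the log format is exactly the one quoted in this message (it may differ between TensorFlow versions); the function name found_gpu_ids is hypothetical.

```python
import re

# Matches lines such as:
#   ... gpu_device.cc:1356] Found device 0 with properties:
DEVICE_RE = re.compile(r"gpu_device\.cc:\d+\] Found device (\d+) with properties")


def found_gpu_ids(log_text):
    """Return the sorted GPU indices that TensorFlow reported in a log."""
    return sorted({int(m.group(1)) for m in DEVICE_RE.finditer(log_text)})


if __name__ == "__main__":
    import sys

    # Usage: python check_log.py train.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        ids = found_gpu_ids(fh.read())
    if ids:
        print("TensorFlow reported GPU devices:", ids)
    else:
        print("No 'Found device' lines in this log.")
```

A missing "Found device" line does not prove that training runs on the CPU, since the logging level may differ between commands; timing the training example is the more direct check.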
F.K.xie
Hi, The training time on my machine is about 2 s per 100 batches, so it seems there is no other problem on my side. Anyway, thanks for your reply. Best,