Is this question connected to the tutorial? Which input files are you using, and which hands-on session are you referring to? The second error means that you are running … I am not sure about the first error. Do you have valid checkpoint files in the folder?
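One way to answer the checkpoint question is the minimal shell sketch below. It assumes the usual TensorFlow checkpoint layout that `dp train` leaves in the training folder: a small text file named `checkpoint` whose `model_checkpoint_path: "..."` line names a prefix, plus a `<prefix>.index` file next to it. If that state file is missing, TensorFlow's checkpoint lookup returns `None`, which would most likely explain the `'NoneType' object has no attribute 'model_checkpoint_path'` error from `dp freeze`. The function name `check_ckpt` is hypothetical.

```bash
# Sketch: report whether DIR holds a usable TensorFlow checkpoint.
# Assumes the usual TF layout: a text file "checkpoint" containing a line
#   model_checkpoint_path: "model.ckpt"
# and a matching "<prefix>.index" file. check_ckpt is a hypothetical name.
check_ckpt() {
  local dir=${1:-.}
  local state="$dir/checkpoint"
  if [ ! -f "$state" ]; then
    echo "missing: $state" >&2
    return 1
  fi
  # Extract the quoted prefix from the model_checkpoint_path line.
  local prefix
  prefix=$(sed -n 's/^model_checkpoint_path: *"\(.*\)"$/\1/p' "$state")
  if [ -z "$prefix" ]; then
    echo "no model_checkpoint_path entry in $state" >&2
    return 1
  fi
  # Relative prefixes are resolved against the checkpoint directory.
  case "$prefix" in
    /*) ;;
    *) prefix="$dir/$prefix" ;;
  esac
  if [ ! -f "$prefix.index" ]; then
    echo "missing: $prefix.index" >&2
    return 1
  fi
  echo "ok: $prefix"
}
```

Running `check_ckpt .` in the training folder before `dp freeze` should show whether training actually saved a model there.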
I need help, please. Here is my job script, followed by the errors I get:
```bash
#!/bin/bash
#SBATCH -A HfA22_EAIFR
#SBATCH -p m100_usr_prod
#SBATCH --time 24:00:00 # format: HH:MM:SS
#SBATCH -N 1 # 1 node
#SBATCH --ntasks-per-node=16 # 16 tasks out of 128
#SBATCH --gres=gpu:4 # 4 GPUs per node out of 4
#SBATCH --mem=7100 # memory per node (MB) out of 246000MB
#SBATCH --job-name=watdim-e0
#SBATCH --mail-type=ALL
#SBATCH --mail-user=[email protected]

module load profile/deeplrn
module load autoload deepmd
module load autoload cuda

export OMP_NUM_THREADS=3
export TF_INTRA_OP_PARALLELISM_THREADS=3
export TF_INTER_OP_PARALLELISM_THREADS=2

# horovodrun needs the command to launch on the same (continued) line
CUDA_VISIBLE_DEVICES=0,1 horovodrun -np 1 \
    dp train --mpi-log=workers input.json
#dp train input.json
dp freeze -o graph.pb
dp test
```
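The last steps of the script rely on defaults. As far as I know, in DeePMD-kit 2.x `dp freeze` reads the checkpoint from the current folder (`-c .`), while `dp test` defaults to `-m frozen_model.pb` and `-s .`, so a plain `dp test` neither uses `graph.pb` nor points at a data system. Below is a hedged sketch of those steps with explicit arguments and a guard on the checkpoint file. `DP`, `TEST_SYSTEM`, `post_train` and the path `../data/validation` are hypothetical names chosen for illustration; `DP=echo` gives a dry run.

```bash
# Sketch of the post-training steps with explicit arguments.
# DP can be overridden (e.g. DP=echo) for a dry run. TEST_SYSTEM is a
# hypothetical path to a DeePMD system directory (one containing type.raw).
DP=${DP:-dp}
MODEL=graph.pb
TEST_SYSTEM=${TEST_SYSTEM:-../data/validation}   # hypothetical path

post_train() {
  # dp freeze looks for the "checkpoint" state file written by dp train.
  if [ ! -f checkpoint ]; then
    echo "no 'checkpoint' file here: did dp train finish and save a model?" >&2
    return 1
  fi
  "$DP" freeze -c . -o "$MODEL" || return 1
  # Point dp test at the frozen model and at a real system directory
  # instead of relying on the defaults.
  "$DP" test -m "$MODEL" -s "$TEST_SYSTEM"
}
```

In the job script, calling `post_train` would replace the final `dp freeze` and `dp test` lines, and the job stops early with a readable message if training saved nothing.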
ERROR **********************************
```
input_checkpoint = checkpoint.model_checkpoint_path
AttributeError: 'NoneType' object has no attribute 'model_checkpoint_path'
```
```
2022-08-18 11:57:24.642552: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:From /cineca/prod/opt/libraries/open-ce/1.1.3/none/opence-conda-env-py3.8-cuda-openmpi-11.0/opence/lib/python3.8/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
Traceback (most recent call last):
  File "/cineca/prod/opt/applications/deepmd/2.0/cuda--11.0/deepmd/bin/dp", line 8, in <module>
    sys.exit(main())
  File "/m100/prod/opt/applications/deepmd/2.0/cuda--11.0/deepmd/lib/python3.8/site-packages/deepmd/entrypoints/main.py", line 443, in main
    test(**dict_args)
  File "/m100/prod/opt/applications/deepmd/2.0/cuda--11.0/deepmd/lib/python3.8/site-packages/deepmd/entrypoints/test.py", line 62, in test
    raise RuntimeError("Did not find valid system")
RuntimeError: Did not find valid system
```
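As far as I can tell from DeePMD-kit 2.0, `dp test` expands its `-s` path (the current directory by default) into every directory that contains a `type.raw` file and raises `Did not find valid system` when that list is empty. The sketch below approximates that lookup with `find`; `list_systems` and `require_system` are hypothetical helper names, not part of DeePMD-kit.

```bash
# Sketch: list directories under ROOT that look like DeePMD systems,
# i.e. that contain a type.raw file. This only approximates how dp test
# expands its -s argument; both function names are hypothetical.
list_systems() {
  local root=${1:-.}
  find "$root" -type f -name type.raw -exec dirname {} \; | sort
}

# Fail early, with a clearer message, when no system is found under ROOT.
require_system() {
  local root=${1:-.}
  if [ -z "$(list_systems "$root")" ]; then
    echo "no type.raw under $root: dp test would report 'Did not find valid system'" >&2
    return 1
  fi
}
```

Running `list_systems .` in the job directory shows what `dp test` would see there; an empty result matches the error above, and passing a data directory to `-s` is then needed.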