A large error induced by compression for hybrid descriptor #2182

shihao-code · 2022-12-15T09:41:23Z

shihao-code
Dec 15, 2022

When I use a hybrid descriptor of se_e2_a and se_e3, the RMSE of the deep potential is very small (3 meV/atom for energy and 59 meV/Ang for atomic force). However, after compressing the potential, the RMSE becomes much larger (16 meV/atom for energy and 64 meV/Ang for atomic force). If I use only the se_e2_a descriptor, keeping the other parameters in the input.json file unchanged, there is no change before and after compression. If only the se_e3 descriptor is used, compression also induces a large error.
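For readers who want to reproduce this kind of comparison, here is a minimal sketch of the per-atom energy RMSE. The numbers in it are hypothetical and chosen only for illustration; they are not taken from the FeH data set, and the function names are my own, not part of deepmd-kit.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    if not predicted:
        raise ValueError("sequences must not be empty")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def energy_rmse_per_atom(pred_energies, ref_energies, natoms):
    """RMSE of total energies after dividing each frame by its atom count."""
    pred = [e / n for e, n in zip(pred_energies, natoms)]
    ref = [e / n for e, n in zip(ref_energies, natoms)]
    return rmse(pred, ref)


# Hypothetical values (eV), two frames of 4 atoms each; NOT from the real data.
ref_e = [-8.0, -8.4]
uncompressed_e = [-8.004, -8.396]
compressed_e = [-8.04, -8.36]
natoms = [4, 4]

rmse_uncompressed = energy_rmse_per_atom(uncompressed_e, ref_e, natoms)
rmse_compressed = energy_rmse_per_atom(compressed_e, ref_e, natoms)
print("uncompressed RMSE (eV/atom):", rmse_uncompressed)
print("compressed RMSE (eV/atom):  ", rmse_compressed)
```

Evaluating both models on the same frames against the same reference is what makes the two RMSE values directly comparable.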

Version of deepmd-kit: 2.1.5_cuda11.6

Command I used: dp compress -i FeH.pb -o FeH-compress.pb --step 0.002

The output of compression:

Loading BaseGPU/2021
Loading requirement: nvhpc/21.3 cuda/11.2 openmpi/4.0.3cu11.2.v2
WARNING:tensorflow:From /sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS.
WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0
WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0
/sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged.
_bootstrap._exec(spec, module)
DEEPMD INFO

DEEPMD INFO stage 1: compress the model
DEEPMD INFO [DeePMD-kit ASCII-art banner]
DEEPMD INFO Please read and cite:
DEEPMD INFO Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
DEEPMD INFO installed to: /home/conda/feedstock_root/build_artifacts/deepmd-kit_1663923590539/work/_skbuild/linux-x86_64-3.10/cmake-install
DEEPMD INFO source : v2.1.5
DEEPMD INFO source brach: HEAD
DEEPMD INFO source commit: 6e3d4a6
DEEPMD INFO source commit at: 2022-09-23 16:10:28 +0800
DEEPMD INFO build float prec: double
DEEPMD INFO build variant: cuda
DEEPMD INFO build with tf inc: /sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/site-packages/tensorflow/include;/sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/include
DEEPMD INFO build with tf lib:
DEEPMD INFO ---Summary of the training---------------------------------------
DEEPMD INFO running on: gpu0501
DEEPMD INFO computing device: gpu:0
DEEPMD INFO CUDA_VISIBLE_DEVICES: 0,1
DEEPMD INFO Count of visible GPU: 2
DEEPMD INFO num_intra_threads: 0
DEEPMD INFO num_inter_threads: 0
DEEPMD INFO -----------------------------------------------------------------
DEEPMD INFO training without frame parameter
DEEPMD INFO training data with lower boundary: [-0.22680075 -0.29381635]
DEEPMD INFO training data with upper boundary: [30.16753829 41.82551879]
OMP: Info #155: KMP_AFFINITY: Initial OS proc set respected: 0
OMP: Info #216: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #157: KMP_AFFINITY: 1 available OS procs
OMP: Info #158: KMP_AFFINITY: Uniform topology
OMP: Info #287: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L3 cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "core".
OMP: Info #287: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "core".
OMP: Info #192: KMP_AFFINITY: 1 socket x 1 core/socket x 1 thread/core (1 total cores)
OMP: Info #218: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #172: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 thread 0
OMP: Info #254: KMP_AFFINITY: pid 449229 tid 449422 thread 0 bound to OS proc set 0
OMP: Info #254: KMP_AFFINITY: pid 449229 tid 449421 thread 1 bound to OS proc set 0
DEEPMD INFO training data with lower boundary: [-1505.35165116 -4165.88651941]
DEEPMD INFO training data with upper boundary: [1505.35165116 4165.88651941]
DEEPMD INFO built lr
DEEPMD INFO built network
DEEPMD INFO built training
DEEPMD INFO initialize model from scratch
INFO:tensorflow:/sqfs2/cmc/0/work/G14979/u6b368/bbb0/model-compression/model.ckpt.index
DEEPMD INFO /sqfs2/cmc/0/work/G14979/u6b368/bbb0/model-compression/model.ckpt.index
INFO:tensorflow:0
DEEPMD INFO 0
INFO:tensorflow:/sqfs2/cmc/0/work/G14979/u6b368/bbb0/model-compression/model.ckpt.data-00000-of-00001
DEEPMD INFO /sqfs2/cmc/0/work/G14979/u6b368/bbb0/model-compression/model.ckpt.data-00000-of-00001
INFO:tensorflow:69300
DEEPMD INFO 69300
INFO:tensorflow:/sqfs2/cmc/0/work/G14979/u6b368/bbb0/model-compression/model.ckpt.meta
DEEPMD INFO /sqfs2/cmc/0/work/G14979/u6b368/bbb0/model-compression/model.ckpt.meta
INFO:tensorflow:1659000
DEEPMD INFO 1659000
DEEPMD INFO finished compressing
DEEPMD INFO

DEEPMD INFO stage 2: freeze the model
INFO:tensorflow:Restoring parameters from model-compression/model.ckpt
DEEPMD INFO Restoring parameters from model-compression/model.ckpt
DEEPMD INFO The following nodes will be frozen: ['model_type', 'descrpt_attr/rcut', 'descrpt_attr/ntypes', 'model_attr/tmap', 'model_attr/model_type', 'model_attr/model_version', 'train_attr/min_nbor_dist', 'train_attr/training_script', 'o_energy', 'o_force', 'o_virial', 'o_atom_energy', 'o_atom_virial', 'fitting_attr/dfparam', 'fitting_attr/daparam']
WARNING:tensorflow:From /sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/site-packages/deepmd/entrypoints/freeze.py:246: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
DEEPMD WARNING From /sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/site-packages/deepmd/entrypoints/freeze.py:246: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
WARNING:tensorflow:From /sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/site-packages/tensorflow/python/framework/convert_to_constants.py:925: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
DEEPMD WARNING From /sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/site-packages/tensorflow/python/framework/convert_to_constants.py:925: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
DEEPMD INFO 1258 ops in the final graph.

My input.json file

    "descriptor": {
        "type": "hybrid",
        "list": [
            {
                "type": "se_e2_a",
                "sel": "auto",
                "rcut_smth": 0.5,
                "activation_function": "tanh",
                "rcut": 6.5,
                "neuron": [
                    30,
                    60,
                    120
                ],
                "resnet_dt": false,
                "axis_neuron": 32,
                "seed": 13290,
                "_comment": " that's all"
            },
            {
                "type": "se_e3",
                "sel": "auto",
                "rcut_smth": 0.5,
                "activation_function": "tanh",
                "rcut": 5.0,
                "neuron": [
                    5,
                    10,
                    20
                ],
                "resnet_dt": false,
                "seed": 1327,
                "_comment": " that's all"
            }
        ]
    },
    "fitting_net": {
        "neuron": [
            320,
            320,
            320
        ],
        "resnet_dt": true,
        "seed": 6374,
        "_comment": " that's all"
    },
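To isolate which sub-descriptor is responsible, one option is to derive single-descriptor variants from the hybrid settings and train and compress each separately, as described in the first paragraph. The sketch below assumes the fragment above is the content of the "model" section of input.json; `split_hybrid` is a hypothetical helper, not a deepmd-kit function, and the settings are trimmed for brevity.

```python
import copy
import json


def split_hybrid(model_section):
    """Return one model section per sub-descriptor of a hybrid descriptor."""
    descriptor = model_section["descriptor"]
    if descriptor.get("type") != "hybrid":
        raise ValueError("descriptor is not of type 'hybrid'")
    variants = []
    for sub in descriptor["list"]:
        variant = copy.deepcopy(model_section)
        # Replace the hybrid wrapper with a single sub-descriptor;
        # everything else (e.g. fitting_net) stays unchanged.
        variant["descriptor"] = copy.deepcopy(sub)
        variants.append(variant)
    return variants


# Trimmed copy of the settings quoted in this post.
model_section = {
    "descriptor": {
        "type": "hybrid",
        "list": [
            {"type": "se_e2_a", "rcut": 6.5, "neuron": [30, 60, 120]},
            {"type": "se_e3", "rcut": 5.0, "neuron": [5, 10, 20]},
        ],
    },
    "fitting_net": {"neuron": [320, 320, 320], "resnet_dt": True},
}

variants = split_hybrid(model_section)
for variant in variants:
    print(json.dumps(variant["descriptor"]))
```

Each variant keeps the same fitting_net, so any difference after compression can be attributed to the descriptor alone.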

wanghan-iapcm · 2022-12-18T04:21:19Z

wanghan-iapcm
Dec 18, 2022
Maintainer

@denghuilu could you please take a look? Thanks!

0 replies

shihao-code · 2022-12-20T09:39:19Z

shihao-code
Dec 20, 2022
Author

@denghuilu @wanghan-iapcm Please tell me if you need more information. Thank you very much.

0 replies

njzjz · 2022-12-20T21:23:56Z

njzjz
Dec 20, 2022
Maintainer

Just a confirmation. Do you have the same result using CPUs?
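A possible way to run the check on CPUs is sketched below. It only builds the command and environment and does not execute anything. It assumes that an empty CUDA_VISIBLE_DEVICES hides the GPUs from TensorFlow and that `dp test` is called with the `-m`, `-s` and `-n` options; the system path and frame count are placeholders, and `cpu_only_test_command` is a hypothetical helper.

```python
import os


def cpu_only_test_command(model, system, n_frames):
    """Build a `dp test` command plus an environment that hides all GPUs."""
    env = dict(os.environ)
    # Assumption: an empty value makes no GPU visible, forcing CPU execution.
    env["CUDA_VISIBLE_DEVICES"] = ""
    cmd = ["dp", "test", "-m", model, "-s", system, "-n", str(n_frames)]
    return cmd, env


# "path/to/test_system" and 10 are placeholders, not values from this thread.
cmd, env = cpu_only_test_command("FeH-compress.pb", "path/to/test_system", 10)
print(" ".join(cmd))
# To actually run it:
#   import subprocess
#   subprocess.run(cmd, env=env, check=True)
```

Running the same command for FeH.pb and FeH-compress.pb, with and without the GPU hidden, would show whether the discrepancy depends on the device.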

6 replies

wanghan-iapcm Jan 14, 2023
Maintainer

@shihao-code Could you please provide us with the uncompressed model and the minimal set of test data that reproduces the bug?

shihao-code Jan 14, 2023
Author

Dear @wanghan-iapcm, the uncompressed model and the test data are attached. Thx.

files.zip

shihao-code Jan 14, 2023
Author

Dear @wanghan-iapcm and @njzjz, there are two additional test structures with the maximum error between the energies calculated by the uncompressed and compressed models, which might be helpful for you to debug.

Structure           DFT            uncompressed model    compressed model

data_108_000_108    -2.0921e+01    -2.0288e+01           -5.1817e+01
data_376_320_056    -2.6794e+03    -2.6786e+03           -2.6592e+03

Thx.

testdata.zip
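To make the size of the discrepancy explicit, the following sketch works out the differences from the numbers in the table above. The energies are copied from the post, in the same units as given there; the helper names are illustrative only.

```python
# Energies copied from the table in the post above.
energies = {
    "data_108_000_108": {
        "dft": -2.0921e+01,
        "uncompressed": -2.0288e+01,
        "compressed": -5.1817e+01,
    },
    "data_376_320_056": {
        "dft": -2.6794e+03,
        "uncompressed": -2.6786e+03,
        "compressed": -2.6592e+03,
    },
}


def compression_shift(row):
    """Energy change introduced by compression alone."""
    return row["compressed"] - row["uncompressed"]


def error_vs_dft(row, model):
    """Deviation of the given model's energy from the DFT reference."""
    return row[model] - row["dft"]


for name, row in energies.items():
    print(
        f"{name}: shift={compression_shift(row):+.3f}, "
        f"uncompressed-DFT={error_vs_dft(row, 'uncompressed'):+.3f}, "
        f"compressed-DFT={error_vs_dft(row, 'compressed'):+.3f}"
    )
```

For both structures the shift caused by compression is far larger than the uncompressed model's deviation from DFT, which is why these two frames are useful for debugging.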

wanghan-iapcm Jan 15, 2023
Maintainer

Thanks! We will take care of it.

njzjz Jan 15, 2023
Maintainer

I can reproduce it and confirm it's not related to the range or the "step".
