-
Hello @njzjz, I hope you are doing well. I have a few basic questions about the se_e2_a descriptor. Consider the provided water dataset: 500 frames with 288 atoms each.

Input:

Q1.) When the input is converted to generalized coordinates, what is Natoms? I am unsure which of these 3 options is correct:

Q2.) When the environment matrix is built with a predefined cut-off radius, not every atom will necessarily have the same number of neighbors. How is that situation handled?
Replies: 3 comments
-
The number of atoms is 288. The number of neighbors can be padded to any number, as long as the result remains unchanged.
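To illustrate why padding can leave the result unchanged, here is a minimal pure-Python sketch. It is not the DeePMD-kit implementation: the helper names (`env_rows`, `toy_feature`), the weight s = 1/r, and the stand-in embedding `g` are all assumptions made for illustration. The idea it shows is that empty neighbor slots are filled with zero rows, and a zero row contributes nothing to a neighbor-summed product even when the embedding value at zero is nonzero.

```python
import math

def env_rows(center, neighbors, rcut, sel):
    """Build `sel` rows of the form (s, s*x/r, s*y/r, s*z/r) for one atom.

    Assumption: s = 1/r is used as a simplified radial weight.
    Slots without a real neighbor inside rcut are filled with zero rows.
    """
    rows = []
    for pos in neighbors:
        d = [p - c for p, c in zip(pos, center)]
        r = math.sqrt(sum(x * x for x in d))
        if 0.0 < r < rcut:
            s = 1.0 / r
            rows.append([s] + [s * x / r for x in d])
    if len(rows) > sel:
        raise ValueError("more neighbors than slots; this sketch does not truncate")
    rows += [[0.0] * 4 for _ in range(sel - len(rows))]
    return rows

def toy_feature(rows):
    """Sum g(s_j) * row_j over all slots; g is a hypothetical embedding."""
    def g(s):
        return math.tanh(s + 0.5)  # deliberately nonzero at s = 0
    return [sum(g(row[0]) * row[k] for row in rows) for k in range(4)]

center = (0.0, 0.0, 0.0)
# The last neighbor lies outside the cut-off and is ignored.
neighbors = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (9.0, 9.0, 9.0)]

small = env_rows(center, neighbors, rcut=6.0, sel=4)
large = env_rows(center, neighbors, rcut=6.0, sel=10)
feature_small = toy_feature(small)
feature_large = toy_feature(large)
```

Both `feature_small` and `feature_large` come from the same two real neighbors; only the number of zero rows differs, so the two features agree.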
-
Thanks, @njzjz, for clarifying. If I understood the answer to the second question correctly, it means that when an atom has fewer neighbors within the cut-off radius, atoms beyond the cut-off radius can be padded in so that the dimension is uniform. From the code, it looks like every atom is given sel[a,b] neighbors in the water example.

Now I have a few more questions about the descriptor formulation. Consider the water (H2O) dataset.

Function: se_a/_pass_filter()

def _pass_filter(
    self, inputs, atype, natoms, input_dict, reuse=None, suffix="", trainable=True
):
    if input_dict is not None:
        type_embedding = input_dict.get("type_embedding", None)
    else:
        type_embedding = None
    start_index = 0
    # Nframes*Natoms*(Nei*4)
    inputs = tf.reshape(inputs, [-1, natoms[0], self.ndescrpt])
    output = []
    output_qmat = []
    if not self.type_one_side and type_embedding is None:
        for type_i in range(self.ntypes):
            # Nframes*Natoms*(Nei*4)
            inputs_i = tf.slice(
                inputs, [0, start_index, 0], [-1, natoms[2 + type_i], -1]
            )
            # (Nframes*Natoms)*Nei*4
            inputs_i = tf.reshape(inputs_i, [-1, self.ndescrpt])
            filter_name = "filter_type_" + str(type_i) + suffix
            layer, qmat = self._filter(
                inputs_i,
                type_i,
                name=filter_name,
                natoms=natoms,
                reuse=reuse,
                trainable=trainable,
                activation_fn=self.filter_activation_fn,
            )
            # Nframes*Natoms*(M1*M2), where M1*M2 is the descriptor dimension
            layer = tf.reshape(
                layer, [tf.shape(inputs)[0], natoms[2 + type_i], self.get_dim_out()]
            )

Brief overview of this function: in a loop, atoms of the same type are extracted from every frame and sent to _filter. For example, the O atoms are extracted in one iteration and the H atoms in the next.

Q1.) Why do we need to split the data by atom type before building the descriptor? Why can't the input be converted to the descriptor directly?

Function: se_a/_filter()

@cast_precision
def _filter(
    self,
    inputs,
    type_input,
    natoms,
    type_embedding=None,
    activation_fn=tf.nn.tanh,
    stddev=1.0,
    bavg=0.0,
    name="linear",
    reuse=None,
    trainable=True,
):
    nframes = tf.shape(tf.reshape(inputs, [-1, natoms[0], self.ndescrpt]))[0]
    # natom x (nei x 4)
    shape = inputs.get_shape().as_list()
    outputs_size = [1] + self.filter_neuron
    outputs_size_2 = self.n_axis_neuron
    all_excluded = all(
        [
            (type_input, type_i) in self.exclude_types
            for type_i in range(self.ntypes)
        ]
    )
    if all_excluded:
        # all types are excluded so result and qmat should be zeros
        # we can safely return a zero matrix...
        # See also https://stackoverflow.com/a/34725458/9567349
        # result: natom x outputs_size x outputs_size_2
        # qmat: natom x outputs_size x 3
        natom = tf.shape(inputs)[0]
        result = tf.cast(
            tf.fill((natom, outputs_size_2, outputs_size[-1]), 0.0),
            GLOBAL_TF_FLOAT_PRECISION,
        )
        qmat = tf.cast(
            tf.fill((natom, outputs_size[-1], 3), 0.0), GLOBAL_TF_FLOAT_PRECISION
        )
        return result, qmat
    with tf.variable_scope(name, reuse=reuse):
        start_index = 0
        type_i = 0
        # natom x 4 x outputs_size
        if type_embedding is None:
            rets = []
            for type_i in range(self.ntypes):
                ret = self._filter_lower(
                    type_i,
                    type_input,
                    start_index,
                    self.sel_a[type_i],
                    inputs,
                    nframes,
                    natoms,
                    type_embedding=type_embedding,
                    is_exclude=(type_input, type_i) in self.exclude_types,
                    activation_fn=activation_fn,
                    stddev=stddev,
                    bavg=bavg,
                    trainable=trainable,
                    suffix="_" + str(type_i),
                )
                if (type_input, type_i) not in self.exclude_types:
                    # add zero is meaningless; skip
                    rets.append(ret)
                start_index += self.sel_a[type_i]
            # faster to use accumulate_n than multiple add
            xyz_scatter_1 = tf.accumulate_n(rets)

Brief overview of this function: with the O atoms from every frame as input, G^i1·R is computed separately for each atom type among the neighbors of the O atoms.

Q2.) Why is G^i1·R computed per neighbor atom type? Why can't it be done in one pass, since the neighbors of every atom are already determined? How does this affect the result?

In this function the dimension is written as "# natom x (nei x 4)", so here natom refers to (nframes*natoms).

Q3.) se_e2_a is meant to use both the radial and the angular information of the atomic configuration, but the embedding built from the environment matrix uses only radial information. How is the angular information used?
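For reference, the per-neighbor-type loop above can be pictured with a small pure-Python sketch. This is only an illustration under stated assumptions, not the library code: `split_by_neighbor_type`, `type_contribution`, the tiny `sel` values, and the per-type `scale` factors standing in for separate embedding networks are all hypothetical. It shows a flat descriptor row being cut into one block per neighbor type by advancing a start index, each block being processed with its own parameters, and the per-type results being summed, which is the role `tf.accumulate_n(rets)` appears to play.

```python
def split_by_neighbor_type(row, sel):
    """Cut one flat row of length sum(sel) * 4 into one block per neighbor type."""
    blocks = []
    start_index = 0
    for n_sel in sel:
        width = n_sel * 4  # each neighbor slot holds 4 numbers
        blocks.append(row[start_index:start_index + width])
        start_index += width
    return blocks

def type_contribution(block, scale):
    """Sum g_t(s_j) * row_j over the slots of one neighbor type.

    Assumption: g_t(s) = scale * s stands in for a per-type embedding net.
    """
    rows = [block[i:i + 4] for i in range(0, len(block), 4)]
    return [sum(scale * r[0] * r[k] for r in rows) for k in range(4)]

sel = [2, 3]                                     # hypothetical slots per neighbor type
row = [float(v) for v in range(sum(sel) * 4)]    # placeholder descriptor row
blocks = split_by_neighbor_type(row, sel)

# One set of parameters per neighbor type, then an element-wise sum over types.
contribs = [type_contribution(b, scale) for b, scale in zip(blocks, [1.0, 2.0])]
total = [sum(vals) for vals in zip(*contribs)]
```

Processing all neighbors in one pass would correspond to using the same `scale` for every block; keeping the blocks separate lets each neighbor type have its own parameters.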
-
You may want to learn some basic quantum mechanics. Q1 and Q2 are due to physical reasons: since our models fit DFT data, they should behave similarly to DFT. For Q3, you can refer to #2067.
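As a small numerical aside on Q3 (not a summary of #2067): assuming each environment-matrix row has the form (s, s·x/r, s·y/r, s·z/r), the dot product of two rows equals s_j·s_k·(1 + cos θ_jk), where θ_jk is the angle between the two neighbors as seen from the central atom. Any step that multiplies rows against each other therefore carries angular information, even if the embedding itself only takes the radial part as input. The helper names and the choice s = 1/r below are assumptions for illustration.

```python
import math

def norm(d):
    return math.sqrt(sum(x * x for x in d))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def env_row(d, s):
    """Row (s, s*x/r, s*y/r, s*z/r) for a relative position d and weight s."""
    r = norm(d)
    return [s] + [s * x / r for x in d]

# Relative positions of two neighbors of the same central atom.
d_j = (1.0, 0.0, 0.0)
d_k = (1.0, 1.0, 0.0)

# Assumption: simplified radial weight s = 1/r.
s_j = 1.0 / norm(d_j)
s_k = 1.0 / norm(d_k)

row_j = env_row(d_j, s_j)
row_k = env_row(d_k, s_k)

cos_theta = dot(d_j, d_k) / (norm(d_j) * norm(d_k))
lhs = dot(row_j, row_k)               # pairwise product of environment rows
rhs = s_j * s_k * (1.0 + cos_theta)   # radial weights times an angular factor
```

Here `lhs` and `rhs` agree up to floating-point rounding, which is the sense in which the angle enters through products of environment-matrix rows.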