Doubt regarding Environment matrix And Se_a Descriptor #2469

AnuragKr · 2023-04-20T19:02:32Z

AnuragKr
Apr 20, 2023

Hope you are doing well. I have a few basic questions about the se_e2_a descriptor. Consider the provided water dataset: 500 frames with 288 atoms each.

Input --
Coord = Nframes * Natoms * 3
Energy = Nframes

Q1.) When the input is transformed into generalized coordinates, what does Natoms refer to?
Environment matrix creation:
-- Input -> Generalized coordinates
Nframes * Natoms * 3 -> Nframes * Natoms * Nei_atoms * 4

I am confused between these three options for Natoms:
1.) 288 (total number of atoms in one frame)
2.) 2 (O, H)
3.) 3 (H, H, O)

Q2.) When the environment matrix is created with a predefined cutoff radius, different atoms may not have the same number of neighbors. How is that situation handled?

njzjz · 2023-04-20T19:19:16Z

njzjz
Apr 20, 2023
Maintainer

The number of atoms is 288.

The number of neighbors can be padded to any number, as long as the result remains unchanged.
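
To make the shapes and the padding concrete, here is a minimal pure-Python sketch for a single frame. It is an illustration under stated assumptions, not DeePMD-kit's implementation: the function names, the cosine form of the switching weight, the toy coordinates, and the `sel` values are all hypothetical, periodic images are ignored, and any normalization or masking done by the real code is not modeled. Each atom gets `sum(sel)` neighbor slots grouped by neighbor type, and unused slots are filled with zero rows, so the per-frame shape is Natoms x Nei x 4 regardless of how many neighbors actually fall inside the cutoff.

```python
import math

def switch(r, rcut_smth, rcut):
    """Smooth weight s(r): 1/r below rcut_smth, decaying to 0 at rcut (assumed cosine form)."""
    if r >= rcut:
        return 0.0
    if r < rcut_smth:
        return 1.0 / r
    x = (r - rcut_smth) / (rcut - rcut_smth)
    return (0.5 * math.cos(math.pi * x) + 0.5) / r

def env_matrix(coords, types, sel, rcut_smth, rcut):
    """Return a [natoms][sum(sel)][4] nested list for one frame (no periodic images)."""
    natoms = len(coords)
    frame = []
    for i in range(natoms):
        rows = []
        for t, n_slots in enumerate(sel):
            # Collect neighbors of type t inside the cutoff.
            found = []
            for j in range(natoms):
                if j == i or types[j] != t:
                    continue
                d = [coords[j][k] - coords[i][k] for k in range(3)]
                r = math.sqrt(sum(c * c for c in d))
                if r < rcut:
                    found.append((r, d))
            found.sort(key=lambda item: item[0])  # nearest first
            block = []
            for r, d in found[:n_slots]:
                s = switch(r, rcut_smth, rcut)
                block.append([s, s * d[0] / r, s * d[1] / r, s * d[2] / r])
            # Pad unused slots with zero rows so every atom has the same shape.
            block += [[0.0] * 4 for _ in range(n_slots - len(block))]
            rows.extend(block)
        frame.append(rows)
    return frame

# One water molecule: type 0 = O, type 1 = H (toy coordinates).
coords = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
types = [0, 1, 1]
sel = [2, 4]  # hypothetical slot counts for O and H neighbors
env = env_matrix(coords, types, sel, rcut_smth=0.5, rcut=6.0)
print(len(env), len(env[0]), len(env[0][0]))  # prints: 3 6 4
```

For the O atom, both O slots are padding (there is no other O), and two of the four H slots are padding. Natoms here is the total number of atoms in the frame, which matches the answer above.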


AnuragKr · 2023-04-22T09:29:01Z

AnuragKr
Apr 22, 2023
Author

Thanks, @njzjz, for clarifying.

If I understood the answer to the second question correctly, when an atom has fewer neighbors within the cutoff radius, atoms beyond the cutoff radius can be used as padding so that the dimensions are uniform. From the code, it looks like every atom is given sel[a, b] neighbors in the water example.

"The number of neighbors can be padded to any number, as long as the result remains unchanged."

Now I have a few more questions about the descriptor formulation.

Consider the water dataset (H2O).
The atom types are O and H.

Function: se_a/_pass_filter()

    def _pass_filter(
        self, inputs, atype, natoms, input_dict, reuse=None, suffix="", trainable=True
    ):
        if input_dict is not None:
            type_embedding = input_dict.get("type_embedding", None)
        else:
            type_embedding = None
        start_index = 0
        #Nframes*Natoms*(Nei*4)
        inputs = tf.reshape(inputs, [-1, natoms[0], self.ndescrpt])
        output = []
        output_qmat = []
        if not self.type_one_side and type_embedding is None:
            for type_i in range(self.ntypes):
                # Nframes x Natoms_of_type_i x (Nei*4)
                inputs_i = tf.slice(
                    inputs, [0, start_index, 0], [-1, natoms[2 + type_i], -1]
                )
                # (Nframes*Natoms_of_type_i) x (Nei*4)
                inputs_i = tf.reshape(inputs_i, [-1, self.ndescrpt])
                filter_name = "filter_type_" + str(type_i) + suffix
                layer, qmat = self._filter(
                    inputs_i,
                    type_i,
                    name=filter_name,
                    natoms=natoms,
                    reuse=reuse,
                    trainable=trainable,
                    activation_fn=self.filter_activation_fn,
                )
                # Nframes x Natoms_of_type_i x (M1*M2); M1*M2 is the descriptor dimension
                layer = tf.reshape(
                    layer, [tf.shape(inputs)[0], natoms[2 + type_i], self.get_dim_out()]
                )

Brief overview of this function: atoms of the same type are extracted from every frame in a loop and passed to _filter. For example, the O atoms of every frame are extracted in one iteration and the H atoms in the next.

Q1.) Why do we need to split the data by atom type before building the descriptor? Why can't the descriptor be computed directly for all atoms at once?
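
The slicing in the snippet can be mimicked with plain lists. This is a sketch only: the toy `natoms` values and the string labels are made up, it assumes atoms are stored sorted by type within each frame, and it assumes `start_index` advances by the per-type count after each iteration (that step is not visible in the excerpt). What the excerpt does show is that each center type gets its own `filter_name` ("filter_type_0", "filter_type_1", ...), which suggests a separate set of filter variables per center type and would explain why atoms are grouped by type before calling `_filter`.

```python
# Layout as used in the snippet: natoms[0] is the number of atoms per frame and
# natoms[2 + t] is the number of atoms of type t.
natoms = [6, 6, 2, 4]  # toy frame: 2 O atoms followed by 4 H atoms
ntypes = 2
nframes = 3

# inputs[f][a] stands in for one atom's flattened (Nei * 4) row; here just a label.
inputs = [[f"frame{f}-atom{a}" for a in range(natoms[0])] for f in range(nframes)]

start_index = 0
per_type = []
for type_i in range(ntypes):
    count = natoms[2 + type_i]
    # Take the same contiguous atom range in every frame, like tf.slice on axis 1.
    block = [frame[start_index:start_index + count] for frame in inputs]
    # Merge the frame and atom axes, like tf.reshape(inputs_i, [-1, ndescrpt]).
    flat = [atom for frame in block for atom in frame]
    per_type.append(flat)
    start_index += count  # assumed: move on to the next type's atoms

print(len(per_type[0]), len(per_type[1]))  # prints: 6 12
```

Each entry of `per_type` is what one call to `_filter` would receive: all atoms of a single center type, across all frames, as one batch.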

Function : se_a/_filter()

    @cast_precision
    def _filter(
        self,
        inputs,
        type_input,
        natoms,
        type_embedding=None,
        activation_fn=tf.nn.tanh,
        stddev=1.0,
        bavg=0.0,
        name="linear",
        reuse=None,
        trainable=True,
    ):
        nframes = tf.shape(tf.reshape(inputs, [-1, natoms[0], self.ndescrpt]))[0]
        # natom x (nei x 4)
        shape = inputs.get_shape().as_list()
        outputs_size = [1] + self.filter_neuron
        outputs_size_2 = self.n_axis_neuron
        all_excluded = all(
            [
                (type_input, type_i) in self.exclude_types
                for type_i in range(self.ntypes)
            ]
        )
        if all_excluded:
            # all types are excluded so result and qmat should be zeros
            # we can safely return a zero matrix...
            # See also https://stackoverflow.com/a/34725458/9567349
            # result: natom x outputs_size x outputs_size_2
            # qmat: natom x outputs_size x 3
            natom = tf.shape(inputs)[0]
            result = tf.cast(
                tf.fill((natom, outputs_size_2, outputs_size[-1]), 0.0),
                GLOBAL_TF_FLOAT_PRECISION,
            )
            qmat = tf.cast(
                tf.fill((natom, outputs_size[-1], 3), 0.0), GLOBAL_TF_FLOAT_PRECISION
            )
            return result, qmat

        with tf.variable_scope(name, reuse=reuse):
            start_index = 0
            type_i = 0
            # natom x 4 x outputs_size
            if type_embedding is None:
                rets = []
                for type_i in range(self.ntypes):
                    ret = self._filter_lower(
                        type_i,
                        type_input,
                        start_index,
                        self.sel_a[type_i],
                        inputs,
                        nframes,
                        natoms,
                        type_embedding=type_embedding,
                        is_exclude=(type_input, type_i) in self.exclude_types,
                        activation_fn=activation_fn,
                        stddev=stddev,
                        bavg=bavg,
                        trainable=trainable,
                        suffix="_" + str(type_i),
                    )
                    if (type_input, type_i) not in self.exclude_types:
                        # add zero is meaningless; skip
                        rets.append(ret)
                    start_index += self.sel_a[type_i]
                # faster to use accumulate_n than multiple add
                xyz_scatter_1 = tf.accumulate_n(rets)

Brief overview of this function: the O atoms from every frame arrive as input, and G^i1 * R is then calculated separately for each atom type among the O atoms' neighbors.

Q2.) Why is G^i1 * R calculated per neighbor atom type? Why can't it be done in one pass, given that the neighbors of every atom are already determined? How does this affect the result?

In this function the dimension is written as "# natom x (nei x 4)". So here natom refers to (nframes * natoms).

Q3.) se_e2_a is meant to use both radial and angular information of the atomic configuration, but the embedding built from the environment matrix uses only the radial information. How is the angular information used?
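
A small sketch of the per-neighbor-type accumulation, with toy integer data. The `embed` functions are hypothetical stand-ins for the embedding networks (the `suffix="_" + str(type_i)` argument in the snippet suggests one set of weights per neighbor type), and the `sel` values and rows are invented. It shows that summing the per-type contractions gives the same matrix as one contraction over all neighbors once the rows of G are stacked, so the split decides which weights produce each row of G rather than changing the algebra.

```python
def rt_g(r_rows, g_rows):
    """Contract the neighbor axis: returns R^T G with shape 4 x M1."""
    m1 = len(g_rows[0])
    return [
        [sum(r[c] * g[m] for r, g in zip(r_rows, g_rows)) for m in range(m1)]
        for c in range(4)
    ]

def add(a, b):
    """Element-wise sum of two equally shaped nested lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Hypothetical stand-ins for the embedding networks, one per neighbor type.
# Each maps the radial entry s of a neighbor row to an M1 = 2 vector.
embed = {0: lambda s: [s, 2 * s], 1: lambda s: [3 * s, s * s]}

sel = [1, 2]  # slots for O neighbors, then H neighbors
# Toy integer rows (s, x, y, z) for one center atom, grouped by neighbor type.
env = [[2, 1, 0, 0], [1, 0, 1, 0], [3, 0, 0, 1]]

# Per-type path: slice the block, embed it, contract, accumulate.
start = 0
total = [[0, 0] for _ in range(4)]
g_all = []
for type_i, n in enumerate(sel):
    block = env[start:start + n]
    g_block = [embed[type_i](row[0]) for row in block]
    total = add(total, rt_g(block, g_block))
    g_all.extend(g_block)
    start += n

# Single-pass path: one contraction over all neighbors with the stacked G.
one_pass = rt_g(env, g_all)
print(total == one_pass)  # prints: True
```

Because the neighbor slots are grouped by type with fixed widths, each type's block sits at a fixed column range, which is what makes the slice-by-`start_index` approach possible.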
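
For context on Q3, the following is a sketch of how angular information can enter even though the embedding takes only the radial entry as input. It follows the published DeepPot-SE formulation up to normalization constants; the notation is an assumption here and not taken from this thread. Each row of the environment matrix carries the direction to neighbor j, and the descriptor contracts the embedding with the environment matrix on both sides:

```latex
\tilde{R}^i_j = s(r_{ij}) \left( 1,\ \frac{x_{ij}}{r_{ij}},\ \frac{y_{ij}}{r_{ij}},\ \frac{z_{ij}}{r_{ij}} \right),
\qquad
D^i = (G^{i1})^{T}\, \tilde{R}^i\, (\tilde{R}^i)^{T}\, G^{i2},

\left( \tilde{R}^i (\tilde{R}^i)^{T} \right)_{jk}
  = s(r_{ij})\, s(r_{ik}) \left( 1 + \cos\theta_{jik} \right).
```

Here \theta_{jik} is the angle between the directions from atom i to neighbors j and k, and G^{i2} is taken as the first few columns of G^{i1} (compare `n_axis_neuron` in the snippet). The angles appear through the product of the environment matrix with its transpose, not through the input of the embedding network.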


njzjz · 2023-04-25T20:00:59Z

njzjz
Apr 25, 2023
Maintainer

You may want to learn some basic quantum mechanics. Q1 and Q2 are due to physical reasons: since our models are fit to DFT data, they should behave similarly to DFT.

For Q3, you can refer to #2067.
