Some questions about my customized GATConv #9047

MuaYoi · 2024-03-12T09:58:01Z

MuaYoi
Mar 12, 2024

I am going to implement a customized heterogeneous graph attention network, where for each type of relationship, it is trained through a seperate GATConv.
My work seems a bit clumsy, which is to create a model instance for each relationship type through GATConv, as shown in the following code.

        self.conv1 = HeteroConv({
            ('converter','converter_trans','trans'): CustomGATConv(in_channels=6, out_channels=64, heads=4, relation='converter_trans',dropout=0.3),
            ('trans','trans_converter','converter'): CustomGATConv(in_channels=12, out_channels=64, heads=4, relation='trans_converter',dropout=0.3),
            ('converter','converter_load','load'): CustomGATConv(in_channels=12, out_channels=64, heads=4, relation='converter_load',dropout=0.3),
            ('load','load_converter','converter'): CustomGATConv(in_channels=12, out_channels=64, heads=4, relation='load_converter',dropout=0.3),
            ('syn','syn_trans','trans'): CustomGATConv(in_channels=6, out_channels=64, heads=4, relation='syn_trans',dropout=0.3),
            ('trans','trans_syn','syn'): CustomGATConv(in_channels=15, out_channels=64, heads=4, relation='trans_syn',dropout=0.3),
            ('syn','syn_load','load'): CustomGATConv(in_channels=12, out_channels=64, heads=4, relation='syn_load',dropout=0.3),
            ('load','load_syn','syn'): CustomGATConv(in_channels=15, out_channels=64, heads=4, relation='load_syn',dropout=0.3),
            ('trans','trans_load','load'): CustomGATConv(in_channels=12, out_channels=64, heads=4, relation='trans_load',dropout=0.3),
            ('load','load_trans','trans'): CustomGATConv(in_channels=6, out_channels=64, heads=4, relation='load_trans',dropout=0.3),
            ('control','control_converter','converter'): CustomGATConv(in_channels=12, out_channels=64, heads=4, relation='control_converter',dropout=0.3),
            ('trans','trans_trans','trans'): CustomGATConv(in_channels=6, out_channels=64, heads=4, relation='trans_trans',dropout=0.3),
            ('load','load_load','load'): CustomGATConv(in_channels=12, out_channels=64, heads=4, relation='load_load',dropout=0.3),
            ('syn','syn_syn','syn'): CustomGATConv(in_channels=15, out_channels=64, heads=4, relation='syn_syn',dropout=0.3),
            ('converter','converter_converter','converter'): CustomGATConv(in_channels=12, out_channels=64, heads=4, relation='converter_converter',dropout=0.3),
            ('control','control_control','control'): CustomGATConv(in_channels=15, out_channels=64, heads=4, relation='control_control',dropout=0.3),
        },aggr='sum')

And the CustomGATConv is the result of minor modifications to GATConv. More concretely, in CustomGATConv, I apply a linear layer to make sure that the feature dimensions of source and target node are same:

self.type_aware = torch.nn.ModuleDict({
    'converter_trans': Linear(12, 6, bias=False, weight_initializer='glorot'),
    'trans_converter': Linear(6, 12, bias=False, weight_initializer='glorot'),
    'converter_load': Linear(12, 12, bias=False, weight_initializer='glorot'),
    'load_converter': Linear(12, 12, bias=False, weight_initializer='glorot'),
    'syn_trans': Linear(15, 6, bias=False, weight_initializer='glorot'),
    'trans_syn': Linear(6, 15, bias=False, weight_initializer='glorot'),
    'syn_load': Linear(15, 12, bias=False, weight_initializer='glorot'),
    'load_syn': Linear(12, 15, bias=False, weight_initializer='glorot'),
    'trans_load': Linear(6, 12, bias=False, weight_initializer='glorot'),
    'load_trans': Linear(12, 6, bias=False, weight_initializer='glorot'),
    'control_converter': Linear(15, 12, bias=False, weight_initializer='glorot'),
    'trans_trans': Linear(6, 6, bias=False, weight_initializer='glorot'),
    'syn_syn': Linear(15, 15, bias=False, weight_initializer='glorot'),
    'load_load': Linear(12, 12, bias=False, weight_initializer='glorot'),
    'converter_converter': Linear(12, 12, bias=False, weight_initializer='glorot'),
    'control_control': Linear(15, 15, bias=False, weight_initializer='glorot'),
})
self.relation = relation
self.type_aware_lin = self.type_aware[relation]

And there are only two extra changes in the forward function, which as shown in the following code:

if isinstance(x, Tensor):
    # if x is a Tensor, it means that we are considering its self-attention.
    x = self.type_aware_lin(x)      # the change 1
    assert x.dim() == 2, "Static graphs not supported in 'GATConv'" 

    if self.lin is not None:
        x_src = x_dst = self.lin(x).view(-1, H, C)
    else:
        # If the module is initialized as bipartite, transform source
        # and destination node features separately:
        assert self.lin_src is not None and self.lin_dst is not None
        x_src = self.lin_src(x).view(-1, H, C)
        x_dst = self.lin_dst(x).view(-1, H, C)

else:  # Tuple of source and target node features:
    x_src, x_dst = x
    x_src = self.type_aware_lin(x_src) # the change 2: src node's dimension will be as same as the dst's.
    assert x_src.dim() == 2, "Static graphs not supported in 'GATConv'"

    if self.lin is not None:
        # If the module is initialized as non-bipartite, we expect that
        # source and destination node features have the same shape and
        # that they their transformations are shared:
        x_src = self.lin(x_src).view(-1, H, C)
        if x_dst is not None:
            x_dst = self.lin(x_dst).view(-1, H, C)
    else:
        assert self.lin_src is not None and self.lin_dst is not None

        x_src = self.lin_src(x_src).view(-1, H, C)
        if x_dst is not None:
            x_dst = self.lin_dst(x_dst).view(-1, H, C)

The trouble is that after training work begins, the gradients of att_src and att_dst are zero (and other gradients are also small, but not zero), so this model doesn't work well. I do not know how to improve it.

rusty1s · 2024-03-14T15:19:18Z

rusty1s
Mar 14, 2024
Maintainer

You can try to add normalization between layers. This should help with vanishing gradients. Besides that, I am wondering why you not just implement the HeteroConv as

conv = HeteroConv({
     edge_type: GATConv((-1, -1), 64) for edge_type in edge_types
}

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Some questions about my customized GATConv #9047

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Some questions about my customized GATConv #9047

Uh oh!

MuaYoi Mar 12, 2024

Replies: 1 comment

Uh oh!

rusty1s Mar 14, 2024 Maintainer

MuaYoi
Mar 12, 2024

rusty1s
Mar 14, 2024
Maintainer