Loss remains constant during training #4546
Replies: 2 comments 7 replies
-
I think you first need to fix the issues with your node feature matrix. Currently, it looks like …
-
Thank you @rusty1s! That fixes the constant loss problem. However, if I randomly split the dataset into 90% train, 5% validation, and 5% test, the model classifies all the nodes into the largest label class. Do you have any suggestions? Is this underfitting?
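When a model with skewed labels collapses to predicting the majority class, one common remedy is a class-weighted loss. This is a generic sketch, not something proposed in the thread; the inverse-frequency weighting scheme and all names here are assumptions:

```python
import torch
import torch.nn as nn

# Toy labels with a heavy majority class (class 0); sizes are illustrative only.
num_classes = 5
y_train = torch.tensor([0, 0, 0, 0, 0, 0, 1, 2, 3, 4])

# Inverse-frequency class weights (assumed scheme, not from the thread):
# rare classes get larger weights, so the loss penalizes ignoring them.
counts = torch.bincount(y_train, minlength=num_classes).float()
weights = counts.sum() / (num_classes * counts.clamp(min=1.0))

criterion = nn.CrossEntropyLoss(weight=weights)
logits = torch.randn(y_train.size(0), num_classes)  # stand-in for model output
loss = criterion(logits, y_train)
```

Whether weighting helps here also depends on the features: with a constant feature of 1 per node, the model can only discriminate nodes through graph structure, so reweighting alone may not be enough.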
-
My task is node classification on a 60M-node graph: 19M nodes are labeled and 40M are unlabeled. Out of the 19M labeled nodes, I created a 15M-node training mask, a 2M-node validation mask, and a 2M-node test mask. Node labels are 243 countries (geolocation info). Following is the code creating the dataset. The node feature is 1 for every node.
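The dataset-creation code referenced above did not survive extraction. A toy sketch of what the description implies (constant feature 1 per node, a label tensor, and disjoint train/val/test masks) might look like the following; all sizes and names are illustrative stand-ins, not the original code:

```python
import torch

# Toy sizes standing in for the real graph (~60M nodes, 243 country labels).
num_nodes, num_edges, num_classes = 1000, 4000, 243

edge_index = torch.randint(0, num_nodes, (2, num_edges))  # random toy edges
x = torch.ones(num_nodes, 1)                     # node feature is 1 for every node
y = torch.randint(0, num_classes, (num_nodes,))  # one country label per node

# Disjoint train/val/test masks over a random permutation of the nodes
# (the real split is 15M/2M/2M out of 19M labeled nodes).
perm = torch.randperm(num_nodes)
train_mask = torch.zeros(num_nodes, dtype=torch.bool)
val_mask = torch.zeros(num_nodes, dtype=torch.bool)
test_mask = torch.zeros(num_nodes, dtype=torch.bool)
train_mask[perm[:800]] = True
val_mask[perm[800:900]] = True
test_mask[perm[900:]] = True

# In the actual pipeline these tensors would be packed into a
# torch_geometric.data.Data(x=x, edge_index=edge_index, y=y, ...) object.
```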
Some dataset statistics (plots not recovered): the node label counts, and the outgoing degree histogram.
I used a two-layer GraphSAGE model with NeighborLoader for training.
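A two-layer GraphSAGE forward pass can be sketched in plain torch with mean aggregation. This is a from-scratch illustration of the architecture only, not the thread's actual `torch_geometric` `SAGEConv`/`NeighborLoader` code; all class and parameter names are hypothetical:

```python
import torch
import torch.nn as nn

class MeanSAGELayer(nn.Module):
    """Minimal GraphSAGE-style layer with mean neighbor aggregation."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin_self = nn.Linear(in_dim, out_dim)
        self.lin_neigh = nn.Linear(in_dim, out_dim)

    def forward(self, x, edge_index):
        src, dst = edge_index                      # edges point src -> dst
        agg = torch.zeros_like(x)
        agg.index_add_(0, dst, x[src])             # sum neighbor features
        deg = torch.zeros(x.size(0), 1)
        deg.index_add_(0, dst, torch.ones(src.size(0), 1))
        agg = agg / deg.clamp(min=1.0)             # mean over in-neighbors
        return self.lin_self(x) + self.lin_neigh(agg)

class TwoLayerSAGE(nn.Module):
    """Two stacked SAGE-style layers, mirroring the setup described above."""
    def __init__(self, in_dim, hidden, num_classes):
        super().__init__()
        self.l1 = MeanSAGELayer(in_dim, hidden)
        self.l2 = MeanSAGELayer(hidden, num_classes)

    def forward(self, x, edge_index):
        return self.l2(torch.relu(self.l1(x, edge_index)), edge_index)

# Usage on a tiny random graph: 10 nodes, constant feature 1, 243 classes.
model = TwoLayerSAGE(in_dim=1, hidden=16, num_classes=243)
out = model(torch.ones(10, 1), torch.randint(0, 10, (2, 30)))
```

Note that with a constant input feature, the first layer's output varies across nodes only through degree-dependent aggregation, which may contribute to the flat loss described below.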
The training loss remains constant.
I have some guesses about the cause, but I am stuck on this issue. Is there a bug in my code that causes the constant loss?
Thank you very much.