Low results when validated using multiple GPUs / DDP as compared to single GPU #7133
Replies: 3 comments
-
Are you computing an F1 score here for classification? Apologies, I have not studied the code in detail, but for correct metric reduction and logging we recommend torchmetrics. TorchMetrics overview: https://torchmetrics.readthedocs.io/en/latest/pages/overview.html#overview Is there evidence that this is a bug, or is it OK to convert this to a question/implementation help?
-
Sorry, I think this should be converted to a question/implementation help. The target in this problem is the list of `posting_id` values that are similar to the current posting id, so there is no particular class here. In validation, I am doing unsupervised clustering. If you look at the getMetric function, I am basically calculating the intersection of the posting ids predicted by my model and the target posting ids. Each row in the dataframe has a different posting_id. For each row, I calculate an F1 score and take the mean over all rows as the final F1 score. So there is no notion of classes here, and I can't think of any way to make the torchmetrics F1 score work: it asks me to declare a number of classes, which I don't have, and each sample can have any number of posting ids as its target. Any suggestions for the getMetric function to make it work for multi-GPU validation?
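The per-row set-intersection F1 described above can be sketched framework-free as follows (names like `row_f1`, `all_preds`, and `all_targets` are illustrative, not taken from the thread's actual getMetric code):

```python
def row_f1(pred_ids, target_ids):
    """F1 between the predicted and target sets of posting_ids."""
    pred, target = set(pred_ids), set(target_ids)
    hits = len(pred & target)
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(target)
    return 2 * precision * recall / (precision + recall)

# One list of predicted posting_ids and one list of target
# posting_ids per dataframe row; the final score is the row mean.
all_preds = [["a", "b"], ["c"]]
all_targets = [["a", "c"], ["c"]]
mean_f1 = sum(row_f1(p, t) for p, t in zip(all_preds, all_targets)) / len(all_preds)
```

Because each row contributes one scalar, the metric has no fixed class count, which is why the standard classification F1 interfaces do not apply directly.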
-
Can you guarantee that [syncing across GPUs and then computing the metric] gives the same result as [computing the metric on each GPU and then averaging the metric over the GPUs]? Also, if the dataloader size is not divisible by the number of GPUs, the DistributedSampler will repeat samples so that every GPU gets the same number of batches. For this reason, I recommend running trainer.test() on one GPU first to make sure you get the expected metrics before moving to the multi-GPU case.
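A tiny arithmetic demonstration of why those two orderings can differ when the per-GPU shards are unequal in size (the numbers here are made up for illustration):

```python
def mean(xs):
    return sum(xs) / len(xs)

# Per-row F1 scores landing on two hypothetical GPUs.
shard_gpu0 = [1.0, 1.0, 1.0]  # 3 rows on GPU 0
shard_gpu1 = [0.0]            # 1 row on GPU 1

# Compute on each GPU, then average the per-GPU metrics:
avg_of_per_gpu_means = mean([mean(shard_gpu0), mean(shard_gpu1)])  # 0.5

# Gather all rows first, then compute once over everything:
global_mean = mean(shard_gpu0 + shard_gpu1)                        # 0.75
```

The first ordering weights every GPU equally regardless of how many rows it processed; the second weights every row equally, which is what the single-GPU run effectively does.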
-
I trained a model using the following LightningModule. When validating with a single GPU and the trained weights, I get a val_f1 score of 0.288, but when validating with multiple GPUs and DDP, I get a val_f1 score of 0.082. How can I resolve this?
In the following code, I have also tried using the gather_all_tensors function, but it didn't help.
I use the following Trainer for single GPU.
For 2 GPUs, I do the following:
When I use the following for a single GPU, I also get val_f1 of 0.082:
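(The Trainer snippets referenced above did not survive extraction, so the exact arguments are unknown. As a hedged sketch only, single- and multi-GPU setups in the Lightning 1.x API of that era typically looked like this:)

```python
from pytorch_lightning import Trainer

# Single-GPU baseline (the sanity check recommended in the replies):
trainer = Trainer(gpus=1)

# Two GPUs with DDP; Lightning 1.x-style flag. Newer Lightning
# versions use devices=2 and strategy="ddp" instead.
trainer = Trainer(gpus=2, accelerator="ddp")
```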