UNet vs BasicUNet speed and memory usage. #3048

ramonemiliani93 · 2021-09-29T16:59:37Z

ramonemiliani93
Sep 29, 2021

Hi, I was looking at the implementations of both UNet and BasicUNet. While the BasicUNet implementation is very straightforward, the UNet implementation uses a recursive function to build the network level by level. I tested both using common configurations that yield a roughly similar parameter count. Here are the functions I used:

from typing import Tuple

import numpy as np
import torch


def count_params(model: torch.nn.Module) -> int:
    # Count only the parameters that would be updated during training.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def inference_time(model: torch.nn.Module, tensor: torch.Tensor) -> float:
    """https://discuss.pytorch.org/t/how-to-measure-time-in-pytorch/26964"""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    start.record()
    model(tensor)
    end.record()

    torch.cuda.synchronize()

    return start.elapsed_time(end)


def profile(model: torch.nn.Module, tensor_shape: Tuple[int, ...], repeat: int):
    tensor = torch.ones(tensor_shape, dtype=torch.float).cuda()
    model = model.cuda()

    print(f"{model.__class__.__name__}: {count_params(model)} parameters")

    times = [inference_time(model, tensor) for _ in range(repeat)]
    print(f"{model.__class__.__name__} inference time: {np.mean(times)} ms")
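A note on the timing helper above: it averages every call, including the first one, which on a GPU usually also pays one-off start-up costs. A minimal sketch of a more defensive measurement loop is shown below. It is not part of the original post; `benchmark`, `warmup`, and `sync` are hypothetical names chosen for illustration, and the helper is written against plain Python callables so that it does not depend on any particular framework.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(
    fn: Callable[[], object],
    warmup: int = 3,
    repeat: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the median wall-clock time of fn() in milliseconds.

    `sync` is an optional barrier for asynchronous back ends; it is
    called before starting and before stopping the clock.
    """

    def timed_call() -> float:
        if sync is not None:
            sync()  # make sure no earlier work is still pending
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronously launched work to finish
        return (time.perf_counter() - start) * 1000.0

    for _ in range(warmup):
        timed_call()  # result discarded: absorbs one-off start-up costs

    # The median is less sensitive to occasional slow runs than the mean.
    return statistics.median(timed_call() for _ in range(repeat))
```

With the setup from this thread, one would presumably call it as `benchmark(lambda: model(tensor), sync=torch.cuda.synchronize)` inside a `torch.no_grad()` block, so that autograd bookkeeping does not inflate time and memory during inference.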

And these are the tested architectures:

from monai.networks.nets import BasicUNet, UNet

unet = UNet(
    dimensions=3,
    in_channels=1,
    out_channels=2,
    channels=(16, 32, 64, 128, 256),
    strides=(2, 2, 2, 2),
    num_res_units=2,
)
basic_unet = BasicUNet()

The results I got showed that UNet is almost 10x faster than BasicUNet when performing inference on a (1, 1, 128, 128, 128) volume of ones. Are there any specific architectural differences that would explain such a gap between the inference times?

Furthermore, if I increase the batch size from 1 to 2, i.e. (2, 1, 128, 128, 128), I run into a memory error with BasicUNet, while with UNet I can increase it much further. In BasicUNet the tensors x0 to x4 and u4 to u1 are kept until logits is returned. Does the recursive implementation help with memory consumption, or is this just the benefit of using a 2-strided convolution?

Any comments are very much appreciated! Thanks!

Nic-Ma · 2021-09-29T23:41:52Z

Nic-Ma
Sep 29, 2021
Maintainer

Hi @ericspod ,

Could you please help share some details of the implementation that may cause this difference?

Thanks in advance.

0 replies

rijobro · 2021-09-30T08:53:26Z

rijobro
Sep 30, 2021
Collaborator

@ramonemiliani93

Passing both through torch-summary shows that, although the numbers of trainable parameters are similar, other values differ by more than a factor of 10, e.g. total mult-adds, forward/backward pass size, and estimated total size. I haven't looked at the implementation of BasicUNet, but I suspect this could explain the performance difference.

Output from summary(unet, (1, 128, 128, 128)):

===============================================================================================
Layer (type:depth-idx)                        Output Shape              Param #
===============================================================================================
├─Sequential: 1-1                             [-1, 2, 128, 128, 128]    --
|    └─ResidualUnit: 2-1                      [-1, 16, 64, 64, 64]      --
|    |    └─Conv3d: 3-1                       [-1, 16, 64, 64, 64]      448
|    |    └─Sequential: 3-2                   [-1, 16, 64, 64, 64]      7,378
|    └─SkipConnection: 2-2                    [-1, 32, 64, 64, 64]      --
|    |    └─Sequential: 3-3                   [-1, 16, 64, 64, 64]      4,796,814
|    └─Sequential: 2-3                        [-1, 2, 128, 128, 128]    --
|    |    └─Convolution: 3-4                  [-1, 2, 128, 128, 128]    1,731
|    |    └─ResidualUnit: 3-5                 [-1, 2, 128, 128, 128]    110
===============================================================================================
Total params: 4,806,481
Trainable params: 4,806,481
Non-trainable params: 0
Total mult-adds (G): 3.76
===============================================================================================
Input size (MB): 8.00
Forward/backward pass size (MB): 64.00
Params size (MB): 18.34
Estimated Total Size (MB): 90.34
===============================================================================================

Output from summary(basic_unet, (1, 128, 128, 128)):

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
├─TwoConv: 1-1                           [-1, 32, 128, 128, 128]   --
|    └─Convolution: 2-1                  [-1, 32, 128, 128, 128]   --
|    |    └─Conv3d: 3-1                  [-1, 32, 128, 128, 128]   896
|    |    └─ADN: 3-2                     [-1, 32, 128, 128, 128]   64
|    └─Convolution: 2-2                  [-1, 32, 128, 128, 128]   --
|    |    └─Conv3d: 3-3                  [-1, 32, 128, 128, 128]   27,680
|    |    └─ADN: 3-4                     [-1, 32, 128, 128, 128]   64
├─Down: 1-2                              [-1, 32, 64, 64, 64]      --
|    └─MaxPool3d: 2-3                    [-1, 32, 64, 64, 64]      --
|    └─TwoConv: 2-4                      [-1, 32, 64, 64, 64]      --
|    |    └─Convolution: 3-5             [-1, 32, 64, 64, 64]      27,744
|    |    └─Convolution: 3-6             [-1, 32, 64, 64, 64]      27,744
├─Down: 1-3                              [-1, 64, 32, 32, 32]      --
|    └─MaxPool3d: 2-5                    [-1, 32, 32, 32, 32]      --
|    └─TwoConv: 2-6                      [-1, 64, 32, 32, 32]      --
|    |    └─Convolution: 3-7             [-1, 64, 32, 32, 32]      55,488
|    |    └─Convolution: 3-8             [-1, 64, 32, 32, 32]      110,784
├─Down: 1-4                              [-1, 128, 16, 16, 16]     --
|    └─MaxPool3d: 2-7                    [-1, 64, 16, 16, 16]      --
|    └─TwoConv: 2-8                      [-1, 128, 16, 16, 16]     --
|    |    └─Convolution: 3-9             [-1, 128, 16, 16, 16]     221,568
|    |    └─Convolution: 3-10            [-1, 128, 16, 16, 16]     442,752
├─Down: 1-5                              [-1, 256, 8, 8, 8]        --
|    └─MaxPool3d: 2-9                    [-1, 128, 8, 8, 8]        --
|    └─TwoConv: 2-10                     [-1, 256, 8, 8, 8]        --
|    |    └─Convolution: 3-11            [-1, 256, 8, 8, 8]        885,504
|    |    └─Convolution: 3-12            [-1, 256, 8, 8, 8]        1,770,240
├─UpCat: 1-6                             [-1, 128, 16, 16, 16]     --
|    └─UpSample: 2-11                    [-1, 128, 16, 16, 16]     --
|    |    └─ConvTranspose3d: 3-13        [-1, 128, 16, 16, 16]     262,272
|    └─TwoConv: 2-12                     [-1, 128, 16, 16, 16]     --
|    |    └─Convolution: 3-14            [-1, 128, 16, 16, 16]     885,120
|    |    └─Convolution: 3-15            [-1, 128, 16, 16, 16]     442,752
├─UpCat: 1-7                             [-1, 64, 32, 32, 32]      --
|    └─UpSample: 2-13                    [-1, 64, 32, 32, 32]      --
|    |    └─ConvTranspose3d: 3-16        [-1, 64, 32, 32, 32]      65,600
|    └─TwoConv: 2-14                     [-1, 64, 32, 32, 32]      --
|    |    └─Convolution: 3-17            [-1, 64, 32, 32, 32]      221,376
|    |    └─Convolution: 3-18            [-1, 64, 32, 32, 32]      110,784
├─UpCat: 1-8                             [-1, 32, 64, 64, 64]      --
|    └─UpSample: 2-15                    [-1, 32, 64, 64, 64]      --
|    |    └─ConvTranspose3d: 3-19        [-1, 32, 64, 64, 64]      16,416
|    └─TwoConv: 2-16                     [-1, 32, 64, 64, 64]      --
|    |    └─Convolution: 3-20            [-1, 32, 64, 64, 64]      55,392
|    |    └─Convolution: 3-21            [-1, 32, 64, 64, 64]      27,744
├─UpCat: 1-9                             [-1, 32, 128, 128, 128]   --
|    └─UpSample: 2-17                    [-1, 32, 128, 128, 128]   --
|    |    └─ConvTranspose3d: 3-22        [-1, 32, 128, 128, 128]   8,224
|    └─TwoConv: 2-18                     [-1, 32, 128, 128, 128]   --
|    |    └─Convolution: 3-23            [-1, 32, 128, 128, 128]   55,392
|    |    └─Convolution: 3-24            [-1, 32, 128, 128, 128]   27,744
├─Conv3d: 1-10                           [-1, 2, 128, 128, 128]    66
==========================================================================================
Total params: 5,749,410
Trainable params: 5,749,410
Non-trainable params: 0
Total mult-adds (G): 320.65
==========================================================================================
Input size (MB): 8.00
Forward/backward pass size (MB): 4038.00
Params size (MB): 21.93
Estimated Total Size (MB): 4067.93
==========================================================================================

1 reply

ramonemiliani93 Sep 30, 2021
Author

Hi @rijobro, thanks a lot for the insight! I didn't know about the torch-summary package 🙌 Following what you did, but comparing just the first block of each network:

import torchinfo
from monai.networks.blocks import ResidualUnit

residual = ResidualUnit(dimensions=3, in_channels=1, out_channels=16, strides=2)
torchinfo.summary(residual, (1, 1, 128, 128, 128))

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
ResidualUnit                             --                        --
├─Conv3d: 1-1                            [1, 16, 64, 64, 64]       448
├─Sequential: 1-2                        [1, 16, 64, 64, 64]       --
│    └─Convolution: 2-1                  [1, 16, 64, 64, 64]       --
│    │    └─Conv3d: 3-1                  [1, 16, 64, 64, 64]       448
│    │    └─ADN: 3-2                     [1, 16, 64, 64, 64]       1
│    └─Convolution: 2-2                  [1, 16, 64, 64, 64]       --
│    │    └─Conv3d: 3-3                  [1, 16, 64, 64, 64]       6,928
│    │    └─ADN: 3-4                     [1, 16, 64, 64, 64]       1
==========================================================================================
Total params: 7,826
Trainable params: 7,826
Non-trainable params: 0
Total mult-adds (G): 2.05
==========================================================================================
Input size (MB): 8.39
Forward/backward pass size (MB): 167.77
Params size (MB): 0.03
Estimated Total Size (MB): 176.19
==========================================================================================

while for the TwoConv block used in the BasicUNet we have:

import torchinfo
from monai.networks.nets.basic_unet import TwoConv

two_conv = TwoConv(dim=3, in_chns=1, out_chns=16, act="relu", norm="batch")
torchinfo.summary(two_conv, (1, 1, 128, 128, 128))

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
TwoConv                                  --                        --
├─Convolution: 1-1                       [1, 16, 128, 128, 128]    --
│    └─Conv3d: 2-1                       [1, 16, 128, 128, 128]    448
│    └─ADN: 2-2                          [1, 16, 128, 128, 128]    --
│    │    └─BatchNorm3d: 3-1             [1, 16, 128, 128, 128]    32
│    │    └─Dropout: 3-2                 [1, 16, 128, 128, 128]    --
│    │    └─ReLU: 3-3                    [1, 16, 128, 128, 128]    --
├─Convolution: 1-2                       [1, 16, 128, 128, 128]    --
│    └─Conv3d: 2-3                       [1, 16, 128, 128, 128]    6,928
│    └─ADN: 2-4                          [1, 16, 128, 128, 128]    --
│    │    └─BatchNorm3d: 3-4             [1, 16, 128, 128, 128]    32
│    │    └─Dropout: 3-5                 [1, 16, 128, 128, 128]    --
│    │    └─ReLU: 3-6                    [1, 16, 128, 128, 128]    --
==========================================================================================
Total params: 7,440
Trainable params: 7,440
Non-trainable params: 0
Total mult-adds (G): 15.47
==========================================================================================
Input size (MB): 8.39
Forward/backward pass size (MB): 1073.74
Params size (MB): 0.03
Estimated Total Size (MB): 1082.16
==========================================================================================

The difference in the number of operations probably stems from BasicUNet using a 1-strided convolution followed by max pooling, whereas UNet simply uses a 2-strided convolution. I am not entirely sure (@ericspod, please correct me if I am wrong), but UNet does not seem to have feature maps at the original scale of the image. This means the image is mostly processed at a (64, 64, 64) scale and then upsampled to (128, 128, 128) using a transposed convolution, i.e. there are no skip connections from (128, 128, 128) maps.
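The mult-add totals in the two block summaries above can be reproduced with simple arithmetic. The sketch below assumes a counting rule of "parameters (weights plus bias) times number of output voxels" for each convolution; that rule is an inference from the reported numbers rather than something stated in the thread, and the helper names `conv_params` and `conv_mult_adds` are hypothetical. The three convolutions listed for ResidualUnit (including the 448-parameter one on the residual path) are taken from its summary table.

```python
from math import prod
from typing import Sequence


def conv_params(in_ch: int, out_ch: int, kernel: int = 3, dims: int = 3) -> int:
    """Weights plus one bias per output channel of a dense convolution."""
    return in_ch * out_ch * kernel**dims + out_ch


def conv_mult_adds(in_ch: int, out_ch: int, out_spatial: Sequence[int], kernel: int = 3) -> int:
    """Assumed counting rule: parameters x output voxels (batch size 1)."""
    return conv_params(in_ch, out_ch, kernel, len(out_spatial)) * prod(out_spatial)


full = (128, 128, 128)  # TwoConv keeps the input resolution (stride 1)
half = (64, 64, 64)     # ResidualUnit with strides=2 halves each axis

two_conv = conv_mult_adds(1, 16, full) + conv_mult_adds(16, 16, full)
residual_unit = (
    conv_mult_adds(1, 16, half)     # convolution on the residual path
    + conv_mult_adds(1, 16, half)   # first convolution of the unit
    + conv_mult_adds(16, 16, half)  # second convolution of the unit
)

print(f"TwoConv:      {two_conv / 1e9:.2f} G")       # 15.47 G
print(f"ResidualUnit: {residual_unit / 1e9:.2f} G")  # 2.05 G
print(f"ratio:        {two_conv / residual_unit:.1f}x")  # 7.5x
```

Halving each of the three spatial axes divides the number of output voxels by 8, so each individual convolution becomes 8x cheaper; the overall ratio is slightly lower here only because ResidualUnit has the extra convolution on its residual path.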

ericspod · 2021-09-30T15:36:41Z

ericspod
Sep 30, 2021
Maintainer

To follow on from @rijobro's analysis, running the following will produce a full report of the layers in the networks, as you've already done:

from monai.networks.nets import UNet, BasicUNet
import torchinfo

unet = UNet(
    dimensions=3,
    in_channels=1,
    out_channels=2,
    channels=(16, 32, 64, 128, 256),
    strides=(2, 2, 2, 2),
    num_res_units=2,
)

basic_unet = BasicUNet()

print(torchinfo.summary(unet, (1, 1, 128, 128, 128), depth=20))
print(torchinfo.summary(basic_unet, (1, 1, 128, 128, 128), depth=20))

The slight difference here is setting depth to a large value so you can see every layer. The summaries, with the per-layer rows omitted for brevity:

============================================================================================================================================
Layer (type:depth-idx)                                                                     Output Shape              Param #
============================================================================================================================================
UNet                                                                                       --                        --
============================================================================================================================================
Total params: 4,806,481
Trainable params: 4,806,481
Non-trainable params: 0
Total mult-adds (G): 27.23
============================================================================================================================================
Input size (MB): 8.39
Forward/backward pass size (MB): 504.89
Params size (MB): 19.23
Estimated Total Size (MB): 532.50
============================================================================================================================================

====================================================================================================
Layer (type:depth-idx)                             Output Shape              Param #
====================================================================================================
BasicUNet                                          --                        --
====================================================================================================
Total params: 5,749,410
Trainable params: 5,749,410
Non-trainable params: 0
Total mult-adds (G): 321.02
====================================================================================================
Input size (MB): 8.39
Forward/backward pass size (MB): 5662.31
Params size (MB): 23.00
Estimated Total Size (MB): 5693.70
====================================================================================================

What we're seeing here is that, with an input of (1, 1, 128, 128, 128), the UNet object has a similar number of parameters (4,806,481) to the BasicUNet (5,749,410), but its forward/backward pass size is less than 10% of BasicUNet's. This is because BasicUNet uses more convolutions, and these are also applied to data at the original input size. The UNet class downsamples the input immediately with a strided convolution, so it never applies layers to full-size data except at the end of the decode half. This reduces memory significantly without really impacting results. It will also make a big difference in speed, and I suspect the implementation of the forward pass in UNet is a bit faster as well. The decode half of UNet is also lighter than the encode half, using fewer convolutions to reconstruct the result and so requiring less memory and time. Finally, downsampling is done with strided convolutions only and upsampling with strided transpose convolutions only; the lack of pooling layers saves memory and time as well.
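To put rough numbers on the memory argument, the sketch below computes the size of a single float32 feature map at the first level of each network, using the output shapes from the summaries earlier in the thread. The helper name `tensor_bytes` is hypothetical, and 4 bytes per element is an assumption that matches the torch.float input used in the benchmark.

```python
from math import prod
from typing import Sequence


def tensor_bytes(shape: Sequence[int], bytes_per_element: int = 4) -> int:
    """Size in bytes of a dense tensor (float32 by default)."""
    return prod(shape) * bytes_per_element


input_volume = tensor_bytes((1, 1, 128, 128, 128))
basic_first = tensor_bytes((1, 32, 128, 128, 128))  # BasicUNet, first TwoConv output
unet_first = tensor_bytes((1, 16, 64, 64, 64))      # UNet, first ResidualUnit output

print(f"input:           {input_volume / 1e6:.2f} MB")   # 8.39 MB
print(f"BasicUNet level: {basic_first / 1e6:.2f} MB")    # 268.44 MB
print(f"UNet level:      {unet_first / 1e6:.2f} MB")     # 16.78 MB
print(f"ratio:           {basic_first // unet_first}x")  # 16x
```

Each activation at BasicUNet's first level is therefore 16x larger than the corresponding one in UNet (2x from the channel count, 8x from the resolution), and several such tensors are produced at that level. As a side note, the same input is reported as 8.00 MB by one summary tool and 8.39 MB by the other, which is consistent with 8,388,608 bytes expressed in units of 2^20 and 10^6 bytes respectively.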

0 replies

ramonemiliani93 · 2021-09-30T15:40:33Z

ramonemiliani93
Sep 30, 2021
Author

Thanks a lot @ericspod and @rijobro for your explanation! 🙌

0 replies
