This rule is specific to deep learning architectures built with convolutional layers followed by Batch Normalization.

Using a bias term in a convolutional layer that is immediately followed by a Batch Normalization (BatchNorm) layer is redundant. The bias added by the convolution is canceled out during normalization: BatchNorm subtracts the per-channel mean, which absorbs the bias, and then applies its own learnable affine transformation. As a result, the bias from the convolutional layer has no effect on the model's output. Removing it reduces the number of parameters, slightly improving memory usage and emissions, while maintaining, or even slightly improving, training accuracy.
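
The cancellation can be checked numerically. Below is a minimal sketch (layer sizes and input shape are arbitrary): two convolutions share the same kernels, one with and one without a bias, and BatchNorm in training mode yields identical outputs for both.

[source,python]
----
import torch
import torch.nn as nn

torch.manual_seed(0)

conv_bias = nn.Conv2d(3, 8, kernel_size=3, bias=True)
conv_nobias = nn.Conv2d(3, 8, kernel_size=3, bias=False)
conv_nobias.weight.data.copy_(conv_bias.weight.data)  # same kernels, no bias

bn = nn.BatchNorm2d(8)
bn.train()  # normalize with batch statistics, as during training

x = torch.randn(4, 3, 16, 16)
out_with_bias = bn(conv_bias(x))
out_without_bias = bn(conv_nobias(x))

# The per-channel bias is absorbed by the mean subtraction in BatchNorm,
# so both outputs match up to floating-point tolerance.
print(torch.allclose(out_with_bias, out_without_bias, atol=1e-5))  # True
----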

== Non-Compliant Code Example

[source,python]
----
import torch.nn as nn

# in_channels, out_channels, and kernel_size are placeholders
nn.Sequential(
    nn.Conv2d(in_channels, out_channels, kernel_size, bias=True),
    nn.BatchNorm2d(out_channels),
    nn.ReLU()
)
----

In this example, the convolutional layer includes a bias term, which is unnecessary because it is immediately followed by a BatchNorm layer.

== Compliant Solution

[source,python]
----
import torch.nn as nn

# in_channels, out_channels, and kernel_size are placeholders
nn.Sequential(
    nn.Conv2d(in_channels, out_channels, kernel_size, bias=False),
    nn.BatchNorm2d(out_channels),
    nn.ReLU()
)
----

Since `BatchNorm2d` subtracts the per-channel mean and applies its own learnable scale and shift, the bias of the preceding convolution is redundant.
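
Concretely, for a per-channel pre-BatchNorm activation y with constant bias b, the batch mean absorbs the bias while the variance is unchanged, so the bias cancels in the normalization (γ and β are BatchNorm's learnable parameters):

[latexmath]
++++
\mathrm{BN}(y + b)
= \gamma \, \frac{(y + b) - (\mu_y + b)}{\sqrt{\sigma_y^2 + \epsilon}} + \beta
= \gamma \, \frac{y - \mu_y}{\sqrt{\sigma_y^2 + \epsilon}} + \beta
= \mathrm{BN}(y)
++++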

== Relevance Analysis

Local experiments were conducted to assess the impact of disabling bias in convolutional layers followed by BatchNorm.

=== Configuration

* Processor: Intel(R) Xeon(R) CPU @ 3.80GHz
* RAM: 64GB
* GPU: NVIDIA Quadro RTX 6000
* CO₂ Emissions Measurement: https://mlco2.github.io/codecarbon/[CodeCarbon] (usage sketch below)
* Framework: PyTorch
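
A per-epoch measurement with CodeCarbon's `EmissionsTracker` can look like the following minimal sketch (`train_one_epoch`, `model`, `loader`, `optimizer`, and `num_epochs` are hypothetical placeholders):

[source,python]
----
from codecarbon import EmissionsTracker

for epoch in range(num_epochs):
    tracker = EmissionsTracker()  # writes an emissions.csv report by default
    tracker.start()
    train_one_epoch(model, loader, optimizer)  # placeholder training step
    emissions_kg = tracker.stop()  # returns the estimated emissions in kg CO2-eq
    print(f"epoch {epoch}: {emissions_kg:.6f} kg CO2-eq")
----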

=== Context

Two models were trained under identical settings:

- One with `bias=True` in convolutional layers preceding BatchNorm
- One with `bias=False`

The following metrics were compared:

- Training time per epoch
- GPU memory usage
- Parameter count (see the sketch after this list)
- Training and test accuracy
- CO₂ emissions per epoch
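
The parameter comparison can be reproduced with a count like the one below (a minimal sketch; `build_model` is a hypothetical stand-in for the experimental architecture, so the printed totals will not match the exact figures reported in the next section):

[source,python]
----
import torch.nn as nn

def build_model(bias: bool) -> nn.Sequential:
    # Hypothetical stand-in for the experimental architecture.
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, bias=bias),
        nn.BatchNorm2d(32),
        nn.ReLU(),
        nn.Conv2d(32, 64, 3, bias=bias),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(64, 10),
    )

for bias in (True, False):
    n_params = sum(p.numel() for p in build_model(bias).parameters())
    print(f"bias={bias}: {n_params} parameters")
----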

=== Impact Analysis

image::convresult.png[]

- **Training Time:** Nearly identical (~30 seconds/epoch) between configurations.
- **Memory Usage:** Reserved GPU memory was lower for the "Without Bias" model (see the measurement sketch below).
- **Training Accuracy:** No significant difference between the two models; both converged to similar values.
- **Test Accuracy:** Final accuracy was the same for both models, confirming no negative impact.
- **Parameter Count:** 155,850 with bias vs. 155,626 without bias, a reduction of 224 parameters (one bias per output channel of each convolution).
- **Emissions:** Per-epoch emissions were fractionally lower without bias, due to the leaner architecture and reduced operations.
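
Reserved GPU memory can be read with PyTorch's CUDA memory accounting; a minimal sketch (requires a CUDA device, and the training step is schematic):

[source,python]
----
import torch

torch.cuda.reset_peak_memory_stats()
# ... run one training epoch here ...
peak_mib = torch.cuda.max_memory_reserved() / 2**20
print(f"peak reserved GPU memory: {peak_mib:.1f} MiB")
----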

== Conclusion

Disabling bias in convolutional layers followed by BatchNorm:

- Reduces the parameter count
- Slightly reduces memory usage and emissions
- Maintains accuracy

== References

Credit: https://github.com/AghilesAzzoug/GreenPyData

- https://arxiv.org/pdf/1502.03167[Ioffe & Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift" (2015)]
- https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html[PyTorch `torch.nn.Conv2d` documentation]