Commit b5e59ed

june update
1 parent 05d3612 commit b5e59ed

File tree

10 files changed

+254
-20
lines changed

README.md

Lines changed: 78 additions & 20 deletions
@@ -1,59 +1,117 @@
1-
# AI Image Signal Processing and ISPs
1+
# AI Image Signal Processing and Computational Photography
2+
## Deep learning for low-level computer vision and imaging
23

3-
[![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2201.03210)
4+
[![isp](https://img.shields.io/badge/ISP-paper-lightgreen)](https://arxiv.org/abs/2201.03210)
5+
[![lpienet](https://img.shields.io/badge/LPIENet-paper-lightpink)](https://arxiv.org/abs/2210.13552)
6+
[![bokeh](https://img.shields.io/badge/Bokeh-paper-9cf)](https://openaccess.thecvf.com/content/CVPR2023W/NTIRE/papers/Seizinger_Efficient_Multi-Lens_Bokeh_Effect_Rendering_and_Transformation_CVPRW_2023_paper.pdf)
7+
[![ntire23](https://img.shields.io/badge/NTIRE-CVPR23-lightcyan)](https://cvlai.net/ntire/2023/)
48
![visitors](https://visitor-badge.glitch.me/badge?page_id=mv-lab/AISP)
59

610

7-
[Marcos V. Conde](https://scholar.google.com/citations?user=NtB1kjYAAAAJ&hl=en), [Radu Timofte](https://scholar.google.com/citations?user=u3MwH5kAAAAJ&hl=en)
11+
**[Marcos V. Conde](https://scholar.google.com/citations?user=NtB1kjYAAAAJ&hl=en), [Radu Timofte](https://scholar.google.com/citations?user=u3MwH5kAAAAJ&hl=en)**
812

9-
[Computer Vision Lab, CAIDAS, University of Würzburg](https://www.informatik.uni-wuerzburg.de/computervision/home/)
13+
[Computer Vision Lab, CAIDAS, University of Würzburg](https://www.informatik.uni-wuerzburg.de/computervision/home/)
1014

1115
---------------------------------------------------
1216

17+
> **Topics** This repository contains material for RAW image processing, RAW image reconstruction and synthesis, learned Image Signal Processing (ISP), Image Enhancement and Restoration (denoising, deblurring), multi-lens bokeh effect rendering, and much more! 📷
18+
19+
<br>
20+
1321
#### Official repository for the following works:
1422

23+
1. **[Efficient Multi-Lens Bokeh Effect Rendering and Transformation](https://openaccess.thecvf.com/content/CVPR2023W/NTIRE/papers/Seizinger_Efficient_Multi-Lens_Bokeh_Effect_Rendering_and_Transformation_CVPRW_2023_paper.pdf)** at **CVPR NTIRE 2023**.
1524
1. **[Perceptual Image Enhancement for Smartphone Real-Time Applications](https://arxiv.org/abs/2210.13552) (LPIENet) at WACV 2023.**
1625
1. **[Reversed Image Signal Processing and RAW Reconstruction. AIM 2022 Challenge Report](aim22-reverseisp/) ECCV, AIM 2022**
17-
1. **[Model-Based Image Signal Processors via Learnable Dictionaries](https://ojs.aaai.org/index.php/AAAI/article/view/19926) AAAI 2022 Oral**
26+
1. **[Model-Based Image Signal Processors via Learnable Dictionaries](https://arxiv.org/abs/2201.03210) AAAI 2022 Oral**
1827
1. [MAI 2022 Learned ISP Challenge](#mai-2022-learned-isp-challenge) Complete Baseline solution
19-
1. [Citation and Acknowledgement](#citation-and-acknowledgement) | [Contact](#contact)
28+
1. [Citation and Acknowledgement](#citation-and-acknowledgement) | [Contact](#contact) for any inquiries.
2029

2130
**News 🚀🚀**
2231

23-
- [11/2022] LPIENet release soon!
32+
- We will try to keep the repo updated on a monthly basis ✏️
33+
- [06/2023] Lens-to-lens bokeh effect transformation and NTIRE 2023 material coming soon.
34+
- [01/2023] LPIENet material is out
2435
- [10/2022] Reversed ISP and RAW Reconstruction material presented at AIM workshop ECCV 2022 is now available! [check here](aim22-reverseisp/)
2536

26-
---------------------------------------------------
37+
| | | | |
38+
|:--- |:--- |:--- |:---|
39+
| <a href="https://openaccess.thecvf.com/content/CVPR2023W/NTIRE/papers/Seizinger_Efficient_Multi-Lens_Bokeh_Effect_Rendering_and_Transformation_CVPRW_2023_paper.pdf"><img src="media/papers/bokeh-ntire23.png" width="300" border="0"></a> | <a href="https://arxiv.org/abs/2210.13552"><img src="media/papers/lpienet-wacv23.png" width="300" border="0"></a> | <a href="https://arxiv.org/abs/2210.11153"><img src="media/papers/reisp-aim22.png" width="255" border="0"></a> | <a href="https://arxiv.org/abs/2201.03210"><img src="media/papers/isp-aaai22.png" width="300" border="0"></a>
40+
| | | | |
2741

28-
## [AIM 2022 Reversed ISP Challenge](aim22-reverseisp/)
42+
------
2943

30-
### [Track 1 - S7](https://codalab.lisn.upsaclay.fr/competitions/5079) | [Track 2 - P20](https://codalab.lisn.upsaclay.fr/competitions/5080)
44+
## [Perceptual Image Enhancement for Smartphone Real-Time Applications](https://arxiv.org/abs/2210.13552) (WACV '23)
3145

32-
<a href="https://data.vision.ee.ethz.ch/cvl/aim22/"><img src="https://i.ibb.co/VJ7SSQj/aim-challenge-teaser.png" alt="aim-challenge-teaser" width="400" border="0"></a>
46+
*This work was presented at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023.*
3347

34-
In this challenge, we look for solutions to recover RAW readings from the camera using only the corresponding RGB images processed by the in-camera ISP. Successful solutions should generate plausible RAW images, and by doing this, other downstream tasks like Denoising, Super-resolution or Colour Constancy can benefit from such synthetic data generation. Click [here to read more information](aim22-reverseisp/README.md) about the challenge.
48+
> Recent advances in camera designs and imaging pipelines allow us to capture high-quality images using smartphones. However, due to the small size and lens limitations of smartphone cameras, we commonly find artifacts or degradation in the processed images, e.g., noise, diffraction artifacts, blur, and HDR overexposure.
49+
We propose LPIENet, a lightweight network for perceptual image enhancement, with the focus on deploying it on smartphones.
3550

36-
### Starter guide and code 🔥
51+
The code is available at **[lpienet](lpienet/)**, including versions in PyTorch and TensorFlow. We also include the model conversion to TFLite, so you can generate the corresponding `.tflite` file and run the model using the `AI Benchmark` app on Android devices.
52+
In *[lpienet-tflite.ipynb](lpienet/lpienet-tflite.ipynb)* you can find a complete tutorial on converting the model to TFLite.
3753

38-
- **[aim-starter-code.ipynb](aim22-reverseisp/official-starter-code.ipynb)** - Simple dataloading and visualization of RGB-RAW pairs + other utils.
39-
- **[aim-baseline.ipynb](aim22-reverseisp/official-baseline.ipynb)** - End-to-end guide to load the data, train a simple UNet model and make your first submission!
54+
**Contributions**
55+
- The model can process 4K images under 1s on commercial smartphones.
56+
- We achieve competitive results in comparison to SOTA methods on relevant benchmarks for denoising, deblurring and HDR correction, for example the SIDD benchmark.
57+
- We reduce the number of MACs (or FLOPs) by 50 times compared to NAFNet.
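
To make the MACs figure more concrete, here is a minimal sketch of how multiply-accumulate operations can be counted for a single convolution, comparing a dense 3x3 layer with the 1x1 + depthwise 3x3 + 1x1 pattern. The helper `conv2d_macs` and the layer sizes are hypothetical and chosen only for illustration; this is not how the numbers reported in the paper were measured.

```python
def conv2d_macs(h, w, c_in, c_out, kernel_size, groups=1):
    """MACs of one stride-1, same-padded 2D convolution, ignoring the bias."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    # Each output value needs (c_in / groups) * k * k multiply-accumulates.
    return h * w * c_out * (c_in // groups) * kernel_size * kernel_size


# Hypothetical feature-map size and width, for illustration only.
H, W, C = 1080, 1920, 16

dense_3x3 = conv2d_macs(H, W, C, C, kernel_size=3)
separable = (
    conv2d_macs(H, W, C, C, kernel_size=1)              # pointwise 1x1
    + conv2d_macs(H, W, C, C, kernel_size=3, groups=C)  # depthwise 3x3
    + conv2d_macs(H, W, C, C, kernel_size=1)            # pointwise 1x1
)

print(f"dense 3x3:                 {dense_3x3 / 1e9:.2f} GMACs")
print(f"1x1 + depthwise 3x3 + 1x1: {separable / 1e9:.2f} GMACs")
```

Summing such per-layer counts over a whole network gives a rough total for comparing architectures at a fixed input resolution.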
58+
59+
<details>
60+
<summary>Click here to read the abstract</summary>
61+
<p>Recent advances in camera designs and imaging pipelines allow us to capture high-quality images using smartphones. However, due to the small size and lens limitations of the smartphone cameras, we commonly find artifacts or degradation in the processed images. The most common unpleasant effects are noise artifacts, diffraction artifacts, blur, and HDR overexposure. Deep learning methods for image restoration can successfully remove these artifacts. However, most approaches are not suitable for real-time applications on mobile devices due to their heavy computation and memory requirements.
62+
63+
In this paper, we propose LPIENet, a lightweight network for perceptual image enhancement, with the focus on deploying it on smartphones. Our experiments show that, with much fewer parameters and operations, our model can deal with the mentioned artifacts and achieve competitive performance compared with state-of-the-art methods on standard benchmarks. Moreover, to prove the efficiency and reliability of our approach, we deployed the model directly on commercial smartphones and evaluated its performance. Our model can process 2K resolution images under 1 second in mid-level commercial smartphones.
64+
<br>
65+
</p>
66+
</details>
67+
<br>
68+
69+
70+
71+
<a href="https://arxiv.org/abs/2210.13552"><img src="media/lpienet.png" alt="lpienet" width="800" border="0"></a>
72+
73+
74+
| | |
75+
| :--- | :--- |
76+
| <img src="lpienet/lpienet-app.png" width="300" border="0"> | <img src="lpienet/lpienet-plot.png" width="450" border="0"> |
77+
| | |
4078

79+
<br>
4180

4281
------
4382

44-
## [Model-Based Image Signal Processors via Learnable Dictionaries](https://ojs.aaai.org/index.php/AAAI/article/view/19926) (AAAI '22 Oral)
83+
## [Model-Based Image Signal Processors via Learnable Dictionaries](https://mv-lab.github.io/model-isp22/) (AAAI '22 Oral)
84+
85+
*This work was presented at the 36th AAAI Conference on Artificial Intelligence, Spotlight (15%)*
4586

4687
[Project website](https://mv-lab.github.io/model-isp22/) where you can find the poster, presentation and more information.
4788

4889
> Hybrid model-based and data-driven approach for modelling ISPs using learnable dictionaries. We explore RAW image reconstruction and improve downstream tasks like RAW Image Denoising via raw data augmentation-synthesis.
4990
5091

51-
<img src="mbispld/mbispld.png" alt="mbdlisp" width="600" border="0">
92+
<a href="https://ojs.aaai.org/index.php/AAAI/article/view/19926/19685"><img src="mbispld/mbispld.png" alt="mbdlisp" width="800" border="0"></a>
93+
94+
95+
If you have implementation questions or you need qualitative samples for comparison, please contact me. You can download the figure/illustration of our method in [mbispld](mbispld/mbispld.pdf).
96+
97+
<br>
98+
99+
------
100+
101+
## [AIM 2022 Reversed ISP Challenge](aim22-reverseisp/)
102+
103+
This work was presented at the European Conference on Computer Vision (ECCV) 2022, AIM workshop.
104+
105+
### [Track 1 - S7](https://codalab.lisn.upsaclay.fr/competitions/5079) | [Track 2 - P20](https://codalab.lisn.upsaclay.fr/competitions/5080)
106+
107+
<a href="https://data.vision.ee.ethz.ch/cvl/aim22/"><img src="https://i.ibb.co/VJ7SSQj/aim-challenge-teaser.png" alt="aim-challenge-teaser" width="500" border="0"></a>
52108

109+
In this challenge, we look for solutions to recover RAW readings from the camera using only the corresponding RGB images processed by the in-camera ISP. Successful solutions should generate plausible RAW images, and by doing this, other downstream tasks like Denoising, Super-resolution or Colour Constancy can benefit from such synthetic data generation. Click [here to read more information](aim22-reverseisp/README.md) about the challenge.
53110

54-
The code will be released soon. If you have implementation questions or you need qualitative samples for comparison, please contact me.
111+
### Starter guide and code 🔥
55112

56-
We provide the figure/illustration of our method in [mbispld](mbispld/mbispld.pdf).
113+
- **[aim-starter-code.ipynb](aim22-reverseisp/official-starter-code.ipynb)** - Simple dataloading and visualization of RGB-RAW pairs + other utils.
114+
- **[aim-baseline.ipynb](aim22-reverseisp/official-baseline.ipynb)** - End-to-end guide to load the data, train a simple UNet model and make your first submission!
57115

58116
------
59117

@@ -94,4 +152,4 @@ We test the model on AI Benchmark. The model average latency is 60ms using a inp
94152

95153
## Contact
96154

97-
Marcos Conde (marcos.conde-osorio@uni-wuerzburg.de) and Radu Timofte (radu.timofte@uni-wuerzburg.de) are the contact persons and direct managers of the AIM challenge. Please add in the email subject "AIM22 Reverse ISP Challenge" or "AISP"
155+
Marcos Conde (marcos.conde@uni-wuerzburg.de) is the contact person and co-organizer of the NTIRE and AIM challenges.

lpienet/lpienet-app.png

740 KB

lpienet/lpienet-plot.png

149 KB

lpienet/lpienet-pytorch.py

Lines changed: 175 additions & 0 deletions
@@ -0,0 +1,175 @@
"""
Experiment options:
- Clip input range?!
- Sequential or parallel attention, which order?
- Spatial attention options (see CBAM paper)
- Which down and up sampling method? Pool, Conv, Shuffle, Interpolation
- Add vs. concat skips
- Add FMEN-like Unshuffle/Shuffle
"""

import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import List


class AttentionBlock(nn.Module):
    def __init__(self, dim: int):
        super(AttentionBlock, self).__init__()
        self._spatial_attention_conv = nn.Conv2d(2, dim, kernel_size=3, padding=1)

        # Channel attention MLP
        self._channel_attention_conv0 = nn.Conv2d(1, dim, kernel_size=1, padding=0)
        self._channel_attention_conv1 = nn.Conv2d(dim, dim, kernel_size=1, padding=0)

        self._out_conv = nn.Conv2d(2 * dim, dim, kernel_size=1, padding=0)

    def forward(self, x: torch.Tensor):
        if len(x.shape) != 4:
            raise ValueError(f"Expected [B, C, H, W] input, got {x.shape}.")

        # Spatial attention
        mean = torch.mean(x, dim=1, keepdim=True)  # Mean/Max on C axis
        max, _ = torch.max(x, dim=1, keepdim=True)
        spatial_attention = torch.cat([mean, max], dim=1)  # [B, 2, H, W]
        spatial_attention = self._spatial_attention_conv(spatial_attention)
        spatial_attention = torch.sigmoid(spatial_attention) * x

        # Channel attention. TODO: Correct that it only uses average pool contrary to CBAM?
        # NOTE/TODO: This differs from CBAM as it uses Channel pooling, not spatial pooling!
        # In a way, this is 2x spatial attention
        channel_attention = torch.relu(self._channel_attention_conv0(mean))
        channel_attention = self._channel_attention_conv1(channel_attention)
        channel_attention = torch.sigmoid(channel_attention) * x

        attention = torch.cat([spatial_attention, channel_attention], dim=1)  # [B, 2*dim, H, W]
        attention = self._out_conv(attention)
        return x + attention


# TODO: This is not named in the paper right?
# It is sort of the InverseResidualBlock but w/o the Channel and Spatial Attentions and without another Conv after ReLU
class InverseBlock(nn.Module):
    def __init__(self, input_channels: int, channels: int):
        super(InverseBlock, self).__init__()

        self._conv0 = nn.Conv2d(input_channels, channels, kernel_size=1)
        self._dw_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels)
        self._conv1 = nn.Conv2d(channels, channels, kernel_size=1)
        self._conv2 = nn.Conv2d(input_channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor):
        features = self._conv0(x)
        features = F.elu(self._dw_conv(features))  # TODO: Paper is ReLU, authors do ELU
        features = self._conv1(features)

        # TODO: The BaseBlock has residuals and one path of convolutions, not 2 separate paths - is this different on purpose?
        x = torch.relu(self._conv2(x))
        return x + features


class BaseBlock(nn.Module):
    def __init__(self, channels: int):
        super(BaseBlock, self).__init__()

        self._conv0 = nn.Conv2d(channels, channels, kernel_size=1)
        self._dw_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels)
        self._conv1 = nn.Conv2d(channels, channels, kernel_size=1)

        self._conv2 = nn.Conv2d(channels, channels, kernel_size=1)
        self._conv3 = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor):
        features = self._conv0(x)
        features = F.elu(self._dw_conv(features))  # TODO: ELU or ReLU?
        features = self._conv1(features)
        x = x + features

        features = F.elu(self._conv2(x))
        features = self._conv3(features)
        return x + features


class AttentionTail(nn.Module):
    def __init__(self, channels: int):
        super(AttentionTail, self).__init__()

        self._conv0 = nn.Conv2d(channels, channels, kernel_size=7, padding=3)
        self._conv1 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)
        self._conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor):
        attention = torch.relu(self._conv0(x))
        attention = torch.relu(self._conv1(attention))
        attention = torch.sigmoid(self._conv2(attention))
        return x * attention


class LPIENet(nn.Module):
    def __init__(self, input_channels: int, output_channels: int, encoder_dims: List[int], decoder_dims: List[int]):
        super(LPIENet, self).__init__()

        if len(encoder_dims) != len(decoder_dims) + 1 or len(decoder_dims) < 1:
            raise ValueError(f"Unexpected encoder and decoder dims: {encoder_dims}, {decoder_dims}.")

        if input_channels != output_channels:
            raise NotImplementedError()

        # TODO: We will need an explicit decoder head, consider Unshuffle & Shuffle

        encoders = []
        for i, encoder_dim in enumerate(encoder_dims):
            input_dim = input_channels if i == 0 else encoder_dims[i - 1]
            encoders.append(
                nn.Sequential(
                    nn.Conv2d(input_dim, encoder_dim, kernel_size=3, padding=1),
                    BaseBlock(encoder_dim),  # TODO: one or two base blocks?
                    BaseBlock(encoder_dim),
                    AttentionBlock(encoder_dim),
                )
            )
        self._encoders = nn.ModuleList(encoders)

        decoders = []
        for i, decoder_dim in enumerate(decoder_dims):
            input_dim = encoder_dims[-1] if i == 0 else decoder_dims[i - 1] + encoder_dims[-i - 1]
            decoders.append(
                nn.Sequential(
                    nn.Conv2d(input_dim, decoder_dim, kernel_size=3, padding=1),
                    BaseBlock(decoder_dim),
                    BaseBlock(decoder_dim),
                    AttentionBlock(decoder_dim),
                )
            )
        self._decoders = nn.ModuleList(decoders)

        self._inverse_bock = InverseBlock(encoder_dims[0] + decoder_dims[-1], output_channels)
        self._attention_tail = AttentionTail(output_channels)

    def forward(self, x: torch.Tensor):
        if len(x.shape) != 4:
            raise ValueError(f"Expected [B, C, H, W] input, got {x.shape}.")
        global_residual = x

        encoder_outputs = []
        for i, encoder in enumerate(self._encoders):
            x = encoder(x)
            if i != len(self._encoders) - 1:
                encoder_outputs.append(x)
                x = F.max_pool2d(x, kernel_size=2)

        for i, decoder in enumerate(self._decoders):
            x = decoder(x)
            x = F.interpolate(x, scale_factor=2, mode="bilinear")
            x = torch.cat([x, encoder_outputs.pop()], dim=1)

        x = self._inverse_bock(x)
        x = self._attention_tail(x)
        return x + global_residual


if __name__ == "__main__":
    # Smoke test: run a small model on a random input and print the output shape.
    model = LPIENet(3, 3, [4, 8, 16], [8, 4])
    x = torch.rand(1, 3, 16, 16)
    out = model(x)
    print(out.shape)
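
The channel bookkeeping in `LPIENet.__init__` and the pool/upsample pairing in `forward` can be checked without PyTorch. The sketch below is an illustration under stated assumptions, not part of the committed file: `decoder_input_dims` mirrors the `input_dim` rule used when building the decoders, and `padded_size` is a hypothetical helper. It relies on the fact that `F.max_pool2d` with `kernel_size=2` floors odd sizes while the bilinear upsampling doubles them, so the skip concatenation only lines up when the input height and width are divisible by `2 ** (len(encoder_dims) - 1)`.

```python
from typing import List, Tuple


def decoder_input_dims(encoder_dims: List[int], decoder_dims: List[int]) -> List[int]:
    """Input channels of each decoder stage, following the rule in LPIENet.__init__."""
    if len(encoder_dims) != len(decoder_dims) + 1 or len(decoder_dims) < 1:
        raise ValueError(f"Unexpected encoder and decoder dims: {encoder_dims}, {decoder_dims}.")
    dims = []
    for i in range(len(decoder_dims)):
        if i == 0:
            # The first decoder sees only the output of the last encoder.
            dims.append(encoder_dims[-1])
        else:
            # Later decoders see the upsampled features concatenated with a skip.
            dims.append(decoder_dims[i - 1] + encoder_dims[-i - 1])
    return dims


def padded_size(height: int, width: int, num_encoders: int) -> Tuple[int, int]:
    """Smallest (H, W) not below the input that is divisible by 2 ** (num_encoders - 1)."""
    multiple = 2 ** (num_encoders - 1)

    def round_up(n: int) -> int:
        return -(-n // multiple) * multiple  # ceiling division, then scale back

    return round_up(height), round_up(width)


# Same configuration as the smoke test in the file above.
print(decoder_input_dims([4, 8, 16], [8, 4]))
# Hypothetical input size that is not divisible by 4.
print(padded_size(18, 30, num_encoders=3))
```

Padding the input up to the size returned by `padded_size` and cropping the output afterwards is one possible way to handle arbitrary resolutions; whether the authors do this is not stated in the commit.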

lpienet/lpienet-tflite.ipynb

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.

media/lpienet.png

286 KB

media/papers/bokeh-ntire23.png

398 KB

media/papers/isp-aaai22.png

263 KB

media/papers/lpienet-wacv23.png

239 KB

media/papers/reisp-aim22.png

123 KB
