Blur operations running slow on CPU; [W101 ...] Could not initialize NNPACK!

The NNPACK warning does not come from Impact Pack itself; it is triggered by certain tensor operations on an old CPU that doesn't support AVX instructions. In practice it is a sign that the operation should be moved to the GPU, because it is very slow on such a CPU. In this case, the operation is the Gaussian blur.

I made a simple fix that moves the mask to the correct device before running torchvision's GaussianBlur, and also keeps the images used in related operations on the GPU so that we don't get "Expected all tensors to be on the same device" errors.

I have only tested it with a few nodes. Further changes might be necessary if there are places in the code where a mask is blurred with the utils.tensor_gaussian_blur_mask() method and then used with an image without explicitly moving both to the same device.
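To illustrate the kind of call site that could still need attention, here is a minimal sketch of blending two images with a mask after moving everything to the mask's device. The function name `blend_with_mask` and the tensor layout ([B, H, W, C] images, [B, H, W, 1] mask) are assumptions made for the example, not code from Impact Pack.

```python
import torch


def blend_with_mask(image, overlay, mask):
    """Blend `overlay` onto `image` using `mask` (hypothetical helper).

    Assumed shapes: image and overlay are [B, H, W, C], mask is [B, H, W, 1].
    Both images are moved to the mask's device first, so the arithmetic
    below cannot raise "Expected all tensors to be on the same device".
    """
    image = image.to(mask.device)
    overlay = overlay.to(mask.device)
    # The mask broadcasts over the channel dimension.
    return image * (1.0 - mask) + overlay * mask
```

Calling `.to()` with the device a tensor is already on returns the same tensor without copying, so a guard like this should cost very little when nothing needs to move.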

Several lines such as "image = image.cpu()" and "enhanced_image = enhanced_image.cpu()" in modules/impact/segs_nodes.py and modules/impact/impact_pack.py can probably be removed, but they don't seem to have much effect either way.

modules/impact/utils.py
```
@@ -251,8 +251,8 @@
     mask = mask[:, :h, :w, :]
 
     # Get the region to be modified
-    region1 = image1[:, y:y+h, x:x+w, :]
-    region2 = image2[:, :h, :w, :]
+    region1 = image1[:, y:y+h, x:x+w, :].to(device=mask.device)
+    region2 = image2[:, :h, :w, :].to(device=mask.device)
 
     # Handle RGB and RGBA cases
     if c1 == 3 and c2 == 3:
@@ -470,7 +470,7 @@
 
     # apply gaussian blur
     mask = mask[:, None, ..., 0]
-    blurred_mask = torchvision.transforms.GaussianBlur(kernel_size=kernel_size, sigma=sigma)(mask)
+    blurred_mask = torchvision.transforms.GaussianBlur(kernel_size=kernel_size, sigma=sigma)(mask.to(device))
     blurred_mask = blurred_mask[:, 0, ..., None]
 
     blurred_mask.to(prev_device)
```

Running the Gaussian Blur Mask node with kernel_size 24 on my CPU without the patch:
```
got prompt
[W101 16:54:13.026243409 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware.
Prompt executed in 183.66 seconds
```

The same node with the patch, now running on the GPU:
```
got prompt
Prompt executed in 0.21 seconds
```

I also just tested the Upscaler (SEGS) node with feather set to 20. It is affected just as dramatically, with processing time dropping from over 10 minutes to less than 90 seconds on my system. I realize most people are using better CPUs, so it might not even be noticeable for everyone.

Blur operations running slow on CPU; [W101 ...] Could not initialize NNPACK! #1171
