Blur operations running slow on CPU; [W101 ...] Could not initialize NNPACK! #1171

@hum-ma

The NNPACK warning does not come from Impact Pack itself. It is triggered by certain tensor operations on an old CPU that doesn't support AVX instructions, and it is a sign that the operation should be moved to the GPU because it is very slow on such a CPU. In this case the operation is the Gaussian blur.

I made a simple fix that moves the mask to the correct device before applying torchvision's GaussianBlur, and also keeps the images used in related operations on the GPU so that we don't get "Expected all tensors to be on the same device" errors.

I tested it with only a few nodes. Further changes might be necessary if there are places in the code where a mask is blurred with the utils.tensor_gaussian_blur_mask() method and then combined with an image without both being explicitly moved to the same device.

Several lines such as "image = image.cpu()" and "enhanced_image = enhanced_image.cpu()" in modules/impact/segs_nodes.py and modules/impact/impact_pack.py can probably be removed, but they don't seem to have much effect either way.

modules/impact/utils.py

@@ -251,8 +251,8 @@
     mask = mask[:, :h, :w, :]
 
     # Get the region to be modified
-    region1 = image1[:, y:y+h, x:x+w, :]
-    region2 = image2[:, :h, :w, :]
+    region1 = image1[:, y:y+h, x:x+w, :].to(device=mask.device)
+    region2 = image2[:, :h, :w, :].to(device=mask.device)
 
     # Handle RGB and RGBA cases
     if c1 == 3 and c2 == 3:
@@ -470,7 +470,7 @@
 
     # apply gaussian blur
     mask = mask[:, None, ..., 0]
-    blurred_mask = torchvision.transforms.GaussianBlur(kernel_size=kernel_size, sigma=sigma)(mask)
+    blurred_mask = torchvision.transforms.GaussianBlur(kernel_size=kernel_size, sigma=sigma)(mask.to(device))
     blurred_mask = blurred_mask[:, 0, ..., None]
 
     blurred_mask.to(prev_device)
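
A note on the last context line of the hunk above: `torch.Tensor.to()` returns a tensor rather than moving the original in place, so a bare `blurred_mask.to(prev_device)` presumably leaves the blurred mask on `device`. That would be consistent with the first hunk needing to move the image regions to `mask.device`. The snippet below is a minimal sketch of that behaviour; the "meta" device is only a stand-in for machines without a GPU, and the variable names are illustrative.

```python
import torch

# Pick a second device; "meta" stands in when no GPU is available
other = torch.device("cuda" if torch.cuda.is_available() else "meta")

mask = torch.zeros(1, 8, 8, 1)   # NHWC mask created on the CPU

mask.to(other)                   # result discarded: mask itself is unchanged
device_after_bare_call = mask.device.type

moved = mask.to(other)           # keeping the returned tensor is what moves it
device_after_assignment = moved.device.type

print(device_after_bare_call, device_after_assignment)
```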

Running the Gaussian Blur Mask node with kernel_size 24 on my CPU without the patch:

got prompt
[W101 16:54:13.026243409 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware.
Prompt executed in 183.66 seconds

Same with the patch, now running on GPU:

got prompt
Prompt executed in 0.21 seconds

I also just tested the Upscaler (SEGS) node with feather set to 20. It is affected dramatically as well, with processing time dropping from over 10 minutes to less than 90 seconds on my system. I realize most people are using better CPUs, so the difference might not even be noticeable for everyone.
