Description
The NNPACK warning does not come from Impact Pack itself. It is triggered by certain tensor operations running on an old CPU that doesn't support AVX instructions, and it indicates that the operation is very slow on such a CPU and should be moved to the GPU. In this case, the operation is the Gaussian blur.
I made a simple fix that moves the mask to the correct device before running torchvision GaussianBlur, and also keeps the images used in related operations on the GPU so that we don't get "Expected all tensors to be on the same device" errors.
I tested it with only a few nodes. Further changes might be necessary if there are places in the code where a mask is blurred with utils.tensor_gaussian_blur_mask() and then combined with an image without explicitly moving both to the same device.
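For callers that combine a blurred mask with an image, the device mismatch can be avoided by moving the image to the mask's device before any arithmetic. The following is only a minimal sketch of that idea: `blend_with_mask` is a hypothetical helper, not a function from Impact Pack, and it assumes NHWC images with an NHW1 mask.

```python
import torch


def blend_with_mask(base: torch.Tensor, overlay: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: blend two NHWC images using an NHW1 mask.

    Both images are moved to the mask's device first, so the arithmetic
    never mixes CPU and GPU tensors.
    """
    base = base.to(device=mask.device)        # returns the same tensor if already there
    overlay = overlay.to(device=mask.device)
    return base * (1.0 - mask) + overlay * mask


# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

mask = torch.ones(1, 16, 16, 1, device=device)  # e.g. a mask left on the GPU after blurring
base = torch.zeros(1, 16, 16, 3)                # images created on the CPU
overlay = torch.rand(1, 16, 16, 3)

out = blend_with_mask(base, overlay, mask)
```

The result lives on the mask's device, so a caller that needs it elsewhere still has to move it explicitly.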
Several lines such as "image = image.cpu()" and "enhanced_image = enhanced_image.cpu()" in modules/impact/segs_nodes.py and modules/impact/impact_pack.py can probably be removed but they don't seem to have much effect either way.
modules/impact/utils.py
@@ -251,8 +251,8 @@
mask = mask[:, :h, :w, :]
# Get the region to be modified
- region1 = image1[:, y:y+h, x:x+w, :]
- region2 = image2[:, :h, :w, :]
+ region1 = image1[:, y:y+h, x:x+w, :].to(device=mask.device)
+ region2 = image2[:, :h, :w, :].to(device=mask.device)
# Handle RGB and RGBA cases
if c1 == 3 and c2 == 3:
@@ -470,7 +470,7 @@
# apply gaussian blur
mask = mask[:, None, ..., 0]
- blurred_mask = torchvision.transforms.GaussianBlur(kernel_size=kernel_size, sigma=sigma)(mask)
+ blurred_mask = torchvision.transforms.GaussianBlur(kernel_size=kernel_size, sigma=sigma)(mask.to(device))
blurred_mask = blurred_mask[:, 0, ..., None]
blurred_mask.to(prev_device)
Running the Gaussian Blur Mask node with kernel_size 24 on my CPU without the patch:
got prompt
[W101 16:54:13.026243409 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware.
Prompt executed in 183.66 seconds
Same with the patch, now running on GPU:
got prompt
Prompt executed in 0.21 seconds
I also just tested the Upscaler (SEGS) node with feather set to 20. It is affected dramatically as well, with processing time dropping from over 10 minutes to less than 90 seconds on my system. I realize most people are using better CPUs, so it might not even be noticeable for everyone.