Thanks for your amazing work. I want to apply your method to a standard convolution instead of a depthwise one. Does your CUDA code support standard convolutions?