Consider implementing this: https://github.com/Triang3l/S3TConv. It could be really useful for Adreno GPUs.