Your library is pretty cool, but it looks like it has not been updated in a long time.
At the same time, the version of libdnn in your Caffe fork seems to be better maintained and has even gained some new features, such as BLAS routine generators.
Could you provide some insight into your plans for the standalone libdnn, or for libdnn in general?
Specifically, it would be nice if you could answer some of the following questions:
- Do you plan to update the standalone libdnn, e.g. from the version in your Caffe fork?
- What is the status of BLAS support in the Caffe version of libdnn? How does it compare to something like clBLAS, CLBlast, or their CUDA counterparts?
- Could you briefly describe the algorithms you use to generate the optimized fused convolution (and other) kernels, and how/why they are better or faster than, e.g., im2col-based approaches or other well-known convolution implementations, either in performance or in memory consumption? (For what I mean by im2col, see the sketch after this list.) The documentation is currently pretty sparse. If the approach is based on specific papers or well-known techniques, references would be appreciated.
- How does libdnn currently compare, in terms of convolution performance, to recent versions of cuDNN and other well-known implementations? In the past you reported it was very fast, often faster than the competition. Is that still the case, or have there been recent developments that made other implementations faster?
- Do you plan to add any new features or improvements? If so, could you describe them?
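For context on the im2col comparison above, here is a minimal, self-contained sketch of what I mean by an im2col-based convolution (this is not libdnn code; the layouts, names, and sizes are my own illustrative assumptions). The input patches are unrolled into a large column matrix, after which the convolution reduces to a single GEMM with the filter matrix; the unrolled matrix is roughly K*K times larger than the input, which is where the extra memory cost of this approach comes from.

```cpp
#include <vector>
#include <cstddef>

// Unroll a single CHW image into a [C*K*K] x [H_out*W_out] column matrix
// (no padding, square kernel). Illustrative only.
std::vector<float> im2col(const std::vector<float>& input,
                          int C, int H, int W, int K, int stride) {
  const int H_out = (H - K) / stride + 1;
  const int W_out = (W - K) / stride + 1;
  std::vector<float> cols(static_cast<size_t>(C) * K * K * H_out * W_out);
  for (int c = 0; c < C; ++c)
    for (int kh = 0; kh < K; ++kh)
      for (int kw = 0; kw < K; ++kw)
        for (int oh = 0; oh < H_out; ++oh)
          for (int ow = 0; ow < W_out; ++ow) {
            const int ih = oh * stride + kh;
            const int iw = ow * stride + kw;
            const size_t row = (static_cast<size_t>(c) * K + kh) * K + kw;
            cols[row * H_out * W_out + oh * W_out + ow] =
                input[(static_cast<size_t>(c) * H + ih) * W + iw];
          }
  return cols;  // ~K*K times larger than the input
}

// Naive GEMM standing in for a BLAS sgemm call:
// out[M x N] = weights[M x Kp] * cols[Kp x N]
void gemm(const std::vector<float>& A, const std::vector<float>& B,
          std::vector<float>& out, int M, int Kp, int N) {
  for (int m = 0; m < M; ++m)
    for (int n = 0; n < N; ++n) {
      float acc = 0.f;
      for (int k = 0; k < Kp; ++k) acc += A[m * Kp + k] * B[k * N + n];
      out[m * N + n] = acc;
    }
}

int main() {
  // Tiny example: 1 input channel, 4x4 image, 3x3 kernel, 2 output channels.
  const int C = 1, H = 4, W = 4, K = 3, stride = 1, M = 2;
  std::vector<float> input(C * H * W, 1.f);
  std::vector<float> weights(M * C * K * K, 0.5f);
  const auto cols = im2col(input, C, H, W, K, stride);
  const int H_out = (H - K) / stride + 1, W_out = (W - K) / stride + 1;
  std::vector<float> output(M * H_out * W_out);
  gemm(weights, cols, output, M, C * K * K, H_out * W_out);
  // Each output element here is 9 * 1.0 * 0.5 = 4.5.
  return 0;
}
```

My understanding is that a fused kernel avoids materializing the `cols` buffer entirely, which is exactly the performance/memory trade-off the question above is about.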
Thanks!