87 | 87 | <div class="textblock"><h1>v3.7.2 </h1> |
88 | 88 | <h2>Improvements </h2> |
89 | 89 | <ul> |
90 | | -<li>Cache CUDA kernels to disk to improve load times(Thanks to @cschreib-ibex) /PR{2848}</li> |
91 | | -<li>Staticly link against cuda libraries /PR{2785}</li> |
92 | | -<li>Make cuDNN an optional build dependency /PR{2836}</li> |
93 | | -<li>Improve support for different compilers and OS /PR{2876} /PR{2945} /PR{2925} /PR{2942} /PR{2943} /PR{2945} /PR{2958}</li> |
94 | | -<li>Improve performance of join and transpose on CPU /PR{2849}</li> |
95 | | -<li>Improve documentation /PR{2816} /PR{2821} /PR{2846} /PR{2918} /PR{2928} /PR{2947}</li> |
96 | | -<li>Reduce binary size using NVRTC and template reducing instantiations /PR{2849} /PR{2861} /PR{2890} /PR{2957}</li> |
97 | | -<li>reduceByKey performance improvements /PR{2851} /PR{2957}</li> |
98 | | -<li>Improve support for Intel OpenCL GPUs /PR{2855}</li> |
99 | | -<li>Allow staticly linking against MKL /PR{2877} (Sponsered by SDL)</li> |
100 | | -<li>Better support for older CUDA toolkits /PR{2923}</li> |
101 | | -<li>Add support for CUDA 11 /PR{2939}</li> |
102 | | -<li>Add support for cuDNN 8 /PR{2963}</li> |
103 | | -<li>Add support for ccache for faster builds /PR{2931}</li> |
104 | | -<li>Add support for the conan package manager on linux /PR{2875}</li> |
105 | | -<li>Propagate build errors up the stack in AFError exceptions /PR{2948} /PR{2957}</li> |
106 | | -<li>Improve runtime dependency library loading /PR{2954}</li> |
107 | | -<li>Improved cuDNN runtime checks and warnings /PR{2960}</li> |
108 | | -<li>Document af_memory_manager_* native memory return values /PR{2911}</li> |
| 90 | +<li>Cache CUDA kernels to disk to improve load times (Thanks to @cschreib-ibex) [<a href="https://github.com/arrayfire/arrayfire/pull/2848">#2848</a>]</li> |
| 91 | +<li>Statically link against CUDA libraries [<a href="https://github.com/arrayfire/arrayfire/pull/2785">#2785</a>]</li> |
| 92 | +<li>Make cuDNN an optional build dependency [<a href="https://github.com/arrayfire/arrayfire/pull/2836">#2836</a>]</li> |
| 93 | +<li>Improve support for different compilers and operating systems [<a href="https://github.com/arrayfire/arrayfire/pull/2876">#2876</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2945">#2945</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2925">#2925</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2942">#2942</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2943">#2943</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2958">#2958</a>]</li> |
| 94 | +<li>Improve performance of join and transpose on CPU [<a href="https://github.com/arrayfire/arrayfire/pull/2849">#2849</a>]</li> |
| 95 | +<li>Improve documentation [<a href="https://github.com/arrayfire/arrayfire/pull/2816">#2816</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2821">#2821</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2846">#2846</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2918">#2918</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2928">#2928</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2947">#2947</a>]</li> |
| 96 | +<li>Reduce binary size by using NVRTC and reducing template instantiations [<a href="https://github.com/arrayfire/arrayfire/pull/2849">#2849</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2861">#2861</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2890">#2890</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2957">#2957</a>]</li> |
| 97 | +<li>reduceByKey performance improvements [<a href="https://github.com/arrayfire/arrayfire/pull/2851">#2851</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2957">#2957</a>]</li> |
| 98 | +<li>Improve support for Intel OpenCL GPUs [<a href="https://github.com/arrayfire/arrayfire/pull/2855">#2855</a>]</li> |
| 99 | +<li>Allow statically linking against MKL [<a href="https://github.com/arrayfire/arrayfire/pull/2877">#2877</a>] (Sponsored by SDL)</li> |
| 100 | +<li>Better support for older CUDA toolkits [<a href="https://github.com/arrayfire/arrayfire/pull/2923">#2923</a>]</li> |
| 101 | +<li>Add support for CUDA 11 [<a href="https://github.com/arrayfire/arrayfire/pull/2939">#2939</a>]</li> |
| 102 | +<li>Add support for cuDNN 8 [<a href="https://github.com/arrayfire/arrayfire/pull/2963">#2963</a>]</li> |
| 103 | +<li>Add support for ccache for faster builds [<a href="https://github.com/arrayfire/arrayfire/pull/2931">#2931</a>]</li> |
| 104 | +<li>Add support for the Conan package manager on Linux [<a href="https://github.com/arrayfire/arrayfire/pull/2875">#2875</a>]</li> |
| 105 | +<li>Propagate build errors up the stack in AFError exceptions [<a href="https://github.com/arrayfire/arrayfire/pull/2948">#2948</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2957">#2957</a>]</li> |
| 106 | +<li>Improve runtime dependency library loading [<a href="https://github.com/arrayfire/arrayfire/pull/2954">#2954</a>]</li> |
| 107 | +<li>Improved cuDNN runtime checks and warnings [<a href="https://github.com/arrayfire/arrayfire/pull/2960">#2960</a>]</li> |
| 108 | +<li>Document af_memory_manager_* native memory return values [<a href="https://github.com/arrayfire/arrayfire/pull/2911">#2911</a>]</li> |
109 | 109 | </ul> |
110 | 110 | <h2>Fixes </h2> |
111 | 111 | <ul> |
112 | | -<li>Bug crash when allocating large arrays /PR{2827}</li> |
113 | | -<li>Fix various compiler warnings /PR{2827} /PR{2849} /PR{2872} /PR{2876}</li> |
114 | | -<li>Fix minor leaks in OpenCL functions /PR{2913}</li> |
115 | | -<li>Various continuous integration related fixes /PR{2819}</li> |
116 | | -<li>Fix zero padding with convolv2NN /PR{2820}</li> |
117 | | -<li>Fix af_get_memory_pressure_threshold return value /PR{2831}</li> |
| 112 | +<li>Fix crash when allocating large arrays [<a href="https://github.com/arrayfire/arrayfire/pull/2827">#2827</a>]</li> |
| 113 | +<li>Fix various compiler warnings [<a href="https://github.com/arrayfire/arrayfire/pull/2827">#2827</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2849">#2849</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2872">#2872</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2876">#2876</a>]</li> |
| 114 | +<li>Fix minor leaks in OpenCL functions [<a href="https://github.com/arrayfire/arrayfire/pull/2913">#2913</a>]</li> |
| 115 | +<li>Various continuous integration related fixes [<a href="https://github.com/arrayfire/arrayfire/pull/2819">#2819</a>]</li> |
| 116 | +<li>Fix zero padding with convolve2NN [<a href="https://github.com/arrayfire/arrayfire/pull/2820">#2820</a>]</li> |
| 117 | +<li>Fix af_get_memory_pressure_threshold return value [<a href="https://github.com/arrayfire/arrayfire/pull/2831">#2831</a>]</li> |
118 | 118 | <li>Increased the max filter length for morph</li> |
119 | | -<li>Handle empty array inputs for LU, QR, and Rank functions /PR{2838}</li> |
120 | | -<li>Fix FindMKL.cmake script for sequential threading library /PR{2840} /PR{2952}</li> |
121 | | -<li>Various internal refactoring /PR{2839} /PR{2861} /PR{2864} /PR{2873} /PR{2890} /PR{2891} /PR{2913} /PR{2959}</li> |
122 | | -<li>Fix OpenCL 2.0 builtin function name conflict /PR{2851}</li> |
123 | | -<li>Fix error caused when releasing memory with multiple devices /PR{2867}</li> |
124 | | -<li>Fix missing set stacktrace symbol from unified API /PR{2915}</li> |
125 | | -<li>Fix zero padding issue in convolve2NN /PR{2820}</li> |
126 | | -<li>Fixed bugs in ReduceByKey /PR{2957}</li> |
127 | | -<li>Add clblast patch to handle custom context with multiple devices /PR{2967}</li> |
| 119 | +<li>Handle empty array inputs for LU, QR, and Rank functions [<a href="https://github.com/arrayfire/arrayfire/pull/2838">#2838</a>]</li> |
| 120 | +<li>Fix FindMKL.cmake script for sequential threading library [<a href="https://github.com/arrayfire/arrayfire/pull/2840">#2840</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2952">#2952</a>]</li> |
| 121 | +<li>Various internal refactoring [<a href="https://github.com/arrayfire/arrayfire/pull/2839">#2839</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2861">#2861</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2864">#2864</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2873">#2873</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2890">#2890</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2891">#2891</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2913">#2913</a>] [<a href="https://github.com/arrayfire/arrayfire/pull/2959">#2959</a>]</li> |
| 122 | +<li>Fix OpenCL 2.0 builtin function name conflict [<a href="https://github.com/arrayfire/arrayfire/pull/2851">#2851</a>]</li> |
| 123 | +<li>Fix error caused when releasing memory with multiple devices [<a href="https://github.com/arrayfire/arrayfire/pull/2867">#2867</a>]</li> |
| 124 | +<li>Fix missing set stacktrace symbol from unified API [<a href="https://github.com/arrayfire/arrayfire/pull/2915">#2915</a>]</li> |
| 125 | +<li>Fix zero padding issue in convolve2NN [<a href="https://github.com/arrayfire/arrayfire/pull/2820">#2820</a>]</li> |
| 126 | +<li>Fixed bugs in ReduceByKey [<a href="https://github.com/arrayfire/arrayfire/pull/2957">#2957</a>]</li> |
| 127 | +<li>Add clblast patch to handle custom context with multiple devices [<a href="https://github.com/arrayfire/arrayfire/pull/2967">#2967</a>]</li> |
128 | 128 | </ul> |
129 | 129 | <h2>Contributions </h2> |
130 | 130 | <p>Special thanks to our contributors: <a href="https://github.com/cschreib-ibex">Corentin Schreiber</a>, <a href="https://github.com/jacobkahn">Jacob Kahn</a>, <a href="https://github.com/pauljurczak">Paul Jurczak</a>, <a href="https://github.com/junghans">Christoph Junghans</a></p> |