Commit da3d704

Merge branch 'develop'
2 parents f773f49 + 1127f5a commit da3d704

1,446 files changed: 29,507 additions and 21,407 deletions

.gitignore

Lines changed: 4 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -21,8 +21,10 @@ lapack-netlib/TESTING/testing_results.txt
2121
lib.grd
2222
nohup.out
2323
config.h
24+
config_kernel.h
2425
Makefile.conf
2526
Makefile.conf_last
27+
Makefile_kernel.conf
2628
config_last.h
2729
getarch
2830
getarch_2nd
@@ -41,6 +43,8 @@ ctest/xzcblat2
4143
ctest/xzcblat3
4244
exports/linktest.c
4345
exports/linux.def
46+
kernel/setparam_*.c
47+
kernel/kernel_*.h
4448
test/CBLAT2.SUMM
4549
test/CBLAT3.SUMM
4650
test/DBLAT2.SUMM

.travis.yml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ env:
1111

1212
before_install:
1313
- sudo apt-get update -qq
14-
- sudo apt-get install -qq gfortran
14+
- sudo apt-get install -qq gfortran
1515
- if [[ "$TARGET_BOX" == "WIN64" ]]; then sudo apt-get install -qq binutils-mingw-w64-x86-64 gcc-mingw-w64-x86-64 gfortran-mingw-w64-x86-64; fi
1616
- if [[ "$TARGET_BOX" == "LINUX32" ]]; then sudo apt-get install -qq gcc-multilib gfortran-multilib; fi
1717

CONTRIBUTORS.md

Lines changed: 4 additions & 4 deletions
@@ -31,7 +31,7 @@
3131
* Improve the windows build.
3232

3333
* Chen Shaohu <[email protected]>
34-
* Optimize GEMV on the Loongson 3A processor.
34+
* Optimize GEMV on the Loongson 3A processor.
3535

3636
* Luo Wen
3737
* Intern. Test Level-2 BLAS.
@@ -53,11 +53,11 @@ In chronological order:
5353
* [2012-05-19] Fix building bug on FreeBSD and NetBSD.
5454

5555
* Sylvestre Ledru <https://github.com/sylvestre>
56-
* [2012-07-01] Improve the detection of sparc. Fix building bug under
56+
* [2012-07-01] Improve the detection of sparc. Fix building bug under
5757
Hurd and kfreebsd.
5858

5959
* Jameson Nash <https://github.com/vtjnash>
60-
* [2012-08-20] Provide support for passing CFLAGS, FFLAGS, PFLAGS, FPFLAGS to
60+
* [2012-08-20] Provide support for passing CFLAGS, FFLAGS, PFLAGS, FPFLAGS to
6161
make on the command line.
6262

6363
* Alexander Nasonov <[email protected]>
@@ -80,7 +80,7 @@ In chronological order:
8080
* [2013-06-30] Add Intel Haswell support (using sandybridge optimizations).
8181

8282
* grisuthedragon <https://github.com/grisuthedragon>
83-
* [2013-07-11] create openblas_get_parallel to retrieve information which parallelization
83+
* [2013-07-11] create openblas_get_parallel to retrieve information which parallelization
8484
model is used by OpenBLAS.
8585

8686
* Elliot Saba <[email protected]>

Changelog.txt

Lines changed: 35 additions & 35 deletions
@@ -55,25 +55,25 @@ Version 0.2.7
5555
common:
5656
* Support LSB (Linux Standard Base) 4.1.
5757
e.g. make CC=lsbcc
58-
* Include LAPACK 3.4.2 source codes to the repo.
58+
* Include LAPACK 3.4.2 source codes to the repo.
5959
Avoid downloading at compile time.
6060
* Add NO_PARALLEL_MAKE flag to disable parallel make.
61-
* Create openblas_get_parallel to retrieve information which
61+
* Create openblas_get_parallel to retrieve information which
6262
parallelization model is used by OpenBLAS. (Thank grisuthedragon)
6363
* Detect LLVM/Clang compiler. The default compiler is Clang on Mac OS X.
6464
* Change LIBSUFFIX from .lib to .a on windows.
6565
* A work-around for dtrti_U single thread bug. Replace it with LAPACK codes. (#191)
6666

6767
x86/x86-64:
68-
* Optimize c/zgemm, trsm, dgemv_n, ddot, daxpy, dcopy on
68+
* Optimize c/zgemm, trsm, dgemv_n, ddot, daxpy, dcopy on
6969
AMD Bulldozer. (Thank Werner Saar)
7070
* Add Intel Haswell support (using Sandybridge optimizations).
7171
(Thank Dan Luu)
7272
* Add AMD Piledriver support (using Bulldozer optimizations).
73-
* Fix the computational error in zgemm avx kernel on
73+
* Fix the computational error in zgemm avx kernel on
7474
Sandybridge. (#237)
7575
* Fix the overflow bug in gemv.
76-
* Fix the overflow bug in multi-threaded BLAS3, getrf when NUM_THREADS
76+
* Fix the overflow bug in multi-threaded BLAS3, getrf when NUM_THREADS
7777
is very large.(#214, #221, #246).
7878
MIPS64:
7979
* Support loongcc (Open64 based) compiler for ICT Loongson 3A/B.
@@ -110,7 +110,7 @@ common:
110110
* Fixed NetBSD build. (#155)
111111
* Fixed compilation with TARGET=GENERIC. (#160)
112112
x86/x86-64:
113-
* Restore the original CPU affinity when calling
113+
* Restore the original CPU affinity when calling
114114
openblas_set_num_threads(1) (#153)
115115
* Fixed a SEGFAULT bug in dgemv_t when m is very large.(#154)
116116
MIPS64:
@@ -120,13 +120,13 @@ Version 0.2.4
120120
8-Oct-2012
121121
common:
122122
* Upgraded LAPACK to 3.4.2 version. (#145)
123-
* Provided support for passing CFLAGS, FFLAGS, PFLAGS,
123+
* Provided support for passing CFLAGS, FFLAGS, PFLAGS,
124124
FPFLAGS to make. (#137)
125-
* f77blas.h:compatibility for compilers without C99 complex
125+
* f77blas.h:compatibility for compilers without C99 complex
126126
number support. (#141)
127127
x86/x86-64:
128128
* Added NO_AVX flag. Check OS supporting AVX on runtime. (#139)
129-
* Fixed zdot incompatibility ABI issue with GCC 4.7 on
129+
* Fixed zdot incompatibility ABI issue with GCC 4.7 on
130130
Windows 32-bit. (#140)
131131
MIPS64:
132132
* Fixed the generation of shared library bug.
@@ -136,14 +136,14 @@ Version 0.2.3
136136
20-Aug-2012
137137
common:
138138
* Fixed LAPACK unstable bug about ?laswp. (#130)
139-
* Fixed the shared library bug about unloading the library on
139+
* Fixed the shared library bug about unloading the library on
140140
Linux (#132).
141141
* Fixed the compilation failure on BlueGene/P (TARGET=PPC440FP2)
142142
Please use gcc and IBM xlf. (#134)
143143
x86/x86-64:
144-
* Supported goto_set_num_threads and openblas_set_num_threads
144+
* Supported goto_set_num_threads and openblas_set_num_threads
145145
APIs in Windows. They can set the number of threads on runtime.
146-
146+
147147
====================================================================
148148
Version 0.2.2
149149
6-July-2012
@@ -191,14 +191,14 @@ x86/x86_64:
191191
* Auto-detect Intel Sandy Bridge Core i7-3xxx & Xeon E7 Westmere-EX.
192192
* Test alpha=Nan in dscale.
193193
* Fixed a SEGFAULT bug in samax on x86 windows.
194-
194+
195195
====================================================================
196196
Version 0.1.0
197197
23-Mar-2012
198198
common:
199199
* Set soname of shared library on Linux.
200-
* Added LIBNAMESUFFIX flag in Makefile.rule. The user can use
201-
this flag to control the library name, e.g. libopenblas.a,
200+
* Added LIBNAMESUFFIX flag in Makefile.rule. The user can use
201+
this flag to control the library name, e.g. libopenblas.a,
202202
libopenblas_ifort.a or libopenblas_omp.a.
203203
* Added GEMM_MULTITHREAD_THRESHOLD flag in Makefile.rule.
204204
The lib use single thread in GEMM function with small matrices.
@@ -229,7 +229,7 @@ x86/x86_64:
229229
Version 0.1 alpha2.4
230230
18-Sep-2011
231231
common:
232-
* Fixed a bug about installation. The header file "fblas77.h"
232+
* Fixed a bug about installation. The header file "fblas77.h"
233233
works fine now.
234234
* Fixed #61 a building bug about setting TARGET and DYNAMIC_ARCH.
235235
* Try to handle absolute path of shared library in OSX. (#57)
@@ -238,49 +238,49 @@ common:
238238
$(PREFIX)/lib
239239

240240
x86/x86_64:
241-
* Fixed #58 zdot/xdot SEGFAULT bug with GCC-4.6 on x86. According
242-
to i386 calling convention, The callee should remove the first
243-
hidden parameter.Thank Mr. John for this patch.
241+
* Fixed #58 zdot/xdot SEGFAULT bug with GCC-4.6 on x86. According
242+
to i386 calling convention, The callee should remove the first
243+
hidden parameter.Thank Mr. John for this patch.
244244

245245
====================================================================
246246
Version 0.1 alpha2.3
247247
5-Sep-2011
248248

249249
x86/x86_64:
250-
* Added DTB_ENTRIES into dynamic arch setting parameters. Now,
250+
* Added DTB_ENTRIES into dynamic arch setting parameters. Now,
251251
it can read DTB_ENTRIES on runtime. (Refs issue #55 on github)
252252

253253
====================================================================
254254
Version 0.1 alpha2.2
255255
14-Jul-2011
256256

257257
common:
258-
* Fixed a building bug when DYNAMIC_ARCH=1 & INTERFACE64=1.
258+
* Fixed a building bug when DYNAMIC_ARCH=1 & INTERFACE64=1.
259259
(Refs issue #44 on github)
260260

261261
====================================================================
262262
Version 0.1 alpha2.1
263263
28-Jun-2011
264264

265265
common:
266-
* Stop the build and output the error message when detecting
266+
* Stop the build and output the error message when detecting
267267
fortran compiler failed. (Refs issue #42 on github)
268268

269269
====================================================================
270270
Version 0.1 alpha2
271271
23-Jun-2011
272272

273273
common:
274-
* Fixed blasint undefined bug in <cblas.h> file. Other software
274+
* Fixed blasint undefined bug in <cblas.h> file. Other software
275275
could include this header successfully(Refs issue #13 on github)
276-
* Fixed the SEGFAULT bug on 64 cores. On SMP server, the number
277-
of CPUs or cores should be less than or equal to 64.(Refs issue #14
276+
* Fixed the SEGFAULT bug on 64 cores. On SMP server, the number
277+
of CPUs or cores should be less than or equal to 64.(Refs issue #14
278278
on github)
279279
* Support "void goto_set_num_threads(int num_threads)" and "void
280280
openblas_set_num_threads(int num_threads)" when USE_OPENMP=1
281-
* Added extern "C" to support C++. Thank Tasio for the patch(Refs
281+
* Added extern "C" to support C++. Thank Tasio for the patch(Refs
282282
issue #21 on github)
283-
* Provided an error message when the arch is not supported.(Refs
283+
* Provided an error message when the arch is not supported.(Refs
284284
issue #19 on github)
285285
* Fixed issue #23. Fixed a bug of f_check script about generating link flags.
286286
* Added openblas_set_num_threads for Fortran.
@@ -298,7 +298,7 @@ x86/x86_64:
298298
* Work-around #27 the low performance axpy issue with small imput size & multithreads.
299299

300300
MIPS64:
301-
* Fixed #28 a wrong result of dsdot on Loongson3A/MIPS64.
301+
* Fixed #28 a wrong result of dsdot on Loongson3A/MIPS64.
302302
* Optimized single/double precision BLAS Level3 on Loongson3A/MIPS64. (Refs #2)
303303
* Optimized single/double precision axpy function on Loongson3A/MIPS64. (Refs #3)
304304

@@ -307,9 +307,9 @@ Version 0.1 alpha1
307307
20-Mar-2011
308308

309309
common:
310-
* Support "make NO_LAPACK=1" to build the library without
310+
* Support "make NO_LAPACK=1" to build the library without
311311
LAPACK functions.
312-
* Fixed randomly SEGFAULT when nodemask==NULL with above Linux 2.6.34.
312+
* Fixed randomly SEGFAULT when nodemask==NULL with above Linux 2.6.34.
313313
Thank Mr.Ei-ji Nakama providing this patch. (Refs issue #12 on github)
314314
* Added DEBUG=1 rule in Makefile.rule to build debug version.
315315
* Disable compiling quad precision in reference BLAS library(netlib BLAS).
@@ -318,15 +318,15 @@ common:
318318
* Imported GotoBLAS2 1.13 BSD version
319319

320320
x86/x86_64:
321-
* On x86 32bits, fixed a bug in zdot_sse2.S line 191. This would casue
321+
* On x86 32bits, fixed a bug in zdot_sse2.S line 191. This would casue
322322
zdotu & zdotc failures. Instead, work-around it. (Refs issue #8 #9 on github)
323-
* Modified ?axpy functions to return same netlib BLAS results
323+
* Modified ?axpy functions to return same netlib BLAS results
324324
when incx==0 or incy==0 (Refs issue #7 on github)
325-
* Modified ?swap functions to return same netlib BLAS results
325+
* Modified ?swap functions to return same netlib BLAS results
326326
when incx==0 or incy==0 (Refs issue #6 on github)
327-
* Modified ?rot functions to return same netlib BLAS results
327+
* Modified ?rot functions to return same netlib BLAS results
328328
when incx==0 or incy==0 (Refs issue #4 on github)
329-
* Detect Intel Westmere,Intel Clarkdale and Intel Arrandale
329+
* Detect Intel Westmere,Intel Clarkdale and Intel Arrandale
330330
to use Nehalem codes.
331331
* Fixed a typo bug about compiling dynamic ARCH library.
332332
MIPS64:

GotoBLAS_01Readme.txt

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@
8383
4. Suported precision
8484

8585
Now x86/x86_64 version support 80bit FP precision in addition to
86-
normal double presicion and single precision. Currently only
86+
normal double presicion and single precision. Currently only
8787
gfortran supports 80bit FP with "REAL*10".
8888

8989

GotoBLAS_02QuickInstall.txt

Lines changed: 3 additions & 3 deletions
@@ -32,9 +32,9 @@
3232

3333
GotoBLAS2 build complete.
3434

35-
OS ... Linux
36-
Architecture ... x86_64
37-
BINARY ... 64bit
35+
OS ... Linux
36+
Architecture ... x86_64
37+
BINARY ... 64bit
3838
C compiler ... GCC (command line : gcc)
3939
Fortran compiler ... PATHSCALE (command line : pathf90)
4040
Library Name ... libgoto_barcelonap-r1.27.a (Multi threaded; Max

GotoBLAS_03FAQ.txt

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@
5656

5757
1.6 Q I use OpenMP compiler. How can I use GotoBLAS2 with it?
5858

59-
A Please understand that OpenMP is a compromised method to use
59+
A Please understand that OpenMP is a compromised method to use
6060
thread. If you want to use OpenMP based code with GotoBLAS2, you
6161
should enable "USE_OPENMP=1" in Makefile.rule.
6262

GotoBLAS_05LargePage.txt

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@
4343
F) Other aarchitecture which doesn't have Large TLB enhancement
4444

4545
If you have root permission, please install device driver which
46-
located in drivers/mapper.
46+
located in drivers/mapper.
4747

4848
$shell> cd drivers/mapper
4949
$shell> make

GotoBLAS_06WeirdPerformance.txt

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@
44
probably you created too many threads or process. Basically GotoBLAS
55
assumes that available cores that you specify are exclusively for
66
BLAS computation. Even one small thread/process conflicts with BLAS
7-
threads, performance will become worse.
7+
threads, performance will become worse.
88

99
The best solution is to reduce your number of threads or insert
1010
some synchronization mechanism and suspend your threads until BLAS
@@ -19,4 +19,4 @@
1919

2020

2121
Anyway, if you see any weird performance loss, it means your code or
22-
algorithm is not optimal.
22+
algorithm is not optimal.

LICENSE

Lines changed: 11 additions & 11 deletions
@@ -12,17 +12,17 @@ met:
1212
notice, this list of conditions and the following disclaimer in
1313
the documentation and/or other materials provided with the
1414
distribution.
15-
3. Neither the name of the ISCAS nor the names of its contributors may
16-
be used to endorse or promote products derived from this software
15+
3. Neither the name of the ISCAS nor the names of its contributors may
16+
be used to endorse or promote products derived from this software
1717
without specific prior written permission.
1818

19-
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
20-
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
21-
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
22-
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
23-
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
24-
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
25-
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
26-
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
27-
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
19+
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
20+
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
21+
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
22+
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
23+
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
24+
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
25+
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
26+
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
27+
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
2828
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
