BLAST (BLAS Templates) is a high-performance linear algebra library that combines a BLAS-like interface with modern C++ template metaprogramming. The BLAST implementation is single-threaded and intended for matrices of small and medium size (a few hundred rows/columns), which are common in embedded control applications.
The figures below show the performance of the BLAS dgemm routine for different linear algebra implementations on an
Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz:

BLAST uses the C++20 standard, so you need a compiler that supports it.
Performance depends strongly on the compiler. We recommend Clang.
- CMake 3.24 or higher.
- Boost libraries
  sudo apt install libboost-exception-dev
- Blaze 3.9 or higher https://bitbucket.org/blaze-lib/blaze.
- BLASFEO https://github.com/giaf/blasfeo (optional, only if BLAST_WITH_BLASFEO is selected). Select a proper target architecture by setting the TARGET variable in Makefile.rule or in CMake. Build and install as usual. The build system searches for BLASFEO in /opt/blasfeo by default.
- Google Test https://github.com/google/googletest must be installed and findable by the CMake build system (optional, only if BLAST_WITH_TEST is selected).
If you want to run benchmarks, you will also need:
- Google Benchmark https://github.com/google/benchmark must be installed and findable by the CMake build system (optional, only if BLAST_WITH_BENCHMARK is selected).
- Eigen3 3.3.7 or higher (optional, if you want to benchmark it).
- For each installed BLAS library such as MKL, OpenBLAS, etc., a separate benchmark executable will be built if it is found by CMake FindBLAS.
- Python3
  sudo apt install python3
- Matplotlib
  sudo apt install python3-matplotlib
BLAST is a header-only library, so you don't need to build it. You can however build the tests:
- Install the dependencies.
- Assuming that you are in the blast source root, do
  mkdir build && cd build
- Run CMake
  cmake -DBLAST_WITH_TEST=ON ..
- Build
  make -j 10
- Run tests
  ctest
TODO: add examples
To build the benchmarks, either run
cmake -C cmake/InitialCache.cmake ..
which preloads a cache with all the required variables, or set BLAST_WITH_BENCHMARK=ON in the CMake configure step. The following CMake variables must be switched ON to enable specific benchmarks:
- BLAST_BUILD_BLAST_BENCHMARK
- BLAST_BUILD_LIBXSMM_BENCHMARK
- BLAST_BUILD_BLAS_BENCHMARK
- BLAST_BUILD_BLAZE_BENCHMARK
- BLAST_BUILD_EIGEN_BENCHMARK
- BLAS_BUILD_BLASFEO_BENCHMARK
For the BLASFEO benchmark, you also need to set BLAST_WITH_BLASFEO=ON.
Benchmarks will be built in build/bin/bench-*. The set of benchmarks depends on the options and the installed libraries. It might look like the following:
$ ls -1 build/bin/bench*
build/bin/bench-blas-Intel10_64_dyn
build/bin/bench-blas-Intel10_64lp_seq
build/bin/bench-blas-OpenBLAS
build/bin/bench-blasfeo
build/bin/bench-blast
build/bin/bench-blaze
build/bin/bench-blazefeo
build/bin/bench-eigen
build/bin/bench-libxsmm
Here bench-blast is the benchmark for BLAST itself, and the others are for other libraries. You can run all BLAST benchmarks by simply typing
build/bin/bench-blast
You can select benchmarks for specific functions using --benchmark_filter=<regex>.
There are a few targets in the root Makefile that run benchmarks and record results in JSON files. The following command runs and records dgemm benchmarks:
make dgemm-benchmarks
There are also a few make targets for performance plots:
make bench_result/image/dgemm_performance.png bench_result/image/dgemm_performance_ratio.png
You are welcome to contribute by organizing the existing benchmarks, writing more benchmarks, and writing scripts to visualize the results.
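As a possible starting point for such scripts, here is a minimal Python sketch that summarizes one recorded result file. It assumes the files follow the standard Google Benchmark JSON output layout (a top-level "benchmarks" array whose entries carry "name", "run_type" and "real_time"); the `summarize` helper, the benchmark names and the timings in the sample are hypothetical and not part of BLAST.

```python
import json
import statistics
from collections import defaultdict


def summarize(report: dict) -> dict:
    """Return {benchmark name: (median real_time, relative spread)}.

    Assumes the Google Benchmark JSON layout; this helper is illustrative
    and not part of the BLAST repository.
    """
    samples = defaultdict(list)
    for entry in report.get("benchmarks", []):
        # Skip aggregate rows (mean/median/stddev) that are emitted when
        # repetitions are enabled; keep only the raw measurements.
        if entry.get("run_type", "iteration") != "iteration":
            continue
        # "run_name" is shared by all repetitions of one benchmark;
        # fall back to "name" if the field is absent.
        name = entry.get("run_name", entry["name"])
        samples[name].append(float(entry["real_time"]))

    summary = {}
    for name, times in samples.items():
        median = statistics.median(times)
        # (max - min) / median: a crude measure of run-to-run variation.
        spread = (max(times) - min(times)) / median if median else 0.0
        summary[name] = (median, spread)
    return summary


if __name__ == "__main__":
    # Made-up sample data; real files would be read with json.load().
    sample = json.loads("""
    {"benchmarks": [
      {"name": "hypothetical_dgemm/8", "run_name": "hypothetical_dgemm/8",
       "run_type": "iteration", "real_time": 100.0, "time_unit": "ns"},
      {"name": "hypothetical_dgemm/8", "run_name": "hypothetical_dgemm/8",
       "run_type": "iteration", "real_time": 104.0, "time_unit": "ns"},
      {"name": "hypothetical_dgemm/8_mean", "run_name": "hypothetical_dgemm/8",
       "run_type": "aggregate", "real_time": 102.0, "time_unit": "ns"}
    ]}
    """)
    for name, (median, spread) in summarize(sample).items():
        print(f"{name}: median {median:.1f}, spread {spread:.1%}")
```

The time unit of each measurement is stored in the "time_unit" field of the entry; the sketch does not convert between units, so it is only meaningful when all entries of a benchmark share the same unit.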
So far, we have not been able to make the benchmarks produce consistently reproducible results on the same processor. What we have tried so far:
- isolating CPU 11 by adding 'nohz_full=5,11 isolcpus=domain,managed_irq,5,11 irqaffinity=0-4,6-10' to the boot parameters (see https://manuel.bernhardt.io/posts/2023-11-16-core-pinning/)
- using taskset -c 11 to run the benchmark only on that core
- using the performance governor (see https://google.github.io/benchmark/reducing_variance.html)
- turning off Intel Turbo Boost (see https://llvm.org/docs/Benchmarking.html)
- using benchmark repetitions, random interleaving, and warmup time (these are options of Google Benchmark)
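The measures listed above that act at run time can be combined into a single command line. The following Python sketch only assembles such a command; it is not a script from this repository. The `benchmark_command` helper, the "gemm" filter and the output file name are hypothetical, and the default repetition count and warmup time are arbitrary example values. The flags themselves are standard Google Benchmark options, but some of them require a reasonably recent Google Benchmark release.

```python
import shlex


def benchmark_command(binary, cpu, filter_regex=None, repetitions=10,
                      warmup_seconds=0.5, json_out=None):
    """Build an argument list that pins a Google Benchmark binary to one CPU.

    Hypothetical helper for illustration; defaults are example values only.
    """
    cmd = [
        "taskset", "-c", str(cpu),                      # run on the isolated core
        binary,
        f"--benchmark_repetitions={repetitions}",       # repeat each benchmark
        "--benchmark_enable_random_interleaving=true",  # interleave repetitions
        f"--benchmark_min_warmup_time={warmup_seconds}",
    ]
    if filter_regex is not None:
        cmd.append(f"--benchmark_filter={filter_regex}")
    if json_out is not None:
        cmd += [f"--benchmark_out={json_out}", "--benchmark_out_format=json"]
    return cmd


if __name__ == "__main__":
    cmd = benchmark_command("build/bin/bench-blast", cpu=11,
                            filter_regex="gemm",          # assumed name pattern
                            json_out="bench-blast.json")  # assumed file name
    # Print the command instead of running it; to execute, pass the list
    # to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when the filter regex contains special characters.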
To automate building and running benchmarks, a Dockerfile is provided. Use the following commands to build a Docker image and run benchmarks in a Docker container:
cd blast
docker build --tag blast_bench .
docker run -v `pwd`/bench_result/docker:/root/blast/bench_result blast_bench
The benchmark results will be put in bench_result/docker under the blast source root.