This project analyzes and benchmarks multiple implementations of the softmax algorithm.

Prerequisites:
- A recent compiler with C++17 support (GCC/Clang)
- CMake >= 3.15 and Ninja
- Python 3 (for data generation script)
- Optional: OpenMP for parallel CPU variants
- Optional: oneDNN (DNNL) for the oneDNN implementation

Directory layout:
- include/: public headers such as softmax.h
- src/cpu_sequential/: sequential CPU implementations
- src/cpu_parallel/: OpenMP-based CPU implementations (built if OpenMP is found)
- benchmarks/: Google Benchmark driver and registrations
- tests/: GoogleTest unit tests
- common/: shared utilities
- scripts/: helpers (e.g., generate_data.py)
- data/: generated input vectors for benchmarks

Typical workflow:
- Configure & Build: This downloads dependencies and compiles the code using CMake/Ninja.

      make build

- Run Benchmarks: This generates test data for various sizes and runs all C++ benchmarks.

      make benchmark

- Clean Up:

      make clean

Available make targets:
- make configure: run CMake configuration (called automatically by make build)
- make build: build all targets
- make benchmark or make bench: generate data and run benchmarks
- make test: build and run unit tests via ctest
- make clean: remove the build directory and generated data files

Run the unit tests (GoogleTest):

    make test

This builds the test binary and runs it via ctest.

Run benchmarks (both commands are equivalent):

    make benchmark
    # or
    make bench

- Data files are generated automatically under data/ for the preset sizes.
- To control the number of OpenMP threads during benchmarking, set SOFTMAX_OMP_THREADS or OMP_NUM_THREADS:

      SOFTMAX_OMP_THREADS=8 make bench

- The benchmark sizes are defined in the Makefile and registered in benchmarks/main_bench.cpp.
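
The thread-count override described above could be resolved along the following lines. This is only a sketch: the helper names threads_from_env and resolve_thread_count are hypothetical, and the precedence (SOFTMAX_OMP_THREADS first, then OMP_NUM_THREADS, then a fallback) is an assumption, since the source does not state which variable wins or how the project actually reads them.

```cpp
#include <cstdlib>
#include <limits>

// Hypothetical helper: parse a positive integer from an environment variable.
// Returns 0 if the variable is unset, empty, or not a plain positive integer.
// Note: OpenMP also accepts comma-separated lists in OMP_NUM_THREADS; this
// sketch treats such values as invalid.
int threads_from_env(const char* name) {
    const char* raw = std::getenv(name);
    if (raw == nullptr || *raw == '\0') return 0;
    char* end = nullptr;
    const long value = std::strtol(raw, &end, 10);
    if (*end != '\0' || value <= 0 ||
        value > std::numeric_limits<int>::max()) {
        return 0;
    }
    return static_cast<int>(value);
}

// Assumed precedence: SOFTMAX_OMP_THREADS, then OMP_NUM_THREADS,
// then the caller-supplied fallback.
int resolve_thread_count(int fallback) {
    if (int n = threads_from_env("SOFTMAX_OMP_THREADS")) return n;
    if (int n = threads_from_env("OMP_NUM_THREADS")) return n;
    return fallback;
}
```

In an OpenMP build, the resolved value would typically be passed to omp_set_num_threads before the benchmarked region runs.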
Follow these steps to add a new implementation and integrate it into builds, benchmarks, and tests.
- Declare the function
  - Add the declaration to include/softmax.h (choose the appropriate section, e.g., CPU, OpenMP, or oneDNN/GPU in the future):

        void softmax_my_awesome_impl(std::vector<float>& vec);

- Implement the function
  - Create the implementation file in the appropriate directory:
    - Sequential CPU: src/cpu_sequential/
    - OpenMP parallel CPU: src/cpu_parallel/
  - Example path: src/cpu_sequential/softmax_my_awesome_impl.cpp
- Register the file in the build
  - Edit CMakeLists.txt:
    - If it is a sequential CPU implementation, add the .cpp to the SOFTMAX_SOURCES list.
    - If it is an OpenMP variant, add the .cpp to the softmax_omp target sources (inside the if (OpenMP_CXX_FOUND) block).
- Add a benchmark entry
  - Open benchmarks/main_bench.cpp and:
    - Define a benchmark body similar to the existing ones using BENCHMARK_DEFINE_F(SoftmaxBench, Name) and call your function inside it.
    - Register it with the helper macro so it runs at the preset sizes:

          REGISTER_SOFTMAX_BENCHMARK(My_Awesome_Impl);

    - Tip: Pick a concise, descriptive benchmark name to replace My_Awesome_Impl.
- Add the implementation to tests (recommended)
  - Open tests/softmax_tests.cpp and add your implementation to the impls list so it is compared against the reference naive CPU version:

        {"softmax_my_awesome_impl", softmax_my_awesome_impl, 1e-5f, 1e-5f},

  - Adjust the tolerances if needed to account for numerical differences.
- Build and run

      make build
      make test
      make bench

Notes:
- If oneDNN is installed and detected, an additional oneDNN-based implementation and benchmarks will be built automatically.
- If OpenMP is available, OpenMP-based variants will also be enabled.
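
To make the steps for adding a new implementation more concrete, here is a minimal sketch of what a file such as src/cpu_sequential/softmax_my_awesome_impl.cpp might contain, together with a tolerance check in the spirit of the test entry. The function body is one common numerically stable formulation, not necessarily what the project's existing variants do. The all_close helper is hypothetical, and reading the two 1e-5f values in the impls entry as (absolute, relative) tolerances is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative implementation matching the declaration added to softmax.h.
// Subtracting the maximum before exponentiating avoids overflow in exp().
void softmax_my_awesome_impl(std::vector<float>& vec) {
    if (vec.empty()) return;
    const float max_val = *std::max_element(vec.begin(), vec.end());
    float sum = 0.0f;
    for (float& x : vec) {
        x = std::exp(x - max_val);  // in-place exponentiation
        sum += x;
    }
    const float inv_sum = 1.0f / sum;
    for (float& x : vec) x *= inv_sum;  // normalize so the entries sum to 1
}

// Hypothetical helper: element-wise comparison against a reference result
// using an absolute plus relative tolerance (assumed meaning of the two
// tolerance fields in the impls entry).
bool all_close(const std::vector<float>& got, const std::vector<float>& want,
               float abs_tol, float rel_tol) {
    if (got.size() != want.size()) return false;
    for (std::size_t i = 0; i < got.size(); ++i) {
        const float diff = std::fabs(got[i] - want[i]);
        if (diff > abs_tol + rel_tol * std::fabs(want[i])) return false;
    }
    return true;
}
```

Because the function modifies its argument in place, a comparison against the reference version needs a separate copy of the input for each implementation under test.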