Commit 109e412

faiziktk, mmikaitis, and Faizan A. Khattak
committed
Add v0.1 of the tensor core simulator
Co-authored-by: Mantas Mikaitis
Co-authored-by: Faizan A. Khattak
0 parents  commit 109e412

File tree

196 files changed

+724674
-0
lines changed

LICENCE

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
BSD 2-Clause License
2+
3+
Copyright (c) 2025, Faizan A. Khattak and Mantas Mikaitis
4+
All rights reserved.
5+
6+
Redistribution and use in source and binary forms, with or without
7+
modification, are permitted provided that the following conditions are met:
8+
9+
1. Redistributions of source code must retain the above copyright notice, this
10+
list of conditions and the following disclaimer.
11+
12+
2. Redistributions in binary form must reproduce the above copyright notice,
13+
this list of conditions and the following disclaimer in the documentation
14+
and/or other materials provided with the distribution.
15+
16+
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
17+
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
18+
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
19+
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
20+
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
21+
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
22+
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
23+
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
24+
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
25+
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

README.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
1+
MATLAB Tensor Core models
2+
--
3+
4+
[![Open in MATLAB Online](https://www.mathworks.com/images/responsive/global/open-in-matlab-online.svg)](https://matlab.mathworks.com/open/github/v1?repo=north-numerical-computing/MATLAB-tensor-core)
5+
6+
## Overview
7+
8+
This repository provides accurate models of tensor cores written in MATLAB. It also includes part of the model validation data that was used to refine the models, as shown in [1].
9+
10+
The [models](models/) directory contains the MATLAB models of the tensor cores in different GPUs, all of which are built on the parameterised model in [Generic_BFMA_TC.m](models/tools/Generic_BFMA_TC.m). For example, [B200TC.m](models/B200TC.m) models the General Matrix Multiply (GEMM) based on the accurate model of a tensor core in the NVIDIA Blackwell B200 GPUs. In the current version of the toolbox, the models take the matrices and the input and output floating-point formats as arguments, and multiply the matrices by using a recursive summation algorithm to accumulate the results of several tensor core invocations.
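
The accumulation of several tensor core invocations mentioned above can be pictured with a minimal sketch. It is written in plain double-precision MATLAB and is not the toolbox's implementation: the block size `fma` and the use of an ordinary matrix product as a stand-in for one invocation are assumptions made only for illustration.

```matlab
% Illustrative sketch only: split the inner dimension into blocks and
% accumulate the block products one after another (recursive summation).
A = rand(4, 64);
B = rand(64, 4);
fma = 32;                      % assumed block size, chosen for illustration
D = zeros(size(A, 1), size(B, 2));
for k0 = 1:fma:size(A, 2)
    idx = k0:min(k0 + fma - 1, size(A, 2));
    % Stand-in for one tensor core invocation on a block of the inner dimension
    D = D + A(:, idx) * B(idx, :);
end
disp(norm(D - A * B, inf));    % difference from the unblocked product
```

In the toolbox, each block product would instead follow the modelled hardware arithmetic, so the result can differ from the double-precision product by more than this sketch suggests.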
11+
12+
The initial analysis of the behaviour of GPU tensor cores is performed with the code available at [IEEE_HPEC2025_block_FMA_tests](https://github.com/faiziktk/IEEE_HPEC2025_block_FMA_tests).
13+
It is based on the generalised testing methodology [2], which determines the following features of hardware that computes mixed-precision inner products:
14+
15+
* Support for subnormal numbers
16+
* Presence of extra bits for significand alignment in multi-term addition
17+
* Availability of extra carry bits
18+
* Normalization patterns in multi-term floating-point addition
19+
* Supported rounding modes
20+
* Effective FMA size (i.e., number of terms accumulated before a single normalization)
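
To make one of these features concrete, the following toy example shows why the number of extra alignment bits is observable from the outputs. It is a simplified sketch in plain MATLAB, not the toolbox's model: the variables `p` and `neab`, the input vector, and the choice to truncate each aligned term toward zero are assumptions for illustration, and the final normalisation and rounding step is left out.

```matlab
% Toy multi-term addition: align every term to the largest exponent,
% keep p + neab bits, truncate the rest, then add.
x = [1, 2^-24, 2^-24, 2^-25];   % one large term and three tiny terms
p = 24;                         % assumed accumulator precision (fp32-like)
d = zeros(1, 3);
for neab = 0:2
    [~, e] = log2(max(abs(x))); % max(abs(x)) = f * 2^e with 0.5 <= f < 1
    ulp = 2^(e - p - neab);     % weight of the last bit kept after alignment
    s = sum(fix(x / ulp)) * ulp;% truncate each aligned term, then add
    d(neab + 1) = s - 1;        % what survives of the tiny terms
end
disp(d);
```

With these inputs the tiny terms are lost entirely when `neab = 0`, partly kept when `neab = 1` (the sum is 1 + 2^-23), and kept exactly when `neab = 2` (the sum is 1 + 5·2^-25). Feeding inputs of this kind to real hardware and inspecting the result is the general idea behind this type of test.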
21+
22+
The [model_validation](data/model_validation) directory contains part of the model validation data that was used in [1] to refine the models and verify their bit-accurate behaviour against the corresponding GPUs. The full-sized experiments and data are not stored in this repository but are available on request.
23+
24+
The [experiments](experiments/) directory contains various experiments with some of the [models](models/) that were performed to plot the results in [1]. These can serve as examples of how to use the models.
25+
26+
## Dependencies and installation
27+
28+
1. Set up the custom precision floating-point format simulator [CPFloat](https://github.com/north-numerical-computing/cpfloat).
29+
2. Add [models/](models/) to the MATLAB search path.
30+
3. Add [models/tools](models/tools) to the MATLAB search path.
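
Steps 2 and 3 might look as follows in a MATLAB session. This is a sketch that assumes the current folder is the root of the repository and that CPFloat has already been built and added to the path. The checks at the end are optional.

```matlab
% Run from the repository root; adjust the paths otherwise.
addpath('models');
addpath(fullfile('models', 'tools'));

% Optional sanity checks: exist(..., 'file') returns 2 for a file on the
% path and 3 for a MEX-file.
assert(exist('B200TC', 'file') == 2, 'models/ is not on the path');
assert(exist('Generic_BFMA_TC', 'file') == 2, 'models/tools is not on the path');
assert(exist('cpfloat', 'file') > 0, 'CPFloat is not on the path');
```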
31+
32+
## Example: Using built-in models
33+
34+
The following example rounds two matrices to fp16 and multiplies them using the model of the B200 tensor core.
35+
Note that B200TC computes the GEMM; here the alpha and beta scale factors are both set to 1.
36+
37+
```
38+
>> inopts.format = 'binary16';
39+
>> outopts.format = 'binary32';
40+
>> A = cpfloat(rand(4,4), inopts);
41+
>> B = cpfloat(rand(4,4), inopts);
42+
>> B200TC(1, A, B, 1, 0, inopts.format, outopts.format)
43+
44+
ans =
45+
46+
0.995566666126251 1.208170533180237 1.368334889411926 1.017799258232117
47+
0.991239666938782 1.084852933883667 1.350871562957764 1.328557014465332
48+
1.190854787826538 1.693876862525940 1.763551592826843 1.278026223182678
49+
0.901759386062622 1.838499188423157 1.608222723007202 1.265371918678284
50+
```
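
One way to get a feel for the model's output is to compare it with the product of the same rounded inputs computed in double precision. The sketch below reuses only the calls shown in the example above. The names `D`, `Dref`, and `relerr` are introduced here for illustration, and no particular error value is implied.

```matlab
% Assumes CPFloat and the toolbox are on the MATLAB path.
inopts.format = 'binary16';
outopts.format = 'binary32';
A = cpfloat(rand(4,4), inopts);
B = cpfloat(rand(4,4), inopts);
D = B200TC(1, A, B, 1, 0, inopts.format, outopts.format);

Dref = A * B;                         % double-precision product of the rounded inputs
relerr = abs(D - Dref) ./ abs(Dref);  % componentwise relative difference
disp(max(relerr(:)));
```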
51+
52+
The following example uses an 8-bit floating-point format as an input format in the B200 tensor core model.
53+
54+
```
55+
>> inopts.format = 'fp8-e4m3';
56+
>> A = cpfloat(rand(4,4), inopts);
57+
>> B = cpfloat(rand(4,4), inopts);
58+
>> B200TC(1, A, B, 1, 0, inopts.format, outopts.format)
59+
60+
ans =
61+
62+
0.390136718750000 0.589843750000000 0.625976562500000 0.748046875000000
63+
1.180175781250000 1.117187500000000 1.220703125000000 1.935546875000000
64+
1.267822265625000 0.752929687500000 0.867187500000000 1.813476562500000
65+
1.007812500000000 1.242187500000000 1.395996093750000 1.740234375000000
66+
```
67+
68+
## Example: Setting up the NVIDIA B200 model
69+
70+
While the B200 tensor core model comes with this toolbox, below is a minimal example of setting it up. The input matrices are assumed to be rounded to the appropriate formats with CPFloat. The model in [B200TC.m](models/B200TC.m) provides a more detailed setup that changes the parameters of the generalised model based on all possible input/output format combinations.
71+
72+
```
73+
% Default parameter structure, assuming fp16 input and fp32 output
74+
def_params.fma = 32; % Fused multiply-add (FMA) size
75+
def_params.neab = 2; % TC extra alignment bits
76+
def_params.frmode = 'rz'; % TC final rounding mode
77+
def_params.inter_pattern = 1; % Interleave two 16-element vectors
78+
79+
D = GEMM(alpha, A, B, beta, C, informat, outformat, def_params);
80+
```
81+
82+
## References
83+
84+
[1] F. A. Khattak and M. Mikaitis, [Accurate Models of NVIDIA Tensor Cores](https://). In Preparation. 2025.<br>
85+
[2] F. A. Khattak and M. Mikaitis, [Generalized Methodology for Determining Numerical Features of Hardware Floating-Point Matrix Multipliers: Part I](https://ieeexplore.ieee.org/abstract/document/11196657). 2025 IEEE High Performance Extreme Computing Conference (HPEC). Sep. 2025.<br>
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
n error bound
2+
10 1.469606e-03 7.813096e-03
3+
18 3.555061e-03 7.813573e-03
4+
33 1.424942e-03 7.814467e-03
5+
61 1.121651e-03 7.816136e-03
6+
112 9.231988e-04 7.819176e-03
7+
206 1.191711e-03 7.824779e-03
8+
379 1.168148e-03 7.835090e-03
9+
695 6.131915e-04 7.853925e-03
10+
1274 3.561456e-04 7.888436e-03
11+
2335 2.296391e-04 7.951677e-03
12+
4281 1.666999e-04 8.067667e-03
13+
7847 1.293470e-04 8.280218e-03
14+
14384 9.468106e-05 8.669853e-03
15+
26366 6.854227e-05 9.384036e-03
16+
48329 5.275334e-05 1.069313e-02
17+
88586 3.668062e-05 1.309264e-02
18+
162377 2.624632e-05 1.749092e-02
19+
297635 1.776545e-05 2.555293e-02
20+
545559 1.392744e-05 4.033035e-02
21+
1000000 9.058678e-06 6.741714e-02
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
n error bound
2+
10 1.469606e-03 7.813096e-03
3+
18 3.555061e-03 7.813573e-03
4+
33 1.424950e-03 7.814467e-03
5+
61 1.121651e-03 7.816136e-03
6+
112 9.231993e-04 7.819176e-03
7+
206 1.191709e-03 7.824779e-03
8+
379 1.168169e-03 7.835090e-03
9+
695 6.132106e-04 7.853925e-03
10+
1274 3.561488e-04 7.888436e-03
11+
2335 2.296500e-04 7.951677e-03
12+
4281 1.667126e-04 8.067667e-03
13+
7847 1.293651e-04 8.280218e-03
14+
14384 9.470974e-05 8.669853e-03
15+
26366 6.857924e-05 9.384036e-03
16+
48329 5.279643e-05 1.069313e-02
17+
88586 3.674114e-05 1.309264e-02
18+
162377 2.633955e-05 1.749092e-02
19+
297635 1.788045e-05 2.555293e-02
20+
545559 1.413737e-05 4.033035e-02
21+
1000000 9.216378e-06 6.741714e-02
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
n error bound
2+
10 1.469606e-03 7.813096e-03
3+
18 3.555024e-03 7.813573e-03
4+
33 1.424938e-03 7.814467e-03
5+
61 1.121640e-03 7.816136e-03
6+
112 9.231870e-04 7.819176e-03
7+
206 1.191694e-03 7.824779e-03
8+
379 1.168091e-03 7.835090e-03
9+
695 6.131415e-04 7.853925e-03
10+
1274 3.561058e-04 7.888436e-03
11+
2335 2.295923e-04 7.951677e-03
12+
4281 1.666514e-04 8.067667e-03
13+
7847 1.292860e-04 8.280218e-03
14+
14384 9.456275e-05 8.669853e-03
15+
26366 6.840958e-05 9.384036e-03
16+
48329 5.259720e-05 1.069313e-02
17+
88586 3.649750e-05 1.309264e-02
18+
162377 2.592669e-05 1.749092e-02
19+
297635 1.740007e-05 2.555293e-02
20+
545559 1.328081e-05 4.033035e-02
21+
1000000 8.572748e-06 6.741714e-02
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
n error bound
2+
10 2.685298e-06 7.813096e-03
3+
18 8.495590e-06 7.813573e-03
4+
33 1.068589e-05 7.814467e-03
5+
61 7.675930e-06 7.816136e-03
6+
112 8.412317e-06 7.819176e-03
7+
206 5.275834e-06 7.824779e-03
8+
379 2.984756e-06 7.835090e-03
9+
695 1.967346e-06 7.853925e-03
10+
1274 1.971237e-06 7.888436e-03
11+
2335 1.736162e-06 7.951677e-03
12+
4281 1.371019e-06 8.067667e-03
13+
7847 9.019864e-07 8.280218e-03
14+
14384 8.764391e-07 8.669853e-03
15+
26366 7.598034e-07 9.384036e-03
16+
48329 6.884745e-07 1.069313e-02
17+
88586 7.522646e-07 1.309264e-02
18+
162377 9.631385e-07 1.749092e-02
19+
297635 1.267747e-06 2.555293e-02
20+
545559 1.504347e-06 4.033035e-02
21+
1000000 1.947012e-06 6.741714e-02
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
n error bound
2+
10 2.685482e-06 7.813096e-03
3+
18 8.492632e-06 7.813573e-03
4+
33 1.069706e-05 7.814467e-03
5+
61 7.691104e-06 7.816136e-03
6+
112 8.408059e-06 7.819176e-03
7+
206 5.276636e-06 7.824779e-03
8+
379 3.023472e-06 7.835090e-03
9+
695 1.994175e-06 7.853925e-03
10+
1274 2.013317e-06 7.888436e-03
11+
2335 1.827506e-06 7.951677e-03
12+
4281 1.452267e-06 8.067667e-03
13+
7847 1.005616e-06 8.280218e-03
14+
14384 1.002506e-06 8.669853e-03
15+
26366 9.279706e-07 9.384036e-03
16+
48329 9.119277e-07 1.069313e-02
17+
88586 1.036446e-06 1.309264e-02
18+
162377 1.332787e-06 1.749092e-02
19+
297635 1.804460e-06 2.555293e-02
20+
545559 2.181737e-06 4.033035e-02
21+
1000000 2.845524e-06 6.741714e-02
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
n error bound
2+
10 2.673686e-06 7.813096e-03
3+
18 8.480798e-06 7.813573e-03
4+
33 1.066218e-05 7.814467e-03
5+
61 7.670310e-06 7.816136e-03
6+
112 8.333973e-06 7.819176e-03
7+
206 5.221300e-06 7.824779e-03
8+
379 2.906700e-06 7.835090e-03
9+
695 1.881637e-06 7.853925e-03
10+
1274 1.867536e-06 7.888436e-03
11+
2335 1.508535e-06 7.951677e-03
12+
4281 1.147718e-06 8.067667e-03
13+
7847 5.990264e-07 8.280218e-03
14+
14384 5.282287e-07 8.669853e-03
15+
26366 2.847603e-07 9.384036e-03
16+
48329 2.111327e-07 1.069313e-02
17+
88586 1.533100e-07 1.309264e-02
18+
162377 1.292368e-07 1.749092e-02
19+
297635 8.309870e-08 2.555293e-02
20+
545559 5.861596e-08 4.033035e-02
21+
1000000 4.209459e-08 6.741714e-02
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
n error bound
2+
10 3.647780e-08 7.813096e-03
3+
18 2.625300e-07 7.813573e-03
4+
33 2.226674e-07 7.814467e-03
5+
61 1.735915e-07 7.816136e-03
6+
112 5.084085e-08 7.819176e-03
7+
206 7.563091e-08 7.824779e-03
8+
379 1.595666e-07 7.835090e-03
9+
695 2.383044e-07 7.853925e-03
10+
1274 1.907880e-07 7.888436e-03
11+
2335 2.908922e-07 7.951677e-03
12+
4281 3.424805e-07 8.067667e-03
13+
7847 3.892480e-07 8.280218e-03
14+
14384 3.839149e-07 8.669853e-03
15+
26366 5.982430e-07 9.384036e-03
16+
48329 6.873602e-07 1.069313e-02
17+
88586 8.462328e-07 1.309264e-02
18+
162377 9.346184e-07 1.749092e-02
19+
297635 1.070291e-06 2.555293e-02
20+
545559 1.763669e-06 4.033035e-02
21+
1000000 1.813173e-06 6.741714e-02
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
n error bound
2+
10 3.647788e-08 7.813096e-03
3+
18 2.623484e-07 7.813573e-03
4+
33 2.238087e-07 7.814467e-03
5+
61 1.529410e-07 7.816136e-03
6+
112 5.782372e-08 7.819176e-03
7+
206 9.019348e-08 7.824779e-03
8+
379 2.007581e-07 7.835090e-03
9+
695 2.954656e-07 7.853925e-03
10+
1274 2.567460e-07 7.888436e-03
11+
2335 3.846343e-07 7.951677e-03
12+
4281 4.912073e-07 8.067667e-03
13+
7847 5.603519e-07 8.280218e-03
14+
14384 5.478050e-07 8.669853e-03
15+
26366 8.616604e-07 9.384036e-03
16+
48329 1.023959e-06 1.069313e-02
17+
88586 1.227796e-06 1.309264e-02
18+
162377 1.400346e-06 1.749092e-02
19+
297635 1.619451e-06 2.555293e-02
20+
545559 2.667678e-06 4.033035e-02
21+
1000000 2.742058e-06 6.741714e-02
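
The data tables above share the columns `n`, `error`, and `bound`. Their file names are hidden in this commit view, so what each file measures is not stated here. A quick consistency check that the error stays below the bound can be sketched as follows. The three rows are copied from the first table, and reading the columns as a measured error and its bound is an assumption based only on the header.

```matlab
% Rows copied from the first table above: n, error, bound
T = [      10  1.469606e-03  7.813096e-03
         1274  3.561456e-04  7.888436e-03
      1000000  9.058678e-06  6.741714e-02];

ok = all(T(:, 2) <= T(:, 3));   % is the error below the bound in every row?
ratio = T(:, 2) ./ T(:, 3);     % fraction of the bound that is attained

fprintf('error <= bound in every row: %d\n', ok);
fprintf('n = %7d  error/bound = %.3e\n', [T(:, 1), ratio].');
```

The same check applies to any of the other tables after replacing the rows of `T`.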
