Commit 1845700

Update readme
1 parent 109e412 commit 1845700

1 file changed (+4, -4 lines)

README.md

Lines changed: 4 additions & 4 deletions
@@ -7,7 +7,7 @@ MATLAB Tensor Core models

This repository provides accurate tensor core models written in MATLAB. It also includes parts of the model validation data used to refine the models, as shown in [1].

-The [models](models/) directory contains the MATLAB models of tensor core in different GPUs, all of which are build on the parameterised model in [Generic_BFMA_TC.m](models/tools/Generic_BFMA_TC.m). For example the [B200TC.m](models/B200TC.m) models the General Matrix Multiply (GEMM) based on the accurate model of a tensor core in the NVIDIA Blackwell B200 GPUs. In the current version of the toolbox, the models take matrices and input and output floating-point formats as inputs and multiply the matrices by using a recursive summation algorithm to accummulate the results of several tensor core invocations.
+The [models](models/) directory contains the MATLAB models of the tensor cores of several NVIDIA GPUs, all of which are built on the parameterised model in [Generic_BFMA_TC.m](models/tools/Generic_BFMA_TC.m). For example, [B200TC.m](models/B200TC.m) models the General Matrix Multiply (GEMM) based on an accurate model of a tensor core in the NVIDIA Blackwell B200 GPUs. In the current version of the toolbox, the models take matrices and input and output floating-point formats as inputs and multiply the matrices using a recursive summation algorithm to accumulate the results of several tensor core invocations.

The initial analysis of the behaviour of GPU tensor cores is performed with the code available at [IEEE_HPEC2025_block_FMA_tests](https://github.com/faiziktk/IEEE_HPEC2025_block_FMA_tests).
It is based on the generalised testing methodology [2], which determines the following features of hardware computing mixed-precision inner products:
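
As an illustration of the recursive summation described in the updated paragraph above, here is a minimal MATLAB sketch: the inner dimension of the product is split into FMA-sized blocks, each block product stands in for one tensor core invocation, and the partial results are accumulated sequentially. The block size, variable names, and the use of plain double-precision arithmetic are assumptions for illustration, not the toolbox's Generic_BFMA_TC implementation.

```
A = rand(4, 64); B = rand(64, 4);
fma = 16;                            % assumed block-FMA size
nblocks = size(A, 2) / fma;          % number of "tensor core invocations"
C = zeros(size(A, 1), size(B, 2));
for k = 1:nblocks
    cols = (k-1)*fma + (1:fma);
    C = C + A(:, cols) * B(cols, :); % accumulate one block product
end
```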
@@ -32,7 +32,7 @@ The [experiments](experiments/) directory contains various experiments with some
## Example: Using in-built models

The following example rounds two matrices to fp16 and multiplies them using the model of the B200 tensor core.
-Note that B200TC compute the GEMM and alpha and beta scale factors are set to 1.
+Note that B200TC computes the GEMM, with alpha and beta scale factors set to 1.

```
>> inopts.format = 'binary16';
@@ -49,7 +49,7 @@ ans =
0.901759386062622 1.838499188423157 1.608222723007202 1.265371918678284
```

-The following example uses an 8-bit floating-point format as an input format in the B200 tensor core model.
+The following example uses an 8-bit floating-point format as the input format in the B200 tensor core model.

```
>> inopts.format = 'fp8-e4m3';
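% Hedged aside for illustration, not part of the README's example (the
% hunk truncates it here): key constants of the fp8-e4m3 format set
% above. e4m3 has 4 exponent bits and 3 stored significand bits; it
% reserves a single NaN pattern and has no infinities, so its largest
% finite value is 448 rather than the usual IEEE-style maximum.
t = 4;               % significand bits, including the implicit bit
u = 2^(-t)           % unit roundoff: 0.0625
xmax = 2^8 * 1.75    % largest finite e4m3 value: 448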
@@ -71,7 +71,7 @@ While the B200 tensor core model comes with this toolbox, below is a minimal exa

```
% Default structures assuming fp16 in and fp32 output
-def_params.fma = 32; % Fused multiply-add (FMA) size
+def_params.fma = 16; % Fused multiply-add (FMA) size
def_params.neab = 2; % TC extra alignment bits
def_params.frmode = 'rz'; % TC final rounding mode
def_params.inter_pattern = 1; % Interleave two 16-element vectors
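% Minimal sketch under the parameters above (an assumption for
% illustration, not the toolbox's Generic_BFMA_TC): emulate one block
% FMA of size def_params.fma by computing the products of the rounded
% inputs exactly, summing them in a wide accumulator, and applying a
% final rounding to fp32. The extra alignment bits and the 'rz' final
% rounding mode are not modelled here; single() rounds to nearest.
a = single(rand(1, def_params.fma));   % stand-in for an fp16 input vector
b = single(rand(def_params.fma, 1));   % stand-in for an fp16 input vector
d = single(0);                         % fp32 accumulator input
s = double(d) + double(a) * double(b); % exact products, wide summation
r = single(s)                          % final rounding to fp32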
