Skip to content

Bgemm for arm64#5287

Closed
taoye9 wants to merge 9 commits intoOpenMathLib:developfrom
taoye9:bgemm_for_arm64
Closed

Bgemm for arm64#5287
taoye9 wants to merge 9 commits intoOpenMathLib:developfrom
taoye9:bgemm_for_arm64

Conversation

@taoye9
Copy link
Contributor

@taoye9 taoye9 commented May 28, 2025

No description provided.

@taoye9 taoye9 marked this pull request as draft May 28, 2025 13:27
#define SIZE 8
#define BASE_SHIFT 3
#define ZBASE_SHIFT 4
#elif defined(BFLOAT16_ONLY)
Copy link
Contributor

@annop-w annop-w May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we need to introduce BFLOAT16_ONLY build flag.
The type of FLOAT is the only difference. Can we simplyl use XFLOAT instead of FLOAT for C matrix, for example, in kernel/arm64/bgemm_beta.c ?

@martin-frbg
Copy link
Collaborator

Thanks. Failures on CirrusCI are from running out of compute credits (as we're close to the end of the month), I'll rectify that in a minute. Not sure how DGEMM got hit by your PR on at least RISCV and LOONGARCH64 platforms, haven't looked closely yet but possibly some unintentional shifting of GEMM_UNROLL parameters ? (Also not sure why this touches SBGEMM settings ?)

#define STOP_RPCC(COUNTER)
#endif

#if defined(HALF)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove BUILD_BFLOAT16_ONLY flag, this is not needed I suppose.

#define STOP_RPCC(COUNTER)
#endif

#if defined(HALF)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto. See comment in level3.c.

#elif defined(BFLOAT16)
#define ERROR_NAME "SBGEMM "
#define GEMV BLASFUNC(sbgemv)
#elif defined(BFLOAT16_ONLY)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a different naming ? Perhaps change to SBFLOAT16 for SBGEMM and BFLOAT16 for BGEMM. What do you think ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to split it? BUILD_BFLOAT16 could cover both cases and enable all bfloat16 ops?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mousius there's a slight difference between the global BUILD_BFLOAT16 and this BFLOAT16 (that IIRC is local to the interface and gets added by the (C)Makefile ). Maybe we could have a "BGEMM" define that gets set by the Makefile specifically when the function is bgemm and that is processed inside the "elif defined(BFLOAT16)" ? I haven't looked into the assumed requirement for a BUILD_BFLOAT16_ONLY in the level3 driver yet, but my gut feeling is that a local variable in interface/Makefile (and eventually CMakeLists.txt) to modify gemm.c build behaviour for bgemm.o should be sufficient.

$(CC) $(CFLAGS) -c -DBFLOAT16 -UDOUBLE -UCOMPLEX $< -o $@

ifneq ($(SBGEMM_UNROLL_M), $(SBGEMM_UNROLL_N))
#ifneq ($(SBGEMM_UNROLL_M), $(SBGEMM_UNROLL_N))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this commented out ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My changes introduced a bug that raises an error when the kernels are reused. this line need to be uncomment when the fix is fixed.


int CNAME(BLASLONG m, BLASLONG n, BLASLONG dummy1, FLOAT beta, IFLOAT *dummy2,
BLASLONG dummy3, IFLOAT *dummy4, BLASLONG dummy5, FLOAT *c,
BLASLONG ldc) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change FLOAT to IFLOAT or XFLOAT ?

Mousius added a commit to Mousius/OpenBLAS that referenced this pull request Jul 3, 2025
Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places.

Derived mostly from the previous work done in OpenMathLib#5287

Co-authored-by: Ye Tao <ye.tao@arm.com>
Mousius added a commit to Mousius/OpenBLAS that referenced this pull request Jul 3, 2025
Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places.

Derived mostly from the previous work done in OpenMathLib#5287

Co-authored-by: Ye Tao <ye.tao@arm.com>
Mousius added a commit to Mousius/OpenBLAS that referenced this pull request Jul 8, 2025
Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places.

Derived mostly from the previous work done in OpenMathLib#5287

Co-authored-by: Ye Tao <ye.tao@arm.com>
@martin-frbg
Copy link
Collaborator

Closing as superseded by #5357 ; thank you very much to all involved !

@martin-frbg martin-frbg closed this Jul 9, 2025
Mousius added a commit to Mousius/OpenBLAS that referenced this pull request Jul 10, 2025
This also improves the testing and generic kernel by re-using the BF16
conversion functions.

Built on top of OpenMathLib#5357 and derived from OpenMathLib#5287

Co-authored-by: Ye Tao <ye.tao@arm.com>
Mousius added a commit to Mousius/OpenBLAS that referenced this pull request Jul 10, 2025
This also improves the testing and generic kernel by re-using the BF16
conversion functions.

Built on top of OpenMathLib#5357 and derived from OpenMathLib#5287

Co-authored-by: Ye Tao <ye.tao@arm.com>
Mousius added a commit to Mousius/OpenBLAS that referenced this pull request Oct 6, 2025
This fixes an issue originally introduced with the BGEMM kernel when I was tweaking it. OpenMathLib#5287 didn't suffer from this bug.

I've updated the tests to run with `beta=1.0` so as to test loading and
updating from C.

Alongside this, the tests now return sensible return values to reduce
the risk of them being ignored.

Co-authored-by: Ye Tao <ye.tao@arm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants