Add Sifive Xsfvqmacc extension intrinsics #5
yulong18 wants to merge 4 commits into sifive-rvv-intrinsic from
Conversation
Hi, @kito-cheng and @monkchiang:

vint32m1_t test1(vint32m1_t vd, vint8m1_t vs1, vint8mf2_t vs2, size_t vl) {
{"xsfcease", &gcc_options::x_riscv_sifive_subext, MASK_XSFCEASE},
{"xsfvqmaccqoq", &gcc_options::x_riscv_sifive_subext, MASK_XSFVQMACCQOQ},
{"xsfvqmaccdod", &gcc_options::x_riscv_sifive_subext, MASK_XSFVQMACCDOD},
gcc/config/riscv/riscv-c.cc
Outdated
builtin_define_with_int_value ("__riscv_th_v_intrinsic",
                               riscv_ext_version_value (0, 11));
SHAPE(crypto_vv, crypto_vv)
SHAPE(crypto_vi, crypto_vi)
SHAPE(crypto_vv_no_op_type, crypto_vv_no_op_type)
SHAPE(sf_vqmacc,sf_vqmacc)

Suggested change:
- SHAPE(sf_vqmacc,sf_vqmacc)
+ SHAPE(sf_vqmacc, sf_vqmacc)
kito-cheng
left a comment
Could you add testcases?
Also, you can copy contrib/clang-format to .clang-format, then use clang-format or git clang-format to format your code.
return e.use_exact_insn (
  code_for_pred_fnr_clip (ZERO_EXTEND, e.vector_mode ()));
How to distinguish between x and xu here?
gcc_unreachable ();
}
}

Suggested change:
- gcc_unreachable ();
- }
- }
+ }
+ gcc_unreachable ();
+ }
The gcc_unreachable is at the wrong nesting level, I think?
static CONSTEXPR const vfnrclip x_obj;
static CONSTEXPR const vfnrclip xu_obj;
sf_vfnrclip_x_obj;
sf_vfnrclip_xu_obj;
if (e.op_info->op == OP_TYPE_4x8x4)
  return e.use_widen_ternop_insn (
    code_for_pred_quad_mul_plusus_qoq (e.vector_mode ()));
static CONSTEXPR const vqmacc vqmacc_obj;
static CONSTEXPR const vqmaccu vqmaccu_obj;
static CONSTEXPR const vqmaccsu vqmaccsu_obj;
static CONSTEXPR const vqmaccsu vqmaccus_obj;
/* vop_v --> vop_v_<type>. */
b.append_name (type_suffixes[instance.type.index].vector);
if (overloaded_p && (instance.pred == PRED_TYPE_tu || instance.pred == PRED_TYPE_mu
                     || instance.pred == PRED_TYPE_tumu))
This patch folds svindex with constant arguments into a vector series.
We implemented this in svindex_impl::fold using the function build_vec_series.
For example,
svuint64_t f1 ()
{
return svindex_u64 (10, 3);
}
compiled with -O2 -march=armv8.2-a+sve, is folded to {10, 13, 16, ...}
in the gimple pass lower.
This optimization benefits cases where svindex is used in combination with
other gimple-level optimizations.
For example,
svuint64_t f2 ()
{
return svmul_x (svptrue_b64 (), svindex_u64 (10, 3), 5);
}
has previously been compiled to
f2:
index z0.d, #10, #3
mul z0.d, z0.d, #5
ret
Now, it is compiled to
f2:
mov x0, 50
index z0.d, x0, #15
ret
We added test cases checking
- the application of the transform during gimple for constant arguments,
- the interaction with another gimple-level optimization.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc
(svindex_impl::fold): Add constant folding.
gcc/testsuite/
* gcc.target/aarch64/sve/index_const_fold.c: New test.
We can make use of the integrated rotate step of the XAR instruction
to implement most vector integer rotates, as long we zero out one
of the input registers for it. This allows for a lower-latency sequence
than the fallback SHL+USRA, especially when we can hoist the zeroing operation
away from loops and hot parts. This should be safe to do for 64-bit vectors
as well, even though the XAR instructions operate on 128-bit values, as the
bottom 64 bits of the result are later accessed through the right subregs.
This strategy is used whenever we have XAR instructions; the logic
in aarch64_emit_opt_vec_rotate is adjusted to resort to
expand_rotate_as_vec_perm only when it is expected to generate a single REV*
instruction or when XAR instructions are not present.
With this patch we generate, for the input:
v4si
G1 (v4si r)
{
return (r >> 23) | (r << 9);
}
v8qi
G2 (v8qi r)
{
return (r << 3) | (r >> 5);
}
the assembly for +sve2:
G1:
movi v31.4s, 0
xar z0.s, z0.s, z31.s, #23
ret
G2:
movi v31.4s, 0
xar z0.b, z0.b, z31.b, #5
ret
instead of the current:
G1:
shl v31.4s, v0.4s, 9
usra v31.4s, v0.4s, 23
mov v0.16b, v31.16b
ret
G2:
shl v31.8b, v0.8b, 3
usra v31.8b, v0.8b, 5
mov v0.8b, v31.8b
ret
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/
* config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Add
generation of XAR sequences when possible.
gcc/testsuite/
* gcc.target/aarch64/rotate_xar_1.c: New test.
/* Return true if intrinsics maybe require qfrm operand. */
virtual bool may_require_qfrm_p () const;

/* We choose to return false by default since most of the intrinsics does
   not need qfrm operand. */
inline bool
function_base::may_require_qfrm_p () const
{
  return false;
}
#include "riscv_vector.h"

vint8mf8_t test1(float vs1, vfloat32mf2_t vs2, size_t vl) {
  return __riscv_sf_vfnrclip_x_f_qf_i8mf8(vs2, vs1, vl);
}
Could you reference this file https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/gcc.target/riscv/target-attr-01.c and add the correct test directives?
e.g.
/* { dg-do compile } */
/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
/* { dg-options "-march=rv64gcv_xsfvfnrclipxfqf -O2 -mabi=lp64d" } */
/* { dg-final { check-function-bodies "**" "" } } */

and add some checks within the testcase like:
/*
** foo:
** ...
** vsetvli\s*x0, a0, e32, m1, ta, ta
** sf.vfnrclip.x.f.qf\s*fa0,v8
** ...
*/

And don't forget sf_vqmacc.c.
vint8m2_t test2(float vs1, vfloat32m8_t vs2, size_t vl) {
  return __riscv_sf_vfnrclip_x_f_qf_i8m2(vs2, vs1, vl);
}
Also add testcases for __riscv_sf_vfnrclip_xu_f_qf_*; not every combination is needed, but please add a few to improve the coverage.
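A minimal xu testcase might look like the following. This is a sketch only: the intrinsic name, the unsigned return type, and the scan pattern are assumed from the naming used elsewhere in this PR and should be checked against the actual generated code.

```c
/* { dg-do compile } */
/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
/* { dg-options "-march=rv64gcv_xsfvfnrclipxfqf -O2 -mabi=lp64d" } */

#include "riscv_vector.h"

/* Assumed unsigned counterpart of test1 above.  */
vuint8mf8_t test_xu (float vs1, vfloat32mf2_t vs2, size_t vl) {
  return __riscv_sf_vfnrclip_xu_f_qf_u8mf8 (vs2, vs1, vl);
}

/* { dg-final { scan-assembler {sf\.vfnrclip\.xu\.f\.qf} } } */
```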
gcc/config/riscv/riscv.md
Outdated
;; vidiv vector single-width integer divide instructions
;; viwmul vector widening integer multiply instructions
;; vimuladd vector single-width integer multiply-add instructions
;; vsfmuladd vector matrix integer multiply-add instructions
Minor comment: please put this at the end of the comment block; don't mix it with the standard instructions.
gcc/config/riscv/riscv.md
Outdated
;; vsmul vector single-width fractional multiply with rounding and saturation instructions
;; vsshift vector single-width scaling shift instructions
;; vnclip vector narrowing fixed-point clip instructions
;; vsfclip vector fp32 to int8 ranged clip instructions
Hi, @kito-cheng:
/* According to SIFIVE vector-intrinsic-doc, it adds suffixes
   for vop_m C++ overloaded API. */
The comment seems not to have been updated?
return e.use_exact_insn (
  code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ()));
gcc_unreachable ();

Suggested change:
- return e.use_exact_insn (
-   code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ()));
- gcc_unreachable ();
+ return e.use_exact_insn (
+   code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ()));
};

/* Implements vqmacc. */
class vqmacc : public function_base

Suggested change:
- class vqmacc : public function_base
+ class sf_vqmacc : public function_base
};

/* Implements vqmaccu. */
class vqmaccu : public function_base

Suggested change:
- class vqmaccu : public function_base
+ class sf_vqmaccu : public function_base
};

/* Implements vqmaccus. */
class vqmaccus : public function_base

Suggested change:
- class vqmaccus : public function_base
+ class sf_vqmaccus : public function_base
extern const function_base *const sf_vqmacc;
extern const function_base *const sf_vqmaccu;
extern const function_base *const sf_vqmaccsu;
extern const function_base *const sf_vqmaccus;

Could you create a new file sifive-vector-builtins-bases.h to hold the SiFive intrinsics?
/* Implements vfnrclip. */
template <int UNSPEC, enum frm_op_type FRM_OP = NO_FRM>
class vfnrclip_x_f_qf : public function_base

Create sifive-vector-builtins-bases.cc to hold those intrinsics.
gcc/config/riscv/riscv.md
Outdated
;; vsmul vector single-width fractional multiply with rounding and saturation instructions
;; vsshift vector single-width scaling shift instructions
;; vnclip vector narrowing fixed-point clip instructions
;; vfnrclip vector fp32 to int8 ranged clip instructions

They are not standard instructions, so please create a new section in the comment rather than mixing them into the standard section; also please prefix them with sf_.
gcc/config/riscv/riscv.md
Outdated
;; vfncvtbf16 vector narrowing single floating-point to brain floating-point instruction
;; vfwcvtbf16 vector widening brain floating-point to single floating-point instruction
;; vfwmaccbf16 vector BF16 widening multiply-accumulate
;; vqmacc vector matrix integer multiply-add instructions
Hi, @kito-cheng and @pz9115:
Hi, @kito-cheng:
…o_debug_section [PR116614]
cat abc.C
#define A(n) struct T##n {} t##n;
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
E(1) E(2) E(3)
int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
The following patch fixes that. Most of the 64K+ section support for
reading and writing was already there years ago (and the reading side is
used quite often already), and a further bug in it was fixed by the
PR104617 fix.
Yet, the fix isn't solely about removing the
if (new_i - 1 >= SHN_LORESERVE)
{
*err = ENOTSUP;
return "Too many copied sections";
}
5 lines, the missing part was that the function only handled reading of
the .symtab_shndx section but not copying/updating of it.
If the result has fewer than 64K-epsilon sections, that actually wasn't
needed, but e.g. with -fdebug-types-section one can exceed that pretty
easily (reported to us on a WebKitGtk build on ppc64le).
Updating the section is slightly more complicated, because it basically
needs to be done in lock step with updating the .symtab section: if one
doesn't need to use SHN_XINDEX in there, the section should contain (or
should be updated to contain) an SHN_UNDEF entry, otherwise it needs to
hold whatever would otherwise be stored but couldn't fit. But repeating
all the symtab decisions about what to discard and how to rewrite it
just for that would be ugly.
So, the patch instead emits the .symtab_shndx section (or sections) last
and prepares the content during the .symtab processing and in a second
pass when going just through .symtab_shndx sections just uses the saved
content.
2024-09-07 Jakub Jelinek <jakub@redhat.com>
PR lto/116614
* simple-object-elf.c (SHN_COMMON): Align comment with neighbouring
comments.
(SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for
consistency.
(simple_object_elf_find_sections): Formatting fixes.
(simple_object_elf_fetch_attributes): Likewise.
(simple_object_elf_attributes_merge): Likewise.
(simple_object_elf_start_write): Likewise.
(simple_object_elf_write_ehdr): Likewise.
(simple_object_elf_write_shdr): Likewise.
(simple_object_elf_write_to_file): Likewise.
(simple_object_elf_copy_lto_debug_section): Likewise. Don't fail for
new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy
over .symtab_shndx sections, though emit those last and compute their
section content when processing associated .symtab sections. Handle
simple_object_internal_read failure even in the .symtab_shndx reading
case.
(cherry picked from commit bb8dd09)