**content/learning-paths/cross-platform/simd-info-demo/simdinfo-description.md**
- XOR
#### Advanced search functionality
With its robust search engine, SIMD.info allows you to either search for a specific intrinsic, for example `vaddq_f64`, or enter more general terms, for example "How to add 2 vectors," and it returns a list of the corresponding intrinsics.
You can also filter results based on the specific engine you're working with, such as NEON, SSE4.2, AVX, AVX512, or VSX. This functionality streamlines the process of finding the right commands tailored to your needs.
#### Comparison tools
This feature lets you directly compare SIMD instructions from different, or the same, platforms side by side, offering a clear view of the similarities and differences. It’s a helpful tool for porting code across architectures, as it ensures accuracy and efficiency.
#### Discussion forum
The integrated discussion forum, powered by **[Disqus](https://disqus.com/)**, allows users to ask questions, share insights, and troubleshoot problems together. This community-driven space ensures that you’re never stuck on a complex issue without support. It fosters collaboration and knowledge-sharing among SIMD developers. Imagine something like **[StackOverflow](https://stackoverflow.com/)** but specific to SIMD intrinsics.
Now let's look at these features in the context of a real example.
**content/learning-paths/cross-platform/simd-info-demo/simdinfo-example1-cont.md**
### Using SIMD.info to find NEON Equivalents
Now that you have a clear view of the example, you can start the process of porting the code to Arm Neon/ASIMD.
This is where [SIMD.info](https://simd.info/) comes in.
In SIMD programming, the primary focus is the integrity and accuracy of the calculations. Ensuring that these calculations are done correctly is crucial. Performance is almost always a secondary concern.
For the operations in your SSE4.2 example, you have the following intrinsics:
- **`_mm_cmpgt_ps`**
- **`_mm_add_ps`**
- **`_mm_mul_ps`**
- **`_mm_sqrt_ps`**
To gain a deeper understanding of how these intrinsics work and to surface detailed descriptions, you can use the search feature on SIMD.info. Simply enter the name of the intrinsic in the search bar. You can either select from the suggested results or perform a direct search to retrieve information about each intrinsic.
1. By searching for [**`_mm_add_ps`**](https://simd.info/c_intrinsic/_mm_add_ps/) you will retrieve information about its purpose, the result type, assembly instructions, prototypes, and an example demonstration. By clicking the **engine** option **"NEON"** you can find its [equivalents](https://simd.info/eq/_mm_add_ps/NEON/) for this engine. The equivalents are **`vaddq_f32`** and **`vadd_f32`**. The [intrinsics comparison](https://simd.info/c-intrinsics-compare?compare=vaddq_f32:vadd_f32) helps you find the right one. Based on the prototype provided, you can choose [**`vaddq_f32`**](https://simd.info/c_intrinsic/vaddq_f32/) as it works with 128-bit vectors, the same width as **SSE4.2**.
2. Moving to the next intrinsic, **`_mm_mul_ps`**, you can use the [Intrinsics Tree](https://simd.info/tag-tree) on SIMD.info to find the equivalent.
Start by expanding the **Arithmetic** branch and then navigate to the branch **Vector Multiply**. As you are working with 32-bit floats, open the **Vector Multiply 32-bit floats** branch, where you will find several options. The recommended choice is [**`vmulq_f32`**](https://simd.info/c_intrinsic/vmulq_f32/), following the same reasoning as before; it operates on 128-bit vectors.
3. For the third intrinsic, **`_mm_sqrt_ps`**, the easiest way to find the corresponding NEON intrinsic is by typing **"Square Root"** in the search bar on SIMD.info. From the [search results](https://simd.info/search?search=Square+Root&simd_engines=1&simd_engines=2&simd_engines=3&simd_engines=4&simd_engines=5), look for the float-specific version and select [**`vsqrtq_f32`**](https://simd.info/c_intrinsic/vsqrtq_f32/), which, like the others, works with 128-bit vectors. In the equivalents section for **SSE4.2**, you can see that **`_mm_sqrt_ps`** is listed as a direct match for this operation.
4. For the last intrinsic, **`_mm_cmpgt_ps`**, follow a similar approach. Inside the intrinsics tree, start by expanding the **Comparison** folder. Navigate to the subfolder **Vector Compare Greater Than**, and as you are working with 32-bit floats, proceed to **Vector Compare Greater Than 32-bit floats**. The recommended choice is again the 128-bit variant, [**`vcgtq_f32`**](https://simd.info/c_intrinsic/vcgtq_f32/).
Now that you have found the NEON equivalents for each SSE4.2 intrinsic, you're ready to begin porting the code. Understanding these equivalents is key to ensuring that the code produces the correct results in the calculations as you switch between SIMD engines.
**content/learning-paths/cross-platform/simd-info-demo/simdinfo-example1-porting.md**
Follow this step-by-step porting process:
1. Change the loading process to follow NEON's method for initializing vectors. The SSE4.2 intrinsic **`_mm_set_ps`** is in reality a macro; in NEON you can do the same thing with curly-brace **`{}`** initialization.
2. Next, replace the SSE4.2 intrinsics with the NEON equivalents that you identified earlier. The key is to ensure that the operations perform the same tasks, such as comparison, addition, multiplication, and square root calculations.
3. Finally, modify the storing process to match NEON’s way of moving data from vectors to memory. In NEON, you use functions like [**`vst1q_f32`**](https://simd.info/c_intrinsic/vst1q_f32/) for storing 128-bit floating-point vectors and [**`vst1q_u32`**](https://simd.info/c_intrinsic/vst1q_u32/) for storing 128-bit integer vectors.
**content/learning-paths/cross-platform/simd-info-demo/simdinfo-example2.md**

```
b : a0 8c 78 64 50 3c 28 14
_mm_madd_epi16(a, b) : a4d8 0 56b8 0 2198 0 578 0
```
You will note that the result of the first element is a negative number, even though you added two positive products (`130*140` and `150*160`). This is because the result of the addition has to occupy a 16-bit signed integer element, and the sum (`42200`) exceeds the largest value a signed 16-bit integer can hold (`32767`), producing a negative overflow. The bit pattern is the same in binary arithmetic, but when interpreted as a signed integer, the number becomes negative.
The rest of the values are as expected. Notice how each pair has a zero element next to it. The results are correct, but they are not in the correct order. In this example, you used **`vmovl`** to zero-extend values, which achieves the correct order with zero elements in place. While both **`vmovl`** and **`zip`** can be used for this purpose, **`vmovl`** was chosen in this implementation. For more details, see the Arm Software Optimization Guides, such as the [Neoverse V2 guide](https://developer.arm.com/documentation/109898/latest/).