Commit 8121d55

Editorial
1 parent 172dc75 commit 8121d55

4 files changed: +11 -11 lines


content/learning-paths/cross-platform/simd-info-demo/simdinfo-description.md

Lines changed: 2 additions & 2 deletions
@@ -55,14 +55,14 @@ This organized structure enables you to browse through SIMD instruction sets acr
 - XOR

 #### Advanced search functionality
-With its robust search engine, SIMD.info allows you to either search for a specific intrinsic, for example `vaddq_f64`, or enter more general terms, for example "How to add 2 vectors", and it returns a list of the corresponding intrinsics.
+With its robust search engine, SIMD.info allows you to either search for a specific intrinsic, for example `vaddq_f64`, or enter more general terms, for example "How to add 2 vectors," and it returns a list of the corresponding intrinsics.

 You can also filter results based on the specific engine you're working with, such as NEON, SSE4.2, AVX, AVX512, or VSX. This functionality streamlines the process of finding the right commands tailored to your needs.

 #### Comparison tools
 This feature lets you directly compare SIMD instructions from different, or the same, platforms side by side, offering a clear view of the similarities and differences. It’s a helpful tool for porting code across architectures, as it ensures accuracy and efficiency.

 #### Discussion forum
-The integrated discussion forum, powered by **[discuss](https://disqus.com/)**, allows users to ask questions, share insights, and troubleshoot problems together. This community-driven space ensures that you’re never stuck on a complex issue without support. It fosters collaboration and knowledge-sharing among SIMD developers. Imagine something like **[StackOverflow](https://stackoverflow.com/)** but specific to SIMD intrinsics.
+The integrated discussion forum, powered by **[Disqus](https://disqus.com/)**, allows users to ask questions, share insights, and troubleshoot problems together. This community-driven space ensures that you’re never stuck on a complex issue without support. It fosters collaboration and knowledge-sharing among SIMD developers. Imagine something like **[StackOverflow](https://stackoverflow.com/)** but specific to SIMD intrinsics.

 Now let's look at these features in the context of a real example.

content/learning-paths/cross-platform/simd-info-demo/simdinfo-example1-cont.md

Lines changed: 7 additions & 7 deletions
@@ -7,29 +7,29 @@ layout: learningpathall
 ---

 ### Using SIMD.info to find NEON Equivalents
-Now that you have a clear view of the example, you can start the process of porting the code to Arm **Neon/ASIMD**.
+Now that you have a clear view of the example, you can start the process of porting the code to Arm Neon/ASIMD.

 This is where [SIMD.info](https://simd.info/) comes in.

 In SIMD programming, the primary focus is the integrity and accuracy of the calculations. Ensuring that these calculations are done correctly is crucial. Performance is almost always a secondary concern.

-For the operations in your **SSE4.2** example, you have the following intrinsics:
+For the operations in your SSE4.2 example, you have the following intrinsics:

 - **`_mm_cmpgt_ps`**
 - **`_mm_add_ps`**
 - **`_mm_mul_ps`**
 - **`_mm_sqrt_ps`**

-To gain a deeper understanding of how these intrinsics work and to surface detailed descriptions, you can use the search feature on SIMD.info. Simply enter the name of the intrinsic into the search bar. You can either select from the suggested results or perform a direct search to retrieve information about each intrinsic.
+To gain a deeper understanding of how these intrinsics work and to surface detailed descriptions, you can use the search feature on SIMD.info. Simply enter the name of the intrinsic in the search bar. You can either select from the suggested results or perform a direct search to retrieve information about each intrinsic.

-1. By searching [**`_mm_add_ps`**](https://simd.info/c_intrinsic/_mm_add_ps/) you will retrieve information about its purpose, the result type, assembly instruction, prototype, and an example demonstration. By clicking the **engine** option **"NEON"** you can find its [equivalents](https://simd.info/eq/_mm_add_ps/NEON/) for this engine. The equivalents are: **`vaddq_f32`**, **`vadd_f32`**. [Intrinsics comparison](https://simd.info/c-intrinsics-compare?compare=vaddq_f32:vadd_f32) helps you find the right one. Based on the prototype provided, you can choose [**`vaddq_f32`**](https://simd.info/c_intrinsic/vaddq_f32/) because it works with 128-bit vectors which is the same as **SSE4.2**.
+1. By searching for [**`_mm_add_ps`**](https://simd.info/c_intrinsic/_mm_add_ps/) you will retrieve information about its purpose, the result type, assembly instructions, prototypes, and an example demonstration. By clicking the **engine** option **"NEON"** you can find its [equivalents](https://simd.info/eq/_mm_add_ps/NEON/) for this engine. The equivalents are: **`vaddq_f32`**, **`vadd_f32`**. [Intrinsics comparison](https://simd.info/c-intrinsics-compare?compare=vaddq_f32:vadd_f32) helps you find the right one. Based on the prototype provided, you can choose [**`vaddq_f32`**](https://simd.info/c_intrinsic/vaddq_f32/) as it works with 128-bit vectors, which is the same as **SSE4.2**.

 2. Moving to the next intrinsic, **`_mm_mul_ps`**, you can use the [Intrinsics Tree](https://simd.info/tag-tree) on SIMD.info to find the equivalent.

-Start by expanding the **Arithmetic** branch and then navigate to the branch **Vector Multiply**. As you are working with 32-bit floats, open the **Vector Multiply 32-bit floats** branch, where you will find several options. The recommended choice is [**`vmulq_f32`**](https://simd.info/c_intrinsic/vmulq_f32/), following the same reasoning as beforeit operates on 128-bit vectors.
+Start by expanding the **Arithmetic** branch and then navigate to the branch **Vector Multiply**. As you are working with 32-bit floats, open the **Vector Multiply 32-bit floats** branch, where you will find several options. The recommended choice is [**`vmulq_f32`**](https://simd.info/c_intrinsic/vmulq_f32/), following the same reasoning as before; it operates on 128-bit vectors.

-3. For the third intrinsic, **`_mm_sqrt_ps`**, the easiest way to find the corresponding NEON intrinsic is by typing **"Square Root"** into the search bar on SIMD.info. From the [search results](https://simd.info/search?search=Square+Root&simd_engines=1&simd_engines=2&simd_engines=3&simd_engines=4&simd_engines=5), look for the float-specific version and select [**`vsqrtq_f32`**](https://simd.info/c_intrinsic/vsqrtq_f32/), which, like the others, works with 128-bit vectors. In the equivalents section regarding **SSE4.2**, you can clearly see that **`_mm_sqrt_ps`** has its place as a direct match for this operation.
+3. For the third intrinsic, **`_mm_sqrt_ps`**, the easiest way to find the corresponding NEON intrinsic is by typing **"Square Root"** in the search bar on SIMD.info. From the [search results](https://simd.info/search?search=Square+Root&simd_engines=1&simd_engines=2&simd_engines=3&simd_engines=4&simd_engines=5), look for the float-specific version and select [**`vsqrtq_f32`**](https://simd.info/c_intrinsic/vsqrtq_f32/), which, like the others, works with 128-bit vectors. In the equivalents section about **SSE4.2**, you can see that **`_mm_sqrt_ps`** has its place as a direct match for this operation.

-4. For the last intrinsic, **`_mm_cmpgt_ps`**, follow a similar approach as before. Inside the intrinsics tree, start by expanding the **Comparison** folder. Navigate to the subfolder **Vector Compare Greater Than**, and since you are working with 32-bit floats, proceed to **Vector Compare Greater Than 32-bit floats**. The recommended choice is again the 128-bit variant [**`vcgtq_f32`**](https://simd.info/c_intrinsic/vcgtq_f32/).
+4. For the last intrinsic, **`_mm_cmpgt_ps`**, follow a similar approach as before. Inside the intrinsics tree, start by expanding the **Comparison** folder. Navigate to the subfolder **Vector Compare Greater Than**, and as you are working with 32-bit floats, proceed to **Vector Compare Greater Than 32-bit floats**. The recommended choice is again the 128-bit variant [**`vcgtq_f32`**](https://simd.info/c_intrinsic/vcgtq_f32/).

 Now that you have found the NEON equivalents for each SSE4.2 intrinsic, you're ready to begin porting the code. Understanding these equivalents is key to ensuring that the code produces the correct results in the calculations as you switch between SIMD engines.

content/learning-paths/cross-platform/simd-info-demo/simdinfo-example1-porting.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ layout: learningpathall

 Follow this step-by-step porting process:

-1. Change the loading process to follow NEON's method for initializing vectors. The SSE4.2 intrinsic **`_mm_set_ps`** is in reality a macro, in NEON you can do the same thing with curly braces **`{}`** inititialization.
+1. Change the loading process to follow NEON's method for initializing vectors. The SSE4.2 intrinsic **`_mm_set_ps`** is in reality a macro; in NEON you can do the same thing with curly-brace **`{}`** initialization.
 2. Next, replace the SSE4.2 intrinsics with the NEON equivalents that you identified earlier. The key is to ensure that the operations perform the same tasks, such as comparison, addition, multiplication, and square root calculations.
 3. Finally, modify the storing process to match NEON’s way of moving data from vectors to memory. In NEON, you use functions like [**`vst1q_f32`**](https://simd.info/c_intrinsic/vst1q_f32/) for storing 128-bit floating-point vectors and [**`vst1q_u32`**](https://simd.info/c_intrinsic/vst1q_u32/) for storing 128-bit integer vectors.

content/learning-paths/cross-platform/simd-info-demo/simdinfo-example2.md

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ b : a0 8c 78 64 50 3c 28 14
 _mm_madd_epi16(a, b) : a4d8 0 56b8 0 2198 0 578 0
 ```

-You will note that the result of the first element is a negative number, even though you added 2 positive results (`130*140` and `150*160`). This is because the result of the addition has to occupy a 16-bit signed integer element, and when the first is larger we have the effect of an negative overflow. The result is the same in binary arithmetic, but when interpreted into a signed integer, it turns the number into a negative.
+You will note that the result of the first element is a negative number, even though you added 2 positive results (`130*140` and `150*160`). This is because the result of the addition has to occupy a 16-bit signed integer element, and when the sum exceeds the largest 16-bit signed value (32767), it wraps around as a negative overflow. The bit pattern is the same in binary arithmetic, but when interpreted as a signed integer, the number becomes negative.
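Editor's note: the wraparound described above can be reproduced in a few lines of scalar C. `madd_pair_s16` is a hypothetical helper modeling the reinterpretation of the low 16 bits, not the intrinsic itself.

```c
#include <stdint.h>

/* Scalar model of the sign flip described above: 130*140 + 150*160 = 42200
 * (0xA4D8), which exceeds the 16-bit signed maximum of 32767, so the same
 * bit pattern read as a signed 16-bit element becomes negative. */
static int16_t madd_pair_s16(int16_t a0, int16_t b0, int16_t a1, int16_t b1) {
    int32_t sum = (int32_t)a0 * b0 + (int32_t)a1 * b1; /* exact 32-bit sum */
    return (int16_t)sum; /* keep the low 16 bits, reinterpreted as signed */
}
```

For the example's first element, 42200 - 65536 = -23336, which is the negative value the result bytes `a4d8` represent.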

 The rest of the values are as expected. Notice how each pair has a zero element next to it. The results are correct, but they are not in the correct order. In this example, you used **`vmovl`** to zero-extend values, which achieves the correct order with zero elements in place. While both **`vmovl`** and **`zip`** can be used for this purpose, **`vmovl`** was chosen in this implementation. For more details, see the Arm Software Optimization Guides, such as the [Neoverse V2 guide](https://developer.arm.com/documentation/109898/latest/).
