
Commit cab68ba

Merge pull request #1364 from VectorCamp/main
SIMD.info LP completed
2 parents a0a2695 + 2042e66 commit cab68ba

File tree

11 files changed: +549 -0 lines changed


assets/contributors.csv

Lines changed: 1 addition & 0 deletions
@@ -46,3 +46,4 @@ Alaaeddine Chakroun,Day Devs,Alaaeddine-Chakroun,alaaeddine-chakroun,,https://da
 Koki Mitsunami,Arm,,,,
 Chen Zhang,Zilliz,,,,
 Tianyu Li,Arm,,,,
+Georgios Mermigkis,VectorCamp,gMerm,georgios-mermigkis,,https://vectorcamp.gr/
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
---
title: Introduction to SIMD.info

minutes_to_complete: 30

who_is_this_for: This is for software developers interested in porting SIMD code across platforms.

learning_objectives:
    - Learn how to use SIMD.info’s tools and features, such as navigation, search, and comparison, to simplify the process of finding equivalent SIMD intrinsics between architectures and to improve code portability.

prerequisites:
    - A basic understanding of SIMD.
    - Access to an Arm platform with a SIMD-capable engine and a recent version of a C compiler (Clang or GCC) installed.

author_primary: Georgios Mermigkis & Konstantinos Margaritis, VectorCamp

### Tags
skilllevels: Advanced
subjects: Performance and Architecture
armips:
    - Aarch64
    - Armv8-a
    - Armv9-a
tools_software_languages:
    - GCC
    - Clang
    - Coding
    - Rust
operatingsystems:
    - Linux
shared_path: true
shared_between:
    - laptops-and-desktops
    - servers-and-cloud-computing
    - smartphones-and-mobile

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1                       # _index.md always has weight of 1 to order correctly
layout: "learningpathall"       # All files under learning paths have this same wrapper
learning_path_main_page: "yes"  # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
---
next_step_guidance: You should explore **SIMD.info** further and discover porting opportunities between different SIMD engines.

recommended_path: /learning-paths/cross-platform/

further_reading:
    - resource:
        title: SIMD.info
        link: https://simd.info
        type: website


# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
weight: 21                  # set to always be larger than the content in this path, and one more than 'review'
title: "Next Steps"         # Always the same
layout: "learningpathall"   # All files under learning paths have this same wrapper
---
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
---
review:
    - questions:
        question: >
            What is SIMD.info?
        answers:
            - An online resource for SIMD C intrinsics for all major architectures
            - It's an online forum for SIMD developers
            - A book about SIMD programming
        correct_answer: 1
        explanation: >
            While it allows comments on SIMD intrinsics, SIMD.info is not really a forum. It is a **free** online resource that assists developers in porting C code between popular architectures, for example, from SSE/AVX/AVX512 to Arm ASIMD.

    - questions:
        question: >
            What architectures are listed in SIMD.info?
        answers:
            - Intel SSE and Arm ASIMD
            - Power VSX and Arm ASIMD/SVE
            - Intel SSE4.2/AVX/AVX2/AVX512, Arm ASIMD, Power VSX
        correct_answer: 3
        explanation: >
            At the time of writing, SIMD.info supports Intel SSE4.2/AVX/AVX2/AVX512, Arm ASIMD, and Power VSX as SIMD architectures. Work is in progress to include Arm SVE/SVE2, MIPS MSA, RISC-V RVV 1.0, s390 Z, and others.

    - questions:
        question: >
            What are SIMD.info's major features?
        answers:
            - Hierarchical tree, Search, AI code translation
            - Search, Hierarchical tree, Code examples
            - Hierarchical tree, Search, Intrinsics Comparison, Code examples, Equivalents mapping, links to official documentation
        correct_answer: 3
        explanation: >
            SIMD.info provides multiple features, including a hierarchical tree, a search facility, intrinsics comparison, code examples, equivalents mapping, links to official documentation, and more. AI code translation is not a feature of SIMD.info but will be the focus of another project, SIMD.ai.



# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
title: "Review"                 # Always the same title
weight: 20                      # Set to always be larger than the content in this path
layout: "learningpathall"       # All files under learning paths have this same wrapper
---
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
---
title: Conclusion
weight: 8

### FIXED, DO NOT MODIFY
layout: learningpathall
---

### Conclusion and Additional Resources

Porting SIMD code between architectures can be a daunting process, in many cases requiring hours of studying multiple ISAs in online resources or ISA manuals thousands of pages long. Our primary focus in this work was to optimize the existing algorithm directly with SIMD intrinsics, without altering the algorithm or data layout. While reordering data to align with native Arm instructions could offer performance benefits, our scope remained within the constraints of the current data layout and algorithm. For those interested in data layout strategies to further enhance performance on Arm, the [vectorization-friendly data layout learning path](https://learn.arm.com/learning-paths/cross-platform/vectorization-friendly-data-layout/) offers valuable insights.

Using **[SIMD.info](https://simd.info)** can be instrumental in reducing the amount of time spent on this process, as it provides a centralized and user-friendly resource for finding **NEON** equivalents to intrinsics of other architectures. It saves considerable time and effort by offering detailed descriptions, prototypes, and comparisons directly, eliminating the need for extensive web searches and manual lookups.

While porting between vectors of different sizes is more complex, work is underway, at the time of writing this guide, to complete the integration of the **SVE**/**SVE2** Arm extensions and allow matching them with **AVX512** intrinsics, as both use predicate masks.

Please check **[SIMD.info](https://simd.info)** regularly for updates on this.
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
---
title: Overview & Context
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

### The Challenge of SIMD Code Portability
One of the biggest challenges developers face when working with SIMD code is making it portable across different platforms. SIMD instructions are designed to increase performance by executing the same operation on multiple data elements in parallel. However, each architecture has its own set of SIMD instructions, making it difficult to write code that works on all of them without major changes to the code and/or algorithm.

Consider the task of porting software written with Intel intrinsics, such as SSE/AVX/AVX512, to Arm NEON.
The differences in instruction sets and data handling require careful attention.

This lack of portability increases development time and introduces the risk of errors during the porting process. Currently, developers rely on ISA documentation and manually search across vendor platforms like [ARM Developer](https://developer.arm.com/architectures/instruction-sets/intrinsics/) and the [Intel Intrinsics Guide](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html) to find equivalent instructions.

[SIMD.info](https://simd.info) aims to solve this by helping you find equivalent instructions and providing a more streamlined way to adapt your code for different architectures.
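
To make the problem concrete, here is a minimal sketch (illustrative only, not taken from SIMD.info) of the same four-lane float addition written once with SSE intrinsics and once with NEON intrinsics. The function name `add4` and the test data are made up for this example; notice that even this trivial operation needs a different header, vector type, and intrinsic name on each platform.

```c
/* Illustrative only: the same four-lane float addition expressed with two
   different intrinsics APIs. Build on x86 or AArch64, e.g. gcc -O2 demo.c */
#include <stdio.h>

#if defined(__SSE__)
#include <xmmintrin.h>
void add4(const float *a, const float *b, float *out) {
    __m128 va = _mm_loadu_ps(a);              /* load 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));   /* out = a + b   */
}
#elif defined(__ARM_NEON)
#include <arm_neon.h>
void add4(const float *a, const float *b, float *out) {
    float32x4_t va = vld1q_f32(a);            /* load 4 floats */
    float32x4_t vb = vld1q_f32(b);
    vst1q_f32(out, vaddq_f32(va, vb));        /* out = a + b   */
}
#endif

int main(void) {
    float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, out[4];
    add4(a, b, out);
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```
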
Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
---
title: SIMD.info Features
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

### Comprehensive SIMD.info Capabilities
**[SIMD.info](https://simd.info/)** offers a variety of powerful tools to help developers work more efficiently with SIMD code across different architectures. With a database of over 10,000 intrinsics, it provides detailed information to support effective SIMD development.

For each intrinsic, SIMD.info provides comprehensive details, including:

1. **Purpose**: A brief description of what the intrinsic does and its primary use case.
2. **Result**: An explanation of the output or result of the intrinsic.
3. **Example**: A code snippet demonstrating how to use the intrinsic.
4. **Prototypes**: Function prototypes for different programming languages (currently C/C++).
5. **Assembly Instruction**: The corresponding assembly instruction used by the intrinsic.
6. **Notes**: Any additional notes or caveats about the intrinsic.
7. **Architecture**: A list of architectures that support the intrinsic.
8. **Link(s) to Official Documentation**

This detailed information ensures you have all the necessary resources to effectively use and port SIMD instructions across different platforms. Each feature is designed to simplify navigation, improve the search for equivalent instructions, and foster a collaborative environment for knowledge-sharing.

- **Tree-based navigation:** **SIMD.info** uses a clear, hierarchical layout to organize instructions. It categorizes instructions into broad groups like **Arithmetic**, which are further divided into specific subcategories such as **Vector Add** and **Vector Subtract**. This organized structure makes it straightforward to browse SIMD instruction sets across various platforms, allowing you to efficiently find and access the exact instructions you need.
  An example of what the tree structure looks like (the intrinsics listed under **Boolean AND NOT 32-bit signed integers** are shown in use in a short code sketch after this feature list):

  - **Arithmetic**
  - **Arithmetic (Complex Numbers)**
  - **Boolean Logic & Bit Manipulation**
    - **Boolean AND**
    - **Boolean AND NOT**
      - **Boolean AND NOT 128-bit vector**
      - **Boolean AND NOT 16-bit signed integers**
      - **Boolean AND NOT 16-bit unsigned integers**
      - **Boolean AND NOT 256-bit vector**
      - **Boolean AND NOT 32-bit floats**
      - **Boolean AND NOT 32-bit signed integers**
        - AVX512: mm512_andnot_epi32
        - NEON: vbic_s32
        - NEON: vbicq_s32
        - VSX: vec_andc
      - **Bit Clear**
    - **XOR**

- **Advanced search functionality:** With its robust search engine, **SIMD.info** allows you to either search for a specific intrinsic (e.g. `vaddq_f64`) or enter more general terms (e.g. *How to add 2 vectors*), and it will return a list of the corresponding intrinsics. You can also filter results by the specific engine you're working with, such as **NEON**, **SSE4.2**, **AVX**, **AVX512**, or **VSX**. This functionality streamlines the process of finding the right commands tailored to your needs.

- **Comparison tools:** This feature lets you directly compare SIMD instructions from different (or the same) platforms side by side, offering a clear view of the similarities and differences. It’s an invaluable tool for porting code across architectures, as it ensures accuracy and efficiency.

- **Discussion forum (like StackOverflow):** The integrated discussion forum, powered by **[Disqus](https://disqus.com/)**, allows users to ask questions, share insights, and troubleshoot problems together. This community-driven space ensures that you’re never stuck on a complex issue without support, fostering collaboration and knowledge-sharing among SIMD developers. Imagine something like **StackOverflow** but specific to SIMD intrinsics.
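
As a concrete illustration of the **Boolean AND NOT 32-bit signed integers** node from the tree above, here is a minimal NEON sketch (not taken from SIMD.info; the data values are made up). Going by the intrinsic descriptions, NEON's `vbicq_s32(a, b)` computes `a & ~b`, whereas Intel's `_mm512_andnot_epi32(a, b)` computes `~a & b`, so the operand order flips when porting; this is exactly the kind of detail worth confirming with the comparison tool.

```c
/* Minimal NEON sketch of "Boolean AND NOT" on 32-bit signed integers.
   vbicq_s32(a, b) = a & ~b per lane (BIC = bit clear); vbic_s32 is the
   64-bit (two-lane) variant. Build on AArch64 with: gcc -O2 bic_demo.c */
#include <arm_neon.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int32_t a_in[4] = {0xFF, 0xFF, 0xFF, 0xFF};
    int32_t b_in[4] = {0x0F, 0xF0, 0x00, 0xFF};
    int32x4_t a = vld1q_s32(a_in);
    int32x4_t b = vld1q_s32(b_in);
    int32x4_t r = vbicq_s32(a, b);   /* r[i] = a[i] & ~b[i] */
    int32_t out[4];
    vst1q_s32(out, r);
    printf("0x%X 0x%X 0x%X 0x%X\n", out[0], out[1], out[2], out[3]);
    return 0;   /* prints: 0xF0 0xF 0xFF 0x0 */
}
```
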

### Work in Progress & Future Development
- **Pseudo-code:** Currently under development, this feature will let users generate high-level pseudo-code for specific SIMD instructions. It aims to give a better understanding of SIMD instructions in a *common language*, and it will also feed into the next feature, **Intrinsics Diagrams**.

- **Intrinsics Diagrams:** A feature in progress that creates detailed diagrams for each intrinsic to visualize how it operates at a low level using registers. These diagrams will help you grasp the mechanics of SIMD instructions more clearly, aiding in optimization and debugging.

- **[SIMD.ai](https://simd.ai/):** SIMD.ai is an upcoming feature that promises to bring AI-assisted insights and recommendations to the SIMD development process, making it faster and more efficient to port SIMD code between architectures.

### How These Features Aid in SIMD Development
**[SIMD.info](https://simd.info/)** offers a range of features that streamline the process of porting SIMD code across different architectures. The hierarchical tree-based navigation allows you to easily locate instructions within a clear framework. This organization into broad categories and specific subcategories, such as **Arithmetic** and **Boolean Logic**, makes it straightforward to identify the relevant SIMD instructions.

When you need to port code from one architecture to another, the advanced search functionality proves invaluable. You can either search for specific intrinsics or use broader terms to find equivalent instructions across platforms. This capability ensures that you quickly find the right intrinsics for Arm, Intel, or Power architectures.

Furthermore, **SIMD.info**’s comparison tools enhance this process by enabling side-by-side comparisons of instructions from various platforms. This feature highlights the similarities and differences between instructions, which is crucial for accurately adapting your code. By understanding how similar operations are implemented across architectures, you can ensure that your ported code performs optimally.

Let's look at an actual example.
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
---
title: Porting Process
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

### Using SIMD.info to find NEON Equivalents
Now that you have a clear view of the example, you can start the process of porting the code to Arm **NEON/ASIMD**.

This is where [SIMD.info](https://simd.info/) comes in.

In SIMD programming, the primary concern is the integrity and accuracy of the calculations. Ensuring that these calculations are done correctly is crucial. Performance almost always comes second.

For the operations in your **SSE4.2** example, you have the following intrinsics:

- **`_mm_cmpgt_ps`**
- **`_mm_add_ps`**
- **`_mm_mul_ps`**
- **`_mm_sqrt_ps`**

To gain a deeper understanding of how these intrinsics work and to get detailed descriptions, you can use the search feature on **SIMD.info**. Simply enter the intrinsic's name into the search bar. You can either select from the suggested results or perform a direct search to find detailed information about each intrinsic.

1. By searching for [**`_mm_add_ps`**](https://simd.info/c_intrinsic/_mm_add_ps/) you get information about its purpose, result type, assembly instruction, and prototype, along with a usage example. By clicking the **engine** option **"NEON"** you can find its [equivalents](https://simd.info/eq/_mm_add_ps/NEON/) for this engine. The equivalents are **`vaddq_f32`** and **`vadd_f32`**. The [intrinsics comparison](https://simd.info/c-intrinsics-compare?compare=vaddq_f32:vadd_f32) will help you find the right one. Based on the prototype provided, you would choose [**`vaddq_f32`**](https://simd.info/c_intrinsic/vaddq_f32/) because it works with 128-bit vectors, the same width as the **SSE4.2** vectors.

2. Moving to the next intrinsic, **`_mm_mul_ps`**, you will use the [Intrinsics Tree](https://simd.info/tag-tree) on **SIMD.info** to find the equivalent. Start by expanding the **Arithmetic** branch and then navigate to the **Vector Multiply** branch. Since you are working with 32-bit floats, open the **Vector Multiply 32-bit floats** branch, where you will find several options. The recommended choice is [**`vmulq_f32`**](https://simd.info/c_intrinsic/vmulq_f32/), following the same reasoning as before: it operates on 128-bit vectors.

3. For the third intrinsic, **`_mm_sqrt_ps`**, the easiest way to find the corresponding **NEON** intrinsic is by typing **"Square Root"** into the search bar on SIMD.info. From the [search results](https://simd.info/search?search=Square+Root&simd_engines=1&simd_engines=2&simd_engines=3&simd_engines=4&simd_engines=5), look for the float-specific version and select [**`vsqrtq_f32`**](https://simd.info/c_intrinsic/vsqrtq_f32/), which, like the others, works with 128-bit vectors. In the equivalents section for **SSE4.2**, you can clearly see that **`_mm_sqrt_ps`** is listed as a direct match for this operation.

4. For the last intrinsic, **`_mm_cmpgt_ps`**, follow a similar approach. Inside the intrinsics tree, start by expanding the **Comparison** folder. Navigate to the subfolder **Vector Compare Greater Than**, and since you are working with 32-bit floats, proceed to **Vector Compare Greater Than 32-bit floats**. The recommended choice is again the 128-bit variant, [**`vcgtq_f32`**](https://simd.info/c_intrinsic/vcgtq_f32/).

Now that you have found the **NEON** equivalents for each **SSE4.2** intrinsic, you're ready to begin porting the code. Understanding these equivalents is key to ensuring that the code produces the correct results in the calculations as you switch between SIMD engines.
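
To see how the four equivalents fit together, here is a small self-contained toy program. This is not the learning path's original example; it is an illustrative sketch using the intrinsics identified above, and the function name `process4` and the data are made up. For each block of four floats it computes `sqrt(a*a + b*b)` with **`vmulq_f32`**, **`vaddq_f32`**, and **`vsqrtq_f32`**, and builds an `a > b` mask with **`vcgtq_f32`**. On an AArch64 Linux system it should compile with a plain `gcc -O2 neon_demo.c -o neon_demo`, since NEON is part of the base AArch64 ISA.

```c
/* Toy illustration of the NEON equivalents found above. */
#include <arm_neon.h>
#include <stdint.h>
#include <stdio.h>

void process4(const float *a, const float *b, float *mag, uint32_t *gt_mask) {
    float32x4_t va = vld1q_f32(a);
    float32x4_t vb = vld1q_f32(b);

    float32x4_t sum  = vaddq_f32(vmulq_f32(va, va),   /* a*a + b*b : vmulq_f32 + vaddq_f32 */
                                 vmulq_f32(vb, vb));
    float32x4_t root = vsqrtq_f32(sum);               /* sqrt      : vsqrtq_f32            */
    uint32x4_t  gt   = vcgtq_f32(va, vb);             /* a > b     : vcgtq_f32 (per-lane mask) */

    vst1q_f32(mag, root);
    vst1q_u32(gt_mask, gt);
}

int main(void) {
    float a[4] = {3, 1, 5, 2}, b[4] = {4, 2, 1, 2};
    float mag[4];
    uint32_t mask[4];
    process4(a, b, mag, mask);
    for (int i = 0; i < 4; i++)
        printf("mag=%g  a>b=%s\n", mag[i], mask[i] ? "yes" : "no");
    return 0;
}
```
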
