Update README.md

IgorOchocki · web-flow · commit eeb5a1d1290f · 2024-10-15T14:53:58.000+02:00
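
The nine `make run_*` targets that this change documents can be exercised in one pass. The following is a minimal sketch and not part of the commit. It assumes it is sourced from the `build` directory after `cmake ..` and `make` have completed. The function name `run_all_asan_samples` and the `MAKE_CMD` variable are hypothetical names introduced only for illustration. The Example Output in the diff shows `run_array_reduction` ending with a make error after the sanitizer report, so the sketch assumes any target may exit non-zero and keeps going instead of stopping at the first failure.

```shell
# Run targets copied from the table added in this commit.
targets="run_array_reduction run_bad_free run_device_global
run_group_local run_local_stencil run_map
run_matmul_broadcast run_misalign-long run_nd_range_reduction"

# MAKE_CMD is a hypothetical override; "echo make" gives a dry run.
MAKE_CMD="${MAKE_CMD:-make}"

run_all_asan_samples() {
  for target in $targets; do
    # Left unquoted on purpose so "echo make" splits into two words.
    $MAKE_CMD "$target"
    status=$?
    # Record the exit status and continue with the next target.
    echo "$target exited with status $status"
  done
}
```

Calling `run_all_asan_samples` runs each target in turn. Setting `MAKE_CMD="echo make"` first prints the commands without running them, which is a quick way to check the target names against the table.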
diff --git a/DirectProgramming/C++SYCL/DenseLinearAlgebra/address_sanitizer/README.md b/DirectProgramming/C++SYCL/DenseLinearAlgebra/address_sanitizer/README.md
@@ -19,12 +19,6 @@ All samples can be run on a CPU or a PVC GPU.
 
 The sample includes nine different mini samples that showcase the usage of ASan.
 
-| File Name                                      | Description
-|:---                                            |:---
-|`array_reduction.cpp`  			 | Demonstrates a basic, serial CPU implementation.
-|`bad_free.cpp`          			 | Demonstrates an initial single-GPU offload using SYCL.
-|`device_global.cpp`				 | Demonstrates multi-GPU offload using SYCL.
-
 ## Prerequisites
 
 | Optimized for          | Description
@@ -49,7 +43,7 @@ The basic SYCL implementation explained in the code includes:
 
 When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the `setvars` script every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development.
 
-## Build the `Jacobi Iterative Solver` Sample
+## Build the `Address Sanitizer` Sample
 
 > **Note**: If you have not already done so, set up your CLI
 > environment by sourcing  the `setvars` script in the root of your oneAPI installation.
@@ -61,16 +55,6 @@ When working with the command-line interface (CLI), you should configure the one
 >
 > For more information on configuring environment variables, see *[Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html)*.
 
-> **Note**: For GPU Analysis on Linux*, enable collecting GPU hardware metrics by setting the value of `dev.i915 perf_stream_paranoidsysctl` option to `0`. 
->
-> The command makes a temporary change that is lost after reboot:
->
-> `sudo sysctl -w dev.i915.perf_stream_paranoid=0`
->
-> To make the change permanent, enter:
->
-> `sudo echo dev.i915.perf_stream_paranoid=0 > /etc/sysctl.d/60-mdapi.conf`
-
 ### Using Visual Studio Code*  (Optional)
 
 You can use Visual Studio Code (VS Code) extensions to set your environment, create launch configurations, and browse and download samples.
@@ -95,201 +79,52 @@ To learn more about the extensions and how to configure the oneAPI environment,
    cmake ..
    make
    ```
-3. Run the program for CPU.
-   ```
-   make run_1_cpu
-   ```
-4. Run the program for GPU. (Optional)
-   ```
-   make run_2_gpu
+3. Run the program.
    ```
-5. Run the program for multiple GPUs. (Optional)
-   ```
-   make run_3_multi_gpu
-   ```
-   > **Note**: This option is untested, but you should be able to run it in a multi-GPU environment.
-   
+   make run_array_reduction
+   ```   
 6. Clean the project. (Optional)
    ```
    make clean
    ```
 
-## Guided Builds and Offloads
-
-These guided instructions show how to optimize code using the Jacobi Iterative method in the following steps:
-
-1. Start with code that runs on the CPU.
-2. Change the code for basic GPU offload.
-3. Change the code for the multi-GPU offload. (Not available in this release.)
-
-In each step, the Intel® Advisor analysis tool provides performance analysis for the applications. The Intel® Advisor runtime might take a long time.
-
-> **Important**: The performance results and measurements depend on hardware. Your results may vary from what is shown.
-
-### CPU Offload Modeling
-
-The first step is to run offload modeling on the CPU version to identify portions of the code can benefit from acceleration.
-
-> **Note**: This process may take up to 30 minutes.
-
-1. Run Intel® Advisor to model the CPU code.
-   ```
-   advisor --collect=offload --config=gen9_gt2 --project-dir=./../advisor/1_cpu -- ./src/1_guided_jacobi_iterative_solver_cpu
-   ```
-2. View the results. 
-   ```
-   advisor-gui ../advisor/1_cpu/e000/e000.advixeexp
-   ```
-   > **Note**: If you are connecting to a remote system with oneAPI tools installed, you might not be able to launch the Intel® Advisor graphical interface. You can transfer the HTML-based report to a local machine to view it.
-
-   Based on the output captured from Intel® Advisor, one can see an estimated speed-up if we offload loops identified in the Top Offloaded section of the output. We can get about 9x speedup for one loop and a 4.5x speedup for another. The generation of the matrix can be speedup almost 7x times. In the next step, we will offload those loops to the GPUs.
-
-   ![offload Modeling results](images/cpu.PNG)
-
-### Basic GPU Offload
-
-The second step is to offload to a GPU. The `2_guided_jacobi_iterative_solver_gpu` uses the basic offload of each for loop into a GPU. For example, the main loop calculating the unknown variables will be calculated by N kernels where N is the number of rows in the matrix. Even this basic offload improves the execution time.
-
-Once the offload code changes, run the roofline analysis to look for performance optimization opportunities.
-
-> **Note**: Multi-GPU programs are not supported in this release, so no output from Intel® Advisor will be available.
-
-1. Run Intel® Advisor again.
-   ```
-   advisor --collect=roofline --profile-gpu --project-dir=./../advisor/2_gpu -- ./src/2_guided_jacobi_iterative_solver_gpu
-   ```
-
-2. View the results again.
-   ```
-   advisor-gui ../advisor/2_gpu/e000/e000.advixeexp
-   ```
-
-   As we can see in the charts below, the execution time has been sped up as we predicted in all cases besides the main loop. The reason is that we have to wait for each iteration to finish calculations, as the next iteration is dependent on the results we get. This is why 74.6% of the time that the GPU has been in the **Stalled** mode.
-
-   ![offload Modeling results](images/gpu.PNG)
-
-
-### Build and Run the Sample in Intel® DevCloud (Optional)
-
->**Note**: For more information on using Intel® DevCloud, see the Intel® oneAPI [Get Started](https://devcloud.intel.com/oneapi/get_started/) page.
-
-1. Open a terminal on a Linux* system.
-2. Log in to the Intel® DevCloud.
-   ```
-   ssh devcloud
-   ```
-3. Change to the sample directory.
-4. Configure the sample for the appropriate node.
-   
-   <details>
-   <summary>You can specify nodes using a single line script.</summary>
-
-   The following example is for a GPU node. (This is a single line script.)
-	```
-	qsub  -I  -l nodes=1:gpu:ppn=2 -d .
-	```
-   - `-I` (upper case I) requests an interactive session.
-   - `-l nodes=1:gpu:ppn=2` (lower case L) assigns one full GPU node.
-   - `-d .` makes the current folder as the working directory for the task.
-
-     |Available Nodes    |Command Options
-     |:---               |:---
-     |GPU	             |`qsub -l nodes=1:gpu:ppn=2 -d .`
-     |CPU	             |`qsub -l nodes=1:xeon:ppn=2 -d .`
-
-  </details>
-
-5. Perform build steps you would on Linux. (Including optionally cleaning the project.)
-
-   <details>
-   <summary>You can submit build and run jobs through a Portable Bash Script (PBS).</summary>
-
-   A job is a script that submitted to PBS through the `qsub` utility. By default, the `qsub` utility does not inherit the current environment variables or your current working directory, so you might need to submit jobs to configure the environment variables. To indicate the correct working directory, you can use either absolute paths or pass the `-d \<dir\>` option to `qsub`. 
-
-   If you choose to use scripts, jobs terminate with writing files to the disk:
-   - `<script_name>.sh.eXXXX`, which is the job stderr
-   - `<script_name>.sh.oXXXX`, which is the job stdout
-
-   Here XXXX is the job ID, which gets printed to the screen after each qsub command.
-
-   You can inspect output of the sample.
-   ```
-   cat run.sh.oXXXX
-   ```
-   Once the jobs complete, you can remove the stderr and stdout files.
-   ```
-   rm run.sh.*
-   ```
-</details>
-
-6. Run the sample.
-
-   > **Note**: To inspect job progress if you are using a script, use the qstat utility.
-   >   ```
-   >   watch -n 1 qstat -n -1
-   >   ```
-   >  The command displays the results every second. The job is complete when no new results display.
-
-7. Review the output, then exit.
-
+> **Note**: The following table lists all mini samples and the command to run each one.
+>
+>| File Name                | Run command
+>|:---                      |:---
+>|`array_reduction.cpp`     | `make run_array_reduction`
+>|`bad_free.cpp`            | `make run_bad_free`
+>|`device_global.cpp`       | `make run_device_global`
+>|`group_local.cpp`         | `make run_group_local`
+>|`local_stencil.cpp`       | `make run_local_stencil`
+>|`map.cpp`                 | `make run_map`
+>|`matmul_broadcast.cpp`    | `make run_matmul_broadcast`
+>|`misalign-long.cpp`       | `make run_misalign-long`
+>|`nd_range_reduction.cpp`  | `make run_nd_range_reduction`
 
 ## Example Output
 
-### CPU Results
-
-The following output is for a **9x9** matrix.
+The following output is for the `array_reduction.cpp` sample.
 ```
-Scanning dependencies of target run_cpu
-./jacobi_cpu_iterative_solver
-Device : Intel(R) Core(TM) i7-10610U CPU @ 1.80GHz
-
-Matrix generated, time elapsed: 0.31112 seconds.
-[5122.01 263.22 1.67 626.22 317 -333.22 947.8 -852.83 -808.99 ][277.63]
-[277.63 -4634.73 529.95 -657.22 -564.18 601.12 676.36 452.62 314.03 ][740.29]
-[740.29 499.7 -4774.05 794.3 -156.33 -237.9 397.63 -160.86 916.96 ][76.47]
-[76.47 -693.42 -135.63 4075.95 -287.56 993 936.34 -28.63 86.55 ][833.52]
-[833.52 769.69 -400.31 443.35 -5587.83 431.49 453.66 556.13 845.82 ][844.33]
-[844.33 549.12 440.36 416.48 -236.84 -4121.63 -460.53 -236.51 -359.54 ][-358.82]
-[-358.82 -655.35 -569.26 -982.24 102.27 522.64 -4836.28 -534.2 -552.49 ][545.54]
-[545.54 -403.51 249.11 918.1 -575.88 -151.27 159.93 3964.62 -770 ][-216.26]
-[-216.26 751.31 267.88 691.34 -161.82 973.27 908.53 -175.76 -4150.42 ][-813.02]
-
-Computations complete, time elapsed: 0.295054 seconds.
-Total number of sweeps: 30
-Checking results
-All values are correct.
-
-Check complete, time elapsed: 0.00481672 seconds.
-Total runtime is 0.750036 seconds.
-X1 equals: 0.09895873327
-X2 equals: -0.15802401129
-X3 equals: 0.03854686450
-X4 equals: 0.16617806412
-X5 equals: -0.12925305745
-X6 equals: 0.11829640135
-X7 equals: -0.13914309700
-X8 equals: -0.09521910620
-X9 equals: 0.19864875400
-Built target run_cpu
-```
-
-### GPU Offload Results
-
-The following output is for a **30000x30000** matrix.
-
-```
-Device : Intel(R) Graphics [0x020a]
-
-Matrix generated, time elapsed: 3.58536 seconds.
-
-Computations complete, time elapsed: 3.41483 seconds.
-Total number of sweeps: 7
-Checking results
-All values are correct.
-
-Check complete, time elapsed: 2.61934 seconds.
-Total runtime is 13.5157 seconds.
-[100%] Built target run_2_gpu
+Histogram:
+bin[0]: 4
+bin[1]: 4
+bin[2]: 4
+bin[3]: 4
+SUCCESS
+
+====ERROR: DeviceSanitizer: bad-free on address 0x250d040
+  #0 ./array_reduction() [0x405249]
+  #1 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x735d35229d90]
+  #2 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x735d35229e40]
+  #3 ./array_reduction() [0x4049b5]
+
+0x250d040 may be allocated on Host Memory
+Segmentation fault (core dumped)
+make[3]: *** [src/CMakeFiles/run_array_reduction.dir/build.make:70: src/CMakeFiles/run_array_reduction] Error 139
+make[2]: *** [CMakeFiles/Makefile2:392: src/CMakeFiles/run_array_reduction.dir/all] Error 2
+make[1]: *** [CMakeFiles/Makefile2:399: src/CMakeFiles/run_array_reduction.dir/rule] Error 2
+make: *** [Makefile:254: run_array_reduction] Error 2
 ```
 
 ## License