Commit eeb5a1d

Update README.md
1 parent b748e1d commit eeb5a1d

File tree

1 file changed

+37
-202
lines changed
  • DirectProgramming/C++SYCL/DenseLinearAlgebra/address_sanitizer

DirectProgramming/C++SYCL/DenseLinearAlgebra/address_sanitizer/README.md

Lines changed: 37 additions & 202 deletions
@@ -19,12 +19,6 @@ All samples can be run on a CPU or a PVC GPU.
1919
2020
The sample includes nine different mini samples that showcase the usage of ASan.
2121

22-
| File Name | Description
23-
|:--- |:---
24-
|`array_reduction.cpp` | Demonstrates a basic, serial CPU implementation.
25-
|`bad_free.cpp` | Demonstrates an initial single-GPU offload using SYCL.
26-
|`device_global.cpp` | Demonstrates multi-GPU offload using SYCL.
27-
2822
## Prerequisites
2923

3024
| Optimized for | Description
@@ -49,7 +43,7 @@ The basic SYCL implementation explained in the code includes:
4943

5044
When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the `setvars` script every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development.
5145

52-
## Build the `Jacobi Iterative Solver` Sample
46+
## Build the `Address Sanitizer` Sample
5347

5448
> **Note**: If you have not already done so, set up your CLI
5549
> environment by sourcing the `setvars` script in the root of your oneAPI installation.
@@ -61,16 +55,6 @@ When working with the command-line interface (CLI), you should configure the one
6155
>
6256
> For more information on configuring environment variables, see *[Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html)*.
6357
64-
> **Note**: For GPU Analysis on Linux*, enable collecting GPU hardware metrics by setting the value of `dev.i915 perf_stream_paranoidsysctl` option to `0`.
65-
>
66-
> The command makes a temporary change that is lost after reboot:
67-
>
68-
> `sudo sysctl -w dev.i915.perf_stream_paranoid=0`
69-
>
70-
> To make the change permanent, enter:
71-
>
72-
> `sudo echo dev.i915.perf_stream_paranoid=0 > /etc/sysctl.d/60-mdapi.conf`
73-
7458
### Using Visual Studio Code* (Optional)
7559

7660
You can use Visual Studio Code (VS Code) extensions to set your environment, create launch configurations, and browse and download samples.
@@ -95,201 +79,52 @@ To learn more about the extensions and how to configure the oneAPI environment,
9579
cmake ..
9680
make
9781
```
98-
3. Run the program for CPU.
99-
```
100-
make run_1_cpu
101-
```
102-
4. Run the program for GPU. (Optional)
103-
```
104-
make run_2_gpu
82+
3. Run the program.
10583
```
106-
5. Run the program for multiple GPUs. (Optional)
107-
```
108-
make run_3_multi_gpu
109-
```
110-
> **Note**: This option is untested, but you should be able to run it in a multi-GPU environment.
111-
84+
make run_array_reduction
85+
```
11286
6. Clean the project. (Optional)
11387
```
11488
make clean
11589
```
11690

117-
## Guided Builds and Offloads
118-
119-
These guided instructions show how to optimize code using the Jacobi Iterative method in the following steps:
120-
121-
1. Start with code that runs on the CPU.
122-
2. Change the code for basic GPU offload.
123-
3. Change the code for the multi-GPU offload. (Not available in this release.)
124-
125-
In each step, the Intel® Advisor analysis tool provides performance analysis for the applications. The Intel® Advisor runtime might take a long time.
126-
127-
> **Important**: The performance results and measurements depend on hardware. Your results may vary from what is shown.
128-
129-
### CPU Offload Modeling
130-
131-
The first step is to run offload modeling on the CPU version to identify portions of the code can benefit from acceleration.
132-
133-
> **Note**: This process may take up to 30 minutes.
134-
135-
1. Run Intel® Advisor to model the CPU code.
136-
```
137-
advisor --collect=offload --config=gen9_gt2 --project-dir=./../advisor/1_cpu -- ./src/1_guided_jacobi_iterative_solver_cpu
138-
```
139-
2. View the results.
140-
```
141-
advisor-gui ../advisor/1_cpu/e000/e000.advixeexp
142-
```
143-
> **Note**: If you are connecting to a remote system with oneAPI tools installed, you might not be able to launch the Intel® Advisor graphical interface. You can transfer the HTML-based report to a local machine to view it.
144-
145-
Based on the output captured from Intel® Advisor, one can see an estimated speed-up if we offload loops identified in the Top Offloaded section of the output. We can get about 9x speedup for one loop and a 4.5x speedup for another. The generation of the matrix can be speedup almost 7x times. In the next step, we will offload those loops to the GPUs.
146-
147-
![offload Modeling results](images/cpu.PNG)
148-
149-
### Basic GPU Offload
150-
151-
The second step is to offload to a GPU. The `2_guided_jacobi_iterative_solver_gpu` uses the basic offload of each for loop into a GPU. For example, the main loop calculating the unknown variables will be calculated by N kernels where N is the number of rows in the matrix. Even this basic offload improves the execution time.
152-
153-
Once the offload code changes, run the roofline analysis to look for performance optimization opportunities.
154-
155-
> **Note**: Multi-GPU programs are not supported in this release, so no output from Intel® Advisor will be available.
156-
157-
1. Run Intel® Advisor again.
158-
```
159-
advisor --collect=roofline --profile-gpu --project-dir=./../advisor/2_gpu -- ./src/2_guided_jacobi_iterative_solver_gpu
160-
```
161-
162-
2. View the results again.
163-
```
164-
advisor-gui ../advisor/2_gpu/e000/e000.advixeexp
165-
```
166-
167-
As we can see in the charts below, the execution time has been sped up as we predicted in all cases besides the main loop. The reason is that we have to wait for each iteration to finish calculations, as the next iteration is dependent on the results we get. This is why 74.6% of the time that the GPU has been in the **Stalled** mode.
168-
169-
![offload Modeling results](images/gpu.PNG)
170-
171-
172-
### Build and Run the Sample in Intel® DevCloud (Optional)
173-
174-
>**Note**: For more information on using Intel® DevCloud, see the Intel® oneAPI [Get Started](https://devcloud.intel.com/oneapi/get_started/) page.
175-
176-
1. Open a terminal on a Linux* system.
177-
2. Log in to the Intel® DevCloud.
178-
```
179-
ssh devcloud
180-
```
181-
3. Change to the sample directory.
182-
4. Configure the sample for the appropriate node.
183-
184-
<details>
185-
<summary>You can specify nodes using a single line script.</summary>
186-
187-
The following example is for a GPU node. (This is a single line script.)
188-
```
189-
qsub -I -l nodes=1:gpu:ppn=2 -d .
190-
```
191-
- `-I` (upper case I) requests an interactive session.
192-
- `-l nodes=1:gpu:ppn=2` (lower case L) assigns one full GPU node.
193-
- `-d .` makes the current folder as the working directory for the task.
194-
195-
|Available Nodes |Command Options
196-
|:--- |:---
197-
|GPU |`qsub -l nodes=1:gpu:ppn=2 -d .`
198-
|CPU |`qsub -l nodes=1:xeon:ppn=2 -d .`
199-
200-
</details>
201-
202-
5. Perform build steps you would on Linux. (Including optionally cleaning the project.)
203-
204-
<details>
205-
<summary>You can submit build and run jobs through a Portable Bash Script (PBS).</summary>
206-
207-
A job is a script that submitted to PBS through the `qsub` utility. By default, the `qsub` utility does not inherit the current environment variables or your current working directory, so you might need to submit jobs to configure the environment variables. To indicate the correct working directory, you can use either absolute paths or pass the `-d \<dir\>` option to `qsub`.
208-
209-
If you choose to use scripts, jobs terminate with writing files to the disk:
210-
- `<script_name>.sh.eXXXX`, which is the job stderr
211-
- `<script_name>.sh.oXXXX`, which is the job stdout
212-
213-
Here XXXX is the job ID, which gets printed to the screen after each qsub command.
214-
215-
You can inspect output of the sample.
216-
```
217-
cat run.sh.oXXXX
218-
```
219-
Once the jobs complete, you can remove the stderr and stdout files.
220-
```
221-
rm run.sh.*
222-
```
223-
</details>
224-
225-
6. Run the sample.
226-
227-
> **Note**: To inspect job progress if you are using a script, use the qstat utility.
228-
> ```
229-
> watch -n 1 qstat -n -1
230-
> ```
231-
> The command displays the results every second. The job is complete when no new results display.
232-
233-
7. Review the output, then exit.
234-
91+
> **Note**: The following table lists all samples and the command to run each one.
93+
>| File Name | Run command
94+
>|:--- |:---
95+
>|`array_reduction.cpp` | make run_array_reduction
96+
>|`bad_free.cpp` | make run_bad_free
97+
>|`device_global.cpp` | make run_device_global
98+
>|`group_local.cpp` | make run_group_local
99+
>|`local_stencil.cpp` | make run_local_stencil
100+
>|`map.cpp` | make run_map
101+
>|`matmul_broadcast.cpp` | make run_matmul_broadcast
102+
>|`misalign-long.cpp` | make run_misalign-long
103+
>|`nd_range_reduction.cpp` | make run_nd_range_reduction
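
To run every sample in one pass, a loop over the run targets listed in the table is one option. This is a minimal sketch and not part of the sample itself: it assumes the project was already configured with `cmake ..` and built with `make` in the current build directory, and that each `run_<name>` target exists as listed. A non-zero exit is recorded rather than stopping the loop, since the example output shows `make` returning an error when the sanitizer reports a problem.

```shell
# Sample names taken from the table of run commands; each maps to a run_<name> target.
samples="array_reduction bad_free device_global group_local local_stencil map matmul_broadcast misalign-long nd_range_reduction"

failed=""
for s in $samples; do
  target="run_${s}"
  echo "== make ${target}"
  # Keep going on failure: a sanitizer report makes the target exit non-zero.
  if ! make "${target}"; then
    failed="${failed} ${target}"
  fi
done

echo "Targets that exited non-zero:${failed}"
```

A target appearing in the final list is not necessarily a build problem; it may simply be a sample whose purpose is to trigger a sanitizer report.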
235104
236105
## Example Output
237106

238-
### CPU Results
239-
240-
The following output is for a **9x9** matrix.
107+
The following output is for the `array_reduction.cpp` sample.
241108
```
242-
Scanning dependencies of target run_cpu
243-
./jacobi_cpu_iterative_solver
244-
Device : Intel(R) Core(TM) i7-10610U CPU @ 1.80GHz
245-
246-
Matrix generated, time elapsed: 0.31112 seconds.
247-
[5122.01 263.22 1.67 626.22 317 -333.22 947.8 -852.83 -808.99 ][277.63]
248-
[277.63 -4634.73 529.95 -657.22 -564.18 601.12 676.36 452.62 314.03 ][740.29]
249-
[740.29 499.7 -4774.05 794.3 -156.33 -237.9 397.63 -160.86 916.96 ][76.47]
250-
[76.47 -693.42 -135.63 4075.95 -287.56 993 936.34 -28.63 86.55 ][833.52]
251-
[833.52 769.69 -400.31 443.35 -5587.83 431.49 453.66 556.13 845.82 ][844.33]
252-
[844.33 549.12 440.36 416.48 -236.84 -4121.63 -460.53 -236.51 -359.54 ][-358.82]
253-
[-358.82 -655.35 -569.26 -982.24 102.27 522.64 -4836.28 -534.2 -552.49 ][545.54]
254-
[545.54 -403.51 249.11 918.1 -575.88 -151.27 159.93 3964.62 -770 ][-216.26]
255-
[-216.26 751.31 267.88 691.34 -161.82 973.27 908.53 -175.76 -4150.42 ][-813.02]
256-
257-
Computations complete, time elapsed: 0.295054 seconds.
258-
Total number of sweeps: 30
259-
Checking results
260-
All values are correct.
261-
262-
Check complete, time elapsed: 0.00481672 seconds.
263-
Total runtime is 0.750036 seconds.
264-
X1 equals: 0.09895873327
265-
X2 equals: -0.15802401129
266-
X3 equals: 0.03854686450
267-
X4 equals: 0.16617806412
268-
X5 equals: -0.12925305745
269-
X6 equals: 0.11829640135
270-
X7 equals: -0.13914309700
271-
X8 equals: -0.09521910620
272-
X9 equals: 0.19864875400
273-
Built target run_cpu
274-
```
275-
276-
### GPU Offload Results
277-
278-
The following output is for a **30000x30000** matrix.
279-
280-
```
281-
Device : Intel(R) Graphics [0x020a]
282-
283-
Matrix generated, time elapsed: 3.58536 seconds.
284-
285-
Computations complete, time elapsed: 3.41483 seconds.
286-
Total number of sweeps: 7
287-
Checking results
288-
All values are correct.
289-
290-
Check complete, time elapsed: 2.61934 seconds.
291-
Total runtime is 13.5157 seconds.
292-
[100%] Built target run_2_gpu
109+
Histogram:
110+
bin[0]: 4
111+
bin[1]: 4
112+
bin[2]: 4
113+
bin[3]: 4
114+
SUCCESS
115+
116+
====ERROR: DeviceSanitizer: bad-free on address 0x250d040
117+
#0 ./array_reduction() [0x405249]
118+
#1 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x735d35229d90]
119+
#2 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x735d35229e40]
120+
#3 ./array_reduction() [0x4049b5]
121+
122+
0x250d040 may be allocated on Host Memory
123+
Segmentation fault (core dumped)
124+
make[3]: *** [src/CMakeFiles/run_array_reduction.dir/build.make:70: src/CMakeFiles/run_array_reduction] Error 139
125+
make[2]: *** [CMakeFiles/Makefile2:392: src/CMakeFiles/run_array_reduction.dir/all] Error 2
126+
make[1]: *** [CMakeFiles/Makefile2:399: src/CMakeFiles/run_array_reduction.dir/rule] Error 2
127+
make: *** [Makefile:254: run_array_reduction] Error 2
293128
```
294129

295130
## License
