@@ -19,12 +19,6 @@ All samples can be run on a CPU or a PVC GPU.
 
 The sample includes nine different mini samples that showcase the usage of ASan.
 
-| File Name | Description
-|:--- |:---
-|`array_reduction.cpp` | Demonstrates a basic, serial CPU implementation.
-|`bad_free.cpp` | Demonstrates an initial single-GPU offload using SYCL.
-|`device_global.cpp` | Demonstrates multi-GPU offload using SYCL.
-
 ## Prerequisites
 
 | Optimized for | Description
@@ -49,7 +43,7 @@ The basic SYCL implementation explained in the code includes:
 
 When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the `setvars` script every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development.
 
-## Build the `Jacobi Iterative Solver` Sample
+## Build the `Address Sanitizer` Sample
 
 > **Note**: If you have not already done so, set up your CLI
 > environment by sourcing the `setvars` script in the root of your oneAPI installation.
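The `setvars` step referenced in this hunk could be wrapped as shown below. This is a minimal sketch and not part of the sample: `load_oneapi_env` is a hypothetical helper, and `/opt/intel/oneapi` is assumed as the installation root unless an argument or `ONEAPI_ROOT` says otherwise (a per-user install would live elsewhere).

```shell
# Hypothetical helper (not part of the sample): source setvars.sh from a
# oneAPI root so the compiler and tools are available in the current shell.
# Assumption: /opt/intel/oneapi is the installation root unless overridden.
load_oneapi_env() {
  _oneapi_root="${1:-${ONEAPI_ROOT:-/opt/intel/oneapi}}"
  _setvars="$_oneapi_root/setvars.sh"
  if [ ! -f "$_setvars" ]; then
    echo "setvars.sh not found under $_oneapi_root" >&2
    return 1
  fi
  # Clear this function's positional parameters so the sourced script
  # does not mistake them for its own arguments.
  set --
  # Source (do not execute) so exported variables persist in this shell.
  . "$_setvars"
}
```

Calling `load_oneapi_env` in each new terminal mirrors the advice in the README text; a different root can be passed as the first argument.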
@@ -61,16 +55,6 @@ When working with the command-line interface (CLI), you should configure the one
 >
 > For more information on configuring environment variables, see *[Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html)*.
 
-> **Note**: For GPU Analysis on Linux*, enable collecting GPU hardware metrics by setting the value of the `dev.i915.perf_stream_paranoid` sysctl option to `0`.
->
-> The command makes a temporary change that is lost after reboot:
-> **Note**: If you are connecting to a remote system with oneAPI tools installed, you might not be able to launch the Intel® Advisor graphical interface. You can transfer the HTML-based report to a local machine to view it.
-
-Based on the output captured from Intel® Advisor, one can see an estimated speedup if we offload the loops identified in the Top Offloaded section of the output. We can get about a 9x speedup for one loop and a 4.5x speedup for another. The generation of the matrix can be sped up almost 7x. In the next step, we will offload those loops to the GPUs.
-
-
-
-### Basic GPU Offload
-
-The second step is to offload to a GPU. The `2_guided_jacobi_iterative_solver_gpu` uses a basic offload of each `for` loop to a GPU. For example, the main loop calculating the unknown variables will be calculated by N kernels, where N is the number of rows in the matrix. Even this basic offload improves the execution time.
-
-Once the offload code changes, run the roofline analysis to look for performance optimization opportunities.
-
-> **Note**: Multi-GPU programs are not supported in this release, so no output from Intel® Advisor will be available.
-As we can see in the charts below, the execution time has been sped up as we predicted in all cases besides the main loop. The reason is that we have to wait for each iteration to finish its calculations, as the next iteration depends on the results we get. This is why the GPU spent 74.6% of the time in the **Stalled** mode.
-
-
-
-
-### Build and Run the Sample in Intel® DevCloud (Optional)
-
->**Note**: For more information on using Intel® DevCloud, see the Intel® oneAPI [Get Started](https://devcloud.intel.com/oneapi/get_started/) page.
-
-1. Open a terminal on a Linux* system.
-2. Log in to the Intel® DevCloud.
-```
-ssh devcloud
-```
-3. Change to the sample directory.
-4. Configure the sample for the appropriate node.
-
184
-
<details>
185
-
<summary>You can specify nodes using a single line script.</summary>
186
-
187
-
The following example is for a GPU node. (This is a single line script.)
188
-
```
189
-
qsub -I -l nodes=1:gpu:ppn=2 -d .
190
-
```
191
-
-`-I` (upper case I) requests an interactive session.
192
-
-`-l nodes=1:gpu:ppn=2` (lower case L) assigns one full GPU node.
193
-
-`-d .` makes the current folder as the working directory for the task.
194
-
195
-
|Available Nodes |Command Options
196
-
|:--- |:---
197
-
|GPU |`qsub -l nodes=1:gpu:ppn=2 -d .`
198
-
|CPU |`qsub -l nodes=1:xeon:ppn=2 -d .`
199
-
200
-
</details>
201
-
-5. Perform the build steps you would on Linux (including, optionally, cleaning the project).
-
-<details>
-<summary>You can submit build and run jobs through the Portable Batch System (PBS).</summary>
-
-A job is a script that is submitted to PBS through the `qsub` utility. By default, the `qsub` utility does not inherit the current environment variables or your current working directory, so you might need to submit jobs to configure the environment variables. To indicate the correct working directory, you can use either absolute paths or pass the `-d <dir>` option to `qsub`.
-
-If you choose to use scripts, jobs terminate by writing files to the disk:
-- `<script_name>.sh.eXXXX`, which is the job stderr
-- `<script_name>.sh.oXXXX`, which is the job stdout
-
-Here XXXX is the job ID, which gets printed to the screen after each `qsub` command.
-
-You can inspect the output of the sample.
-```
-cat run.sh.oXXXX
-```
-Once the jobs complete, you can remove the stderr and stdout files.
-```
-rm run.sh.*
-```
-</details>
-
-6. Run the sample.
-
-> **Note**: To inspect job progress if you are using a script, use the `qstat` utility.
-> ```
-> watch -n 1 qstat -n -1
-> ```
-> The command displays the results every second. The job is complete when no new results display.
-
-7. Review the output, then exit.
-
+> **Note**: The following table lists all samples and the commands to run them.
+> Before running them, set up your environment by sourcing the `setvars` script in the root of your oneAPI installation.
+>| File Name | Run command
+>|:--- |:---
+>|`array_reduction.cpp` | `make run_array_reduction`
+>|`bad_free.cpp` | `make run_bad_free`
+>|`device_global.cpp` | `make run_device_global`
+>|`group_local.cpp` | `make run_group_local`
+>|`local_stencil.cpp` | `make run_local_stencil`
+>|`map.cpp` | `make run_map`
+>|`matmul_broadcast.cpp` | `make run_matmul_broadcast`
+>|`misalign-long.cpp` | `make run_misalign-long`
+>|`nd_range_reduction.cpp` | `make run_nd_range_reduction`
 
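Every run target in the table above follows a `run_<file stem>` pattern. That mapping could be sketched as follows; `run_target` and `list_run_commands` are hypothetical helpers introduced only for illustration, and the sketch prints the commands rather than running them.

```shell
# Hypothetical helper: derive the make target from a sample file name,
# following the run_<file stem> pattern seen in the table.
run_target() {
  printf 'run_%s\n' "${1%.cpp}"
}

# Print (rather than execute) the run command for each sample in the table.
list_run_commands() {
  for f in array_reduction.cpp bad_free.cpp device_global.cpp \
           group_local.cpp local_stencil.cpp map.cpp \
           matmul_broadcast.cpp misalign-long.cpp nd_range_reduction.cpp; do
    echo "make $(run_target "$f")"
  done
}
```

Running `list_run_commands` prints one `make` command per sample. Executing those commands assumes they are issued from the directory where the sample was configured and built.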
 ## Example Output
 
-### CPU Results
-
-The following output is for a **9x9** matrix.
+The following output is for the `array_reduction.cpp` sample.
 ```
-Scanning dependencies of target run_cpu
-./jacobi_cpu_iterative_solver
-Device : Intel(R) Core(TM) i7-10610U CPU @ 1.80GHz