|
131 | 131 | "Next, use the SYCLomatic Tool (c2s) to migrate the code; it will store the result in the migration folder `dpct_output`:\n",
|
132 | 132 | "\n",
|
133 | 133 | "```\n",
|
134 |
| - "c2s -p compile_commands.json --in-root ../../.. --gen-helper-function\n", |
| 134 | + "c2s -p compile_commands.json --in-root ../../.. --gen-helper-function --use-experimental-features=logical-group\n", |
135 | 135 | "```\n",
|
136 | 136 | "\n",
|
137 | 137 | "The `--gen-helper-function` option will copy the SYCLomatic helper header files to the output directory.\n",
|
138 | 138 | "\n",
|
| 139 | + "The `--use-experimental-features=logical-group` option is needed since this CUDA example uses CUDA cooperative groups, which SYCLomatic currently migrates using an experimental feature.\n", |
| 140 | + "\n", |
139 | 141 | "The `--in-root` option specifies the path containing all the common include files for the CUDA project.\n",
|
140 | 142 | "\n",
|
141 | 143 | "This command should migrate the CUDA source to the C++ SYCL source in a folder named `dpct_output` by default, and the folder will have the C++ SYCL source along with any dependencies from the `Common` folder,\n",
|
|
147 | 149 | },
|
148 | 150 | {
|
149 | 151 | "cell_type": "markdown",
|
150 |
| - "id": "5122ddf2-6bcb-4ec7-8daf-18273d48d8a6", |
| 152 | + "id": "f394c88b-9fb6-4160-b320-1e7491fa7139", |
151 | 153 | "metadata": {},
|
152 | 154 | "source": [
|
153 | 155 | "## Analyze, Compile and Run the migrated SYCL source\n",
|
|
171 | 173 | "\n",
|
172 | 174 | "To compile the migrated SYCL code we can use the following command:\n",
|
173 | 175 | "```\n",
|
174 |
| - "icpx -fsycl -fsycl-targets=intel_gpu_pvc -I ../../../Common -I ../../../include *.cpp -pthread\n", |
| 176 | + "icpx -fsycl -fsycl-targets=intel_gpu_pvc -I ../../../Common -I ../../../include *.cpp\n", |
175 | 177 | "```\n",
|
176 | 178 | "\n",
|
177 | 179 | "There may be compile errors if some of the CUDA code was not migrated to SYCL. The migrated code may also include comments with warning messages, which can make it easier to fix the errors. These errors have to be fixed manually to get the code to compile.\n",
|
178 | 180 | "\n",
|
179 |
| - "\n", |
| 181 | + "#### Build and Run\n", |
| 182 | + "Select the cell below and click run ▶ to compile and execute the code (expect to see errors):" |
| 183 | + ] |
| 184 | + }, |
| 185 | + { |
| 186 | + "cell_type": "code", |
| 187 | + "execution_count": null, |
| 188 | + "id": "51e1df07-401e-4842-95c8-0e753bd41df5", |
| 189 | + "metadata": {}, |
| 190 | + "outputs": [], |
| 191 | + "source": [ |
| 192 | + "! ./q.sh run_dpct_output.sh" |
| 193 | + ] |
| 194 | + }, |
| 195 | + { |
| 196 | + "cell_type": "markdown", |
| 197 | + "id": "8a58a333-ceee-4bea-ba9c-f0982b49449c", |
| 198 | + "metadata": {}, |
| 199 | + "source": [ |
180 | 200 | "### Fixing unmigrated SYCL code\n",
|
181 | 201 | "\n",
|
182 | 202 | "The manual migration of CUDA Graph API calls to SYCL can be done using two separate approaches,\n",
|
|
243 | 263 | "\n",
|
244 | 264 | "For more information on memory operations, refer [here](https://github.com/taskflow/taskflow/blob/master/taskflow/sycl/syclflow.hpp).\r\n",
|
245 | 265 | "\n",
|
246 |
| - "##### Option 2 (using SYCL Graph):\n", |
| 266 | + "##### Option 2 (using SYCL Graph):\n", |
247 | 267 | "Similar to the memcpy node, the memset operation can also be included as a node through the command graph `add` method.\n",
|
248 | 268 | "\n",
|
249 | 269 | "```\n",
|
|
258 | 278 | " nodeDependencies.size(), &kernelNodeParams);\n",
|
259 | 279 | "```\n",
|
260 | 280 | "\n",
|
261 |
| - "##### Option 1 (using Taskflow): \n", |
| 281 | + "##### Option 1 (using Taskflow): \n", |
262 | 282 | "tf::syclFlow::on creates a task to launch the given command group function object, and tf::syclFlow::parallel_for creates a kernel task from a parallel_for method through the handler object associated with a command group. The SYCL runtime schedules command group function objects from an out-of-order queue and constructs a task graph based on submitted events.\n",
|
263 | 283 | "\n",
|
264 | 284 | "```\n",
|
|
272 | 292 | " }).name(\"reduce_kernel\");\n",
|
273 | 293 | "```\n",
|
274 | 294 | "\n",
|
275 |
| - "##### Option 2 (using SYCL Graph):\n", |
| 295 | + "##### Option 2 (using SYCL Graph):\n", |
276 | 296 | "Kernel operations are also included as nodes through the command graph `add` method. These commands are captured into the graph and executed asynchronously when the graph is submitted to a queue. The `property::node::depends_on` property can be passed here with a list of nodes to create dependency edges.\n",
|
277 | 297 | "\n",
|
278 | 298 | "```\n",
|
|
296 | 316 | "cudaGraphAddHostNode(&hostNode, graph, nodeDependencies.data(), nodeDependencies.size(), &hostParams);\n",
|
297 | 317 | "```\n",
|
298 | 318 | "\n",
|
299 |
| - "##### Option 1 (using Taskflow): \n", |
| 319 | + "##### Option 1 (using Taskflow): \n", |
300 | 320 | "tf::syclFlow doesn’t have a host method to run a callable on the host. Instead, because Taskflow supports dynamic tasking, we can achieve this by creating a subflow graph that runs the callable on the host.\n",
|
301 | 321 | "\n",
|
302 | 322 | "```\n",
|
|
313 | 333 | "cudaGraphGetNodes(graph, nodes, &numNodes);\n",
|
314 | 334 | "```\n",
|
315 | 335 | "\n",
|
316 |
| - "##### Option 1 (using Taskflow): \n", |
| 336 | + "##### Option 1 (using Taskflow): \n", |
317 | 337 | "CUDA graph nodes are equivalent to SYCL tasks; both the tf::Taskflow and tf::syclFlow classes include a num_tasks() function to query the total number of tasks.\n",
|
318 | 338 | "\n",
|
319 | 339 | "```\n",
|
|
335 | 355 | "\n",
|
336 | 356 | "The inputVec_h2d and outputVec_memset tasks run in parallel, followed by the reduce_kernel task.\n",
|
337 | 357 | "\n",
|
338 |
| - "##### Option 2 (using SYCL Graph):\n", |
| 358 | + "##### Option 2 (using SYCL Graph):\n", |
339 | 359 | "After all the operations are added as nodes, the graph is finalized using `finalize()`; this prevents further nodes from being added and creates an executable graph that can be submitted for execution.\n",
|
340 | 360 | "```\n",
|
341 | 361 | "auto exec_graph = graph.finalize();\r\n",
|
|
348 | 368 | "cudaGraphClone(&clonedGraph, graph);\n",
|
349 | 369 | "```\n",
|
350 | 370 | "\n",
|
351 |
| - "##### Option 1 (using Taskflow): \n", |
| 371 | + "##### Option 1 (using Taskflow): \n", |
352 | 372 | "In SYCL, no clone function is available, as Taskflow graph objects are move-only. We can use the std::move() function as shown below to achieve equivalent functionality.\n",
|
353 | 373 | "\n",
|
354 | 374 | "```\n",
|
|
362 | 382 | "for (int i = 0; i < GRAPH_LAUNCH_ITERATIONS; i++) {\r\n",
|
363 | 383 | " cudaGraphLaunch(graphExec, streamForGraph); }\r\n",
|
364 | 384 | "```\n",
|
365 |
| - "##### Option 1 (using Taskflow): \n", |
| 385 | + "##### Option 1 (using Taskflow): \n", |
366 | 386 | "A Taskflow graph can be run once or multiple times using an executor; run_n() runs the taskflow the number of times specified by its second argument.\n",
|
367 | 387 | "\n",
|
368 | 388 | "```\n",
|
369 | 389 | "exe.run_n(tflow, GRAPH_LAUNCH_ITERATIONS).wait();\n",
|
370 | 390 | "```\n",
|
371 | 391 | "\n",
|
372 |
| - "##### Option 2 (using SYCL Graph):\n", |
| 392 | + "##### Option 2 (using SYCL Graph):\n", |
373 | 393 | "The graph is submitted in its entirety for execution via `handler::ext_oneapi_graph(graph)`.\n",
|
374 | 394 | "\n",
|
375 | 395 | "```\n",
|
|
385 | 405 | "cudaGraphDestroy(graph);\n",
|
386 | 406 | "```\n",
|
387 | 407 | "\n",
|
388 |
| - "##### Option 1 (using Taskflow): \n", |
| 408 | + "##### Option 1 (using Taskflow): \n", |
389 | 409 | "The tf::Executor and tf::Taskflow classes have default destructors, so the objects created are cleaned up automatically when they go out of scope.\n",
|
390 | 410 | "\n",
|
391 | 411 | "```\n",
|
|