|
| 1 | +# `Restartable Kernel` Sample |
| 2 | + |
| 3 | +This tutorial demonstrates how to make a restartable kernel. The technique shown in this tutorial lets you dynamically terminate your kernel while it runs, allowing it to load a new set of kernel arguments. |
| 4 | + |
| 5 | +| Optimized for | Description | |
| 6 | +| :------------------ | :------------------------------------------------------------------------------------------ | |
| 7 | +| OS | Ubuntu* 20.04 <br> RHEL*/CentOS* 8 <br> SUSE* 15 <br> Windows* 10 <br> Windows Server* 2019 | |
| 8 | +| Hardware | Intel® Agilex® 7, Agilex® 5, Arria® 10, Stratix® 10, and Cyclone® V FPGAs | |
| 9 | +| Software | Intel® oneAPI DPC++/C++ Compiler | |
| 10 | +| What you will learn | Best practices for creating and managing a oneAPI FPGA project | |
| 11 | +| Time to complete | 10 minutes | |
| 12 | + |
| 13 | +> **Note**: Even though the Intel DPC++/C++ oneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. |
| 14 | +> |
| 15 | +> To use the simulator flow, Intel® Quartus® Prime Pro Edition (or Standard Edition when targeting Cyclone® V) and one of the following simulators must be installed and accessible through your PATH: |
| 16 | +> |
| 17 | +> - Questa\*-Intel® FPGA Edition |
| 18 | +> - Questa\*-Intel® FPGA Starter Edition |
| 19 | +> - ModelSim® SE |
| 20 | +> |
| 21 | +> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition (or Standard Edition when targeting Cyclone® V) must be installed and accessible through your PATH. |
| 22 | +> |
| 23 | +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. |
| 24 | +
|
| 25 | +> **Note**: In oneAPI full systems, kernels that use SYCL Unified Shared Memory (USM) host allocations or USM shared allocations (and therefore the code in this tutorial) are only supported by Board Support Packages (BSPs) with USM support. Kernels that use these types of allocations can always be used to generate standalone IPs. |
| 26 | +
|
| 27 | +## Prerequisites |
| 28 | + |
| 29 | +This sample is part of the FPGA code samples. |
| 30 | +It is categorized as a Tier 3 sample that demonstrates a design pattern. |
| 31 | + |
| 32 | +```mermaid |
| 33 | +flowchart LR |
| 34 | + tier1("Tier 1: Get Started") |
| 35 | + tier2("Tier 2: Explore the Fundamentals") |
| 36 | + tier3("Tier 3: Explore the Advanced Techniques") |
| 37 | + tier4("Tier 4: Explore the Reference Designs") |
| 38 | +
|
| 39 | + tier1 --> tier2 --> tier3 --> tier4 |
| 40 | +
|
| 41 | + style tier1 fill:#f96,stroke:#0071c1,stroke-width:1px,color:#fff |
| 42 | + style tier2 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff |
| 43 | + style tier3 fill:#0071c1,stroke:#333,stroke-width:1px,color:#fff |
| 44 | + style tier4 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff |
| 45 | +``` |
| 46 | + |
| 47 | +Find more information about how to navigate this part of the code samples in the [FPGA top-level README.md](/DirectProgramming/C++SYCL_FPGA/README.md). |
| 48 | +You can also find more information about [troubleshooting build errors](/DirectProgramming/C++SYCL_FPGA/README.md#troubleshooting), [running the sample on the Intel® DevCloud](/DirectProgramming/C++SYCL_FPGA/README.md#build-and-run-the-samples-on-intel-devcloud-optional), [using Visual Studio Code with the code samples](/DirectProgramming/C++SYCL_FPGA/README.md#use-visual-studio-code-vs-code-optional), [links to selected documentation](/DirectProgramming/C++SYCL_FPGA/README.md#documentation), etc. |
| 49 | + |
| 50 | +## Purpose |
| 51 | + |
| 52 | +This tutorial demonstrates how to add a `stop` register to allow a host application to kill (or reset) your kernel at any point. This design pattern is useful in applications where you want your kernel to run for some indefinite number of iterations that can't be communicated ahead of time. For example, consider a situation where you want your kernel to periodically re-launch with new kernel arguments when something happens that only the host is aware of, such as an input device disconnecting, or some amount of time passing. |
| 53 | + |
| 54 | +## Key Implementation Details |
| 55 | + |
| 56 | +The key to implementing this behavior is to create a `while()` loop that terminates when a 'stop' signal is seen on a pipe interface. Pipe interfaces (unlike kernel arguments) can be read multiple times during a kernel's execution, so you can use them to send messages to your kernel while it executes. The `while()` loop continues iterating until the host application (or even a different kernel) writes a `true` into the `StopPipe`. We use **non-blocking** pipe operations to guarantee that the kernel checks *all* of its pipe interfaces every clock cycle. It is important to use non-blocking pipe reads and writes, because blocking pipe operations may take some time to respond. If the kernel is blocking on a different pipe operation, it will not respond to a write to the `StopPipe` interface. |
| 57 | + |
| 58 | +```c++ |
| 59 | +[[intel::initiation_interval(1)]] // NO-FORMAT: Attribute |
| 60 | +while (keep_going) { |
| 61 | + // Use non-blocking operations to ensure that the kernel can check all its |
| 62 | + // pipe interfaces every clock cycle, even if one or more data interfaces |
| 63 | + // are stalling (asserting valid = 0) or back-pressuring (asserting ready |
| 64 | + // = 0). |
| 65 | + bool did_write = false; |
| 66 | + PipePayloadType beat = <...>; |
| 67 | + OutputPipe::write(beat, did_write); |
| 68 | + |
| 69 | + // Only adjust the state of the kernel if the pipe write succeeded. |
| 70 | + // This is logically equivalent to blocking. |
| 71 | + if (did_write) { |
| 72 | + counter++; |
| 73 | + } |
| 74 | + |
| 75 | + // Use non-blocking operations to ensure that the kernel can check all its |
| 76 | + // pipe interfaces every clock cycle. |
| 77 | + bool did_read_keep_going = false; |
| 78 | + bool stop_result = StopPipe::read(did_read_keep_going); |
| 79 | + if (did_read_keep_going) { |
| 80 | + keep_going = !stop_result; |
| 81 | + } |
| 82 | +} |
| 83 | +``` |
| 84 | + |
| 85 | +In this sample, `StopPipe` has been assigned the `protocol::avalon_mm_uses_ready` property so it terminates in the kernel's control/status register (CSR) instead of in a streaming interface. Terminating in the CSR allows this kernel to be managed by a memory-mapped host (such as a Nios® V softcore processor), while terminating in a streaming interface is convenient if this kernel were to be managed by another SYCL kernel. For details about the `protocol::avalon_mm_uses_ready` property, see the [CSR Pipes](/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/hls_flow_interfaces/component_interfaces_comparison/csr-pipes) sub-sample within the [Component Interfaces Comparison](/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/hls_flow_interfaces/component_interfaces_comparison) code sample. |
| 86 | + |
| 87 | + |
| 88 | + |
| 89 | +The testbench in `main.cpp` exercises the kernel in the following steps: |
| 90 | + |
| 91 | +1. Initialize the counter kernel with an initial value of 7. |
| 92 | +2. Read a sequence of 256 outputs from the kernel, which should be a monotonically growing sequence starting at 7. |
| 93 | +3. Read 256 more outputs from the kernel, which should be a monotonically growing sequence starting at 263. |
| 94 | +4. Stop the kernel. |
| 95 | +5. Initialize the kernel with a new initialization value of 77. |
| 96 | +6. ead 256 more outputs from the kernel, which should be a monotonically growing sequence starting at 77. |
| 97 | + |
| 98 | +## Building the `restartable_kernel` Tutorial |
| 99 | + |
| 100 | +> **Note**: When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. |
| 101 | +> Set up your CLI environment by sourcing the `setvars` script located in the root of your oneAPI installation every time you open a new terminal window. |
| 102 | +> This practice ensures that your compiler, libraries, and tools are ready for development. |
| 103 | +> |
| 104 | +> Linux\*: |
| 105 | +> |
| 106 | +> - For system wide installations: `. /opt/intel/oneapi/setvars.sh` |
| 107 | +> - For private installations: ` . ~/intel/oneapi/setvars.sh` |
| 108 | +> - For non-POSIX shells, like csh, use the following command: `bash -c 'source <install-dir>/setvars.sh ; exec csh'` |
| 109 | +> |
| 110 | +> Windows\*: |
| 111 | +> |
| 112 | +> - `C:\"Program Files (x86)"\Intel\oneAPI\setvars.bat` |
| 113 | +> - Windows PowerShell\*, use the following command: `cmd.exe "/K" '"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" && powershell'` |
| 114 | +> |
| 115 | +> For more information on configuring environment variables, see [Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html) or [Use the setvars Script with Windows\*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-windows.html). |
| 116 | +
|
| 117 | +Use these commands to run the design, depending on your OS. |
| 118 | + |
| 119 | +### On a Linux\* System |
| 120 | + |
| 121 | +This design uses CMake to generate a build script for GNU/make. |
| 122 | + |
| 123 | +1. Change to the sample directory. |
| 124 | + |
| 125 | +2. Configure the build system for the Agilex® 7 device family, which is the default. |
| 126 | + |
| 127 | + ``` |
| 128 | + mkdir build |
| 129 | + cd build |
| 130 | + cmake .. |
| 131 | + ``` |
| 132 | + |
| 133 | + > **Note**: You can change the default target by using the command: |
| 134 | + > |
| 135 | + > ``` |
| 136 | + > cmake .. -DFPGA_DEVICE=<FPGA device family or FPGA part number> |
| 137 | + > ``` |
| 138 | +
|
| 139 | +3. Compile the design. (The provided targets match the recommended development flow.) |
| 140 | +
|
| 141 | + 1. Compile for emulation (fast compile time, targets emulates an FPGA device). |
| 142 | + ``` |
| 143 | + make fpga_emu |
| 144 | + ``` |
| 145 | + 2. Generate the HTML optimization reports. |
| 146 | + ``` |
| 147 | + make report |
| 148 | + ``` |
| 149 | + 3. Compile for simulation (fast compile time, targets simulator FPGA device). |
| 150 | + ``` |
| 151 | + make fpga_sim |
| 152 | + ``` |
| 153 | + 4. Compile with Quartus place and route (To get accurate area estimate, longer compile time). |
| 154 | + ``` |
| 155 | + make fpga |
| 156 | + ``` |
| 157 | +
|
| 158 | +### On a Windows\* System |
| 159 | +
|
| 160 | +This design uses CMake to generate a build script for `nmake`. |
| 161 | +
|
| 162 | +1. Change to the sample directory. |
| 163 | +
|
| 164 | +2. Configure the build system for the Agilex® 7 device family, which is the default. |
| 165 | +
|
| 166 | + ``` |
| 167 | + mkdir build |
| 168 | + cd build |
| 169 | + cmake -G "NMake Makefiles" .. |
| 170 | + ``` |
| 171 | +
|
| 172 | + You can create a debuggable binary by setting `CMAKE_BUILD_TYPE` to `Debug`: |
| 173 | +
|
| 174 | + ``` |
| 175 | + mkdir build |
| 176 | + cd build |
| 177 | + cmake -G "NMake Makefiles" .. -DCMAKE_BUILD_TYPE=Debug |
| 178 | + ``` |
| 179 | +
|
| 180 | + If you want to use the `report`, `fpga_sim`, or `fpga` flows, you should switch the `CMAKE_BUILD_TYPE` back to `Release``: |
| 181 | +
|
| 182 | + ``` |
| 183 | + cmake -G "NMake Makefiles" .. -DCMAKE_BUILD_TYPE=Release |
| 184 | + ``` |
| 185 | +
|
| 186 | + > **Note**: You can change the default target by using the command: |
| 187 | + > |
| 188 | + > ``` |
| 189 | + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=<FPGA device family or FPGA part number> |
| 190 | + > ``` |
| 191 | +
|
| 192 | +
|
| 193 | +
|
| 194 | +3. Compile the design. (The provided targets match the recommended development flow.) |
| 195 | +
|
| 196 | + 1. Compile for emulation (fast compile time, targets emulated FPGA device). |
| 197 | + ``` |
| 198 | + nmake fpga_emu |
| 199 | + ``` |
| 200 | + 2. Generate the optimization report. |
| 201 | + ``` |
| 202 | + nmake report |
| 203 | + ``` |
| 204 | + 3. Compile for simulation (fast compile time, targets simulator FPGA device). |
| 205 | + ``` |
| 206 | + nmake fpga_sim |
| 207 | + ``` |
| 208 | + 4. Compile with Quartus place and route (To get accurate area estimate, longer compile time). |
| 209 | + ``` |
| 210 | + nmake fpga |
| 211 | + ``` |
| 212 | +
|
| 213 | + > **Note**: If you encounter any issues with long paths when compiling under Windows\*, you may have to create your 'build' directory in a shorter path, for example `C:\samples\build`. You can then run cmake from that directory, and provide cmake with the full path to your sample directory, for example: |
| 214 | + > |
| 215 | + > ``` |
| 216 | + > C:\samples\build> cmake -G "NMake Makefiles" C:\long\path\to\code\sample\CMakeLists.txt |
| 217 | + > ``` |
| 218 | +
|
| 219 | +## Run the `restartable_kernel` Executable |
| 220 | +
|
| 221 | +### On Linux |
| 222 | +
|
| 223 | +1. Run the sample on the FPGA emulator (the kernel executes on the CPU). |
| 224 | + ``` |
| 225 | + ./restartable.fpga_emu |
| 226 | + ``` |
| 227 | +2. Run the sample on the FPGA simulator device. |
| 228 | + ``` |
| 229 | + CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./restartable.fpga_sim |
| 230 | + ``` |
| 231 | +
|
| 232 | +### On Windows |
| 233 | +
|
| 234 | +1. Run the sample on the FPGA emulator (the kernel executes on the CPU). |
| 235 | + ``` |
| 236 | + restartable.fpga_emu.exe |
| 237 | + ``` |
| 238 | +2. Run the sample on the FPGA simulator device. |
| 239 | + ``` |
| 240 | + set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 |
| 241 | + restartable.fpga_sim.exe |
| 242 | + set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= |
| 243 | + ``` |
| 244 | +
|
| 245 | +## Example Output |
| 246 | +
|
| 247 | +``` |
| 248 | +Running on device: Intel(R) FPGA Emulation Device |
| 249 | + |
| 250 | +Start kernel RestartableCounter at 7. |
| 251 | +Flush pipe until 'start of packet' is seen. |
| 252 | + Flushed 0 beats. |
| 253 | +Start counting from 7 |
| 254 | +Start counting from 263 |
| 255 | +Stop kernel RestartableCounter |
| 256 | + |
| 257 | +Start RestartableCounter at 77. |
| 258 | +Flush pipe until 'start of packet' is seen. |
| 259 | + Flushed 239107 beats. |
| 260 | +Start counting from 77 |
| 261 | +Stop kernel RestartableCounter |
| 262 | +PASSED |
| 263 | +``` |
| 264 | +
|
| 265 | +## License |
| 266 | +
|
| 267 | +Code samples are licensed under the MIT license. See |
| 268 | +[License.txt](/License.txt) for details. |
| 269 | +
|
| 270 | +Third party program Licenses can be found here: [third-party-programs.txt](/third-party-programs.txt). |
0 commit comments