|
2 | 2 |
|
3 | 3 | ## Rationale behind caching |
4 | 4 |
|
5 | | -During SYCL program execution SYCL runtime will create internal objects |
| 5 | +During SYCL program execution, SYCL runtime will create internal objects |
6 | 6 | representing kernels and programs, it may also invoke JIT compiler to bring |
7 | 7 | kernels in a program to executable state. Those runtime operations are quite |
8 | 8 | expensive, and in some cases caching approach can be employed to eliminate |
@@ -65,7 +65,7 @@ examples below illustrate scenarios where such optimization is possible. |
65 | 65 | }); |
66 | 66 | ``` |
67 | 67 |
|
68 | | -In both cases SYCL runtime will need to build the program and kernels multiple |
| 68 | +In both cases, SYCL runtime will need to build the program and kernels multiple |
69 | 69 | times, which may involve JIT compilation and take quite a lot of time. |
70 | 70 |
|
71 | 71 | In order to eliminate this waste of run-time we introduce a kernel and program |
@@ -97,42 +97,27 @@ The cache is split into two levels: |
97 | 97 |
|
98 | 98 | ### In-memory cache |
99 | 99 |
|
100 | | -The cache stores underlying PI objects behind `sycl::program` and `sycl::kernel` |
101 | | -user-level objects in a per-context data storage. The storage consists of two |
102 | | -maps: one is for programs and the other is for kernels. |
| 100 | +The cache stores the underlying UR objects behind `sycl::program` and `sycl::kernel` |
| 101 | +user-level objects in a per-context data storage. The storage consists of three |
| 102 | +maps: one is for programs and the other two are for kernels. |
103 | 103 |
|
104 | 104 | The programs map's key consists of four components: |
105 | 105 |
|
106 | | -- kernel set id<sup>[1](#what-is-ksid)</sup>, |
| 106 | +- ID of the device image containing the program, |
107 | 107 | - specialization constants values, |
108 | | -- the device this program is built for, |
109 | | -- build options id <sup>[2](#what-is-bopts)</sup>. |
| 108 | +- the set of devices this program is built for. |
110 | 109 |
|
111 | 110 | The kernels map's key consists of two components: |
112 | 111 |
|
113 | 112 | - the program the kernel belongs to, |
114 | 113 | - kernel name<sup>[3](#what-is-kname)</sup>. |
115 | 114 |
|
116 | | -(what-is-ksid)= |
117 | | -<a name="what-is-ksid">1</a>: Kernel set id is an ordinal number of the device |
118 | | -binary image the kernel is contained in. |
| 115 | +The third map, called Fast Kernel Cache, is used as an optimization to reduce the |
| 116 | +number of lookups in the kernels map. It's key consists of the following components: |
119 | 117 |
|
120 | | -(what-is-bopts)= |
121 | | -<a name="what-is-bopts">2</a>: The concatenation of build options (both compile |
122 | | -and link options) set in application or environment variables. There are three |
123 | | -sources of build options that the cache is aware of: |
124 | | - |
125 | | -- from device image (pi_device_binary_struct::CompileOptions, |
126 | | - pi_device_binary_struct::LinkOptions); |
127 | | -- environment variables (SYCL_PROGRAM_COMPILE_OPTIONS, |
128 | | - SYCL_PROGRAM_LINK_OPTIONS); |
129 | | -- options passed through SYCL API. |
130 | | - |
131 | | -Note: Backend runtimes used by SYCL can have extra environment or configurations |
132 | | -values (e.g. IGC has |
133 | | -[igc_flags.def](https://github.com/intel/intel-graphics-compiler/blob/7f91dd6b9f2ca9c1a8ffddd04fa86461311c4271/IGC/common/igc_flags.def) |
134 | | -which affect JIT process). Changing such configuration will invalidate cache and |
135 | | -manual cache cleanup should be done. |
| 118 | +- specialization constants values, |
| 119 | +- the UR handle of the device this kernel is built for, |
| 120 | +- kernel name<sup>[3](#what-is-kname)</sup>. |
136 | 121 |
|
137 | 122 | (what-is-kname)= |
138 | 123 | <a name="what-is-kname">3</a>: Kernel name is a kernel ID mangled class' name |
@@ -408,10 +393,12 @@ LRU (least recently used) strategy both for in-memory and persistent cache. |
408 | 393 |
|
409 | 394 | #### In-memory cache eviction |
410 | 395 |
|
411 | | -It is initiated on program/kernel maps access/add item operation. When cache |
412 | | -size exceeds storage threshold the items which are least recently used are |
413 | | -deleted. |
414 | | -TODO: add detailed description of in-memory cache eviction mechanism. |
| 396 | +Eviction in in-memory cache is disabled by default but can be controlled by SYCL_IN_MEM_CACHE_EVICTION_THRESHOLD |
| 397 | +environment variable. The threshold is set in bytes and when the cache size exceeds the threshold the eviction process is initiated. The eviction process is based on LRU strategy. The cache is walked through and the least recently used items are deleted until the cache size is below the threshold. |
| 398 | +To implement eviction for in-memory cache efficiently, we store the programs in a linked-list, called eviction list. When the program is first added to the cache, it is also added to the back of the eviction list. When a program is fetched from cache, we move the program to the end of the eviction list. This way, we ensure that the programs at the beginning of the eviction list are always the least recently used. |
| 399 | +When adding a new program to cache, we check if the size of the program cache exceeds the threshold, if so, we iterate through the eviction list starting from the front and delete the programs until the cache size is below the threshold. When a program is deleted from the cache, we also evict its corresponding kernels from the kernel and fast kernel cache. |
| 400 | + |
| 401 | +***When the application run out-of-memory,*** either due to cache eviction being disabled or the cache eviction threshold being too high, we will evict all the items from program and kernel caches. This is done to prevent the application from crashing due to running out of memory. |
415 | 402 |
|
416 | 403 | #### Persistent cache eviction |
417 | 404 |
|
|
0 commit comments