@@ -23,10 +23,156 @@ Potential Topics
2323* Function pointers revisited
2424* oneDPL C++ standard library support
2525
26+ 2023-09-19
27+ =============
28+
29+ * Ruyman Reyes (Intel/Codeplay)
30+ * Lukas Sommer (Codeplay Software Ltd)
31+ * Benie (Codeplay Software Ltd)
32+ * Hyesun Hong (Samsung SAIT)
33+ * Julian Oppermann (Codeplay Software Ltd)
34+ * Mehdi Goli (Codeplay Software Ltd)
35+ * Lueck, Gregory (Intel)
36+ * Jesus Labarta (BSC) (Guest)
37+ * Brodman, James (Intel)
38+ * Hanwoong Jung (Samsung SAIT)
39+ * Brice Goglin (Guest)
40+ * Plaska, Oskar (Contractor, Cognizant)
41+ * Tom Deakin (Univ. of Bristol)
42+ * Marcin (N/A)
43+ * Victor Lomuller (Codeplay Software Ltd)
44+ * Biagio COSENZA (Università degli Studi di Salerno)
45+ * Voss, Michael J (Intel)
46+ * Kukanov, Alexey (Intel)
47+ * Richards, Alison L (Intel)
48+ * Adam Kuźniar (Mobica)
49+ * Slavova, Gergana S (Intel)
50+ * bongjun kim (Samsung SAIT)
51+ * Keryell, Ronan (XILINX LABS)
52+ * Juan Fumero (University of Manchester)
53+ * Gordon Brown (Codeplay Software Ltd)
54+ * Tim (N/A)
55+ * Kinsner, Michael (Intel)
56+ * Petersen, Paul (Intel)
57+ * Videau, Brice (ANL)
58+ * Holmes, Daniel John (Intel)
59+ * Frank Brill (Cadence)
60+ * Mrozek, Michal (Intel)
61+ * Reble, Pablo (Intel)
62+ * Andrew Richards (Intel/Codeplay)
63+ * Smith, Timmie (Intel)
64+
65+
66+ SYCL Extension Proposal for PIM/PNM
67+ --------------------------------------
68+
69+ Hyesun Hong,
70+ `Slides <presentation/2023-09-19-HS-sycl-pim-extensions.pdf>`_
71+
72+ * PIM/PNM technology enables computation directly on memory
73+ * Avoids data movement, improving performance and reducing energy consumption
74+ * Operates directly on memory banks by reading and storing on rows and columns
75+ * Aquabolt-XL is the first demonstrator
76+ * Can be dropped in on any memory controller
77+ * CXL-PNM is the CXL variant of PNM and can work with multiple PIM blocks
78+
79+ SYCL Extension for PIM/PNM
80+ * Work in collaboration with Codeplay Software team
81+ * Goals
82+
83+ * Seamlessly integrate PIM/PNM operation into SYCL
84+ * Allow combination of xGPU and PIM/PNM in one device kernel
85+ * Not specific to one hardware
86+
87+ * Design
88+
89+ * Vector operations seem like a natural fit
90+ * No convergence guarantee, and the vector size is explicit
91+
92+ * Model as special function unit
93+
94+ * Aligns with trends to model special functional units inside accelerators
95+ * Compiler automatic mapping often not possible
96+ * joint_matrix-like interface
97+
98+
99+ * Group functions
100+
101+ * Easy to use
102+ * Can easily be combined with device code
103+ * Give necessary convergence guarantees
104+
105+
106+ * Recap of SYCL work-item, work-group and group functions
107+
108+ * Group functions must be encountered in converged control flow
109+
110+ * Extension
111+
112+ * Extended group functions with an additional overload of joint_reduce
113+ * Added new joint_transform and joint_inner_product functions
114+ * Block size is a template parameter, number of blocks is a runtime parameter
115+ * Together these allow calculating the number of elements to process
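
The split between a compile-time block size and a run-time block count can be pictured with a small host-only C++ sketch. This is not the proposed SYCL API: ``blocked_inner_product`` and ``elements_to_process`` are hypothetical names introduced only for illustration, and the real overloads presented in the slides may differ in signature and semantics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of SYCL or the proposal): the element count
// follows directly from the template block size and the run-time block count.
template <std::size_t BlockSize>
constexpr std::size_t elements_to_process(std::size_t num_blocks) {
    return BlockSize * num_blocks;
}

// Hypothetical host-side stand-in for a blocked inner product.
// BlockSize is fixed at compile time, num_blocks is chosen at run time.
template <std::size_t BlockSize, typename T>
T blocked_inner_product(const T* a, const T* b, std::size_t num_blocks, T init) {
    static_assert(BlockSize > 0, "BlockSize must be positive");
    assert(a != nullptr && b != nullptr);
    T result = init;
    for (std::size_t blk = 0; blk < num_blocks; ++blk) {
        // The inner loop has a compile-time trip count, which is what lets a
        // compiler unroll or vectorize it.
        T partial = T{};
        for (std::size_t i = 0; i < BlockSize; ++i) {
            partial += a[blk * BlockSize + i] * b[blk * BlockSize + i];
        }
        result += partial;
    }
    return result;
}
```

With ``BlockSize = 4`` and ``num_blocks = 2`` the sketch processes eight elements; the caller is responsible for making sure both inputs actually hold that many.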
116+
117+ * Extension for PNM
118+
119+ * Added new overloads of joint_exclusive_scan,
120+   joint_inclusive_scan, reduce_over_group
121+
122+ * PNM standalone has less opportunity for parallelism
123+
124+ * limited by memory controller
125+ * -> Combine PNM and PIM, PNM generates commands for PIM blocks
126+
127+ * Two modes
128+
129+ * PIM mode: PIM blocks can operate independently, can choose number of blocks
130+ * PNM mode: Synchronized execution on multiple PIM blocks
131+
132+ * Mapping
133+
134+ * Every PIM block is one work-item
135+ * PNM with attached PIM blocks forms one work-group
136+
137+ * Execution
138+
139+ * Work-item operations map to PIM operation
140+ * Group functions map to PNM operation
141+
142+ * Example
143+
144+ * work-item execution maps to PIM
145+ * group function maps to PNM
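
The mapping and execution bullets above can be illustrated with a host-only C++ model. It is only a sketch under stated assumptions: ``PnmGroupModel`` and its member functions are hypothetical names, the per-block operation (a sum) is chosen arbitrarily, and no real PIM/PNM hardware or SYCL runtime is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical model: one PNM unit with its attached PIM blocks is treated
// as one work-group, and each PIM block as one work-item.
struct PnmGroupModel {
    std::vector<std::vector<int>> pim_blocks;  // one entry per PIM block

    // Work-item step (stands in for a PIM operation): every block works on
    // its own data independently of the others.
    std::vector<int> per_item_sums() const {
        std::vector<int> sums;
        sums.reserve(pim_blocks.size());
        for (const auto& block : pim_blocks) {
            sums.push_back(std::accumulate(block.begin(), block.end(), 0));
        }
        return sums;
    }

    // Group-function step (stands in for a PNM operation): combines the
    // results of all work-items in the group.
    int reduce_over_group_model() const {
        assert(!pim_blocks.empty());
        const std::vector<int> sums = per_item_sums();
        return std::accumulate(sums.begin(), sums.end(), 0);
    }
};
```

In this model the per-item step never looks at another block's data, while the group step needs every item's result, which mirrors why group functions require converged control flow.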
146+
147+ * Conclusion
148+
149+ * Integrate support for PIM/PNM into SYCL
150+
151+ Q&A
152+ * Are the proposed functions specific to PIM, or could they also be used with other hardware?
153+
154+ * Can also be used with other hardware.
155+ * Semantics are not PIM-specific, but rather a translation of C++ to SYCL
156+ * Can also map nicely to other types of hardware, e.g. vector processor
157+
158+ * Why have the user explicitly specify a block-size?
159+
160+ * Not a hardware detail
161+ * Rather a promise by the user that data-blocks
162+ will always be at least that big
163+ * Promise allows device compiler to perform optimizations,
164+ efficient looping inside PIM unit
165+
166+ * Could num_blocks runtime parameter be replaced by iterator?
167+
168+ * Requires the range to be divisible by block-size
169+ * Yes, that is possible, mainly a design question
170+ * Current version might have additional implications regarding alignment
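
The iterator-based alternative raised in the last question could look roughly like the following host-only C++ sketch. ``blocks_in_range`` is a hypothetical helper, not part of the proposal; it only shows how a block count might be derived from an iterator pair and where the divisibility requirement would surface.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <optional>
#include <vector>

// Hypothetical helper: derive the number of blocks from an iterator range.
// Returns std::nullopt when the range length is not a multiple of BlockSize.
template <std::size_t BlockSize, typename It>
std::optional<std::size_t> blocks_in_range(It first, It last) {
    static_assert(BlockSize > 0, "BlockSize must be positive");
    const auto n = std::distance(first, last);
    assert(n >= 0 && "last must not precede first");
    const auto len = static_cast<std::size_t>(n);
    if (len % BlockSize != 0) {
        return std::nullopt;  // divisibility requirement violated
    }
    return len / BlockSize;
}
```

A range of 12 elements yields three blocks for ``BlockSize = 4`` and no value for ``BlockSize = 5``. Whether such a check belongs in the interface or remains a precondition is the design question noted above.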
171+
172+
261732023-06-05
27174==========
28175
29-
30176* Ruyman Reyes
31177* Rod Burns
32178* Cohn, Robert S