@@ -23,10 +23,114 @@ Potential Topics
23 23 * Function pointers revisited
24 24 * oneDPL C++ standard library support
25 25
26+ 2023-09-19
27+ =============
28+
29+ * Ruyman Reyes (Intel/Codeplay)
30+ * Lukas Sommer (Codeplay Software Ltd)
31+ * Benie (Codeplay Software Ltd)
32+ * Hyesun Hong (Samsung SAIT)
33+ * Julian Oppermann (Codeplay Software Ltd)
34+ * Mehdi Goli (Codeplay Software Ltd)
35+ * Lueck, Gregory (Intel)
36+ * Jesus Labarta (BSC) (Guest)
37+ * Brodman, James (Intel)
38+ * Hanwoong Jung (Samsung SAIT)
39+ * Brice Goglin (Guest)
40+ * Plaska, Oskar (Contractor, Cognizant)
41+ * Tom Deakin (Univ. of Bristol)
42+ * Marcin (N/A)
43+ * Victor Lomuller (Codeplay Software Ltd)
44+ * Biagio COSENZA (Università degli Studi di Salerno)
45+ * Voss, Michael J (Intel)
46+ * Kukanov, Alexey (Intel)
47+ * Richards, Alison L (Intel)
48+ * Adam Kuźniar (Mobica)
49+ * Slavova, Gergana S (Intel)
50+ * bongjun kim (Samsung SAIT)
51+ * Keryell, Ronan (XILINX LABS)
52+ * Juan Fumero (University of Manchester)
53+ * Gordon Brown (Codeplay Software Ltd)
54+ * Tim (N/A)
55+ * Kinsner, Michael (Intel)
56+ * Petersen, Paul (Intel)
57+ * Videau, Brice (ANL)
58+ * Holmes, Daniel John (Intel)
59+ * Frank Brill (Cadence)
60+ * Mrozek, Michal (Intel)
61+ * Reble, Pablo (Intel)
62+ * Andrew Richards (Intel/Codeplay)
63+ * Smith, Timmie (Intel)
64+
65+
66+ SYCL Extension Proposal for PIM/PNM
67+ --------------------------------------
68+
69+ Hyesun Hong,
70+ `Slides <presentation/2023-09-19-HS-sycl-pim-extensions.pdf>`_
71+
72+ * PIM/PNM (processing-in-memory / processing-near-memory) technology enables computation directly in or near memory
73+ * Avoids data movement, improving performance and reducing energy consumption
74+ * PIM operates directly on memory banks by reading and writing rows and columns
75+ * Aquabolt-XL is the first demonstrator
76+ * Can be dropped in on any memory controller
77+ * CXL-PNM is the CXL variant of PNM and can work with multiple PIM blocks
78+
79+ SYCL Extension for PIM/PNM
80+ * Goals
81+ * Seamlessly integrate PIM/PNM operation into SYCL
82+ * Allow combination of xGPU and PIM/PNM in one device kernel
83+ * Not tied to one specific piece of hardware
84+ * Design
85+ * Vector operations seem like a natural fit, but they give no convergence guarantee and the vector size is explicit
86+ * Model as a special function unit
87+ * Aligns with the trend to model special function units inside accelerators
88+ * Automatic mapping by the compiler is often not possible
89+ * joint_matrix
90+ * Group functions
91+ * Easy to use
92+ * Can easily be combined with device code
93+ * Give necessary convergence guarantees
94+ * Recap of SYCL work-item, work-group and group functions
95+ * Group functions must be encountered in converged control flow
96+ * Extension
97+ * Extends the group functions with an additional overload of joint_reduce and with new joint_transform and joint_inner_product functions
98+ * Block size is a template parameter and the number of blocks is a runtime parameter, which allows the number of elements to process to be calculated
99+ * Extension for PNM
100+ * Added new overloads of joint_exclusive_scan, joint_inclusive_scan, reduce_over_group
101+ * Standalone PNM has less opportunity for parallelism and is also limited by the memory controller
102+ * Therefore combine PNM and PIM: PNM generates commands for the PIM blocks
103+ * Two modes
104+ * PIM mode: PIM blocks can operate independently, can choose number of blocks
105+ * PNM mode: Synchronized execution on multiple PIM blocks
106+ * Mapping
107+ * Every PIM block is one work-item
108+ * PNM with attached PIM blocks forms one work-group
109+ * Execution
110+ * Work-item operations map to PIM operations
111+ * Group functions map to PNM operations
112+ * Example
113+ * work-item execution maps to PIM
114+ * group function maps to PNM
115+ * Conclusion
116+ * Integrate support for PIM/PNM into SYCL
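The block-size and number-of-blocks parameters noted under "Extension" can be illustrated with a minimal host-side sketch in plain C++. It is not SYCL code and does not reproduce the proposed signatures, which these notes do not record. The name `blocked_transform` and its parameter list are hypothetical and only show how a compile-time block size and a runtime block count together determine the number of elements to process.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a block-wise transform.
// This is NOT the signature proposed in the extension.
template <std::size_t BlockSize, typename T, typename UnaryOp>
std::size_t blocked_transform(const T* in, T* out, std::size_t num_blocks,
                              UnaryOp op) {
    static_assert(BlockSize > 0, "block size must be positive");
    // The amount of work follows directly from the two parameters.
    const std::size_t num_elements = BlockSize * num_blocks;
    for (std::size_t b = 0; b < num_blocks; ++b) {
        // The inner trip count is a compile-time constant.
        for (std::size_t i = 0; i < BlockSize; ++i) {
            const std::size_t idx = b * BlockSize + i;
            out[idx] = op(in[idx]);
        }
    }
    return num_elements;
}
```

Calling `blocked_transform<4>(in, out, 2, op)` processes 4 * 2 = 8 elements; the caller is responsible for both buffers holding at least that many.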
117+
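The mapping described above (each PIM block acts as a work-item, a PNM with its attached PIM blocks forms a work-group) can be modelled on the host as a rough sketch. The types `PimBlock` and `PnmGroup` and their member functions are hypothetical illustrations in plain C++; they are not part of SYCL or of the proposed extension.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical model: one PIM block plays the role of one work-item.
struct PimBlock {
    std::vector<int> data;  // data resident in this block

    // A per-work-item operation: touches only this block's data.
    int local_sum() const {
        return std::accumulate(data.begin(), data.end(), 0);
    }
};

// Hypothetical model: a PNM with its attached PIM blocks is one work-group.
struct PnmGroup {
    std::vector<PimBlock> blocks;

    // A group-function-like operation: combines results from all blocks.
    int reduce_over_blocks() const {
        int total = 0;
        for (const PimBlock& block : blocks) {
            total += block.local_sum();
        }
        return total;
    }
};
```

In this model `local_sum` stands for work that a single PIM block does independently, while `reduce_over_blocks` stands for the synchronized step that a group function would express.
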
118+ Q&A
119+ * Are the proposed functions specific to PIM, or could they also be used with other hardware?
120+ * Can also be used with other hardware. Semantics are not PIM-specific, but rather a translation of C++ to SYCL
121+ * Can also map nicely to other types of hardware, for example vector processors
122+ * Why have the user explicitly specify a block size?
123+ * Not a hardware detail
124+ * Rather a promise by the user that data blocks will always be at least that big
125+ * The promise allows the device compiler to perform optimizations, such as efficient looping inside the PIM unit
126+ * Could the num_blocks runtime parameter be replaced by an iterator, requiring the range to be divisible by the block size?
127+ * Yes, that is possible; it is mainly a design question
128+ * Current version might have additional implications regarding alignment
129+
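The iterator-based alternative raised in the last question could look roughly like the following host-side sketch in plain C++. The helper `block_count` is hypothetical and is not taken from the proposal; it only shows the divisibility requirement that such an interface would place on the range.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <vector>

// Hypothetical helper: derive the number of blocks from an iterator range
// instead of taking num_blocks as a separate runtime parameter.
template <std::size_t BlockSize, typename It>
std::size_t block_count(It first, It last) {
    static_assert(BlockSize > 0, "block size must be positive");
    const auto n = static_cast<std::size_t>(std::distance(first, last));
    // The range length must be a whole number of blocks.
    if (n % BlockSize != 0) {
        throw std::invalid_argument(
            "range length must be a multiple of the block size");
    }
    return n / BlockSize;
}
```

A range of 12 elements with a block size of 4 yields 3 blocks, while a range of 10 elements is rejected.
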
130+
26 131 2023-06-05
27 132 ==========
28 133
29-
30 134 * Ruyman Reyes
31 135 * Rod Burns
32 136 * Cohn, Robert S