This repository was archived by the owner on Feb 5, 2024. It is now read-only.

Commit 4870bb1

Author: Ruyman Reyes Castro
Meeting notes and slides for 19 September 2023
Samsung SAIT presentation about SYCL PIM language extensions
1 parent e6fa907 commit 4870bb1

2 files changed

+147
-1
lines changed

language/README.rst

Lines changed: 147 additions & 1 deletion
@@ -23,10 +23,156 @@ Potential Topics
* Function pointers revisited
* oneDPL C++ standard library support

2023-09-19
==========

* Ruyman Reyes (Intel/Codeplay)
* Lukas Sommer (Codeplay Software Ltd)
* Benie (Codeplay Software Ltd)
* Hyesun Hong (Samsung SAIT)
* Julian Oppermann (Codeplay Software Ltd)
* Mehdi Goli (Codeplay Software Ltd)
* Lueck, Gregory (Intel)
* Jesus Labarta (BSC) (Guest)
* Brodman, James (Intel)
* Hanwoong Jung (Samsung SAIT)
* Brice Goglin (Guest)
* Plaska, Oskar (Contractor, Cognizant)
* Tom Deakin (Univ. of Bristol)
* Marcin (N/A)
* Victor Lomuller (Codeplay Software Ltd)
* Biagio COSENZA (Università degli Studi di Salerno)
* Voss, Michael J (Intel)
* Kukanov, Alexey (Intel)
* Richards, Alison L (Intel)
* Adam Kuźniar (Mobica)
* Slavova, Gergana S (Intel)
* Bongjun Kim (Samsung SAIT)
* Keryell, Ronan (XILINX LABS)
* Juan Fumero (University of Manchester)
* Gordon Brown (Codeplay Software Ltd)
* Tim (N/A)
* Kinsner, Michael (Intel)
* Petersen, Paul (Intel)
* Videau, Brice (ANL)
* Holmes, Daniel John (Intel)
* Frank Brill (Cadence)
* Mrozek, Michal (Intel)
* Reble, Pablo (Intel)
* Andrew Richards (Intel/Codeplay)
* Smith, Timmie (Intel)

SYCL Extension Proposal for PIM/PNM
-----------------------------------

Hyesun Hong,
`Slides <presentation/2023-09-19-HS-sycl-pim-extensions.pdf>`_

* PIM/PNM (processing-in-memory / processing-near-memory) technology enables computation directly in memory
* Avoids data movement, improving performance and reducing energy consumption
* Operates directly on memory banks by reading and storing rows and columns
* Aquabolt-XL is the first demonstrator
* Can be dropped in on any memory controller
* CXL-PNM is the CXL variant of PNM and can work with multiple PIM blocks

SYCL Extension for PIM/PNM

* Work in collaboration with the Codeplay Software team
* Goals

  * Seamlessly integrate PIM/PNM operations into SYCL
  * Allow combining xGPU and PIM/PNM in one device kernel
  * Not specific to one piece of hardware

* Design

  * Vector operations seem like a natural fit

    * No convergence guarantee, and the vector size is explicit

  * Model as a special function unit

    * Aligns with the trend to model special functional units inside accelerators
    * Automatic mapping by the compiler is often not possible
    * joint_matrix-like interface

  * Group functions

    * Easy to use
    * Can easily be combined with device code
    * Give the necessary convergence guarantees

* Recap of SYCL work-item, work-group and group functions

  * Group functions must be encountered in converged control flow

* Extension

  * Extended group functions with an additional overload of joint_reduce, plus new joint_transform and joint_inner_product
  * Block size is a template parameter, number of blocks is a runtime parameter
  * Together these allow calculating the number of elements to process

* Extension for PNM

  * Added new overloads of joint_exclusive_scan, joint_inclusive_scan and reduce_over_group

* Standalone PNM has less opportunity for parallelism

  * Limited by the memory controller
  * Therefore combine PNM and PIM: the PNM generates commands for the PIM blocks

* Two modes

  * PIM mode: PIM blocks can operate independently; the number of blocks can be chosen
  * PNM mode: synchronized execution on multiple PIM blocks

* Mapping

  * Every PIM block is one work-item
  * A PNM with its attached PIM blocks forms one work-group

* Execution

  * Work-item operations map to PIM operations
  * Group functions map to PNM operations

* Example

  * Work-item execution maps to PIM
  * Group function maps to PNM

* Conclusion

  * Integrate support for PIM/PNM into SYCL

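As a rough illustration of the mapping and of the block-size / number-of-blocks parameters described above, the following plain C++ sketch models the behaviour on the host. It is not the proposed SYCL API: ``pim_block_reduce``, ``joint_reduce_model``, ``element_count`` and ``example_sum`` are hypothetical names chosen for this sketch, and the sequential loops only stand in for what the PIM blocks and the PNM would do in hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical host-side model; none of these names come from the proposal.

// Number of elements processed: compile-time block size times runtime block count.
template <std::size_t BlockSize>
constexpr std::size_t element_count(std::size_t num_blocks) {
  return BlockSize * num_blocks;
}

// "Work-item" level: one modeled PIM block reduces its own BlockSize elements.
template <std::size_t BlockSize, typename T, typename BinaryOp>
T pim_block_reduce(const T* block, BinaryOp op) {
  static_assert(BlockSize > 0, "block size must be positive");
  T acc = block[0];
  // BlockSize is a compile-time constant, so this loop has a fixed trip count.
  for (std::size_t i = 1; i < BlockSize; ++i) acc = op(acc, block[i]);
  return acc;
}

// "Work-group" level: the modeled PNM combines the partial results of its PIM blocks.
template <std::size_t BlockSize, typename T, typename BinaryOp>
T joint_reduce_model(const T* data, std::size_t num_blocks, T init, BinaryOp op) {
  assert(data != nullptr || num_blocks == 0);
  T result = init;
  for (std::size_t b = 0; b < num_blocks; ++b) {
    const T partial = pim_block_reduce<BlockSize>(data + b * BlockSize, op);
    result = op(result, partial);
  }
  return result;
}

// Usage sketch: 2 blocks of 4 elements each, holding 1..8, summed to 36.
inline int example_sum() {
  std::vector<int> data(element_count<4>(2));
  std::iota(data.begin(), data.end(), 1);
  return joint_reduce_model<4>(data.data(), 2, 0, std::plus<int>{});
}
```

The per-block partial results play the role of work-item operations, while the combining loop stands in for the group function. A real implementation would run the blocks in parallel rather than in a loop.
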
Q&A

* Are the proposed functions specific to PIM, or could they also be used with other hardware?

  * Can also be used with other hardware.
  * The semantics are not PIM-specific but a translation of C++ to SYCL
  * Can also map nicely to other types of hardware, e.g. vector processors

* Why have the user explicitly specify a block size?

  * Not a hardware detail
  * Rather a promise by the user that data blocks
    will always be at least that big
  * The promise allows the device compiler to perform optimizations,
    such as efficient looping inside the PIM unit

* Could the num_blocks runtime parameter be replaced by an iterator?

  * Requires the size to be divisible by the block size
  * Yes, that is possible; mainly a design question
  * The current version might have additional implications regarding alignment

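One way to picture the iterator-based alternative raised in the last question is a helper that derives the number of blocks from a range and rejects ranges whose length is not a multiple of the block size. This is a host-side sketch under assumptions: ``blocks_in_range`` and ``example_divisible`` are hypothetical names, not part of the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <optional>
#include <vector>

// Hypothetical helper: derive the block count from an iterator range.
// Returns std::nullopt when the range length is not a multiple of BlockSize.
template <std::size_t BlockSize, typename It>
std::optional<std::size_t> blocks_in_range(It first, It last) {
  static_assert(BlockSize > 0, "block size must be positive");
  const auto n = std::distance(first, last);
  assert(n >= 0 && "range must not be reversed");
  const auto len = static_cast<std::size_t>(n);
  if (len % BlockSize != 0) return std::nullopt;
  return len / BlockSize;
}

// Usage sketch: 12 elements with a block size of 4 give exactly 3 blocks.
inline bool example_divisible() {
  std::vector<float> data(12);
  return blocks_in_range<4>(data.begin(), data.end()) ==
         std::optional<std::size_t>{3};
}
```

Returning an empty optional is only one possible design. An actual interface could instead make a non-divisible range a precondition violation.
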
2023-06-05
==========

* Ruyman Reyes
* Rod Burns
* Cohn, Robert S
1.57 MB
Binary file not shown.
