[![](https://img.shields.io/badge/Developed_by-PSOR_Lab-342674)](https://psor.uconn.edu/) | [![Build Status](https://github.com/PSORLab/SourceCodeMcCormick.jl/workflows/CI/badge.svg?branch=master)](https://github.com/PSORLab/SourceCodeMcCormick.jl/actions?query=workflow%3ACI) [![codecov](https://codecov.io/gh/PSORLab/SourceCodeMcCormick.jl/branch/master/graph/badge.svg)](https://codecov.io/gh/PSORLab/SourceCodeMcCormick.jl) |

This package uses source-code generation to create CUDA kernels that return evaluations of McCormick-based
relaxations. Expressions composed of `Symbolics.jl`-type variables can be passed into the main `SourceCodeMcCormick.jl`
(`SCMC`) function, `kgen`, after which the expressions are factored, algebraic rearrangements and other
modifications are applied as necessary, and a new, custom function is written that calculates inclusion
monotonic interval extensions, convex and concave relaxations, and subgradients of the convex and concave
relaxations of the original expression. These new custom functions are meant to be used in, e.g., a branch-and-bound
routine where the optimizer would like to calculate relaxations for many nodes simultaneously using GPU
resources. They are designed to be used with `CUDA.CuArray{Float64}` objects, with 64-bit (double-precision)
numbers recommended to maintain accuracy.


## Basic Functionality

The primary user-facing function is `kgen` ("kernel generator"). The `kgen` function returns a source-code-generated
function that provides evaluations of convex and concave relaxations, inclusion monotonic interval extensions,
convex relaxation subgradients, and concave relaxation subgradients for an input symbolic expression. Inputs
to the newly generated function are expected to be an output placeholder, followed by a separate `CuArray` for
each input variable, sorted in alphabetical order. The "input" variables must be `(m,3)`-dimensional `CuArray`s,
with the columns corresponding to points where evaluations are requested, lower bounds, and upper bounds,
respectively. The "output" placeholder must be `(m,4+2n)`-dimensional, where `n` is the dimensionality of the
expression. The columns will hold, respectively, the convex and concave relaxations, lower and upper bounds of
an interval extension, a subgradient of the convex relaxation, and a subgradient of the concave relaxation.
Each row `m` in the inputs and output is independent of the others, allowing calculation of relaxation information
at multiple points on (potentially) different domains. A demonstration of how to use `kgen` is shown here,
with the output compared to the multiple-dispatch-based `McCormick.jl`:

```julia
using SourceCodeMcCormick, Symbolics, CUDA
Symbolics.@variables x, y
expr = exp(x/y) - (x*y^2)/(y+1)
new_func = kgen(expr)
x_in = CuArray([1.0 0.5 3.0;])
y_in = CuArray([0.7 0.1 2.0;])
OUT = CUDA.zeros(Float64, 1, 8)

using McCormick
xMC = MC{2,NS}(1.0, Interval(0.5, 3.0), 1)
yMC = MC{2,NS}(0.7, Interval(0.1, 2.0), 2)


julia> new_func(OUT, x_in, y_in) # SourceCodeMcCormick.jl
CUDA.HostKernel for f_3zSdMo7LFdg_1(CuDeviceMatrix{Float64, 1}, CuDeviceMatrix{Float64, 1}, CuDeviceMatrix{Float64, 1})

julia> Array(OUT)
1×8 Matrix{Float64}:
 0.228368  2.96348e12  -9.62507  1.06865e13  -2.32491  -3.62947  3.59209e12  -8.98023e11

julia> exp(xMC/yMC) - (xMC*yMC^2)/(yMC+1) # McCormick.jl
MC{2, NS}(0.22836802303235793, 2.963476144457207e12, [-9.62507, 1.06865e+13], [-2.3249068975747305, -3.6294726270892967], [3.592092296310309e12, -8.980230740778097e11], false)
```

In this example, the symbolic expression `exp(x/y) - (x*y^2)/(y+1)` is passed into `kgen`, which returns the
function `new_func`. Passing inputs representing the values and domain where calculations are desired for `x`
and `y` (i.e., evaluating `x` at `1.0` on the domain `[0.5, 3.0]` and `y` at `0.7` on the domain `[0.1, 2.0]`)
into `new_func` returns pointwise evaluations of {cv, cc, lo, hi, [cvgrad]..., [ccgrad]...} for the original
expression `exp(x/y) - (x*y^2)/(y+1)` on the specified domain of `x` and `y`.

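To make the output layout concrete, a host-side copy of `OUT` can be unpacked column by column. The helper below is purely illustrative (it is not part of `SCMC`'s API); it only assumes the column ordering described above:

```julia
# Illustrative unpacking of one row of a host-side copy of OUT for an
# expression with n variables. Columns: cv, cc, lo, hi, then n columns
# of the convex subgradient and n columns of the concave subgradient.
function unpack_row(OUT_host::Matrix{Float64}, row::Int, n::Int)
    (cv     = OUT_host[row, 1],
     cc     = OUT_host[row, 2],
     lo     = OUT_host[row, 3],
     hi     = OUT_host[row, 4],
     cvgrad = OUT_host[row, 5:4+n],
     ccgrad = OUT_host[row, 5+n:4+2n])
end
```

For the two-variable example above, `unpack_row(Array(OUT), 1, 2)` returns the same six pieces of information that appear in the `MC{2,NS}` object printed by `McCormick.jl`.
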
Evaluating large numbers of points simultaneously using a GPU allows for faster relaxation calculations than
evaluating points individually using a CPU. This is demonstrated in the following example:

```julia
using SourceCodeMcCormick, Symbolics, CUDA, McCormick, BenchmarkTools
Symbolics.@variables x, y
expr = exp(x/y) - (x*y^2)/(y+1)
new_func = kgen(expr)

# CuArray values for SCMC timing
x_in = hcat(2.5 .* CUDA.rand(Float64, 8192) .+ 0.5,
            0.5 .* CUDA.ones(Float64, 8192),
            3.0 .* CUDA.ones(Float64, 8192))
y_in = hcat(1.9 .* CUDA.rand(Float64, 8192) .+ 0.1,
            0.1 .* CUDA.ones(Float64, 8192),
            2.0 .* CUDA.ones(Float64, 8192))
OUT = CUDA.zeros(Float64, 8192, 8)

# McCormick objects for McCormick.jl timing
xMC = MC{2,NS}(1.0, Interval(0.5, 3.0), 1)
yMC = MC{2,NS}(0.7, Interval(0.1, 2.0), 2)

########################################################
##### Timing
# CPU: Intel i7-9850H
# GPU: NVIDIA Quadro T2000


##### McCormick.jl (single evaluation on CPU)
@btime exp($xMC/$yMC) - ($xMC*$yMC^2)/($yMC+1);
# 236.941 ns (0 allocations: 0 bytes)


##### SourceCodeMcCormick.jl (8192 simultaneous evaluations on GPU)
@btime new_func($OUT, $x_in, $y_in);
# 75.700 μs (18 allocations: 384 bytes)

# Average time per evaluation = 75.700 μs / 8192 = 9.241 ns


# Time to move results back to CPU (if needed):
@btime Array($OUT);
# 107.200 μs (8 allocations: 512.16 KiB)

# Time including evaluation:
# 75.700 μs + 107.200 μs = 182.900 μs
# Average time per evaluation = 182.900 μs / 8192 = 22.327 ns
```

As shown by the two previous examples, functions generated by `kgen` can be significantly faster than
the same calculations done by `McCormick.jl`, and they provide the same information for a given input
expression. There is a considerable time penalty for bringing the results from device memory (GPU) back
to host memory (CPU) due to the volume of information generated. If possible, an algorithm designed
to make use of `SCMC`-generated functions should utilize calculated relaxation information within the
GPU rather than transfer the information between hardware components.

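For instance, a node-selection step can run almost entirely on the device. The sketch below assumes `OUT` was filled by a `kgen`-generated function as above; `select_node` is a hypothetical helper, not part of `SCMC`:

```julia
# Hypothetical helper: find the row (node) with the smallest interval
# lower bound (column 3 of OUT). For a CuArray, `findmin` should run as
# a GPU reduction, so only two scalars cross back to the host.
select_node(OUT) = findmin(view(OUT, :, 3))
```

The same code works on a plain `Matrix{Float64}`, which makes the logic easy to check on the CPU before moving it to the GPU.
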

## Arguments for `kgen`

The `kgen` function can be called with only a `Num`-type expression as an input (as in the examples in the
previous section), or with several possible arguments that affect `kgen`'s behavior. Some of these
functionalities and their use cases are shown in this section.

### 1) Overwrite

Functions and kernels created by `kgen` are human-readable and are (currently) saved in the
"src\kernel_writer\storage" folder with a title created by hashing the input expression and
list of relevant variables. By saving the files in this way, calling `kgen` multiple times
with the same expression in the same Julia session can skip the steps of re-writing and
re-compiling the same expression. If the same expression is used across Julia sessions,
`kgen` can skip the re-writing step and instead compile the existing function and kernels.

In some cases, this behavior may not be desired. For example, if there is an error or interrupt
during the initial writing of the kernel, or if the source code of `kgen` is being adjusted
to modify how kernels are written, it is preferable to re-write the kernel from scratch
rather than rely on the existing generated code. For these cases, the setting `overwrite=true`
may be used to tell `kgen` to re-write and re-compile the generated code. For example:

```julia
using SourceCodeMcCormick, Symbolics, CUDA
Symbolics.@variables x, y
expr = exp(x/y) - (x*y^2)/(y+1)

# First call with an entirely new expression
@time new_func = kgen(expr);
# 3.958299 seconds (3.71 M allocations: 162.940 MiB, 13.60% compilation time)

# Second call in the same Julia session
@time new_func = kgen(expr);
# 0.000310 seconds (252 allocations: 10.289 KiB)

# Third call, with a hint for `kgen` to re-write and re-compile
@time new_func = kgen(expr, overwrite=true);
# 3.933316 seconds (3.70 M allocations: 162.619 MiB, 13.10% compilation time)
```


### 2) Problem Variables

When using `SCMC` to calculate relaxations for expressions that are components of a larger optimization
problem (such as individual constraints), it is necessary to provide `kgen` with information about all of the
problem variables. For example, if the objective function in a problem is `x + y + z`, and one constraint
is `y^2 - z <= 0`, the subgradients of the relaxations for the constraint should be 3-dimensional despite
the constraint only depending on `y` and `z`. Further, the dimensions must match across expressions, such
that the first subgradient dimension always corresponds to `x`, the second always corresponds to `y`, and
so on. This is critically important for the primary use case of `SCMC` and is built in to work without
the need for a keyword argument. For example:

```julia
using SourceCodeMcCormick, Symbolics, CUDA
Symbolics.@variables x, y, z
obj = x + y + z
constr1 = y^2 - z

# A valid way to call `kgen`, which is incorrect in this case because the problem
# variables are {x, y, z}, but `constr1` only depends on {y, z}.
constr1_func_A = kgen(constr1)

# The more correct way to call `kgen` in this case, to inform it that the problem
# variables are {x, y, z}.
constr1_func_B = kgen(constr1, [x, y, z])
```

In this example, `constr1_func_A` assumes that the only variables are `y` and `z`, and it therefore expects
the `OUT` storage to be of size `(m,8)` (i.e., each subgradient is 2-dimensional). Because `kgen` is being
used for one constraint as part of an optimization problem, it is more correct to use `constr1_func_B`,
which expects `OUT` to be of size `(m,10)` (i.e., each subgradient is 3-dimensional). Note that this
is directly analogous to the typical use of `McCormick.jl`, where `MC` objects are constructed with the
larger problem in mind. E.g.:

```julia
using McCormick

xMC = MC{3,NS}(1.0, Interval(0.0, 2.0), 1)
yMC = MC{3,NS}(1.5, Interval(1.0, 3.0), 2)
zMC = MC{3,NS}(1.8, Interval(1.0, 4.0), 3)

obj = xMC + yMC + zMC
constr1 = yMC^2 - zMC
```

Here, for the user to calculate `constr1`, they must have already created the `MC` objects `yMC` and `zMC`.
In the construction of these objects, it was specified that the subgradient would be 3-dimensional (the
`3` in `MC{3,NS}`), and `yMC` and `zMC` were selected as the second and third dimensions of the subgradient
by the `2` and `3` that follow the call to `Interval()`. `SCMC` needs the same information; only the
format is different from `McCormick.jl`.

Note that if a list of problem variables is not provided, `SCMC` will collect the variables in the
expression it's given and sort them alphabetically, then numerically (i.e., `x1 < x2 < x10 < y1`).
If a variable list is provided, `SCMC` will respect the order of variables in the list without sorting
them. For example, if `constr1_func_B` were created by calling `kgen(constr1, [z, y, x])`, then
`constr1_func_B` would expect its input arguments to correspond to `(OUTPUT, z, y, x)` instead of
the alphabetical `(OUTPUT, x, y, z)`.

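That default ordering can be sketched as follows; `scmc_order` is a hypothetical re-implementation for illustration only, since `SCMC` performs this sort internally:

```julia
# Sort variable names alphabetically by their letter part, then
# numerically by their numeric suffix, so that "x2" precedes "x10"
# (a plain lexicographic string sort would put "x10" first).
function scmc_order(names::Vector{String})
    key(s) = begin
        m = match(r"^([A-Za-z]+)(\d*)$", s)
        m === nothing && return (s, -1)
        (m.captures[1], isempty(m.captures[2]) ? -1 : parse(Int, m.captures[2]))
    end
    sort(names; by = key)
end
```
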

### 3) Splitting

Internally, `kgen` may split the input expression into multiple subexpressions if the generated kernels
would be too long. In general, longer kernels are faster but require longer compilation times. The default
settings attempt to strike a balance between good performance and acceptably low compilation times, but
situations may arise where a different balance is preferred. The amount of splitting can be modified by
setting the keyword argument `splitting` to one of `{:low, :default, :high, :max}`, where higher values
indicate the (internal) creation of more, shorter kernels (faster compilation, slower performance), and lower
values indicate the (internal) creation of fewer, longer kernels (slower compilation, faster performance).
Actual compilation time and performance, as well as the impact of different `splitting` settings, are
expression-dependent. In most cases, however, the default value should be sufficient.

Here's an example showing how different values affect compilation time and performance:
```julia
using SourceCodeMcCormick, Symbolics, CUDA
Symbolics.@variables x, y
expr = exp(x/y) - (x*y^2)/(y+1)

# CuArray values for SCMC timing
x_in = hcat(2.5 .* CUDA.rand(Float64, 8192) .+ 0.5,
            0.5 .* CUDA.ones(Float64, 8192),
            3.0 .* CUDA.ones(Float64, 8192))
y_in = hcat(1.9 .* CUDA.rand(Float64, 8192) .+ 0.1,
            0.1 .* CUDA.ones(Float64, 8192),
            2.0 .* CUDA.ones(Float64, 8192))
OUT = CUDA.zeros(Float64, 8192, 8)

##################################
# Default splitting (creates 1 kernel internally, based on the size of this expr)
@time new_func = kgen(expr, overwrite=true);
# 4.100798 seconds (3.70 M allocations: 162.637 MiB, 1.47% gc time, 12.29% compilation time)

@btime new_func($OUT, $x_in, $y_in);
# 77.000 μs (18 allocations: 384 bytes)

# --> This provides a good balance of compilation time and fast performance

##################################
# Higher splitting (creates 2 kernels internally)
@time new_func = kgen(expr, splitting=:high, overwrite=true);
# 3.452594 seconds (3.87 M allocations: 170.845 MiB, 13.87% compilation time)

@btime new_func($OUT, $x_in, $y_in);
# 117.000 μs (44 allocations: 960 bytes)

# --> Splitting the function into 2 shorter kernels yields a marginally faster
#     compilation time, but performance is marginally worse due to needing twice
#     as many calls to kernels

##################################
# Maximum splitting (creates 4 kernels internally)
@time new_func = kgen(expr, splitting=:max, overwrite=true);
# 5.903894 seconds (5.88 M allocations: 260.864 MiB, 12.91% compilation time)

@btime new_func($OUT, $x_in, $y_in);
# 184.900 μs (83 allocations: 1.91 KiB)

# --> Given the simplicity of the expression, splitting into 4 kernels does not
#     provide benefit over the 2 kernels created with :high. Compilation time
#     is longer because twice as many kernels need to be compiled (and each is
#     not much less complicated than in the :high case), and performance is worse
#     (because the number of kernel calls is doubled relative to the :high case
#     without much decrease in complexity).
```

## The ParBB Algorithm

The intended use of `SCMC` is in conjunction with a branch-and-bound algorithm that is able
to make use of relaxation and subgradient information calculated in parallel on a
GPU. An implementation of such an algorithm is available for `SCMC` version 0.4, and one is
under development for version 0.5. See the paper referenced in the section "Citing SourceCodeMcCormick" for
a more complete description of the ParBB algorithm for version 0.4. Briefly, ParBB is built
as an extension of the EAGO solver, and it works by parallelizing the node-processing routines
in such a way that tasks may be performed by a GPU.

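A heavily simplified sketch of one such batched node-processing step is shown below; the names are hypothetical, and this is not EAGO's or ParBB's actual API:

```julia
# One batched bounding step: fill OUT for a whole batch of nodes with a
# single call to a kgen-generated function (`relax!` stands in for it),
# then fathom every node whose interval lower bound (column 3) already
# exceeds the incumbent upper bound `ubd`. With CuArrays, only the
# lower-bound column needs to be copied back to the host.
function fathom_batch!(OUT, relax!, x_in, y_in, ubd)
    relax!(OUT, x_in, y_in)
    lo = Array(view(OUT, :, 3))
    findall(<(ubd), lo)          # indices of nodes worth exploring further
end
```
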

## Citing SourceCodeMcCormick

Please cite the following paper when using SourceCodeMcCormick.jl. In plain text form this is:
```
Gottlieb, R. X., Xu, P., and Stuber, M. D. Automatic source code generation for deterministic
global optimization with parallel architectures. Optimization Methods and Software, 1–39 (2024).
DOI: 10.1080/10556788.2024.2396297
```
A BibTeX entry is given below:
```bibtex
@Article{doi:10.1080/10556788.2024.2396297,
  author    = {Robert X. Gottlieb and Pengfei Xu and Matthew D. Stuber},
  journal   = {Optimization Methods and Software},
  title     = {Automatic source code generation for deterministic global optimization with parallel architectures},
  year      = {2024},
  pages     = {1--39},
  doi       = {10.1080/10556788.2024.2396297},
  eprint    = {https://doi.org/10.1080/10556788.2024.2396297},
  publisher = {Taylor \& Francis},
  url       = {https://doi.org/10.1080/10556788.2024.2396297},
}
```


# README for v0.4 (Deprecated)

This package uses source-code transformation to construct McCormick-based relaxations. Expressions composed
of `Symbolics.jl`-type variables can be passed into `SourceCodeMcCormick.jl` (`SCMC`) functions, after which
the expressions are factored, generalized McCormick relaxation rules and inclusion monotonic interval
[...] numbers are recommended
for relaxations to maintain accuracy.


## Basic Functionality (v0.4)

The primary user-facing function is `fgen` ("function generator"). The `fgen` function returns a source-code-generated
function that provides evaluations of convex and concave relaxations, inclusion monotonic interval extensions,
[...] SourceCodeMcCormick functionality that creates these functions which will even f[...]
calculation speed.


## Arguments for `fgen` (v0.4)

The `fgen` function can be called with only a `Num`-type expression as an input (as in the examples in the
previous section), or with many other possible arguments that affect `fgen`'s behavior. Some of these
[...] these outputs does not impact the speed of the generated functions, since the ir[...]
aren't used in any calculations.


## The ParBB Algorithm (v0.4)

The intended use of SourceCodeMcCormick is in conjunction with a branch-and-bound algorithm
that is able to make use of relaxations and subgradient information that are calculated in
[...] that ParBB is able to make use of the faster computing hardware of GPUs, and eve[...]
processed, can converge more quickly overall.


## Limitations (v0.4)

(See "Citing SourceCodeMcCormick" for limitations on a pre-subgradient version of SourceCodeMcCormick.)
[...] more frequently with respect to the bounds on its domain may perform worse than
where the structure of the relaxation is more consistent.


## Citing SourceCodeMcCormick (v0.3)

Please cite the following paper when using SourceCodeMcCormick.jl. In plain text form this is:
```
Gottlieb, R. X., Xu, P., and Stuber, M. D. Automatic source code generation for deterministic
global optimization with parallel architectures. Optimization Methods and Software, 1–39 (2024).
DOI: 10.1080/10556788.2024.2396297
```