
Commit 20a4f0e

Modifications of Jean-Baptiste
1 parent 075aed1 commit 20a4f0e

File tree: 2 files changed (+45, -37 lines)


main_isc.bib

Lines changed: 12 additions & 0 deletions
@@ -18,6 +18,18 @@ @article{pulsipher2022unifying
 author = {Joshua L. Pulsipher and Weiqi Zhang and Tyler J. Hongisto and Victor M. Zavala},
 }

+@article{Gondosiswanto2025advances,
+title = {Advances to modeling and solving infinite-dimensional optimization problems in InfiniteOpt.jl},
+journal = {Digital Chemical Engineering},
+volume = {15},
+pages = {100236},
+year = {2025},
+issn = {2772-5081},
+doi = {https://doi.org/10.1016/j.dche.2025.100236},
+url = {https://www.sciencedirect.com/science/article/pii/S2772508125000201},
+author = {Evelyn Gondosiswanto and Joshua L. Pulsipher},
+}
+
 @article{shin2021graph,
 title={Graph-based modeling and decomposition of energy infrastructures},
 author={Shin, Sungho and Coffrin, Carleton and Sundar, Kaarthik and Zavala, Victor M},

main_isc.tex

Lines changed: 33 additions & 37 deletions
@@ -183,10 +183,10 @@ \section{Background and limitations}
 Other frameworks, such as ACADO~\cite{houska2011acado} and \texttt{InfiniteOpt.jl}~\cite{pulsipher2022unifying}, which cleverly leverage the modeling power of JuMP~\cite{dunning2017jump}, also follow the same CPU-centric paradigm.
 %
 This CPU focus limits scalability and real-time performance for large or time-critical problems that could benefit from GPU parallelism.
-While some libraries provide GPU-accelerated components, none deliver a fully integrated, GPU-native workflow for nonlinear optimal control.
-See, nonetheless, the nice attempt \cite{jeon2024} trying to combine the CasADi API with PyTorch so as to evaluate part of the generated code on GPU.
+While some libraries provide GPU-accelerated components, very few deliver a fully integrated, GPU-native workflow for nonlinear optimal control.
+See, nonetheless, the notable attempt of \cite{jeon2024} to combine the CasADi API with PyTorch so as to evaluate part of the generated code on the GPU; see also the recent release of \texttt{InfiniteOpt.jl}~\cite{Gondosiswanto2025advances}, which follows a line similar to ours by leveraging the SIMD modeling abilities of \texttt{ExaModels.jl} on the GPU.
 %
-Our work fills this gap with a GPU-first toolchain that unifies modeling, differentiation, and solver execution, addressing the challenges of solving large-scale sparse NLPs.
+Our work contributes to filling this gap with a GPU-first toolchain that unifies modeling, differentiation, and solver execution, addressing the challenges of solving large-scale sparse NLPs. In contrast to other available solvers, our framework encompasses both direct transcription of optimal control problems (that is, discretization into a mathematical program, suited to optimization solvers and GPU evaluation as discussed in the rest of the paper) and indirect methods (\emph{a.k.a.} shooting; we refer the reader to \cite{OC_jl} for further details).

 \section{SIMD parallelism in direct optimal control} \label{s3}
 When discretized by \emph{direct transcription}, optimal control problems (OCPs) possess an inherent structure that naturally supports SIMD parallelism.
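
The added sentence on direct transcription can be made concrete with the scheme used later in this file. A minimal statement of the trapezoidal transcription (the symbols $x_j \approx x(t_j)$, $u_j \approx u(t_j)$ and the uniform step $\Delta t$ are introduced here only for illustration):

\[
  x_{j+1} - x_j \;=\; \frac{\Delta t}{2}\,\bigl( f(x_j, u_j) + f(x_{j+1}, u_{j+1}) \bigr),
  \qquad j = 0, \dots, N - 1 .
\]

Each grid interval thus contributes one algebraic constraint of the same shape, which is the per-interval structure behind the SIMD parallelism discussed in that section.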
@@ -285,7 +285,7 @@ \section{From optimal control models to SIMD abstraction}
 $$ x(0) = (-1, 0),\quad x(1) = (0,0). $$
 The strength of the DSL of the package \texttt{OptimalControl.jl} is to offer a syntax as close as possible to this mathematical formulation.\footnote{Note that one can actually use unicode characters to denote derivatives, integral, \emph{etc.}, making this closeness even more striking. Check \texttt{OptimalControl.jl} documentation online.} The translation of this optimal control problem so reads:

-\begin{minted}{julia}
+{\footnotesize \begin{minted}{julia}
 ocp = @def begin
 t in [0, 1], time
 x in R^2, state
@@ -297,8 +297,9 @@ \section{From optimal control models to SIMD abstraction}
 integral( 0.5u(t)^2 ) => min
 end
 \end{minted}
+}

-The intial and final times are fixed in this case but they could be additional unknowns (see, Appendix \ref{sa1}, where the Goddard benchmark problem is modeled with a free final time. Users can also declare additional finite-dimensional parameters (or \emph{variables}) to be optimized. Furthermore, extra constraints on the state, control, or other quantities can be imposed as needed.
+The initial and final times are fixed in this case, but they could be additional unknowns (see Appendix \ref{sa1}, where the Goddard benchmark problem is modeled with a free final time). Users can also declare additional finite-dimensional parameters (or \emph{variables}) to be optimized. Furthermore, extra constraints on the state, control, or other quantities can be imposed as needed.
 At this stage the crux is to seamlessly parse the abstract problem description and compile it on the fly into a discretized nonlinear optimization problem.
 We achieve this by exploiting two features.
 First, the DSL syntax is fully compatible with standard Julia, allowing us to use the language's built-in lexical and syntactic parsers.
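
Since the hunk ends on the claim that every DSL line is valid Julia, here is a quick standalone check (illustrative only, not the package's actual code path) showing what the built-in parser hands to the @def macro for one line of the example above:

ex = Meta.parse("x in R^2, state")   # one DSL line, taken from the @def block above
ex.head                              # :tuple -- the line parses as a pair (declaration, kind)
ex.args[1]                           # :(x in R ^ 2), itself a :call expression the macro can walk
ex.args[2]                           # :state
dump(ex.args[1])                     # print the full abstract syntax tree of the declaration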
@@ -320,20 +321,17 @@ \section{From optimal control models to SIMD abstraction}
 Let us take a brief look at the generated code for this simple example.
 The code is wrapped in a function whose parameters capture the key aspects of the transcription process: the numerical scheme (here trapezoidal), the grid size (here uniform), the backend (CPU or GPU), the initial values for variables, states, and controls (defaulting to nonzero constants across the grid), and the base precision for vectors (defaulting to 64-bit floating point):

-{\small
-\begin{minted}{julia}
+{\footnotesize \begin{minted}{julia}
 function def(; scheme=:trapeze,
 grid_size=250, init=(0.1, 0.1, 0.1),
 backend=CPU(), base_type=Float64)
 \end{minted}
 }

 \noindent The state declaration is compiled into an \texttt{ExaModels.jl} variable representing a $2 \times (N + 1)$ array, where $N$ is the grid size. Lower and upper bounds, plus initial values can be specified, and constraints are vectorized across grid points. Internally, the DSL uses metaprogramming to generate unique variable names and ensure proper initialization, while any syntactic or semantic errors are caught and reported at runtime.
-% The state declaration is compiled into an \texttt{ExaModels.jl} variable declaration, here a $2 \times (N + 1)$ array where $N$ is the grid size. Note that lower and upper bounds can be provided at this step (with a first \verb+for+ statement to vectorize the constraint over all grid points), as well as initial values for the optimizer and base type for vector elements (here fp64).
-% The whole declaration block is wrapped in a \verb+try ... catch+ statement so that syntactic (or semantic) errors can be returned to the user at runtime:
+The state declaration is compiled into an \texttt{ExaModels.jl} variable declaration, here a $2 \times (N + 1)$ array where $N$ is the grid size. Note that lower and upper bounds can be provided at this step (with a first \verb+for+ statement to vectorize the constraint over all grid points), as well as initial values for the optimizer and base type for vector elements (here fp64). The whole declaration block is wrapped in a \verb+try ... catch+ statement so that syntactic (or semantic) errors can be returned to the user at runtime:

-{\small
-\begin{minted}{julia}
+{\footnotesize \begin{minted}{julia}
 x = begin
 local ex
 try
@@ -346,10 +344,10 @@ \section{From optimal control models to SIMD abstraction}
 for (var"i##275", var"j##276") =
 Base.product(1:2, 0:grid_size)],
 start = init[2])
-catch ex
-println("Line ", 2, ": ",
-"(x in R^2, state)")
-throw(ex)
+catch ex
+println("Line ", 2, ": ",
+"(x in R^2, state)")
+throw(ex)
 end
 end
 \end{minted}
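
The generated block above combines the two mechanisms announced in the uncommented paragraph: gensym-ed index names (rendered as var"i##275") and a try/catch wrapper that echoes the offending DSL line. A toy, self-contained reconstruction of that pattern, with made-up names and values (this is not OptimalControl.jl's internal code):

i, j = gensym(:i), gensym(:j)          # fresh, collision-free loop indices
decl = quote
    try
        # stand-in for the ExaModels variable declaration emitted by the macro
        [0.1 for ($i, $j) in Iterators.product(1:2, 0:250)]
    catch ex
        println("Line ", 2, ": ", "(x in R^2, state)")   # echo the DSL line that failed
        throw(ex)
    end
end
x_start = eval(decl)                   # a 2 x 251 array of initial guesses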
@@ -371,10 +369,9 @@ \section{From optimal control models to SIMD abstraction}
 % \end{minted}
 % }

-\noindent The initial-state boundary constraint must be applied across all state dimensions. This is achieved using the \verb+for+ generator.
-A runtime dimension check ensures that the specified bounds match the length of the state vector being addressed:
-{\small
-\begin{minted}{julia}
+\noindent The initial-state boundary constraint must be applied across all state dimensions. This is achieved using the \verb+for+ generator. A runtime dimension check ensures that the specified bounds match the length of the state vector being addressed:
+
+{\footnotesize \begin{minted}{julia}
 length([-1, 0]) ==
 length([-1, 0]) == length(1:2)
 || throw("wrong bound dimension")
@@ -387,8 +384,7 @@ \section{From optimal control models to SIMD abstraction}

 \noindent The first equation of the ODE system is discretized using the trapezoidal scheme, and the corresponding expression (here the right hand side is just $x_2(t)$) is declared thanks to the \verb+for+ generator tailored for SIMD abstraction:

-{\small
-\begin{minted}{julia}
+{\footnotesize \begin{minted}{julia}
 constraint(var"p_ocp##266j",
 ((x[1, var"j##291" + 1] -
 x[1, var"j##291"]) - (var"dt##268"
@@ -400,8 +396,7 @@ \section{From optimal control models to SIMD abstraction}

 \noindent The same goes on for the second dimension of the ODE, and for the Lagrange integral cost as well, where the same numerical scheme (trapezoidal rule again) is employed for consistency (defining two objectives actually computes their sum):

-{\small
-\begin{minted}{julia}
+{\footnotesize \begin{minted}{julia}
 objective(var"p_ocp##266",
 ((1 * var"dt##268" * (0.5 *
 var"u##277"[1, var"j##299"]^2))/2
@@ -523,10 +518,9 @@ \section{Supplementary material}
 \subsection{Descriptions of the control problems used for the benchmark}
 \label{sa1}
 The complete code to reproduce the runs of Section~\ref{s6} can be retrieved at the following address:
-\href{https://anonymous.4open.science/r/OC-GPU/}{\texttt{https://anonymous.4open.science/r/OC-GPU/}}
+\href{https://anonymous.4open.science/r/OC-GPU/}{\texttt{anonymous.4open.science/r/OC-GPU}}

-{\small
-\begin{minted}{julia}
+{\footnotesize \begin{minted}{julia}
 # Goddard problem

 r0 = 1.0
@@ -565,8 +559,7 @@ \subsection{Descriptions of the control problems used for the benchmark}
 \end{minted}
 }

-{\small
-\begin{minted}{julia}
+{\footnotesize \begin{minted}{julia}
 # Quadrotor problem

 T = 1
@@ -620,13 +613,16 @@ \subsection{Descriptions of the control problems used for the benchmark}
 }

 \subsection{GPU detailed configurations and results} \label{sa2}
-All runs performed with \texttt{OptimalControl.jl v1.1.1},
-\texttt{MadNLPMumps.jl v0.5.1} and
-\texttt{MadNLPGPU.jl v0.7.7}.\\
+All runs performed with
+\begin{itemize}
+\item \texttt{OptimalControl.jl v1.1.1},
+\item \texttt{MadNLPMumps.jl v0.5.1},
+\item \texttt{MadNLPGPU.jl v0.7.7}.
+\end{itemize}

 \noindent \textbf{Configuration for the A100 runs}

-{\small \begin{verbatim}
+{\footnotesize \begin{verbatim}
 julia> CUDA.versioninfo()
 CUDA runtime 12.9, artifact installation
 CUDA driver 12.9
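
One possible way to pin the exact versions listed in the new itemize when reproducing the runs (assuming the registered package names match the names above; these commands are not part of the commit):

import Pkg
Pkg.add(name = "OptimalControl", version = "1.1.1")   # pins the modeling front end
Pkg.add(name = "MadNLPMumps",    version = "0.5.1")   # CPU (MUMPS) linear solver backend
Pkg.add(name = "MadNLPGPU",      version = "0.7.7")   # GPU backend used for the A100/H100 runs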
@@ -657,7 +653,7 @@ \subsection{GPU detailed configurations and results} \label{sa2}

 \noindent \textbf{Configuration for the H100 runs}

-{\small \begin{verbatim}
+{\footnotesize \begin{verbatim}
 julia> CUDA.versioninfo()
 CUDA toolchain:
 - runtime 12.9, artifact installation
@@ -692,7 +688,7 @@ \subsection{GPU detailed configurations and results} \label{sa2}
 \end{verbatim}}

 % Table 1: Goddard (A100)
-\begin{table}[htbp]
+\begin{table}[h!tbp]
 \centering
 \caption{Goddard problem, A100 run}
 \begin{tabular}{@{}rrr@{}}
@@ -717,7 +713,7 @@ \subsection{GPU detailed configurations and results} \label{sa2}
 \end{table}

 % Table 2: Quadrotor (A100)
-\begin{table}[htbp]
+\begin{table}[h!tbp]
 \centering
 \caption{Quadrotor problem, A100 run}
 \begin{tabular}{@{}rrr@{}}
@@ -742,7 +738,7 @@ \subsection{GPU detailed configurations and results} \label{sa2}
 \end{table}

 % Table 3: Goddard (H100)
-\begin{table}[htbp]
+\begin{table}[h!tbp]
 \centering
 \caption{Goddard problem, H100 run}
 \begin{tabular}{@{}rrr@{}}
@@ -769,7 +765,7 @@ \subsection{GPU detailed configurations and results} \label{sa2}
 \end{table}

 % Table 4: Quadrotor (H100)
-\begin{table}[htbp]
+\begin{table}[h!tbp]
 \centering
 \caption{Quadrotor problem, H100 run}
 \begin{tabular}{@{}rrr@{}}
