main.tex (7 additions, 5 deletions)
@@ -182,7 +182,7 @@ \section{Introduction}
Problems are solved on NVIDIA GPUs using the interior-point solver \texttt{MadNLP.jl}~\cite{shin2021graph} and the sparse linear solver \texttt{CUDSS.jl}~\cite{Montoison_CUDSS_jl_Julia_interface}, enabling end-to-end acceleration from modeling to solving.
-We demonstrate the performance of this approach on benchmark problems solved on GPUs such as the NVIDIA GH200.
+We demonstrate the performance of this approach on benchmark problems solved on GPUs such as the NVIDIA H100 and GH200.
%\textcolor{red}{We also examine generalizations to hybrid systems characterized by discrete-continuous interactions, Pontryagin-based shooting transcriptions, and infinite-horizon or functional programs modeled with \texttt{InfiniteOpt.jl}~\cite{pulsipher2022unifying}. $\rightarrow$ J-B}
@@ -215,7 +215,7 @@ \section{Background and limitations}
Optimal control problems (OCPs) aim to find control inputs for dynamical systems modeled by ODEs that optimize a given performance criterion.
Direct transcription methods discretize these infinite-dimensional problems into large-scale nonlinear programs (NLPs).
These NLPs exhibit a sparse structure arising from time discretization: each node introduces state and control variables linked by nonlinear equality constraints enforcing the system dynamics.
-Second-order methods, such as interior-point solvers, exploit this structure for efficient problem solution.
+Second-order methods, such as interior-point solvers, exploit this structure. % for efficient problem solution.
Most existing optimal control toolchains target CPU execution.
For example, CasADi~\cite{Andersson2019} constructs symbolic expressions evaluated just-in-time or exported as C code, typically solved by CPU solvers like IPOPT~\cite{wachter2006implementation} or KNITRO~\cite{byrd2006k}, which rely on CPU linear solvers such as PARDISO~\cite{schenk2004solving}, MUMPS~\cite{amestoy2000mumps}, or HSL~\cite{fowkes2024libhsl}.
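The block-banded sparsity mentioned above can be made concrete with a minimal, hypothetical sketch (not code from the paper or any of the cited libraries): an explicit-Euler transcription of a double integrator, where each dynamics constraint $X_{i+1} - X_i - h f(X_i, U_i) = 0$ couples only $X_i$, $X_{i+1}$, and $U_i$. All names (`N`, `nx`, `nu`, `h`) are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical example: direct transcription of a double integrator
# with explicit Euler. N steps, nx states, nu controls, step size h.
N, nx, nu, h = 5, 2, 1, 0.1

# Decision vector z = (X_0, ..., X_N, U_0, ..., U_{N-1}).
n_vars = (N + 1) * nx + N * nu

# Each dynamics constraint X_{i+1} - X_i - h f(X_i, U_i) = 0 touches
# only X_i, X_{i+1}, and U_i, so the constraint Jacobian is block-banded.
rows, cols = [], []
for i in range(N):
    for r in range(nx):
        row = i * nx + r
        for c in range(nx):  # derivatives w.r.t. X_i and X_{i+1}
            rows += [row, row]
            cols += [i * nx + c, (i + 1) * nx + c]
        for c in range(nu):  # derivatives w.r.t. U_i
            rows.append(row)
            cols.append((N + 1) * nx + i * nu + c)

J = sp.coo_matrix((np.ones(len(rows)), (rows, cols)),
                  shape=(N * nx, n_vars))
print(J.nnz, "structural nonzeros out of", N * nx * n_vars, "entries")
```

Even at this toy size the Jacobian is mostly empty; the banded pattern is exactly what sparse direct linear solvers inside interior-point methods exploit.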
@@ -244,13 +244,14 @@ \section{Accelerated direct optimal control with GPU}
The resulting NLP in the vector $(X_0,\dots,X_N,U_0,\dots,U_{N-1})$
thus involves only a few functions (\emph{kernels}), namely $f, f^0$, $g$, $b$ and $c$, that need to be evaluated at many state or control points $X_i$, $U_i$.
This massive SIMD parallelism enables very efficient solving on GPUs. GPU acceleration thus facilitates the real-time and large-scale optimal control computations critical to robotics and autonomous systems, as in \cite{pacaud2024gpu}.
-Note that it is also important to exploit the inherent sparsity of the Jacobian of the NLP constraints, see \emph{e.g.} \cite{alexis-xxxx}.
+% Note that it is also important to exploit the inherent sparsity of the Jacobian of the NLP constraints, see \emph{e.g.} \cite{alexis-xxxx}.
+% J-B what do you have in mind with the previous sentence?
%Methods such as multiple shooting or collocation evaluate system dynamics and their derivatives independently across time segments.
%This parallelism, combined with the sparse and structured pattern of derivative blocks, creates a SIMD-like computational workload ideally suited for GPUs.
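The "few kernels evaluated at many points" workload can be sketched with a minimal, hypothetical example (not from the paper): the same dynamics function applied to all nodes in one batched call, which is the pattern a GPU maps onto thousands of threads. The function `f`, the batch size `N`, and the double-integrator dynamics are all illustrative assumptions.

```python
import numpy as np

def f(X, U):
    """Double-integrator dynamics evaluated at all N nodes at once.

    X has shape (N, 2) = (position, velocity); U has shape (N, 1).
    Returns (N, 2): dx/dt = velocity, dv/dt = control input.
    """
    return np.column_stack([X[:, 1], U[:, 0]])

N = 1024
X = np.random.rand(N, 2)
U = np.random.rand(N, 1)

# One vectorized call evaluates the dynamics at every node: the same
# small kernel over many independent points, i.e. SIMD parallelism.
F = f(X, U)
assert F.shape == (N, 2)
```

On a GPU, each of the `N` rows would be handled by its own thread; frameworks in the Julia ecosystem generate such kernels automatically rather than relying on NumPy-style batching.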
@@ -291,7 +292,8 @@ \section{GPU programming in Julia}
For vendor-agnostic and portable GPU development, \texttt{KernelAbstractions.jl}~\cite{Churavy_KernelAbstractions_jl} allows writing GPU kernels in Julia that can target multiple backends such as CUDA (NVIDIA), ROCm (AMD), oneAPI (Intel), and Metal (Apple).
-This ecosystem leverages the LLVM compiler infrastructure and vendor APIs to generate efficient GPU code without requiring users to write native CUDA code.
+This ecosystem leverages the LLVM compiler infrastructure and vendor APIs to generate efficient native GPU code directly from pure high-level Julia code.
+It allows users to exploit GPUs without requiring any knowledge of GPU programming.
For instance, \texttt{ExaModels.jl} builds on \texttt{KernelAbstractions.jl} to automatically generate specialized GPU kernels for parallel evaluation of ODE residuals, Jacobians, and Hessians needed in optimal control problems.
We build on this ecosystem to create a complete GPU-accelerated toolchain spanning modeling, differentiation, and solving.