Commit 7a54a9c ("Pass after the meeting with J-B")
1 parent 4568245

File tree: 1 file changed, +7 −5 lines


main.tex

Lines changed: 7 additions & 5 deletions
@@ -182,7 +182,7 @@ \section{Introduction}
 Problems are solved on NVIDIA GPUs using the interior-point solver \texttt{MadNLP.jl}~\cite{shin2021graph} and the sparse linear solver \texttt{CUDSS.jl}~\cite{Montoison_CUDSS_jl_Julia_interface}, enabling end-to-end acceleration from modeling to solving.
-We demonstrate the performance of this approach on benchmark problems solved on GPUs such as the NVIDIA GH200.
+We demonstrate the performance of this approach on benchmark problems solved on GPUs such as the NVIDIA H100 and GH200.
 
 %\textcolor{red}{We also examine generalizations to hybrid systems characterized by discrete-continuous interactions, Pontryagin-based shooting transcriptions, and infinite-horizon or functional programs modeled with \texttt{InfiniteOpt.jl}~\cite{pulsipher2022unifying}. $\rightarrow$ J-B}

@@ -215,7 +215,7 @@ \section{Background and limitations}
 Optimal control problems (OCPs) aim to find control inputs for dynamical systems modeled by ODEs that optimize a given performance criterion.
 Direct transcription methods discretize these infinite-dimensional problems into large-scale nonlinear programs (NLPs).
 These NLPs exhibit a sparse structure arising from time discretization: each node introduces state and control variables linked by nonlinear equality constraints enforcing the system dynamics.
-Second-order methods, such as interior-point solvers, exploit this structure for efficient problem solution.
+Second-order methods, such as interior-point solvers, exploit this structure. % for efficient problem solution.
 
 Most existing optimal control toolchains target CPU execution.
 For example, CasADi~\cite{Andersson2019} constructs symbolic expressions evaluated just-in-time or exported as C code, typically solved by CPU solvers like IPOPT~\cite{wachter2006implementation} or KNITRO~\cite{byrd2006k}, which rely on CPU linear solvers such as PARDISO~\cite{schenk2004solving}, MUMPS~\cite{amestoy2000mumps}, or HSL~\cite{fowkes2024libhsl}.
@@ -244,13 +244,14 @@ \section{Accelerated direct optimal control with GPU}
 can be approximated by
 $$ g(X_0, X_N) + \sum_{i=0}^{N-1} h_i f^0(X_i, U_i). $$
 Discretising boundary or path constraints such as
-$$ b(x(0),x(t_f)) \leq 0,\quad c(x(t), u(t)) \leq 0 $$
+$$ b\big(x(0),x(t_f)\big) \leq 0,\quad c\big(x(t), u(t)\big) \leq 0 $$
 is done according to
 $$ b(X_0, X_N) \leq 0, \quad c(X_i, U_i) \leq 0,\quad i = 0, \dots, N-1. $$
 The resulting NLP in the vector $(X_0,\dots,X_N,U_0,\dots,U_{N-1})$
 hence involves only a few functions (\emph{kernels}), namely $f$, $f^0$, $g$, $b$ and $c$, that need to be evaluated on many state or control points $X_i$, $U_i$.
 This massive SIMD parallelism allows for very efficient GPU solving. GPU acceleration thus facilitates real-time and large-scale optimal control computations critical to robotics and autonomous systems, as in \cite{pacaud2024gpu}.
-Note that it is also important to exploit the inherent sparsity of the Jacobian of the NLP constraints, see \emph{e.g.} \cite{alexis-xxxx}.
+% Note that it is also important to exploit the inherent sparsity of the Jacobian of the NLP constraints, see \emph{e.g.} \cite{alexis-xxxx}.
+% J-B what do you have in mind with the previous sentence?
 
 %Methods such as multiple shooting or collocation evaluate system dynamics and their derivatives independently across time segments.
 %This parallelism, combined with the sparse and structured pattern of derivative blocks, creates a SIMD-like computational workload ideally suited for GPUs.
@@ -291,7 +292,8 @@ \section{GPU programming in Julia}
 
 For vendor-agnostic and portable GPU development, \texttt{KernelAbstractions.jl}~\cite{Churavy_KernelAbstractions_jl} allows writing GPU kernels in Julia that can target multiple backends such as CUDA (NVIDIA), ROCm (AMD), oneAPI (Intel), and Metal (Apple).
 
-This ecosystem leverages the LLVM compiler infrastructure and vendor APIs to generate efficient GPU code without requiring users to write native CUDA code.
+This ecosystem leverages the LLVM compiler infrastructure and vendor APIs to generate efficient native GPU code directly from pure high-level Julia code.
+It allows users to exploit GPUs without requiring any knowledge of GPU programming.
 For instance, \texttt{ExaModels.jl} builds on \texttt{KernelAbstractions.jl} to automatically generate specialized GPU kernels for parallel evaluation of ODE residuals, Jacobians, and Hessians needed in optimal control problems.
 
 We build on this ecosystem to create a complete GPU-accelerated toolchain spanning modeling, differentiation, and solving.
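As a minimal illustration of the kind of kernel this hunk describes, the sketch below evaluates an Euler dynamics residual at every time node in parallel with \texttt{KernelAbstractions.jl}. This is a hand-written sketch, not the output of \texttt{ExaModels.jl}; the dynamics $f(x,u) = -x + u$ and the name `dynamics_residual!` are placeholders chosen here for illustration.

```julia
using KernelAbstractions

# Residual of an explicit Euler step at each node i, evaluated in parallel:
#   r[i] = X[i+1] - X[i] - h * f(X[i], U[i]),  with f(x, u) = -x + u as a toy dynamics.
@kernel function dynamics_residual!(r, @Const(X), @Const(U), h)
    i = @index(Global)
    r[i] = X[i + 1] - X[i] - h * (-X[i] + U[i])
end

N = 1024
X = rand(N + 1)                      # states at nodes 0..N
U = rand(N)                          # controls at nodes 0..N-1
r = zeros(N)
backend = get_backend(X)             # CPU here; CUDABackend() on an NVIDIA GPU
dynamics_residual!(backend)(r, X, U, 0.01; ndrange = N)
KernelAbstractions.synchronize(backend)
```

The same kernel source runs unchanged on CUDA, ROCm, oneAPI, or Metal backends; only the array type and backend object change.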
