\titledquestion{Automatic Differentiation and Attention}

Consider the matrices
$V \in \mathbb{R}^{d_v \times n_\text{ctx}},
K \in \mathbb{R}^{d_k \times n_\text{ctx}}$
and an attention head applied to a vector $q$: $a(q) = V\text{softmax}(K^\top q / \sqrt{d_k})$,
where the $i$-th entry of $\text{softmax}(x)$ is $\exp(x_i) / \sum_j \exp(x_j)$.
Suppose that you want to compute the Jacobian $L$ of $a(q)$, that is, the derivative of all outputs of the attention
head with respect to each entry of the vector $q$.
In other words, $L_{ij} = \partial a_i / \partial q_j$.

\begin{itemize}
    \item Find the matrix $J$ to be used to compute
        the derivative of $a$ with respect to the $j$-th entry of $q$
        with the formula
        $$\partial a / \partial q_j = V J K^\top e_j / \sqrt{d_k}.$$
        \emph{Hint:} The entries of the matrix $J$ can be written purely in terms
        of the entries of the vector $s = \text{softmax}(K^\top q / \sqrt{d_k})$.
        \begin{solutionbox}{9cm}
            Let $x = K^\top q / \sqrt{d_k}$. We have
            \begin{align*}
                J_{ij} & =
                \partial s_i / \partial x_j\\
                & =
                \frac{\partial}{\partial x_j} \frac{\exp(x_i)}{\sum_{k=1}^{n_{\text{ctx}}} \exp(x_k)}\\
                & =
                \frac{\exp(x_i)}{\sum_{k=1}^{n_{\text{ctx}}} \exp(x_k)}
                \frac{\partial x_i}{\partial x_j}
                -
                \frac{\exp(x_i)}{\left(\sum_{k=1}^{n_{\text{ctx}}} \exp(x_k)\right)^2}
                \frac{\partial}{\partial x_j} \sum_{k=1}^{n_{\text{ctx}}} \exp(x_k)\\
                & =
                s_i
                \frac{\partial x_i}{\partial x_j}
                - s_is_j.
            \end{align*}
            Since $\partial x_i / \partial x_j = \delta_{ij}$, the entries of $J$ are:
            \begin{align*}
                J_{ii} & = s_i - s_i^2\\
                J_{ij} & = -s_is_j & \quad i \neq j.
            \end{align*}
            In matrix form, $J = \operatorname{diag}(s) - ss^\top$.
        \end{solutionbox}
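As a quick sanity check outside the exam itself, the closed form $J = \operatorname{diag}(s) - ss^\top$ can be compared against central finite differences of the softmax (a NumPy sketch; all names below are ours):

```python
import numpy as np

def softmax(x):
    # subtract the max for numerical stability
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n = 5
x = rng.standard_normal(n)
s = softmax(x)

# closed-form Jacobian: J = diag(s) - s s^T
J = np.diag(s) - np.outer(s, s)

# finite-difference approximation, one column per input coordinate
eps = 1e-6
J_fd = np.empty((n, n))
for j in range(n):
    xp, xm = x.copy(), x.copy()
    xp[j] += eps
    xm[j] -= eps
    J_fd[:, j] = (softmax(xp) - softmax(xm)) / (2 * eps)

assert np.allclose(J, J_fd, atol=1e-7)
```

Note that $J$ is symmetric and its columns sum to zero, reflecting the fact that the softmax outputs always sum to one.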
    \item The previous question corresponds to \emph{forward} differentiation.
        Using \emph{reverse} differentiation, you would now like to
        compute the gradient of $a_i$ with respect to the vector $q$.
        How can you compute this gradient vector via matrix-vector products?

        \emph{Hint:} $\partial a_i / \partial q_j$ is the scalar product between
        $\partial a / \partial q_j$ and $e_i$.
        \begin{solutionbox}{6cm}
            Since $J = \operatorname{diag}(s) - ss^\top$ is symmetric, $J^\top = J$.
            \begin{align*}
                \partial a_i / \partial q_j
                & =
                \langle V J K^\top e_j / \sqrt{d_k}, e_i \rangle\\
                & =
                \langle e_j, K J^\top V^\top e_i / \sqrt{d_k} \rangle\\
                \partial a_i / \partial q
                & =
                K J V^\top e_i / \sqrt{d_k}
            \end{align*}
        \end{solutionbox}
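The reverse-mode formula $\partial a_i / \partial q = K J V^\top e_i / \sqrt{d_k}$ can likewise be checked numerically against finite differences of $a$ (a NumPy sketch with made-up dimensions, not part of the exam):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
d_v, d_k, n_ctx = 3, 4, 6
V = rng.standard_normal((d_v, n_ctx))
K = rng.standard_normal((d_k, n_ctx))
q = rng.standard_normal(d_k)

def a(q):
    # attention head applied to the query vector q
    return V @ softmax(K.T @ q / np.sqrt(d_k))

s = softmax(K.T @ q / np.sqrt(d_k))
J = np.diag(s) - np.outer(s, s)

# reverse-mode formula: gradient of a_i w.r.t. q is K J V^T e_i / sqrt(d_k);
# V^T e_i is simply the i-th row of V
i = 0
grad_i = K @ (J @ V[i]) / np.sqrt(d_k)

# central finite differences of a_i along each coordinate of q
eps = 1e-6
E = np.eye(d_k)
fd = np.array([(a(q + eps * E[j])[i] - a(q - eps * E[j])[i]) / (2 * eps)
               for j in range(d_k)])

assert np.allclose(grad_i, fd, atol=1e-6)
```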
    \item
        Depending on $d_v, d_k, n_\text{ctx}$, which of forward and reverse differentiation
        will be faster for computing the \textbf{full} Jacobian matrix $\partial a / \partial q$? Why?
        \begin{solutionbox}{4cm}
            Forward diff concatenates $\partial a / \partial q_j$ horizontally for each $j$ and
            reverse diff concatenates $\partial a_i / \partial q$
            vertically for each $i$.
            In other words, forward diff computes $V(JK^\top)$ while
            reverse diff computes $K(J^\top V^\top)$ or equivalently
            $(VJ)K^\top$.
            The complexity of forward diff is $O(n_\text{ctx}^2 d_k + n_\text{ctx} d_k d_v)$
            while the complexity of reverse diff is
            $O(n_\text{ctx}^2 d_v + n_\text{ctx} d_k d_v)$.
            This means that forward diff is faster if $d_k < d_v$, otherwise reverse diff is faster.
        \end{solutionbox}
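The two modes differ only in how the triple product is associated, so they yield the same Jacobian; only the flop counts differ. A small NumPy illustration (dimensions chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(2)
d_v, d_k, n_ctx = 2, 8, 5
V = rng.standard_normal((d_v, n_ctx))
K = rng.standard_normal((d_k, n_ctx))

# any softmax output works here; build one directly
s = rng.random(n_ctx)
s /= s.sum()
J = np.diag(s) - np.outer(s, s)

# forward mode: V (J K^T), cost O(n_ctx^2 d_k + n_ctx d_k d_v)
forward = V @ (J @ K.T) / np.sqrt(d_k)
# reverse mode: (V J) K^T, cost O(n_ctx^2 d_v + n_ctx d_k d_v)
reverse = (V @ J) @ K.T / np.sqrt(d_k)

# same Jacobian either way; only the arithmetic cost differs
assert np.allclose(forward, reverse)
```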
    \item How could this computation be accelerated using a GPU instead of a CPU?
        \begin{solutionbox}{2cm}
            As these are matrix-matrix products, the computation is highly parallelizable
            and hence will see a substantial speed-up on a GPU.
        \end{solutionbox}
\end{itemize}