fixed until quiescence search and added profiling

Juanvoid01 · Juanvoid01 · commit d3df5a439bb6 · 2025-05-25T21:03:26.000+02:00
diff --git a/docs/AlphaDeepChess/Capitulos/AnalysisOfImprovements.tex b/docs/AlphaDeepChess/Capitulos/AnalysisOfImprovements.tex
@@ -1,45 +1,81 @@
 \chapter{Analysis and evaluation}\label{cap:analysis}
 
-This chapter documents the implementation of the following techniques used to improve the chess engine:
+This chapter presents an analysis of the performance of the chess engine through profiling, identifying its most computationally intensive components. We then evaluate the effectiveness of the improvement techniques described in~\cref{cap:ImprovementTechniques}. Finally, we compare the performance of \textit{AlphaDeepChess} against \textit{Stockfish}, and examine its position within the Elo rating distribution on \textit{Lichess.org}.
 
-\begin{itemize}[itemsep=1pt]
-    \item Transposition tables with zobrist hashing.
-    \item Move generator with magic bitboards and PEXT instructions.
-    \item Evaluation with king safety and piece mobility parameters.
-    \item Multithread search.
-    \item Search with Late move Reductions.
-\end{itemize}
+\newpage
 
+\section{Profiling}
+To analyze the performance of our chess engine and identify the bottlenecks where the code consumes the most execution time, we used the \texttt{perf} tool available on Linux systems. \texttt{perf} provides robust profiling capabilities by recording CPU events, sampling function execution, and collecting stack traces~\cite{PerfLinux}.
 
-\newpage
+\vspace{1em}
 
-\noindent Achieving this level of efficiency and quality requires a well-structured development process. For this reason, we adopted a systematic methodology to guide the implementation and continuous improvement of our engine.
+\noindent We run the engine under \texttt{perf} using the following commands:
 
-\section{Methodology}
+\begin{lstlisting}[language=bash, caption={Profiling \textit{AlphaDeepChess} with perf}, frame=single, breaklines=true]
+# Record performance data with function stack traces
+sudo perf record -g ./build/release/AlphaDeepChess
 
-Once the basic foundations are established with an initial version of the essential components or modules (which will be described later), our workflow follows an iterative process: first, we search for existing information on each topic, analyze it, implement a solution, and then profile the implementation to identify bottlenecks. After locating performance issues, we optimize the relevant parts, and finally, compare the new version with the previous one to assess improvements.
+# Display interactive report
+sudo perf report -g --no-children
+\end{lstlisting}
 
-\vspace{1em}
+\noindent After recording, \texttt{perf report} opens an interactive terminal interface where functions are sorted by CPU overhead, allowing us to easily identify performance-critical regions.
+
+\vspace{2em}
 
-\noindent Then, at a given moment, we can decide to take action and try to determine the strength of the engine with the last functional version.
+\noindent First, we profile the basic architecture of the engine implemented in~\cref{cap:descripcionTrabajo}, and then evaluate it again after applying the optimizations described in~\cref{cap:ImprovementTechniques}.
 
-\subsection*{Profiler}
+\subsection*{Profiling of basic engine architecture}
 
-First, in order to analyze the performance of our chess engine and identify potential bottlenecks, we used the \texttt{perf} tool available on Linux systems. \texttt{perf} provides robust profiling capabilities by recording CPU events, sampling function execution, and collecting stack traces.
+\noindent As shown in~\cref{tab:profilingBasic}, the profiling results indicate that the majority of the total execution time is spent in legal move generation. Specifically, the functions \texttt{generate\_legal\_moves}, \texttt{calculate\_moves\_in\_dir}, and \texttt{update\_danger\_in\_dir} together account for about 71.6\% of the total overhead. Therefore, optimizing this component is expected to yield significant performance improvements.
+
+\begin{table}[H]
+    \centering
+    \begin{tabular}{|l|r|}
+    \hline
+    \textit{Symbol} & \textit{Overhead} \\
+    \hline
+    \texttt{generate\_legal\_moves}       &  36.07\% \\
+    \texttt{calculate\_moves\_in\_dir}    &  19.30\% \\
+    \texttt{evaluate\_position}          &  16.63\% \\
+    \texttt{update\_danger\_in\_dir}      &  16.23\% \\
+    \texttt{calculate\_king\_moves}       &   1.24\% \\
+    \texttt{quiescence\_search}          &   0.96\% \\
+    \texttt{...}                          &   ...     \\
+    \hline
+    \end{tabular}
+    \caption{Profiling results of the basic engine implementation.}
+    \label{tab:profilingBasic}
+\end{table}
 
 \vspace{1em}
 
-\noindent Our profiling goal is to identify which parts of the code consume the most execution time. We run the engine under \texttt{perf} using the following commands:
+\subsection*{Profiling with improvement techniques}
 
-\begin{lstlisting}[language=bash, caption={Profiling \textit{AlphaDeepChess} with perf}, frame=single, breaklines=true]
-# Record performance data with function stack traces
-sudo perf record -g ./build/release/AlphaDeepChess
+\noindent As shown in~\cref{tab:profilingImprovements}, the updated profiling results demonstrate a successful reduction in the computational cost of move generation. The execution time is now more evenly distributed across various modules, with position evaluation emerging as the new primary performance bottleneck. This shift confirms the effectiveness of the implemented optimization techniques.
 
-# Display interactive report
-sudo perf report -g --no-children
-\end{lstlisting}
+\begin{table}[H]
+    \centering
+    \begin{tabular}{|l|r|}
+    \hline
+    \textit{Symbol} & \textit{Overhead} \\
+    \hline
+    \texttt{evaluate\_position}       &  31.90\% \\
+    \texttt{generate\_legal\_moves}          &  22.71\% \\
+    \texttt{update\_attacks\_bb}    &  22.62\% \\
+    \texttt{order\_moves}      &   3.95\% \\
+    \texttt{make\_move}       &   3.83\% \\
+    \texttt{alpha\_beta\_search}          &   1.66\% \\
+    \texttt{...}                          &   ...     \\
+    \hline
+    \end{tabular}
+    \caption{Profiling results after applying optimization techniques.}
+    \label{tab:profilingImprovements}
+\end{table}
+
+\newpage
 
-\noindent After recording, \texttt{perf report} opens an interactive terminal interface where functions are sorted by CPU overhead. This allows us to prioritize which functions to optimize.
+\section{cutechess}
 
 \noindent The most common way to measure the strength of a chess engine is by playing games against other engines and analyzing the results. To quantify this strength, the Elo rating system is used. Elo is a statistical rating system originally developed for chess, which assigns a numerical value to each player (or engine) based on their game results against opponents of known strength. When an engine wins games against higher-rated opponents, its Elo increases; if it loses, its Elo decreases. This allows for an objective comparison of playing strength between different engines.
 
diff --git a/docs/AlphaDeepChess/Capitulos/DescripcionTrabajo.tex b/docs/AlphaDeepChess/Capitulos/DescripcionTrabajo.tex
@@ -1,6 +1,6 @@
 \chapter{Basic engine architecture}\label{cap:descripcionTrabajo}
 
-This chapter documents the development process of the chess engine. The project is organized into the following modules:
+This chapter documents the basic architecture of the chess engine. The project is organized into the following modules:
 
 \begin{itemize}[itemsep=1pt]
     \item \textit{Board}: Data structures to represent the chess board.
@@ -11,12 +11,14 @@ \chapter{Basic engine architecture}\label{cap:descripcionTrabajo}
     \item \textit{UCI}: Universal Chess Interface implementation.
 \end{itemize}
 
-\noindent First, we describe the implementation of the basic parts of the chess engine, then we introduce and explain in detail the algorithmic techniques developed to improve the engine's performance.
+\noindent First, we describe the implementation of the basic parts of the chess engine; then, in~\cref{cap:ImprovementTechniques}, we introduce and explain in detail the algorithmic techniques developed to improve the engine's performance.
 
 \vspace{1em}
 
 \noindent We begin by examining the fundamental data structure used for chess position representation.
 
+\newpage
+
 \section{Chessboard representation: bitboards}
 
 \noindent The chessboard is represented using a list of \textit{bitboards}. A bitboard is a 64-bit variable in which each bit corresponds to a square on the board. A bit is set to \texttt{1} if a piece occupies the corresponding square and \texttt{0} otherwise. The least significant bit (LSB) represents the \texttt{a1} square, while the most significant bit (MSB) corresponds to \texttt{h8}~\cite{Bitboards}.
@@ -31,36 +33,39 @@ \section{Chessboard representation: bitboards}
 
 \vspace{1em}
 
-\begin{figure}
-    \centering
-    \newchessgame
-    \chessboard[
-        showmover=false,
-        setfen=7k/8/5p2/2p1p1p1/P2p3p/1P1P1P1P/2P1P1P1/R2K3R w KQ - 0 1
-    ]
-
-    \vspace{1.0em}
+\begin{figure}[H]
 
-    \begin{minipage}[c]{0.30\textwidth}
-        \includegraphics[width=\textwidth]{Imagenes/bitboard_white_pawns.png}
-        \caption*{Bitboard of white pawns.}
+    \begin{minipage}[c]{0.35\textwidth}
+        \newchessgame
+        \chessboard[
+            showmover=false,
+            setfen=7k/8/5p2/2p1p1p1/P2p3p/1P1P1P1P/2P1P1P1/R2K3R w KQ - 0 1
+        ]
     \end{minipage}
     \hfill
-    \begin{minipage}[c]{0.30\textwidth}
+    \begin{minipage}[c]{0.36\textwidth}
         \includegraphics[width=\textwidth]{Imagenes/bitboard_black_pawns.png}
         \caption*{Bitboard of black pawns.}
     \end{minipage}
+
+    \vspace{1.1em}
+    \hspace*{0.03\textwidth}
+    \begin{minipage}[c]{0.36\textwidth}
+        \includegraphics[width=\textwidth]{Imagenes/bitboard_white_pawns.png}
+        \caption*{Bitboard of white pawns.}
+    \end{minipage}
     \hfill
-    \begin{minipage}[c]{0.30\textwidth}
+    \begin{minipage}[c]{0.36\textwidth}
         \includegraphics[width=\textwidth]{Imagenes/bitboard_white_rooks.png}
         \caption*{Bitboard of white rooks.}
     \end{minipage}
+
     \caption{List of bitboards data structure example.}\label{fig:bitboardPositionExample}
 \end{figure}
 
 \noindent The main advantage of bitboards is that we can operate on multiple squares simultaneously using bitwise operations. For example, we can determine if there are any black pawns on the fifth rank by performing a bitwise AND operation with the corresponding mask.~\cref{fig:bitboardMaskOperation} illustrates this concept.
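\noindent The fifth-rank test can be sketched in C++ as follows. This is a minimal illustration, not the engine's actual code: the names \texttt{Bitboard}, \texttt{square\_bit}, \texttt{RANK\_5\_MASK}, \texttt{any\_on} and \texttt{count\_bits} are hypothetical, and the square numbering (\texttt{index = rank * 8 + file}) is an assumption consistent with \texttt{a1} being the LSB and \texttt{h8} the MSB.

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;

// Assumed square numbering: index = rank * 8 + file,
// so a1 = 0, b1 = 1, ..., h8 = 63.
constexpr Bitboard square_bit(int file, int rank) {
    return Bitboard{1} << (rank * 8 + file);
}

// All eight squares of the fifth rank (rank index 4, counting from 0).
constexpr Bitboard RANK_5_MASK = Bitboard{0xFF} << (4 * 8);

// Black pawns of the example position: f6, c5, e5, g5, d4, h4.
constexpr Bitboard BLACK_PAWNS =
    square_bit(5, 5) | square_bit(2, 4) | square_bit(4, 4) |
    square_bit(6, 4) | square_bit(3, 3) | square_bit(7, 3);

// A single AND tests all eight squares of the rank at once.
constexpr bool any_on(Bitboard pieces, Bitboard mask) {
    return (pieces & mask) != 0;
}

// Counts set bits by clearing the lowest set bit on each iteration.
constexpr int count_bits(Bitboard bb) {
    int n = 0;
    for (; bb != 0; bb &= bb - 1) ++n;
    return n;
}
```

\noindent With these definitions, \texttt{any\_on(BLACK\_PAWNS, RANK\_5\_MASK)} is true, and \texttt{count\_bits(BLACK\_PAWNS \& RANK\_5\_MASK)} yields 3, corresponding to the pawns on c5, e5 and g5.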
 
-\begin{figure}
+\begin{figure}[H]
     \centering
     \begin{minipage}[c]{0.30\textwidth}
         \centering
@@ -170,14 +175,11 @@ \subsection*{Horizon effect problem, quiescence search}\label{sec:horizon-effect
 
 \vspace{1em}
 
-% TODO: MEJOR EXPLICAD LAS DIFERENCIAS QUE REPETIR EL TEXTO
-\noindent The following events occur in a quiescence node:
+\noindent The same events occur in a quiescence node as in a regular search node, with the following key differences in execution steps:
 
 \begin{enumerate}
-    \item \textit{Terminal node verification}: Check for game termination conditions due to checkmate, threefold repetition, the fifty-move rule or reaching a maximum ply.
     \item \textit{Standing pat evaluation}: Also known as static evaluation, this step assigns a preliminary score to the position. This score can serve as a lower bound and is immediately used to determine whether alpha-beta pruning can be applied.
     \item \textit{Selective legal move generation}: Create a list of every possible legal move excluding moves that are not captures.
-    \item \textit{Move ordering}: Sort capture moves by estimated quality (best to worst).
     \item \textit{Move exploration}: Iterate through each of the legal capture moves from the position in order, update the position evaluation, the value of alpha and beta, and check if we can perform pruning.
 \end{enumerate}
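\noindent The steps above can be sketched on a toy game tree. This is an illustrative sketch under stated assumptions, not the engine's implementation: \texttt{Node}, \texttt{static\_eval} and \texttt{captures} are hypothetical stand-ins for a real position, its static evaluation and its generated capture moves. The sketch uses the negamax convention (scores are always from the point of view of the side to move) and fail-hard bounds.

```cpp
#include <vector>

// Hypothetical toy position: a static score from the point of view of the
// side to move, plus the positions reachable through capture moves only.
struct Node {
    int static_eval;
    std::vector<Node> captures;  // assumed to be already ordered
};

constexpr int INF = 1000000;  // large bound that can be negated safely

int quiescence(const Node& node, int alpha, int beta) {
    // Standing pat: the side to move may decline every capture,
    // so the static evaluation acts as a lower bound.
    const int stand_pat = node.static_eval;
    if (stand_pat >= beta) return beta;  // beta cutoff
    if (stand_pat > alpha) alpha = stand_pat;

    // Move exploration restricted to captures.
    for (const Node& child : node.captures) {
        const int score = -quiescence(child, -beta, -alpha);
        if (score >= beta) return beta;  // beta cutoff
        if (score > alpha) alpha = score;
    }
    return alpha;
}
```

\noindent Because the recursion only follows captures, it stops as soon as a position has none, at which point the standing pat score is returned.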
 
@@ -404,6 +406,7 @@ \subsection*{Tapered evaluation}
 \end{figure}
 
 \section{Move generator}
+\label{sec:moveGenerator}
 
 Calculating the legal moves in a chess position is a more difficult and tedious task
 than it might seem, mainly due to the unintuitive rules of \textit{en passant} and castling,
diff --git a/docs/AlphaDeepChess/Capitulos/ImprovementTechniques.tex b/docs/AlphaDeepChess/Capitulos/ImprovementTechniques.tex
@@ -145,17 +145,7 @@ \subsection*{Collisions}
 
 \section{Move generator with magic bitboards and pext instructions}
 
-To identify potential performance bottlenecks, we performed profiling on the engine, as shown in~\cref{fig:profiling}.
-
-\vspace{1em}
-
-\begin{figure}
-    \centering
-    \includegraphics[width=1.0\textwidth]{Imagenes/basic_move_generator_profiling.png}
-    \caption{Profiling results.}\label{fig:profiling}
-\end{figure}
-
-\noindent The profiling results indicate that the majority of the total execution time is spent in the legal move generation function. Therefore, optimizing this component is expected to yield significant performance improvements.
+As previously discussed (see~\cref{sec:moveGenerator}), computing the legal moves for sliding pieces is computationally expensive, as it requires identifying which pieces block their paths within their attack patterns. In this section, we present a technique that precomputes all possible moves for rooks and bishops, so that they can be looked up in constant time, $O(1)$. Queen moves are derived as the union of rook and bishop moves.
 
 \subsection*{Magic bitboards}
 
@@ -171,12 +161,8 @@ \subsection*{Magic bitboards}
 
 \begin{itemize}[itemsep=1pt]
   \item Preserves relevant blocker information: 
-  The nearest blockers along a piece's movement direction are preserved. 
-  \textit{Example:} Consider a rook with two pawns in its path:
-  \begin{center}
-    Rook $\rightarrow \rightarrow \rightarrow$ [Pawn1][Pawn2]
-  \end{center}
-  In this case, only `Pawn1` blocks the rook's movement, while `Pawn2` is irrelevant.
+    Only the nearest blockers along a sliding piece's movement direction are important. For example, in~\cref{fig:magics_position}, the pawn on d6 is a relevant blocker because it directly restricts the rook's movement. In contrast, the pawn on d7 is irrelevant, as it lies beyond the first blocker and does not influence the final set of legal moves.
+
   \item Compresses the blocker bitboard, pushing the important bits near the most significant bit.
   \item The final multiplication must produce a unique index for each possible blocker configuration. The way to ensure the uniqueness is by brute force testing.
 \end{itemize}
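\noindent The index computation and the brute-force uniqueness test can be sketched as follows. This is a minimal sketch, not the engine's code: \texttt{magic\_index} and \texttt{is\_unique\_magic} are hypothetical names, and \texttt{bits} (the number of index bits, assumed to be between 1 and 63) is a parameter introduced for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bitboard = std::uint64_t;

// Keeps only the relevant blockers (occupancy & mask), multiplies by the
// magic number, and takes the top `bits` bits of the product as the index.
// Assumes 1 <= bits <= 63.
constexpr unsigned magic_index(Bitboard occupancy, Bitboard mask,
                               Bitboard magic, unsigned bits) {
    return static_cast<unsigned>(((occupancy & mask) * magic) >> (64 - bits));
}

// Brute-force test: enumerates every blocker configuration inside `mask`
// and checks that no two of them map to the same index.
bool is_unique_magic(Bitboard mask, Bitboard magic, unsigned bits) {
    std::vector<bool> used(std::size_t{1} << bits, false);
    Bitboard subset = 0;
    do {
        const unsigned index = magic_index(subset, mask, magic, bits);
        if (used[index]) return false;  // two configurations collide
        used[index] = true;
        subset = (subset - mask) & mask;  // next subset of mask
    } while (subset != 0);
    return true;
}
```

\noindent As a small worked case, take the 3-bit mask \texttt{0x0E} (bits 1 to 3). The multiplier \texttt{1 << 60} shifts those bits into the three most significant positions, so each of the eight blocker configurations receives a distinct index, whereas a multiplier of 0 maps every configuration to index 0 and is rejected.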
@@ -193,12 +179,12 @@ \subsection*{Magic bitboards}
         \chessboard[
             showmover=false,
             setfen=n1bk3r/3p4/1p1p2p1/8/3R1p2/8/3p4/7n w - - 0 1,
-            markstyle=circle,
+            markstyle=border,
             color=red, markfields={d6,f4,d2},
             color=green, markfields={c4,b4,a4,e4,d5,d3}
         ]
     \end{minipage}
-    \caption{Initial chess position with white rook and blockers}\label{fig:magics_position}
+    \caption{Chess position with white rook legal moves in green and blockers in red.}\label{fig:magics_position}
 \end{figure}
 
 \subsection*{Magic number generation}
diff --git a/docs/AlphaDeepChess/Imagenes/improvements_profiling.png b/docs/AlphaDeepChess/Imagenes/improvements_profiling.png
diff --git a/docs/AlphaDeepChess/TFGTeXiS.pdf b/docs/AlphaDeepChess/TFGTeXiS.pdf
diff --git a/docs/AlphaDeepChess/biblio.bib b/docs/AlphaDeepChess/biblio.bib
@@ -364,3 +364,14 @@ @book{Russell2021artificial
   publisher={Pearson Education},
   lastaccess = {June, 2024}
 }
+
+@misc{PerfLinux,
+  author       = {kernel.org},
+  title        = {perf: Linux profiling with performance counters},
+  howpublished = {online},
+  year         = {2024},
+  url          = {https://perfwiki.github.io/main},
+  lastaccess   = {May, 2025}
+}
+
+