@@ -55,10 +55,9 @@ \subsection{Bellman Equation}%
 The Bellman Equation offers a way to describe the utility of each state in an \ac{MDP}. For this, it defines the
 utility of a state as the reward for the current state plus the expected sum of all future rewards discounted by $\gamma$.

-\[
+\begin{equation}
 U(s) = R(s) + \gamma \max_{a\in\mathcal{A}(s)} \sum_{s'}{P(s' \mid s,a)U(s')}
-% TODO numbers on equation?
-\]
+\end{equation}

 In the above equation, the \emph{max} operation selects the optimal action among all possible actions. The
 Bellman equation explicitly targets \emph{discrete} state spaces. If the state transition graph is a cyclic graph
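For intuition only, the right-hand side of the Bellman equation can be evaluated directly once $P$, $R$ and $\gamma$ are known. The following sketch does so for a tiny, made-up two-state MDP; the transition probabilities, rewards and $\gamma = 0.9$ are invented for illustration and do not come from the text:

```python
# Hypothetical two-state MDP: P[s][a] maps successor states to probabilities,
# R gives the per-state reward. All numbers are made up for illustration.
P = {
    "s0": {"stay": {"s0": 0.8, "s1": 0.2}, "go": {"s0": 0.1, "s1": 0.9}},
    "s1": {"stay": {"s1": 1.0},            "go": {"s0": 0.5, "s1": 0.5}},
}
R = {"s0": 0.0, "s1": 1.0}
gamma = 0.9

def bellman_rhs(s, U):
    """Right-hand side of the Bellman equation for state s, given the current
    utility estimates U: R(s) + gamma * max_a sum_s' P(s'|s,a) U(s')."""
    return R[s] + gamma * max(
        sum(p * U[s2] for s2, p in P[s][a].items()) for a in P[s]
    )

U = {"s0": 0.0, "s1": 0.0}   # an arbitrary utility estimate
print(bellman_rhs("s0", U))  # reward of s0 plus discounted best expected successor utility
```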
@@ -96,9 +95,9 @@ \subsection{Value and Policy Iteration}%
 assuming both the transition function $P(s' \mid s,a)\ \forall s \in \mathcal{S}$ and the reward function $R(s)$ are
 known to the agent.
 In the algorithm, the utility of each state is updated based on the \emph{Bellman update} rule:
-\[
+\begin{equation}
 U_{i+1}(s) \gets R(s) + \gamma \max_{a \in \mathcal{A}(s)} \sum_{s'}{P(s' \mid s,a) U_i(s')}
-\]
+\end{equation}
 This needs to be performed for \emph{each} state during \emph{each} iteration. It is clear how quickly this becomes
 intractable, especially when $\gamma$ is reasonably close to 1, meaning that long-term rewards are also taken into
 consideration.
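As an illustrative sketch (not from the text), the Bellman update can be turned into value iteration by sweeping over every state in every iteration until the utilities stop changing; `P` and `R` are assumed to be dictionaries in the same layout as in the previous sketch:

```python
def value_iteration(P, R, gamma=0.9, eps=1e-6):
    """Synchronous value iteration: apply the Bellman update to *each* state
    in *each* sweep until the largest change falls below eps.
    Assumes the model (P, R) is fully known, as stated above."""
    U = {s: 0.0 for s in P}
    while True:
        U_next = {
            s: R[s] + gamma * max(
                sum(p * U[s2] for s2, p in P[s][a].items()) for a in P[s]
            )
            for s in P
        }
        delta = max(abs(U_next[s] - U[s]) for s in P)
        U = U_next
        if delta < eps:
            return U
```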
@@ -133,10 +132,9 @@ \subsection{Temporal Difference Learning}%
 is called a trial
 \footnote{in newer \ac{RL} literature this is also called a \emph{trajectory} \citep{proximalpolicyopt, heess2017emergence}}.
 The update rule for the utility of each state is as follows:
-\[
+\begin{equation}
 U^\pi(s) \gets U^\pi(s) + \alpha (R(s) + \gamma U^\pi(s') - U^\pi(s))
-\]
-
+\end{equation}
 where $\alpha$ is the learning rate and $U^\pi$ the utility under execution of $\pi(s)$ in state $s$. This only
 updates the utilities based on the observed transitions, so if the unknown transition function sometimes leads to
 extremely negative rewards through rare transitions, this is unlikely to be captured. However, with sufficiently many
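A minimal sketch of this update rule as code, assuming the agent only sees observed transitions $(s, r, s')$ while executing its policy; the function name and the default values of $\alpha$ and $\gamma$ are illustrative:

```python
def td_update(U, s, r, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference update for an observed transition s -> s_next
    with reward r: U(s) <- U(s) + alpha * (R(s) + gamma * U(s') - U(s))."""
    U.setdefault(s, 0.0)
    U.setdefault(s_next, 0.0)
    U[s] += alpha * (r + gamma * U[s_next] - U[s])

# Applying the rule to the transitions of one trial (trajectory):
U = {}
for s, r, s_next in [("s0", 0.0, "s1"), ("s1", 1.0, "s0")]:
    td_update(U, s, r, s_next)
```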
@@ -148,9 +146,9 @@ \subsection{Exploration}%

 The above learning approach has one weakness: it is only based on observed utilities. If $\pi$ follows the pattern of
 always choosing the action that leads to the highest expected $U_{i+1}$, i.e.
-\[
+\begin{equation}
 \pi(s) = \mathop{\mathrm{arg\,max}}_{a \in \mathcal{A}(s)} \sum_{s'}{P(s' \mid s, a)U(s')}
-\]
+\end{equation}
 then it will never explore possible alternatives and will very quickly get stuck on a rigid action
 pattern mapping each state to a resulting action. To avoid this, the concept of \emph{exploration} has been introduced.
 There are many approaches to encourage exploration. The simplest is to define a factor $\epsilon$ which defines the
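The hunk cuts the sentence off here, but the factor $\epsilon$ is presumably the probability of taking a random action instead of the greedy one ($\epsilon$-greedy selection). A small sketch under that assumption; `expected_value` stands for whatever estimate the agent currently uses, e.g. $\sum_{s'} P(s' \mid s,a)U(s')$:

```python
import random

def epsilon_greedy(s, actions, expected_value, epsilon=0.1):
    """With probability epsilon explore (pick a uniformly random action);
    otherwise exploit the action with the highest current estimate for s."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: expected_value(s, a))
```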
@@ -184,18 +182,18 @@ \subsection{Q Learning}%
 (i.e. learn what a good policy is), this becomes problematic if the transition function is not known. An alternative
 model is called \emph{Q-Learning}, which is a form of Temporal Difference Learning. It learns an action-utility value
 instead of simply the state value. The relationship between this \emph{Q-Value} and the former value of a state is simply
-\[
+\begin{equation}
 U(s) = \max_{a}Q(s,a)
-\]
+\end{equation}
 so the value of a state is that of its highest Q-Value. The benefit of this approach is that it does not require a model
 of how the world works; it is therefore called a \emph{model-free} method. The update rule for the Q-Values is simply
 the Bellman equation with $U(s)$ and $U(s')$ replaced with $Q(s,a)$ and $Q(s',a')$ respectively.

 The update rules for the Q-Value approach are related to the Temporal Difference Learning rules but include a $\max$
 operator
-\[
+\begin{equation}
 Q(s,a) \gets Q(s,a) + \alpha (R(s) + \gamma \max_{a'}Q(s', a') - Q(s,a))
-\]
+\end{equation}
 An alternative version is obtained by removing the $\max$ operator from the above equation. This results in the
 \emph{actual} action being considered instead of the one that the policy believes to be the best. Q-Learning is
 \emph{off-policy} while the latter version, called \ac{SARSA}, is \emph{on-policy}. The distinction has a significant
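A compact sketch of the two update rules next to each other (illustrative code, not from the text); the only difference is whether the bootstrap term uses the maximising action or the action $a'$ that was actually taken next:

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best action available in s_next."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy (SARSA): bootstrap from the action a_next actually chosen in s_next."""
    target = r + gamma * Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
```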
@@ -233,9 +231,9 @@ \subsection{Policy Search and Policy Gradient Methods}%
 $\hat{A}_t$ to create an estimator for the policy gradient:


-\begin{equation*}
+\begin{equation}
 \hat{g} = \hat{\mathbb{E}}_{t} \left[ \nabla_{\theta}\log \pi_{\theta}(a_{t} \mid s_{t})\hat{A}_{t} \right]
-\end{equation*}
+\end{equation}

 where $\hat{A}_t$ describes the advantage of taking one action over another in a given state. It can therefore be
 described as an \emph{actor-critic architecture}, because $A(a_t, s_t) = Q(a_t,s_t) - V(s_t)$, meaning that the
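For a concrete (if simplified) picture, the estimator $\hat{g}$ can be computed in closed form for a tabular softmax policy, where $\partial \log \pi_\theta(a \mid s) / \partial \theta_{s,a'} = \mathbb{1}[a' = a] - \pi_\theta(a' \mid s)$. The following sketch assumes that representation and that advantage estimates $\hat{A}_t$ are already available; none of the names come from the text:

```python
import math

def softmax_probs(theta, s, actions):
    """pi_theta(. | s) for a tabular softmax policy with logits theta[(s, a)]."""
    logits = [theta.get((s, a), 0.0) for a in actions]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return {a: e / z for a, e in zip(actions, exps)}

def policy_gradient_estimate(theta, trajectory, actions):
    """g_hat = mean_t [ grad_theta log pi_theta(a_t | s_t) * A_hat_t ],
    where trajectory is a list of (s_t, a_t, advantage_t) tuples."""
    g = {}
    for s, a, adv in trajectory:
        probs = softmax_probs(theta, s, actions)
        for a2 in actions:
            grad_log = (1.0 if a2 == a else 0.0) - probs[a2]
            g[(s, a2)] = g.get((s, a2), 0.0) + grad_log * adv
    return {k: v / len(trajectory) for k, v in g.items()}
```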