
Commit 38904d5

adding table of contents, list of figures, list of tables, and list of listings
1 parent: 718268a

9 files changed (+48, -40 lines)


src/abstract.tex

Lines changed: 1 addition & 0 deletions
@@ -1,2 +1,3 @@
 \begin{abstract}
+TODO end of writing
 \end{abstract}

src/bibliography.bib

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ @misc{clickcli
 title = {Click_},
 howpublished = {\url{http://click.pocoo.org/5/}},
 author = {Ronacher, Armin},
+year = {2018},
 note = {Accessed: 2018-04-22}
 }

src/chaps/implementation.tex

Lines changed: 2 additions & 1 deletion
@@ -436,7 +436,6 @@ \subsection{Wholesale Market}
 P^{r}_{avg} =\frac{\sum ^{1}_{i=24} P^{r}_{i} *Q^{r}_{i}}{\sum ^{1}_{i=24} Q^{r}_{i}}

 %TODO encouraging for exploration injecting into rewards
-%TODO DONE

 \end{equation}
 This is also calculated for the market prices $P^m_{i}$ and quantities $Q^m_i$ cleared during each timeslot for the whole market and then the
@@ -445,6 +444,8 @@ \subsection{Wholesale Market}
 %relationship between average price paid by broker and average market price for target timeslot
 R(t) = \frac{P^r_{avg}}{P^m_{avg}}
 \end{equation}
+
+%TODO DONE
 Using \ac {MDP}

 \ac {MDP} is actually with infinite states but for analytical concept, its irrelevant. Important is: Continuous states,
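Taken together, the two hunks above define the wholesale reward as a volume-weighted average of the broker's clearing prices, compared against the market-wide average for the same timeslots. A minimal standalone sketch of those two quantities, with the sum written in the conventional ascending order (an assumption about the intended bounds, which the committed source writes as ^{1}_{i=24}):

\begin{equation}
  P^{r}_{avg} = \frac{\sum_{i=1}^{24} P^{r}_{i} \, Q^{r}_{i}}{\sum_{i=1}^{24} Q^{r}_{i}},
  \qquad
  R(t) = \frac{P^{r}_{avg}}{P^{m}_{avg}}
\end{equation}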

src/chaps/powertac.tex

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 balancing operations. Figure ~\ref{fig:powertacoverview} summarizes this ecosystem.

 \begin{figure}[!h]%!h \centering
-\includegraphics[width=0.9\textwidth]{powerTACScenarioOverview.png} \caption{\ac{PowerTAC} overview of markets}
+\includegraphics[width=0.9\textwidth]{powerTACScenarioOverview.png} \caption{PowerTAC overview of markets}
 \label{fig:powertacoverview} \end{figure}

src/chaps/recurrentnn.tex

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@
 \begin{figure}[]
 \centering
 \includegraphics[width=0.8\linewidth]{img/rnn_concept.png}
-\caption{A recurrent neural network conceptualized. \emph{Left}: Circuit diagram where the black square represents a
+\caption[Recurrent Neural Network conceptualized]{. \emph{Left}: Circuit diagram where the black square represents a
 1 time-step delay. \emph{Right:} The same network unfolded where each node represents a particular time instance.
 Taken from \citet{Goodfellow-et-al-2016}.}
 \label{fig:rnn_concept}
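The optional argument added to \caption here is the short title that the newly introduced \listoffigures prints; the full caption still appears under the figure itself. A minimal sketch of the pattern (the short title is taken from the hunk, the remaining caption text is illustrative):

\caption[Recurrent Neural Network conceptualized]{A recurrent neural network conceptualized.
  \emph{Left}: circuit diagram with a one time-step delay.
  \emph{Right}: the same network unfolded in time.}

The same mechanism plausibly motivates the caption change in src/chaps/powertac.tex above: an acronym macro such as \ac{PowerTAC} left inside a caption would also be expanded when the entry is written to the list of figures. This is an assumption about the intent, not something stated in the commit.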

src/chaps/reinforcement.tex

Lines changed: 14 additions & 16 deletions
@@ -55,10 +55,9 @@ \subsection{Bellman Equation}%
 The Bellman Equation offers a way to describe the utility of each state in an \ac {MDP}. For this, it defines the
 utility of a state as the reward for the current state plus the sum of all future rewards discounted by $\gamma$.

-\[
+\begin{equation}
 U(s) = R(s) + \gamma \max_{a\in\mathcal{A}(s)} \sum_{s'}{P(s' \mid s,a)U(s')}
-%TODO numbers on equation?
-\]
+\end{equation}

 In the above equation, the \emph{max} operation selects the optimal action in regard to all possible actions. The
 Bellman equation is explicitly targeting \emph{discrete} state spaces. If the state transition graph is a cyclic graph
@@ -96,9 +95,9 @@ \subsection{Value and Policy Iteration}%
 assuming both the transition function $P(s' \ mid s,a) \forall s \in \mathcal{S}$ and the reward function $R(s)$ are
 known to the agent.
 In the algorithm, the utility of each state is updated based on the \emph{Bellman update} rule:
-\[
+\begin{equation}
 U_{i+1}(s) \gets R(s) + \gamma \max_{a \in \mathcal{A}(s)} \sum_{s'}{P(s' \mid s,a) U_i(s')}
-\]
+\end{equation}
 This needs to be performed for \emph{each} state during \emph{each} iteration. It is clear how quickly this becomes
 intractable as well when $\gamma$ is reasonably close to 1, meaning that also long-term rewards are taken into
 consideration.
@@ -133,10 +132,9 @@ \subsection{Temporal Difference Learning}%
 is called a trial
 \footnote{in newer \ac {RL} literature this is also called a \emph{trajectory} \citep{proximalpolicyopt, heess2017emergence} }.
 The update rule for the utility of each state is as follows:
-\[
+\begin{equation}
 U^\pi(s) \gets U^\pi(s) + \alpha(R(s) + \gamma U^\pi(s') - U^\pi(s))
-\]
-
+\end{equation}
 Where $\alpha$ is the learning rate and $U^\pi$ the utility under the execution of $\pi(s)$ in state $s$. This only
 updates the utilities based on the observed transitions so if the unknown transition function sometimes leads to
 extremely negative rewards through rare transitions, this is unlikely to be captured. However, with sufficiently many
@@ -148,9 +146,9 @@ \subsection{Exploration}%

 The above learning approach has one weakness: It is only based on observed utilities. If $\pi$ follows the pattern of
 always choosing the action that leads to the highest expected $U_{i+1}$, i.e.
-\[
+\begin{equation}
 \pi(s) = \max_{a \in \mathcal{A}(s)}P(s' \mid s, a)U(s')
-\]
+\end{equation}
 then it will never explore possible alternatives and will very quickly get stuck on a rigid action
 pattern mapping each state to a resulting action. To avoid this, the concept of \emph{exploration} has been introduced.
 There are many approaches to encourage exploration. The simplest is to define a factor $\epsilon$ which defines the
@@ -184,18 +182,18 @@ \subsection{Q Learning}%
 (i.e. learn what a good policy is), this becomes problematic if the transition function is not known. An alternative
 model is called \emph{Q-Learning} which is a form of Temporal Difference Learning. It learns an action-utility value
 instead of simply the values. The relationship between this \emph{Q-Value} and the former value of a state is simply
-\[
+\begin{equation}
 U(s) = \max_{a}Q(s,a)
-\]
+\end{equation}
 so the value of a state is that of the highest Q-Value. The benefit of this approach is that it does not require a model
 of how the world works, it therefore is called a \emph{model-free} method. The update rule for the Q-Values is simply
 the Bellman equation with $U(s)$ and $U(s')$ replaced with $Q(s,a)$ and $Q(s',a')$ respectively.

 The update rules for the Q-Value approach are related to the Temporal Difference Learning rules but include a $\max$
 operator
-\[
+\begin{equation}
 Q(s,a) \gets Q(s,a) + \alpha(R(s) + \gamma \max_{a'}Q(s', a') - Q(s,a))
-\]
+\end{equation}
 An alternative version is the reduction of the above equation by removing the $\max$ operator. This results in the
 \emph{actual} action being considered instead of the one that the policy believes to be the best. Q-Learning is
 \emph{off-policy} while the latter version, called \ac {SARSA}, is \emph{on-policy}. The distinction has a significant
@@ -233,9 +231,9 @@ \subsection{Policy Search and Policy Gradient Methods}%
 $\hat{A}_t$ to create an estimator for the policy gradient:


-\begin{equation*}
+\begin{equation}
 \hat{g} \ =\ \hat{\mathbb{E}}_{t} \ \left[ \nabla _{\theta }\log \pi _{\theta }( a_{t} \ \mid s_{t})\hat{A}_{t} \right]
-\end{equation*}
+\end{equation}

 where $\hat{A}_t$ describes the advantage of taking one action over another in a given state. It can therefore be
 described as an \emph{actor-critic architecture}, because $A(a_t, s_t) = Q(a_t,s_t) - V(s_t)$, meaning that the
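Every substitution in this file follows the same pattern: \[ ... \] produces unnumbered display math, whereas the equation environment numbers the formula and lets it be referenced, which also resolves the removed %TODO about equation numbers. A minimal sketch of the variants involved, assuming amsmath is loaded as in head.tex (the label name is illustrative):

\[ U(s) = \max_{a} Q(s,a) \]   % unnumbered display math
\begin{equation}
  U(s) = \max_{a} Q(s,a)       % numbered; reference it with \eqref{eq:state-value}
  \label{eq:state-value}
\end{equation}
\begin{equation*}
  U(s) = \max_{a} Q(s,a)       % amsmath starred form: same layout, no number
\end{equation*}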

src/head.tex

Lines changed: 20 additions & 20 deletions
@@ -2,7 +2,7 @@
 \usepackage{tabularx}
 \usepackage{minted}
 \usepackage{amsmath}
-\usepackage[nolist,nohyperlinks]{acronym}
+\usepackage[printonlyused, nohyperlinks]{acronym}
 \usepackage{amssymb}
 \usepackage{listings}
 \input{snippets/tikz.tex}
@@ -17,30 +17,30 @@
 \usepackage[bookmarks]{hyperref}
 \graphicspath{{img/}}

-\input{acronyms.tex}

 %adapting the article class to Ketter requirements
 %\usepackage{showframe}
 \usepackage[left=5cm, top=2cm, bottom=2cm, right=2cm]{geometry}
 \usepackage{setspace}
 \onehalfspacing

-\lstset{
-basicstyle=\footnotesize, % the size of the fonts that are used for the code
-breakatwhitespace=false, % sets if automatic breaks should only happen at whitespace
-breaklines=true, % sets automatic line breaking
-captionpos=b, % sets the caption-position to bottom
-% deletekeywords={...}, % if you want to delete keywords from the given language
-% escapeinside={\%*}{*)}, % if you want to add LaTeX within your code
-% frame=single, % adds a frame around the code
-keepspaces=true, % keeps spaces in text, useful for keeping indentation of code (possibly needs columns=flexible)
-% keywordstyle=\color{blue}, % keyword style
-numbers=left, % where to put the line-numbers; possible values are (none, left, right)
-numbersep=5pt, % how far the line-numbers are from the code
-rulecolor=\color{black}, % if not set, the frame-color may be changed on line-breaks within not-black text (e.g. comments (green here))
-showspaces=false, % show spaces everywhere adding particular underscores; it overrides 'showstringspaces'
-showstringspaces=false, % underline spaces within strings only
-showtabs=false, % show tabs within strings adding particular underscores
-stepnumber=1, % the step between two line-numbers. If it's 1, each line will be numbered
-tabsize=2, % sets default tabsize to 2 spaces
+\lstset{
+basicstyle=\footnotesize, % the size of the fonts that are used for the code
+breakatwhitespace=false, % sets if automatic breaks should only happen at whitespace
+breaklines=true, % sets automatic line breaking
+captionpos=b, % sets the caption-position to bottom
+% deletekeywords={...}, % if you want to delete keywords from the given language
+% escapeinside={\%*}{*)}, % if you want to add LaTeX within your code
+% frame=single, % adds a frame around the code
+keepspaces=true, % keeps spaces in text, useful for keeping indentation of code (possibly needs columns=flexible)
+% keywordstyle=\color{blue}, % keyword style
+numbers=left, % where to put the line-numbers; possible values are (none, left, right)
+numbersep=5pt, % how far the line-numbers are from the code
+rulecolor=\color{black}, % if not set, the frame-color may be changed on line-breaks within not-black text (e.g. comments (green here))
+showspaces=false, % show spaces everywhere adding particular underscores; it overrides 'showstringspaces'
+showstringspaces=false, % underline spaces within strings only
+showtabs=false, % show tabs within strings adding particular underscores
+stepnumber=1, % the step between two line-numbers. If it's 1, each line will be numbered
+tabsize=2, % sets default tabsize to 2 spaces
 }
+
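Besides the re-indentation of the \lstset block, two functional changes happen in this file: the acronym package option changes from nolist to printonlyused, so only acronyms actually referenced with \ac{...} appear in the printed list, and \input{acronyms.tex} is removed from the preamble (it reappears in main.tex below, placing the list in the front matter). A minimal sketch of how the acronym package is driven under these options; the acronym definitions are illustrative, not taken from the project's acronyms.tex:

% preamble
\usepackage[printonlyused, nohyperlinks]{acronym}

% acronyms.tex -- the list is printed wherever this environment appears
\begin{acronym}
  \acro{MDP}{Markov Decision Process}
  \acro{RL}{Reinforcement Learning}
\end{acronym}

% in the body: the first \ac{MDP} prints "Markov Decision Process (MDP)",
% later uses print just "MDP", and unused entries are dropped from the list
... can be modelled as an \ac{MDP} ...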

src/main.lol

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+\contentsline {listing}{\numberline {1}{\ignorespaces Basic Keras 2 layer dense NN example}}{23}{listing.1}
+\contentsline {listing}{\numberline {2}{\ignorespaces Click sample declaration}}{23}{listing.2}
+\contentsline {listing}{\numberline {3}{\ignorespaces Turning the current server snapshot into a docker image}}{28}{listing.3}

src/main.tex

Lines changed: 5 additions & 1 deletion
@@ -7,7 +7,11 @@
 \input{cover.tex}
 \pagenumbering{Roman}
 \input{abstract.tex}
-\printglossaries
+\printacronyms
+\listoffigures
+\listoftables
+\listoflistings
+\input{acronyms.tex}
 \input{content.tex}
 \pagenumbering{arabic}
 \input{chaps/body.tex}
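The front matter now runs abstract, acronym list, list of figures, list of tables and list of listings under Roman page numbers, before the arabic-numbered body. \listoffigures and \listoftables are standard LaTeX; \listoflistings presumably comes from the minted package loaded in head.tex (its entries are what ended up in the committed main.lol), and \printacronyms is assumed to be defined by the project's own acronym setup rather than shown in this commit. A commented sketch of the resulting ordering, using the file names from the diff:

\pagenumbering{Roman}     % front matter pages: I, II, III, ...
\input{abstract.tex}
\printacronyms            % assumed project-defined command for the acronym list
\listoffigures
\listoftables
\listoflistings           % list of code listings (entries recorded in main.lol)
\input{acronyms.tex}      % previously \input from head.tex
\input{content.tex}
\pagenumbering{arabic}    % body pages restart at 1
\input{chaps/body.tex}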
