Commit 4d9c72f

Pascal Brokmeier authored and committed
added further tech in tools
1 parent 3fbbba9 commit 4d9c72f

3 files changed with 113 additions and 29 deletions

src/bibliography.bib

Lines changed: 21 additions & 0 deletions
@@ -4,6 +4,19 @@ @misc{docker
44
author = {Docker Inc},
55
note = {Accessed: 2018-04-20}
66
}
7+
@misc{openai2500,
8+
title = {Scaling Kubernetes to 2,500 Nodes},
9+
howpublished = {\url{https://blog.openai.com/scaling-kubernetes-to-2500-nodes/}},
10+
author = {Berner, Christopher},
11+
note = {Accessed: 2018-04-20}
12+
}
13+
14+
@misc{vagrant,
15+
title = {Vagrant},
16+
howpublished = {\url{https://www.vagrantup.com/intro/index.html}},
17+
author = {HashiCorp},
18+
note = {Accessed: 2018-04-20}
19+
}
720

821
@misc{criu,
922
title = {CRIU},
@@ -538,3 +551,11 @@ @misc{plappert2016kerasrl
538551
journal = {GitHub repository},
539552
howpublished = {\url{https://github.com/keras-rl/keras-rl}},
540553
}
554+
555+
@article{parisotto2015actor,
556+
title = {Actor-mimic: Deep multitask and transfer reinforcement learning},
557+
author = {Parisotto, Emilio and Ba, Jimmy Lei and Salakhutdinov, Ruslan},
558+
journal = {arXiv preprint arXiv:1511.06342},
559+
year = {2015}
560+
}
561+

src/chaps/body.tex

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ \section{Reinforcement Learning}
3737

3838
%\section{Competitive Simulations}%as a tool of experimental research into AI
3939

40-
\section{\ac {PowerTAC}: A Competitive Simulation}
40+
\section{PowerTAC: A Competitive Simulation}
4141
\input{chaps/powertac.tex}
4242

4343
\chapter{Implementation}

src/chaps/implementation.tex

Lines changed: 91 additions & 28 deletions
@@ -1,20 +1,20 @@
11
The following chapter will describe the concepts and reasons behind various components needed to allow a broker to
2-
leverage modern reinforcement learning tools in the \ac {PowerTAC} environment. Generally, current state-of-the-art
3-
algorithms for \ac {RL} are available in Python \citep{baselines}, leveraging both the TensorFlow library and, in one
4-
project, the Keras high-level abstraction library \citep{plappert2016kerasrl}.
2+
leverage modern reinforcement learning tools in the \ac {PowerTAC} environment. Current state-of-the-art
3+
algorithms for \ac {RL}, available in Python \citep{baselines}, are used. These leverage both the TensorFlow library and, in one project, the Keras high-level abstraction library \citep{plappert2016kerasrl}.
54

6-
In general, a considerable amount of work was invested into allowing for successful communication between an agent
7-
written in Python and the \ac {PowerTAC} systems which are Java based. The research and its results are summarized in
8-
Section~\ref{sec:connecting_python_agents_to_powertac}. Because of this additional complexity, the thesis was
5+
In general, a considerable amount of work was invested in enabling communication between an agent
6+
written in Python and the \ac {PowerTAC} systems, which are Java-based. The preliminary research and its results are summarized in
7+
Section~\ref{sec:connecting_python_agents_to_powertac}. Because of this additional complexity, the practical part of the thesis was
98
restructured to allow for successful contribution to the \ac {RL} field by performing a form of \emph{what-if}
109
analysis in the wholesale market which is described in Section~\ref{sub:wholesale_market}. The Python environment has
1110
been constructed in a way to allow for future developers to leverage it as a framework for developing a fully capable
1211
agent that acts in all markets.
1312

14-
The overall architecture for the agent is composed of three key parts: The environment module which holds all known
15-
information about the environment of the broker. This is used to enable all learning components. A communcation module
16-
bridges the environment module and the \ac {PowerTAC} environment to hide communication overhead from the agent code.
17-
Finally, the agent components module holds all learning components such as the wholesale trader, the demand estimator
13+
The overall architecture for the agent is composed of three key modules. First, the environment module hosts all known
14+
information about the environment of the broker. This is used by all learning components. Second, a communication module
15+
bridges the environment module and the \ac {PowerTAC} environment to hide communication overhead from the agent code,
16+
letting the learning components access the environment as if it were local rather than remote.
17+
Third, the agent components module holds all learning components such as the wholesale trader, the demand estimator
1818
and the tariff manager. In the scope of this thesis, only the demand estimator and the wholesale trader were implemented
1919
but the framework allows for the additional components to be easily implemented. The architecture is visualized in
2020
Figure~\ref{fig:agentframework}.
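The three-module split described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the thesis code: the class names \texttt{Environment}, \texttt{Communication} and \texttt{DemandEstimator}, the \texttt{on\_message} hook and the \texttt{"usage"} key are all hypothetical and chosen only to show how a communication layer could keep the learning components unaware of the remote server.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Environment:
    """Environment module: holds everything the broker knows about the simulation."""
    state: Dict[str, Any] = field(default_factory=dict)

    def update(self, key: str, value: Any) -> None:
        self.state[key] = value


class Communication:
    """Communication module: bridges the remote server and the local Environment.

    Hypothetical sketch: a real adapter would call on_message from its
    network layer; here the method is called directly.
    """

    def __init__(self, env: Environment) -> None:
        self.env = env
        self.listeners: List[Callable[[str, Any], None]] = []

    def subscribe(self, listener: Callable[[str, Any], None]) -> None:
        self.listeners.append(listener)

    def on_message(self, key: str, value: Any) -> None:
        # Update the local view first, then notify the learning components.
        self.env.update(key, value)
        for listener in self.listeners:
            listener(key, value)


class DemandEstimator:
    """Agent component: only sees the local Environment, never the network."""

    def __init__(self, env: Environment) -> None:
        self.env = env
        self.observations: List[float] = []

    def handle(self, key: str, value: Any) -> None:
        if key == "usage":  # assumed message key, for illustration only
            self.observations.append(float(value))


env = Environment()
comm = Communication(env)
estimator = DemandEstimator(env)
comm.subscribe(estimator.handle)
comm.on_message("usage", 1.5)
```

The design choice illustrated here is that components register callbacks with the communication module and read state from the environment module, so swapping the transport would not touch the learning code.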
@@ -33,6 +33,7 @@ \section{Tools}
3333
Keras and TensorFlow to allow for easy creation and adaption of the learning models,
3434
\ac {GRPC} to communicate with the Java components of the competition and
3535
\emph{Click} to create a command-line interface that allows the triggering of various components of the broker.
36+
3637
%TODO IF Kubernetes is used, I need to complete it. But what about CRIU?
3738
%Kubernetes to easily scale several instances across the cloud.
3839
%By transfering the components into the cloud, it is also
@@ -98,9 +99,12 @@ \subsection{CRIU}%
9899
of the \ac {PowerTAC} simulation in a given point in time. Because \ac{CRIU} is also integrated into Docker, creating
99100
containers for various components of the competition (i.e. server and brokers) and freezing all of them in a coordinated
100101
manner is very helpful. This allows for two ``what if'' scenarios to play out at a given point in time where the results
101-
can be compared \citep{criu}.
102+
can be compared \citep{criu}. A typical scenario for the technology is the live migration of running applications across
103+
server infrastructures. In theory, a checkpoint of an application allows the perfect recreation of the application state
104+
even after a complete reboot of the machine or the moving of the application to a different host with identical
105+
environment settings.
102106

103-
\subsection{Docker}%
107+
\subsection{Docker}
104108
\label{sub:docker}
105109

106110
Docker allows the creation of isolated, transferable images that include everything an application requires to run. A
@@ -110,11 +114,7 @@ \subsection{Docker}%
110114
application layer, letting all containers run in the same kernel and therefore makes use of the existing resources in a
111115
more efficient way. Because \ac{CRIU} is integrated into Docker
112116
\footnote{at the time of writing, CRIU support is experimental in Docker},
113-
containers can be
114-
%TODO continue
115-
116-
117-
Docker
117+
containers can be stored to disk using the \emph{checkpoint} feature.
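As a hedged illustration of the checkpoint feature, the sketch below only builds the argument lists for the experimental \texttt{docker checkpoint create} and \texttt{docker start --checkpoint} commands; it does not run them. The container name \texttt{powertac-server} and the checkpoint name \texttt{slot-100} are assumptions made for the example, and a host would need Docker's experimental features and CRIU installed for the commands to succeed.

```python
from typing import List


def checkpoint_command(container: str, name: str, leave_running: bool = False) -> List[str]:
    """Build the argv for storing a container's state to disk."""
    cmd = ["docker", "checkpoint", "create"]
    if leave_running:
        # Keep the container running after the checkpoint has been written.
        cmd.append("--leave-running")
    return cmd + [container, name]


def restore_command(container: str, name: str) -> List[str]:
    """Build the argv for restarting a container from a stored checkpoint."""
    return ["docker", "start", "--checkpoint", name, container]


# Hypothetical names, used only for illustration.
create_argv = checkpoint_command("powertac-server", "slot-100")
restore_argv = restore_command("powertac-server", "slot-100")
```

The resulting lists could be passed to \texttt{subprocess.run(argv, check=True)}; keeping command construction separate from execution makes the coordination logic testable without a Docker daemon.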
118118

119119
\section{Preprocessing}
120120

@@ -213,18 +213,78 @@ \section{Connecting Python agents to PowerTAC}%
213213
started again today, it might have been simpler to first define a set of message types in a language such as Protocol
214214
Buffers, the underlying technology of \ac {GRPC}, but because all current systems rely on \ac {JMI} communication, it is
215215
better to manually recreate these translators. The \ac {XML} parsing libraries provided by Python can be used to parse
216-
the \ac {XML} that is received. \section{Parallelizing environments with Kubernetes}
216+
the \ac {XML} that is received.
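A minimal sketch of such a translator, using only Python's standard \texttt{xml.etree.ElementTree} module, could look as follows. The sample message, its element name and its attributes are invented for illustration and are not taken from the actual \ac {PowerTAC} message set; the function \texttt{parse\_message} is likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def parse_message(raw: str) -> Dict[str, Any]:
    """Turn one XML message into a plain dictionary.

    The element tag is used as the message type; attributes and the tags
    of direct children are copied over unchanged.
    """
    root = ET.fromstring(raw)
    return {
        "type": root.tag,
        "attributes": dict(root.attrib),
        "children": [child.tag for child in root],
    }


# Invented sample message, for illustration only.
sample = '<timeslot-update id="42" firstEnabled="361" lastEnabled="384"><postedTime/></timeslot-update>'
message = parse_message(sample)
```

A dispatcher could then map \texttt{message["type"]} to a handler per message type, which keeps the hand-written translators small and independent of each other.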
217217

218-
\section{Learning Components}
218+
%\section{Parallelizing environments with Kubernetes}
219+
220+
\section{Using Docker to make the competition portable}%
221+
\label{sec:using_docker_to_make_the_competition_portable}
222+
223+
To run a competition on a local machine, one must install several components: Maven, Java 8 and all of the brokers as
224+
well as one's own technology stack. If the resource demands of this set of components exceed the locally available computing power,
225+
the stack needs to be moved to a server with sufficient computing power. While tools like Vagrant allow
226+
the configuration and setup of environments to quickly allow new developers to start working with a set of tools in a
227+
given project \citep{vagrant}, they require virtual machines, which have significant overhead in comparison to container
228+
technologies. If the competition is abstracted into Docker images, tools like Kubernetes or Docker Compose can quickly
229+
instantiate a competition on any machine, provided it has enough resources and a Docker runtime installed \citep{docker}.
230+
231+
To create a Docker image for the server, the \texttt{Dockerfile} in Listing~\ref{lst:servertodocker} can be used.
232+
\footnote{All resources regarding the container technologies can be found under \url{https://github.com/pascalwhoop/powertac-kubernetes}.}
233+
234+
\begin{listing}[h]
235+
236+
\begin{minted}[linenos,numbersep=5pt,frame=lines,framesep=2mm]{Dockerfile}
237+
FROM openjdk:alpine
238+
LABEL maintainer=pascalwhoop
239+
LABEL name=powertac-server
240+
241+
#adding all the needed dependencies
242+
RUN apk add --no-cache bash vim git maven python
243+
244+
#download the server-distribution from github into /powertac, matching the WORKDIR used below
WORKDIR /powertac
245+
RUN mkdir data && \
246+
git clone https://github.com/powertac/server-distribution
247+
248+
#build once, saves all maven dependencies in image
249+
RUN cd server-distribution && \
250+
mvn -Pcli
251+
252+
WORKDIR /powertac/server-distribution
253+
COPY bootstrap-data.xml ./
254+
COPY init.sh ./
255+
COPY server.properties ./
256+
257+
EXPOSE 8080 61616
258+
#and start it up
259+
CMD /powertac/server-distribution/init.sh
260+
\end{minted}
261+
\caption{Turning the current server snapshot into a docker image}
262+
\label{lst:servertodocker}
263+
\end{listing}
219264

265+
The benefit of this approach is that tools like Kubernetes or Docker Swarm, both open-source, enterprise-level container management
266+
software, seamlessly allow for the creation of 1, 10 or 1000 instances. OpenAI, a deep learning research company, has
267+
successfully scaled Kubernetes to 2500 nodes to run their deep \ac{RL} systems \citep{openai2500}. As
268+
previously mentioned, Docker also integrates \ac{CRIU} which is required for the creation of snapshots of competition
269+
states.
270+
271+
%TODO implement redis as base for communication between components
272+
%\subsection{Redis and component messaging}%
273+
%\label{sub:redis_and_messaging}
274+
275+
276+
\section{Learning Components}
277+
\label{sec:learning_components}
220278
The components of the agent that have learning capabilities include:
221279

222280
\begin{description}
281+
%TODO will i still get to implement this? simply mimick an agents tariffs ... shouldn't be hard
223282
\item[Customer Market]: Generates actions with respect to the tariff market such as publishing,
224283
adapting and revoking tariffs. While the component is expected to have a positive impact on the performance of
225284
the broker, it was only implemented with the basic functionality of publishing the same tariffs as a selected
226285
competing broker, mimicking that broker's portfolio. It also creates usage predictions for a set of
227-
customers for other components.
286+
customers for other components. More generally, the framework provides for a tariff fitness evaluation component as well as a tariff
287+
selection component that weighs both competitiveness and expected profitability.
228288

229289
\item[Wholesale Market]: Places bids and asks for energy in the periodic double
230290
auction type market \citep{ketter2018powertac}. The component employs \ac {RL} techniques and uses the
@@ -235,14 +295,17 @@ \section{Learning Components}
235295

236296
While \citet{tactexurieli2016mdp} have defined the entire simulation as a \ac {POMDP} (although they interpret it as a
237297
\ac {MDP} for ease of implementation) with all three markets integrated into one problem, I believe breaking the problem
238-
into disjunct subproblems is a better approach as each of them can be looked at in separation and a learning algorithm
239-
can be applied to improve performance without needing to consider potentially other areas of decision making. One such
240-
example is the estimation of fitness for a given tariff in a given environment. A tariffs' competitiveness in a given
241-
environment is independent of the wholesale or balancing trading strategy of the agent since the customers do not care
242-
about the profitability of the agent or how often it receives balancing penalties. While the broker might incur large
243-
losses if a tariff is too competitive (by offering prices that are below the profitablity line of the broker), such a
244-
tariff would theoretically be quiet competitive and should therefore be rated as such. The question which of the tariffs
245-
to actually offer on the market is a separate problem.
298+
into disjoint sub-problems is a better approach, as each of them can be examined in isolation and a learning algorithm
299+
can be applied to improve performance without needing to consider other areas of decision making. A
300+
subsequent algorithm could then be trained to perform the same actions as one unified decision making system according
301+
to the concepts of \emph{Curriculum Learning} \citep{matiisen2017teacher} and \emph{Transfer Learning}
302+
\citep{parisotto2015actor}. Such a unified algorithm is not part of this work.
303+
To justify this separation of concerns, I refer to the estimation of fitness for a given tariff in a given environment. A tariff's competitiveness in a
304+
given environment is independent of the wholesale or balancing trading strategy of the agent since the customers do not
305+
care about the profitability of the agent or how often it receives balancing penalties. While the broker might incur
306+
large losses if a tariff is too competitive (by offering prices that are below the profitability line of the broker),
307+
such a tariff would theoretically be quite competitive and should therefore be rated as such. The question which of the
308+
tariffs to actually offer on the market is a separate problem that balances competitiveness against profitability.
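The separation argued for above can be illustrated with a deliberately simple sketch. Everything in it is an assumption made for the example: competitiveness is reduced to the fraction of competing prices a tariff undercuts, profitability to a per-unit margin, and the selection rule simply picks the most competitive tariff that is not loss-making. None of these functions are taken from the implementation.

```python
from typing import List, Optional


def competitiveness(price: float, competitor_prices: List[float]) -> float:
    """Rate a tariff from the customer's view only: share of competitors undercut.

    Note that the broker's own costs do not appear here at all.
    """
    if not competitor_prices:
        return 1.0
    undercut = sum(1 for p in competitor_prices if price < p)
    return undercut / len(competitor_prices)


def margin(price: float, unit_cost: float) -> float:
    """Per-unit profit from the broker's view only."""
    return price - unit_cost


def select_tariff(candidates: List[float], competitor_prices: List[float],
                  unit_cost: float) -> Optional[float]:
    """Separate selection step: most competitive tariff that is not loss-making."""
    profitable = [c for c in candidates if margin(c, unit_cost) >= 0]
    if not profitable:
        return None
    return max(profitable, key=lambda c: competitiveness(c, competitor_prices))


# Invented prices per kWh, for illustration only.
competitors = [0.09, 0.11, 0.13]
best = select_tariff([0.08, 0.10, 0.12], competitors, unit_cost=0.09)
```

In this toy setting the cheapest candidate receives the highest competitiveness rating even though it would be loss-making, and only the selection step rejects it, which mirrors the separation between rating and offering a tariff.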
246309

247310
\subsection{Tariff Market}
248311
