src/chaps/implementation.tex
Lines changed: 91 additions & 28 deletions
@@ -1,20 +1,20 @@
1
1
The following chapter will describe the concepts and reasons behind various components needed to allow a broker to
2
-
leverage modern reinforcement learning tools in the \ac {PowerTAC} environment. Generally, current state-of-the-art
3
-
algorithms for \ac {RL} are available in Python \citep{baselines}, leveraging both the TensorFlow library and, in one
4
-
project, the Keras high-level abstraction library \citep{plappert2016kerasrl}.
2
+
leverage modern reinforcement learning tools in the \ac {PowerTAC} environment. Current state-of-the-art
3
+
algorithms for \ac {RL}, available in Python \citep{baselines}, are used. These leverage both the TensorFlow library and, in one project, the Keras high-level abstraction library \citep{plappert2016kerasrl}.
5
4
6
-
In general, a considerable amount of work was invested into allowing for successful communication between an agent
7
-
written in Python and the \ac {PowerTAC} systems which are Java based. The research and its results are summarized in
8
-
Section~\ref{sec:connecting_python_agents_to_powertac}. Because of this additional complexity, the thesis was
5
+
In general, a considerable amount of work was invested in enabling communication between an agent
6
+
written in Python and the \ac {PowerTAC} systems, which are Java-based. The preliminary research and its results are summarized in
7
+
Section~\ref{sec:connecting_python_agents_to_powertac}. Because of this additional complexity, the practical part of the thesis was
9
8
restructured to allow for a successful contribution to the \ac {RL} field by performing a form of \emph{what-if}
10
9
analysis in the wholesale market which is described in Section~\ref{sub:wholesale_market}. The Python environment has
11
10
been constructed so that future developers can leverage it as a framework for developing a fully capable
12
11
agent that acts in all markets.
13
12
14
-
The overall architecture for the agent is composed of three key parts: The environment module which holds all known
15
-
information about the environment of the broker. This is used to enable all learning components. A communcation module
16
-
bridges the environment module and the \ac {PowerTAC} environment to hide communication overhead from the agent code.
17
-
Finally, the agent components module holds all learning components such as the wholesale trader, the demand estimator
13
+
The overall architecture for the agent is composed of three key modules. First, the environment module hosts all known
14
+
information about the environment of the broker. This is used by all learning components. Second, a communication module
15
+
bridges the environment module and the \ac {PowerTAC} environment to hide communication overhead from the agent code,
16
+
letting the learning components access the environment as if it were not remotely defined.
17
+
Third, the agent components module holds all learning components such as the wholesale trader, the demand estimator
18
18
and the tariff manager. In the scope of this thesis, only the demand estimator and the wholesale trader were implemented
19
19
but the framework allows for the additional components to be easily implemented. The architecture is visualized in
20
20
Figure~\ref{fig:agentframework}.
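
As a rough illustration of the three-module layout described above, the following Python sketch shows one way the modules could interact. It is a minimal sketch under stated assumptions: the names \texttt{Environment}, \texttt{CommunicationModule} and \texttt{WholesaleTrader}, the message dictionaries and their keys are all hypothetical and are not taken from the actual broker code. In particular, the real communication module would wrap the \ac {GRPC} connection rather than receive plain dictionaries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Environment:
    """Holds everything the broker currently knows (hypothetical fields)."""
    current_timeslot: int = 0
    cleared_prices: Dict[int, float] = field(default_factory=dict)


class CommunicationModule:
    """Applies incoming messages to the environment and notifies listeners.

    Assumption: messages arrive as plain dicts; a real implementation
    would receive them over the network instead.
    """

    def __init__(self, env: Environment) -> None:
        self.env = env
        self.listeners: List[Callable[[Environment], None]] = []

    def subscribe(self, listener: Callable[[Environment], None]) -> None:
        self.listeners.append(listener)

    def handle_message(self, message: dict) -> None:
        # Update the shared environment first ...
        if message["type"] == "timeslot_update":
            self.env.current_timeslot = message["timeslot"]
        elif message["type"] == "cleared_trade":
            self.env.cleared_prices[message["timeslot"]] = message["price"]
        # ... then let every learning component react to the new state.
        for listener in self.listeners:
            listener(self.env)


class WholesaleTrader:
    """Placeholder learning component that only reads from the environment."""

    def __init__(self) -> None:
        self.seen_timeslots: List[int] = []

    def on_update(self, env: Environment) -> None:
        self.seen_timeslots.append(env.current_timeslot)


def demo():
    env = Environment()
    comm = CommunicationModule(env)
    trader = WholesaleTrader()
    comm.subscribe(trader.on_update)
    comm.handle_message({"type": "timeslot_update", "timeslot": 360})
    comm.handle_message({"type": "cleared_trade", "timeslot": 361, "price": 42.5})
    return env, trader
```

The point of the sketch is that the learning component never touches the transport layer: it only sees the shared environment object, which is what allows further components to be added without changing the communication code.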
@@ -33,6 +33,7 @@ \section{Tools}
33
33
Keras and TensorFlow to allow for easy creation and adaption of the learning models,
34
34
\ac {GRPC} to communicate with the Java components of the competition and
35
35
\emph{Click} to create a CLI that allows the triggering of various components of the broker.
36
+
36
37
%TODO IF Kubernetes is used, I need to complete it. But what about CRIU?
37
38
%Kubernetes to easily scale several instances across the cloud.
38
39
%By transferring the components into the cloud, it is also
@@ -98,9 +99,12 @@ \subsection{CRIU}%
98
99
of the \ac {PowerTAC} simulation at a given point in time. Because \ac{CRIU} is also integrated into Docker, creating
99
100
containers for various components of the competition (i.e. server and brokers) and freezing all of them in a coordinated
100
101
manner is very helpful. This allows for two ``what-if'' scenarios to play out at a given point in time where the results
101
-
can be compared \citep{criu}.
102
+
can be compared \citep{criu}. A typical scenario for the technology is the live migration of running applications across
103
+
server infrastructures. In theory, a checkpoint of an application allows the perfect recreation of the application state
104
+
even after a complete reboot of the machine or the moving of the application to a different host with identical
105
+
environment settings.
102
106
103
-
\subsection{Docker}%
107
+
\subsection{Docker}
104
108
\label{sub:docker}
105
109
106
110
Docker allows the creation of isolated, transferable images that include everything an application requires to run. A
@@ -110,11 +114,7 @@ \subsection{Docker}%
110
114
application layer, letting all containers run in the same kernel and therefore making use of the existing resources in a
111
115
more efficient way. Because \ac{CRIU} is integrated into Docker
112
116
\footnote{At the time of writing, CRIU support is experimental in Docker.},
113
-
containers can be
114
-
%TODO continue
115
-
116
-
117
-
Docker
117
+
containers can be stored to disk using the \emph{checkpoint} feature.
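
To make the checkpoint step more concrete, the following Python sketch builds the Docker command lines for creating a checkpoint and for restarting a container from it. It is a minimal sketch, assuming a Docker daemon with experimental features enabled and \ac{CRIU} installed on the host. The container name \texttt{powertac-server} and the checkpoint name \texttt{ts360} are hypothetical, and the helper functions are illustrative rather than part of the broker code.

```python
import subprocess
from typing import List


def checkpoint_command(container: str, checkpoint: str) -> List[str]:
    """Build the command that writes a checkpoint of a running container.

    By default Docker stops the container after checkpointing it.
    """
    return ["docker", "checkpoint", "create", container, checkpoint]


def restore_command(container: str, checkpoint: str) -> List[str]:
    """Build the command that starts a container from a stored checkpoint."""
    return ["docker", "start", "--checkpoint", checkpoint, container]


def run(command: List[str], dry_run: bool = True) -> str:
    """Return the command as a string; execute it only if dry_run is False."""
    if not dry_run:
        # Raises CalledProcessError if Docker reports a failure.
        subprocess.run(command, check=True)
    return " ".join(command)


if __name__ == "__main__":
    # Hypothetical names; nothing is executed because dry_run defaults to True.
    print(run(checkpoint_command("powertac-server", "ts360")))
    print(run(restore_command("powertac-server", "ts360")))
```

Keeping the command construction separate from its execution makes it possible to inspect or log the exact commands before they are run against several containers in a coordinated manner.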
118
118
119
119
\section{Preprocessing}
120
120
@@ -213,18 +213,78 @@ \section{Connecting Python agents to PowerTAC}%
213
213
started again today, it might have been simpler to first define a set of message types in a language such as Protocol
214
214
Buffers, the underlying technology of \ac {GRPC}, but because all current systems rely on \ac {JMI} communication, it is
215
215
better to manually recreate these translators. The \ac {XML} parsing libraries provided by Python can be used to parse
216
-
the \ac {XML} that is received. \section{Parallelizing environments with Kubernetes}
216
+
the \ac {XML} that is received.
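
A manually written translator of this kind could look roughly like the following Python sketch, which uses the standard library module \texttt{xml.etree.ElementTree}. It is a minimal sketch, assuming a simplified message layout: the element name \texttt{timeslot-update}, its attributes \texttt{firstEnabled} and \texttt{lastEnabled}, and the \texttt{TimeslotUpdate} class are hypothetical stand-ins and may not match the actual \ac {PowerTAC} message format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class TimeslotUpdate:
    """Python-side representation of one (hypothetical) server message."""
    first_enabled: int
    last_enabled: int


def parse_timeslot_update(xml_text: str) -> TimeslotUpdate:
    """Translate one XML message into a typed Python object."""
    root = ET.fromstring(xml_text)
    if root.tag != "timeslot-update":
        raise ValueError(f"unexpected message type: {root.tag}")
    # Attribute values arrive as strings and are converted explicitly.
    return TimeslotUpdate(
        first_enabled=int(root.attrib["firstEnabled"]),
        last_enabled=int(root.attrib["lastEnabled"]),
    )


if __name__ == "__main__":
    example = '<timeslot-update firstEnabled="361" lastEnabled="384"/>'
    print(parse_timeslot_update(example))
```

One such function per message type keeps the translation explicit, at the cost of having to update the Python side by hand whenever a message definition changes on the Java side.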
217
217
218
-
\section{Learning Components}
218
+
%\section{Parallelizing environments with Kubernetes}
219
+
220
+
\section{Using Docker to make the competition portable}%
While \citet{tactexurieli2016mdp} have defined the entire simulation as a \ac {POMDP} (although they interpret it as a
237
297
\ac {MDP} for ease of implementation) with all three markets integrated into one problem, I believe breaking the problem
238
-
into disjunct subproblems is a better approach as each of them can be looked at in separation and a learning algorithm
239
-
can be applied to improve performance without needing to consider potentially other areas of decision making. One such
240
-
example is the estimation of fitness for a given tariff in a given environment. A tariffs' competitiveness in a given
241
-
environment is independent of the wholesale or balancing trading strategy of the agent since the customers do not care
242
-
about the profitability of the agent or how often it receives balancing penalties. While the broker might incur large
243
-
losses if a tariff is too competitive (by offering prices that are below the profitablity line of the broker), such a
244
-
tariff would theoretically be quiet competitive and should therefore be rated as such. The question which of the tariffs
245
-
to actually offer on the market is a separate problem.
298
+
into disjoint sub-problems is a better approach as each of them can be examined in isolation and a learning algorithm
299
+
can be applied to improve performance without needing to consider other areas of decision making. A
300
+
subsequent algorithm could then be trained to perform the same actions as one unified decision making system according
301
+
to the concepts of \emph{Curriculum Learning} \citep{matiisen2017teacher} and \emph{Transfer Learning}
302
+
\citep{parisotto2015actor}. Such a unified algorithm is not part of this work.
303
+
To justify this separation of concerns, I refer to the estimation of fitness for a given tariff in a given environment. A tariff's competitiveness in a
304
+
given environment is independent of the wholesale or balancing trading strategy of the agent since the customers do not
305
+
care about the profitability of the agent or how often it receives balancing penalties. While the broker might incur
306
+
large losses if a tariff is too competitive (by offering prices that are below the profitability line of the broker),
307
+
such a tariff would theoretically be quite competitive and should therefore be rated as such. The question of which of the
308
+
tariffs to actually offer on the market is a separate problem that balances competitiveness against profitability.