Commit 2afa67a

added implementation of demand estimator
1 parent e9ecb7e commit 2afa67a

File tree

11 files changed, +82 -13 lines changed


.gitignore

Lines changed: 1 addition & 0 deletions

@@ -8,6 +8,7 @@ tex/MITVersion/
 .DS_STORE
 node_modules/
 *.swp
+*.swo

 #latex ignores
 *.glsdefs

src/chaps/backpropagation.tex

Lines changed: 2 additions & 2 deletions

@@ -29,8 +29,8 @@
 concept of gradient descent algorithms. Because the activation function is most often \emph{soft}, to ensure
 differentiability and because a hard threshold creates a non-continuous function, the process of fitting the weights to
 minimize loss is called logistic regression \cite[p.729f.]{russell2016artificial}. For a detailed explanation of the gradient
-descent approach, I will refer to the works of \citeauthor{russell2016artificial} as well as
-\citeauthor{Goodfellow-et-al-2016}.
+descent approach, I will refer to the works of \citet{russell2016artificial} as well as
+\citet{Goodfellow-et-al-2016}.

src/chaps/body.tex

Lines changed: 4 additions & 0 deletions

@@ -20,6 +20,10 @@ \subsection{Learning Neural Networks and Backpropagation}
 \label{sec:Backpropagation}
 \input{chaps/backpropagation.tex}

+\section{Recurrent Neural Networks}%
+\label{sec:recurrent_neural_networks}
+\input{chaps/recurrentnn.tex}
+
 %TODO is this part of AI?
 \chapter{Reinforcement Learning}
 \section{Policy Search}

src/chaps/implementation.tex

Lines changed: 33 additions & 6 deletions

@@ -35,11 +35,17 @@ \section{Preprocessing}
 After the translation, the data is usually structured in a multi-dimensional array which can be read by numpy and
 processed with Keras. First, some preprocessing can be applied with scikit-learn to analyze the structure of the data as
 well as ensure the values that are fed to the \ac{NN} don't negatively impact the learning progress. The overall
-approach follows the recommendations of \citeauthor{Goodfellow-et-al-2016}.
+approach follows the recommendations of \citet{Goodfellow-et-al-2016}.
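One common preprocessing step of this kind is feature standardization, sketched here in plain Python as a minimal illustration; scikit-learn's `StandardScaler` performs the equivalent column-wise, and the example values are invented:

```python
def standardize(column):
    """Scale a feature column to zero mean and unit variance, so that
    no single feature dominates the NN's gradient updates."""
    mean = sum(column) / len(column)
    variance = sum((v - mean) ** 2 for v in column) / len(column)
    std = variance ** 0.5 or 1.0  # guard against constant columns
    return [(v - mean) / std for v in column]

# Illustrative raw feature values, e.g. a customer's hourly usage in kWh.
scaled = standardize([10.0, 20.0, 30.0])
```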

 \section{Connecting Python agents to PowerTAC}

-To connect an agent based on Python to the \ac{PowerTAC} systems, a new adapter needs to be developed. In 2018, a simple bridge was provided by the team that allowed external processes to communicate with the system through a bridge via the provided sample-broker. All messages received by the broker are written to a First in First Out pipe on the local file system and a second pipe is created to read messages from the external process. To also allow network based access, I created an alternative which is based on \ac{GRPC} to transmit the messages between the adapter and the final client. This lets many different languages communicate with the adapter via network connections \footnote{https://github.com/powertac/broker-adapter}
+To connect an agent based on Python to the \ac{PowerTAC} systems, a new adapter needs to be developed. In 2018, the
+team provided a simple bridge that allows external processes to communicate with the system via the
+provided sample-broker. All messages received by the broker are written to a First-In-First-Out pipe on the local file
+system, and a second pipe is created to read messages from the external process. To also allow network-based access, I
+created an alternative based on \ac{GRPC} to transmit the messages between the adapter and the final client.
+This lets many different languages communicate with the adapter via network
+connections\footnote{\url{https://github.com/powertac/broker-adapter-grpc}}.
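The FIFO-pipe side of such a bridge can be sketched with Python's standard library. This is a minimal illustration only: the pipe name and the XML payloads are hypothetical stand-ins, not the actual conventions of the sample-broker bridge.

```python
import os
import tempfile
import threading

def write_messages(path, messages):
    """Simulates the broker side: opening the FIFO for writing blocks
    until the external process opens it for reading."""
    with open(path, "w") as fifo:
        for msg in messages:
            fifo.write(msg + "\n")  # one serialized message per line

tmp = tempfile.mkdtemp()
out_pipe = os.path.join(tmp, "server-out")
os.mkfifo(out_pipe)  # create the named pipe on the local file system

# Broker writes in a background thread; the "external process" reads below.
writer = threading.Thread(
    target=write_messages,
    args=(out_pipe, ["<tariff-tx/>", "<timeslot-update/>"]),
)
writer.start()

with open(out_pipe) as fifo:
    received = [line.strip() for line in fifo]
writer.join()
```

A second pipe in the opposite direction would carry the agent's outgoing messages; the \ac{GRPC} variant replaces both pipes with a bidirectional network stream.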

 Because the programming language is different from the supplied sample-broker, many of the domain objects need to be redefined and some code redeveloped. The classes in \ac{PowerTAC} which are transferred between the client and the server are all annotated so that the \ac{XML} serializer can translate between the \ac{XML} and object variants without errors. This helps to recreate similar functionality for the needed classes in the Python environment. If the project were started again today, it might have been simpler to first define a set of message types in a language such as Protocol Buffers, the underlying technology of \ac{GRPC}, but because all current systems rely on \ac{JMS} communication, it is better to manually recreate these translators. The \ac{XML} parsing libraries provided by Python can be used to parse the \ac{XML} that is received.
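As a minimal illustration of such parsing with the standard library, consider the following sketch; the element and attribute names are invented stand-ins, not actual PowerTAC message fields:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a serialized PowerTAC message.
raw = '<tariff-spec id="42" powerType="CONSUMPTION"><rate minValue="-0.12"/></tariff-spec>'

root = ET.fromstring(raw)
spec_id = int(root.get("id"))                    # attributes map to object fields
power_type = root.get("powerType")
rate = float(root.find("rate").get("minValue"))  # nested elements map to child objects
```

In the real adapter, each message type would get its own translator that mirrors the annotations on the corresponding Java domain class.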
 \section{Parallelizing environments with Kubernetes}
@@ -56,15 +62,15 @@ \section{Agent Models}

 -- high-level agent \ac{RL} problem

-While \citeauthor{tactexurieli2016mdp} have defined the entire simulation as a \ac{POMDP} with all three markets
+While \citet{tactexurieli2016mdp} have defined the entire simulation as a \ac{POMDP} (although they interpret it as an \ac{MDP} for ease of implementation) with all three markets
 integrated into one problem, I believe breaking the problem into disjoint subproblems is a better approach, as each of
 them can be looked at in isolation and a learning algorithm can be applied to improve performance without needing to
 consider other areas of decision making. One such example is the estimation of fitness for a given tariff in
 a given environment. A tariff's competitiveness in a given environment is independent of the wholesale or balancing
 trading strategy of the agent, since the customers do not care about the profitability of the agent or how often it
-receives balancing penalties.
+receives balancing penalties. While the broker might incur large losses if a tariff is too competitive (by offering prices that are below the profitability line of the broker), such a tariff would theoretically be quite competitive and should therefore be rated as such. The question of which tariffs to actually offer on the market is a separate problem.

-\subsection{Customer Market}
+\subsection{Tariff Market}

 The goal of the customer market is to get as many subscribers as possible for the most profitable tariffs the broker
 offers on the market. The tariffs offered in the market compete for the limited number of customers available and every
@@ -91,10 +97,11 @@ \subsection{Customer Market}
 %tariffs %TODO really, I go genetic?
 \end{enumerate}

+%TODO not yet actually realized, still applicable?
 \subsubsection{Tariff fitness learning}
 To learn the fitness of a tariff while considering its environment, supervised learning techniques can be applied. To do
 this, features need to be created from the tariff's specifications and its competitive environment. Similar work has been
-done by \citeauthor{cuevas2015distributed} who discretized the tariff market in four variables describing the
+done by \citet{cuevas2015distributed}, who discretized the tariff market into four variables describing the
 relationships between the competitors and their broker.

 For my broker, because \ac{NN} can handle large state spaces, I create a more detailed description of the
@@ -124,6 +131,26 @@ \subsubsection{Tariff fitness learning}
 % how to avoid overwhelming of agent? output layer must be fairly large.
 %
 % time, energy, money, communication dimensions (and subdimensions)
+\subsubsection{Customer demand estimation}%
+\label{ssub:customer_demand_estimation}
+
+The simplest learning component is the demand estimator. This component has no dependencies on the other learning components and can easily be trained using historical data. This is because the demand of a customer depends only on variables that are already provided in the state files of previous simulations: a customer will not use a different amount of energy if the broker implementation changes but all other variables (such as the subscribed tariff, the weather etc.) remain equal.
+
+To train a model that predicts the demand amounts of customers under various conditions, a dataset of features and labels needs to be created. Because the model may also learn during the course of a running competition (allowing it to adapt to new customer patterns), a generator-based structure should be preferred. This means that a generator exists that creates $x, y$ pairs for the model to train on.
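Such a generator might be sketched as follows; the field names are illustrative assumptions, since the real pipeline derives its features from the state files:

```python
def training_pairs(records):
    """Yield (x, y) pairs for the model; works for both historical
    state-file records and live meter readings during a competition."""
    for rec in records:
        # Features: everything known before the demand is realized.
        x = [rec["tariff_rate"], rec["temperature"], rec["hour_of_day"]]
        # Label: the realized demand of the customer.
        y = rec["demand"]
        yield x, y

# Hypothetical records; the real ones come from parsed state files.
records = [
    {"tariff_rate": 0.15, "temperature": 18.0, "hour_of_day": 9, "demand": 3.2},
    {"tariff_rate": 0.12, "temperature": 21.5, "hour_of_day": 14, "demand": 1.1},
]
pairs = list(training_pairs(records))
```

A Keras model can consume such a generator directly, which is what makes the same code path usable for both offline and online training.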
+
+According to the simulation specification, the customer models generate their demand pattern based on their internal structure, broker factors and game factors \cite{ketter2018powertac}. The preprocessing pipeline therefore generates feature-label pairs that include customer, tariff, weather, time and demand information. The realized demand is the label, while all other components are part of the features used to train the model. The intuitive model class for demand pattern prediction is the \ac{RNN}, due to the sequential nature of the problem \cite{EvalGRU2014}. However, as will later be shown, a relatively shallow classic dense \ac{NN} also yields decent results.
+
+\begin{figure}[h]
+	\centering
+	\includegraphics[width=0.8\linewidth]{img/UsageEstimator.png}
+	\caption{Demand Estimator structure}
+	\label{fig:DemandEstimator}
+\end{figure}
+
+The overall structure of the demand estimator component is shown in Figure~\ref{fig:DemandEstimator}. The model can be trained both offline, based on the state files, and online during the competition. This is possible because in both situations the environment model of the agent is a continuous representation of the agent's knowledge about the world. In fact, during state file parsing, the environment may even hold information that the agent usually cannot observe in a competition environment. This is also the case for demand learning, as the state files hold the demand realizations of all customers, while during a competition the server only transmits the usage realizations of the customers that are subscribed to the agent's tariffs. Regardless, this does not affect the ability to learn from the customers' usage patterns in either setting. During a competition, the agent may learn from the realized usage of customers after each time slot is completed. Because this process may require some resources, it is advantageous to first predict the subscribed customers' demands for the current time slot and pass this information to the wholesale component before training the model on the received meter readings
+\footnote{The component code can be found under \url{https://github.com/pascalwhoop/broker-python/tree/master/agent_components/demand}}.
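The per-time-slot ordering described above (predict first, then train on the arriving meter readings) can be sketched with a trivial stand-in model. The class and its running-mean "learning" are purely illustrative, not the actual estimator:

```python
class DemandEstimator:
    """Illustrative stand-in for the learned model: a running mean
    of past demand, standing in for a trained NN."""
    def __init__(self):
        self.avg = 0.0
        self.n = 0

    def predict(self, features):
        return self.avg  # trivial baseline prediction

    def train(self, features, realized_demand):
        self.n += 1
        self.avg += (realized_demand - self.avg) / self.n

est = DemandEstimator()
predictions = []
for slot, realized in enumerate([3.0, 5.0, 4.0]):
    # 1) predict first, so the wholesale component gets its input in time
    predictions.append(est.predict(None))
    # 2) then train on the meter readings received for the completed slot
    est.train(None, realized)
```

The same two-step loop applies unchanged when the stand-in is replaced by the Keras model trained via the generator pipeline.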

 \subsection{Wholesale Market}
 \subsection{Balancing Market}

src/chaps/learning.tex

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-According to \citeauthor{russell2016artificial}, learning agents are those that
+According to \citet{russell2016artificial}, learning agents are those that
 \emph{improve their performance on future tasks after making observations about
 the world} \cite[p.693]{russell2016artificial}. Learning behavior is present in
 many species, most notably humans. To create a learning algorithm means that the

src/chaps/neuralnetworks.tex

Lines changed: 2 additions & 2 deletions

@@ -10,7 +10,7 @@
 \begin{figure}[]
 	\centering
 	\includegraphics[width=0.8\linewidth]{img/perceptron.png}
-	\caption{Model of the perceptron, taken from \citeauthor{russell2016artificial}.}
+	\caption{Model of the perceptron, taken from \citet{russell2016artificial}.}
 	\label{fig:perceptron}
 \end{figure}

@@ -34,7 +34,7 @@
 \begin{figure}[]
 	\centering
 	\includegraphics[width=0.3\linewidth]{img/multilayer_nn.png}
-	\caption{Multi-layer neural network from \citeauthor{bengio2009learning}}
+	\caption{Multi-layer neural network from \citet{bengio2009learning}}
 	\label{fig:multilayernn}
 \end{figure}

src/chaps/recurrentnn.tex

Lines changed: 32 additions & 0 deletions

@@ -0,0 +1,32 @@
+As was already noted in the previous chapter, \ac{NN} can be both acyclic and cyclic graphs. The
+\emph{vanilla} \ac{NN} is usually considered to be an acyclic feed-forward network, as it has no internal state and is
+therefore more suited to describe the concepts of how the networks operate. Especially in translation and text-to-speech
+applications though, \ac{RNN} are very popular, as they are able to act on previously seen information in a sequence of
+data. Generally, they are suitable for many applications where the data has some kind of time-dependent embedding
+\cite[p.373]{Goodfellow-et-al-2016}.
+
+A \ac{RNN} therefore computes its output based on its weights $w_i$, commonly noted as $\theta$, its current input
+$x^{(t)}$ and the previous internal state of its hidden units, $h^{(t-1)}$:
+
+\[
+	h^{(t)} = f(h^{(t-1)}, x^{(t)}, \theta)
+\]
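The recurrence above can be made concrete with a scalar-valued sketch: a single hidden unit with a $\tanh$ activation, where the weights are arbitrary illustrative values rather than learned ones:

```python
import math

def rnn_step(h_prev, x_t, w_h, w_x, b):
    # One step of h^(t) = f(h^(t-1), x^(t), theta), with f a tanh unit
    # and theta = (w_h, w_x, b).
    return math.tanh(w_h * h_prev + w_x * x_t + b)

h = 0.0                        # initial hidden state h^(0)
for x in [1.0, -0.5, 0.25]:    # the same weights are reused at every time step
    h = rnn_step(h, x, w_h=0.8, w_x=0.5, b=0.0)
```

Note that the loop applies the same transition function $f$ with the same parameters at every step, which is exactly the weight-sharing property discussed below.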
+
+The network generally learns to use $h^{(t)}$ to encode previously seen aspects relevant to the current task, although this
+encoding is inherently lossy, as the number of previous inputs (i.e. $t-1$) is arbitrary. Figure~\ref{fig:rnn_concept}
+shows this concept.
+
+\begin{figure}[]
+	\centering
+	\includegraphics[width=0.8\linewidth]{img/rnn_concept.png}
+	\caption{A recurrent neural network conceptualized. \emph{Left}: Circuit diagram where the black square represents a
+	one-time-step delay. \emph{Right}: The same network unfolded, where each node represents a particular time instance.
+	Taken from \citet{Goodfellow-et-al-2016}.}
+	\label{fig:rnn_concept}
+\end{figure}
+
+The network structure has two benefits: Firstly, it allows for arbitrary sequence lengths, as the network size is
+dependent on the time-step-specific input and not on the number of previous time steps. Secondly, the same network with
+the same weights (or, in mathematical terms, the same transition function $f$) can be used during each time step. This
+means: when a \ac{RNN} is fed a sequence of data, the weights stay the same throughout the sequence. They can be
+updated after the entire sequence has been processed.

src/chaps/supervisedlearning.tex

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@
 future examples that might be of the same kind but not identical. Common
 examples of this form of learning include object recognition in images or
 time-series prediction. One of the best-known examples to date is the ImageNet
-classification algorithm by \citeauthor{krizhevsky2012imagenet} which was one of
+classification algorithm by \citet{krizhevsky2012imagenet}, which was one of
 the first \ac{NN}-based algorithms to break a classification high-score on a
 popular image classification database. The goal is to correctly classify images
 according to a set of defined labels. If a picture of a dog is read by the \ac

src/head.tex

Lines changed: 6 additions & 1 deletion

@@ -3,7 +3,7 @@
 \usepackage[nolist,nohyperlinks]{acronym}
 \usepackage{listings}
 \input{snippets/tikz.tex}
-\usepackage[numbers]{natbib}
+\usepackage[]{natbib}
 \usepackage{float}
 \usepackage{glossaries}
 \usepackage[hyphens]{url}

@@ -16,3 +16,8 @@

 \input{acronyms.tex}

+%adapting the article class to Ketter requirements
+%\usepackage{showframe}
+\usepackage[left=5cm, top=2cm, bottom=2cm, right=2cm]{geometry}
+\usepackage{setspace}
+\onehalfspacing

src/img/Agent.png

136 KB