
Commit fd31615

committed
added formula, changing workplace
1 parent 2ccde3d commit fd31615

File tree

2 files changed: +35 -8 lines changed


src/chaps/implementation.tex

Lines changed: 34 additions & 7 deletions
@@ -369,7 +369,8 @@ \subsubsection{Customer demand estimation}% \label{ssub:customer_demand_estimati

This component has no dependencies on the other learning components and can easily be trained using historical data.
It is therefore a supervised learning algorithm, matching known information in timestep $t-n$ to a prediction for the
-expected energy usage at timestep $t$. Known information includes: Weather forecasts, historical usages, time, tariff
+expected energy usage at timestep $t$. Because there are several games on record, the historical realized usages are the
+labels for the supervised learning problem. Known information includes: Weather forecasts, historical usages, time, tariff
and customer metadata.
If the customer models change across games (e.g. if a customer suddenly uses 10x the energy on rainy days), the learning
model will have to learn to adapt to this change. This can be achieved by letting the model both learn from historical
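
The supervised mapping described in this hunk can be illustrated with a small sketch: the information known at timestep $t-n$ (weather forecast, recent usages, time, tariff, customer metadata) forms the feature vector, and the realized usage at timestep $t$ is the label. The field names below are placeholders chosen for illustration, not the actual broker-python feature layout.

# Illustrative only: pairing information known at t-n with the realized usage at t
# to build training data for the demand estimator. Field names are placeholders.
import numpy as np

def make_sample(weather_forecast, historical_usage, hour_of_day, tariff_rate,
                customer_population, realized_usage):
    """Builds one (features, label) pair for the supervised demand estimator."""
    x = np.array([
        weather_forecast["temperature"],
        weather_forecast["cloud_cover"],
        weather_forecast["wind_speed"],
        hour_of_day,
        tariff_rate,
        customer_population,
        *historical_usage,          # e.g. the last 24 realized usages of this customer
    ], dtype=np.float32)
    y = np.float32(realized_usage)  # label: the usage the customer actually realized at t
    return x, y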
@@ -379,15 +380,17 @@ \subsubsection{Customer demand estimation}% \label{ssub:customer_demand_estimati
To train a model that predicts the demand amounts of customers under various conditions, a dataset of features and
labels needs to be created. Because the model may also learn during the course of a running competition, a generator-based structure should be preferred. This means that a generator
exists that creates $x, y$ pairs for the model to train on, instead of creating a large batch of learning data ahead of
-the learning processing.
+the learning process, which is otherwise a common practice. Whenever a round completes and new information is
+available, the demand estimator is asked to estimate the demand for all customers subscribed to the tariffs of the
+broker for the next 24 timesteps. These estimations are then saved (i.e. they replace any previous estimations) and the
+wholesale component as well as other components can act on these newly created estimations.

-%TODO STOP
According to the simulation specification, the customer models generate their demand pattern based on their internal
-structure, broker factors and game factors \citep[]{ketter2018powertac}. The preprocessing pipeline therefore generates
+structure, broker factors and game factors \citep[]{ketter2018powertac}. The preprocessing pipeline of the generator therefore generates
feature-label pairs that include: Customer, tariff, weather, time and demand information. The realized demand is the
label while all other components are part of the features that are used to train the model. The intuitive model class
for demand pattern prediction is the \ac {RNN} due to the sequential nature of the problem \citep[]{EvalGRU2014}. However,
-as will later be shown, the implementation of relatively shallow dense classic \ac {NN} also results in decent results.
+as will be shown later, the implementation of a relatively shallow, dense, classic \ac {NN} also yields decent results.

\begin{figure}[h] \centering \includegraphics[width=0.8\linewidth]{img/UsageEstimator.png} \caption{Demand Estimator
structure} \label{fig:DemandEstimator} \end{figure}
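
A minimal sketch of the generator-based setup described above, assuming a Keras-style dense model; the data structures (`completed_rounds`, `customer.recent_usage`, ...) are placeholders, and `make_sample` refers to the sketch shown earlier, not the verbatim broker-python code.

import tensorflow as tf

def training_pair_generator(completed_rounds):
    """Yields (x, y) pairs one at a time as rounds complete, instead of building
    one large batch of learning data ahead of the training process."""
    for round_data in completed_rounds:
        for customer in round_data.customers:
            yield make_sample(round_data.weather, customer.recent_usage,
                              round_data.hour, customer.tariff_rate,
                              customer.population, customer.realized_usage)

# A relatively shallow dense network (rather than an RNN) trained on these pairs,
# as suggested by the text above.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),  # predicted demand for the target timestep
])
model.compile(optimizer="adam", loss="mse")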
@@ -403,14 +406,38 @@ \subsubsection{Customer demand estimation}% \label{ssub:customer_demand_estimati
customers' usage patterns in either setting. During a competition, the agent may learn from the realized usage of
customers after each time slot is completed. Because this process may require some resources, it is advantageous to
first perform the prediction of the subscribed customers' demands for the current time slot to pass this information to
-the wholesale component before training the model on the received meter readings \footnote{The component code can be
+the wholesale component before training the model on the received meter readings. While the broker is waiting for the
+server to process a step in the game, it can perform any learning on newly received information.\footnote{The component code can be
found under \url{https://github.com/pascalwhoop/broker-python/tree/master/agent_components/demand}}.

+%TODO write final model structure

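The ordering described in this hunk (predict first, hand the estimates to the wholesale component, then learn from the meter readings while the server computes the next step) could look roughly like the handler below; the names are hypothetical and not the actual broker-python API.

def on_timeslot_complete(estimator, wholesale, feature_window, meter_readings):
    # 1. Predict the subscribed customers' demand for the coming timeslots first ...
    predictions = estimator.predict(feature_window)
    # 2. ... so the wholesale component can act on the fresh estimations immediately ...
    wholesale.update_demand_forecast(predictions)
    # 3. ... and only then spend time training on the newly received meter readings,
    #    while the broker waits for the server to process the next step.
    estimator.train_on_batch(feature_window, meter_readings)
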
\subsection{Wholesale Market}
\label{sub:wholesale_market}

+To approach the wholesale trading problem, the definition of the trading problem developed by
+\citet{tactexurieli2016mdp} is adopted. This defines each of the target timeslots as a \ac{MDP} with 24 states before
+termination. As previously mentioned, \citet{tactexurieli2016mdp} define the entire simulation as a unified \ac{MDP}.
+For the wholesale market, I therefore consider a subset variant of this \ac{MDP} where the action space is defined by
+two continuous variables, the energy limit price $P^o_i$ and the energy amount $Q^o_i$. Each target timeslot is regarded as
+an independent \ac{MDP}. The agent progresses through the states towards the terminal state, which is the step at which
+the balancing market determines an ultimate balancing charge. The reward for the agent is received at the terminal state
+and is defined as the ratio of the average price paid per kWh by the agent to the average market kWh
+price for the given target timeslot. This removes any bias in the reward introduced by market price fluctuations. To calculate
+the reward $r$, the following function is used:
+
+\begin{equation}
+  \label{eq:Reward function for agent in wholesale trading environment}
+  %average price paid per kWh by broker
+  \overline{P^{r}} = \frac{\sum_{i=1}^{24} P^{r}_{i} \cdot Q^{r}_{i}}{\sum_{i=1}^{24} Q^{r}_{i}}
+  \qquad
+  %relationship between average price paid by broker and average market price for target timeslot
+  r = \frac{\overline{P^{r}}}{\overline{P^{m}}}
+  %TODO encouraging for exploration injecting into rewards
+\end{equation}

Using \ac {MDP}
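Read as code, the reward added in this commit is the broker's volume-weighted average price per kWh over the (up to) 24 trading opportunities for a target timeslot, divided by the average market price for that timeslot. The sketch below is illustrative only; the variable names are not taken from the broker-python implementation.

def terminal_reward(cleared_prices, cleared_amounts, avg_market_price):
    """cleared_prices[i] / cleared_amounts[i]: price paid per kWh and energy amount
    the broker cleared in trading opportunity i for the target timeslot."""
    total_energy = sum(cleared_amounts)
    avg_price_paid = sum(p * q for p, q in zip(cleared_prices, cleared_amounts)) / total_energy
    return avg_price_paid / avg_market_price

# Example: clearing 40 kWh at 25 and 10 kWh at 30 against a market average of 32
# gives an average price paid of 26 and a reward of 26 / 32 = 0.8125.
print(terminal_reward([25.0, 30.0], [40.0, 10.0], 32.0))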

src/chaps/reinforcement.tex

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
\citep[p.830f.]{russell2016artificial}.

%TODO still up to date with the following subsections?
-This chapter will first introduce the concepts of a \ac {MDP}, then introduce different concepts of \ac {RL} agents,
+This section will first introduce the concepts of a \ac {MDP}, then introduce different concepts of \ac {RL} agents,
describe approaches to encourage exploration of its options and finally describe how \ac {NN} can be used to create
state-of-the-art agents that can solve complex tasks. The majority of the chapter is based on
chapters 17 and 21 of \citet[]{russell2016artificial} unless otherwise marked.
