@@ -307,74 +307,81 @@ \section{Learning Components}
307307such a tariff would theoretically be quite competitive and should therefore be rated as such. The question of which
308308tariffs to actually offer on the market is a separate problem that balances competitiveness against profitability.
309309
310- \subsection {Tariff Market }
311-
312- The goal of the customer market is to get as many subscribers as possible for the most profitable tariffs the broker
313- offers on the market. The tariffs offered in the market compete for the limited number of customers available and every
314- customer must be subscribed to some tariff. The profitability of tariffs is limited by the base tariff which is offered
315- by the simulation as a constant offering creating an upper bound on profitability.
316-
317- To succeed in the customer market, the agents needs to be able to generate tariffs that are competitive. This can be
318- broken down into two subtasks: Generating valid tariffs and evaluating their competitiveness. A tariff can be verified
319- by passing it to the \ac {PowerTAC} server which verifies the tariff. Hence, a \ac {RL} algorithm that is tasked with
320- creating competitive tariffs can be given feedback by penalizing non-conclusive tariffs. An invalid tariff could be one
321- that contains overlapping rates leading to an ambivalent status. The competitiveness of a tariff depends not only on the
322- attributes of the tariff but also on the competition environment. If the broker only competes against the default
323- tariffs, even many mediocre tariff offerings would perform well. In an environment with many competitors on the other
324- hand, a tariff needs to be well designed to generate profits.
325-
326- The agents learning task for the customer market is therefore designed in the following way:
327-
328- \begin {enumerate } \item Learning to evaluate a tariffs competitiveness in relation to the competitive environment
329- through supervised learning on the historical state logs of previous competitions \item Running a \ac {RL}
330- algorithm which learns to choose parameters for tariffs that are valid and profitable in a given environment
331- % \item Learning to generate valid tariff specifications through a genetic algorithm strategy, penalizing invalid
332- % tariffs %TODO really, I go genetic?
333- \end {enumerate }
334-
335- % TODO not yet actually realized, still applicable?
336- \subsubsection {Tariff fitness learning } To learn the fitness of a tariff while considering its environment, supervised
337- learning techniques can be applied. To do this, features need to be created from the tariffs specifications and its
338- competitive environment. Similar work has been done by \citep {cuevas2015distributed } who discretized the tariff market
339- in four variables describing the relationships between the competitors and their broker.
340-
341- For my broker, because \ac {NN} can handle a large state spaces, I create a more detailed description of the
342- environment. I still have to ensure the number of input features is fixed though, so a simple copy of all competing
343- tariffs is not a valid input for the environment description. Instead I create the following features from the tariff
344- market:
345-
346- \begin {description } \item [Average Charge per hour of week Timeslot]: According to \\
347- \texttt {TariffEvaluationHelper.java }, customer models evaluate tariffs on an per-hour basis. This means they are
348- very precise in the evaluation of potential tariff alternatives (before the application of an irrationality
349- factor). Hence, a per-hour precision in the input is needed. \item [Variance of Charge per hour of week
350- Timeslot] Variance of the tariffs charges per each timeslot in a week among all competitors. \item [Average and
351- Variance of periodic payments] Description of the markets periodic payments landscape \item [Average and Variance
352- of one-time payments] Description of the markets one-time payments landscape \item [Average and Variance of
353- Up/Down regulation payments] 0 for tariffs without regulation capabilities \end {description }
354-
355- Because the \ac {PowerTAC} simulation does not return profits of brokers on a per-tariff basis and because the reasons
356- for why a broker purchased a specific amount of energy on the wholesale market are not known, it is hard to put a
357- profitability value on a brokers tariff if said broker offers more than one tariff on the market. Therefore the
358- evaluation of the tariff does not include the profitability of the tariff but merely the competitiveness in regards to
359- the attractiveness of the offer from the perspective of the customers
310+ \subsection{Customer Market}
311+ \label{sub:customer_market}
312+
313+ % TODO background?
314+ % TODO not implemented
315+ % The goal of the customer market is to get as many subscribers as possible for the most profitable tariffs the broker
316+ % offers on the market. The tariffs offered in the market compete for the limited number of customers available and every
317+ % customer must be subscribed to some tariff. The profitability of tariffs is limited by the base tariff which is offered
318+ % by the simulation as a constant offering creating an upper bound on profitability.
319+ %
320+ % To succeed in the customer market, the agents needs to be able to generate tariffs that are competitive. This can be
321+ % broken down into two subtasks: Generating valid tariffs and evaluating their competitiveness. A tariff can be verified
322+ % by passing it to the \ac {PowerTAC} server which verifies the tariff. Hence, a \ac {RL} algorithm that is tasked with
323+ % creating competitive tariffs can be given feedback by penalizing non-conclusive tariffs. An invalid tariff could be one
324+ % that contains overlapping rates leading to an ambivalent status. The competitiveness of a tariff depends not only on the
325+ % attributes of the tariff but also on the competition environment. If the broker only competes against the default
326+ % tariffs, even many mediocre tariff offerings would perform well. In an environment with many competitors on the other
327+ % hand, a tariff needs to be well designed to generate profits.
328+ %
329+ % The agents learning task for the customer market is therefore designed in the following way:
330+ %
331+ % \begin{enumerate} \item Learning to evaluate a tariffs competitiveness in relation to the competitive environment
332+ % through supervised learning on the historical state logs of previous competitions \item Running a \ac {RL}
333+ % algorithm which learns to choose parameters for tariffs that are valid and profitable in a given environment
334+ % %\item Learning to generate valid tariff specifications through a genetic algorithm strategy, penalizing invalid
335+ % %tariffs %TODO really, I go genetic?
336+ % \end{enumerate}
337+ %
338+ % %TODO not yet actually realized, still applicable?
339+ % \subsubsection{Tariff fitness learning} To learn the fitness of a tariff while considering its environment, supervised
340+ % learning techniques can be applied. To do this, features need to be created from the tariffs specifications and its
341+ % competitive environment. Similar work has been done by \citep{cuevas2015distributed} who discretized the tariff market
342+ % in four variables describing the relationships between the competitors and their broker.
343+ %
344+ % For my broker, because \ac {NN} can handle a large state spaces, I create a more detailed description of the
345+ % environment. I still have to ensure the number of input features is fixed though, so a simple copy of all competing
346+ % tariffs is not a valid input for the environment description. Instead I create the following features from the tariff
347+ % market:
348+ %
349+ % \begin{description} \item[Average Charge per hour of week Timeslot]: According to \\
350+ % \texttt{TariffEvaluationHelper.java}, customer models evaluate tariffs on an per-hour basis. This means they are
351+ % very precise in the evaluation of potential tariff alternatives (before the application of an irrationality
352+ % factor). Hence, a per-hour precision in the input is needed. \item[Variance of Charge per hour of week
353+ % Timeslot] Variance of the tariffs charges per each timeslot in a week among all competitors. \item[Average and
354+ % Variance of periodic payments] Description of the markets periodic payments landscape \item[Average and Variance
355+ % of one-time payments] Description of the markets one-time payments landscape \item[Average and Variance of
356+ % Up/Down regulation payments] 0 for tariffs without regulation capabilities \end{description}
357+ %
358+ % Because the \ac {PowerTAC} simulation does not return profits of brokers on a per-tariff basis and because the reasons
359+ % for why a broker purchased a specific amount of energy on the wholesale market are not known, it is hard to put a
360+ % profitability value on a brokers tariff if said broker offers more than one tariff on the market. Therefore the
361+ % evaluation of the tariff does not include the profitability of the tariff but merely the competitiveness in regards to
362+ % the attractiveness of the offer from the perspective of the customers
360363% large space of decision variables / dimensions
361364%
362365% how to avoid overwhelming of agent? output layer must be fairly large.
363366%
364367% time, energy, money, communication dimensions (and subdimensions)
365368\subsubsection {Customer demand estimation }% \label{ssub:customer_demand_estimation}
366369
367- The simplest learning component is the demand estimator. This component has no dependencies onto the other learning
368- components and can easily be trained using historical data. This is due to the fact that the demand of a customer is
369- only dependent on variables that are already provided in the state files of previous simulations. A customer will not
370- use a different amount of energy if the broker implementation changes but all other variables (such as subscribed
371- tariff, weather etc.) remain equal .
370+ This component has no dependencies on the other learning components and can easily be trained using historical data.
371+ It is therefore a supervised learning task that maps the information known at timestep $t-n$ to a prediction of the
372+ expected energy usage at timestep $t$. Known information includes weather forecasts, historical usage, time, tariff,
373+ and customer metadata.
374+ If the customer models change across games (e.g., if a customer suddenly uses 10x the energy on rainy days), the learning
375+ model has to adapt to this change. This can be achieved by letting the model learn from historical
376+ data initially (i.e., from the state files) and then continue learning online during the competition, based on the new
377+ customer models.
372378
373379To train a model that predicts the demand amounts of customers under various conditions, a dataset of features and
374- labels needs to be created. Because the model may also learn during the course of a running competition (allowing the
375- model to adapt to new customer patterns), a generator based structure should be preferred. This means that a generator
376- exists that creates $ x, y $ pairs for the model to train on .
380+ labels needs to be created. Because the model may also learn during the course of a running competition, a generator-based structure should be preferred. This means that a generator
381+ creates $x, y$ pairs for the model to train on, instead of a large batch of training data being created ahead of
382+ the learning process.
377383
384+ % TODO STOP
378385According to the simulation specification, the customer models generate their demand pattern based on their internal
379386structure, broker factors, and game factors \citep{ketter2018powertac}. The preprocessing pipeline therefore generates
380387feature-label pairs that include customer, tariff, weather, time, and demand information. The realized demand is the