0. Related Work
Trade execution optimization: https://www.cis.upenn.edu/~mkearns/papers/rlexec.pdf (M.Kearns)
Electronic Trading in Order-Driven Markets: Efficient Execution (M.Kearns)
Method
- Expected execution price
- x-axis: limit order relative to its own side of the market
- y-axis: “return”, i.e. the difference between the execution price and the mid-spread price at the beginning of the time period, e.g. return = (mid-spread - execution price) / mid-spread
- Risk
- x-axis: every limit order price
- y-axis: Standard deviation of returns
- Market order: sweep the sell book for the entire size at once
- Marketable limit order: transact with the top of the sell book and then leave the residual shares sitting on top of the buy book.
- Efficient Pricing Frontier
- Markowitz efficient frontier: shows trade-off between risk and return in an investment
- Risk-return profile: every possible execution strategy on a two-dimensional graph
- x-axis: standard deviation
- y-axis: returns
- Efficient pricing frontier --> the upper part of the risk-return plot (best achievable return for each level of risk)
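A minimal sketch of how the efficient pricing frontier could be extracted from per-price (risk, return) pairs: keep only strategies that are not dominated by a lower-risk, higher-return alternative. The candidate prices and the risk/return numbers below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def efficient_frontier(risks, returns):
    """Keep only non-dominated strategies: no other strategy offers
    lower (or equal) risk together with a higher return."""
    points = sorted(zip(risks, returns))      # sort by risk, ascending
    frontier, best_return = [], -np.inf
    for risk, ret in points:
        if ret > best_return:                 # strictly better return than any lower-risk point
            frontier.append((risk, ret))
            best_return = ret
    return frontier

# Illustrative data: one (std of return, mean return) pair per candidate limit price.
rng = np.random.default_rng(0)
candidate_prices = np.linspace(-0.05, 0.05, 21)   # price relative to own side of the book
risks = rng.uniform(0.001, 0.010, size=candidate_prices.size)
rets = -np.abs(candidate_prices) * 0.1 + rng.normal(0, 5e-4, size=candidate_prices.size)

print(efficient_frontier(risks, rets))
```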
Results
- Order size
- More expensive to trade larger orders
- More risky (chance of not getting executed) to trade larger orders
- Large orders require more aggressive pricing
- Possible improvement by splitting into several pieces
- Time Window
- Shorter time interval is more expensive as it requires more aggressive order pricing
- Longer time interval is less expensive but riskier
- Time of the day
- Only relevant if transacting over a long time period
- Otherwise no generalization is possible
- Market Conditions
- Cheaper to trade on high-volume days, but also riskier (surges in volume -> higher volatility -> adverse price movements more likely)
- More aggressive pricing on low-volume days
- The depth of the book may not be as significant as volume when it comes to limit order pricing
Algorithmic Challenges in Modern Financial Markets (M.Kearns) http://www.eecs.harvard.edu/~cat/cs/diss/paperlinks/ectutorial2006.pdf
Deep Reinforcement Learning Based Trading Application at JP Morgan Chase https://medium.com/@ranko.mosic/reinforcement-learning-based-trading-application-at-jp-morgan-chase-f829b8ec54f2
A reinforcement learning extension to the Almgren-Chriss framework for optimal trade execution
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Using real-time cluster configurations of streaming asynchronous features as online state descriptors in financial markets
Optimal Trade Execution: An Evolutionary Approach
Impact cost: Executing a large buy order at once moves the price up (a large sell order moves it down). By splitting a big order (e.g. V shares) into smaller pieces and spreading the execution over a time horizon H, the impact cost can be lessened (see the sketch below).
Opportunity cost: Arises when the price moves against us while a big order is split into pieces and its execution is delayed, so the opportunity to execute at a better price is lost.
Trade execution strategy: Optimizes the trade-off between impact cost and opportunity cost, thereby trying to achieve best execution.
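A toy sketch of the impact/opportunity trade-off when splitting V shares over a horizon H. The linear impact model, the drift parameters, and the function name are illustrative assumptions only.

```python
import numpy as np

def simulate_split_execution(V, H, mid0=100.0, impact=1e-4, drift_sigma=0.02, seed=0):
    """Execute V shares in H equal child orders under a toy model:
    each child order of size q pushes the price up by impact*q (impact cost),
    while the mid price drifts randomly between slices (opportunity cost)."""
    rng = np.random.default_rng(seed)
    q = V / H
    mid = mid0
    total_cost = 0.0
    for _ in range(H):
        exec_price = mid + impact * q          # temporary impact of this slice
        total_cost += exec_price * q
        mid += rng.normal(0.0, drift_sigma)    # price may move against us while we wait
    return total_cost / V                      # average execution price per share

# Fewer slices -> larger impact per slice; more slices -> more exposure to drift.
for H in (1, 5, 20):
    print(H, simulate_split_execution(10_000, H))
```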
Measuring execution performance:
- Bid-ask mid-spread at t of execution initialization [Kearns]
- Volume Weighted Average Price (VWAP): vwap = sum(price*volume) / sum(volume)
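A minimal sketch of both performance measures for a set of our own fills: VWAP and the shortfall against the bid-ask mid-spread at initialization. Representing fills as (price, volume) tuples is an assumption.

```python
def vwap(trades):
    """trades: list of (price, volume) tuples of our fills."""
    notional = sum(p * v for p, v in trades)
    volume = sum(v for _, v in trades)
    return notional / volume

def shortfall_vs_mid(trades, mid_at_start, side="buy"):
    """Execution performance against the bid-ask mid-spread at initialization [Kearns].
    Positive values mean we paid more (buy) / received less (sell) than the mid."""
    avg = vwap(trades)
    return (avg - mid_at_start) if side == "buy" else (mid_at_start - avg)

fills = [(100.02, 300), (100.05, 500), (100.10, 200)]
print(vwap(fills), shortfall_vs_mid(fills, mid_at_start=100.00))
```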
Backtesting: Process of executing a given strategy on historical data to determine what its performance would have been had it been used at a certain time t in the past.
- A price-only backtest would not incorporate the volume and limit orders (liquidity) actually available.
- Limit orders allow for an educated guess, whereby it is assumed that our trades are filled against the displayed limit orders; this, however, ignores the time priority of all other limit orders at the same price level (see the sketch below).
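A sketch of the "educated guess" fill rule described above: a simulated buy limit order is assumed to trade against all displayed sell-side liquidity at or below its price, ignoring time priority at each level. The book representation is an assumption.

```python
def simulate_buy_limit_fill(order_price, order_size, asks):
    """asks: list of (price, size) sell-side levels, best (lowest) price first.
    Assumes our order trades against any displayed size at or below order_price,
    ignoring the time priority of other resting orders at the same level."""
    filled, cost = 0.0, 0.0
    for price, size in asks:
        if price > order_price or filled >= order_size:
            break
        take = min(size, order_size - filled)
        filled += take
        cost += take * price
    avg_price = cost / filled if filled else None
    return filled, avg_price

book = [(100.01, 400), (100.02, 250), (100.05, 1000)]
print(simulate_buy_limit_fill(100.02, 500, book))   # fills 400 @ 100.01 and 100 @ 100.02
```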
Paper Book
Deep Reinforcement Learning for Pairs Trading
Reinforcement Learning For Automated Trading
Algorithm Trading using Q-Learning and Recurrent Reinforcement Learning
Modeling Stock Order Flows and Learning Market-Making from Data
T: 1hr basis
Multiple Kernel Learning on the Limit Order Book
Purpose: Investigates currency order books to find patterns that can be exploited with the aim of forecasting movement. SVM classification techniques with different kernels are used, along with two Multiple Kernel Learning (MKL) techniques, among them SimpleMKL.
Simulating and analyzing order book data: The queue-reactive model
“Market making” in an order book model and its impact on the spread
A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem https://arxiv.org/pdf/1706.10059v2.pdf
Cryptocurrency Portfolio Management with Deep Reinforcement Learning https://arxiv.org/pdf/1612.01277v5.pdf
Agent Inspired Trading Using Recurrent Reinforcement Learning and LSTM Neural Networks
Purpose: Buy/Hold/Sell decisions
Valuable Information
- Presence of large amounts of noise and non-stationarity in the datasets, which could cause severe problems for a value function approach.
- Recurrent reinforcement learning
- provides immediate feedback to optimize the strategy
- has the ability to produce real-valued actions or weights naturally, without resorting to discretization (which is necessary for value function approaches)
- the Sharpe Ratio and Downside Deviation Ratio can be formulated to enable on-line learning with recurrent RL
- Uses gradient ascent to optimize
- LSTM handles deep structure on feature learning and the time expansion parts
- Agent
- Risk-adjusted return using the Sharpe Ratio (mean(return) / std(return) over trading period t) or the Downside Deviation Ratio, sketched below
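A small sketch of the two risk-adjusted reward measures over a window of per-period returns; the exact normalization used in the paper may differ.

```python
import numpy as np

def sharpe_ratio(returns, eps=1e-12):
    """Mean return divided by the standard deviation of returns over the trading period."""
    r = np.asarray(returns, dtype=float)
    return r.mean() / (r.std() + eps)

def downside_deviation_ratio(returns, eps=1e-12):
    """Mean return divided by the deviation of negative returns only,
    so upside volatility is not penalized."""
    r = np.asarray(returns, dtype=float)
    downside = np.sqrt(np.mean(np.minimum(r, 0.0) ** 2)) + eps
    return r.mean() / downside

rets = [0.002, -0.001, 0.003, -0.004, 0.001]
print(sharpe_ratio(rets), downside_deviation_ratio(rets))
```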
Deep Direct Reinforcement Learning for Financial Signal Representation and Trading
Robust Optimization of Order Execution http://www.ece.ust.hk/~palomar/Publications_files/2015/FengPalomarRubio-TSP2015%20-%20Robust_Order_Execution.pdf
Purpose
- We propose using the conditional value-at-risk (CVaR) of the execution cost as the risk measure, which allows taking into consideration only the unfavorable part of the return distribution or, equivalently, unwanted high cost.
- Due to parameter estimation errors in the price model, the naive strategies given by the nominal problem may perform badly in the real market, so it is extremely important to take such estimation errors into consideration. To deal with this, we extend both the traditional mean-variance approach and our proposed CVaR approach to their robust design counterparts.
Statements
Variance:
- However, variance has been recognized as impractical since it is a symmetric measure of risk and hence also penalizes low-cost events.
- However, it is well known that variance is not an appropriate risk measure when dealing with financial returns from non-normal, negatively skewed, and leptokurtic distributions [22]
Value-at-risk:
- VaR is also known to have the limitations of lacking subadditivity and not properly describing the losses in the tail of concern [22].
- In order to overcome the inadequacy of variance or VaR, Conditional VaR (CVaR, also known in the literature as Expected Shortfall, Expected Tail Loss, Tail Conditional Expectation, and Tail VaR) has been proposed as an alternative risk measure [23]; it has the desired properties, e.g. convexity and coherence [22], and has thus been employed widely in financial engineering, see [24]–[27] for portfolio or risk management.
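A minimal sketch of VaR and CVaR estimated from sampled execution costs (higher cost = worse), illustrating why CVaR also describes the tail beyond the VaR quantile. The cost samples are synthetic.

```python
import numpy as np

def var_cvar(costs, alpha=0.95):
    """costs: samples of execution cost (higher is worse).
    VaR_alpha: the alpha-quantile of cost.
    CVaR_alpha: the expected cost conditional on being at or beyond VaR_alpha."""
    c = np.sort(np.asarray(costs, dtype=float))
    var = np.quantile(c, alpha)
    cvar = c[c >= var].mean()
    return var, cvar

rng = np.random.default_rng(1)
samples = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)   # skewed, heavy right tail
print(var_cvar(samples, alpha=0.95))
```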
Parameter estimation: *
Learning to Trade via Direct Reinforcement http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=935097
Optimal Trading Strategy in a Limit Order Market with Imperfect Liquidity https://editorialexpress.com/cgi-bin/conference/download.cgi?db_name=res_phd_2013&paper_id=271
Optimal order placement in limit order markets https://arxiv.org/abs/1210.1625
Introduction to Learning to Trade with Reinforcement Learning http://www.wildml.com/2018/02/introduction-to-learning-to-trade-with-reinforcement-learning/
- Sharpe ratio or Drawdown as reward functions (a drawdown sketch follows this list).
- Reinforcement Learning allows for end-to-end optimization and maximizes (potentially delayed) rewards.
- a strategy may work well in a bearish environment, but lose money in a bullish environment. Partly, this is due to the simplistic nature of the policy, which does not have a parameterization powerful enough to learn to adapt to changing market conditions.
- However, if we explicitly modeled the other agents in the environment, our agent could learn to exploit their strategies. In essence, we are reformulating the problem from “market prediction” to “agent exploitation”. This is much more similar to what we are doing in multiplayer games, like DotA.
- in the trading case, most states in the environment are bad, and there are only a few good ones. A naive random approach to exploration will almost never stumble upon those good state-actions pairs. A new approach is necessary here.
- There are many ways to speed up the training of Reinforcement Learning agents, including transfer learning, and using auxiliary tasks. For example, we could imagine pre-training an agent with an expert policy, or adding auxiliary tasks, such as price prediction
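Since drawdown is listed above as a possible reward function, here is a minimal sketch of maximum drawdown computed from an equity curve; the equity series is purely illustrative.

```python
import numpy as np

def max_drawdown(equity):
    """Largest peak-to-trough decline of the equity curve, as a fraction of the peak."""
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    drawdowns = (running_peak - equity) / running_peak
    return drawdowns.max()

curve = [100, 103, 101, 98, 104, 99, 107]
print(max_drawdown(curve))   # (103 - 98) / 103 ≈ 0.0485
```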
Why is machine learning in finance so hard? https://www.hardikp.com/2018/02/11/why-is-machine-learning-in-finance-so-hard/
Limit Order Book Visualisation http://parasec.net/transmission/order-book-visualisation/
Limit Order Book reconstruction, visualization and statistical analysis of the order flow https://www.ethz.ch/content/dam/ethz/special-interest/mtec/chair-of-entrepreneurial-risks-dam/documents/dissertation/master%20thesis/thesis_schroeter.pdf
Optimal Placement in a Limit Order Book
Roughly speaking, algorithmic trading is based on two different time scales: the daily or weekly scale, and a smaller (tens to hundreds of seconds) time scale. The first step is to optimally slice big orders into smaller ones on a daily basis with the goal to minimize the price impact and/or to maximize the expected utility; the second step is to optimally place the orders within seconds. The former is the well-known optimal execution problem and the latter is the much less-studied optimal placement problem.
Deep Reinforcement Learning for Optimal Order Placement in a Limit Order Book https://videos.re-work.co/videos/426-deep-reinforcement-learning-for-optimal-order-placement-in-a-limit-order-book https://docs.google.com/presentation/d/1bsK-3GTvgtpE0WJOrue1u7ZsacftzSi_JGNSnLTdayY/edit#slide=id.p