- 1. The Solution
- 2. The Problem
- 3. Limited order book basics
- 4. Factors to consider before implementing an LOB
- 5. Candidate data structures for LOB implementation
- Three implementations of the OrderBook:
OrderBookArray
:bid_prices[p]
stores the bid orders at pricep
in cents, i.e., bid_prices[4412] stores the bid orders at price 44.12. The same applies to ask_prices.- Fast (O(1)) in order addition/amendment/removal;
- Slow (O(N) where N is the depth of the order book) in order book traversal (as there are many holes);
- Performs better if the price levels are close to each other (e.g., the product is liquid).
OrderBookBst
: Each binary search tree node stores one bid/ask price level.- Relatively slow (O(logN)) in order addition/amendment/removal
- Relatively fase (O(logN)) in order book traversal
- Perform better if the pricer levels are sparse (e.g., the product is illiquid).
OrderBookStdMap
: Usesstd::map
, which is STL's implementation of a binary search tree, mostly the same asOrderBookBst
just slower lol
- Three implementations of price level (i.e., a collection of orders with the same price):
std::list
(doubly-linked list),std::unordered_map
(hash table) andstd::vector
-
Build
mkdir build && cd build cmake ../ -DCMAKE_BUILD_TYPE=Release cmake --build . --config Release -j 10
-
Unzip
pricer.in
from./assets/test-data/pricer.in.gz
-
Run (results are written to stdout and debug info is written to stderr)
./pricer-array 1 < ./pricer.in 1>./stdout1.log 2>./stderr1.log ./pricer-bst 200 < ./pricer.in 1>./stdout200.log 2>./stderr200.log ./pricer-std-map 10000 < ./pricer.in 1>./stdout10000.log 2>./stderr10000.log
-
Check outputs against test cases:
- stdout1.log vs pricer.out.1 (perfectly matching)
- stdout200.log vs pricer.out.200 (perfectly matching)
- stdout10000.log vs pricer.out.10000 (there seems to be a precision issue that causes output to be off by ~0.001% for some lines)
-
The string representation of the order book (showing ten levels only)
Ask:
....
Level: 10, Price: 44.37, Size: 100, AccuSize: 1510, AccuVolume: 6680730, Orders: { { id: pghe, size: 100 } }
Level: 9, Price: 44.31, Size: 100, AccuSize: 1410, AccuVolume: 6237030, Orders: { { id: hwte, size: 100 } }
Level: 8, Price: 44.29, Size: 100, AccuSize: 1310, AccuVolume: 5793930, Orders: { { id: qbgg, size: 100 } }
Level: 7, Price: 44.28, Size: 100, AccuSize: 1210, AccuVolume: 5351030, Orders: { { id: plfg, size: 100 } }
Level: 6, Price: 44.25, Size: 400, AccuSize: 1110, AccuVolume: 4908230, Orders: { { id: lfse, size: 100 }, { id: ozaf, size: 100 }, { id: mplg, size: 200 } }
Level: 5, Price: 44.23, Size: 10, AccuSize: 710, AccuVolume: 3138230, Orders: { { id: ymye, size: 10 } }
Level: 4, Price: 44.22, Size: 50, AccuSize: 700, AccuVolume: 3094000, Orders: { { id: bckg, size: 50 } }
Level: 3, Price: 44.21, Size: 100, AccuSize: 650, AccuVolume: 2872900, Orders: { { id: vhcg, size: 100 } }
Level: 2, Price: 44.20, Size: 350, AccuSize: 550, AccuVolume: 2430800, Orders: { { id: lbbf, size: 100 }, { id: hwlg, size: 250 } }
Level: 1, Price: 44.19, Size: 200, AccuSize: 200, AccuVolume: 883800, Orders: { { id: izlg, size: 100 }, { id: xzlg, size: 100 } }
Bid:
Level: 1, Price: 44.17, Size: 156, AccuSize: 156, AccuVolume: 689052, Orders: { { id: qzlg, size: 14 }, { id: rzlg, size: 14 }, { id: szlg, size: 7 }, { id: tzlg, size: 21 }, { id: wzlg, size: 100 } }
Level: 2, Price: 44.16, Size: 400, AccuSize: 556, AccuVolume: 2455452, Orders: { { id: mxlg, size: 100 }, { id: ezlg, size: 100 }, { id: qwlg, size: 200 } }
Level: 3, Price: 44.15, Size: 550, AccuSize: 1106, AccuVolume: 4883702, Orders: { { id: zrlg, size: 300 }, { id: pzlg, size: 250 } }
Level: 4, Price: 44.14, Size: 1000, AccuSize: 2106, AccuVolume: 9297702, Orders: { { id: urlg, size: 1000 } }
Level: 5, Price: 44.12, Size: 48, AccuSize: 2154, AccuVolume: 9509478, Orders: { { id: ellg, size: 48 } }
Level: 6, Price: 44.05, Size: 100, AccuSize: 2254, AccuVolume: 9949978, Orders: { { id: rtkg, size: 100 } }
Level: 7, Price: 44.04, Size: 10, AccuSize: 2264, AccuVolume: 9994018, Orders: { { id: xrlg, size: 10 } }
Level: 8, Price: 44.03, Size: 200, AccuSize: 2464, AccuVolume: 10874618, Orders: { { id: wvlg, size: 200 } }
Level: 9, Price: 43.99, Size: 29, AccuSize: 2493, AccuVolume: 11002189, Orders: { { id: phcg, size: 29 } }
Level: 10, Price: 43.92, Size: 960, AccuSize: 3453, AccuVolume: 15218509, Orders: { { id: sczf, size: 100 }, { id: odzf, size: 10 }, { id: yyyf, size: 100 }, { id: yxzf, size: 150 }, { id: umag, size: 500 }, { id: pbgg, size: 100 } }
...
-
Prepare ramdisk and input data
sudo mkdir /tmp/tmpfs sudo mount -o size=128M -t tmpfs none /tmp/tmpfs cp ./assets/test-data/pricer.in.gz /tmp/tmpfs/ gzip -d /tmp/tmpfs/pricer.in.gz
-
Build with stdout/stderr turned off
mkdir build && cd build cmake ../ -DBENCHMARK_PERFORMANCE=1 -DCMAKE_BUILD_TYPE=Release cmake --build . --config Release -j 10
-
Time the execution with minimal output
> time ./pricer-bst 1 < /tmp/tmpfs/pricer.in ./pricer-bst exited gracefully, price changed 75335 time(s) real 0m0.666s user 0m0.630s sys 0m0.036s # ~1.8M orders/sec > time ./pricer-array 200 < /tmp/tmpfs/pricer.in ./pricer-array exited gracefully, price changed 211539 time(s) real 0m0.551s user 0m0.526s sys 0m0.024s # ~2.2M orders/sec > time ./pricer-bst 10000 < /tmp/tmpfs/pricer.in ./pricer-bst exited gracefully, price changed 777063 time(s) real 0m1.680s user 0m1.647s sys 0m0.032s # ~711K orders/sec
-
The problem, originally called "Order Book Programming Problem", was organized by RGM Advisors (now part of DRW).
-
The problem's official website has been taken down since a while ago, the below duplicate is sourced from web.archive.org/
Suppose your great aunt Gertrude dies and leaves you $3000 which you decide you want to invest in the Acme Internet Widget Company (stock symbol:AIWC). You are willing to pay up to $30 per share of AIWC. So you log in to your online trading account and enter a limit order: "BUY 100 AIWC @ $30". It's a limit order because that's most you're willing to pay. You'd be willing to pay less than $30, but not more than $30. Your order will sit around in the market until you get your 100 shares. A limit order to buy is called a "bid".
But you're not the only prospective buyer for AIWC stock. Others want to buy it too. One is bidding $31/share for 200 shares, while another is bidding $29/share for 300 shares. When Warren Buffett wants to sell 225 shares, he's obviously going to take the highest price he can get for each share. So he hits the $31 bid first, selling 200 shares. Then he sells his remaining 25 shares to you at $30/share. Your bid size reduced by 25, leaving 75 shares still to be bought.
Suppose you eventually get the full 100 shares at some price. Next year, you decide to buy a new computer and you need $4500 for it, and luckily the value of AIWC has appreciated by 50%. So you want to sell your 100 shares of AIWC stock for at least $45/share. So you enter this limit order: "SELL 100 AIWC @ $45". A limit order to sell is called an "ask".
But you're not the only prospective seller of AIWC stock. There's also an ask for $44/share and an ask for $46/share. If Alan Greenspan wants to buy AIWC, he's obviously going to pay as little as possible. So he'll take the $44 offer first, and only buy from you at $45 if he can't buy as much as he wants at $44.
The set of all standing bids and asks is called a "limit order book", or just a "book". You can buy a data feed from the stock market, and they will send you messages in real time telling you about changes to the book. Each message either adds an order to the book, or reduces the size of an order in the book (possibly removing the order entirely). You can record these messages in a log file, and later you can go back and analyze your log file.
Your task is to write a program, Pricer, that analyzes such a log file. Pricer takes one command-line argument: target-size. Pricer then reads a market data log on standard input. As the book is modified, Pricer prints (on standard output) the total expense you would incur if you bought target-size shares (by taking as many asks as necessary, lowest first), and the total income you would receive if you sold target-size shares (by hitting as many bids as necessary, highest first). Each time the income or expense changes, it prints the changed value.
Editor's Note: It is slightly confusing whether my Pricer will cause a change to the order book being maintained (i.e., if Pricer sells 200 shares, does it reduce the best bid's size by 200?). If you read through the Example Input and Output section below you can draw the conclusion that the answer is negative, Priceronly calculates such hypothetical price, but it does not impact the order book in any manner.
The market data log contains one message per line (terminated by a
single linefeed character, '\n
'), and each
message is a series of fields separated by spaces.
An "Add Order to Book" message looks like this:
timestamp A order-id side price size
Field | Meaning |
---|---|
timestamp | The time when this message was generated by the market, as milliseconds since midnight. |
A |
A literal string identifying this as an "Add Order to Book" message. |
order-id | A unique string that subsequent "Reduce Order" messages will use to modify this order. |
side | A 'B ' if this is a buy order (a bid), and a
'S ' if this is a sell order (an ask).
|
price | The limit price of this order. |
size | The size in shares of this order, when it was initially sent to the market by some stock trader. |
A "Reduce Order" message looks like this:
timestamp R order-id size
Field | Meaning |
---|---|
timestamp | The time when this message was generated by the market, as milliseconds since midnight. |
R |
A literal string identifying this as an "Reduce Order" message. |
order-id | The unique string that identifies the order to be reduced. |
size | The amount by which to reduce the size of the order. This is not the new size of the order. If size is equal to or greater than the existing size of the order, the order is removed from the book. |
The log file messages are sorted by timestamp by the time Pricer receives them.
If Pricer encounters an error in an input message, it must print a warning to standard error and proceed to the next message.
Pricer's output consists of one message per line in this format:
timestamp action total
Field | Meaning |
---|---|
timestamp | The timestamp from the input message that caused this output message to be generated. |
action | A string: 'B ' if this message contains the new
expense to buy target-size shares, and
'S ' if this message contains the new income for
selling target-size shares.
|
total | The total expense (if action is ' If the book does not contain target-size shares in
the appropriate type of order (asks for expense; bids for
income), the total field contains the string
' |
Here is an example run of Pricer with a target-size of 200. Input messages are in the left column. Notes and output messages are in the right column.
Standard Input | Standard Output/Notes |
---|---|
28800538 A b S 44.26 100 | No output yet because neither the bids nor the asks in the book have a total of 200 shares yet. |
28800562 A c B 44.10 100 | Still not enough shares on either side of the book. |
28800744 R b 100 | This reduces order 'b ' to zero shares, which
removes it from the book, so now the book contains no asks. But
there's still no change to the total income or expense on 200
shares. |
28800758 A d B 44.18 157 | The bid sizes now total 257, which is more than the target size of 200. To sell 200 shares, you would first hit the bid at 44.18 for 157 shares, spending $6936.26. Then you would hit the bid at 44.10 for the remaining 43 shares, spending another $1896.30. Your total income would be $8832.56, so Pricer emits this message: |
28800758 S 8832.56 | |
28800773 A e S 44.38 100 | The book now contains a single ask of size 100, which is
still not enough to change the target size expense from
'NA '. |
28800796 R d 157 | This removes bid 'd ' from the book, leaving
just one bid with a size of 100 on the book, so the income from
selling changes to 'NA ': |
28800796 S NA | |
28800812 A f B 44.18 157 | This new bid brings the total bid size back over 200, so the
selling income is no longer 'NA ': |
28800812 S 8832.56 | |
28800974 A g S 44.27 100 | This ask brings the total ask size up to 200, exactly the target size. The total expense for buying 200 shares would be 100 * $44.27 + 100 * $44.38: |
28800974 B 8865.00 | |
28800975 R e 100 | Removing ask 'e ' from the book leaves less than
200 shares on the ask side, so the buying expense changes back
to 'NA ':
|
28800975 B NA | |
28812071 R f 100 | Reducing bid 'f ' by 100 shares leaves only 157
shares on the bid side, so the selling income changes to
'NA ': |
28812071 S NA | |
28813129 A h B 43.68 50 | This new bid makes it possible to sell 200 shares: 57 at $44.18, 100 at $44.10, and the last 43 at $43.68. |
28813129 S 8806.50 | |
28813300 R f 57 | This removes bid 'f ' from the book, so it is
no longer possible to sell 200 shares:
|
28813300 S NA | |
28813830 A i S 44.18 100 | This ask makes it possible to buy 200 shares again: 100 at $44.18 and 100 at $44.27. |
28813830 B 8845.00 | |
28814087 A j S 44.18 1000 | This ask has the same price as an existing ask, and these two asks are tied for the best asking price. This means you could now buy all 200 shares at $44.18 instead of buying half of them at $44.27, so the buying expense decreases: |
28814087 B 8836.00 | |
28814834 R c 100 | This leaves only 50 shares on the bid side (all in order
'h '), so it is still not possible to sell 200
shares. The selling income is therefore unchanged from
'NA ' and Pricer prints no output message.
|
28814864 A k B 44.09 100 | Only 150 shares on the bid side, so no output needed. |
28815774 R k 100 | Back to 50 shares on the bid side; still no output needed. |
28815804 A l B 44.07 175 | There are now more than 200 shares on the bid side. You could sell 175 shares at $44.07 each, and the remaining 25 shares at $43.68 each: |
28815804 S 8804.25 | |
28815937 R j 1000 | After ask 'j ' is removed from the book, you can
still buy 200 shares: 100 at $44.18 each, and 100 at the worse
price of $44.27.
|
28815937 B 8845.00 | |
28816245 A m S 44.22 100 | Since $44.22 is a better price than $44.27, the buying expense decreases: |
28816245 B 8840.00 |
Note that the book initially contains no orders, and that the buying
expense and selling income are both considered to start at
'NA
'. Since Pricer only produces output when the
income or expense changes, it does not print anything until the total
size of all bids or the total size of all asks meets or exceeds
target-size.
Please write the Pricer program in one of C, C++, Java, or Scala, and send us the source code. You do not need to send us any compiler output. We encourage you to take advantage of the language's standard libraries and latest versions. Please do not use other code that you would have to download separately from your chosen language's normal distribution package.
We're looking for evidence that you can produce code that others would be able to understand, fix, and extend. At the same time, we do have real-time requirements for production code, so we frown on gratuitous inefficiency. Here are some qualities we look for:
- Correctness. Obviously, the fewer bugs you write, the fewer you'll have to fix.
- Clarity. If only you can understand your code, then only you can maintain it. Good code speaks for itself - a good programmer should be able to understand your implementation details without extensive comments.
- Conciseness. The less code you write, the less code your coworkers need to puzzle out.
- Coefficiency. OK, just efficiency, but wouldn't it be cool if all of these qualities started with a 'C'? Anyway, using less time and space is generally better than otherwise.
Of course, there are often trade-offs between each of these properties (except correctness, which is pretty much an absolute).
Also, we are a Linux shop and we like Unix-style tools: programs that process the output of other programs. So make sure your implementation of Pricer is suitable for use in a shell pipeline. Follow the I/O specifications: don't mix prompts in with your output or demand that the input come from a disk file.
In addition to supplying us with your source code, please answer these questions:
- How did you choose your implementation language?
- What is the time complexity for processing an Add Order message?
- What is the time complexity for processing a Reduce Order message?
- If your implementation were put into production and found to be too slow, what ideas would you try out to improve its performance? (Other than reimplementing it in a different language such as C or C++.)
You can download a large amount of test input data here: pricer.in.gz. This file is compressed using gzip. You should uncompress it before feeding it to your program. This is real market data, collected from a live system, so you might want to analyze it to decide what algorithms are most appropriate to use in Pricer.
You can also download the corresponding output of our reference implementation of Pricer with various target sizes: target size 1, target size 200, target size 10000. These files are also compressed with gzip.
In case you want to test your implementation on a smaller sample of data, here is a snippet in easy-to-cut-and-paste format:
28800538 A b S 44.26 100 28800562 A c B 44.10 100 28800744 R b 100 28800758 A d B 44.18 157 28800773 A e S 44.38 100 28800796 R d 157 28800812 A f B 44.18 157 28800974 A g S 44.27 100 28800975 R e 100 28812071 R f 100 28813129 A h B 43.68 50 28813300 R f 57 28813830 A i S 44.18 100 28814087 A j S 44.18 1000 28814834 R c 100 28814864 A k B 44.09 100 28815774 R k 100 28815804 A l B 44.07 175 28815937 R j 1000 28816245 A m S 44.22 100
And here is the corresponding output:
28800758 S 8832.56 28800796 S NA 28800812 S 8832.56 28800974 B 8865.00 28800975 B NA 28812071 S NA 28813129 S 8806.50 28813300 S NA 28813830 B 8845.00 28814087 B 8836.00 28815804 S 8804.25 28815937 B 8845.00 28816245 B 8840.00
- A limit order book is a record of outstanding limit orders. A limit order is a type of order to buy or sell a security at a specific price or better.[What Is a Limit Order Book?]
-
L1 market data: Provides more comprehensive by displaying the full depth of the order book. This includes multiple bid and ask prices at different levels, not just the best available prices [Market Data]
-
L2 market data: Provides more comprehensive by displaying the full depth of the order book. This includes multiple bid and ask prices at different levels, not just the best available prices. [Market Data] L2 data is also sometimes referred to as market-by-price (MBP), since the updates to book depth are usually keyed by price or price level. It may also be called market depth or depth of market ( DoM). [L2]
-
L3 market data: Level 3 (L3) refers to market data that provides every individual buy and sell order at every price level. This is often also the highest granularity of data available. L3 data is also called market by order or full order book data. [L3]
- To give some idea of the data volumes, the Nasdaq TotalView ITCH feed, which is every event in every instrument traded on the Nasdaq, can have data rates of 20+ gigabytes/day with spikes of 3 megabytes/second or more. The individual messages average about 20 bytes each so this means handling 100,000-200,000 messages per second during high volume periods. [How to Build a Fast Limit Order Book]
-
Consider the below factors:
- What queries need to be optimized for your application?
- Sparsity of the book.
- Statistical distribution of book events. [Red Black Trees for Limit Order Book]
-
Markets/products to trade/liquidity impacts engineering decisions:
- In illiquid options, there may be very few resting orders on the book, so it may be cheaper to just store everything in arrays and linearly walk through them.
- In liquid futures, most events only affect a few hundred price levels, and price bands might give you a bound on levels that you actually care about, so it is possible to preallocate the levels in an array and represent index prices as an offset from some initial state in number of ticks. [Red Black Trees for Limit Order Book]
-
LOB's behavior can differ depending on whether you are implementing an LOB for equities (order-based, a.k.a. Market by Order, MBO or L3) [Market by Order (MBO)], [L3] or futures (level-based, a.k.a., Market by Price or MBP) [What is an efficient data structure to model order book?]
-
There are three main operations that an LOB has to implement: add, cancel, and execute. The goal is to implement these operations in O(1) time while making it possible for the trading model to efficiently ask questions like:
- what are the best bid and offer?
- how much volume is there between prices A and B?
- what is order X's current position in the book? [How to Build a Fast Limit Order Book]
-
If you can, you should also optimize for the particular exchange. For instance, it turns out that, last I checked, Nasdaq generates order IDs incrementally starting from a small number, so you can store all the orders in a giant array instead of a hashtable. This is really cache- and TLB-friendly compared to a hashtable because most updates tend to happen to recently-dereferenced orders. [What is an efficient data structure to model order book?]
-
A Binary Search Tree (BST) is a type of Binary Tree data structure, where the following properties must be true for any node "X" in the tree [1]:
-
Insert
-
Search (omitted for being too simple)
-
Delete
- A Min-Heap is a Data Structure with the following
properties.[2]
- It is a complete Complete Binary Tree.
- The value of the root node must be the smallest among all its descendant nodes and the same thing must be done for its left and right sub-tree also.