Order Book Programming Problem

1. The Solution

1.1 Design

Three implementations of the OrderBook:
- OrderBookArray: bid_prices[p] stores the bid orders at price p in cents, i.e., bid_prices[4412] stores the bid orders at price 44.12. The same applies to ask_prices.
  - Fast (O(1)) in order addition/amendment/removal;
  - Slow (O(N) where N is the depth of the order book) in order book traversal (as there are many holes);
  - Performs better if the price levels are close to each other (e.g., the product is liquid).
- OrderBookBst: Each binary search tree node stores one bid/ask price level.
  - Relatively slow (O(logN)) in order addition/amendment/removal
  - Relatively fase (O(logN)) in order book traversal
  - Perform better if the pricer levels are sparse (e.g., the product is illiquid).
- OrderBookStdMap: Uses std::map, which is STL's implementation of a binary search tree, mostly the same as OrderBookBst just slower lol
Three implementations of price level (i.e., a collection of orders with the same price): std::list (doubly-linked list), std::unordered_map(hash table) and std::vector

1.2 Build and run

Build

mkdir build && cd build
cmake ../  -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release -j 10

Unzip pricer.in from ./assets/test-data/pricer.in.gz

Run (results are written to stdout and debug info is written to stderr)

./pricer-array 1 < ./pricer.in 1>./stdout1.log 2>./stderr1.log
./pricer-bst 200 < ./pricer.in 1>./stdout200.log 2>./stderr200.log
./pricer-std-map 10000 < ./pricer.in 1>./stdout10000.log 2>./stderr10000.log

Check outputs against test cases:
- stdout1.log vs pricer.out.1 (perfectly matching)
- stdout200.log vs pricer.out.200 (perfectly matching)
- stdout10000.log vs pricer.out.10000 (there seems to be a precision issue that causes output to be off by ~0.001% for some lines)
The string representation of the order book (showing ten levels only)

Ask:
....
Level: 10, Price: 44.37, Size:   100, AccuSize:  1510, AccuVolume:  6680730, Orders: { { id:  pghe, size:  100 } }
Level:  9, Price: 44.31, Size:   100, AccuSize:  1410, AccuVolume:  6237030, Orders: { { id:  hwte, size:  100 } }
Level:  8, Price: 44.29, Size:   100, AccuSize:  1310, AccuVolume:  5793930, Orders: { { id:  qbgg, size:  100 } }
Level:  7, Price: 44.28, Size:   100, AccuSize:  1210, AccuVolume:  5351030, Orders: { { id:  plfg, size:  100 } }
Level:  6, Price: 44.25, Size:   400, AccuSize:  1110, AccuVolume:  4908230, Orders: { { id:  lfse, size:  100 },  { id:  ozaf, size:  100 },  { id:  mplg, size:  200 } }
Level:  5, Price: 44.23, Size:    10, AccuSize:   710, AccuVolume:  3138230, Orders: { { id:  ymye, size:   10 } }
Level:  4, Price: 44.22, Size:    50, AccuSize:   700, AccuVolume:  3094000, Orders: { { id:  bckg, size:   50 } }
Level:  3, Price: 44.21, Size:   100, AccuSize:   650, AccuVolume:  2872900, Orders: { { id:  vhcg, size:  100 } }
Level:  2, Price: 44.20, Size:   350, AccuSize:   550, AccuVolume:  2430800, Orders: { { id:  lbbf, size:  100 },  { id:  hwlg, size:  250 } }
Level:  1, Price: 44.19, Size:   200, AccuSize:   200, AccuVolume:   883800, Orders: { { id:  izlg, size:  100 },  { id:  xzlg, size:  100 } }
Bid:
Level:  1, Price: 44.17, Size:   156, AccuSize:   156, AccuVolume:   689052, Orders: { { id:  qzlg, size:   14 },  { id:  rzlg, size:   14 },  { id:  szlg, size:    7 },  { id:  tzlg, size:   21 },  { id:  wzlg, size:  100 } }
Level:  2, Price: 44.16, Size:   400, AccuSize:   556, AccuVolume:  2455452, Orders: { { id:  mxlg, size:  100 },  { id:  ezlg, size:  100 },  { id:  qwlg, size:  200 } }
Level:  3, Price: 44.15, Size:   550, AccuSize:  1106, AccuVolume:  4883702, Orders: { { id:  zrlg, size:  300 },  { id:  pzlg, size:  250 } }
Level:  4, Price: 44.14, Size:  1000, AccuSize:  2106, AccuVolume:  9297702, Orders: { { id:  urlg, size: 1000 } }
Level:  5, Price: 44.12, Size:    48, AccuSize:  2154, AccuVolume:  9509478, Orders: { { id:  ellg, size:   48 } }
Level:  6, Price: 44.05, Size:   100, AccuSize:  2254, AccuVolume:  9949978, Orders: { { id:  rtkg, size:  100 } }
Level:  7, Price: 44.04, Size:    10, AccuSize:  2264, AccuVolume:  9994018, Orders: { { id:  xrlg, size:   10 } }
Level:  8, Price: 44.03, Size:   200, AccuSize:  2464, AccuVolume: 10874618, Orders: { { id:  wvlg, size:  200 } }
Level:  9, Price: 43.99, Size:    29, AccuSize:  2493, AccuVolume: 11002189, Orders: { { id:  phcg, size:   29 } }
Level: 10, Price: 43.92, Size:   960, AccuSize:  3453, AccuVolume: 15218509, Orders: { { id:  sczf, size:  100 },  { id:  odzf, size:   10 },  { id:  yyyf, size:  100 },  { id:  yxzf, size:  150 },  { id:  umag, size:  500 },  { id:  pbgg, size:  100 } }
...

1.3 Performance benchmark

Prepare ramdisk and input data

sudo mkdir /tmp/tmpfs
sudo mount -o size=128M -t tmpfs none /tmp/tmpfs
cp ./assets/test-data/pricer.in.gz /tmp/tmpfs/
gzip -d /tmp/tmpfs/pricer.in.gz

Build with stdout/stderr turned off

mkdir build && cd build
cmake ../ -DBENCHMARK_PERFORMANCE=1 -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release -j 10

Time the execution with minimal output

> time ./pricer-bst 1 < /tmp/tmpfs/pricer.in
./pricer-bst exited gracefully, price changed 75335 time(s)

real    0m0.666s
user    0m0.630s
sys     0m0.036s
# ~1.8M orders/sec

>  time ./pricer-array 200 < /tmp/tmpfs/pricer.in
./pricer-array exited gracefully, price changed 211539 time(s)

real    0m0.551s
user    0m0.526s
sys     0m0.024s
# ~2.2M orders/sec 

> time ./pricer-bst 10000 < /tmp/tmpfs/pricer.in
./pricer-bst exited gracefully, price changed 777063 time(s)

real    0m1.680s
user    0m1.647s
sys     0m0.032s
# ~711K orders/sec

2. The Problem

The problem, originally called "Order Book Programming Problem", was organized by RGM Advisors (now part of DRW).
The problem's official website has been taken down since a while ago, the below duplicate is sourced from web.archive.org/

Background

Suppose your great aunt Gertrude dies and leaves you $3000 which you decide you want to invest in the Acme Internet Widget Company (stock symbol:AIWC). You are willing to pay up to $30 per share of AIWC. So you log in to your online trading account and enter a limit order: "BUY 100 AIWC @ $30". It's a limit order because that's most you're willing to pay. You'd be willing to pay less than $30, but not more than $30. Your order will sit around in the market until you get your 100 shares. A limit order to buy is called a "bid".

But you're not the only prospective buyer for AIWC stock. Others want to buy it too. One is bidding $31/share for 200 shares, while another is bidding $29/share for 300 shares. When Warren Buffett wants to sell 225 shares, he's obviously going to take the highest price he can get for each share. So he hits the $31 bid first, selling 200 shares. Then he sells his remaining 25 shares to you at $30/share. Your bid size reduced by 25, leaving 75 shares still to be bought.

Suppose you eventually get the full 100 shares at some price. Next year, you decide to buy a new computer and you need $4500 for it, and luckily the value of AIWC has appreciated by 50%. So you want to sell your 100 shares of AIWC stock for at least $45/share. So you enter this limit order: "SELL 100 AIWC @ $45". A limit order to sell is called an "ask".

But you're not the only prospective seller of AIWC stock. There's also an ask for $44/share and an ask for $46/share. If Alan Greenspan wants to buy AIWC, he's obviously going to pay as little as possible. So he'll take the $44 offer first, and only buy from you at $45 if he can't buy as much as he wants at $44.

The set of all standing bids and asks is called a "limit order book", or just a "book". You can buy a data feed from the stock market, and they will send you messages in real time telling you about changes to the book. Each message either adds an order to the book, or reduces the size of an order in the book (possibly removing the order entirely). You can record these messages in a log file, and later you can go back and analyze your log file.

Problem

Your task is to write a program, Pricer, that analyzes such a log file. Pricer takes one command-line argument: target-size. Pricer then reads a market data log on standard input. As the book is modified, Pricer prints (on standard output) the total expense you would incur if you bought target-size shares (by taking as many asks as necessary, lowest first), and the total income you would receive if you sold target-size shares (by hitting as many bids as necessary, highest first). Each time the income or expense changes, it prints the changed value.

Editor's Note: It is slightly confusing whether my Pricer will cause a change to the order book being maintained (i.e., if Pricer sells 200 shares, does it reduce the best bid's size by 200?). If you read through the Example Input and Output section below you can draw the conclusion that the answer is negative, Priceronly calculates such hypothetical price, but it does not impact the order book in any manner.

Input Format

The market data log contains one message per line (terminated by a single linefeed character, '\n'), and each message is a series of fields separated by spaces.

An "Add Order to Book" message looks like this:

timestamp A order-id side price size

Field	Meaning
`timestamp`	The time when this message was generated by the market, as milliseconds since midnight.
`A`	A literal string identifying this as an "Add Order to Book" message.
`order-id`	A unique string that subsequent "Reduce Order" messages will use to modify this order.
`side`	A '`B`' if this is a buy order (a bid), and a '`S`' if this is a sell order (an ask).
`price`	The limit price of this order.
`size`	The size in shares of this order, when it was initially sent to the market by some stock trader.

A "Reduce Order" message looks like this:

timestamp R order-id size

Field	Meaning
`timestamp`	The time when this message was generated by the market, as milliseconds since midnight.
`R`	A literal string identifying this as an "Reduce Order" message.
`order-id`	The unique string that identifies the order to be reduced.
`size`	The amount by which to reduce the size of the order. This is not the new size of the order. If `size` is equal to or greater than the existing size of the order, the order is removed from the book.

The log file messages are sorted by timestamp by the time Pricer receives them.

If Pricer encounters an error in an input message, it must print a warning to standard error and proceed to the next message.

Output Format

Pricer's output consists of one message per line in this format:

timestamp action total

Field Meaning

timestamp The timestamp from the input message that caused this output message to be generated.

action A string: 'B' if this message contains the new expense to buy target-size shares, and 'S' if this message contains the new income for selling target-size shares.

total

The total expense (if action is 'B') to buy target-size shares, or the total income (if action is 'S') for selling target-size shares.

If the book does not contain target-size shares in the appropriate type of order (asks for expense; bids for income), the total field contains the string 'NA'.

Example Input and Output

Here is an example run of Pricer with a target-size of 200. Input messages are in the left column. Notes and output messages are in the right column.

Standard Input	Standard Output/Notes
28800538 A b S 44.26 100	No output yet because neither the bids nor the asks in the book have a total of 200 shares yet.
28800562 A c B 44.10 100	Still not enough shares on either side of the book.
28800744 R b 100	This reduces order '`b`' to zero shares, which removes it from the book, so now the book contains no asks. But there's still no change to the total income or expense on 200 shares.
28800758 A d B 44.18 157	The bid sizes now total 257, which is more than the target size of 200. To sell 200 shares, you would first hit the bid at 44.18 for 157 shares, spending $6936.26. Then you would hit the bid at 44.10 for the remaining 43 shares, spending another $1896.30. Your total income would be $8832.56, so Pricer emits this message:
	28800758 S 8832.56
28800773 A e S 44.38 100	The book now contains a single ask of size 100, which is still not enough to change the target size expense from '`NA`'.
28800796 R d 157	This removes bid '`d`' from the book, leaving just one bid with a size of 100 on the book, so the income from selling changes to '`NA`':
	28800796 S NA
28800812 A f B 44.18 157	This new bid brings the total bid size back over 200, so the selling income is no longer '`NA`':
	28800812 S 8832.56
28800974 A g S 44.27 100	This ask brings the total ask size up to 200, exactly the target size. The total expense for buying 200 shares would be 100 * $44.27 + 100 * $44.38:
	28800974 B 8865.00
28800975 R e 100	Removing ask '`e`' from the book leaves less than 200 shares on the ask side, so the buying expense changes back to '`NA`':
	28800975 B NA
28812071 R f 100	Reducing bid '`f`' by 100 shares leaves only 157 shares on the bid side, so the selling income changes to '`NA`':
	28812071 S NA
28813129 A h B 43.68 50	This new bid makes it possible to sell 200 shares: 57 at $44.18, 100 at $44.10, and the last 43 at $43.68.
	28813129 S 8806.50
28813300 R f 57	This removes bid '`f`' from the book, so it is no longer possible to sell 200 shares:
	28813300 S NA
28813830 A i S 44.18 100	This ask makes it possible to buy 200 shares again: 100 at $44.18 and 100 at $44.27.
	28813830 B 8845.00
28814087 A j S 44.18 1000	This ask has the same price as an existing ask, and these two asks are tied for the best asking price. This means you could now buy all 200 shares at $44.18 instead of buying half of them at $44.27, so the buying expense decreases:
	28814087 B 8836.00
28814834 R c 100	This leaves only 50 shares on the bid side (all in order '`h`'), so it is still not possible to sell 200 shares. The selling income is therefore unchanged from '`NA`' and Pricer prints no output message.
28814864 A k B 44.09 100	Only 150 shares on the bid side, so no output needed.
28815774 R k 100	Back to 50 shares on the bid side; still no output needed.
28815804 A l B 44.07 175	There are now more than 200 shares on the bid side. You could sell 175 shares at $44.07 each, and the remaining 25 shares at $43.68 each:
	28815804 S 8804.25
28815937 R j 1000	After ask '`j`' is removed from the book, you can still buy 200 shares: 100 at $44.18 each, and 100 at the worse price of $44.27.
	28815937 B 8845.00
28816245 A m S 44.22 100	Since $44.22 is a better price than $44.27, the buying expense decreases:
	28816245 B 8840.00

Note that the book initially contains no orders, and that the buying expense and selling income are both considered to start at 'NA'. Since Pricer only produces output when the income or expense changes, it does not print anything until the total size of all bids or the total size of all asks meets or exceeds target-size.

What We're Looking For

Please write the Pricer program in one of C, C++, Java, or Scala, and send us the source code. You do not need to send us any compiler output. We encourage you to take advantage of the language's standard libraries and latest versions. Please do not use other code that you would have to download separately from your chosen language's normal distribution package.

We're looking for evidence that you can produce code that others would be able to understand, fix, and extend. At the same time, we do have real-time requirements for production code, so we frown on gratuitous inefficiency. Here are some qualities we look for:

Correctness. Obviously, the fewer bugs you write, the fewer you'll have to fix.
Clarity. If only you can understand your code, then only you can maintain it. Good code speaks for itself - a good programmer should be able to understand your implementation details without extensive comments.
Conciseness. The less code you write, the less code your coworkers need to puzzle out.
Coefficiency. OK, just efficiency, but wouldn't it be cool if all of these qualities started with a 'C'? Anyway, using less time and space is generally better than otherwise.

Of course, there are often trade-offs between each of these properties (except correctness, which is pretty much an absolute).

Also, we are a Linux shop and we like Unix-style tools: programs that process the output of other programs. So make sure your implementation of Pricer is suitable for use in a shell pipeline. Follow the I/O specifications: don't mix prompts in with your output or demand that the input come from a disk file.

In addition to supplying us with your source code, please answer these questions:

How did you choose your implementation language?
What is the time complexity for processing an Add Order message?
What is the time complexity for processing a Reduce Order message?
If your implementation were put into production and found to be too slow, what ideas would you try out to improve its performance? (Other than reimplementing it in a different language such as C or C++.)

Test Data

You can download a large amount of test input data here: pricer.in.gz. This file is compressed using gzip. You should uncompress it before feeding it to your program. This is real market data, collected from a live system, so you might want to analyze it to decide what algorithms are most appropriate to use in Pricer.

You can also download the corresponding output of our reference implementation of Pricer with various target sizes: target size 1, target size 200, target size 10000. These files are also compressed with gzip.

In case you want to test your implementation on a smaller sample of data, here is a snippet in easy-to-cut-and-paste format:

28800538 A b S 44.26 100
28800562 A c B 44.10 100
28800744 R b 100
28800758 A d B 44.18 157
28800773 A e S 44.38 100
28800796 R d 157
28800812 A f B 44.18 157
28800974 A g S 44.27 100
28800975 R e 100
28812071 R f 100
28813129 A h B 43.68 50
28813300 R f 57
28813830 A i S 44.18 100
28814087 A j S 44.18 1000
28814834 R c 100
28814864 A k B 44.09 100
28815774 R k 100
28815804 A l B 44.07 175
28815937 R j 1000
28816245 A m S 44.22 100

And here is the corresponding output:

28800758 S 8832.56
28800796 S NA
28800812 S 8832.56
28800974 B 8865.00
28800975 B NA
28812071 S NA
28813129 S 8806.50
28813300 S NA
28813830 B 8845.00
28814087 B 8836.00
28815804 S 8804.25
28815937 B 8845.00
28816245 B 8840.00

3. Limited order book basics

3.1 What is a limited order book (LOB)

A limit order book is a record of outstanding limit orders. A limit order is a type of order to buy or sell a security at a specific price or better.^{[What Is a Limit Order Book?]}

2.2 L1/L2/L3 market data

L1 market data: Provides more comprehensive by displaying the full depth of the order book. This includes multiple bid and ask prices at different levels, not just the best available prices ^{[Market Data]}
L2 market data: Provides more comprehensive by displaying the full depth of the order book. This includes multiple bid and ask prices at different levels, not just the best available prices. ^{[Market Data]} L2 data is also sometimes referred to as market-by-price (MBP), since the updates to book depth are usually keyed by price or price level. It may also be called market depth or depth of market ( DoM). ^[L2]
L3 market data: Level 3 (L3) refers to market data that provides every individual buy and sell order at every price level. This is often also the highest granularity of data available. L3 data is also called market by order or full order book data. ^[L3]

4. Factors to consider before implementing an LOB

To give some idea of the data volumes, the Nasdaq TotalView ITCH feed, which is every event in every instrument traded on the Nasdaq, can have data rates of 20+ gigabytes/day with spikes of 3 megabytes/second or more. The individual messages average about 20 bytes each so this means handling 100,000-200,000 messages per second during high volume periods. [How to Build a Fast Limit Order Book]

4.1 The business use case

Consider the below factors:
- What queries need to be optimized for your application?
- Sparsity of the book.
- Statistical distribution of book events. ^{[Red Black Trees for Limit Order Book]}
Markets/products to trade/liquidity impacts engineering decisions:
- In illiquid options, there may be very few resting orders on the book, so it may be cheaper to just store everything in arrays and linearly walk through them.
- In liquid futures, most events only affect a few hundred price levels, and price bands might give you a bound on levels that you actually care about, so it is possible to preallocate the levels in an array and represent index prices as an offset from some initial state in number of ticks. ^{[Red Black Trees for Limit Order Book]}
LOB's behavior can differ depending on whether you are implementing an LOB for equities (order-based, a.k.a. Market by Order, MBO or L3) ^{[Market by Order (MBO)]}, ^[L3] or futures (level-based, a.k.a., Market by Price or MBP) ^{[What is an efficient data structure to model order book?]}

4.2 Engineering considerations

There are three main operations that an LOB has to implement: add, cancel, and execute. The goal is to implement these operations in O(1) time while making it possible for the trading model to efficiently ask questions like:
- what are the best bid and offer?
- how much volume is there between prices A and B?
- what is order X's current position in the book? ^{[How to Build a Fast Limit Order Book]}
If you can, you should also optimize for the particular exchange. For instance, it turns out that, last I checked, Nasdaq generates order IDs incrementally starting from a small number, so you can store all the orders in a giant array instead of a hashtable. This is really cache- and TLB-friendly compared to a hashtable because most updates tend to happen to recently-dereferenced orders. ^{[What is an efficient data structure to model order book?]}
https://github.com/da-bao-jian/fast_limit_orderbook

5. Candidate data structures for LOB implementation

5.1 Binary search tree

A Binary Search Tree (BST) is a type of Binary Tree data structure, where the following properties must be true for any node "X" in the tree [1]:
- The X node's left child and all of its descendants (children, children's children, and so on) have lower values than X's value.
- The right child, and all its descendants have higher values than X's value.
- Left and right subtrees must also be Binary Search Trees.

Common operations

Insert
Search (omitted for being too simple)
Delete
- If the node to be deleted is a leaf (i.e., it has no children): omitted
- If the node to be deleted has one child: omitted
- If the node to be deleted has two children:

5.2 Heap

A Min-Heap is a Data Structure with the following properties.[2]
- It is a complete Complete Binary Tree.
- The value of the root node must be the smallest among all its descendant nodes and the same thing must be done for its left and right sub-tree also.

Common operations

Insert
Delete

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
assets		assets
src		src
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
README.md		README.md
vcpkg.json		vcpkg.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Order Book Programming Problem

Table of contents

1. The Solution

1.1 Design

1.2 Build and run

1.3 Performance benchmark

2. The Problem

Background

Problem

Input Format

Output Format

Example Input and Output

What We're Looking For

Test Data

3. Limited order book basics

3.1 What is a limited order book (LOB)

2.2 L1/L2/L3 market data

4. Factors to consider before implementing an LOB

4.1 The business use case

4.2 Engineering considerations

5. Candidate data structures for LOB implementation

5.1 Binary search tree

Common operations

5.2 Heap

Common operations

About

Uh oh!

Releases

Packages

Languages

alex-lt-kong/order-book-programming-problem

Folders and files

Latest commit

History

Repository files navigation

Order Book Programming Problem

Table of contents

1. The Solution

1.1 Design

1.2 Build and run

1.3 Performance benchmark

2. The Problem

Background

Problem

Input Format

Output Format

Example Input and Output

What We're Looking For

Test Data

3. Limited order book basics

3.1 What is a limited order book (LOB)

2.2 L1/L2/L3 market data

4. Factors to consider before implementing an LOB

4.1 The business use case

4.2 Engineering considerations

5. Candidate data structures for LOB implementation

5.1 Binary search tree

Common operations

5.2 Heap

Common operations

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages