Memory size calculation
A frequently asked question is how much memory Tarantool will use for a given data set. This page is a short guide to estimating memory usage.
Data memory consists of two parts: the tuple arena and index data. The tuple arena stores tuples, which are shared between all indexes of the same space.
First of all, Tarantool stores tuple data as msgpack arrays (http://msgpack.org/). Here are the encoded sizes of some example values:
- 42 - 1 byte (prefixless optimization for short integers)
- 100000 - 5 bytes (1 byte prefix + uint32)
- 10000000000 - 9 bytes (1 byte prefix + uint64, the largest size of msgpacked integer)
- "abc" - 4 bytes (1 byte prefix + 3-character string)
- {42} - 2 bytes (1 byte array prefix + 1 byte 42)
Thus the msgpack size (bsize) of the tuple {"abc", 100000, 42} is 1 + 4 + 5 + 1 = 11 bytes.
Every tuple also has a 12-byte system header (that is, sizeof(struct tuple)).
In addition, every tuple stores a 4-byte offset for each indexed field other than the first field. Here are some examples of the offset cost:
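The sizes listed above can be reproduced with a small calculator. This is only a sketch based on the msgpack format rules for non-negative integers, strings and arrays; the function name `msgpack_size` is made up for illustration and is not part of Tarantool or of any msgpack library.

```python
def msgpack_size(value):
    """Return how many bytes msgpack needs to encode `value`.

    Sketch only: handles non-negative ints, str and list.
    """
    if isinstance(value, bool):
        raise TypeError("booleans are not covered by this sketch")
    if isinstance(value, int):
        if value < 0:
            raise ValueError("negative integers are not covered by this sketch")
        if value <= 0x7F:
            return 1  # positive fixint: the value lives in the prefix byte
        if value <= 0xFF:
            return 2  # 1 byte prefix + uint8
        if value <= 0xFFFF:
            return 3  # 1 byte prefix + uint16
        if value <= 0xFFFFFFFF:
            return 5  # 1 byte prefix + uint32
        if value <= 0xFFFFFFFFFFFFFFFF:
            return 9  # 1 byte prefix + uint64
        raise ValueError("integer does not fit in uint64")
    if isinstance(value, str):
        n = len(value.encode("utf-8"))
        if n <= 31:
            prefix = 1  # fixstr
        elif n <= 0xFF:
            prefix = 2  # str8
        elif n <= 0xFFFF:
            prefix = 3  # str16
        else:
            prefix = 5  # str32
        return prefix + n
    if isinstance(value, list):
        n = len(value)
        if n <= 15:
            prefix = 1  # fixarray
        elif n <= 0xFFFF:
            prefix = 3  # array16
        else:
            prefix = 5  # array32
        return prefix + sum(msgpack_size(item) for item in value)
    raise TypeError("unsupported type: " + type(value).__name__)


# bsize of the tuple {"abc", 100000, 42}
print(msgpack_size(["abc", 100000, 42]))
```

A tuple is modelled here as a Python list, so the array prefix is counted together with the fields.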
- One index parts = {1, 'uint'} - 0
- One index parts = {2, 'uint'} - 4
- One index parts = {1, 'uint', 2, 'uint'} - 4
- One index parts = {2, 'uint', 3, 'uint'} - 8
- Primary index parts = {1, 'uint', 2, 'uint'}, secondary parts = {2, 'uint', 1, 'uint'} - 4
- Primary index parts = {2, 'uint', 3, 'uint'}, secondary parts = {3, 'uint', 2, 'uint'} - 8
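The examples above are consistent with a simple rule: 4 bytes for every distinct indexed field other than field 1, counted once across all indexes of the space. A minimal sketch of that rule follows; `offset_cost` is a hypothetical helper, and each index is described only by its list of 1-based field numbers.

```python
OFFSET_SIZE = 4  # bytes per stored field offset, as stated in the text


def offset_cost(indexes):
    """Return the per-tuple offset cost in bytes.

    `indexes` is a list of indexes, each given as a list of 1-based
    field numbers, e.g. [[1, 2], [2, 1]] for a primary index on
    fields 1, 2 and a secondary index on fields 2, 1.
    """
    indexed_fields = set()
    for parts in indexes:
        indexed_fields.update(parts)
    indexed_fields.discard(1)  # the first field needs no offset
    return OFFSET_SIZE * len(indexed_fields)


print(offset_cost([[2, 3], [3, 2]]))
```

A set is used so that a field shared by several indexes is counted only once.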
The total size is then rounded up to at least cfg.slab_alloc_minimal (usually 16 bytes, so this rarely matters).
Finally, tuples smaller than 128 bytes are aligned up to 8 bytes (i.e. rounded up to the closest multiple of 8). Larger tuples are rounded up in a more complex way; the usual loss is about 5%.
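The two rounding steps can be approximated as below. This is a rough sketch with two assumptions that are not taken from the allocator itself: cfg.slab_alloc_minimal is treated as a plain lower bound, and the "complex" rounding of tuples of 128 bytes or more is replaced by a flat 5% overhead. The name `allocated_size` is hypothetical.

```python
def allocated_size(raw_size, slab_alloc_minimal=16):
    """Estimate the bytes actually taken by a tuple of `raw_size` bytes.

    `raw_size` is bsize + tuple header + field offsets.
    """
    # Assumption: the minimal allocation size acts as a lower bound.
    size = max(raw_size, slab_alloc_minimal)
    if size < 128:
        # Small tuples: round up to the closest multiple of 8.
        return (size + 7) // 8 * 8
    # Assumption: large tuples lose about 5%; integer math, rounded up.
    return (size * 105 + 99) // 100


print(allocated_size(43))
```

For large tuples the result is only an average estimate, not the exact size class the allocator would pick.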
Indexes store pointers to tuples in the shared tuple arena, so their memory cost depends only on the number of records in the index:
- Tree index costs about 10 bytes per record
- Hash index costs about 16 bytes per record
Note that both index types reserve 48kB during the first insert, but asymptotically that is negligible.
While a snapshot is being written, Tarantool does not delete tuples that are needed for the snapshot read view. To account for this, estimate the number of replaces, updates and similar operations that can be done during the snapshot process and add the tuple and index cost for that many records.
As an example, we store about 500 000 000 records in Tarantool. Each tuple looks like {ID, email}, where ID usually fits in a uint32 and email is a string of 20 characters on average.
We have a hash index on ID and a tree index on email.
msgpack will be 1 + (1 + 4) + (1 + 20) = 27 bytes on average
+12 bytes header = 39
+4 bytes as the cost of an offset for the second field = 43
rounded up to a multiple of 8 = 48 bytes
+16 bytes for hash index
+10 bytes for tree index
Total: 74 bytes per record
We handle about 10 000 requests per second, so even if the snapshot process lasted for 5 minutes, we would have to store an additional 5 * 60 * 10000 = 3 000 000 records.
Total cost is: 74 * (500 000 000 + 3 000 000) = 37 222 000 000 bytes, which we round up to 38GB
It's better to reserve about 10% on top of that and allow Tarantool to use 42GB
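The whole estimate can be written down as a short script. It is a sketch that simply repeats the arithmetic of this example; all constant and variable names are chosen for illustration, and the per-record costs are the approximate figures quoted on this page.

```python
TUPLE_HEADER = 12  # sizeof(struct tuple), bytes
OFFSET_SIZE = 4    # bytes per indexed field other than the first
HASH_COST = 16     # approximate bytes per record in a hash index
TREE_COST = 10     # approximate bytes per record in a tree index


def round_up(n, multiple):
    """Round n up to the closest multiple of `multiple`."""
    return (n + multiple - 1) // multiple * multiple


# {ID, email}: array prefix + uint32 ID + 20-character string on average
bsize = 1 + (1 + 4) + (1 + 20)            # 27
raw = bsize + TUPLE_HEADER + OFFSET_SIZE  # 43, one offset for field 2
tuple_size = round_up(raw, 8)             # 48, tuple is smaller than 128 bytes
per_record = tuple_size + HASH_COST + TREE_COST  # 74

records = 500_000_000
snapshot_seconds = 5 * 60
requests_per_second = 10_000
extra_records = snapshot_seconds * requests_per_second  # 3 000 000

total = per_record * (records + extra_records)
with_reserve = total * 110 // 100  # about 10% reserve

print(f"per record:   {per_record} bytes")
print(f"total:        {total / 1e9:.1f} GB")
print(f"with reserve: {with_reserve / 1e9:.1f} GB")
```

The script reports decimal gigabytes without extra rounding, so its figures come out slightly below the rounded-up 38GB and 42GB used above.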