Vinyl Disk Layout

Roman Tsisyk edited this page Mar 28, 2017 · 45 revisions

Tarantool 1.7.4 has the following disk layout:

├── <wal_dir>
    ├── 00000000000000000000.xlog
    ├── 00000000000000000047.xlog
    ├── 00000000000000000050.xlog
    ├── <wal_lsn>.xlog
    ├── 00000000000000000000.xctl
    ├── 00000000000000000050.xctl
    ├── <checkpoint_lsn>.xctl

├── <memtx_dir>
    ├── 00000000000000000000.snap
    ├── 00000000000000000050.snap
    └── <checkpoint_lsn>.snap

├── <vinyl_dir>
    └── 512 <!-- space_id
        ├── 0 <!-- primary key
        |    ├── 00000000000000000000.index
        |    ├── 00000000000000000000.run
        |    ├── 00000000000000000055.index
        |    ├── 00000000000000000055.run
        |    ├── <dump_lsn>.index
        |    └── <dump_lsn>.run
        ├── 1 <!-- secondary index
        |    ├── 00000000000000000000.index
        |    ├── 00000000000000000000.run
        |    ├── 00000000000000000032.index
        |    ├── 00000000000000000032.run
        |    ├── <dump_lsn>.index
        |    └── <dump_lsn>.run
  • .xlog - write-ahead log (common to all storage engines).
  • .snap - consistent snapshot of all tuples from all Memtx spaces.
  • .run - consistent snapshot of all tuples from a Vinyl range, like an SST in LevelDB terminology. Contains tuples ordered by the key definition and grouped into pages.
  • .index - contains the index of all pages in the corresponding .run file and general information about this run.
  • .xctl - physical journal of all operations on .run and .index files. .xlog, .snap, .index files will be stored in this journal in future versions of Tarantool.
  • .xctlsnap - consistent snapshot of the .xctl journal.
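
The naming scheme in the tree above can be sketched as follows. This is a minimal illustration, assuming that every file name is the LSN zero-padded to 20 decimal digits (as in the listing) and that Vinyl files live under `<vinyl_dir>/<space_id>/<index_id>/`. The helper names `lsn_file_name` and `vinyl_file_path` are hypothetical and are not part of Tarantool.

```python
import os


def lsn_file_name(lsn: int, ext: str) -> str:
    """Return '<lsn padded to 20 digits>.<ext>', matching the names in the tree."""
    if lsn < 0:
        raise ValueError("LSN must be non-negative")
    return f"{lsn:020d}.{ext}"


def vinyl_file_path(vinyl_dir: str, space_id: int, index_id: int,
                    dump_lsn: int, ext: str) -> str:
    """Build <vinyl_dir>/<space_id>/<index_id>/<dump_lsn>.<ext> (hypothetical helper)."""
    return os.path.join(vinyl_dir, str(space_id), str(index_id),
                        lsn_file_name(dump_lsn, ext))


# Example: the .run file of the primary key (index 0) of space 512 dumped at LSN 55.
print(vinyl_file_path("vinyl", 512, 0, 55, "run"))
```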

The .index file

Current format:

INDEX
0.13
Server: 39887eac-7447-4d74-bd54-485484b9887a
VClock: {}

<FIXHEADER>
<run_info>
<page_info>
...
<page_info>
<EOF>

Proposed format:

INDEX
0.13
Version: 1.7.4
Server: 39887eac-7447-4d74-bd54-485484b9887a

<FIXHEADER>
<run_info>
<page_info>
...
<page_info>
<EOF>

Changes:

  • Add Version: 1.7.4;
  • Remove VClock:;
  • Move <run_info> into a separate <FIXHEADER>.
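
The plain-text preamble shown above (file type, format version, then `Key: value` lines up to a blank line) could be read roughly like this. It is only a sketch: `parse_meta_header` is a hypothetical name, and the function assumes the caller has already decoded the bytes preceding the first `<FIXHEADER>` into a string.

```python
def parse_meta_header(text: str) -> dict:
    """Parse the text preamble that precedes the first <FIXHEADER>.

    Line 1 is the file type (e.g. INDEX), line 2 the format version
    (e.g. 0.13); the remaining 'Key: value' lines end at a blank line.
    """
    lines = text.split("\n")
    if len(lines) < 2:
        raise ValueError("header too short")
    header = {"filetype": lines[0], "format": lines[1], "fields": {}}
    for line in lines[2:]:
        if line == "":  # a blank line terminates the preamble
            break
        key, sep, value = line.partition(": ")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        header["fields"][key] = value
    return header


proposed = (
    "INDEX\n0.13\nVersion: 1.7.4\n"
    "Server: 39887eac-7447-4d74-bd54-485484b9887a\n\n"
)
print(parse_meta_header(proposed))
```

The same parser accepts both the current preamble (with `VClock: {}`) and the proposed one (with `Version: 1.7.4`), which makes the difference between the two easy to check in a test.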

run_info

run_info is an xrow which contains general information about a Vinyl run.

Current format:

  • xrow header: map
    • IPROTO_REQUEST_TYPE: unsigned = IPROTO_REPLACE
    • IPROTO_LSN: unsigned = run_info->min_lsn
  • xrow body: map
    • IPROTO_TUPLE: array
      • 0: map:
        • VY_RUN_MIN_LSN: unsigned = run_info->min_lsn
        • VY_RUN_MAX_LSN: unsigned = run_info->max_lsn
        • VY_RUN_PAGE_COUNT: unsigned = run_info->cou
        • VY_RUN_BLOOM: map
          • VY_RUN_BLOOM_TABLE_SIZE: unsigned
          • VY_RUN_BLOOM_HASH_COUNT: unsigned
          • VY_RUN_BLOOM_VERSION: unsigned
          • VY_RUN_BLOOM_TABLE: raw

Proposed format:

  • xrow header: map
    • IPROTO_REQUEST_TYPE: unsigned = VY_INDEX_RUN_INFO = 100
  • xrow body: map
    • VY_RUN_MIN_LSN = 1: unsigned = run_info->min_lsn
    • VY_RUN_MAX_LSN = 2: unsigned = run_info->max_lsn
    • VY_RUN_PAGE_COUNT = 3: unsigned = run_info->count
    • VY_RUN_BLOOM = 4: array
      • 0: unsigned = bloom->table_size
      • 1: unsigned = bloom->hash_count
      • 2: raw = raw bloom filter table in big-endian format

Changes:

  • Remove the "map in a map in an array in a map" overengineering;
  • Re-enumerate xrow body keys;
  • Convert VY_RUN_BLOOM into an array.
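
To make the flattened layout concrete, here is a sketch of the proposed run_info header and body as plain Python structures, before any MsgPack encoding. The key numbers 1 to 4 and the request type 100 are taken from the proposal above. The function name `build_run_info` is hypothetical, and the header key is kept as the string `"IPROTO_REQUEST_TYPE"` because this page does not give its numeric value.

```python
VY_INDEX_RUN_INFO = 100  # request type from the proposed format

# Body keys as enumerated in the proposed format.
VY_RUN_MIN_LSN = 1
VY_RUN_MAX_LSN = 2
VY_RUN_PAGE_COUNT = 3
VY_RUN_BLOOM = 4


def build_run_info(min_lsn: int, max_lsn: int, page_count: int,
                   bloom_table_size: int, bloom_hash_count: int,
                   bloom_table: bytes):
    """Return (header, body) for a run_info xrow in the proposed layout."""
    header = {"IPROTO_REQUEST_TYPE": VY_INDEX_RUN_INFO}
    body = {
        VY_RUN_MIN_LSN: min_lsn,
        VY_RUN_MAX_LSN: max_lsn,
        VY_RUN_PAGE_COUNT: page_count,
        # The bloom filter becomes a fixed-position array instead of a map.
        VY_RUN_BLOOM: [bloom_table_size, bloom_hash_count, bloom_table],
    }
    return header, body


header, body = build_run_info(1, 55, 3, 1024, 4, b"\x00" * 4)
print(header, body)
```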

page_info

page_info is an xrow which contains information about a page in the .run file.

Current format:

  • xrow header: map
    • IPROTO_REQUEST_TYPE: unsigned = IPROTO_REPLACE
  • xrow body: map
    • IPROTO_TUPLE: array
      • 0: unsigned = page_info->offset;
      • 1: unsigned = page_info->size;
      • 2: map:
        • VY_PAGE_REQUEST_COUNT: unsigned = page_info->request_count
        • VY_PAGE_MIN_KEY: array
        • VY_PAGE_DATA_SIZE: unsigned
        • VY_PAGE_ROW_INDEX_OFFSET: unsigned

Proposed format:

  • xrow header: map
    • IPROTO_REQUEST_TYPE: unsigned = VY_INDEX_PAGE_INFO = 101
  • xrow body: map
    • VY_PAGE_OFFSET: unsigned = page_info->offset;
    • VY_PAGE_SIZE: unsigned = page_info->size;
    • VY_PAGE_UNPACKED_SIZE: unsigned = page_info->unpacked_size;
    • VY_PAGE_REQUEST_COUNT: unsigned = page_info->request_count
    • VY_PAGE_MIN_KEY: array
    • VY_PAGE_DATA_SIZE: unsigned
    • VY_PAGE_INDEX_OFFSET: unsigned <!-- an offset to row index, see below

Changes:

  • Remove the "map in a map in an array in a map" overengineering;
  • Re-enumerate xrow body keys;
  • Rename VY_PAGE_ROW_INDEX_OFFSET to VY_PAGE_INDEX_OFFSET.
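
Since each page_info carries the page's minimal key and its offset in the .run file, a reader can locate the page that may hold a given key with a binary search. The sketch below illustrates the idea only. It assumes that the page_info records are already decoded into dicts keyed by the names above, that they are sorted by VY_PAGE_MIN_KEY, and that keys compare like Python tuples (the real comparison follows the key definition). `find_page` is a hypothetical helper.

```python
from bisect import bisect_right


def find_page(page_infos, key):
    """Return the page_info whose key range may contain `key`, or None.

    The candidate is the last page whose minimal key is <= key.
    """
    min_keys = [tuple(p["VY_PAGE_MIN_KEY"]) for p in page_infos]
    pos = bisect_right(min_keys, tuple(key)) - 1
    return page_infos[pos] if pos >= 0 else None


pages = [
    {"VY_PAGE_MIN_KEY": [1], "VY_PAGE_OFFSET": 0, "VY_PAGE_SIZE": 100},
    {"VY_PAGE_MIN_KEY": [10], "VY_PAGE_OFFSET": 100, "VY_PAGE_SIZE": 80},
    {"VY_PAGE_MIN_KEY": [20], "VY_PAGE_OFFSET": 180, "VY_PAGE_SIZE": 90},
]
page = find_page(pages, [15])
print(page["VY_PAGE_OFFSET"], page["VY_PAGE_SIZE"])
```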

The .run file

Current format:

RUN
0.13
Server: 39887eac-7447-4d74-bd54-485484b9887a
VClock: {}

<FIXHEADER> <!-- a page
<stmt>
..
<stmt>
<page_index>
...
<FIXHEADER>
<stmt>
..
<stmt>
<page_index>
<EOF>

Proposed format:

RUN
0.13
Version: 1.7.4
Server: 39887eac-7447-4d74-bd54-485484b9887a

<FIXHEADER> <!-- a page
<stmt>
..
<stmt>
<page_index>
...
<FIXHEADER>
<stmt>
..
<stmt>
<page_index>
<EOF>

Changes:

  • Add Version: 1.7.4;
  • Remove VClock: {}.

stmt

stmt is an xrow which contains a single database operation in a format similar to the WAL.

Current format:

  • xrow header: map
    • IPROTO_REQUEST_TYPE: unsigned = IPROTO_REPLACE|IPROTO_UPSERT|IPROTO_DELETE
    • IPROTO_LSN: stmt->lsn
  • xrow body: map
    • IPROTO_SPACE_ID: unsigned = key_def->space_id;
    • IPROTO_INDEX_ID: unsigned = key_def->id;
    • IPROTO_TUPLE: array -- REPLACE or UPSERT
    • IPROTO_KEY: array -- DELETE only
    • IPROTO_OPS: array -- UPSERT only

Proposed format:

  • xrow header: map
    • IPROTO_REQUEST_TYPE: unsigned = IPROTO_REPLACE|UPSERT|DELETE
    • IPROTO_LSN: stmt->lsn
  • xrow body: map
    • IPROTO_TUPLE: array -- for REPLACE or UPSERT
    • IPROTO_KEY: array -- for DELETE only
    • IPROTO_OPS: array -- for UPSERT only

Changes:

  • Remove IPROTO_SPACE_ID and IPROTO_INDEX_ID to save space.
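
The change can be pictured as a small transformation on a decoded stmt body: drop the two identifiers (they are already implied by the file's location under `<vinyl_dir>/<space_id>/<index_id>/`) and keep only the keys that belong to the request type. This is a sketch with hypothetical names. Keys and request types are represented by their symbolic names as strings rather than by their numeric IPROTO codes.

```python
DROPPED_KEYS = ("IPROTO_SPACE_ID", "IPROTO_INDEX_ID")

# Body keys allowed per request type, as listed in the proposed format.
ALLOWED_KEYS = {
    "IPROTO_REPLACE": {"IPROTO_TUPLE"},
    "IPROTO_UPSERT": {"IPROTO_TUPLE", "IPROTO_OPS"},
    "IPROTO_DELETE": {"IPROTO_KEY"},
}


def to_proposed_stmt(request_type: str, body: dict) -> dict:
    """Convert a current-format stmt body to the proposed one."""
    if request_type not in ALLOWED_KEYS:
        raise ValueError(f"unsupported request type: {request_type}")
    new_body = {k: v for k, v in body.items() if k not in DROPPED_KEYS}
    unexpected = set(new_body) - ALLOWED_KEYS[request_type]
    if unexpected:
        raise ValueError(f"unexpected keys for {request_type}: {sorted(unexpected)}")
    return new_body


current = {"IPROTO_SPACE_ID": 512, "IPROTO_INDEX_ID": 0, "IPROTO_KEY": [1]}
print(to_proposed_stmt("IPROTO_DELETE", current))
```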

page_index

page_index is an xrow which contains offsets for the current Vinyl page.

Current format:

  • xrow header: map
    • IPROTO_REQUEST_TYPE: unsigned = IPROTO_REPLACE
  • xrow body: map
    • IPROTO_TUPLE: array
      • 0: raw = row index in big endian

Proposed format:

  • xrow header: map
    • IPROTO_REQUEST_TYPE: unsigned = VY_RUN_PAGE_INDEX = 102
  • xrow body: raw = row index in big endian

Changes:

  • Remove the "raw in an array in a map" overengineering.
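
With the proposed format the xrow body is the raw row index itself, so decoding it reduces to unpacking a big-endian integer array. The sketch below assumes each entry is a 32-bit unsigned offset. The entry width is an assumption, since this page only states that the row index is stored in big endian. Both function names are hypothetical.

```python
import struct

ENTRY_SIZE = 4  # assumed width of one offset (uint32); not stated on this page


def encode_row_index(offsets) -> bytes:
    """Pack row offsets as consecutive big-endian uint32 values."""
    return struct.pack(f">{len(offsets)}I", *offsets)


def decode_row_index(raw: bytes) -> list:
    """Unpack a raw big-endian row index into a list of offsets."""
    if len(raw) % ENTRY_SIZE != 0:
        raise ValueError("row index length is not a multiple of the entry size")
    count = len(raw) // ENTRY_SIZE
    return list(struct.unpack(f">{count}I", raw))


raw = encode_row_index([0, 17, 42])
print(raw.hex(), decode_row_index(raw))
```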

The .xctl file

Current format:

VYMETA
0.13
Server: 39887eac-7447-4d74-bd54-485484b9887a

<FIXHEADER>
<xctl_request>
...
<FIXHEADER>
<xctl_request>
<EOF>

Proposed format:

XCTL
0.13
Version: 1.7.4
Server: 39887eac-7447-4d74-bd54-485484b9887a

<FIXHEADER>
<xctl_request>
...
<FIXHEADER>
<xctl_request>
<EOF>

Changes:

  • Add Version: 1.7.4 instead of v13;
  • Remove VClock: {};
  • Rename VYMETA to XCTL.

xctl_request

Current format:

  • xrow header: map
    • IPROTO_REQUEST_TYPE: unsigned = IPROTO_INSERT
  • xrow body: map
    • IPROTO_TUPLE: array
      • 0: unsigned = record->type;
      • 1: map:
        • VY_LOG_KEY_INDEX_ID: unsigned = record->index_id
        • VY_LOG_KEY_RANGE_ID: unsigned = record->range_id
        • VY_LOG_KEY_RUN_ID: unsigned = record->run_id
        • VY_LOG_KEY_RANGE_BEGIN: tuple
        • VY_LOG_KEY_RANGE_END: tuple

Proposed format:

  • xrow header: map
    • IPROTO_REQUEST_TYPE: unsigned = record->type
  • xrow body: map
    • VY_XCTL_PATH: string = record->path <!-- a relative path to the file
    • VY_XCTL_RUN_ID: unsigned = record->run_id
    • VY_XCTL_SPACE_ID: unsigned = record->space_id
    • VY_XCTL_INDEX_ID: unsigned = record->index_id
    • VY_XCTL_RANGE_ID: unsigned = record->range_id
    • VY_XCTL_RANGE_BEGIN: array
    • VY_XCTL_RANGE_END: array

Changes:

  • Remove the "map in an array in a map" overengineering;
  • Re-enumerate xrow body keys;
  • Rename VY_LOG_KEY to VY_XCTL.
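
The three changes can be sketched as one conversion of a decoded record: the record type moves from the tuple into the xrow header, the inner map becomes the body, and the VY_LOG_KEY prefix is replaced by VY_XCTL. The names are kept as strings because their numbers are not assigned yet, and `to_proposed_xctl` is a hypothetical helper. VY_XCTL_PATH and VY_XCTL_SPACE_ID have no counterpart in the current format, so the conversion does not produce them.

```python
RENAMES = {
    "VY_LOG_KEY_INDEX_ID": "VY_XCTL_INDEX_ID",
    "VY_LOG_KEY_RANGE_ID": "VY_XCTL_RANGE_ID",
    "VY_LOG_KEY_RUN_ID": "VY_XCTL_RUN_ID",
    "VY_LOG_KEY_RANGE_BEGIN": "VY_XCTL_RANGE_BEGIN",
    "VY_LOG_KEY_RANGE_END": "VY_XCTL_RANGE_END",
}


def to_proposed_xctl(current_body: dict):
    """Convert a current-format xctl_request body to (header, body)."""
    record_type, fields = current_body["IPROTO_TUPLE"]
    # The record type replaces IPROTO_INSERT as the request type.
    header = {"IPROTO_REQUEST_TYPE": record_type}
    # The inner map is hoisted to the top level with renamed keys.
    body = {RENAMES[key]: value for key, value in fields.items()}
    return header, body


current = {"IPROTO_TUPLE": [4, {"VY_LOG_KEY_INDEX_ID": 0,
                                "VY_LOG_KEY_RANGE_ID": 7,
                                "VY_LOG_KEY_RUN_ID": 12}]}
print(to_proposed_xctl(current))
```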

TODO

  • Assign numbers for all VY_XXX keys:
    • Can VY_XXX keys intersect with IPROTO_XXX keys?
  • Think how to use .mlog for Memtx and WAL;
  • Think how to re-design or remove page_index;
  • Patch xlog module to support RUN, INDEX, MLOG, MSNAP files;
  • Add a test case in the same way as xlog/upgrade.test.lua.
