Skip to content

Posting List Concepts

Michał Siedlaczek edited this page Jun 9, 2018 · 7 revisions

Introduction

For simplicity and flexibility, we distinct two different types of inverted lists:

  • document list,
  • payload list.

A document list contains document IDs (implementation of the underlying structure may vary and is beyond the scope of this document). Typically, that means we can perform operations such as next or nextgeq (details below).

A payload list contains any data related to the documents in a particular document list. Examples are: document frequency list, BM25 score list, and query likelihood list. As opposed to a document list, a payload list does not support nexteq but must provide an align method that moves the pointer to where the document list points.

Polymorphism

For many reasons (which might be written down another time), no inheritance is used. Instead, we rely on templates for compile-time polymorphism, and type erasure (if needed) for run-time polymorphism.

Document List

A document list view is an object containing information needed to traverse a document list. For example, such view might decode a list block boundaries to later decode one block at a time when needed. (However, it might as well decompress all data right away, this is an implementation detail.)

A document list defines the following typedefs.

Type Description
value_type the type of values stored (usually compressed) in the list, i.e., the type of document ID, typically long
iterator_type the type of iterator (more below)

A document list provides the following operations.

Operation Returns
begin() an iterator_type pointing to the first document
end() an iterator_type pointing to the position after the last document
lookup(value_type doc) an iterator_type pointing to doc

Document Iterators

A document iterator takes care of traversing, skipping, decoding blocks, etc. Since it is read-only, this is a const iterator.

All iterators define the following types:

Type Description
value_type the type of values being iterated, typically long
reference the type of dereferenced value, e.g., const long&

We distinguish two concepts of iterators (more to come?).

Forward Document Iterator

Let i and j be instances of It, which implements the forward document iterator concept.

Operation Returns
i != j bool value that is true when i and j are at different positions
*i a reference to the current value
++i an It& to i moved by one position
i++ (see comment below) a copy of i before moving it by one position

Comment regarding i++: If we agreed to use only ++i, i++ could become optional; not sure if we need it or not, but also such agreement could quickly get confused. One thing I know is that if one implements i++, it should follow the usual copy-before-increment semantic.

Skipping Document Iterator

It implements all of the above, plus the following:

Operation Returns
i.moveto(value_type doc) an It& to i after moving it to doc or first greater, or position after last element if no elements greater or equal to doc exist
i.nextgeq(value_type doc) equivalent to copying j = i and moving j.moveto(doc)

Clone this wiki locally