-
Notifications
You must be signed in to change notification settings - Fork 2
Posting List Concepts
For simplicity and flexibility, we distinct two different types of inverted lists:
- document list,
- payload list.
A document list contains document IDs (implementation of the underlying structure may vary and is beyond the scope of this document). Typically, that means we can perform operations such as next or nextgeq (details below).
A payload list contains any data related to the documents in a particular document list. Examples are: document frequency list, BM25 score list, and query likelihood list. As opposed to a document list, a payload list does not support nexteq but must provide an align method that moves the pointer to where the document list points.
For many reasons (which might be written down another time), no inheritance is used. Instead, we rely on templates for compile-time polymorphism, and type erasure (if needed) for run-time polymorphism.
A document list view is an object containing information needed to traverse a document list. For example, such view might decode a list block boundaries to later decode one block at a time when needed. (However, it might as well decompress all data right away, this is an implementation detail.)
A document list defines the following typedefs.
| Type | Description |
|---|---|
value_type |
the type of values stored (usually compressed) in the list, i.e., the type of document ID, typically long
|
iterator_type |
the type of iterator (more below) |
A document list provides the following operations.
| Operation | Returns |
|---|---|
begin() |
an iterator_type pointing to the first document |
end() |
an iterator_type pointing to the position after the last document |
lookup(value_type doc) |
an iterator_type pointing to doc
|
A document iterator takes care of traversing, skipping, decoding blocks, etc. Since it is read-only, this is a const iterator.
All iterators define the following types:
| Type | Description |
|---|---|
value_type |
the type of values being iterated, typically long
|
reference |
the type of dereferenced value, e.g., const long&
|
We distinguish two concepts of iterators (more to come?).
Let i and j be instances of It, which implements the forward document iterator concept.
| Operation | Returns |
|---|---|
i != j |
bool value that is true when i and j are at different positions |
*i |
a reference to the current value |
++i |
an It& to i moved by one position |
i++ (see comment below) |
a copy of i before moving it by one position |
Comment regarding i++: If we agreed to use only ++i, i++ could become optional; not sure if we need it or not, but also such agreement could quickly get confused. One thing I know is that if one implements i++, it should follow the usual copy-before-increment semantic.
It implements all of the above, plus the following:
| Operation | Returns |
|---|---|
i.moveto(value_type doc) |
an It& to i after moving it to doc or first greater, or position after last element if no elements greater or equal to doc exist |
i.nextgeq(value_type doc) |
equivalent to copying j = i and moving j.moveto(doc)
|