Commit 02b91cd

Author: Arseny Kositsyn
[PGPRO-12159] Added a description of rum_debug_funcs in README.md
Tags: rum
1 parent 14bfe58 commit 02b91cd

1 file changed: 128 additions, 0 deletions

README.md

@@ -306,6 +306,134 @@ For type: `anyarray`
This operator class stores `anyarray` elements with any field supported by the module.

## Functions for low-level inspection of RUM index pages

The RUM extension provides the following functions for low-level inspection of every type of RUM index page:

### `rum_metapage_info(rel_name text, blk_num int4) returns record`

`rum_metapage_info` returns the information stored on the metapage of a RUM index (block 0). For example:

```SQL
SELECT * FROM rum_metapage_info('rum_index', 0);
-[ RECORD 1 ]----+-----------
pending_head     | 4294967295
pending_tail     | 4294967295
tail_free_size   | 0
n_pending_pages  | 0
n_pending_tuples | 0
n_total_pages    | 87
n_entry_pages    | 80
n_data_pages     | 6
n_entries        | 1650
version          | 0xC0DE0002
```

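The metapage counters can be sanity-checked once the row has been fetched. The sketch below assumes the output of `rum_metapage_info` is already available as a Python dict (fetching it is out of scope), and `summarize_metapage` is a hypothetical helper name. It relies on two points: `4294967295` is PostgreSQL's `InvalidBlockNumber` (`0xFFFFFFFF`), so `pending_head` and `pending_tail` with that value mean the pending list is empty; and, assuming `n_total_pages` counts every page of the index as it does in GIN, the counters in the example above account for all pages (1 metapage + 0 pending + 80 entry + 6 data = 87).

```python
# PostgreSQL's InvalidBlockNumber (0xFFFFFFFF), printed as 4294967295.
INVALID_BLOCK_NUMBER = 0xFFFFFFFF


def summarize_metapage(meta: dict) -> dict:
    """Hypothetical helper: derive a few facts from a rum_metapage_info() row."""
    pending_empty = (
        meta["pending_head"] == INVALID_BLOCK_NUMBER
        and meta["pending_tail"] == INVALID_BLOCK_NUMBER
    )
    # Pages not covered by the metapage (block 0), the pending list, the entry
    # tree or the posting trees, e.g. deleted pages.
    # Assumption: n_total_pages counts every page of the index, as in GIN.
    unaccounted = meta["n_total_pages"] - (
        1 + meta["n_pending_pages"] + meta["n_entry_pages"] + meta["n_data_pages"]
    )
    return {"pending_list_empty": pending_empty, "unaccounted_pages": unaccounted}


# Values copied from the rum_metapage_info example above.
example = {
    "pending_head": 4294967295,
    "pending_tail": 4294967295,
    "tail_free_size": 0,
    "n_pending_pages": 0,
    "n_pending_tuples": 0,
    "n_total_pages": 87,
    "n_entry_pages": 80,
    "n_data_pages": 6,
    "n_entries": 1650,
}

print(summarize_metapage(example))
# {'pending_list_empty': True, 'unaccounted_pages': 0}
```
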
### `rum_page_opaque_info(rel_name text, blk_num int4) returns record`

`rum_page_opaque_info` returns the contents of the opaque area of a RUM index page: the `leftlink` and `rightlink` links to the neighboring pages at the same tree level, `maxoff` (the number of elements stored on the page; how it is used depends on the page type), `freespace` (the free space on the page), and the page `flags`.

For example:

```SQL
SELECT * FROM rum_page_opaque_info('rum_index', 10);
 leftlink | rightlink | maxoff | freespace | flags
----------+-----------+--------+-----------+--------
        6 |        11 |      0 |         0 | {leaf}
```

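Because every page records both `leftlink` and `rightlink`, all pages of one tree level can be visited by following `rightlink` until it reaches `InvalidBlockNumber` (`4294967295`). The Python sketch below models this walk. It assumes, as in GIN, that the rightmost page of a level has `rightlink` set to `InvalidBlockNumber`, and that a page's right neighbor points back to it through `leftlink`. `walk_right` is a hypothetical helper, and `opaque` is a hypothetical dict of already-fetched `rum_page_opaque_info` rows; apart from page 10, whose links (6 and 11) come from the example above, the chain is made up.

```python
# PostgreSQL's InvalidBlockNumber (0xFFFFFFFF), printed as 4294967295.
INVALID_BLOCK_NUMBER = 0xFFFFFFFF


def walk_right(opaque: dict, start: int) -> list:
    """Hypothetical helper: follow rightlinks from `start` to the end of a level.

    `opaque` maps a block number to a dict holding the `leftlink` and
    `rightlink` columns returned by rum_page_opaque_info for that block.
    """
    visited = []
    seen = set()
    blkno = start
    while blkno != INVALID_BLOCK_NUMBER:
        if blkno in seen:  # a corrupted chain could otherwise loop forever
            raise ValueError(f"rightlink cycle at block {blkno}")
        seen.add(blkno)
        visited.append(blkno)
        nxt = opaque[blkno]["rightlink"]
        # Assumption: the right neighbor's leftlink points back to this page.
        if nxt != INVALID_BLOCK_NUMBER and opaque[nxt]["leftlink"] != blkno:
            raise ValueError(f"leftlink of block {nxt} does not point to {blkno}")
        blkno = nxt
    return visited


# Made-up level of three pages; only page 10's links come from the example.
opaque = {
    6: {"leftlink": INVALID_BLOCK_NUMBER, "rightlink": 10},
    10: {"leftlink": 6, "rightlink": 11},
    11: {"leftlink": 10, "rightlink": INVALID_BLOCK_NUMBER},
}

print(walk_right(opaque, 6))  # [6, 10, 11]
```
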
### `rum_internal_entry_page_items(rel_name text, blk_num int4) returns setof record`

`rum_internal_entry_page_items` returns the items stored on an internal page of the entry tree (extracted from its `IndexTuple`s). For example:

```SQL
SELECT * FROM rum_internal_entry_page_items('rum_index', 1);
               key               | attrnum |     category     | down_link
---------------------------------+---------+------------------+-----------
 3d                              |       1 | RUM_CAT_NORM_KEY |         3
 6k                              |       1 | RUM_CAT_NORM_KEY |         2
 a8                              |       1 | RUM_CAT_NORM_KEY |         4
...
 Tue May 10 21:21:22.326724 2016 |       2 | RUM_CAT_NORM_KEY |        83
 Sat May 14 19:21:22.326724 2016 |       2 | RUM_CAT_NORM_KEY |        84
 Wed May 18 17:21:22.326724 2016 |       2 | RUM_CAT_NORM_KEY |        85
 +inf                            |         |                  |        86
(79 rows)
```

Like GIN, RUM stores each downward link on the internal pages of the entry tree together with a key, in pairs of the form `(P_n, K_{n+1})`, where `K_{n+1}` is the high key (the largest key) of the subtree that `P_n` points to. As a result, no key is stored as the lower bound of `P_0` (it is assumed to be `-inf`), and no downward link follows the last key `K_{n+1}`. Page 1 in the example above is the rightmost page at its level of the tree, so its last key is `+inf`: the last row pairs the downward link to block 86 with the key `+inf`.

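The pairing can be illustrated with a simplified model of how a search descends through an internal entry page: since each key is the high key of the linked subtree, the search follows the downward link of the first pair whose key is greater than or equal to the search key. The sketch below is an assumption-laden model, not RUM's code: it uses Python comparison instead of the operator class comparison function, compares `(attrnum, key)` so that entries of the first index column sort before those of the second (as in the output above), represents `+inf` as `None`, and scans linearly where a real implementation would use a binary search. `choose_downlink` is a hypothetical name, and the items are a subset of the rows shown above.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# One item of an internal entry page: (attrnum, key, down_link).
# attrnum and key are None for the +inf item on the rightmost page.
Item = Tuple[Optional[int], object, int]


def choose_downlink(items: List[Item], attrnum: int, key) -> int:
    """Hypothetical helper: return the child block that may contain (attrnum, key)."""
    for item_attrnum, item_key, down_link in items:
        if item_attrnum is None:  # +inf bounds the last subtree from above
            return down_link
        # The item key is the high key of the child: descend into the first
        # child whose high key is >= the search key (column first, then key).
        if (item_attrnum, item_key) >= (attrnum, key):
            return down_link
    # Without a +inf item the key lies beyond this page; a real search moves right.
    raise ValueError("search key is beyond this page's high key")


# A subset of the rows of page 1 from the example above.
items: List[Item] = [
    (1, "3d", 3),
    (1, "6k", 2),
    (1, "a8", 4),
    (2, datetime(2016, 5, 10, 21, 21, 22, 326724), 83),
    (2, datetime(2016, 5, 14, 19, 21, 22, 326724), 84),
    (2, datetime(2016, 5, 18, 17, 21, 22, 326724), 85),
    (None, None, 86),  # +inf
]

print(choose_downlink(items, 1, "5a"))                   # 2
print(choose_downlink(items, 2, datetime(2016, 5, 12)))  # 84
print(choose_downlink(items, 2, datetime(2016, 6, 1)))   # 86
```
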
### `rum_leaf_entry_page_items(rel_name text, blk_num int4) returns setof record`

`rum_leaf_entry_page_items` returns the items stored on a leaf page of the entry tree (extracted from the compressed posting lists). For example:

```SQL
SELECT * FROM rum_leaf_entry_page_items('rum_index', 10);
 key | attrnum |     category     | tuple_id | add_info_is_null | add_info | is_postring_tree | postring_tree_root
-----+---------+------------------+----------+------------------+----------+------------------+--------------------
 ay  |       1 | RUM_CAT_NORM_KEY | (0,16)   | t                |          | f                |
 ay  |       1 | RUM_CAT_NORM_KEY | (0,23)   | t                |          | f                |
 ay  |       1 | RUM_CAT_NORM_KEY | (2,1)    | t                |          | f                |
...
 az  |       1 | RUM_CAT_NORM_KEY | (0,15)   | t                |          | f                |
 az  |       1 | RUM_CAT_NORM_KEY | (0,22)   | t                |          | f                |
 az  |       1 | RUM_CAT_NORM_KEY | (1,4)    | t                |          | f                |
...
 b9  |       1 | RUM_CAT_NORM_KEY |          |                  |          | t                |                  7
...
(1602 rows)
```

Each posting list is stored in an `IndexTuple` that holds the key value and a compressed list of `tid`s. For convenience, `rum_leaf_entry_page_items` repeats the key value on every `tid` row, but on the page the key is stored only once.

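The relationship between the on-page representation and the function output can be modeled as follows: one tuple holds the key once plus a list of `tid`s, and one output row is produced per `tid`, repeating the key. This is a simplified sketch, not RUM's code: `EntryTuple` and `expand` are hypothetical names, `tid`s are `(block, offset)` pairs already decoded from the compressed list, the additional information columns are omitted, and the values come from the `ay` rows above (whose full posting list is elided there).

```python
from dataclasses import dataclass
from typing import List, Tuple

Tid = Tuple[int, int]  # (block number, offset number)


@dataclass
class EntryTuple:
    """Hypothetical, simplified entry-tree leaf IndexTuple with a posting list."""
    attrnum: int
    category: str
    key: str
    tids: List[Tid]  # compressed on the page; shown here already decoded


def expand(tup: EntryTuple) -> List[tuple]:
    """Emit one row per tid, repeating the key that the page stores only once."""
    return [(tup.key, tup.attrnum, tup.category, tid) for tid in tup.tids]


ay = EntryTuple(1, "RUM_CAT_NORM_KEY", "ay", [(0, 16), (0, 23), (2, 1)])
for row in expand(ay):
    print(row)
```
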
If a key has too many `tid`s to fit in a posting list, they are stored in a posting tree instead, that is, in a separate tree whose keys are `tid`s. In the example above, the key `b9` has a posting tree: instead of a posting list, its `IndexTuple` stores a magic number and the block number of the posting tree root (block 7).

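When a row has `is_postring_tree` set to `t`, the `postring_tree_root` column gives the block number to inspect next, for example with `rum_internal_data_page_items` (or `rum_leaf_data_page_items`, if the posting tree is small enough that its root is also a leaf). The sketch below collects these roots from rows of `rum_leaf_entry_page_items` that are assumed to have been fetched into Python dicts; `posting_tree_roots` is a hypothetical helper, and it keys the result by `(attrnum, key)` because a multi-column index may contain the same key value for different columns. For `b9` it yields block 7.

```python
from typing import Dict, List, Tuple


def posting_tree_roots(rows: List[dict]) -> Dict[Tuple[int, str], int]:
    """Hypothetical helper: map (attrnum, key) to the root block of its posting tree."""
    roots = {}
    for row in rows:
        if row["is_postring_tree"]:  # column name as printed by the function
            roots[(row["attrnum"], row["key"])] = row["postring_tree_root"]
    return roots


# A few rows from the rum_leaf_entry_page_items example (None stands for NULL).
rows = [
    {"key": "ay", "attrnum": 1, "tuple_id": (0, 16),
     "is_postring_tree": False, "postring_tree_root": None},
    {"key": "az", "attrnum": 1, "tuple_id": (1, 4),
     "is_postring_tree": False, "postring_tree_root": None},
    {"key": "b9", "attrnum": 1, "tuple_id": None,
     "is_postring_tree": True, "postring_tree_root": 7},
]

print(posting_tree_roots(rows))  # {(1, 'b9'): 7}
```
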
### `rum_internal_data_page_items(rel_name text, blk_num int4) returns setof record`

`rum_internal_data_page_items` returns the items stored on an internal page of a posting tree (extracted from its array of `PostingItem` structures). For example:

```SQL
SELECT * FROM rum_internal_data_page_items('rum_index', 7);
 is_high_key | block_number | tuple_id | add_info_is_null | add_info
-------------+--------------+----------+------------------+----------
 t           |              | (0,0)    | t                |
 f           |            9 | (138,79) | t                |
 f           |            8 | (0,0)    | t                |
(3 rows)
```

Each item on an internal page of a posting tree contains the high key (a `tid`) of a child page and a link to that child page, as well as additional information if the index stores it.

The first item on an internal page of a posting tree is always the high key of the page itself (`is_high_key` is `t`). A high key of `(0,0)` is equivalent to `+inf`; this is always the case for the rightmost page of a level.

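Descending through an internal posting tree page works as in the entry tree, except that the keys are `tid`s. The sketch below skips the page's own high key, treats `(0,0)` as `+inf` (offset numbers start at 1, so `(0,0)` never identifies a real heap tuple), and picks the first child whose high key is greater than or equal to the searched `tid`. `tid_le` and `choose_child` are hypothetical names, and the items are the three rows of page 7 from the example above. In that example, the last item (pointing to block 8) also carries `(0,0)`; the sketch assumes this likewise stands for `+inf`, since block 8 is the rightmost child.

```python
from typing import List, Optional, Tuple

Tid = Tuple[int, int]  # (block number, offset number)
PLUS_INF: Tid = (0, 0)  # a high key of (0,0) means +inf


def tid_le(a: Tid, b: Tid) -> bool:
    """a <= b, where (0,0) compares greater than every real tid."""
    if b == PLUS_INF:
        return True
    if a == PLUS_INF:
        return False
    return a <= b  # block number first, then offset number


def choose_child(items: List[Tuple[bool, Optional[int], Tid]], target: Tid) -> int:
    """Hypothetical helper; items are (is_high_key, block_number, tuple_id) rows."""
    for is_high_key, block_number, high_key in items:
        if is_high_key:  # the page's own high key, not a downward link
            continue
        if tid_le(target, high_key):
            return block_number
    raise ValueError("target is beyond this page's high key; move right")


page7 = [
    (True, None, (0, 0)),
    (False, 9, (138, 79)),
    (False, 8, (0, 0)),
]

print(choose_child(page7, (3, 5)))     # 9
print(choose_child(page7, (138, 79)))  # 9
print(choose_child(page7, (200, 1)))   # 8
```
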
Currently, RUM does not support storing additional information of pass-by-reference data types on the internal pages of a posting tree, so output like the following is possible:

```SQL
 is_high_key | block_number | tuple_id | add_info_is_null |                    add_info
-------------+--------------+----------+------------------+------------------------------------------------
...
 f           |           23 | (39,43)  | f                | varlena types in posting tree is not supported
 f           |           22 | (74,9)   | f                | varlena types in posting tree is not supported
...
```

### `rum_leaf_data_page_items(rel_name text, blk_num int4) returns setof record`

`rum_leaf_data_page_items` returns the items stored on a leaf page of a posting tree (extracted from the compressed posting lists). For example:

```SQL
SELECT * FROM rum_leaf_data_page_items('rum_index', 9);
 is_high_key | tuple_id  | add_info_is_null | add_info
-------------+-----------+------------------+----------
 t           | (138,79)  | t                |
 f           | (0,9)     | t                |
 f           | (1,23)    | t                |
 f           | (3,5)     | t                |
 f           | (3,22)    | t                |
```

Unlike entry tree leaf pages, posting tree leaf pages do not store compressed posting lists inside an `IndexTuple`. The high key (the row with `is_high_key` set to `t`) is the largest key that the page may contain.

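The high key also ties a leaf page to its parent: in the examples above, the high key of leaf page 9, `(138,79)`, is the same `tid` that internal page 7 stores next to its downward link to block 9. The sketch below checks both properties for one leaf: its `tid`s are sorted and do not exceed its high key, and the parent's key for it matches that high key. `check_leaf` is a hypothetical helper, `tid`s are `(block, offset)` pairs, and the rows are copied from the two examples. It assumes the posting tree is ordered by `tid` (indexes that order posting trees by additional information are out of scope) and does not handle the `(0,0)` = `+inf` high key, so it applies only to non-rightmost leaves such as page 9.

```python
from typing import List, Tuple

Tid = Tuple[int, int]  # (block number, offset number)


def check_leaf(high_key: Tid, tids: List[Tid], parent_key: Tid) -> List[str]:
    """Hypothetical helper: list problems found on a non-rightmost posting tree leaf."""
    problems = []
    if any(a >= b for a, b in zip(tids, tids[1:])):
        problems.append("tids are not strictly increasing")
    if tids and tids[-1] > high_key:
        problems.append("a tid exceeds the page high key")
    if parent_key != high_key:
        problems.append("parent key differs from the page high key")
    return problems


# Page 9 from the rum_leaf_data_page_items example (rows after the high key),
# and the key stored for block 9 on internal page 7.
leaf9_tids = [(0, 9), (1, 23), (3, 5), (3, 22)]
print(check_leaf((138, 79), leaf9_tids, parent_key=(138, 79)))  # []
```
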
## Todo

- Allow multiple additional information (lexemes positions + timestamp).
