| 1 | +// Package indexes provides the index management subsystem for Chotki. |
| 2 | +// |
| 3 | +// # Overview |
| 4 | +// |
| 5 | +// IndexManager keeps two kinds of indexes for class data: |
| 6 | +// |
| 7 | +// 1. Fullscan index (implicit per class) |
| 8 | +// A chronological list of all object IDs that belong to a class. It is |
| 9 | +// optimized for scanning a whole class. It supports no range or value queries; a full scan is O(n). |
| 10 | +// |
| 11 | +// 2. Hashtable index (opt-in per field) |
| 12 | +// A hash map from a field value to the object ID that holds it. Lookups are |
| 13 | +// O(1) on average. Only one object per class+field value is allowed; |
| 14 | +// inserting a duplicate returns ErrHashIndexUinqueConstraintViolation. |
| 15 | +// |
| 16 | +// # Key layout in Pebble |
| 17 | +// |
| 18 | +// All keys start with 'I'. |
| 19 | +// |
| 20 | +// - Fullscan index: "IF" + class_id + object_id + 'T' -> empty value |
| 21 | +// |
| 22 | +// - Hashtable index: "IH" + class_id + field_id(u32, BE) + hash(u64, BE) + |
| 23 | +// 'E' -> TLV-encoded set of object RIDs. We still enforce uniqueness, so at |
| 24 | +// most one RID remains for a value. |
| 25 | +// |
| 26 | +// - Reindex tasks: "IT" + class_id + 'M' -> TLV-encoded map |
| 27 | +// Keys are field indices (u32). Values store task state and last update. |
| 28 | +// |
| 29 | +// # Integration with writes |
| 30 | +// |
| 31 | +// IndexManager runs on the write path. Index updates are written in the same |
| 32 | +// Pebble batch as the object change. A write either commits object+index |
| 33 | +// together or not at all. This keeps data and index consistent, even on |
| 34 | +// crashes. Changes can arrive in two ways: |
| 35 | +// |
| 36 | +// Realtime synchronization (live) |
| 37 | +// |
| 38 | +// - Object create/update events arrive in order on an already |
| 39 | +// initialized replica. |
| 40 | +// - AddFullScanIndex appends a class membership entry for a new object. |
| 41 | +// - OnFieldUpdate checks if a field is indexed. If yes, it hashes the FIRST |
| 42 | +// payload and merges the mapping into the hashtable index. Both changes are |
| 43 | +// in the same batch, so the index matches the object data. |
| 44 | +// |
| 45 | +// Diff synchronization (bootstrap/catch-up) |
| 46 | +// |
| 47 | +// - During bootstrap or a large diff, classes and objects may arrive in the |
| 48 | +// same window. |
| 49 | +// - Early in sync, ClassFields may not be readable yet. Then OnFieldUpdate |
| 50 | +// cannot tell if a field is indexed, so it cannot update the hashtable |
| 51 | +// safely. |
| 52 | +// - In that case we enqueue a reindex task for class+field (in the same |
| 53 | +// batch as the object write) and skip the hashtable write. Fullscan entries |
| 54 | +// are still added for new objects. |
| 55 | +// - After sync completes, CheckReindexTasks runs the task. It scans objects |
| 56 | +// via fullscan and rebuilds the hashtable. Indexes catch up without ever |
| 57 | +// writing partial entries. |
| 58 | +// |
| 59 | +// # Consistency guarantee |
| 60 | +// |
| 61 | +// - Realtime: object data and indexes commit in one batch. |
| 62 | +// - Diff sync: if inline indexing is not possible, we only enqueue a reindex |
| 63 | +// task (in the same batch) and defer index writes. The background reindex |
| 64 | +// rebuilds from a snapshot. |
| 65 | +// |
| 66 | +// The index never contradicts committed object data. |
| 67 | +// |
| 68 | +// # Query helpers |
| 69 | +// |
| 70 | +// - SeekClass iterates object IDs that belong to a class using the fullscan |
| 71 | +// index. |
| 72 | +// |
| 73 | +// - GetByHash resolves a class+field+value (FIRST payload) to the object ID |
| 74 | +// using the hashtable index. A small in-memory LRU cache accelerates |
| 75 | +// repeat lookups. |
| 76 | +// |
| 77 | +// # Reindexing lifecycle |
| 78 | +// |
| 79 | +// Index definitions are part of class definitions. When a class is created or |
| 80 | +// updated (e.g., an index is added or removed for a field), HandleClassUpdate |
| 81 | +// emits reindex tasks that are persisted under the task key. A background |
| 82 | +// scanner (CheckReindexTasks) monitors tasks and runs them via runReindexTask. |
| 83 | +// |
| 84 | +// Task states are stored as bytes and surfaced via Prometheus metrics. The |
| 85 | +// lifecycle is: |
| 86 | +// |
| 87 | +// - Pending: task is scheduled and will be picked up |
| 88 | +// - InProgress: task is running |
| 89 | +// - Done: task finished successfully |
| 90 | +// - Remove: index was deleted; task stays in this state and is ignored |
| 91 | +// |
| 92 | +// A reindex pass operates on a consistent snapshot and performs two phases: |
| 93 | +// |
| 94 | +// 1. Repair missing entries: for every object in the class (via fullscan), |
| 95 | +// compute the hash for the field and ensure the corresponding hashtable |
| 96 | +// entry exists. |
| 97 | +// |
| 98 | +// 2. Remove stale entries: scan the index keys for the class+field and |
| 99 | +// drop entries that point to non-existent objects, to non-FIRST values, or |
| 100 | +// to values whose hash no longer matches the object field. |
| 101 | +// |
| 102 | +// When an index is removed from a field definition, the manager deletes the |
| 103 | +// corresponding IH range and then sets the task to Remove. We keep the task |
| 104 | +// (do not delete it) and simply ignore it later. "Done" tasks may be |
| 105 | +// periodically rescheduled for self-healing; "Remove" tasks are not. |
| 106 | +// |
| 107 | +// # Caching and concurrency |
| 108 | +// |
| 109 | +// The manager keeps small caches: |
| 110 | +// - classCache: object_id -> class_id |
| 111 | +// - hashIndexCache: (class_id, field_id, value) -> object_id |
| 112 | +// |
| 113 | +// Writes to the same class field index are serialized with a per-field mutex. |
| 114 | +// |
| 115 | +// # Metrics |
| 116 | +// |
| 117 | +// Prometheus metrics report task counts, states, durations, and results. |
| 118 | +package indexes |
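
The key layout described in the package comment can be sketched as a few small builder functions. This is only an illustration under stated assumptions: the function names (`fullScanKey`, `hashIndexKey`, `taskKey`) are hypothetical, and class and object IDs are treated as opaque byte slices because the comment does not say how they are encoded.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// fullScanKey builds "IF" + class_id + object_id + 'T'.
// Both IDs are opaque bytes here; their real encoding is an assumption.
func fullScanKey(classID, objectID []byte) []byte {
	key := make([]byte, 0, 2+len(classID)+len(objectID)+1)
	key = append(key, 'I', 'F')
	key = append(key, classID...)
	key = append(key, objectID...)
	return append(key, 'T')
}

// hashIndexKey builds "IH" + class_id + field_id(u32, BE) + hash(u64, BE) + 'E'.
func hashIndexKey(classID []byte, fieldID uint32, hash uint64) []byte {
	var f [4]byte
	var h [8]byte
	binary.BigEndian.PutUint32(f[:], fieldID)
	binary.BigEndian.PutUint64(h[:], hash)

	key := make([]byte, 0, 2+len(classID)+len(f)+len(h)+1)
	key = append(key, 'I', 'H')
	key = append(key, classID...)
	key = append(key, f[:]...)
	key = append(key, h[:]...)
	return append(key, 'E')
}

// taskKey builds "IT" + class_id + 'M'.
func taskKey(classID []byte) []byte {
	key := make([]byte, 0, 2+len(classID)+1)
	key = append(key, 'I', 'T')
	key = append(key, classID...)
	return append(key, 'M')
}

func main() {
	class := []byte("c") // placeholder class ID, not a real Chotki ID
	fmt.Printf("%q\n", fullScanKey(class, []byte("o")))
	fmt.Printf("%q\n", hashIndexKey(class, 1, 2))
	fmt.Printf("%q\n", taskKey(class))
}
```

Because the field ID and hash are big-endian, all hashtable keys for one class and field share a common byte prefix, which is what would make deleting "the corresponding IH range" a single prefix-range operation.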
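
The two-phase reindex pass from the "Reindexing lifecycle" section can be modeled in memory as follows. This is a simplified sketch, not the package's implementation: plain Go maps stand in for the Pebble snapshot and the IH keys, `reindex` and `hashValue` are hypothetical names, and FNV-1a is a stand-in because the comment does not name the hash function.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashValue is a stand-in hash (FNV-1a); the real hash function is an assumption.
func hashValue(v string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(v))
	return h.Sum64()
}

// reindex repairs an in-memory hash index against a snapshot of one class.
// objects maps object ID -> current field value (the fullscan view);
// index maps value hash -> object ID (the hashtable view).
func reindex(objects map[string]string, index map[uint64]string) (added, removed int) {
	// Phase 1: repair missing entries. An existing entry is kept only if
	// the object it points to still exists and still hashes to this key.
	for id, v := range objects {
		h := hashValue(v)
		if cur, ok := index[h]; ok {
			if cv, exists := objects[cur]; exists && hashValue(cv) == h {
				continue // entry is already valid
			}
		}
		index[h] = id
		added++
	}

	// Phase 2: remove stale entries that point to missing objects or to
	// objects whose field value no longer hashes to the entry's key.
	for h, id := range index {
		v, ok := objects[id]
		if !ok || hashValue(v) != h {
			delete(index, h) // deleting while ranging over a map is safe in Go
			removed++
		}
	}
	return added, removed
}

func main() {
	objects := map[string]string{"o1": "alice", "o2": "bob"}
	index := map[uint64]string{
		hashValue("alice"): "o1", // valid entry
		hashValue("carol"): "o3", // stale: object o3 does not exist
	}
	added, removed := reindex(objects, index)
	fmt.Println("added:", added, "removed:", removed, "entries:", len(index))
}
```

The sketch ignores uniqueness conflicts (two live objects with the same value), TLV encoding, and batching; it only shows why both phases are needed to converge on an index that matches the object data.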