
Commit 2c1bcd4

Merge pull request #21 from triblespace/e4veap-codex/remove-unnecessary-features-for-rewrite
chore: drop docstore module
2 parents: 8588e53 + 9c6ac0e

83 files changed: +266 additions, -5142 deletions


ARCHITECTURE.md

Lines changed: 0 additions & 24 deletions
```diff
@@ -106,7 +106,6 @@ The schema defines all of the fields that the indexes [`Document`](src/schema/do
 
 Depending on the type of the field, you can decide to
 
-- put it in the docstore
 - store it as a fast field
 - index it
 
@@ -135,29 +134,6 @@ This conversion is done by the serializer.
 Finally, the reader is in charge of offering an API to read on this on-disk read-only representation.
 In tantivy, readers are designed to require very little anonymous memory. The data is read straight from an mmapped file, and loading an index is as fast as mmapping its files.
 
-## [store/](src/store): Here is my DocId, Gimme my document
-
-The docstore is a row-oriented storage that, for each document, stores a subset of the fields
-that are marked as stored in the schema. The docstore is compressed using a general-purpose algorithm
-like LZ4.
-
-**Useful for**
-
-In search engines, it is often used to display search results.
-Once the top 10 documents have been identified, we fetch them from the store, and display them or their snippet on the search result page (aka SERP).
-
-**Not useful for**
-
-Fetching a document from the store is typically a "slow" operation. It usually consists in
-
-- searching into a compact tree-like data structure to find the position of the right block.
-- decompressing a small block
-- returning the document from this block.
-
-It is NOT meant to be called for every document matching a query.
-
-As a rule of thumb, if you hit the docstore more than 100 times per search query, you are probably misusing tantivy.
-
 ## [fastfield/](src/fastfield): Here is my DocId, Gimme my value
 
 Fast fields are stored in a column-oriented storage that allows for random access.
```
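With the docstore section removed, fast fields are the remaining "DocId in, value out" path in ARCHITECTURE.md. A toy illustration of the column contract that section promises; this is a minimal sketch, not Yeehaw's actual fastfield API, whose columns are bit-packed and compressed:

```rust
// Toy fast-field column: one densely packed slot per DocId, so a lookup
// is a plain array index with no block search or decompression step,
// unlike the removed docstore. Illustrative only.
struct Column {
    values: Vec<u64>,
}

impl Column {
    fn get(&self, doc_id: u32) -> u64 {
        self.values[doc_id as usize]
    }
}

fn main() {
    let prices = Column {
        values: vec![995, 1250, 430],
    };
    assert_eq!(prices.get(1), 1250); // DocId 1 -> its value, O(1)
}
```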

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
```diff
@@ -12,8 +12,13 @@ have been removed to keep the changelog focused on Yeehaw's history.
 - update examples to import the `yeehaw` crate instead of `tantivy`.
 - run preflight tests without enabling the `unstable` feature.
 - handle unknown column codes gracefully in `ColumnarReader::iter_columns`.
+- rewrite doctests and examples to import the `yeehaw` crate directly.
 
 ## Features/Improvements
+- drop docstore module and references in preparation for trible.space rewrite.
+- purge remaining docstore references from core modules and tests.
+- remove docstore-dependent code from examples.
+- drop binary document serializer/deserializer now that docstore is gone.
 - remove `quickwit` feature flag and related async code.
 - add docs/example and Vec<u32> values to sstable [#2660](https://github.com/quickwit-oss/yeehaw/pull/2660)(@PSeitz)
 - Add string fast field support to `TopDocs`. [#2642](https://github.com/quickwit-oss/yeehaw/pull/2642)(@stuhood)
@@ -32,3 +37,4 @@ have been removed to keep the changelog focused on Yeehaw's history.
 - expand documentation for document deserialization traits.
 - reorder inventory tasks to prioritize fixing doctest regressions.
 - remove `quickwit` feature and associated asynchronous APIs.
+- remove obsolete document type codes.
```

INVENTORY.md

Lines changed: 6 additions & 8 deletions
```diff
@@ -20,23 +20,19 @@ This document outlines the long term plan to rewrite this project so that it rel
    - Replace the `Directory` abstraction with a backend that reads and writes blobs via the Trible Space `BlobStore`.
    - Index writers and readers operate on blob handles instead of filesystem paths.
 
-3. **Drop the docstore module**
-   - Primary documents are kept in Trible Space; segments no longer store their own row oriented docstore.
-   - Search results fetch documents via blob handles.
-
-4. **Remove `Opstamp` and use commit handles**
+3. **Remove `Opstamp` and use commit handles**
    - Commits record the segments they include.
    - Merges rely on commit ancestry instead of monotonic operation stamps.
 
-5. **Introduce 128-bit IDs with `Universe` mapping**
+4. **Introduce 128-bit IDs with `Universe` mapping**
    - Map external `u128` identifiers to compact `DocId` values.
    - Persist the mapping so search results can translate back.
 
-6. **Typed DSL for fuzzy search**
+5. **Typed DSL for fuzzy search**
    - Generate search filters from Trible namespaces.
    - Provide macros that participate in both `find!` queries and full text search.
 
-7. **Index update merge workflow**
+6. **Index update merge workflow**
    - Wrap indexing operations in workspace commits.
    - Use Trible's compare-and-swap push mechanism so multiple writers merge gracefully.
 
@@ -59,3 +55,5 @@ This inventory captures the direction of the rewrite and the major tasks require
    - Migrate inline benchmarks to a stable harness so the `unstable` feature can be tested on stable Rust.
 15. **Evaluate removing sstable term dictionary and crate now that `quickwit` feature is gone**
    - Determine whether the `sstable` crate should remain in the workspace or be extracted.
+16. **Prune obsolete document type codes** *(done)*
+   - Removed unused `type_codes` constants after dropping docstore serialization.
```
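Item 4's `Universe` mapping is only named in the inventory. A hypothetical sketch of the intended shape, with illustrative names and an in-memory representation standing in for whatever ultimately gets persisted:

```rust
use std::collections::HashMap;

// Hypothetical `Universe`: interns external 128-bit identifiers to dense
// `DocId`s and keeps a reverse table so search results can translate back.
#[derive(Default)]
struct Universe {
    forward: HashMap<u128, u32>, // external id -> compact DocId
    reverse: Vec<u128>,          // DocId -> external id
}

impl Universe {
    fn intern(&mut self, external: u128) -> u32 {
        if let Some(&doc_id) = self.forward.get(&external) {
            return doc_id;
        }
        let doc_id = self.reverse.len() as u32;
        self.forward.insert(external, doc_id);
        self.reverse.push(external);
        doc_id
    }

    fn resolve(&self, doc_id: u32) -> Option<u128> {
        self.reverse.get(doc_id as usize).copied()
    }
}

fn main() {
    let mut universe = Universe::default();
    let doc_id = universe.intern(0xDEAD_BEEF_u128);
    assert_eq!(universe.resolve(doc_id), Some(0xDEAD_BEEF_u128));
}
```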

columnar/columnar-cli-inspect/Cargo.toml

Lines changed: 1 addition & 1 deletion
```diff
@@ -5,7 +5,7 @@ edition = "2021"
 license = "MIT"
 
 [dependencies]
-tantivy = {path="../..", package="tantivy"}
+yeehaw = {path="../.."}
 columnar = {path="../", package="tantivy-columnar"}
 common = {path="../../common", package="tantivy-common"}
```

columnar/columnar-cli-inspect/src/main.rs

Lines changed: 1 addition & 1 deletion
```diff
@@ -2,7 +2,7 @@ use columnar::ColumnarReader;
 use common::file_slice::{FileSlice, WrapFile};
 use std::io;
 use std::path::Path;
-use tantivy::directory::footer::Footer;
+use yeehaw::directory::footer::Footer;
 
 fn main() -> io::Result<()> {
     println!("Opens a columnar file written by tantivy and validates it.");
```

examples/aggregation.rs

Lines changed: 1 addition & 2 deletions
```diff
@@ -37,8 +37,7 @@ fn main() -> yeehaw::Result<()> {
             .set_index_option(IndexRecordOption::WithFreqs)
             .set_tokenizer("raw"),
         )
-        .set_fast(None)
-        .set_stored();
+        .set_fast(None);
     schema_builder.add_text_field("category", text_fieldtype);
     schema_builder.add_f64_field("stock", FAST);
     schema_builder.add_f64_field("price", FAST);
```

examples/basic_search.rs

Lines changed: 3 additions & 29 deletions
```diff
@@ -8,7 +8,6 @@
 // - create an index in a directory
 // - index a few documents into our index
 // - search for the best document matching a basic query
-// - retrieve the best document's original content.
 
 // ---
 // Importing yeehaw...
@@ -33,28 +32,10 @@ fn main() -> yeehaw::Result<()> {
     // First we need to define a schema ...
     let mut schema_builder = Schema::builder();
 
-    // Our first field is title.
-    // We want full-text search for it, and we also want
-    // to be able to retrieve the document after the search.
-    //
-    // `TEXT | STORED` is some syntactic sugar to describe
-    // that.
-    //
-    // `TEXT` means the field should be tokenized and indexed,
-    // along with its term frequency and term positions.
-    //
-    // `STORED` means that the field will also be saved
-    // in a compressed, row-oriented key-value store.
-    // This store is useful for reconstructing the
-    // documents that were selected during the search phase.
-    schema_builder.add_text_field("title", TEXT | STORED);
+    // Our first field is title. We want full-text search for it.
+    schema_builder.add_text_field("title", TEXT);
 
     // Our second field is body.
-    // We want full-text search for it, but we do not
-    // need to be able to retrieve it
-    // for our application.
-    //
-    // We can make our index lighter by omitting the `STORED` flag.
     schema_builder.add_text_field("body", TEXT);
 
     let schema = schema_builder.build();
@@ -210,15 +191,8 @@ fn main() -> yeehaw::Result<()> {
     // We can now perform our query.
     let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;
 
-    // The actual documents still need to be
-    // retrieved from Yeehaw's store.
-    //
-    // Since the body field was not configured as stored,
-    // the document returned will only contain
-    // a title.
     for (_score, doc_address) in top_docs {
-        let retrieved_doc: TantivyDocument = searcher.doc(doc_address)?;
-        println!("{}", retrieved_doc.to_json(&schema));
+        println!("{doc_address:?}");
     }
 
     // We can also get an explanation to understand
```
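The reworked example stops at printing the `DocAddress`. Under the rewrite plan, callers that need the original content resolve hits against an external primary store (per INVENTORY.md, eventually Trible Space blob handles). A hypothetical sketch of that pattern; `ExternalStore`, `fetch`, and the key type are illustrative names, not Yeehaw types:

```rust
use std::collections::HashMap;

// Hypothetical external primary store: the index returns only addresses
// or ids, and the document bodies live outside the index entirely.
struct ExternalStore {
    docs: HashMap<u64, String>, // external doc key -> original document
}

impl ExternalStore {
    fn fetch(&self, key: u64) -> Option<&str> {
        self.docs.get(&key).map(String::as_str)
    }
}

fn main() {
    let mut docs = HashMap::new();
    docs.insert(42, "The Old Man and the Sea".to_string());
    let store = ExternalStore { docs };
    // After a search, translate the hit's key and fetch the document body.
    assert_eq!(store.fetch(42), Some("The Old Man and the Sea"));
}
```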

examples/custom_tokenizer.rs

Lines changed: 2 additions & 5 deletions
```diff
@@ -26,9 +26,7 @@ fn main() -> yeehaw::Result<()> {
     let text_field_indexing = TextFieldIndexing::default()
         .set_tokenizer("ngram3")
         .set_index_option(IndexRecordOption::WithFreqsAndPositions);
-    let text_options = TextOptions::default()
-        .set_indexing_options(text_field_indexing)
-        .set_stored();
+    let text_options = TextOptions::default().set_indexing_options(text_field_indexing);
     let title = schema_builder.add_text_field("title", text_options);
 
     // Our second field is body.
@@ -103,8 +101,7 @@ fn main() -> yeehaw::Result<()> {
     let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;
 
     for (_, doc_address) in top_docs {
-        let retrieved_doc: TantivyDocument = searcher.doc(doc_address)?;
-        println!("{}", retrieved_doc.to_json(&schema));
+        println!("{doc_address:?}");
     }
 
     Ok(())
```

examples/date_time_field.rs

Lines changed: 3 additions & 17 deletions
```diff
@@ -4,19 +4,18 @@
 
 use yeehaw::collector::TopDocs;
 use yeehaw::query::QueryParser;
-use yeehaw::schema::{DateOptions, Document, Schema, Value, INDEXED, STORED, STRING};
+use yeehaw::schema::{DateOptions, Schema, INDEXED, STRING};
 use yeehaw::{Index, IndexWriter, TantivyDocument};
 
 fn main() -> yeehaw::Result<()> {
     // # Defining the schema
     let mut schema_builder = Schema::builder();
     let opts = DateOptions::from(INDEXED)
-        .set_stored()
         .set_fast()
         .set_precision(yeehaw::schema::DateTimePrecision::Seconds);
     // Add `occurred_at` date field type
-    let occurred_at = schema_builder.add_date_field("occurred_at", opts);
-    let event_type = schema_builder.add_text_field("event", STRING | STORED);
+    let _occurred_at = schema_builder.add_date_field("occurred_at", opts);
+    let event_type = schema_builder.add_text_field("event", STRING);
     let schema = schema_builder.build();
 
     // # Indexing documents
@@ -59,19 +58,6 @@ fn main() -> yeehaw::Result<()> {
             .parse_query(r#"occurred_at:[2022-06-22T12:58:00Z TO 2022-06-23T00:00:00Z}"#)?;
         let count_docs = searcher.search(&*query, &TopDocs::with_limit(4))?;
         assert_eq!(count_docs.len(), 1);
-        for (_score, doc_address) in count_docs {
-            let retrieved_doc = searcher.doc::<TantivyDocument>(doc_address)?;
-            assert!(retrieved_doc
-                .get_first(occurred_at)
-                .unwrap()
-                .as_value()
-                .as_datetime()
-                .is_some(),);
-            assert_eq!(
-                retrieved_doc.to_json(&schema),
-                r#"{"event":["comment"],"occurred_at":["2022-06-22T13:00:00.22Z"]}"#
-            );
-        }
     }
     Ok(())
 }
```

examples/deleting_updating_documents.rs

Lines changed: 8 additions & 37 deletions
```diff
@@ -13,30 +13,12 @@ use yeehaw::query::TermQuery;
 use yeehaw::schema::*;
 use yeehaw::{doc, Index, IndexReader, IndexWriter};
 
-// A simple helper function to fetch a single document
-// given its id from our index.
-// It will be helpful to check our work.
-fn extract_doc_given_isbn(
-    reader: &IndexReader,
-    isbn_term: &Term,
-) -> yeehaw::Result<Option<TantivyDocument>> {
+// Helper to check whether a document with the given ISBN exists.
+fn exists_doc_with_isbn(reader: &IndexReader, isbn_term: &Term) -> yeehaw::Result<bool> {
     let searcher = reader.searcher();
-
-    // This is the simplest query you can think of.
-    // It matches all of the documents containing a specific term.
-    //
-    // The second argument is here to tell we don't care about decoding positions,
-    // or term frequencies.
     let term_query = TermQuery::new(isbn_term.clone(), IndexRecordOption::Basic);
     let top_docs = searcher.search(&term_query, &TopDocs::with_limit(1))?;
-
-    if let Some((_score, doc_address)) = top_docs.first() {
-        let doc = searcher.doc(*doc_address)?;
-        Ok(Some(doc))
-    } else {
-        // no doc matching this ID.
-        Ok(None)
-    }
+    Ok(top_docs.first().is_some())
 }
 
 fn main() -> yeehaw::Result<()> {
@@ -61,10 +43,8 @@ fn main() -> yeehaw::Result<()> {
     // use the `STRING` shortcut. `STRING` stands for indexed (without term frequency or positions)
     // and untokenized.
     //
-    // Because we also want to be able to see this `id` in our returned documents,
-    // we also mark the field as stored.
-    let isbn = schema_builder.add_text_field("isbn", STRING | STORED);
-    let title = schema_builder.add_text_field("title", TEXT | STORED);
+    let isbn = schema_builder.add_text_field("isbn", STRING);
+    let title = schema_builder.add_text_field("title", TEXT);
     let schema = schema_builder.build();
 
     let index = Index::create_in_ram(schema.clone());
@@ -92,11 +72,7 @@ fn main() -> yeehaw::Result<()> {
     let frankenstein_isbn = Term::from_field_text(isbn, "978-9176370711");
 
     // Oops our frankenstein doc seems misspelled
-    let frankenstein_doc_misspelled = extract_doc_given_isbn(&reader, &frankenstein_isbn)?.unwrap();
-    assert_eq!(
-        frankenstein_doc_misspelled.to_json(&schema),
-        r#"{"isbn":["978-9176370711"],"title":["Frankentein"]}"#,
-    );
+    assert!(exists_doc_with_isbn(&reader, &frankenstein_isbn)?);
 
     // # Update = Delete + Insert
     //
@@ -106,8 +82,7 @@ fn main() -> yeehaw::Result<()> {
     // and reinsert the document.
     //
     // This can be complicated as it means you need to have access
-    // to the entire document. It is good practise to integrate yeehaw
-    // with a key value store for this reason.
+    // to the entire document.
     //
     // To remove one of the document, we just call `delete_term`
     // on its id.
@@ -134,11 +109,7 @@ fn main() -> yeehaw::Result<()> {
     reader.reload()?;
 
     // No more typo!
-    let frankenstein_new_doc = extract_doc_given_isbn(&reader, &frankenstein_isbn)?.unwrap();
-    assert_eq!(
-        frankenstein_new_doc.to_json(&schema),
-        r#"{"isbn":["978-9176370711"],"title":["Frankenstein"]}"#,
-    );
+    assert!(exists_doc_with_isbn(&reader, &frankenstein_isbn)?);
 
     Ok(())
 }
```
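The retained comments still describe the update flow. A minimal sketch of that "update = delete + insert" sequence, assuming the usual writer API (`delete_term`, `add_document`, `commit`); schema setup is omitted and the `Field` handles come from the caller:

```rust
use yeehaw::schema::Field;
use yeehaw::{doc, IndexWriter, Term};

// Sketch of updating one document identified by its ISBN term.
fn update_title(
    writer: &mut IndexWriter,
    isbn: Field,
    title: Field,
    isbn_value: &str,
    new_title: &str,
) -> yeehaw::Result<()> {
    // Queue a delete for every document carrying this ISBN term ...
    writer.delete_term(Term::from_field_text(isbn, isbn_value));
    // ... then reinsert the corrected version.
    writer.add_document(doc!(isbn => isbn_value, title => new_title))?;
    // Both operations become visible together at the commit, which is why
    // the example reloads its reader afterwards.
    writer.commit()?;
    Ok(())
}
```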
