Enhance TUI indexing dashboard with OIDC support and metrics#1

Open
Szowesgad wants to merge 44 commits into main from migration

Conversation

@Szowesgad
Contributor

This pull request restructures the project into a Cargo workspace, introduces a new memex-contracts crate for shared data types, and expands configuration and authentication support. It also updates copyright and contact information throughout the project.

Project structure and workspace modernization:

  • Converts the project to a Cargo workspace, splitting the codebase into crates/rust-memex (main logic) and crates/memex-contracts (shared contracts/types). Updates Cargo.toml files accordingly and moves dependencies to workspace-level management. [1] [2] [3]

New shared contracts crate:

  • Adds the memex-contracts crate with shared types for audit, progress, stats, and timeline data, making it easier to share data structures between the backend and future frontends. [1] [2] [3] [4] [5] [6]

Configuration and authentication enhancements:

  • Expands the configuration file format to support dashboard-only OIDC authentication, additional auth modes, and query token support. Updates config loading logic to handle these new options. (crates/rust-memex/src/bin/cli/config.rs)

Documentation and copyright updates:

  • Updates copyright and contact information to reference vetcoders.io instead of loct.io across documentation, issue templates, and security policy. [1] [2] [3] [4]

Cleanup and refactoring:

  • Removes obsolete local plan/report files and unnecessary #[allow(dead_code)] attributes from configuration and definition modules. [1] [2] [3] [4] [5] [6]

Let me know if you want to discuss the new workspace structure, how to use the shared contracts, or the new authentication options!

Track discovered/resumed, embed metrics, rollback

Surface operator-visible file counts by adding discovered_files and resumed_files through PipelineConfig, PipelineSnapshot, the observer/renderer, and CLI progress output, so scheduled vs discovered vs resumed counts are shown. Add embedder_ms and tokens_estimated to IndexResult and propagate timing/token estimates from the onion/flat chunking and embedding paths to accumulate per-file metrics. Improve the progress bar and status line to use discovered/resumed counts, introduce PipelineEventConsumerConfig, and add a helper that decides whether storage dedup should be disabled for resumed checkpoints. Add robust rollback of partially stored file chunks on storage batch failures (rollback_stored_file_chunks), with tests for rollback and snapshot behavior. Also emit periodic stats ticks from the scheduler loop and tweak the merge banner text.
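The rollback idea can be sketched roughly as below. The `Store` type and the exact shape of `rollback_stored_file_chunks` are hypothetical stand-ins for illustration; the real implementation works against the async storage layer:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the storage layer: namespace -> stored chunk ids.
type Store = HashMap<String, Vec<String>>;

/// After a failed batch write, remove the chunks that were already stored for
/// the file, so a partially indexed file does not linger as a half-document.
/// Returns how many chunks were rolled back.
fn rollback_stored_file_chunks(store: &mut Store, namespace: &str, stored_ids: &[String]) -> usize {
    let Some(chunks) = store.get_mut(namespace) else {
        return 0; // nothing stored in this namespace, nothing to roll back
    };
    let before = chunks.len();
    chunks.retain(|id| !stored_ids.contains(id));
    before - chunks.len()
}
```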

chore: record marble gate verification

marbles: verify gates and record clean round

Introduce OpenID Connect and crypto support and tidy repository metadata and docs. Changes include:

- Add openidconnect, argon2 and subtle dependencies (Cargo.toml) and update Cargo.lock with related crypto crates.
- Add auth scaffolding (src/auth/mod.rs) and various code updates across src/* to integrate auth/OIDC usage.
- Add dashboard OIDC example configuration to README to document optional dashboard-only OIDC flow.
- Update GitHub issue templates and SECURITY.md contact/notice from loct.io → vetcoders.io.
- Clean up repository: adjust .gitignore entries, remove .vibecrafted artifact symlinks, and reorganize/move many docs into language-specific folders (docs/en, docs/pl).
- Add tower dev dependency and other tooling tweaks.

Rationale: enable OIDC-based dashboard auth and stronger token/password handling (Argon2 + constant-time comparisons), remove user-specific artifacts, and standardize documentation and repo metadata.
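The constant-time comparison half of the rationale can be illustrated std-only. This XOR-accumulate sketch is a stand-in for the technique, not the `subtle` crate's actual API:

```rust
/// Compare two byte slices without exiting early on the first mismatch, so
/// the comparison time does not leak how many leading bytes matched.
/// Illustrative only; production code should use the `subtle` crate.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y; // accumulate differences instead of branching per byte
    }
    diff == 0
}
```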

chore: record green marbles gate pass

chore: record marbles gate verification

chore: record green quality gates

chore: record green quality gates

chore: record green marbles gate pass

chore: record green marbles gate pass

Fix e2e config file race
…config

Net: +268/-801 LOC across 18 files (-533 net). Zero clippy warnings on
`cargo clippy --workspace --all-targets -- -D warnings`. Zero new `#[allow]`
annotations — every suppression removed had its root cause fixed, not
re-silenced.

## Track C — NamespaceAccessManager → AuthManager (27 deprecated warnings → 0)

- `src/security/mod.rs`: 361→38 LOC. Deleted `NamespaceAccessManager`,
  `TokenStore`, `NamespaceToken` + their tests. Kept `NamespaceSecurityConfig`
  (still used by `ServerConfig`; CLI/file-config surface unchanged).
- `src/mcp_protocol.rs`: field `access_manager: Arc<NamespaceAccessManager>` →
  `auth_manager: Arc<AuthManager>`. New `verify_tool_access()` preserves
  legacy MCP-tool "open namespace unless token covers it" semantics.
  Rewrote 4 `namespace_*` tool handlers against `AuthManager`.
  `namespace_create_token` now revokes-then-creates to keep legacy
  idempotence (HashMap::insert-style overwrite).
- `src/mcp_runtime.rs`: builds `AuthManager` from `NamespaceSecurityConfig`;
  defaults to `~/.rmcp-servers/rust-memex/tokens.json`.
- `src/http/mod.rs`: diagnostic loop + test scaffold rewritten against
  `AuthManager`; tempdir-scoped `tokens.json`.
- `src/lib.rs`: dropped `NamespaceAccessManager` from `pub use security::{..}`.
- `src/tests/transport_parity.rs`: both test helpers build `AuthManager`.
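The revoke-then-create idempotence noted above can be sketched with a plain HashMap standing in for the AuthManager token store (names are hypothetical; the real manager persists to tokens.json):

```rust
use std::collections::HashMap;

/// Stand-in token store: namespace -> token value. Revoking before creating
/// preserves the legacy HashMap::insert-style overwrite: calling twice for
/// the same namespace leaves exactly one token, the most recent.
fn create_namespace_token(tokens: &mut HashMap<String, String>, namespace: &str, token: String) {
    tokens.remove(namespace); // revoke any existing token first
    tokens.insert(namespace.to_string(), token); // then create
}
```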

Public API changes:
- `McpCore::access_manager() -> Option<&NamespaceAccessManager>` replaced
  by `McpCore::auth_manager() -> &AuthManager` (always-present, emptiness
  implicit in "no tokens").
- `McpCore::new(...)` 6th arg: `Arc<NamespaceAccessManager>` → `Arc<AuthManager>`.

## CLI cleanup (25 dead_code + 3 singletons → 0)

- Deleted duplicate `parse_features` / `discover_config` / `load_file_config`
  / `load_or_discover_config` / `CONFIG_SEARCH_PATHS` copies from 5 CLI
  modules (data/formatting/inspection/maintenance/search). Canonical lives
  in `src/bin/cli/config.rs`. `parse_features` had zero callers repo-wide
  — removed entirely. Triage confirmed: artifact from "marble: decompose
  release binary into compilable modules" (7285198), not unfinished feature.
- `src/bin/cli/dispatch.rs::open_browser()`: restructured so `#[cfg(target_os)]`
  branches yield `Ok(())` and the catch-all `Err(...)` is cfg-gated to
  platforms with no other arm (makes it reachable instead of dead).
- `src/bin/cli/maintenance.rs::KeepStrategy`: replaced shadowed `from_str`
  method with idiomatic `From<&str>` (infallible conversion, avoids forcing
  callers into pointless `.unwrap()`).
- `src/tui/indexer/scheduler.rs::start_indexing`: 8 args → `IndexingJob`
  struct (6 config fields) + 2 runtime handles (`sink`, `control_rx`).
  `IndexingJob` re-exported from `tui::indexer` and `tui`.

## Known pre-existing test failure (out of scope)

`search::hybrid::tests::project_filter_matches_project_and_project_id`
fails on main@ac98c61 baseline (verified by both agents). Rebrand-era
test asserts `project_id: "Loctree"` matches filter `"vetcoders"`.
Separate cleanup.

Delegated to 2 parallel Opus subagents (Track C migration + quick cleanup),
supervised integration, version-bumped & verified locally.

Vibecrafted. with AI Agents by VetCoders (c)2024-2026 The LibraxisAI Team
Remove deprecated rag_index_text and rag_search tools and associated handling across MCP protocol and tests. Simplify hybrid search result payload to a unified memory-style format and remove the SearchShape enum and Uuid dependency from the MCP core. Update create_core_only_slice to emit both Outer and Core slices for short content so default (outer) searches can find small documents. Adjust tests and tool count assertions to reflect the removed tools.
Introduce StorageManager::delete_documents to delete many documents in a namespace in batches. The method returns early for empty input or missing table, splits IDs into 500-id chunks to bound SQL length, escapes single quotes in IDs, counts rows matching the namespace+id IN(...) predicate, issues a single DELETE per chunk, and accumulates the total deleted. This avoids per-document table scans when deleting large batches.
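The chunking and escaping described above can be sketched as predicate construction (the exact predicate string is an assumption for illustration; the real method also counts matches and issues the DELETEs):

```rust
/// Escape single quotes so an id can sit inside a SQL string literal.
fn escape_sql(id: &str) -> String {
    id.replace('\'', "''")
}

/// Build one namespace + IN(...) predicate per 500-id chunk, so a single
/// DELETE per chunk handles many documents with bounded SQL length.
fn delete_predicates(namespace: &str, ids: &[String]) -> Vec<String> {
    ids.chunks(500)
        .map(|chunk| {
            let list = chunk
                .iter()
                .map(|id| format!("'{}'", escape_sql(id)))
                .collect::<Vec<_>>()
                .join(", ");
            format!("namespace = '{}' AND id IN ({})", escape_sql(namespace), list)
        })
        .collect()
}
```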
…1+P2+P3+P4+P5a+backfill)

Implement the full set of fixes from
`2026-04-27_kb-transcripts-onion-slicer-fix-spec.md`:

- P0 (content_hash bug): Schema bumped to v4. ChromaDocument now carries
  per-chunk `content_hash` AND a separate `source_hash` for the source
  document text. Pipeline computes a true SHA256 per slice, while
  source_hash stays constant across all four onion layers from one source.
  RAGPipeline + flat path updated to write both fields. Pre-v4 rows still
  read fine (graceful degradation).
- P0 backfill: New `diagnostics::backfill_chunk_and_source_hashes` plus
  `POST /api/backfill-hashes` (dry-run by default, behind diagnostic
  approval gate). Promotes legacy `content_hash` (which was actually the
  source-text hash) into `source_hash`, then recomputes the per-chunk
  hash from the stored chunk text. Idempotent.
- P1 (stoplist + boilerplate filter): keyword extractor now filters PL+EN
  top-100 stopwords, Claude Code/Codex animation gerunds
  (Brewing/Frosting/Grooving/...), CLI control tokens, markdown
  structural words, and path-like fragment tokens.
- P2 (section-aware chunker): pipeline detects markdown transcripts
  (filename hints + content-shape) and tags metadata with
  `format: "markdown_transcript"`, which routes the content through the
  existing structured (turn-aware) slicing path. parse_markdown_heading
  now also recognizes `## user`/`## assistant`/`## tool`/`## system`
  Claude Code/Codex headings.
- P3 (LLM-synthesized outer): scaffolded behind the new `ollama-outer`
  cargo feature. `OuterSynthesis::Llm` enum variant +
  `synthesize_outer_via_ollama` async stub that documents the contract
  for future wire-up.
- P4 (source-hash dedup pre-index): reader stage now checks
  `has_source_hash(namespace, hash)` before chunking + embedding, with a
  fallback to `has_content_hash` for pre-v4 namespaces. Skips re-embedding
  duplicate sources entirely.
- P5a (bump max tokens): TokenConfig::DEFAULT_MAX_TOKENS raised from 8192
  to 35000. qwen3-embedding has a verified 40960-token context window;
  6k-token margin keeps prompt overhead/language drift safe.
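The P4 pre-index skip logic can be sketched std-only; the two HashSets stand in for the storage layer's `has_source_hash` / `has_content_hash` lookups:

```rust
use std::collections::HashSet;

/// Decide whether a source document should be skipped before chunking and
/// embedding. Post-v4 namespaces match on the dedicated source hash; pre-v4
/// rows stored the source-text hash in content_hash, so fall back to that.
fn should_skip_source(
    known_source_hashes: &HashSet<String>,
    known_content_hashes: &HashSet<String>,
    source_hash: &str,
) -> bool {
    if known_source_hashes.contains(source_hash) {
        return true; // v4 path: exact source-hash match
    }
    known_content_hashes.contains(source_hash) // pre-v4 fallback
}
```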

Hardening:
- Storage `has_source_hash`, `source_hash` Arrow column,
  `from_onion_slice_with_hashes` / `new_flat_with_hashes` constructors.
- Recovery merge propagates `source_hash`.
- All call sites of `ChromaDocument` initializers updated.
- 273 unit tests pass; clippy --all-features --all-targets -D warnings clean.

Authored-By: claude <agents@vetcoders.io>
…plist

Round 002 audit of `2026-04-27_kb-transcripts-onion-slicer-fix-spec.md`
found one functional gap and one missing test guard.

Functional gap (P4 grouping):
  After the round 001 P0 fix every chunk has a unique per-chunk
  `content_hash`. The `dedup` CLI / `/api/dedup` endpoint still grouped
  exclusively on `content_hash`, so on a freshly-rebuilt onion namespace
  they reported zero duplicate groups — the spec acceptance criterion
  ("dedup -n kb:transcripts after the fix shows counts matching the
  actual duplicates") was unsatisfiable.

  - Add `DedupGroupBy` enum: `SourceHashLayer` (default), `SourceHash`,
    `ContentHash` (legacy opt-in). Default is post-v4: groups chunks by
    `(source_hash, layer)` so `__dupe__` + `__clean__` variants of one
    transcript collapse to one chunk per layer while distinct sources
    survive untouched.
  - `deduplicate_documents` takes the new strategy and builds the bucket
    key per-strategy. Docs without the strategy-required field land in
    `docs_without_hash` instead of being silently misgrouped.
  - `DedupGroup` carries both the legacy `content_hash` field (now the
    grouping-key value, kept for wire compat) and a clearer `group_key`
    field for new clients. `DedupResult` reports `group_by`.
  - CLI: `rust-memex dedup --group-by source-hash-layer|source-hash|content-hash`
    with the new default. Group-key suffix is shown in CLI output so
    operators can verify per-layer onion preservation. Updated help text.
  - HTTP: `/api/dedup?group_by=...` (also accepts `group-by` / `groupBy`
    aliases). Backward-compatible: missing param resolves to the v4
    default.
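The per-strategy bucket key can be sketched as below; the enum mirrors the commit's `DedupGroupBy`, while the `"hash:layer"` key format is an assumption for illustration:

```rust
/// Grouping strategies for the dedup pass.
enum DedupGroupBy {
    SourceHashLayer, // post-v4 default: group by (source_hash, layer)
    SourceHash,
    ContentHash, // legacy opt-in
}

/// Build the bucket key for one chunk, or None when the strategy-required
/// field is missing (such docs land in docs_without_hash instead of being
/// silently misgrouped). Key format here is illustrative.
fn group_key(
    strategy: &DedupGroupBy,
    source_hash: Option<&str>,
    content_hash: Option<&str>,
    layer: &str,
) -> Option<String> {
    match strategy {
        DedupGroupBy::SourceHashLayer => source_hash.map(|h| format!("{}:{}", h, layer)),
        DedupGroupBy::SourceHash => source_hash.map(str::to_string),
        DedupGroupBy::ContentHash => content_hash.map(str::to_string),
    }
}
```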

Test guards:
  - `dedup_grouping_tests::source_hash_layer_grouping_preserves_onion_structure`
    seeds 12 chunks (source A indexed 2× across 4 layers + source B once)
    and asserts exactly 4 duplicate groups, 4 removed, 8 unique survivors,
    and that every group key encodes its layer.
  - `dedup_grouping_tests::content_hash_grouping_finds_zero_duplicates_on_fresh_onion`
    locks the symptom that motivated the fix: legacy grouping must report
    zero duplicates on a fresh onion.
  - `dedup_grouping_tests::dedup_group_by_parses_known_aliases_and_falls_back_to_default`
    pins the parser surface (kebab + snake aliases, default fallback).
  - `extract_keywords_drops_spec_boilerplate_even_when_dominant`: P1
    acceptance regression guard — synthetic boilerplate-heavy text must
    NOT surface `assistant`, `user`, `transcript`, `nie`, `jest`,
    `Brewing`, `bypass`, etc. into top keywords, while real signal
    tokens still survive (guards against an over-aggressive filter).

Hardening:
  - Cleaner CLI report (`Without group key:` instead of generic
    "Without hash") points operators at backfill when source_hash is
    missing.
  - 277 unit tests pass (273 prior + 4 new); clippy --all-features
    --all-targets -D warnings clean.

Authored-By: claude <agents@vetcoders.io>
Round 003 audit of `2026-04-27_kb-transcripts-onion-slicer-fix-spec.md`
found that round 001 had only scaffolded P3 — `synthesize_outer_via_ollama`
was a `cfg(feature = "ollama-outer")`-gated stub that always returned
`None`, AND `create_onion_slices` never inspected `OuterSynthesis::Llm`.
A caller setting the `Llm` variant silently received the keyword outer
with no warning, no log, no observable difference. That is a present
falsehood at the public API surface. This round closes it.

Spec compliance (P3 — LLM-synthesized outer):

- Real Ollama integration: `synthesize_outer_via_ollama` now POSTs
  `{endpoint}/api/generate` with `{model, prompt, stream:false}`,
  parses the `response` field, and returns `Some(text)` on success.
  Failures (network, non-2xx, malformed JSON, empty completion,
  whitespace-only response) all surface as `None` so the keyword outer
  takes over silently. Prompt template follows the spec verbatim:
  Polish 1-3 sentence summary, focuses on goal/decision/outcome,
  instructs the model to skip Brewing…/Frosting…/Grooving…/tokens·/
  shifttab/⎿/⎯ UI noise.
- Connect-phase timeout (5s) + total request timeout (60s) so a
  misconfigured or moved Ollama endpoint fails fast instead of burning
  60s per doc on every indexing run.
- Input is capped to OLLAMA_OUTER_INPUT_CHAR_BUDGET (8000 chars) with
  an explicit truncation marker so the model sees the boundary instead
  of receiving silently-clipped context.
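The input-budget behavior can be sketched std-only; the marker text and constant name follow the description above but their exact values are assumptions:

```rust
const OLLAMA_OUTER_INPUT_CHAR_BUDGET: usize = 8000;

/// Cap prompt input to the character budget, appending an explicit
/// truncation marker so the model sees the boundary rather than receiving
/// silently-clipped context.
fn cap_input(text: &str) -> String {
    if text.chars().count() <= OLLAMA_OUTER_INPUT_CHAR_BUDGET {
        return text.to_string();
    }
    let head: String = text.chars().take(OLLAMA_OUTER_INPUT_CHAR_BUDGET).collect();
    format!("{}\n[... input truncated ...]", head)
}
```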

Slicer integration:

- `create_onion_slices_async` and `create_onion_slices_fast_async`
  read `config.outer_synthesis`; when set to `Llm` they pre-fetch the
  summary, then run the standard slicer and post-process via
  `replace_outer_slice` which swaps the outer content, regenerates the
  outer ID, and patches every parent's `children_ids` so the onion
  hierarchy stays internally consistent (no dangling references).
- The structured (markdown_transcript) path benefits automatically —
  the override is applied by layer, not by which slicer produced the
  skeleton, so kb:transcripts (the actual spec target) gets LLM outers
  exactly the same way unstructured docs do.

Pipeline wire-up:

- `PipelineConfig` gains `outer_synthesis: OuterSynthesis` (default
  `Keyword`, fully backward-compatible) and threads it through
  `stage_chunk_content` and `create_chunks_from_content`. The chunker
  stage is now async at the slicing boundary so the LLM call lives
  inside the bounded mpsc backpressure envelope.

Cargo hygiene:

- Removed the historical `ollama-outer` feature flag. It was a no-op
  (reqwest is an unconditional workspace dep), and keeping it implied
  the LLM path was opt-in at compile time when in reality the only
  meaningful gate is the runtime `OuterSynthesis` config knob.

Test guards (12 new tests, lib total: 290 pass):

- `synthesize_outer_via_ollama_posts_correct_payload_and_parses_response`
  spins up an axum mock on 127.0.0.1:0, asserts the captured body has
  the right model, stream=false, prompt with Polish directive AND the
  Brewing/UI-noise skip directive, and that the parsed `response` field
  is what the helper returns. Locks the wire contract.
- `..._truncates_oversized_input` proves the 8k-char budget kicks in
  with the explicit truncation marker.
- `..._returns_none_on_non_2xx` / `..._on_malformed_payload` /
  `..._on_empty_response_field` / `..._on_empty_input` /
  `..._on_unreachable_endpoint` (RFC 5737 TEST-NET-3 address) lock the
  silent-fallback contract on every documented failure mode.
- `create_onion_slices_async_replaces_outer_with_llm_summary` and
  `..._fast_async_replaces_outer_with_llm_summary` prove the override
  actually rewires the hierarchy (parent's children_ids point at the
  new outer id). `structured_conversation_outer_is_replaced_by_llm_summary`
  proves the same for markdown_transcript metadata — the actual spec
  target.
- `..._falls_back_to_keyword_when_ollama_unreachable` is the
  pipeline-must-not-stall contract: with TEST-NET-3 endpoint, the
  async slicer completes inside 15s and yields the bracketed keyword-
  style outer.
- `replace_outer_slice_is_a_noop_when_summary_is_empty` and
  `replace_outer_slice_rewrites_outer_id_and_parent_links` lock the
  helper's invariants directly.

Tests use only existing workspace deps (axum + tokio). No new crates.

Hardening:

- 290 unit tests pass (277 prior + 13 new).
- `cargo clippy -p rust-memex --all-targets --all-features -- -D warnings`
  clean.
- `cargo check -p rust-memex --bin rust-memex --all-features` clean.

Spec status after this round: P0+P0backfill+P1+P2+P3+P4+P5a all
implemented end-to-end with regression guards. P5b (chunked core for
docs >35k tokens) deferred per spec (gated on operator measurement).

Authored-By: claude <agents@vetcoders.io>
…command

Round 004 audit of `2026-04-27_kb-transcripts-onion-slicer-fix-spec.md` against
`c833d10` (round 003) found the operator-facing falsehood that round 003 itself
flagged in its boundary notes: round 003 wired `OuterSynthesis::Llm` end-to-end
through the lib API + pipeline, with 13 tests guarding it, but the `rust-memex
index` CLI had no flag to opt into the LLM path. The spec's procedure A —
`rmcp-memex index ... --slice-mode onion --preprocess --pipeline ...` — is the
canonical kb:transcripts rebuild path, and after round 003 it still produced
the keyword outer regardless of intent. Code-shaped P3 was complete; the
operator could not reach it. Round 004 closes that surface.

Spec compliance (P3 — operator surface):

- New `--outer-synthesis <keyword|llm>` flag on the `Index` command (default
  `keyword`, fully backward-compatible). Spec verbatim: "keyword (default):
  TF-based keyword extraction. No I/O." vs "llm: Synthesize the outer layer
  via a local Ollama model."
- New `--ollama-model <str>` (default `qwen2.5:3b` per spec P3 baseline) and
  `--ollama-endpoint <str>` (default `http://localhost:11434`) flags. The
  defaults intentionally point at the spec's recommended small model + the
  conventional local Ollama daemon so an operator who has already followed the
  spec setup can run the LLM rebuild with one extra flag.
- `parse_outer_synthesis_flag` helper translates the trio (variant, model,
  endpoint) into a typed `OuterSynthesis`. Empty model / empty endpoint on
  the `llm` variant are rejected up-front so the helper does not pass blank
  strings into the Ollama HTTP path (which would silently fall back to the
  keyword outer per the round-003 contract — that fallback is for runtime
  failure, not for operator misconfig).
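The flag-to-type translation might look like the following sketch; the enum shape and error wording are assumptions loosely mirroring the commit's `parse_outer_synthesis_flag`:

```rust
/// Typed outcome of the --outer-synthesis / --ollama-* flag trio.
#[derive(Debug, PartialEq)]
enum OuterSynthesis {
    Keyword,
    Llm { model: String, endpoint: String },
}

fn parse_outer_synthesis_flag(
    variant: &str,
    model: &str,
    endpoint: &str,
) -> Result<OuterSynthesis, String> {
    match variant {
        // Keyword path ignores the Ollama overrides entirely.
        "keyword" => Ok(OuterSynthesis::Keyword),
        "llm" => {
            // Blank model/endpoint is operator misconfig, rejected up-front:
            // the runtime silent fallback is reserved for runtime failures.
            if model.trim().is_empty() || endpoint.trim().is_empty() {
                return Err("--outer-synthesis llm requires non-empty --ollama-model and --ollama-endpoint".to_string());
            }
            Ok(OuterSynthesis::Llm {
                model: model.to_string(),
                endpoint: endpoint.to_string(),
            })
        }
        other => Err(format!("unknown --outer-synthesis variant: {}", other)),
    }
}
```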

Guard against introducing a NEW silent falsehood:

- The legacy non-pipeline `run_batch_index` path uses the synchronous
  `create_onion_slices` with `OnionSliceConfig::default()` and never sees
  `OuterSynthesis`. Adding the flag without `--pipeline` would have produced
  the exact silent-keyword-fallback that round 003 exorcised at the lib API.
  `run_batch_index` now rejects `--outer-synthesis llm` without `--pipeline`
  up-front, with an error message that names the flag, explains the
  silent-fallback risk, and tells the operator to re-run with `--pipeline`.
- `--outer-synthesis llm` with `--slice-mode flat` is also rejected — the
  flat slicer has no outer layer to synthesize, so the combination is
  meaningless. Failing fast is honest; silently downgrading to flat without
  the LLM call would be a lie.

Pipeline wire-up:

- `BatchIndexConfig` gains `outer_synthesis: OuterSynthesis`.
- The `--pipeline` branch of `run_batch_index` threads the value into
  `PipelineConfig::outer_synthesis` (the field round 003 added) and emits an
  operator-visible breadcrumb on the LLM path so the run log records which
  outer the rebuild actually used.
- `OuterSynthesis` is re-exported from the `rust_memex` crate root so the
  CLI module can name the type without reaching into `rust_memex::rag::*`.

Test deltas (9 new bin tests; lib unchanged at 290 pass):

- `index_command_outer_synthesis_defaults_to_keyword` — backward-compat
  guarantee: the default value stays `keyword` and Ollama defaults populate
  even on the keyword path so the CLI surface is consistent.
- `index_command_accepts_outer_synthesis_llm_with_overrides` — full LLM
  invocation roundtrips through clap with custom model + endpoint.
- `index_command_rejects_unknown_outer_synthesis` — clap value_parser
  rejects unknown variants up-front (no surprise enum drift).
- `parse_outer_synthesis_flag_keyword_ignores_ollama_overrides` —
  Keyword path ignores model/endpoint so a stale config doesn't poison
  the keyword default.
- `parse_outer_synthesis_flag_llm_carries_model_and_endpoint` —
  LLM path roundtrips both fields end-to-end.
- `parse_outer_synthesis_flag_rejects_empty_model_or_endpoint` — empty
  strings are operator misconfig, NOT a silent-fallback trigger.
- `parse_outer_synthesis_flag_rejects_unknown_variant` — error message
  must name the offending flag so the operator sees the lie loud.
- `run_batch_index_rejects_llm_without_pipeline_so_no_silent_keyword_downgrade`
  — the central anti-falsehood test: error must point operator at
  `--pipeline` AND explain the silent-fallback risk.
- `run_batch_index_rejects_llm_with_flat_slice_mode` — the second guard:
  flat has no outer layer to synthesize.

Hardening:

- `cargo test -p rust-memex --lib --all-features`: 290 pass, 0 fail
  (unchanged from round 003).
- `cargo test -p rust-memex --bins --all-features`: 28 pass, 0 fail
  (+9 new tests).
- `cargo clippy -p rust-memex --all-targets --all-features -- -D warnings`:
  clean.

Living-tree note: the diff also picks up cosmetic rustfmt-style reflows
(closure indentation, format!() string concatenation, multi-line `if let`
flattening) in `rag/mod.rs`, `rag/pipeline.rs`, `diagnostics.rs`, and
`storage/mod.rs`. These came from a formatter-side hook that ran before
this round and are preserved verbatim — no semantic changes.

Spec status after round 004: P0+P0backfill+P1+P2+P3+P3-CLI+P4+P5a all
implemented end-to-end with regression guards. P5b deferred per spec (gated
on operator measurement of >5% corpus exceeding 35k tokens).

Authored-By: claude <agents@vetcoders.io>
Round 005 audit of `2026-04-27_kb-transcripts-onion-slicer-fix-spec.md`
against `2cd6648` (round 004) found three remaining operator-facing
falsehoods. The lib API and HTTP endpoint for P0 backfill were both wired
in round 001 (`diagnostics::backfill_chunk_and_source_hashes` +
`POST /api/backfill-hashes`), but the spec's exact procedure
("Uruchomić jako `rust-memex backfill-hashes --namespace <ns>`") had no
shell entrypoint — the only way to run it was `curl` against a running
server. Two P4 acceptance criteria were also unsatisfied: the per-source
skip line lived at `debug!` so a default operator run produced no
evidence dedup actually fired, and the spec's `--allow-duplicates` escape
hatch for force reindex was missing entirely.

Same falsehood pattern round 004 closed for P3 (lib done, CLI missing).
Round 005 closes it for P0-backfill + P4.

Spec compliance (P0 backfill — operator surface):

- New `Commands::BackfillHashes { namespace, dry_run, json }` in
  `cli/definition.rs` matching the dedup/audit shape: `-n` for one
  namespace (omit for every namespace), `--dry-run true` default for
  safety (mirrors `dedup`), `--json` for machine-readable output.
  Help text quotes the spec verbatim and explains the v4 contract
  (per-chunk `content_hash` vs source-text `source_hash`).
- Dispatcher in `cli/dispatch.rs` resolves config and calls the new
  `run_backfill_hashes` runner.
- `run_backfill_hashes` in `cli/data.rs` calls
  `diagnostics::backfill_chunk_and_source_hashes` (lib API from round
  001) and renders the same boxed summary style as `audit` /
  `purge-quality`. Reports per-namespace totals: documents inspected,
  content_hash backfilled, source_hash backfilled, already consistent,
  skipped (no embedding). Dry-run output explicitly tells the operator
  how many rows would be rewritten and the exact re-run command.

Spec compliance (P4 — `--allow-duplicates` escape hatch):

- New `--allow-duplicates` flag on `Index` (defaults to false). Spec:
  "CLI flag `--allow-duplicates` for edge cases (e.g. force reindex)".
- Dispatcher applies the precedence at run time: if `--allow-duplicates`
  is set, the effective `dedup` becomes `false` regardless of the
  `--dedup` flag, with a one-line operator-visible note when both are
  set so the run log records dedup was disabled intentionally rather
  than because the user forgot a flag.
- The flag itself does not flip `--dedup` at parse time, by design —
  this keeps the parsed surface honest to what the operator typed and
  lets the dispatcher own the semantics.
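The precedence rule reduces to one small function; the name is hypothetical, but the semantics match the description above:

```rust
/// Dispatcher-side precedence: --allow-duplicates disables dedup at run time
/// regardless of --dedup, while the parsed flags stay exactly what the
/// operator typed.
fn effective_dedup(dedup_flag: bool, allow_duplicates: bool) -> bool {
    if allow_duplicates {
        false // escape hatch wins; run log should note this was intentional
    } else {
        dedup_flag
    }
}
```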

Spec compliance (P4 — pipeline skip log visibility):

- Promoted `debug!("Skipping duplicate source: ...")` to
  `info!("Skip duplicate source: {} (source_hash {})")` in
  `rag/pipeline.rs::stage_read_files`. Spec acceptance criterion:
  "Pipeline log: każdy skipped duplicate source jedna linia z source
  path + source_hash". A default operator run (no RUST_LOG override)
  now produces one line per skipped source.
- `--allow-duplicates` is the documented escape hatch when an operator
  actually wants to re-embed, called out in the flag's doc comment so
  the index command's `--help` surfaces both halves of the contract.

Test deltas (4 new bin tests; 32 total pass, +4 vs round 004; lib
unchanged at 290):

- `index_command_allow_duplicates_defaults_to_false` — backward-compat
  guarantee: the safe path (dedup-on) stays the default.
- `index_command_accepts_allow_duplicates_flag` — flag parses and does
  NOT flip `--dedup` at parse time (dispatcher owns the precedence).
- `backfill_hashes_command_defaults_to_dry_run_all_namespaces` — safety
  default: no `-n` means all namespaces, dry-run is true.
- `backfill_hashes_command_accepts_namespace_and_live_run` — full
  override path: `-n kb:transcripts --dry-run false --json` round-trips.

Hardening:

- `cargo check -p rust-memex --all-features`: clean.
- `cargo clippy -p rust-memex --all-targets --all-features -- -D warnings`:
  clean.
- `cargo test -p rust-memex --bins --all-features`: 32 pass, 0 fail
  (+4 new tests vs round 004).
- `cargo test -p rust-memex --lib --all-features`: 290 pass, 0 fail
  (unchanged from round 004 — no lib regression).

Spec status after round 005: P0 + P0-backfill (lib + HTTP + CLI) +
P1 + P2 + P3 (lib + CLI) + P4 (lib + grouping + skip-log + escape
hatch) + P5a all reachable from the operator's shell with regression
guards. P5b deferred per spec (gated on operator measurement of >5%
corpus exceeding 35k tokens).

Authored-By: claude <agents@vetcoders.io>

@gemini-code-assist (Bot) left a comment


Code Review

This pull request transitions the project to a workspace structure, introduces a new memex-contracts crate for shared data structures, and implements a robust multi-token authentication system with Argon2id hashing and OIDC support. It significantly enhances the indexing pipeline with resume capabilities, adaptive throughput control, and LLM-based outer layer synthesis via Ollama. Additionally, it adds comprehensive diagnostic tools for database maintenance, including quality audits, deduplication based on source hashes, and write repair. Feedback focuses on improving efficiency by batching database operations during backfills and rollbacks, reusing HTTP clients, and optimizing hot paths in keyword extraction. There is also a critical note regarding an inverted hierarchy in the simplified onion slicing logic and a suggestion to replace fragile manual path parsing for ACL checks with a more robust routing-based approach.

Comment on lines +499 to +515
OnionSlice {
    id: outer_id.clone(),
    layer: SliceLayer::Outer,
    content: content.to_string(),
    parent_id: Some(core_id.clone()),
    children_ids: vec![],
    keywords: outer_keywords,
},
OnionSlice {
    id: core_id,
    layer: SliceLayer::Core,
    content: content.to_string(),
    parent_id: None,
    children_ids: vec![outer_id],
    keywords: core_keywords,
},
]

high

In create_core_only_slice, the hierarchy is inverted compared to the standard onion model. The Core slice is set as the parent of the Outer slice, whereas in a typical onion structure, the Outer slice (summary) should be the root. This inversion may cause unexpected behavior when navigating the hierarchy via expand or parent operations in the TUI or API.

    vec![
        OnionSlice {
            id: outer_id.clone(),
            layer: SliceLayer::Outer,
            content: content.to_string(),
            parent_id: None,
            children_ids: vec![core_id.clone()],
            keywords: outer_keywords,
        },
        OnionSlice {
            id: core_id,
            layer: SliceLayer::Core,
            content: content.to_string(),
            parent_id: Some(outer_id),
            children_ids: vec![],
            keywords: core_keywords,
        },
    ]

Comment on lines +523 to +524
storage.delete_document(&doc.namespace, &doc.id).await?;
storage.add_to_store(vec![new_doc]).await?;

medium

Performing a delete_document and add_to_store for every single document during backfill is extremely inefficient. In LanceDB, this will create a new table version for every row updated, leading to significant storage overhead and slow performance. These operations should be batched per page (e.g., collect all new_docs and their IDs, then perform one batch delete and one batch add per page).
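The suggested page-level batching can be sketched as below; the `PageBatch` type and the Vec-backed store are toy stand-ins (real docs are `ChromaDocument`s in a LanceDB table), shown only to illustrate one flush per page:

```rust
/// Accumulate ids to delete and rewritten docs to add, then flush once per
/// page: two table writes per page instead of two per row.
#[derive(Default)]
struct PageBatch {
    delete_ids: Vec<String>,
    add_docs: Vec<String>,
}

impl PageBatch {
    fn stage(&mut self, old_id: String, new_doc: String) {
        self.delete_ids.push(old_id);
        self.add_docs.push(new_doc);
    }

    /// One batch delete + one batch add. In this toy store a doc IS its id.
    fn flush(self, store: &mut Vec<String>) -> (usize, usize) {
        let before = store.len();
        store.retain(|doc| !self.delete_ids.contains(doc));
        let deleted = before - store.len();
        let added = self.add_docs.len();
        store.extend(self.add_docs);
        (deleted, added)
    }
}
```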

Comment on lines +1802 to +1815
fn extract_namespace_from_path(path: &str) -> Option<String> {
    let segments: Vec<&str> = path.trim_matches('/').split('/').collect();
    match segments.as_slice() {
        // /api/browse/{ns}
        ["api", "browse", ns] => Some(ns.to_string()),
        // /ns/{namespace}
        ["ns", ns] => Some(ns.to_string()),
        // /expand/{ns}/{id}, /parent/{ns}/{id}, /get/{ns}/{id}, /delete/{ns}/{id}
        [verb, ns, _id] if matches!(*verb, "expand" | "parent" | "get" | "delete") => {
            Some(ns.to_string())
        }
        _ => None,
    }
}
medium

Manual path parsing to extract namespaces for ACL checks is fragile and highly coupled to the specific URL structure. If routes are added or modified (e.g., adding a version prefix or changing a verb), this function may fail to correctly identify the namespace, potentially leading to security bypasses or unexpected denials. Consider using a more robust approach, such as extracting parameters from the matched route provided by the web framework's router.
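If adopting the framework router's matched-route parameters is not feasible, a middle ground is a single route table that both dispatch and the ACL check consume, so a renamed or added route cannot silently fall out of namespace extraction. A self-contained sketch under that assumption (the `:ns`/`:id` placeholder syntax and the route table are illustrative, not this codebase's router):

```rust
// One route table shared by dispatch and ACL; adding a route here keeps
// namespace extraction in sync by construction. Patterns mirror the routes
// named in this PR.
const ROUTES: &[&str] = &[
    "/api/browse/:ns",
    "/ns/:ns",
    "/expand/:ns/:id",
    "/parent/:ns/:id",
    "/get/:ns/:id",
    "/delete/:ns/:id",
];

fn extract_namespace_from_path(path: &str) -> Option<String> {
    let segs: Vec<&str> = path.trim_matches('/').split('/').collect();
    ROUTES.iter().find_map(|pattern| {
        let pats: Vec<&str> = pattern.trim_matches('/').split('/').collect();
        if pats.len() != segs.len() {
            return None;
        }
        let mut ns = None;
        for (p, s) in pats.iter().zip(&segs) {
            match *p {
                ":ns" => ns = Some((*s).to_string()),
                ":id" => {}            // wildcard segment
                lit if lit == *s => {} // literal segment must match exactly
                _ => return None,      // not this route; try the next pattern
            }
        }
        ns
    })
}

fn main() {
    assert_eq!(
        extract_namespace_from_path("/api/browse/team-a"),
        Some("team-a".to_string())
    );
    assert_eq!(
        extract_namespace_from_path("/expand/team-a/doc-1"),
        Some("team-a".to_string())
    );
    // Unknown routes yield None -> deny by default in the ACL layer.
    assert_eq!(extract_namespace_from_path("/v1/expand/team-a/doc-1"), None);
}
```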

Comment on lines +671 to +676
let client = match reqwest::Client::builder()
.connect_timeout(std::time::Duration::from_secs(
OLLAMA_OUTER_CONNECT_TIMEOUT_SECS,
))
.timeout(std::time::Duration::from_secs(OLLAMA_OUTER_TIMEOUT_SECS))
.build()
medium

Creating a new reqwest::Client for every document processed in the pipeline is inefficient. It prevents the reuse of connection pools and increases overhead for every HTTP call to Ollama. The client should be instantiated once (e.g., as a field in RAGPipeline) and reused across all synthesis calls.
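A sketch of the suggested shape: build the client once in the pipeline constructor and reuse it for every synthesis call. To keep the example self-contained, `Client` below is a tiny stand-in for `reqwest::Client` that counts how many connection pools get created; `RAGPipeline` is likewise a minimal stand-in for the PR's pipeline type, with the timeout constants mirroring the names in the snippet:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Duration;

// Global counter standing in for "how many connection pools were created".
static CLIENTS_BUILT: AtomicUsize = AtomicUsize::new(0);

const OLLAMA_OUTER_CONNECT_TIMEOUT_SECS: u64 = 5;
const OLLAMA_OUTER_TIMEOUT_SECS: u64 = 120;

// Stand-in for reqwest::Client: each build allocates a fresh pool.
struct Client {
    _connect_timeout: Duration,
    _timeout: Duration,
}

impl Client {
    fn build(connect_timeout: Duration, timeout: Duration) -> Self {
        CLIENTS_BUILT.fetch_add(1, Ordering::SeqCst);
        Client { _connect_timeout: connect_timeout, _timeout: timeout }
    }
}

struct RAGPipeline {
    http: Client, // built once in the constructor, reused by every call
}

impl RAGPipeline {
    fn new() -> Self {
        let http = Client::build(
            Duration::from_secs(OLLAMA_OUTER_CONNECT_TIMEOUT_SECS),
            Duration::from_secs(OLLAMA_OUTER_TIMEOUT_SECS),
        );
        RAGPipeline { http }
    }

    fn synthesize_outer(&self, _doc: &str) -> usize {
        let _client = &self.http; // reuse, never rebuild per document
        CLIENTS_BUILT.load(Ordering::SeqCst)
    }
}

fn main() {
    let pipeline = RAGPipeline::new();
    for doc in ["a", "b", "c"] {
        // One pool total, no matter how many documents flow through.
        assert_eq!(pipeline.synthesize_outer(doc), 1);
    }
}
```

With the real `reqwest::Client` the same shape applies, and the client is cheaply cloneable since it is internally reference-counted.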

Comment on lines +1385 to +1390
['a', 'e', 'i', 'o', 'u', 'y'].into_iter().collect();
let mut max_vowel_run = 0;
let mut current_run = 0;
for ch in token.chars() {
if vowels.contains(&ch.to_ascii_lowercase()) {
current_run += 1;
medium

Creating a HashSet of vowels on every call to looks_like_path_fragment is inefficient, as this function is called for every token during keyword extraction. This check should be optimized using a simple matches! macro or a constant bitmask to avoid repeated allocations.

    let mut max_vowel_run = 0;
    let mut current_run = 0;
    for ch in token.chars() {
        if matches!(ch.to_ascii_lowercase(), 'a' | 'e' | 'i' | 'o' | 'u' | 'y') {

Comment on lines +1590 to +1601
for (namespace, id) in stored_doc_refs.iter().rev() {
match storage.delete_document(namespace, id).await {
Ok(count) => deleted += count,
Err(err) => {
failures += 1;
warn!(
"Failed to roll back partially stored chunk {}/{}: {}",
namespace, id, err
);
}
}
}
medium

The rollback logic currently deletes chunks one by one, which is inefficient. Since a batch delete_documents method was added to StorageManager in this PR, it should be used here to perform the rollback in a single operation per namespace.

Suggested change
for (namespace, id) in stored_doc_refs.iter().rev() {
match storage.delete_document(namespace, id).await {
Ok(count) => deleted += count,
Err(err) => {
failures += 1;
warn!(
"Failed to roll back partially stored chunk {}/{}: {}",
namespace, id, err
);
}
}
}
let mut by_ns: std::collections::HashMap<String, Vec<&str>> = std::collections::HashMap::new();
for (ns, id) in stored_doc_refs {
by_ns.entry(ns.clone()).or_default().push(id.as_str());
}
for (ns, ids) in by_ns {
match storage.delete_documents(&ns, &ids).await {
Ok(count) => deleted += count,
Err(err) => {
failures += ids.len();
warn!("Failed to roll back partially stored chunks for namespace {}: {}", ns, err);
}
}
}

Round 005 of marbles-ancestor lineage. Substrate (commit e296d41) ships
the full spec P0-P5a stack but README still documented only the legacy
`--no-dedup` flag. An operator installing v0.6.2 had no way to discover the
new flags from the README, so the work was implemented-but-not-shipped.

Compressing the surface so docs match runtime truth:

- HTTP endpoints table now lists diagnostic & lifecycle handlers:
  /api/audit, /api/stats[/{ns}], /api/timeline, /api/purge-quality,
  /api/dedup, /api/backfill-hashes (gated by approval key + Bearer auth).
- New "Deduplication & Hash Hygiene" section replaces the thin
  Exact-Match Deduplication blurb. Documents the three-layer surface:
  pre-index source dedup with `--allow-duplicates` escape hatch (P4),
  standalone post-index `dedup` command with `--group-by source-hash-layer`
  default that preserves onion structure (P4 grouping), and the
  `backfill-hashes` command that closes the P0 backfill gap for pre-v4
  namespaces.
- New "LLM-Synthesized Outer Layer (Spec P3)" section documents
  `--outer-synthesis llm` with `--ollama-model` / `--ollama-endpoint`
  overrides, the `--pipeline` requirement, and the silent-fallback
  semantics on Ollama failures.
- Code Structure schema note bumped from v3 to v4 (source_hash +
  per-chunk content_hash) to match SCHEMA_VERSION = 4 in
  storage/mod.rs:33.

No code changes. cargo test -p rust-memex --lib stays at 290/0;
cargo clippy --lib --all-features stays clean.

Authored-By: claude <agents@vetcoders.io>
Spec acceptance criterion for P2 (section-aware chunker) demands "Code
blocks: 0% splits inside ` ``` `". Prior rounds 001-006 marked P2 as
done because semantic-card extraction lands the right shape on
well-formed transcripts, but `parse_markdown_transcript_blocks`
(structured.rs:107) never tracked fence state. A user turn quoting an
example transcript verbatim — common in Claude Code / Codex sessions
where operators paste prior conversations into prompts — would split
on the fenced `## assistant` / `## user` lines and emit phantom blocks
mid-fence, breaking onion-slice integrity.

Changes:
- Add `is_fence_marker` helper that recognises both ``` and ~~~ fence
  delimiters (handles CommonMark info strings such as ```rust, ```bash).
- Wire fence-state toggle into `parse_markdown_transcript_blocks`.
  Heading detection is suppressed while `in_fence` is true; a fence
  delimiter line is appended verbatim to the current block. Result:
  pseudo-headings inside fences stay glued to the parent role's
  content, satisfying the spec invariant.
- Add three regression tests covering the spec criterion:
  * fence_marker_detects_backtick_and_tilde_openers (helper sanity)
  * parse_blocks_keeps_fenced_pseudo_headings_inside_user_turn
    (the canonical real-world failure case: example transcript inside
    a user prompt)
  * parse_blocks_keeps_fenced_pseudo_headings_inside_tilde_fence
    (symmetry test on the tilde fence form)

Quality gates: cargo clippy --workspace --all-targets --all-features
-- -D warnings clean; cargo test -p rust-memex --lib --all-features
293 passed, 0 failed (290 baseline + 3 new).

Authored-By: claude <agents@vetcoders.io>
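The fence tracking described in that commit can be sketched roughly as follows, under the assumption that `is_fence_marker` and the `in_fence` toggle look like this (the real implementation lives in structured.rs and may differ in detail; `heading_lines` is an illustrative reduction of the block parser):

```rust
// Recognises both backtick and tilde fence delimiters, including
// CommonMark info strings such as ```rust or ~~~bash.
fn is_fence_marker(line: &str) -> bool {
    let t = line.trim_start();
    t.starts_with("```") || t.starts_with("~~~")
}

/// Returns the lines treated as role headings: heading detection is
/// suppressed while inside a fence, so a quoted "## assistant" line stays
/// glued to the surrounding turn instead of opening a phantom block.
fn heading_lines(transcript: &str) -> Vec<&str> {
    let mut in_fence = false;
    let mut headings = Vec::new();
    for line in transcript.lines() {
        if is_fence_marker(line) {
            in_fence = !in_fence; // toggle on every fence delimiter line
            continue;
        }
        if !in_fence && line.starts_with("## ") {
            headings.push(line);
        }
    }
    headings
}

fn main() {
    // A user turn quoting an example transcript verbatim inside a fence.
    let transcript = "\
## user
Here is an example session:
~~~
## assistant
phantom heading inside a fence
~~~
## assistant
real reply";
    // Only the two real role headings survive; the fenced one is suppressed.
    assert_eq!(heading_lines(transcript), vec!["## user", "## assistant"]);
}
```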
…ock group-by surface

Updates `tests/http_diagnostic_endpoints.rs` so the dedup endpoint suite
asserts the post-v4 `--group-by source-hash-layer` default that round 002
locked into runtime — and adds a regression guard for the legacy
`?group-by=content-hash` opt-in escape hatch from spec P4.

Background:
- Round 002 (`a651079`) flipped the dedup default from `content-hash` to
  `source-hash-layer` so the onion structure (one chunk per layer per source)
  is preserved by default.
- The integration test `dedup_endpoint_lists_duplicates_then_executes` kept
  asserting the legacy contract (`duplicate_groups == 1`,
  `groups[0].content_hash == "dup-hash"`) and was failing against HEAD ever
  since.
- L3/L4/L5 + L1/L2 of the current ancestor pipeline ran `cargo test --lib`
  + `cargo test --bins` only; the integration suite was not in the gate set,
  so the failure shipped silently across multiple rounds.

Spec target (`2026-04-27_kb-transcripts-onion-slicer-fix-spec.md`, P4):

  > "dedup CLI: new default `--group-by source-hash-layer` preserves the onion
  > (1 chunk per layer per source); the old `--group-by content-hash` stays as
  > an opt-in for edge cases".

Changes:

1. `seed_documents` test fixture
   - Helper `doc_with_layer_and_hash` now takes `source_hash: Option<&str>`
     and routes through `ChromaDocument::new_flat_with_hashes`.
   - The two `dup-*` rows share both `content_hash` AND `source_hash` AND
     layer, so they collapse to one cluster under the post-v4 default
     (`<source_hash>|layer<N>`) AND under the legacy `content-hash` opt-in.
   - Added a pre-v4 row (`dup-pre-v4`) with `content_hash` populated but
     `source_hash = None`, so the suite covers the operator-visible
     `docs_without_hash` path under default grouping.

2. `dedup_endpoint_lists_duplicates_then_executes`
   - Asserts `result.group_by == "source-hash-layer"` (the post-v4 default
     surfaces back to the operator on the wire).
   - Asserts `result.docs_without_hash == 1` so the pre-v4 row is visible
     instead of being silently swept into a phantom cluster.
   - Asserts both the new `group_key` field AND the legacy `content_hash`
     field on the returned group; they now both carry the strategy-native
     key (`dup-source|layer1`) so older clients keep parsing without losing
     the new semantic.
   - Asserts kept/removed identities so a future regression in
     `KeepStrategy::Oldest` is caught.
   - Post-execute namespace count is 3 (kept duplicate + unique + pre-v4),
     not 2.

3. `dedup_endpoint_supports_legacy_content_hash_grouping` (new)
   - Locks the legacy opt-in: `?group-by=content-hash` flips the default,
     `result.group_by` echoes `"content-hash"` back, the pre-v4 row stops
     contributing to `docs_without_hash`, and the cluster is keyed by the
     per-chunk `content_hash` (`"dup-hash"`).

Quality gates (HEAD = post-fix tree):
- `cargo clippy --workspace --all-targets --all-features -- -D warnings`: clean.
- `cargo test -p rust-memex --lib --all-features`: 293 pass / 0 fail.
- `cargo test -p rust-memex --bins --all-features`: 32 pass / 0 fail.
- `cargo test -p rust-memex --tests --all-features`:
    e2e_cli_folder_index: 2 pass; e2e_pipeline: 1 pass / 4 ignored;
    engine_integration: 15 pass; http_diagnostic_endpoints: 7 pass
    (5 baseline + 2 new); http_lifecycle_endpoints: 3 pass;
    http_recovery_endpoints: 3 pass; transport_parity: 9 ignored.
    Total: 31 active, 0 fail. Two consecutive full sweeps confirm.

Boundary note (out of scope for this round):
`rag::p3_llm_outer_tests::create_onion_slices_async_replaces_outer_with_llm_summary`
is non-deterministic under the full `cargo test --tests` sweep (observed
1/3 fail on first verification, 2/3 pass thereafter; 8/8 pass solo). The
keyword extractor's tie-break ordering depends on global hash-randomized
state, not on this round's changes. Locked under the next round's surface.

Authored-By: claude <agents@vetcoders.io>
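The two grouping strategies the suite now asserts can be sketched as below. The key shapes follow the commit text (`<source_hash>|layer<N>` under the default, the per-chunk hash under the legacy opt-in); the `ChunkMeta` struct and `group_key` helper are illustrative stand-ins, not the real dedup code:

```rust
struct ChunkMeta {
    content_hash: Option<String>,
    source_hash: Option<String>,
    layer: u8,
}

enum GroupBy {
    SourceHashLayer, // post-v4 default: one chunk per layer per source survives
    ContentHash,     // legacy opt-in for edge cases
}

/// None means the doc cannot be grouped under this strategy; under the
/// default it is surfaced to the operator as docs_without_hash.
fn group_key(meta: &ChunkMeta, strategy: &GroupBy) -> Option<String> {
    match strategy {
        GroupBy::SourceHashLayer => meta
            .source_hash
            .as_ref()
            .map(|s| format!("{s}|layer{}", meta.layer)),
        GroupBy::ContentHash => meta.content_hash.clone(),
    }
}

fn main() {
    let v4 = ChunkMeta {
        content_hash: Some("dup-hash".into()),
        source_hash: Some("dup-source".into()),
        layer: 1,
    };
    let pre_v4 = ChunkMeta {
        content_hash: Some("dup-hash".into()),
        source_hash: None,
        layer: 1,
    };
    assert_eq!(
        group_key(&v4, &GroupBy::SourceHashLayer).as_deref(),
        Some("dup-source|layer1")
    );
    // Pre-v4 rows surface as docs_without_hash instead of joining a cluster.
    assert_eq!(group_key(&pre_v4, &GroupBy::SourceHashLayer), None);
    // Under the legacy opt-in the same row groups by its content hash.
    assert_eq!(
        group_key(&pre_v4, &GroupBy::ContentHash).as_deref(),
        Some("dup-hash")
    );
}
```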
… flake into deterministic tie-break

Most dangerous present falsehood after L3 (commit add8565): the test
`rag::p3_llm_outer_tests::create_onion_slices_async_replaces_outer_with_llm_summary`
was failing roughly 1 in 10 runs with a `got ["resolved", "llm", "outer"]`
mismatch against an assertion that expected at least one of
`["streszczenie", "naprawy", "slicera"]`.

Root cause is in `extract_keywords` (`crates/rust-memex/src/rag/mod.rs:1249`):
the function collects token counts into a `HashMap`, drains them into a `Vec`,
then `sort_by_key(count)` (stable). HashMap iteration order is randomized per
process via `RandomState`, so the input order to the stable sort is itself
non-deterministic. When several tokens share a count -- the canonical case
for an LLM-synthesized outer where every meaning-bearing word appears
exactly once -- `top-N` becomes a different N-subset on every run.

Compress to a fundament fix instead of a test patch. The keyword extractor
now tie-breaks alphabetically on the token, which makes `top-N` deterministic
across processes for every caller (production indexing pipeline included),
not just the one flaky test. Stable order also means the BM25 / outer-prefix
cache stays warm across rebuilds, which is a quiet retrieval-quality win on
top of the flake fix.

Verification:
- Pre-fix reproducer: 9/10 pass / 1/10 fail (`got ["resolved", "llm", "outer"]`).
- Post-fix reproducer: 30/30 pass on the same target. Expected ~3 fails over
  30 runs at the pre-fix rate, observed zero.
- New regression test `extract_keywords_is_deterministic_on_count_ties` runs
  `extract_keywords` 51 times against an all-ties fixture and asserts byte
  equality with an explicit alphabetical baseline. Catches a future tie-break
  regression deterministically on CI.
- Stale comment in `create_onion_slices_async_falls_back_to_keyword_when_ollama_unreachable`
  that documented the now-fixed HashMap instability replaced with an accurate
  description of the narrower contract that test still locks.

Gates:
- cargo clippy --workspace --all-targets --all-features -- -D warnings: clean.
- cargo test -p rust-memex --lib --all-features: 294 pass / 0 fail (293 baseline + 1 new).
- cargo test -p rust-memex --bins --all-features: 32 pass / 0 fail.
- cargo test -p rust-memex --tests --all-features: 31 active pass / 0 fail
  (e2e + engine + http_diagnostic + http_lifecycle + http_recovery suites);
  9 transport_parity tests remain ignored by design.

Authored-By: claude <agents@vetcoders.io>
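The fix described above reduces to one sorting change: after draining the `HashMap` of counts, sort by count descending with the token itself as the tie-break. A self-contained sketch (the function shape is illustrative; the real `extract_keywords` also applies stopword and path-fragment filtering first):

```rust
use std::collections::HashMap;

fn top_keywords(text: &str, n: usize) -> Vec<String> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for token in text.split_whitespace() {
        *counts.entry(token).or_default() += 1;
    }
    let mut ranked: Vec<(&str, usize)> = counts.into_iter().collect();
    // HashMap iteration order is randomized per process (RandomState), so a
    // stable sort keyed on count alone leaves ties in nondeterministic order.
    // Tie-breaking alphabetically makes top-N stable across processes.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    ranked.into_iter().take(n).map(|(t, _)| t.to_string()).collect()
}

fn main() {
    // All-ties fixture: every token appears exactly once, the canonical case
    // for an LLM-synthesized outer summary.
    let text = "resolved llm outer streszczenie naprawy slicera";
    let baseline = top_keywords(text, 3);
    for _ in 0..50 {
        // Byte-equal on every run, mirroring the new regression test.
        assert_eq!(top_keywords(text, 3), baseline);
    }
    assert_eq!(baseline, vec!["llm", "naprawy", "outer"]);
}
```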
…tadata schema

`HybridSearcher::dedup_by_content_hash` was lying about its semantics
since the schema-v4 split (spec `2026-04-27_kb-transcripts-onion-slicer-fix-spec.md`,
P0). Indexing now writes per-chunk SHA256 under `metadata.chunk_hash`
and per-source SHA256 under `metadata.source_hash` (plus a deprecated
`file_hash` alias). The legacy `metadata.content_hash` field is no
longer written by either the pipeline path (`pipeline.rs::slices_to_chunks`)
or the engine paths (`rag::index_with_onion_slicing_and_hash` etc.).

Effect of the silent drift:

- For schema-v4 chunks: `metadata.content_hash` is absent, every match
  fell into the "keep" branch, and the function deduplicated nothing.
  Same chunk surfaced through both vector and BM25 lanes survived
  twice in the result list.
- For pre-v4 chunks (no backfill yet): `metadata.content_hash` held
  the *source* document hash for every onion layer. Reading it as a
  per-chunk key collapsed outer/middle/inner/core into a single
  result, masking the very onion structure operators rebuild for.

Compression:

- Rename to `dedup_by_chunk_hash`; key strictly off `metadata.chunk_hash`.
- Drop the legacy `content_hash` read so we cannot accidentally
  collapse layers from pre-v4 namespaces.
- Update the doc comment to describe the v4 contract and explain why
  legacy chunks are passed through.
- Lock the contract with two new tests:
  - `dedup_by_chunk_hash_collapses_v4_chunk_duplicates_only` — same
    chunk surfaced twice collapses, distinct chunks survive, missing
    field is passed through.
  - `dedup_by_chunk_hash_ignores_legacy_content_hash_field` — pre-v4
    chunks sharing the legacy source-shaped `content_hash` must NOT
    collapse, otherwise onion structure is silently flattened at
    search time.

`cargo test -p rust-memex --lib`: 296 passed (was 294, +2 above).
`cargo test --workspace --tests`: 359 passed total, 0 failed.
`cargo clippy --workspace --all-targets -- -D warnings`: clean.

Authored-By: claude <agents@vetcoders.io>
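The v4 contract described above can be sketched as follows: key strictly off `metadata.chunk_hash`, and pass legacy pre-v4 rows (which lack it) through untouched so onion layers sharing a source-shaped `content_hash` never collapse at search time. The `Hit` type is a simplified stand-in for the search-result shape:

```rust
use std::collections::HashSet;

struct Hit {
    id: String,
    chunk_hash: Option<String>, // metadata.chunk_hash; absent on pre-v4 rows
}

fn dedup_by_chunk_hash(hits: Vec<Hit>) -> Vec<Hit> {
    let mut seen = HashSet::new();
    hits.into_iter()
        .filter(|h| match &h.chunk_hash {
            // v4 chunk: keep only the first occurrence of each chunk hash,
            // collapsing the same chunk surfaced by both search lanes.
            Some(hash) => seen.insert(hash.clone()),
            // Legacy chunk: no per-chunk key, always pass through.
            None => true,
        })
        .collect()
}

fn main() {
    let hits = vec![
        Hit { id: "vec-lane".into(), chunk_hash: Some("abc".into()) },
        Hit { id: "bm25-lane".into(), chunk_hash: Some("abc".into()) },
        Hit { id: "outer-legacy".into(), chunk_hash: None },
        Hit { id: "core-legacy".into(), chunk_hash: None },
    ];
    let kept = dedup_by_chunk_hash(hits);
    let ids: Vec<&str> = kept.iter().map(|h| h.id.as_str()).collect();
    // The duplicate v4 chunk collapses; both legacy layers survive.
    assert_eq!(ids, vec!["vec-lane", "outer-legacy", "core-legacy"]);
}
```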
- Capture local cargo install bump to Cargo.lock and Cargo.toml as
  baseline before introducing aicx-parser path dependency.

Authored-By: codex <agents@vetcoders.io>
…te source_hash

Authored-By: codex <agents@vetcoders.io>
- Add ChunkProvider implementations for aicx, onion, and flat chunking.
- Route index/reindex/reprocess through explicit --chunker overrides or
  transcript-aware defaults.
- Wire aicx-parser as a workspace path dependency and preserve onion/flat
  behavior for existing document flows.

Authored-By: codex <agents@vetcoders.io>
Combines:
- Track B: aicx-parser ChunkProvider integration (Phase 6)
- Track C: storage layer fixes P0/P4/P5a (Phase 7)

Authored-By: codex <agents@vetcoders.io>
End-state of aicx-parser extraction:
- aicx-parser v0.1.0 as workspace path dep
- ChunkProvider trait with aicx/onion/flat implementations
- CLI flag --chunker on index/reindex/reprocess
- Storage fixes: per-chunk content_hash, source-hash dedup pre-index, max_chunk_tokens=35000

Authored-By: codex <agents@vetcoders.io>
Szowesgad and others added 13 commits April 28, 2026 23:23
The post-rebrand commit (e70b19c) renamed the binary to rust-memex and
set COMPAT_ALIASES=("rust_memex") at the top of install.sh, but the
post-install info line was inverted to advertise rmcp_memex instead.
That path never gets created — install_compat_aliases only symlinks the
names listed in COMPAT_ALIASES — so users following the printed hint
hit a missing file.

- Restore the info message to point at $INSTALL_DIR/rust_memex so it
  matches the alias actually written by install_compat_aliases.

Authored-By: claude <agents@vetcoders.io>
… deny_unknown

Salvage commit for 4 codex cuts dispatched concurrently against silent-failure
incident from 2026-04-21. Codex sandbox blocked `.git/index.lock` write so the
operator (synthesis-brain Klaudiusz) commits the work codex authored. All gates
verified green by codex (cargo fmt --check, cargo check --workspace, cargo
clippy --workspace --all-targets -- -D warnings, focused integration tests).

memex-001 migrate-schema (run_id owne-182100-*):
- SchemaVersion enum + required_columns_for(target) helper.
- StorageManager::migrate_lance_schema (LanceDB Table::add_columns).
- New CLI subcommand: rust-memex migrate-schema --db-path <p> [--check-only].
- --check-only exits 1 when migration needed (CI mode).
- Idempotent re-run.
- tests/migrate_schema.rs covers pre-v4 missing source_hash, check-only fail,
  migration success, backfill-hashes --dry-run false post-migration, idempotent.

memex-002 HTTP error propagation (run_id owne-182102-*):
- Typed AppendError::SchemaMismatch at Lance write boundary in rag/mod.rs.
- HTTP /upsert and /index return 412 Precondition Failed with structured body
  (error_kind, missing_columns, remediation).
- ERROR-level structured logging with run-this-command remediation hint.
- Write-path schema preflight before embedding (fail-fast on pre-v4 table).
- tests/http_schema_mismatch.rs regression for pre-v4 upsert.

memex-003 CLI strict + JSON summary (run_id owne-182104-*):
- New shared module bin/cli/batch_policy.rs aggregating per-file outcomes.
- --strict: exit 1 if any failure.
- --max-failure-rate <FLOAT>: exit 1 if rate exceeded.
- --json: compact final summary {indexed, failed, total, failure_rate, errors}.
- Default behavior preserved (exit 0) but emits WARNING when failures > 0.
- Applied to: index, reprocess, reindex, backfill-hashes.
- tests/cli_index_strict.rs covers strict/json/threshold/default-warning modes.

memex-004 TOML deny_unknown_fields + WARN on default db_path (run_id owne-182105-*):
- #[serde(deny_unknown_fields)] on EmbeddingsConfig and DaemonConfig sections.
- Shared resolve_db_path() helper used by both CLI dispatch and daemon path.
- WARN at startup when configured db_path not found at top level (with hint
  about [embeddings] section quirk that caused 11-day silent failures incident).
- Explicit max_batch_chars / max_batch_items fields.
- tests/config_deny_unknown.rs 3 acceptance tests for unknown-field, default-
  warning, and missing-dir scenarios.

Concurrent execution discipline observed:
- All 4 cuts ran in parallel against the same working tree.
- Codex agents adapted to in-flight edits (e.g. memex-002 noted "tree already
  contained uncommitted migrate-schema and batch-policy work, I adapted").
- memex-004 retried clippy after concurrent-edits window.

Source plans:
- /Users/polyversai/Libraxis/vc-runtime/vc-context-engine/cuts/memex-001-migrate-schema.md
- /Users/polyversai/Libraxis/vc-runtime/vc-context-engine/cuts/memex-002-http-error-propagation.md
- /Users/polyversai/Libraxis/vc-runtime/vc-context-engine/cuts/memex-003-cli-strict-flag.md
- /Users/polyversai/Libraxis/vc-runtime/vc-context-engine/cuts/memex-004-toml-deny-unknown.md

Authored-By: codex <agents@vetcoders.io>
…ess spinner

Makefile: replace `cp` with `cargo install --path` so macOS doesn't
quarantine the binary and cargo tracks the installation.

backfill-hashes: add 10s interval progress reporting with braille
spinner, rate, and ETA instead of silent multi-hour runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
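The interval progress report described above boils down to a rate and ETA computed from processed/total and elapsed time. A hedged sketch (the `progress_line` name, format, and braille frames are illustrative, not the exact implementation):

```rust
use std::time::Duration;

// Braille spinner frames cycled once per report interval.
const SPINNER: [char; 8] = ['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧'];

fn progress_line(tick: usize, processed: u64, total: u64, elapsed: Duration) -> String {
    // Rate from work done so far; guard against a zero elapsed duration.
    let rate = processed as f64 / elapsed.as_secs_f64().max(1e-9);
    let remaining = total.saturating_sub(processed);
    let eta_secs = if rate > 0.0 { remaining as f64 / rate } else { f64::INFINITY };
    format!(
        "{} backfill {}/{} ({:.0} docs/s, ETA {:.0}s)",
        SPINNER[tick % SPINNER.len()],
        processed,
        total,
        rate,
        eta_secs
    )
}

fn main() {
    // 500 of 2000 docs in 10s -> 50 docs/s, 1500 remaining -> ETA 30s.
    let line = progress_line(0, 500, 2000, Duration::from_secs(10));
    assert_eq!(line, "⠋ backfill 500/2000 (50 docs/s, ETA 30s)");
}
```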
BM25 index_path was hardcoded to ~/.rmcp-servers/rust-memex/bm25 via
BM25Config::default(), while Lance db_path was overridden by --db-path
CLI flag (e.g. rmcp-memex/lancedb). This caused hybrid search to read
vector results from one directory and BM25 results from another.

Fix: derive BM25 path as sibling of db_path in both ServerConfig
(daemon) and CLI search commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
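Deriving the BM25 path as a sibling of db_path can be sketched as below; the helper name matches the `bm25_path_from_db` helper this PR adds later, but the body here is an assumption about its shape:

```rust
use std::path::{Path, PathBuf};

fn bm25_path_from_db(db_path: &Path) -> PathBuf {
    // Sibling directory named "bm25" next to the Lance directory, so
    // --db-path rmcp-memex/lancedb yields rmcp-memex/bm25 and both lanes
    // of hybrid search read from the same tree.
    db_path
        .parent()
        .map(|parent| parent.join("bm25"))
        .unwrap_or_else(|| PathBuf::from("bm25"))
}

fn main() {
    assert_eq!(
        bm25_path_from_db(Path::new("rmcp-memex/lancedb")),
        PathBuf::from("rmcp-memex/bm25")
    );
}
```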
Route new Lance writes into deterministic namespace-specific tables while preserving legacy mcp_documents reads during migration. Aggregate stats, namespace listing, reads, search, deletes, and maintenance across both storage shapes.

Add keyword search fallback when the BM25 sidecar is empty so populated Lance namespaces do not report falsely empty keyword results. Cover both cuts with focused tests.
Root cause: test file was 43 chars, below the 50-char minimum document
threshold in index_document_with_json_awareness. Combined with
ChunkerKind::Onion overriding --slice-mode flat, the document was
silently skipped instead of hitting the embedding server.

Fix: larger test file (>50 chars) + explicit --chunker flat to prevent
the Onion chunker from overriding the requested flat slice mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keep the production embedding batch retry defaults intact, but allow the retry budget to be lowered with RUST_MEMEX_EMBED_BATCH_MAX_RETRIES and RUST_MEMEX_EMBED_BATCH_MAX_BACKOFF_SECS. The cli_index_strict failure-policy test now sets the retry budget to one attempt so cargo test exercises the intended error path without waiting through the full production backoff ladder.
Introduce a new HTTP context-pack feature and related improvements:
- Add crates/rust-memex/src/http/context_pack.rs implementing /api/context-pack (builds markdown context packs, groups evidence into clusters, reports duplicate_count and sources).
- Switch CLI recall to use HybridSearcher (hybrid lexical/vector ranking) with BM25 config and add a bm25_path_from_db helper.
- Pipeline/maintenance fixes: pass the preprocessing config into pipeline mode; treat zero-chunk indexed files as failures (update tracking/checkpointing logic accordingly).
- Diagnostic/test tweaks: tighten path handling (expect UTF-8 temp paths) and small formatting fixes.
- Bump the workspace version to 0.6.4, update the README to document the new endpoint and response changes, and refresh dependencies in Cargo.lock.
…crates.io

Adds explicit `version = "0.6.4"` next to the path dep on
memex-contracts in crates/rust-memex/Cargo.toml. Required so cargo
publish accepts the manifest (path-only deps without version are rejected
at upload).

Both crates now have a real crates.io presence as of this turn:
- memex-contracts 0.6.4 published 2026-05-06 (first crates.io release)
- rust-memex 0.6.4 published 2026-05-06 (first crates.io release under
  renamed identity; predecessor crate `rmcp-memex` last shipped 0.5.0
  in April)

Local dev continues to use the path; published consumers resolve
memex-contracts 0.6.4 from crates.io.

Authored-By: claude <agents@vetcoders.io>
// Atomic wrapper: `validated` comes from validate_read_path(), which
// canonicalizes the path and enforces the allowed-base policy.
// nosemgrep: rust.actix.path-traversal.tainted-path.tainted-path
let file = tokio::fs::File::open(&validated)
Szowesgad added 2 commits May 8, 2026 09:06
…sh rust-memex 0.6.5

Republishes rust-memex with the correct aicx-parser ^0.2 dep, fixing
the dep-graph split that aicx 0.6.5 inherited from rust-memex 0.6.4
(which was published with aicx-parser 0.1.0 because the 0.2 bump
happened in a later step of the same session).

Changes:
- Cargo.toml workspace.package.version: 0.6.4 → 0.6.5.
- Cargo.toml workspace.dependencies and other tree refresh: aicx-parser
  bumped to "0.2", lancedb to "0.27", arrow to "57" (operator-side
  alignment with the new aicx-parser/lancedb major).
- crates/rust-memex/Cargo.toml memex-contracts dep version:
  0.6.4 → 0.6.5 (matches workspace bump cascade).
- Cargo.lock regenerated for the new dep tree.

Source edits in rag/mod.rs, storage/mod.rs, host_detection.rs and
tests/* are operator-side WIP (tests/common/ untracked) and intentionally
NOT included in this commit. They are part of the published rust-memex
0.6.5 tarball (cargo publish packaged the working tree) but stay as
Living Tree changes for operator to commit on their own cadence.

Verification:
- cargo publish --dry-run --allow-dirty rust-memex: full workspace
  compile in 1m 10s against aicx-parser 0.2.0 + memex-contracts 0.6.5.
- crates.io API confirms rust-memex 0.6.5 default_version with
  aicx-parser ^0.2 transitive.

Authored-By: claude <agents@vetcoders.io>
Update crate manifests and tooling, and adapt code and tests for the upgraded dependencies. Regenerate Cargo.lock, adjust Cargo.toml and the Makefile, add a tests/common helper module, and modify the rust-memex modules (rag, storage, host_detection) and related tests (e2e_pipeline, transport_parity) to accommodate API and behavior changes from the dependency upgrades.