feat: incremental Meilisearch indexing with HF Hub ID tracker (#759)
* feat: incremental Meilisearch indexing with HF Hub ID tracker
- Add `--incremental` flag to `populate-search-engine` command: only
embeds new/changed chunks and deletes stale ones, uploading directly
to the main index (no temp-index swap needed).
- Introduce `src/doc_builder/embeddings_tracker.py` to persist the set
of indexed document IDs as JSON in the HF Hub dataset
`hf-doc-build/doc-builder-embeddings-tracker`.
- Add `migrations/export_meili_ids_to_hf.py` as a one-time bootstrap
script to seed the tracker from the existing Meilisearch index.
- Add `get_all_document_ids` and `delete_documents_from_db` helpers to
`meilisearch_helper.py`.
- Replace all hardcoded `https://edge.meilisearch.com` references with a
`--meilisearch_url` CLI argument (mirrors the existing `--meilisearch_key`
pattern) across all commands and migration scripts.
- Update the CI workflow to use `--incremental` and supply the new
`MEILISEARCH_URL` and `HF_TOKEN` secrets.
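
The incremental step described above can be sketched as a set comparison between the IDs stored in the tracker and the IDs of the freshly built chunks. This is only an illustration, not the code from this PR: `plan_incremental_update` and `IncrementalPlan` are hypothetical names, and it assumes a changed chunk receives a new document ID (for example, one derived from its content), so that "changed" shows up as one stale ID plus one new ID.

```python
from typing import Iterable, NamedTuple, Set


class IncrementalPlan(NamedTuple):
    """Hypothetical container for the outcome of comparing two ID sets."""
    to_embed: Set[str]   # IDs in the current build that the tracker has not seen
    to_delete: Set[str]  # IDs the tracker knows that are absent from the current build
    unchanged: Set[str]  # IDs present in both; no work needed


def plan_incremental_update(
    tracked_ids: Iterable[str], current_ids: Iterable[str]
) -> IncrementalPlan:
    """Compare previously indexed IDs with the IDs of the current chunks.

    Assumption: a modified chunk gets a different ID, so it appears in
    `to_embed` under its new ID and in `to_delete` under its old one.
    """
    tracked = set(tracked_ids)
    current = set(current_ids)
    return IncrementalPlan(
        to_embed=current - tracked,
        to_delete=tracked - current,
        unchanged=current & tracked,
    )


# Example with made-up IDs: "a1" went stale, "d4" is new.
plan = plan_incremental_update(
    tracked_ids=["a1", "b2", "c3"],
    current_ids=["b2", "c3", "d4"],
)
print(sorted(plan.to_embed))
print(sorted(plan.to_delete))
print(sorted(plan.unchanged))
```

Under these assumptions, only the `to_embed` chunks would need embedding and uploading, the `to_delete` IDs would be removed from the main index, and the tracker would then be rewritten with the current ID set.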
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* check
* Apply suggestion from @mishig25
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>