Version 0.3.0 released #167
xhluca announced in Announcements

Breaking Changes

- `scipy` has been removed from `install_requires` in `setup.py`. The library now uses a pure NumPy-based CSC matrix builder by default. If you need scipy's CSC builder, install it separately and pass `csc_backend="scipy"` to `BM25()`, or install via `pip install bm25s[indexing]`.
- `from bm25s import selection` is now `from bm25s import selection as selection_np` internally. If you were importing `selection` directly from `bm25s`, update your imports.

New Features

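For the `scipy` removal described under Breaking Changes, code that still wants scipy's CSC builder when it is installed could choose the backend at runtime. The sketch below is only an illustration: `pick_csc_backend` is a hypothetical helper, not part of bm25s, and the commented-out `BM25(csc_backend=...)` call relies on the parameter exactly as the notes describe it.

```python
import importlib.util


def pick_csc_backend(prefer_scipy=True):
    """Return "scipy" only when scipy is importable, otherwise "numpy".

    Hypothetical helper for illustration; it is not provided by bm25s.
    """
    scipy_installed = importlib.util.find_spec("scipy") is not None
    return "scipy" if (prefer_scipy and scipy_installed) else "numpy"


backend = pick_csc_backend()
print(backend)

# Assumed usage, following the release notes (needs bm25s >= 0.3.0):
#   import bm25s
#   retriever = bm25s.BM25(csc_backend=backend)
```

The notes also list an `"auto"` value for `csc_backend`, which may make a helper like this unnecessary in practice.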
High-Level API (`bm25s.high_level`)

A new simplified 1-line indexing and 1-line search API:

- `bm25.load()` supports CSV, JSON, JSONL, and TXT files with automatic format detection.
- `bm25.index()` handles tokenization (with stemming + stopword removal) and indexing in one call.
- `BM25Search.search()` returns ranked results with document text, scores, and IDs.

Command-Line Interface (`bm25` CLI)

A new terminal CLI via the `bm25` console script entry point:

- `bm25 index <file>`: Index documents from CSV, TXT, JSON, or JSONL files. Use `-o` to specify the output directory, `-c` to specify the text column, and `-u` to save to the user directory (`~/.bm25s/indices/`).
- `bm25 search -i <index> "query"`: Search an existing index. Use `-k` for top-k, `-s` to save results as JSON, and `-u` for the user directory with an interactive index picker.
- `pip install bm25s[cli]` adds a Rich-based UI; without it, the CLI falls back to plain text.

MCP Server (`bm25s.mcp`)

A built-in Model Context Protocol server to expose BM25 indices as tools for LLMs:

- `bm25 mcp launch --index-dir <path>`: Launch an MCP server with `retrieve` and `get_info` tools.
- Install with `pip install bm25s[mcp]`.

Numba Compilation & Auto-Compile

- `compile()` method on `BM25` for explicit JIT compilation of both the scorer and CSC builder.
- `auto_compile=True` parameter on `BM25.__init__()`: automatically compiles Numba JIT functions on initialization.
- `warmup_numba_scorer()` and `warmup_numba_csc()` methods to pre-trigger JIT compilation with dummy data.
- `activate_numba_csc()` method: applies Numba JIT to the CSC matrix builder for faster indexing.

Pure NumPy CSC Matrix Construction

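To illustrate the warm-up idea from the Numba Compilation & Auto-Compile notes above: a JIT-compiled function is compiled on its first call, so calling it once with tiny dummy input moves that cost to start-up. The sketch below shows the general pattern only. `sum_of_squares` and `warmup` are hypothetical names, and this is not the code behind `warmup_numba_scorer()` or `warmup_numba_csc()`. It also falls back to plain Python when Numba is not installed.

```python
try:
    from numba import njit  # optional dependency
    NUMBA_AVAILABLE = True
except ImportError:
    NUMBA_AVAILABLE = False

    def njit(func):
        """Fallback decorator: leave the function as plain Python."""
        return func


@njit
def sum_of_squares(n):
    # Simple numeric loop, the kind of code Numba compiles well.
    total = 0
    for i in range(n):
        total += i * i
    return total


def warmup():
    """Call the function once on dummy input so any JIT compilation happens now."""
    sum_of_squares(1)


warmup()                    # compilation cost (if any) is paid here
print(sum_of_squares(10))   # 0 + 1 + 4 + ... + 81 = 285
```

Later calls with the same argument types reuse the compiled version, which is presumably why the release offers both explicit warm-up methods and an `auto_compile=True` option.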
- `csc_backend` parameter on `BM25()`: choose `"numpy"` (default), `"scipy"`, or `"auto"`.
- `_np_csc_python()`: Pure NumPy implementation using packed-index argsort.
- `_np_csc_jit_ready()`: Numba-compilable implementation using counting sort (linear time).

Parameter Overrides on Load

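As a concrete illustration of the two CSC builders listed just above, here is a plain-Python sketch of both ideas: a counting sort over column indices, and a single sort over packed (column, row) keys. It assumes the input is a list of COO triplets and reflects one plausible reading of "packed-index argsort". The function names are hypothetical, and this is not the library's `_np_csc_jit_ready()` or `_np_csc_python()` code.

```python
def _column_offsets(cols, n_cols):
    """Count entries per column, then prefix-sum the counts into start offsets."""
    indptr = [0] * (n_cols + 1)
    for c in cols:
        indptr[c + 1] += 1
    for j in range(n_cols):
        indptr[j + 1] += indptr[j]
    return indptr


def coo_to_csc_counting_sort(rows, cols, vals, n_cols):
    """Build CSC arrays (data, indices, indptr) in O(nnz + n_cols) time."""
    indptr = _column_offsets(cols, n_cols)
    next_slot = indptr[:-1]  # slicing copies, so indptr itself is untouched
    data = [0.0] * len(vals)
    indices = [0] * len(vals)
    # Scatter every entry into the next free slot of its column.
    for r, c, v in zip(rows, cols, vals):
        pos = next_slot[c]
        data[pos] = v
        indices[pos] = r
        next_slot[c] += 1
    return data, indices, indptr


def coo_to_csc_packed_sort(rows, cols, vals, n_rows, n_cols):
    """Build CSC arrays by sorting on one packed integer key per entry."""
    # Packing (col, row) into col * n_rows + row orders by column, then row.
    order = sorted(range(len(vals)), key=lambda i: cols[i] * n_rows + rows[i])
    data = [vals[i] for i in order]
    indices = [rows[i] for i in order]
    return data, indices, _column_offsets(cols, n_cols)


# A 3x3 matrix with four non-zero entries, given as COO triplets.
rows = [0, 2, 1, 0]
cols = [2, 0, 2, 1]
vals = [1.0, 2.0, 3.0, 4.0]

data, indices, indptr = coo_to_csc_counting_sort(rows, cols, vals, n_cols=3)
print(data)     # [2.0, 4.0, 1.0, 3.0]
print(indices)  # [2, 0, 0, 1]
print(indptr)   # [0, 1, 2, 4]
```

The counting-sort version makes two linear passes and needs no comparison sort, which matches the "linear time" note above. The two functions agree on this input because each column's entries already appear in row order. In general, the counting sort keeps input order within a column, while the packed sort orders by row.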
- `BM25.load()` now accepts `override_params={}` and `**kwargs` to override saved parameters at load time (e.g., change `auto_compile`, `backend`, etc.).

Improvements

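For the load-time overrides described under Parameter Overrides on Load above, the underlying idea is a layered merge: start from the saved parameters, then apply the overrides. The sketch below is a guess at that behaviour, not the library's code. `merge_params` is a hypothetical helper, the parameter values are illustrative, and the precedence (keyword arguments applied after `override_params`) is an assumption.

```python
def merge_params(saved, override_params=None, **kwargs):
    """Return saved parameters updated by override_params, then by kwargs.

    Hypothetical helper; the precedence order is an assumption.
    """
    merged = dict(saved)                   # never mutate the saved parameters
    merged.update(override_params or {})
    merged.update(kwargs)                  # assumed to win over override_params
    return merged


# Illustrative values standing in for what an index might have saved.
saved = {"k1": 1.5, "b": 0.75, "backend": "numpy", "auto_compile": False}

params = merge_params(saved, override_params={"backend": "numba"}, auto_compile=True)
print(params)
# {'k1': 1.5, 'b': 0.75, 'backend': 'numba', 'auto_compile': True}

# Assumed usage with the real API, following the release notes:
#   import bm25s
#   retriever = bm25s.BM25.load("index_dir", override_params={"auto_compile": True})
```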
- `_faketqdm` fix: The fallback tqdm replacement now properly handles being called with no positional arguments (returns `None` instead of raising).
- All modules (`__init__`, `scoring`, `tokenization`, `hf`, `beir`, `corpus`) now respect the `DISABLE_TQDM` environment variable uniformly.
- `_compute_relevance_from_scores` now wraps `dtype` with `np.dtype()` for compatibility with Numba JIT.
- Replaced `selection_jit is None` checks with a proper `NUMBA_AVAILABLE` boolean flag.
- `activate_numba_scorer()` now respects the `NUMBA_DISABLE_JIT` environment variable.

New Install Extras

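The two tqdm-related improvements above can be pictured with a small sketch: a stand-in that tolerates being called with no arguments, and a wrapper that checks `DISABLE_TQDM` before using the real progress bar. `fake_tqdm` and `progress` are hypothetical names, not the library's `_faketqdm`. Which values of `DISABLE_TQDM` count as "disabled" is an assumption here.

```python
import os


def fake_tqdm(iterable=None, *args, **kwargs):
    """Stand-in for tqdm: pass the iterable through, or return None if absent."""
    return iterable


def progress(iterable, **kwargs):
    """Wrap iterable in tqdm unless disabled or unavailable."""
    # Assumption: any of these values switches progress bars off.
    disabled = os.environ.get("DISABLE_TQDM", "").lower() in ("1", "true", "yes")
    if disabled:
        return fake_tqdm(iterable)
    try:
        from tqdm.auto import tqdm  # optional dependency
    except ImportError:
        return fake_tqdm(iterable)
    return tqdm(iterable, **kwargs)


os.environ["DISABLE_TQDM"] = "1"
items = list(progress(range(3)))
print(items)        # [0, 1, 2]
print(fake_tqdm())  # None, rather than a TypeError for the missing argument
```

Routing every progress bar through one wrapper like this is one way to make a single environment variable apply uniformly across modules.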
- `mcp`: installs `mcp`
- `cli`: installs `rich`
- `indexing`: installs `scipy`

CI/CD

- Added `test-numba` and `test-high-level` CI jobs with proper thread-safety env vars (`OMP_NUM_THREADS=1`, etc.).
- Tests run with `coverage` and report percentage.
- CI runs on `dev*` branches in addition to `main`.
- Added `claude.yml` (Claude Code GitHub Action for issue/PR interaction) and `claude-code-review.yml` (automated PR code review).

New Test Coverage

- `tests/core/test_core_coverage.py`: 447 lines of comprehensive core module tests.
- `tests/core/test_corpus.py`, `test_hf_utils.py`, `test_init_utils.py`, `test_json_functions.py`, `test_scoring.py`, `test_selection.py`, `test_tokenization_extended.py`: Extended unit tests for core modules.
- `tests/high_level/test_high_level.py`: 121 lines testing the high-level API.
- `tests/high_level/test_terminal.py`: 647 lines testing the CLI terminal commands.
- Test data: `tests/data/dummy.csv`, `dummy.jsonl`, `dummy.txt`.

New Examples

- `examples/mcp/create_index.py`: Create a test index for the MCP server.
- `examples/mcp/verify_server.py`: Verify MCP server functionality.
- `examples/simple_load.py`: Demonstrate the high-level load/index/search workflow.

Documentation

Stats