OgbujiPT 0.10.0 Phase 1: Foundation & Rearchitecture #93
Major breaking changes for the transformation into a focused LLMOps knowledge bank library.

## Removed (907 lines)
- Entire prompting module (basic.py, model_style.py) - obsolete with modern chat templates
- word_loom.py - TOML template system no longer needed
- Prompting test suite (4 test files)
- 3 demo files showcasing removed features

## Reorganized Module Structure
- `pylib/embedding/` → `pylib/store/postgres/` (pgvector implementations)
- `pylib/embedding/qdrant.py` → `pylib/store/qdrant/collection.py`
- `pylib/memoization/pgmemo.py` → `pylib/store/postgres/pgmemo.py`
- `pylib/llm_wrapper.py` → `pylib/llm/wrapper.py`
- `pylib/text_helper.py` → `pylib/text/splitter.py`
- `pylib/html_helper.py` → `pylib/text/html.py`
- `test/embedding/` → `test/store/`

## New Module Structure
Created directory structure for 0.10.0 features:
- `pylib/memory/` - KB abstractions & unified API
- `pylib/store/` - Storage backends (organized by backend type)
- `pylib/retrieval/` - Retrieval strategies
- `pylib/ingestion/` - Data pipelines
- `pylib/maintenance/` - KB health & pruning
- `pylib/observability/` - Logging, tracing, metrics
- `pylib/mcp/` - Model Context Protocol

## New Foundation
Created base KB abstractions (memory/base.py, memory/metadata.py):
- Protocol-based interfaces (PEP 544) for flexibility
- SearchResult, KBBackend, SearchStrategy protocols
- ItemMetadata with RBAC support
- Metadata filter builders (functional approach)

## Dependencies
Added to pyproject.toml:
- onya (GraphRAG)
- chonkie (document chunking)
- rank-bm25 (sparse retrieval)
- structlog (structured logging)
- tenacity (retry logic)
- httpx (async HTTP)
- mcp (Model Context Protocol)

## Import Updates
Updated 400+ import statements across:
- pylib/ modules (cross-references)
- demo/ files (6 files)
- test/ files (5 files + fixtures)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
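For readers unfamiliar with the "Protocol-based interfaces (PEP 544)" and "functional" metadata filter builders named in the commit above, here is a minimal sketch of the general pattern. Only the names `SearchResult` and `SearchStrategy` come from the commit message; the fields, the synchronous `search` signature, and the helpers `match_exact`, `all_of` and `KeywordSearch` are assumptions made for illustration and are not taken from OgbujiPT's code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Protocol, runtime_checkable


@dataclass
class SearchResult:
    # Field names are illustrative assumptions, not OgbujiPT's actual definition
    content: str
    score: float
    metadata: dict[str, Any] = field(default_factory=dict)


@runtime_checkable
class SearchStrategy(Protocol):
    # Hypothetical signature; the real protocol may be async and take other parameters
    def search(self, query: str, limit: int = 5) -> list[SearchResult]: ...


# Functional-style filters: each builder returns a predicate over a metadata dict
MetaFilter = Callable[[dict[str, Any]], bool]


def match_exact(key: str, value: Any) -> MetaFilter:
    return lambda meta: meta.get(key) == value


def all_of(*filters: MetaFilter) -> MetaFilter:
    return lambda meta: all(f(meta) for f in filters)


class KeywordSearch:
    '''Toy strategy. It satisfies SearchStrategy structurally, without inheriting from it'''

    def __init__(self, docs: list[tuple[str, dict[str, Any]]],
                 meta_filter: MetaFilter = lambda meta: True):
        self.docs = docs
        self.meta_filter = meta_filter

    def search(self, query: str, limit: int = 5) -> list[SearchResult]:
        terms = query.lower().split()
        results = []
        for text, meta in self.docs:
            if not self.meta_filter(meta):
                continue
            words = set(text.lower().split())
            # Score is simply the count of query terms present in the document
            score = float(sum(1 for t in terms if t in words))
            if score > 0:
                results.append(SearchResult(text, score, meta))
        results.sort(key=lambda r: r.score, reverse=True)
        return results[:limit]
```

The point of the structural approach is that a backend or strategy only needs to provide the right methods; nothing has to subclass a library base class, which keeps third-party and user-defined implementations interchangeable.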
- Remove circular imports in pgvector.py (DataDB, MessageDB now imported directly)
- Update test fixture paths: test/embedding → test/store
- Update qdrant test mock path to new module structure
- All 11 non-database tests passing

Database tests (15) skip gracefully when Postgres/Qdrant unavailable.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…ood starting point.
Implement InMemoryDataDB and InMemoryMessageDB as drop-in replacements for PostgreSQL-backed stores. Tests now run instantly without external dependencies, following the "own your I/O boundaries" principle.

- In-memory stores available to users for prototyping/embedded use
- All 14 pgvector tests pass without PostgreSQL (0.52s vs multi-second setup)
- Integration test markers added for optional PostgreSQL testing
- Abstracted terminology (setup/cleanup vs create_table/drop_table)

Helps fix CI test failures requiring PostgreSQL setup.
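To make the "own your I/O boundaries" idea in the commit above concrete, here is a rough sketch of a store interface with an in-memory implementation. The `setup`/`cleanup` terminology comes from the commit message; everything else (the `VectorStore` protocol, `TinyMemoryStore`, `nearest_text`, and the synchronous method signatures) is hypothetical and does not reflect the actual `InMemoryDataDB` API.

```python
import math
from typing import Optional, Protocol, Sequence

Vector = Sequence[float]


class VectorStore(Protocol):
    # Hypothetical boundary owned by the application; a PostgreSQL-backed class
    # and an in-memory class would both provide these methods
    def setup(self) -> None: ...
    def cleanup(self) -> None: ...
    def insert(self, content: str, embedding: Vector) -> None: ...
    def search(self, embedding: Vector, limit: int = 3) -> list[tuple[str, float]]: ...


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyMemoryStore:
    '''Illustrative in-memory store; brute-force cosine similarity over a list'''

    def __init__(self) -> None:
        self._rows: list[tuple[str, tuple[float, ...]]] = []

    def setup(self) -> None:
        self._rows = []

    def cleanup(self) -> None:
        self._rows.clear()

    def insert(self, content: str, embedding: Vector) -> None:
        self._rows.append((content, tuple(embedding)))

    def search(self, embedding: Vector, limit: int = 3) -> list[tuple[str, float]]:
        scored = [(content, cosine(embedding, emb)) for content, emb in self._rows]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]


def nearest_text(store: VectorStore, embedding: Vector) -> Optional[str]:
    # Code under test depends only on the protocol, so no database mock is needed
    hits = store.search(embedding, limit=1)
    return hits[0][0] if hits else None
```

A test can then pass `TinyMemoryStore()` wherever the production code expects a store, instead of patching a database driver.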
Ended up on a bit of an extended detour today. CI tests were failing because they required a running PostgreSQL server with the pgvector extension: 14 tests were blocked by this dependency, which would have required complex CI setup (DB services, connection management, etc.). I could dip into the well of more and more mocking, but this can become brittle and unwieldy, especially when trying to mock database operations directly. I always like the reasoning from the Hynek article "'Don’t Mock What You Don’t Own' in 5 Minutes". The better approach is "own your I/O boundaries": create your own abstraction and provide alternative implementations rather than always mocking third-party libraries.

## Solution

I'd always wanted to have production-ready in-memory vector stores as a lightweight option for end users in prototyping, demos, and embedded use. Might as well get to that, with the dual use of enabling fast, dependency-free tests of the OgbujiPT store components.

## Changes

1. In-Memory Vector Store Implementation (
Introduce two new demo scripts: `chat_with_memory.py` for simulating conversations with in-memory message storage and semantic search, and `simple_search_demo.py` for demonstrating vector search capabilities without database setup. Additionally, a README.md file is added to provide an overview of the in-memory vector store demos, including prerequisites and usage patterns.

- `chat_with_memory.py`: Simulates a conversation, showcases message storage, retrieval, and semantic search.
- `simple_search_demo.py`: Demonstrates basic vector search with filtering and metadata.
- `README.md`: Overview of demos, installation instructions, and comparison with PostgreSQL-based solutions.

These additions enhance the usability and accessibility of the in-memory vector store for prototyping and learning purposes.
First major step toward transforming OgbujiPT into a general-purpose LLMOps knowledge bank system, as outlined in discussion #92. This phase focuses on establishing a solid foundation through code reorganization and introducing core retrieval capabilities.
For a quick intro to the new changes, a good start is the `demo/pg-hybrid` dir.

## Major Changes

### Code Reorg
- `llm_wrapper.py` → `llm/wrapper.py`
- `embedding/` → `store/postgres/` (pgvector modules)
- `embedding/qdrant.py` → `store/qdrant/collection.py`
- `text_helper.py` → `text/splitter.py`
- `html_helper.py` → `text/html.py`
- `pylib/retrieval/` - Search strategies (dense, sparse, hybrid)
- `pylib/memory/` - Knowledge base interfaces and metadata
- `pylib/store/` - Storage backends (postgres, qdrant)
- `pylib/llm/` - LLM wrapper functionality
- `pylib/text/` - Text processing utilities

### New Retrieval Capabilities
- `retrieval/sparse.py`: BM25 implementation for keyword-based search
- `retrieval/dense.py`: Wrapper for embedding-based semantic search
- `retrieval/hybrid.py`: Reciprocal Rank Fusion (RRF) combining multiple strategies

### Memory/Knowledge Base Foundation
- `KBBackend` protocol (`memory/base.py`): Protocol-based interface for knowledge base backends (vector stores, graph DBs, etc.)
- `SearchStrategy` protocol: Interface for pluggable search strategies
- `SearchResult` dataclass: Unified result format across all backends
- `memory/metadata.py`: Foundation for enriched item metadata

### PostgreSQL Enhancements
- `store/postgres/pgvector_sparse.py`: BM25-compatible sparse vector storage using pgvector
- `demo/pg-hybrid/`: Complete examples showing dense + sparse retrieval

### Removed Deprecated Code
- `pylib/prompting/` - Old prompting utilities (replaced by direct LLM wrapper usage)
- `pylib/word_loom.py` - Deprecated prompt loading system
- `pylib/memoization/` - Moved to `store/postgres/pgmemo.py`

### Updated Imports
All imports across the codebase have been updated to reflect the new structure. This includes:
- `chat_web_selects.py`, `chat_doc_folder.py`, etc.

## Testing
- `demo/pg-hybrid/`

## Documentation & Examples
- `demo/pg-hybrid/README.md`: Comprehensive guide to hybrid search
- `demo/pg-hybrid/hybrid_search.ipynb`: Interactive Jupyter notebook tutorial
- `demo/pg-hybrid/chat_with_hybrid_kb.py`: Full conversational RAG example with hybrid search
- `demo/pg-hybrid/hybrid_search_demo.py`: Standalone hybrid search demonstration

## Migration Notes
For users upgrading:
Import paths have changed:
- `ogbujipt.llm_wrapper` → `ogbujipt.llm.wrapper`
- `ogbujipt.embedding.*` → `ogbujipt.store.postgres.*` or `ogbujipt.store.qdrant.*`
- `ogbujipt.text_helper` → `ogbujipt.text.splitter`
- `ogbujipt.html_helper` → `ogbujipt.text.html`

New capabilities available:
- `ogbujipt.retrieval.hybrid.HybridSearch` for combining dense + sparse search
- `ogbujipt.retrieval.sparse.BM25Search` for keyword-based retrieval
- `KBBackend` protocol for custom storage backends

## Next Steps (Future PRs)
This foundation enables future work on:
## Checklist