refactor: replace LightRag class-name checks with ManagedKnowledgeBackend protocol #6726
Open
willemcdejongh wants to merge 16 commits into willemcdejongh/knowledge-staging from
Extract ~3500-line Knowledge class into three internal components:
- ContentStore: content CRUD against the contents database
- ReaderRegistry: reader management, lazy loading, and selection
- IngestionPipeline: content loading from paths, URLs, text, and topics

All public API preserved through forwarding methods. Loader mixin compatibility maintained via forwarding internal methods. Tests updated to mock at component level where needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Convert the mixin-based loader pattern to composition. Each loader now receives a knowledge reference instead of relying on MRO callbacks. RemoteKnowledge is replaced by RemoteLoader which composes loader instances. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
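A rough sketch of the composition pattern described in this commit: instead of mixing loader methods into one class and relying on MRO callbacks, a dispatcher holds loader instances and hands each one an explicit reference. The class names `S3Loader` and `GcsLoader`, the scheme-based dispatch, and the return values are hypothetical and only illustrate the shape, not the PR's actual loaders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class Loader(Protocol):
    """Hypothetical minimal loader interface used only for this sketch."""

    def load(self, uri: str) -> List[str]: ...


@dataclass
class S3Loader:
    knowledge: object  # passed in explicitly instead of inherited via the MRO

    def load(self, uri: str) -> List[str]:
        return [f"s3 document from {uri}"]


@dataclass
class GcsLoader:
    knowledge: object

    def load(self, uri: str) -> List[str]:
        return [f"gcs document from {uri}"]


@dataclass
class RemoteLoader:
    """Composes loader instances and dispatches by URI scheme (assumed rule)."""

    knowledge: object
    loaders: Dict[str, Loader] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every loader receives the same reference; no mixin inheritance.
        if not self.loaders:
            self.loaders = {
                "s3": S3Loader(self.knowledge),
                "gs": GcsLoader(self.knowledge),
            }

    def load(self, uri: str) -> List[str]:
        scheme = uri.split("://", 1)[0]
        loader = self.loaders.get(scheme)
        if loader is None:
            raise ValueError(f"No loader registered for scheme {scheme!r}")
        return loader.load(uri)
```

Because the loaders are plain attributes, a test can swap one out by passing its own `loaders` dict rather than patching a base class.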
- Fix load_from_topics variable shadowing: content param was overwritten in loop body, losing original metadata/reader after first iteration
- Fix URL validation missing return: invalid URLs fell through to parsed_url.path causing UnboundLocalError
- Restore strip_agno_metadata in insert path: reserved _agno key was no longer stripped from user-provided metadata
- Restore basename matching in should_include_file: patterns like *.go now match against both full path and filename
- Restore merge_user_metadata and strip_agno_metadata in content_store update methods: metadata merging preserved _agno sub-keys, and vector_db only receives user-defined fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
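To illustrate the restored `should_include_file` behaviour, here is a minimal sketch that tests each pattern against both the full path and the basename. It assumes Python's `fnmatch` and an include/exclude parameter pair, neither of which is confirmed by the PR. With `fnmatch`, `*` already crosses `/`, so in this sketch the basename check matters most for anchored patterns such as `test_*.py` or a bare filename.

```python
import fnmatch
import os
from typing import Iterable, Optional


def _matches(file_path: str, patterns: Iterable[str]) -> bool:
    # Try each pattern against the full path and the bare filename, so
    # "test_*.py" matches "tests/test_loader.py" via "test_loader.py".
    name = os.path.basename(file_path)
    return any(
        fnmatch.fnmatch(file_path, p) or fnmatch.fnmatch(name, p) for p in patterns
    )


def should_include_file(
    file_path: str,
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> bool:
    # Hypothetical signature: include filters first, then exclude wins.
    if include and not _matches(file_path, include):
        return False
    if exclude and _matches(file_path, exclude):
        return False
    return True
```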
The original code unconditionally set linked_to on every document. The decomposition accidentally made it conditional on isolate_vector_search. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Loaders now receive the IngestionPipeline directly instead of a Knowledge reference, calling content_store, reader_registry, and vector_db methods without going through ~24 forwarding methods on Knowledge. This eliminates the circular callback pattern (Knowledge -> RemoteLoader -> Loaders -> Knowledge) and makes the data flow explicit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
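A small sketch of the injection direction this commit describes: the loader holds the pipeline and calls its `content_store`, `reader_registry`, and `vector_db` directly, so nothing flows back through Knowledge. The component method names (`upsert`, `select_reader`, `insert`), the `UrlLoader` class, and the extension-based reader rule are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContentStore:
    rows: Dict[str, str] = field(default_factory=dict)  # content_id -> status

    def upsert(self, content_id: str, status: str) -> None:
        self.rows[content_id] = status


@dataclass
class ReaderRegistry:
    def select_reader(self, name: str) -> str:
        # Toy selection rule for the sketch: pick a reader by extension.
        return "pdf" if name.endswith(".pdf") else "text"


@dataclass
class VectorDb:
    docs: List[str] = field(default_factory=list)

    def insert(self, docs: List[str]) -> None:
        self.docs.extend(docs)


@dataclass
class IngestionPipeline:
    content_store: ContentStore
    reader_registry: ReaderRegistry
    vector_db: VectorDb


@dataclass
class UrlLoader:
    pipeline: IngestionPipeline  # injected directly; no back-reference to Knowledge

    def load(self, url: str) -> None:
        p = self.pipeline
        p.content_store.upsert(url, "processing")
        reader = p.reader_registry.select_reader(url)
        p.vector_db.insert([f"{reader}:{url}"])
        p.content_store.upsert(url, "completed")
```

The data flow is now one-directional (Knowledge -> RemoteLoader -> loader -> pipeline components), which is the property the commit message is after.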
…kend protocol

Introduces a @runtime_checkable Protocol that any backend can implement to bypass the default chunk-embed-store pipeline. LightRAG is the first implementation. All 12 hardcoded `__class__.__name__ == "LightRag"` checks are replaced with `isinstance(vdb, ManagedKnowledgeBackend)` detection.

- Add ManagedKnowledgeBackend protocol (knowledge/backend/managed.py)
- Add LightRagBackend implementation (knowledge/backend/lightrag.py)
- Update LightRag VectorDb to delegate to LightRagBackend
- Replace 10 ingestion pipeline checks with managed_backend routing
- Replace 2 Knowledge class checks for search/delete routing
- Delete ~310 lines of duplicated LightRAG processing code
- Add comprehensive test suite for protocol detection and routing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
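A minimal sketch of how a `@runtime_checkable` Protocol replaces the class-name check. The four method names come from the PR description; their parameters and return types, the async variants (omitted here), and the `FakeManagedBackend`/`MockVectorDb` classes are assumptions for illustration.

```python
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class ManagedKnowledgeBackend(Protocol):
    # Method names follow the PR description; parameters are assumptions.
    def ingest_file(self, path: str, metadata: Optional[Dict[str, Any]] = None) -> str: ...

    def ingest_text(self, text: str, metadata: Optional[Dict[str, Any]] = None) -> str: ...

    def query(self, query: str, limit: int = 5) -> List[str]: ...

    def delete_content(self, content_id: str) -> bool: ...


class FakeManagedBackend:
    """In-memory stand-in for a backend such as LightRAG (no HTTP)."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def ingest_file(self, path: str, metadata: Optional[Dict[str, Any]] = None) -> str:
        return self.ingest_text(f"contents of {path}", metadata)

    def ingest_text(self, text: str, metadata: Optional[Dict[str, Any]] = None) -> str:
        doc_id = f"doc-{len(self.docs)}"
        self.docs[doc_id] = text
        return doc_id

    def query(self, query: str, limit: int = 5) -> List[str]:
        return [t for t in self.docs.values() if query in t][:limit]

    def delete_content(self, content_id: str) -> bool:
        return self.docs.pop(content_id, None) is not None


class MockVectorDb:
    """Ordinary vector DB: has insert/search but none of the protocol methods."""

    def __init__(self) -> None:
        self.rows: List[str] = []

    def insert(self, text: str) -> None:
        self.rows.append(text)

    def search(self, query: str, limit: int = 5) -> List[str]:
        return [r for r in self.rows if query in r][:limit]


def search(vector_db: Any, query: str) -> List[str]:
    # Structural check instead of vector_db.__class__.__name__ == "LightRag".
    if isinstance(vector_db, ManagedKnowledgeBackend):
        return vector_db.query(query)
    return vector_db.search(query)
```

One caveat worth keeping in mind: `isinstance` against a runtime-checkable Protocol only verifies that the named methods exist, not their signatures, so a class that happens to define the same method names will also be routed to the managed path.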
Resolves conflicts by keeping phase3's managed backend additions while carrying forward staging fixes (async remove_vectors methods, pipeline typing, URL source metadata tracking). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 'backend' name was ambiguous. Renamed to 'managed_backend' to clearly indicate this module contains the ManagedKnowledgeBackend protocol and implementations (e.g. LightRagBackend). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…g VectorDb wrapper

- Rename managed_backend/ folder to external_provider/
- Rename ManagedKnowledgeBackend protocol to ExternalKnowledgeProvider
- Rename managed.py to protocol.py
- Add explicit external_provider field on Knowledge (replaces auto-detection)
- Delete LightRag VectorDb wrapper (libs/agno/agno/vectordb/lightrag/lightrag.py)
- Update vectordb/lightrag/__init__.py to re-export LightRagBackend from external_provider
- Update all tests, cookbooks, and imports

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
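With an explicit `external_provider` field, routing depends on configuration rather than on inspecting whatever object sits in `vector_db`. The sketch below shows that idea on a stripped-down dataclass; the `Knowledge` shape, the `query`/`search` signatures, and the error raised when neither is set are assumptions, not the real class.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Protocol


class ExternalKnowledgeProvider(Protocol):
    def query(self, query: str, limit: int = 5) -> List[str]: ...


class EchoProvider:
    def query(self, query: str, limit: int = 5) -> List[str]:
        return [f"provider:{query}"]


class ListVectorDb:
    def search(self, query: str, limit: int = 5) -> List[str]:
        return [f"vector_db:{query}"]


@dataclass
class Knowledge:
    vector_db: Optional[Any] = None
    # Set explicitly by the user; no isinstance() sniffing of vector_db.
    external_provider: Optional[ExternalKnowledgeProvider] = None

    def search(self, query: str, limit: int = 5) -> List[str]:
        if self.external_provider is not None:
            return self.external_provider.query(query, limit)
        if self.vector_db is None:
            raise ValueError("Knowledge needs a vector_db or an external_provider")
        return self.vector_db.search(query, limit)
```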
Rename local variables, log messages, comments, and docstrings from backend terminology to provider terminology for consistency with the external_provider naming. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Consistent with the external_provider naming convention. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add cookbook/07_knowledge/05_integrations/external_providers/01_lightrag.py demonstrating LightRagProvider usage with Knowledge. Update integrations README with new External Providers section. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ders External provider content was stuck in "processing" status because the ExternalKnowledgeProvider protocol lacked get_status/aget_status methods. This adds a ProcessingResult dataclass, a processing_id field for two-stage ID tracking (track_id -> document_id), and proper status polling via the protocol's new get_status/aget_status methods. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
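A sketch of the two-stage ID tracking this commit introduces: ingestion returns only a provisional `processing_id` (LightRAG's track_id), and later `get_status` calls resolve it to a document ID once the provider finishes. The `ProcessingResult` fields, the `ContentStatus` enum, and the `FakeProvider.finish` test hook are assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class ContentStatus(str, Enum):
    PROCESSING = "processing"
    COMPLETED = "completed"


@dataclass
class ProcessingResult:
    # Hypothetical field layout, not the PR's actual schema.
    processing_id: str  # e.g. the track_id returned at ingest time
    document_id: Optional[str] = None  # known only after the provider finishes
    status: ContentStatus = ContentStatus.PROCESSING


class FakeProvider:
    """Simulates a provider that indexes asynchronously."""

    def __init__(self) -> None:
        self._ids = itertools.count()
        self._finished: Dict[str, str] = {}  # track_id -> document_id

    def ingest_text(self, text: str) -> ProcessingResult:
        # Stage 1: only a track_id is available right after ingestion.
        return ProcessingResult(processing_id=f"track-{next(self._ids)}")

    def finish(self, track_id: str) -> None:
        # Test hook standing in for the provider completing its work.
        self._finished[track_id] = f"doc-for-{track_id}"

    def get_status(self, processing_id: str) -> ProcessingResult:
        # Stage 2: polling resolves track_id -> document_id when done.
        doc_id = self._finished.get(processing_id)
        if doc_id is None:
            return ProcessingResult(processing_id=processing_id)
        return ProcessingResult(processing_id, doc_id, ContentStatus.COMPLETED)
```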
The processing_id field was defined in schemas and KnowledgeRow but was missing from the field mappings in upsert_knowledge_content() across all SQL-based DB implementations. This caused the field to be silently dropped on write, so status polling could never find the processing_id to resolve external provider status. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The processing_id is now read from the _agno metadata dict (which is already persisted as part of the metadata JSON column) instead of a dedicated DB column. This avoids schema changes across all DB backends while still allowing status polling to find the processing_id. Removes the processing_id field from Content, KnowledgeRow, all DB schemas, and content_store. Status resolution now uses get_agno_metadata(content.metadata, "processing_id") to retrieve the polling ID that was already being saved during ingestion. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
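The helpers named in this and earlier commits (`get_agno_metadata`, `strip_agno_metadata`, `merge_user_metadata`) suggest that system-managed values live under a reserved `_agno` key inside the user metadata dict. The implementations below are guesses at that behaviour, and `set_agno_metadata` is a hypothetical helper added only to build test data.

```python
from typing import Any, Dict, Optional

AGNO_KEY = "_agno"  # reserved key named in the PR


def get_agno_metadata(metadata: Optional[Dict[str, Any]], key: str) -> Any:
    """Read a system-managed value (e.g. processing_id) from metadata['_agno']."""
    if not metadata:
        return None
    agno = metadata.get(AGNO_KEY)
    return agno.get(key) if isinstance(agno, dict) else None


def set_agno_metadata(metadata: Optional[Dict[str, Any]], key: str, value: Any) -> Dict[str, Any]:
    """Hypothetical helper: return a copy with metadata['_agno'][key] = value."""
    result = dict(metadata or {})
    agno = dict(result.get(AGNO_KEY) or {})
    agno[key] = value
    result[AGNO_KEY] = agno
    return result


def strip_agno_metadata(metadata: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Drop the reserved key so only user-defined fields reach the vector DB."""
    return {k: v for k, v in (metadata or {}).items() if k != AGNO_KEY}


def merge_user_metadata(
    existing: Optional[Dict[str, Any]], updates: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Apply user updates while keeping the stored _agno sub-keys intact."""
    merged = {**strip_agno_metadata(existing), **strip_agno_metadata(updates)}
    if existing and isinstance(existing.get(AGNO_KEY), dict):
        merged[AGNO_KEY] = dict(existing[AGNO_KEY])
    return merged
```

Because the polling ID rides inside the metadata JSON that every backend already stores, no schema change is needed; the trade-off is that the value is not indexed or queryable as its own column.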
LightRAG returns documents in the track_status response even while still processing them. Check status_summary.processing and status_summary.pending counts before declaring COMPLETED, and only return COMPLETED when status_summary.completed > 0. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
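The completion rule from this commit can be written as a small pure function. It assumes the track_status payload exposes `status_summary` as a dict of per-status counts keyed `processing`, `pending`, and `completed`, as the commit message describes; the exact key names in LightRAG's response are not verified here, and treating "no counts yet" as still processing is a choice made for this sketch.

```python
from typing import Any, Dict

PROCESSING = "processing"
COMPLETED = "completed"


def resolve_track_status(response: Dict[str, Any]) -> str:
    """Map a track_status payload to a content status (assumed payload shape)."""
    summary = response.get("status_summary") or {}
    # A non-empty "documents" list is not enough: documents are listed
    # while they are still being processed.
    if summary.get("processing", 0) > 0 or summary.get("pending", 0) > 0:
        return PROCESSING
    if summary.get("completed", 0) > 0:
        return COMPLETED
    # Nothing finished yet (or no counts reported): keep polling.
    return PROCESSING
```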
Summary
- Introduces `ManagedKnowledgeBackend`, a `@runtime_checkable` Protocol for backends that manage their own indexing pipeline (bypassing Agno's default chunk-embed-store flow)
- Replaces all `__class__.__name__ == "LightRag"` checks with polymorphic `isinstance` detection
- Adds `LightRagBackend` as the first implementation, delegating all HTTP communication to a clean standalone class
- Refactors the `LightRag` VectorDb to be a backwards-compatible wrapper that delegates to `LightRagBackend`

Type of change
Changes
New Files
- `libs/agno/agno/knowledge/backend/__init__.py`: exports `ManagedKnowledgeBackend`
- `libs/agno/agno/knowledge/backend/managed.py`: `@runtime_checkable` Protocol with `ingest_file`, `ingest_text`, `query`, `delete_content` + async variants
- `libs/agno/agno/knowledge/backend/lightrag.py`: `LightRagBackend` implementation migrating HTTP logic from the VectorDb class
- `libs/agno/tests/unit/knowledge/test_managed_backend.py`: protocol detection, search routing, delete routing, and pipeline ingestion tests

Modified Files
- `libs/agno/agno/knowledge/knowledge.py`: adds `_detect_managed_backend()`, routes `search`/`asearch` and `remove_content_by_id`/`aremove_content_by_id` through the managed backend when detected
- `libs/agno/agno/knowledge/pipeline/ingestion.py`: replaces 10 class-name checks with `managed_backend` field routing, adds `_ingest_managed`/`_aingest_managed` (~120 lines), deletes `process_lightrag_content`/`aprocess_lightrag_content` (~310 lines)
- `libs/agno/agno/vectordb/lightrag/lightrag.py`: delegates to an internal `LightRagBackend`, exposes protocol methods for `isinstance` detection
- `libs/agno/agno/knowledge/__init__.py`: exports `ManagedKnowledgeBackend`
- `libs/agno/tests/unit/knowledge/test_knowledge_topic_loading.py`: updates tests to use `managed_backend` instead of the LightRag class name

Latest: ProcessingResult status polling fix
- Adds `ProcessingResult` schema and `processing_id` field for two-stage ID tracking
- Adds `get_status`/`aget_status` to the `ExternalKnowledgeProvider` protocol
- Adds `processing_id` column to all DB schemas (SQLite, Postgres, MySQL, SingleStore)

Test plan
- `isinstance(LightRagBackend(), ManagedKnowledgeBackend)` returns `True`
- `isinstance(MockVectorDb(), ManagedKnowledgeBackend)` returns `False`
- `isinstance(LightRag(), ManagedKnowledgeBackend)` returns `True`
- No `__class__.__name__ == "LightRag"` checks remaining in codebase

Generated with Claude Code