refactor: replace LightRag class-name checks with ManagedKnowledgeBackend protocol by willemcdejongh · Pull Request #6726 · agno-agi/agno

willemcdejongh · 2026-02-26T10:07:14Z

Summary

Introduces ManagedKnowledgeBackend, a @runtime_checkable Protocol for backends that manage their own indexing pipeline (bypassing Agno's default chunk-embed-store flow)
Replaces all 12 hardcoded __class__.__name__ == "LightRag" checks with polymorphic isinstance detection
Creates LightRagBackend as the first implementation, delegating all HTTP communication to a clean standalone class
Updates LightRag VectorDb to be a backwards-compatible wrapper that delegates to LightRagBackend
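
For reviewers unfamiliar with the pattern, here is a minimal sketch of how structural detection can replace a class-name comparison. The method names come from this PR; the signatures, `InMemoryBackend`, `PlainVectorDb`, and `is_managed` are hypothetical stand-ins, not the code in the diff.

```python
from typing import Any, Dict, List, Protocol, runtime_checkable


@runtime_checkable
class ManagedKnowledgeBackend(Protocol):
    """Structural interface. Method names follow the PR; signatures are assumed."""

    def ingest_text(self, text: str) -> str: ...
    def query(self, query: str) -> List[str]: ...
    def delete_content(self, content_id: str) -> bool: ...


class InMemoryBackend:
    """Hypothetical backend. Note that it does not inherit from the protocol."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}
        self._next_id = 0

    def ingest_text(self, text: str) -> str:
        content_id = f"doc-{self._next_id}"
        self._next_id += 1
        self._docs[content_id] = text
        return content_id

    def query(self, query: str) -> List[str]:
        return [t for t in self._docs.values() if query.lower() in t.lower()]

    def delete_content(self, content_id: str) -> bool:
        return self._docs.pop(content_id, None) is not None


class PlainVectorDb:
    """Hypothetical conventional vector DB without the managed-pipeline methods."""

    def search(self, query: str) -> List[str]:
        return []


def is_managed(vector_db: Any) -> bool:
    # Stands in for checks such as: vector_db.__class__.__name__ == "LightRag"
    return isinstance(vector_db, ManagedKnowledgeBackend)
```

With `@runtime_checkable`, `isinstance` only verifies that the protocol's methods are present. It does not validate their signatures, so a static type checker is still needed for that.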

Type of change

Refactoring (no functional changes, no API changes)

Changes

New Files

libs/agno/agno/knowledge/backend/__init__.py — exports ManagedKnowledgeBackend
libs/agno/agno/knowledge/backend/managed.py — @runtime_checkable Protocol with ingest_file, ingest_text, query, delete_content + async variants
libs/agno/agno/knowledge/backend/lightrag.py — LightRagBackend implementation migrating HTTP logic from VectorDb class
libs/agno/tests/unit/knowledge/test_managed_backend.py — protocol detection, search routing, delete routing, pipeline ingestion tests

Modified Files

libs/agno/agno/knowledge/knowledge.py — adds _detect_managed_backend(), routes search/asearch and remove_content_by_id/aremove_content_by_id through managed backend when detected
libs/agno/agno/knowledge/pipeline/ingestion.py — replaces 10 class-name checks with managed_backend field routing, adds _ingest_managed/_aingest_managed (~120 lines), deletes process_lightrag_content/aprocess_lightrag_content (~310 lines)
libs/agno/agno/vectordb/lightrag/lightrag.py — delegates to internal LightRagBackend, exposes protocol methods for isinstance detection
libs/agno/agno/knowledge/__init__.py — exports ManagedKnowledgeBackend
libs/agno/tests/unit/knowledge/test_knowledge_topic_loading.py — updates tests to use managed_backend instead of LightRag class name
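
The routing described for knowledge.py can be pictured roughly as follows. This is a sketch under assumptions: `KnowledgeSketch`, `ManagedBackendSketch`, the two fake databases, and `delete_by_content_id` are illustrative names, and only `_detect_managed_backend`, `search`, and `remove_content_by_id` are taken from the PR text.

```python
from typing import Any, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class ManagedBackendSketch(Protocol):
    """Reduced protocol for this example only (assumed signatures)."""

    def query(self, query: str) -> List[str]: ...
    def delete_content(self, content_id: str) -> bool: ...


class KnowledgeSketch:
    def __init__(self, vector_db: Any) -> None:
        self.vector_db = vector_db
        self.managed_backend = self._detect_managed_backend()

    def _detect_managed_backend(self) -> Optional[ManagedBackendSketch]:
        # Structural check instead of comparing the class name.
        if isinstance(self.vector_db, ManagedBackendSketch):
            return self.vector_db
        return None

    def search(self, query: str) -> List[str]:
        if self.managed_backend is not None:
            return self.managed_backend.query(query)
        return self.vector_db.search(query)

    def remove_content_by_id(self, content_id: str) -> bool:
        if self.managed_backend is not None:
            return self.managed_backend.delete_content(content_id)
        return self.vector_db.delete_by_content_id(content_id)


class FakeManagedDb:
    """Hypothetical managed backend used to exercise the routing."""

    def query(self, query: str) -> List[str]:
        return [f"managed:{query}"]

    def delete_content(self, content_id: str) -> bool:
        return True


class FakeVectorDb:
    """Hypothetical default vector DB used to exercise the fallback path."""

    def search(self, query: str) -> List[str]:
        return [f"vector:{query}"]

    def delete_by_content_id(self, content_id: str) -> bool:
        return False
```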

Latest: ProcessingResult status polling fix

Adds ProcessingResult schema and processing_id field for two-stage ID tracking
Adds get_status/aget_status to the ExternalKnowledgeProvider protocol
Fixes content stuck in "processing" status by implementing proper status polling
Adds processing_id column to all DB schemas (SQLite, Postgres, MySQL, SingleStore)
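
To make the two-stage ID tracking concrete, here is a rough sketch of a result object plus a polling loop. The field names on `ProcessingResult`, the `Status` enum, and `poll_until_done` are assumptions for illustration; the PR only states that a `ProcessingResult` schema, a `processing_id`, and `get_status`/`aget_status` exist.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Status(str, Enum):
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class ProcessingResult:
    """Assumed shape: the polling ID first, the final document ID once known."""

    processing_id: str
    status: Status
    document_id: Optional[str] = None


def poll_until_done(
    get_status: Callable[[str], ProcessingResult],
    processing_id: str,
    max_attempts: int = 10,
    delay_seconds: float = 0.0,
) -> ProcessingResult:
    """Call get_status until it stops reporting PROCESSING or attempts run out."""
    result = get_status(processing_id)
    for _ in range(max_attempts - 1):
        if result.status is not Status.PROCESSING:
            break
        time.sleep(delay_seconds)
        result = get_status(processing_id)
    return result
```

If the attempts are exhausted, the last result is returned as-is, so the caller can leave the content in "processing" and retry later.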

Test plan

Generated with Claude Code

Extract ~3500-line Knowledge class into three internal components:
- ContentStore: content CRUD against the contents database
- ReaderRegistry: reader management, lazy loading, and selection
- IngestionPipeline: content loading from paths, URLs, text, and topics

All public API preserved through forwarding methods. Loader mixin compatibility maintained via forwarding internal methods. Tests updated to mock at component level where needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Convert the mixin-based loader pattern to composition. Each loader now receives a knowledge reference instead of relying on MRO callbacks. RemoteKnowledge is replaced by RemoteLoader which composes loader instances. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
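
As an illustration of the mixin-to-composition change, the sketch below passes a knowledge reference into each loader and lets `RemoteLoader` hold the loader instances. `KnowledgeStub`, `S3Loader`, `GitHubLoader`, and the scheme-based dispatch are hypothetical; only the `RemoteLoader` name and the "receives a knowledge reference" idea come from the commit message.

```python
from typing import Dict, List, Tuple


class KnowledgeStub:
    """Hypothetical stand-in for the object that loaders write into."""

    def __init__(self) -> None:
        self.inserted: List[Tuple[str, str]] = []

    def insert_text(self, source: str, text: str) -> None:
        self.inserted.append((source, text))


class S3Loader:
    """Hypothetical loader: holds a reference instead of relying on MRO callbacks."""

    def __init__(self, knowledge: KnowledgeStub) -> None:
        self.knowledge = knowledge

    def load(self, uri: str) -> None:
        self.knowledge.insert_text(uri, f"s3 object at {uri}")


class GitHubLoader:
    """Hypothetical loader with the same constructor contract."""

    def __init__(self, knowledge: KnowledgeStub) -> None:
        self.knowledge = knowledge

    def load(self, uri: str) -> None:
        self.knowledge.insert_text(uri, f"github file at {uri}")


class RemoteLoader:
    """Composes loader instances rather than inheriting from loader mixins."""

    def __init__(self, knowledge: KnowledgeStub) -> None:
        self._loaders: Dict[str, object] = {
            "s3": S3Loader(knowledge),
            "github": GitHubLoader(knowledge),
        }

    def load(self, uri: str) -> None:
        scheme = uri.split("://", 1)[0]
        loader = self._loaders.get(scheme)
        if loader is None:
            raise ValueError(f"No loader registered for scheme: {scheme}")
        loader.load(uri)
```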

- Fix load_from_topics variable shadowing: content param was overwritten in loop body, losing original metadata/reader after first iteration
- Fix URL validation missing return: invalid URLs fell through to parsed_url.path causing UnboundLocalError
- Restore strip_agno_metadata in insert path: reserved _agno key was no longer stripped from user-provided metadata
- Restore basename matching in should_include_file: patterns like *.go now match against both full path and filename
- Restore merge_user_metadata and strip_agno_metadata in content_store update methods: metadata merging preserved _agno sub-keys, and vector_db only receives user-defined fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
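
The basename-matching fix can be sketched like this. The function name matches the commit message, but the signature, the `include`/`exclude` parameters, and the use of `fnmatch` are assumptions about how such a filter might be written, not a copy of the repository code.

```python
import fnmatch
import posixpath
from typing import Optional, Sequence


def should_include_file(
    path: str,
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> bool:
    """Match each pattern against both the full path and the bare filename."""
    name = posixpath.basename(path)

    def matches(patterns: Sequence[str]) -> bool:
        return any(
            fnmatch.fnmatchcase(path, p) or fnmatch.fnmatchcase(name, p)
            for p in patterns
        )

    # Exclusions win over inclusions in this sketch (an assumed policy).
    if exclude and matches(exclude):
        return False
    if include:
        return matches(include)
    return True
```

Without the filename check, a pattern such as `test_*.py` would never match `pkg/test_a.py`, because the full path starts with `pkg/`.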

The original code unconditionally set linked_to on every document. The decomposition accidentally made it conditional on isolate_vector_search. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Loaders now receive the IngestionPipeline directly instead of a Knowledge reference, calling content_store, reader_registry, and vector_db methods without going through ~24 forwarding methods on Knowledge. This eliminates the circular callback pattern (Knowledge -> RemoteLoader -> Loaders -> Knowledge) and makes the data flow explicit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…kend protocol

Introduces a @runtime_checkable Protocol that any backend can implement to bypass the default chunk-embed-store pipeline. LightRAG is the first implementation. All 12 hardcoded `__class__.__name__ == "LightRag"` checks are replaced with `isinstance(vdb, ManagedKnowledgeBackend)` detection.

- Add ManagedKnowledgeBackend protocol (knowledge/backend/managed.py)
- Add LightRagBackend implementation (knowledge/backend/lightrag.py)
- Update LightRag VectorDb to delegate to LightRagBackend
- Replace 10 ingestion pipeline checks with managed_backend routing
- Replace 2 Knowledge class checks for search/delete routing
- Delete ~310 lines of duplicated LightRAG processing code
- Add comprehensive test suite for protocol detection and routing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Resolves conflicts by keeping phase3's managed backend additions while carrying forward staging fixes (async remove_vectors methods, pipeline typing, URL source metadata tracking). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The 'backend' name was ambiguous. Renamed to 'managed_backend' to clearly indicate this module contains the ManagedKnowledgeBackend protocol and implementations (e.g. LightRagBackend). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…g VectorDb wrapper

- Rename managed_backend/ folder to external_provider/
- Rename ManagedKnowledgeBackend protocol to ExternalKnowledgeProvider
- Rename managed.py to protocol.py
- Add explicit external_provider field on Knowledge (replaces auto-detection)
- Delete LightRag VectorDb wrapper (libs/agno/agno/vectordb/lightrag/lightrag.py)
- Update vectordb/lightrag/__init__.py to re-export LightRagBackend from external_provider
- Update all tests, cookbooks, and imports

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Rename local variables, log messages, comments, and docstrings from backend terminology to provider terminology for consistency with the external_provider naming. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Consistent with the external_provider naming convention. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add cookbook/07_knowledge/05_integrations/external_providers/01_lightrag.py demonstrating LightRagProvider usage with Knowledge. Update integrations README with new External Providers section. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ders External provider content was stuck in "processing" status because the ExternalKnowledgeProvider protocol lacked get_status/aget_status methods. This adds a ProcessingResult dataclass, a processing_id field for two-stage ID tracking (track_id -> document_id), and proper status polling via the protocol's new get_status/aget_status methods. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The processing_id field was defined in schemas and KnowledgeRow but was missing from the field mappings in upsert_knowledge_content() across all SQL-based DB implementations. This caused the field to be silently dropped on write, so status polling could never find the processing_id to resolve external provider status. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The processing_id is now read from the _agno metadata dict (which is already persisted as part of the metadata JSON column) instead of a dedicated DB column. This avoids schema changes across all DB backends while still allowing status polling to find the processing_id. Removes the processing_id field from Content, KnowledgeRow, all DB schemas, and content_store. Status resolution now uses get_agno_metadata(content.metadata, "processing_id") to retrieve the polling ID that was already being saved during ingestion. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
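
A small sketch of keeping `processing_id` under the reserved `_agno` key, so it rides along in the existing metadata JSON column. `get_agno_metadata` and `strip_agno_metadata` are named in this PR, but the bodies below are assumed, and `set_agno_metadata` is a hypothetical helper added only to make the example complete.

```python
from typing import Any, Dict, Optional

AGNO_KEY = "_agno"  # reserved key named in the commit messages


def set_agno_metadata(
    metadata: Optional[Dict[str, Any]], key: str, value: Any
) -> Dict[str, Any]:
    """Hypothetical helper: return a copy with value stored under _agno[key]."""
    merged = dict(metadata or {})
    agno = dict(merged.get(AGNO_KEY) or {})
    agno[key] = value
    merged[AGNO_KEY] = agno
    return merged


def get_agno_metadata(
    metadata: Optional[Dict[str, Any]], key: str, default: Any = None
) -> Any:
    """Assumed behavior: read _agno[key], tolerating missing or malformed data."""
    if not metadata:
        return default
    agno = metadata.get(AGNO_KEY)
    if not isinstance(agno, dict):
        return default
    return agno.get(key, default)


def strip_agno_metadata(metadata: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Assumed behavior: drop the reserved key so only user fields remain."""
    return {k: v for k, v in (metadata or {}).items() if k != AGNO_KEY}
```

Status polling can then look up the ID with `get_agno_metadata(content.metadata, "processing_id")`, as the commit message describes, while the vector DB only sees the stripped dictionary.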

LightRAG returns documents in the track_status response even while still processing them. Check status_summary.processing and status_summary.pending counts before declaring COMPLETED, and only return COMPLETED when status_summary.completed > 0. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
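
The check described above might look roughly like the following. The response shape (a `status_summary` mapping with lowercase `processing`, `pending`, and `completed` counts) is inferred from the commit message. The `failed` branch and the string return values are assumptions, and `resolve_status` is a hypothetical name.

```python
from typing import Any, Dict


def resolve_status(track_response: Dict[str, Any]) -> str:
    """Map a track_status-style response onto a coarse content status."""
    summary = track_response.get("status_summary") or {}

    def count(key: str) -> int:
        return int(summary.get(key, 0) or 0)

    # Documents can appear in the response while work is still in flight,
    # so in-progress counts are checked before anything else.
    if count("processing") > 0 or count("pending") > 0:
        return "processing"
    if count("completed") > 0:
        return "completed"
    # Assumption: a non-zero failed count with nothing else means failure.
    if count("failed") > 0:
        return "failed"
    # Nothing conclusive reported yet, so keep polling.
    return "processing"
```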

willemcdejongh and others added 6 commits February 23, 2026 10:14

fix: always set linked_to metadata in prepare_documents_for_insert

09b1af9


willemcdejongh requested a review from a team as a code owner February 26, 2026 10:07

willemcdejongh mentioned this pull request Feb 26, 2026

feat: add KnowledgeCatalog and BackupStore for agent document access #6727

Open

7 tasks

willemcdejongh changed the base branch from willemcdejongh/knowledge-refactor-v2 to willemcdejongh/knowledge-staging March 4, 2026 08:51

willemcdejongh and others added 7 commits March 4, 2026 10:54

refactor: replace all remaining backend references with provider

802726c


refactor: rename LightRagBackend class to LightRagProvider

fbc1c83


willemcdejongh mentioned this pull request Mar 5, 2026

fix: add ProcessingResult schema and status polling to external providers #6877

Closed

9 tasks

willemcdejongh changed the title ~~refactor: replace LightRag class-name checks with ManagedKnowledgeBackend protocol~~ feat: external knowledge providers with ProcessingResult status polling Mar 5, 2026

willemcdejongh changed the title ~~feat: external knowledge providers with ProcessingResult status polling~~ refactor: replace LightRag class-name checks with ManagedKnowledgeBackend protocol Mar 5, 2026

willemcdejongh and others added 3 commits March 5, 2026 18:32

willemcdejongh wants to merge 16 commits into willemcdejongh/knowledge-staging from willemcdejongh/knowledge-phase3-managed-backend