refactor: replace LightRag class-name checks with ManagedKnowledgeBackend protocol #6726

Open
willemcdejongh wants to merge 16 commits into willemcdejongh/knowledge-staging from willemcdejongh/knowledge-phase3-managed-backend

Conversation

@willemcdejongh
Contributor

@willemcdejongh willemcdejongh commented Feb 26, 2026

Summary

  • Introduces ManagedKnowledgeBackend, a @runtime_checkable Protocol for backends that manage their own indexing pipeline (bypassing Agno's default chunk-embed-store flow)
  • Replaces all 12 hardcoded __class__.__name__ == "LightRag" checks with polymorphic isinstance detection
  • Creates LightRagBackend as the first implementation, delegating all HTTP communication to a clean standalone class
  • Updates LightRag VectorDb to be a backwards-compatible wrapper that delegates to LightRagBackend
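
The detection approach in the summary can be sketched as follows. This is a minimal illustration, not the PR's code: the three method signatures are an assumed subset of the protocol, and `InMemoryBackend`, `PlainVectorDb`, and `detect_managed_backend` are hypothetical stand-ins.

```python
from itertools import count
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class ManagedKnowledgeBackend(Protocol):
    """Illustrative subset of the protocol; these signatures are assumptions."""

    def ingest_text(self, text: str) -> str: ...

    def query(self, query: str) -> List[str]: ...

    def delete_content(self, external_id: str) -> bool: ...


class InMemoryBackend:
    """Hypothetical backend. It never subclasses the protocol."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}
        self._ids = count(1)

    def ingest_text(self, text: str) -> str:
        external_id = f"doc-{next(self._ids)}"
        self._docs[external_id] = text
        return external_id

    def query(self, query: str) -> List[str]:
        return [t for t in self._docs.values() if query.lower() in t.lower()]

    def delete_content(self, external_id: str) -> bool:
        return self._docs.pop(external_id, None) is not None


class PlainVectorDb:
    """Hypothetical ordinary vector DB without the managed methods."""

    def search(self, query: str) -> List[str]:
        return []


def detect_managed_backend(vector_db: Any) -> Optional[ManagedKnowledgeBackend]:
    # Structural check that replaces vector_db.__class__.__name__ == "LightRag".
    if isinstance(vector_db, ManagedKnowledgeBackend):
        return vector_db
    return None
```

Note that `isinstance` against a `@runtime_checkable` Protocol only verifies that the methods exist; it does not validate their signatures or return types.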

Type of change

  • Refactoring (no functional changes, no API changes)

Changes

New Files

  • libs/agno/agno/knowledge/backend/__init__.py — exports ManagedKnowledgeBackend
  • libs/agno/agno/knowledge/backend/managed.py — @runtime_checkable Protocol with ingest_file, ingest_text, query, delete_content + async variants
  • libs/agno/agno/knowledge/backend/lightrag.py — LightRagBackend implementation migrating HTTP logic from VectorDb class
  • libs/agno/tests/unit/knowledge/test_managed_backend.py — protocol detection, search routing, delete routing, pipeline ingestion tests

Modified Files

  • libs/agno/agno/knowledge/knowledge.py — adds _detect_managed_backend(), routes search/asearch and remove_content_by_id/aremove_content_by_id through managed backend when detected
  • libs/agno/agno/knowledge/pipeline/ingestion.py — replaces 10 class-name checks with managed_backend field routing, adds _ingest_managed/_aingest_managed (~120 lines), deletes process_lightrag_content/aprocess_lightrag_content (~310 lines)
  • libs/agno/agno/vectordb/lightrag/lightrag.py — delegates to internal LightRagBackend, exposes protocol methods for isinstance detection
  • libs/agno/agno/knowledge/__init__.py — exports ManagedKnowledgeBackend
  • libs/agno/tests/unit/knowledge/test_knowledge_topic_loading.py — updates tests to use managed_backend instead of LightRag class name

Latest: ProcessingResult status polling fix

  • Adds ProcessingResult schema and processing_id field for two-stage ID tracking
  • Adds get_status/aget_status to the ExternalKnowledgeProvider protocol
  • Fixes content stuck in "processing" status by implementing proper status polling
  • Adds processing_id column to all DB schemas (SQLite, Postgres, MySQL, SingleStore)
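
A rough sketch of the status resolution described above, under stated assumptions: the `ProcessingResult` fields, the `Status` enum, `FakeProvider`, and `resolve_status` are illustrative names, not the PR's actual definitions. It shows the two-stage lookup (poll by `processing_id`, fall back to `external_id`).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Status(str, Enum):
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class ProcessingResult:
    # Field names are assumptions made for this sketch.
    status: Status
    external_id: Optional[str] = None  # final document ID, once known
    error: Optional[str] = None


class FakeProvider:
    """Hypothetical provider mapping a lookup ID to a result."""

    def __init__(self, results: Dict[str, ProcessingResult]) -> None:
        self._results = results

    def get_status(self, lookup_id: str) -> ProcessingResult:
        # Unknown IDs are treated as still processing.
        return self._results.get(lookup_id, ProcessingResult(Status.PROCESSING))


def resolve_status(
    provider: FakeProvider,
    processing_id: Optional[str],
    external_id: Optional[str],
) -> ProcessingResult:
    # Stage 1: prefer the processing (track) ID. Stage 2: fall back to external_id.
    lookup_id = processing_id or external_id
    if lookup_id is None:
        return ProcessingResult(Status.PROCESSING)
    return provider.get_status(lookup_id)
```

With this shape, content leaves "processing" only when the provider reports a terminal state for one of the two IDs.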

Test plan

  • isinstance(LightRagBackend(), ManagedKnowledgeBackend) returns True
  • isinstance(MockVectorDb(), ManagedKnowledgeBackend) returns False
  • isinstance(LightRag(), ManagedKnowledgeBackend) returns True
  • Knowledge auto-detection wires managed backend when present
  • Search routes through managed backend when detected
  • Delete routes through managed backend via external_id
  • Pipeline managed ingestion handles CONTENT and TOPIC origins
  • Failure during managed ingestion sets FAILED status
  • Zero __class__.__name__ == "LightRag" remaining in codebase
  • Status resolution: completed, failed, still processing, fallback to external_id, async
  • All 27 knowledge unit tests pass

Generated with Claude Code

willemcdejongh and others added 6 commits February 23, 2026 10:14
Extract ~3500-line Knowledge class into three internal components:
- ContentStore: content CRUD against the contents database
- ReaderRegistry: reader management, lazy loading, and selection
- IngestionPipeline: content loading from paths, URLs, text, and topics

All public API preserved through forwarding methods. Loader mixin
compatibility maintained via forwarding internal methods. Tests updated
to mock at component level where needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Convert the mixin-based loader pattern to composition. Each loader
now receives a knowledge reference instead of relying on MRO callbacks.
RemoteKnowledge is replaced by RemoteLoader which composes loader instances.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix load_from_topics variable shadowing: content param was overwritten
  in loop body, losing original metadata/reader after first iteration
- Fix URL validation missing return: invalid URLs fell through to
  parsed_url.path causing UnboundLocalError
- Restore strip_agno_metadata in insert path: reserved _agno key was
  no longer stripped from user-provided metadata
- Restore basename matching in should_include_file: patterns like *.go
  now match against both full path and filename
- Restore merge_user_metadata and strip_agno_metadata in content_store
  update methods: metadata merging preserved _agno sub-keys, and
  vector_db only receives user-defined fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
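
The variable-shadowing fix in the commit above follows a common pattern, sketched here with a hypothetical `Content` dataclass and a hypothetical `build_topic_contents` helper (the real `load_from_topics` signature is not shown in this PR page). The loop binds a fresh name per topic instead of reassigning the `content` parameter, so the original metadata survives every iteration.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, List


@dataclass
class Content:
    # Hypothetical, simplified stand-in for the real content model.
    name: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)
    topics: List[str] = field(default_factory=list)


def build_topic_contents(content: Content) -> List[Content]:
    results: List[Content] = []
    for topic in content.topics:
        # Bind a new name each iteration; `content` itself is never rebound,
        # so its metadata is still available for the next topic.
        topic_content = replace(
            content,
            name=topic,
            topics=[topic],
            metadata=dict(content.metadata),
        )
        results.append(topic_content)
    return results
```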
The original code unconditionally set linked_to on every document.
The decomposition accidentally made it conditional on isolate_vector_search.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Loaders now receive the IngestionPipeline directly instead of a Knowledge
reference, calling content_store, reader_registry, and vector_db methods
without going through ~24 forwarding methods on Knowledge. This eliminates
the circular callback pattern (Knowledge -> RemoteLoader -> Loaders ->
Knowledge) and makes the data flow explicit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
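
The dependency direction described in the commit above might look roughly like this. All class bodies and method names (`save`, `register`, `select`, `load`) are hypothetical simplifications; only the component names `ContentStore`, `ReaderRegistry`, and `IngestionPipeline` come from the PR.

```python
from typing import Callable, Dict


class ContentStore:
    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    def save(self, name: str, text: str) -> None:
        self.rows[name] = text


class ReaderRegistry:
    def __init__(self) -> None:
        self._readers: Dict[str, Callable[[str], str]] = {}

    def register(self, suffix: str, reader: Callable[[str], str]) -> None:
        self._readers[suffix] = reader

    def select(self, path: str) -> Callable[[str], str]:
        for suffix, reader in self._readers.items():
            if path.endswith(suffix):
                return reader
        raise ValueError(f"no reader registered for {path!r}")


class IngestionPipeline:
    def __init__(self, content_store: ContentStore, reader_registry: ReaderRegistry) -> None:
        self.content_store = content_store
        self.reader_registry = reader_registry


class TextLoader:
    """Hypothetical loader: holds the pipeline, never a Knowledge reference."""

    def __init__(self, pipeline: IngestionPipeline) -> None:
        self.pipeline = pipeline

    def load(self, path: str, raw: str) -> None:
        # Calls go straight to the pipeline's components, with no
        # forwarding methods on Knowledge in between.
        reader = self.pipeline.reader_registry.select(path)
        self.pipeline.content_store.save(path, reader(raw))
```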
…kend protocol

Introduces a @runtime_checkable Protocol that any backend can implement
to bypass the default chunk-embed-store pipeline. LightRAG is the first
implementation. All 12 hardcoded `__class__.__name__ == "LightRag"` checks
are replaced with `isinstance(vdb, ManagedKnowledgeBackend)` detection.

- Add ManagedKnowledgeBackend protocol (knowledge/backend/managed.py)
- Add LightRagBackend implementation (knowledge/backend/lightrag.py)
- Update LightRag VectorDb to delegate to LightRagBackend
- Replace 10 ingestion pipeline checks with managed_backend routing
- Replace 2 Knowledge class checks for search/delete routing
- Delete ~310 lines of duplicated LightRAG processing code
- Add comprehensive test suite for protocol detection and routing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@willemcdejongh willemcdejongh requested a review from a team as a code owner February 26, 2026 10:07
@willemcdejongh willemcdejongh changed the base branch from willemcdejongh/knowledge-refactor-v2 to willemcdejongh/knowledge-staging March 4, 2026 08:51
willemcdejongh and others added 7 commits March 4, 2026 10:54
Resolves conflicts by keeping phase3's managed backend additions while
carrying forward staging fixes (async remove_vectors methods, pipeline
typing, URL source metadata tracking).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 'backend' name was ambiguous. Renamed to 'managed_backend' to clearly
indicate this module contains the ManagedKnowledgeBackend protocol and
implementations (e.g. LightRagBackend).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…g VectorDb wrapper

- Rename managed_backend/ folder to external_provider/
- Rename ManagedKnowledgeBackend protocol to ExternalKnowledgeProvider
- Rename managed.py to protocol.py
- Add explicit external_provider field on Knowledge (replaces auto-detection)
- Delete LightRag VectorDb wrapper (libs/agno/agno/vectordb/lightrag/lightrag.py)
- Update vectordb/lightrag/__init__.py to re-export LightRagBackend from external_provider
- Update all tests, cookbooks, and imports

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename local variables, log messages, comments, and docstrings from
backend terminology to provider terminology for consistency with the
external_provider naming.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Consistent with the external_provider naming convention.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add cookbook/07_knowledge/05_integrations/external_providers/01_lightrag.py
demonstrating LightRagProvider usage with Knowledge. Update integrations
README with new External Providers section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ders

External provider content was stuck in "processing" status because the
ExternalKnowledgeProvider protocol lacked get_status/aget_status methods.
This adds a ProcessingResult dataclass, a processing_id field for two-stage
ID tracking (track_id -> document_id), and proper status polling via the
protocol's new get_status/aget_status methods.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@willemcdejongh willemcdejongh changed the title refactor: replace LightRag class-name checks with ManagedKnowledgeBackend protocol feat: external knowledge providers with ProcessingResult status polling Mar 5, 2026
@willemcdejongh willemcdejongh changed the title feat: external knowledge providers with ProcessingResult status polling refactor: replace LightRag class-name checks with ManagedKnowledgeBackend protocol Mar 5, 2026
willemcdejongh and others added 3 commits March 5, 2026 18:32
The processing_id field was defined in schemas and KnowledgeRow but
was missing from the field mappings in upsert_knowledge_content()
across all SQL-based DB implementations. This caused the field to be
silently dropped on write, so status polling could never find the
processing_id to resolve external provider status.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The processing_id is now read from the _agno metadata dict (which is
already persisted as part of the metadata JSON column) instead of a
dedicated DB column. This avoids schema changes across all DB backends
while still allowing status polling to find the processing_id.

Removes the processing_id field from Content, KnowledgeRow, all DB
schemas, and content_store. Status resolution now uses
get_agno_metadata(content.metadata, "processing_id") to retrieve the
polling ID that was already being saved during ingestion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
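
A minimal sketch of storing the polling ID under the reserved `_agno` metadata key, as the commit above describes. The helper names `get_agno_metadata` and `strip_agno_metadata` appear in this PR, but the bodies below are assumed reimplementations written for illustration, and `set_agno_metadata` is a hypothetical helper.

```python
from typing import Any, Dict, Optional

AGNO_KEY = "_agno"  # reserved metadata key named in the PR


def set_agno_metadata(metadata: Optional[Dict[str, Any]], key: str, value: Any) -> Dict[str, Any]:
    # Hypothetical helper: returns a copy with value stored under _agno[key].
    merged = dict(metadata or {})
    agno = dict(merged.get(AGNO_KEY) or {})
    agno[key] = value
    merged[AGNO_KEY] = agno
    return merged


def get_agno_metadata(metadata: Optional[Dict[str, Any]], key: str, default: Any = None) -> Any:
    # Assumed behaviour: read a single key from the _agno sub-dict.
    if not metadata:
        return default
    agno = metadata.get(AGNO_KEY)
    if not isinstance(agno, dict):
        return default
    return agno.get(key, default)


def strip_agno_metadata(metadata: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    # Assumed behaviour: only user-defined fields are passed on to the vector DB.
    return {k: v for k, v in (metadata or {}).items() if k != AGNO_KEY}
```

Because the value lives inside the metadata JSON that is already persisted, it survives a serialize/deserialize round trip without a dedicated column.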
LightRAG returns documents in the track_status response even while
still processing them. Check status_summary.processing and
status_summary.pending counts before declaring COMPLETED, and only
return COMPLETED when status_summary.completed > 0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
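
The completion check from the commit above can be approximated as below. The `status_summary` keys `processing`, `pending`, and `completed` are taken from the commit message; the `failed` key, the overall response shape, and the function name are assumptions for this sketch rather than the documented LightRAG API.

```python
from typing import Any, Dict


def resolve_track_status(response: Dict[str, Any]) -> str:
    """Map a track_status-style response to processing/completed/failed."""
    summary = response.get("status_summary") or {}

    # Documents can be listed while work is still in flight, so any
    # outstanding count means the content is not done yet.
    if summary.get("processing", 0) > 0 or summary.get("pending", 0) > 0:
        return "processing"

    # Assumption: a "failed" count is reported alongside the others.
    if summary.get("failed", 0) > 0:
        return "failed"

    # Declare completion only when at least one document has completed.
    if summary.get("completed", 0) > 0:
        return "completed"

    # Empty or unknown summary: keep polling rather than guess.
    return "processing"
```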