Refactor: DocAgent #2076
## DocAgent Refactor

### Current state of DocAgent

The existing DocAgent follows a swarm architecture with multiple specialized agents (Triage, Task Manager, Parser, Data Ingestion, Query, Error, and Summary agents). While this design provides clear separation of concerns, it introduces several production-readiness issues. The new design will instead feature 4 layers, reflected in the rough FS structure further below.
### How do we solve this problem?
Instead of processing documents at query time, the new architecture will use an event-driven approach, where documents are ingested in response to triggered events such as button clicks and file uploads.

```python
# Before: synchronous processing during query
user_query = "What's in this PDF?"
# Agent processes PDF → chunks → vectorizes → stores → queries (slow!)

# After: event-driven ingestion
ingestion_service.ingest_document("large_report.pdf")  # Async event
# Later...
user_query = "What's in this PDF?"
# Agent queries pre-processed data (fast!)
```
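For intuition (not part of this PR), here is a minimal sketch of the event-driven idea using a plain in-process queue. All names are hypothetical, and the real `DocumentIngestionService` may well use async events or an external queue instead:

```python
import queue
import threading


class SimpleIngestionWorker:
    """Toy event-driven ingestion: callers enqueue document paths and
    return immediately; a background thread does the slow work."""

    def __init__(self) -> None:
        self._events: "queue.Queue[str | None]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def ingest_document(self, path: str) -> None:
        self._events.put(path)  # non-blocking for the caller

    def _run(self) -> None:
        while True:
            path = self._events.get()
            if path is None:  # shutdown sentinel
                return
            # Stand-in for the real pipeline: parse -> chunk -> embed -> store
            print(f"ingested {path}")

    def stop(self) -> None:
        self._events.put(None)
        self._worker.join()


worker = SimpleIngestionWorker()
worker.ingest_document("large_report.pdf")  # returns before processing finishes
worker.stop()
```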
Configuration is expressed as plain dataclasses:

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class StorageConfig:
    storage_type: str = "local"  # "local", "s3", "azure", "gcs", "minio"
    base_path: Path = field(default_factory=lambda: Path("./storage"))
    bucket_name: str | None = None
    credentials: dict[str, Any] | None = None


@dataclass
class RAGConfig:
    rag_type: str = "vector"  # "vector", "structured", "graph"
    backend: str = "chromadb"  # "chromadb", "weaviate", "neo4j", "inmemory"
    collection_name: str | None = None
    embedding_model: str = "all-MiniLM-L6-v2"
```
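`ProcessingConfig` and `DocAgentConfig` are referenced below but not shown in this excerpt. A plausible sketch for completeness, reusing the imports above and the same dataclass style; only `chunk_size` and `max_file_size` appear in the original, the other fields and defaults are guesses:

```python
@dataclass
class ProcessingConfig:
    chunk_size: int = 512                    # overridden to 1024 in the example below
    max_file_size: int = 100 * 1024 * 1024   # bytes; overridden to 500 MB below


@dataclass
class DocAgentConfig:
    rag: RAGConfig = field(default_factory=RAGConfig)
    storage: StorageConfig = field(default_factory=StorageConfig)
    processing: ProcessingConfig = field(default_factory=ProcessingConfig)
```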
Putting it together:

```python
config = DocAgentConfig(
    rag=RAGConfig(
        rag_type="vector",
        backend="weaviate",
        embedding_model="all-MiniLM-L6-v2",
    ),
    storage=StorageConfig(
        storage_type="s3",
        bucket_name="my-docs-bucket",
    ),
    processing=ProcessingConfig(
        chunk_size=1024,
        max_file_size=500 * 1024 * 1024,  # 500 MB
    ),
)
```

### Example usage

```python
from autogen.agents.experimental.document_agent import DocAgent2, DocumentIngestionService
from autogen.agents.experimental.document_agent.core import DocAgentConfig, RAGConfig, StorageConfig
# WeaviateQueryEngine is also used below; its import path would come from the proposed rag/ layer (TBD)
# Configure for production use
config = DocAgentConfig(
    rag=RAGConfig(backend="weaviate", rag_type="vector"),
    storage=StorageConfig(storage_type="s3", bucket_name="company-docs"),
)
# Initialize query engine (supports multiple backends)
query_engine = WeaviateQueryEngine(config.rag)
# Create ingestion service (handles document processing)
ingestion_service = DocumentIngestionService(query_engine, config)
# Process documents asynchronously (event-driven)
ingestion_service.ingest_document("large_manual.pdf") # Non-blocking
# Create query agent (fast, no document processing)
doc_agent = DocAgent2(
    query_engine=query_engine,
    config=config,
)
# Query pre-processed documents
response = doc_agent.query("What are the safety procedures?")
```
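A consequence of this split: because `DocAgent2` only reads from a pre-populated query engine, slow document processing never blocks a user-facing query, and several agents can share the same engine.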
### Rough FS structure

```
document_agent/
├── core/
│   ├── __init__.py
│   ├── base_interfaces.py      # Extract interfaces from existing code
│   └── config.py               # Configuration from existing code
├── ingestion/
│   ├── __init__.py
│   ├── document_processor.py   # Move from parser_utils.py + docling_doc_ingest_agent.py
│   └── chunking_strategies.py  # Extract from existing parsing logic
├── storage/
│   ├── __init__.py
│   └── local_storage.py        # Move from document_utils.py
├── rag/
│   ├── __init__.py
│   ├── base_rag.py             # Extract from chroma_query_engine.py + inmemory_query_engine.py
│   └── vector_rag.py           # Move chroma_query_engine.py
└── agents/
    ├── __init__.py
    ├── doc_agent.py            # Simplified version of document_agent.py
    └── ingestion_agent.py      # Move from docling_doc_ingest_agent.py
```
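`core/base_interfaces.py` is called out above but not shown. A speculative sketch of what the extracted interfaces might look like, using `typing.Protocol`; the method names are guesses, not the PR's actual API:

```python
from typing import Any, Protocol


class QueryEngine(Protocol):
    """Hypothetical shared surface that both DocumentIngestionService
    and DocAgent2 would depend on."""

    def add_documents(self, chunks: list[str], metadata: dict[str, Any] | None = None) -> None: ...
    def query(self, question: str) -> str: ...


class DocumentStorage(Protocol):
    """Hypothetical storage interface behind storage/local_storage.py
    and the cloud backends named in StorageConfig."""

    def save(self, name: str, data: bytes) -> str: ...
    def load(self, name: str) -> bytes: ...
```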
The refactored DocAgent transforms from a research prototype into a production/enterprise-ready AG2 feature.
@marklysze can you help review? Thank you!
*Force-pushed from 87d30b5 to da2723a.*
### Why are these changes needed?

Identified enterprise-readiness issues:

The base refactor solves the first two problems defined above, runtime performance and resource waste, by decoupling data ingestion from the parent architecture.
Example output:
### Related issue number

Closes #2078

### Checks