|
1 | | -# 🧠 AutoDocThinker: Intelligent Search Engine with Reasoning + Tool Usage Logic |
| 1 | +# 🧠 AutoDocThinker: Agentic RAG System with Intelligent Search Engine |
2 | 2 |
|
3 | | -[](https://github.com/user-attachments/assets/8305c81b-2d33-43fc-ab70-b4b036399355) |
4 | | - |
5 | | ---- |
| 3 | +[](https://github.com/user-attachments/assets/8d5c8a4c-cdc8-4569-8ade-af06b8318db9) |
6 | 4 |
|
7 | 5 | ## 🎯 **Project Overview** |
8 | 6 |
|
9 | | -This document search engine project presents an **Agentic AI system** built using the **LangGraph** framework and the **LLaMA-3** model via the **Groq** API. The system leverages **modular agents** (planner, executor, tools) with short-term memory and tool reasoning to solve user queries.
10 | | - |
11 | | -The system is capable of: |
12 | | - |
13 | | -* Dynamically deciding between document search vs. web search |
14 | | -* Summarizing and responding to user queries in bullet points |
15 | | -* Retaining the last 3 interactions for continuity |
16 | | - |
17 | | ---- |
18 | | - |
19 | | -## 📌 **Problem Statement** |
20 | | - |
21 | | -In the modern information era, users are overwhelmed with documents (PDFs, resumes, research papers) and an ever-growing web of online content. Searching, filtering, and understanding these sources efficiently remains a major challenge—especially when users need: |
22 | | - |
23 | | -* Answers extracted *only* from uploaded documents (e.g., resumes, proposals). |
24 | | -* Fresh and real-time information from the web (e.g., recent news, trends). |
25 | | -* Condensed summaries rather than raw search results. |
26 | | -* A system that can *reason*, plan, and decide how to answer. |
| 7 | +The Agentic RAG System is an AI-powered document intelligence platform that lets users extract insights from uploaded files (PDF, Word, plain text) or web URLs through natural language queries. Built with Python/Flask and LangChain, it uses a multi-agent workflow to process documents, retrieve relevant passages from a vector database (ChromaDB), and generate answers, falling back to Wikipedia when the uploaded documents cannot answer the query. The responsive web interface (HTML/CSS/Bootstrap) supports conversational questions, and the modular backend includes error handling, logging, and secure file processing.
27 | 8 |
|
28 | 9 | --- |
29 | 10 |
|
| 11 | +## 🚀 **Live Demo** |
30 | 12 |
|
31 | | -## 🚀 Live Demo |
32 | | - |
33 | | -🖥️ **Try it now**: [AutoDocThinker: Intelligent Search Engine with Reasoning + Tool Usage Logic](https://autodocthinker.onrender.com/) |
| 13 | +🖥️ **Try it now**: [AutoDocThinker: Agentic RAG System with Intelligent Search Engine](https://autodocthinker.onrender.com/) |
34 | 14 |
|
35 | 15 | --- |
36 | 16 |
|
37 | | -## ⚙️ Features & Functionalities |
38 | | - |
39 | | -| ✅ Step | 🧠 Feature | ⚙️ Tech Stack / Tool Used | 📝 Implementation Details | |
40 | | -|---------|------------|--------------------------|--------------------------| |
41 | | -| 1️⃣ | **LLM-based Query Understanding** | Groq (LLaMA-3-70B) | `ChatGroq` initialized with temperature=0.2 | |
42 | | -| 2️⃣ | **Document Processing** | PyPDFLoader + RecursiveTextSplitter | PDF chunking (500 chars with 100 overlap) | |
43 | | -| 3️⃣ | **Vector Embeddings** | HuggingFace (all-MiniLM-L6-v2) | Sentence transformers for semantic search | |
44 | | -| 4️⃣ | **Vector Database** | ChromaDB | Persistent storage at `../chroma_db` | |
45 | | -| 5️⃣ | **Web Search Tool** | DuckDuckGoSearchRun | Real-time information fallback | |
46 | | -| 6️⃣ | **Tool Routing** | Custom `tool_router()` | Keyword-based tool selection | |
47 | | -| 7️⃣ | **Short-Term Memory** | `deque(maxlen=3)` | Last 3 contexts tracking | |
48 | | -| 8️⃣ | **Planner Agent** | LangGraph Planner Node | Generates execution plans | |
49 | | -| 9️⃣ | **Executor Agent** | LangGraph Node | Orchestrates tool calls | |
50 | | -| 🔟 | **Summarization** | Groq LLM | Context condensation | |
51 | | -| 🖼️ | **Streamlit UI** | Streamlit | Interactive web interface | |
52 | | -| 🐳 | **Containerization** | Docker | Portable deployment | |
53 | | -| 🔁 | **CI/CD Pipeline** | GitHub Actions | Automated linting/testing | |
| 17 | +## ⚙️ **Features & Functionalities** |
| 18 | + |
| 19 | +| # | Module | Technology Stack | Implementation Details |
| 20 | +|----|----------------------|------------------------------|------------------------------------------|
| 21 | +| 1 | **LLM Processing** | Groq + LLaMA-3-70B | Configured with temperature 0.2 and token limits |
| 22 | +| 2 | **Document Parsing** | PyMuPDF + python-docx | Handles PDF, DOCX, and TXT with metadata preservation |
| 23 | +| 3 | **Text Chunking** | RecursiveCharacterTextSplitter| 500-character chunks with 20% overlap for context |
| 24 | +| 4 | **Vector Embeddings**| all-MiniLM-L6-v2 | Efficient 384-dimensional embeddings |
| 25 | +| 5 | **Vector Database** | ChromaDB | Local persistent storage with cosine similarity |
| 26 | +| 6 | **Agent Workflow** | LangGraph | 7 specialized nodes with conditional routing |
| 27 | +| 7 | **Planner Agent** | LangGraph Planner Node | Generates execution plans |
| 28 | +| 8 | **Executor Agent** | LangGraph Node | Orchestrates tool calls |
| 29 | +| 9 | **Web Fallback** | Wikipedia API | Auto-triggered when document confidence < threshold |
| 30 | +| 10 | **Memory System** | deque(maxlen=3) | Keeps the last 3 interactions as conversation history |
| 31 | +| 11 | **User Interface** | HTML, CSS, Bootstrap, JS | Interactive web app with file, URL, and text upload |
| 32 | +| 12 | **Containerization** | Docker | Portable deployment | |
| 33 | +| 13 | **CI/CD Pipeline** | GitHub Actions | Automated linting/testing | |
54 | 34 |
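
The chunking and memory rows in the table above can be illustrated with a small standard-library sketch. This is not the project's code: the README names LangChain's `RecursiveCharacterTextSplitter`, whereas this stand-in uses a plain sliding window to show what 500-character chunks with 20% (100-character) overlap look like, alongside a `deque(maxlen=3)` history buffer. The function names `chunk_text` and `remember` are hypothetical.

```python
from collections import deque

CHUNK_SIZE = 500                          # characters per chunk (from the table)
CHUNK_OVERLAP = int(CHUNK_SIZE * 0.20)    # 20% overlap -> 100 characters


def chunk_text(text, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    """Split text into fixed-size windows that share `overlap` characters.

    Simplified stand-in for a recursive splitter: it ignores paragraph and
    sentence boundaries and cuts purely by character count.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):     # last window reached the end
            break
    return chunks


def remember(memory, question, answer):
    """Append one interaction; the deque silently drops the oldest entry."""
    memory.append((question, answer))


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(1200))
    pieces = chunk_text(sample)
    print(len(pieces), [len(p) for p in pieces])

    history = deque(maxlen=3)             # same bound as in the table
    for i in range(4):
        remember(history, f"question {i}", f"answer {i}")
    print(list(history))
```

Consecutive chunks share their boundary characters, so a sentence cut at the end of one chunk is still present at the start of the next; the bounded deque keeps prompt size predictable by retaining only the most recent interactions.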
|
55 | 35 | --- |
56 | 36 |
|
57 | | -## 🧱 Project Structure |
| 37 | +## 🧱 **Project Structure** |
58 | 38 |
|
59 | 39 | ``` |
60 | 40 | AutoDocThinker/ |
61 | 41 | ├── .github/ |
62 | | -│ └── workflows/ |
63 | | -│ └── ci.yml # GitHub Actions CI/CD pipeline |
64 | | -| |
| 42 | +│ └── workflows/ |
| 43 | +│ └── main.yml |
| 44 | +│ |
| 45 | +├── agents/ |
| 46 | +│ ├── __init__.py
| 47 | +│ ├── document_processor.py |
| 48 | +│ └── orchestration.py |
| 49 | +│ |
65 | 50 | ├── data/ |
66 | | -│ └── sample.pdf # Sample document |
67 | | -| |
68 | | -├── notebook/ |
69 | | -│ └── experiment.ipynb # Full application logic |
70 | | -│ |
71 | | -├── chroma_db/ |
72 | | -│ |
| 51 | +│ └── sample.pdf |
| 52 | +│ |
| 53 | +├── notebooks/ |
| 54 | +│ └── experiment.ipynb |
| 55 | +│ |
| 56 | +├── static/ |
| 57 | +│ ├── css/ |
| 58 | +│ │ └── style.css |
| 59 | +│ └── js/ |
| 60 | +│ └── script.js |
| 61 | +│ |
| 62 | +├── templates/ |
| 63 | +│ └── index.html |
| 64 | +│ |
73 | 65 | ├── tests/ |
74 | | -│ └──test_app.py # Conversation memory |
75 | | -| |
76 | | -├── app.py # Streamlit UI |
77 | | -├── __init__.py |
78 | | -├── vectorstore.py # ChromaDB initialization |
79 | | -├── agents.py # LangGraph-based agent |
80 | | -|── logger.py # Logger configuration |
81 | | -├── logging_config.py # Configuration |
82 | | -├── setup.py # Package setup |
83 | | -├── main.py # Main py file |
84 | | -├── gitignore # Git ignore file |
85 | | -├── Dockerfile # Docker image |
86 | | -├── app.png # Demo picture |
87 | | -├── demo.webm # Demo video |
88 | | -├── requirements.txt # Python dependencies |
89 | | -├── README.md # Project documentation |
90 | | -├── LICENSE # Project license |
| 66 | +│ └── test_app.py |
| 67 | +│ |
| 68 | +├── uploads/ |
| 69 | +│ |
| 70 | +├── vector_db/ |
| 71 | +│ └── chroma_collection/ |
| 72 | +│ └── chroma.sqlite3 |
| 73 | +│ |
| 74 | +├── app.log |
| 75 | +├── app.py |
| 76 | +├── demo.mp4 |
| 77 | +├── demo.png |
| 78 | +├── Dockerfile |
| 79 | +├── LICENSE |
| 80 | +├── render.yaml |
| 81 | +├── README.md |
| 82 | +├── requirements.txt |
| 83 | +└── setup.py |
91 | 84 | ``` |
92 | 85 |
|
93 | 86 | --- |
94 | 87 |
|
95 | | -## 🧱 System Architecture |
| 88 | +## 🧱 **System Architecture** |
96 | 89 |
|
97 | 90 | ```mermaid |
98 | | -flowchart TD |
99 | | - %% Main Flow |
100 | | - A[User Query] --> B(Planner Agent) |
101 | | - B --> C["Generate Plan\n(SelectTool → Retrieve → Summarize)"] |
102 | | - C --> D{Tool Router} |
103 | | - D -->|Document Query| E[Document Retriever] |
104 | | - D -->|Web Search| F[DuckDuckGo] |
105 | | - E --> G[ChromaDB] |
106 | | - F --> H[Web Results] |
107 | | - G & H --> I[Summarizer] |
108 | | - I --> J[Update Memory] |
109 | | - J --> K[Final Answer] |
110 | | -
|
111 | | - %% Components |
112 | | - subgraph "Core System" |
113 | | - B -->|LangGraph| C |
114 | | - C -->|LangGraph| D |
115 | | - I -->|Groq LLaMA3| K |
116 | | - end |
117 | | -
|
118 | | - subgraph "Data Sources" |
119 | | - G[(ChromaDB\nVector Store)] |
120 | | - H[[Live Web]] |
121 | | - end |
122 | | -
|
123 | | - subgraph "Memory" |
124 | | - J[(Short-Term\nMemory)] |
125 | | - end |
126 | | -
|
127 | | - %% Style |
128 | | - linkStyle 0,1,2,3,4,5,6,7,8 stroke:#555,stroke-width:2px |
129 | | - style A fill:#4CAF50,color:white |
130 | | - style B fill:#2196F3,color:white |
131 | | - style D fill:#FF9800,color:black |
132 | | - style G fill:#9C27B0,color:white |
133 | | - style H fill:#009688,color:white |
134 | | - style J fill:#607D8B,color:white |
| 91 | +%% Agentic RAG System Architecture - Colorful Version |
| 92 | +graph TD |
| 93 | + A[User Interface]:::ui -->|Upload/Input| B[Flask Web Server]:::server |
| 94 | + B --> C[Tool Router Agent]:::router |
| 95 | + C -->|File| D[Document Processor]:::processor |
| 96 | + C -->|URL| E[Web Scraper]:::scraper |
| 97 | + C -->|Text| F[Text Preprocessor]:::preprocessor |
| 98 | + |
| 99 | + D --> G[PDF/DOCX/TXT Parser]:::parser |
| 100 | + E --> H[URL Content Extractor]:::extractor |
| 101 | + F --> I[Text Chunker]:::chunker |
| 102 | + |
| 103 | + G --> J[Chunking & Embedding]:::embedding |
| 104 | + H --> J |
| 105 | + I --> J |
| 106 | + |
| 107 | + J --> K[Vector Database]:::database |
| 108 | + |
| 109 | + B -->|Query| L[Planner Agent]:::planner |
| 110 | + L -->|Has Documents| M[Retriever Agent]:::retriever |
| 111 | + L -->|No Documents| N[Fallback Agent]:::fallback |
| 112 | + |
| 113 | + M --> K |
| 114 | + K --> O[LLM Answer Agent]:::llm |
| 115 | + N --> P[Wikipedia API]:::api |
| 116 | + P --> O |
| 117 | + |
| 118 | + O --> Q[Response Formatter]:::formatter |
| 119 | + Q --> B |
| 120 | + B --> A |
| 121 | +
|
| 122 | + classDef ui fill:#4e79a7,color:white,stroke:#333; |
| 123 | + classDef server fill:#f28e2b,color:white,stroke:#333; |
| 124 | + classDef router fill:#e15759,color:white,stroke:#333; |
| 125 | + classDef processor fill:#76b7b2,color:white,stroke:#333; |
| 126 | + classDef scraper fill:#59a14f,color:white,stroke:#333; |
| 127 | + classDef preprocessor fill:#edc948,color:#333,stroke:#333; |
| 128 | + classDef parser fill:#b07aa1,color:white,stroke:#333; |
| 129 | + classDef extractor fill:#ff9da7,color:#333,stroke:#333; |
| 130 | + classDef chunker fill:#9c755f,color:white,stroke:#333; |
| 131 | + classDef embedding fill:#bab0ac,color:#333,stroke:#333; |
| 132 | + classDef database fill:#8cd17d,color:#333,stroke:#333; |
| 133 | + classDef planner fill:#499894,color:white,stroke:#333; |
| 134 | + classDef retriever fill:#86bcb6,color:#333,stroke:#333; |
| 135 | + classDef fallback fill:#f1ce63,color:#333,stroke:#333; |
| 136 | + classDef llm fill:#d37295,color:white,stroke:#333; |
| 137 | + classDef api fill:#a0d6e5,color:#333,stroke:#333; |
| 138 | + classDef formatter fill:#b3b3b3,color:#333,stroke:#333; |
135 | 139 | ``` |
136 | 140 |
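
The two routing decisions in the diagram above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the repository's LangGraph implementation: the function names `route_input` and `plan_route`, the `best_score` argument, and the default threshold of `0.5` are all hypothetical (the README only says the fallback triggers when document confidence is below a threshold). The returned strings reuse the node labels from the diagram.

```python
# Ingestion side: the Tool Router Agent picks a handler per input type.
INGEST_ROUTES = {
    "file": "Document Processor",
    "url": "Web Scraper",
    "text": "Text Preprocessor",
}


def route_input(kind):
    """Map an input type (file, URL, text) to the diagram's processing node."""
    try:
        return INGEST_ROUTES[kind.lower()]
    except KeyError:
        raise ValueError(f"unsupported input type: {kind!r}") from None


def plan_route(has_documents, best_score=None, threshold=0.5):
    """Query side: choose between the Retriever Agent and the Fallback Agent.

    `best_score` stands for a retrieval confidence in [0, 1]; both it and
    `threshold` are assumptions made for this sketch.
    """
    if not has_documents:
        return "Fallback Agent"           # "No Documents" edge in the diagram
    if best_score is not None and best_score < threshold:
        return "Fallback Agent"           # low confidence -> Wikipedia fallback
    return "Retriever Agent"              # "Has Documents" edge


if __name__ == "__main__":
    print(route_input("URL"))
    print(plan_route(has_documents=False))
    print(plan_route(has_documents=True, best_score=0.9))
```

Keeping the routing as pure functions makes the decision logic easy to unit-test separately from the LLM and vector store calls.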
|
137 | 141 | --- |
138 | 142 |
|
| 143 | +## 🌍 **Real-World Applications**
| 144 | + |
| 145 | + 1. **Corporate HR Automation** |
| 146 | + 2. **Legal Document Review** |
| 147 | + 3. **Academic Research** |
| 148 | + 4. **Customer Support** |
| 149 | + 5. **Healthcare Compliance** |
| 150 | + 6. **Financial Analysis** |
| 151 | + 7. **Media Monitoring** |
| 152 | + 8. **Education** |
| 153 | + 9. **Technical Documentation** |
| 154 | + 10. **Government Transparency** |
| 155 | + |
| 156 | +--- |
| 157 | + |
139 | 158 | ## 📥 Installation |
140 | 159 |
|
141 | 160 | ```bash |
@@ -213,10 +232,4 @@ jobs: |
213 | 232 | 🔗 Facebook: [mdemon.hasan2001/](https://www.facebook.com/mdemon.hasan2001/) |
214 | 233 | 🔗 WhatsApp: [8801834363533](https://wa.me/8801834363533) |
215 | 234 |
|
216 | | ---- |
217 | | -
|
218 | | -## 📄 License |
219 | | -
|
220 | | -MIT License — Free to use, share, and contribute. |
221 | | -
|
222 | 235 | --- |