
Commit 7ab40ae

update full workflow and add UI
1 parent d16e985 commit 7ab40ae

36 files changed: +2150 / -975 lines

Dockerfile

Lines changed: 6 additions & 6 deletions
````diff
@@ -2,16 +2,16 @@
 FROM python:3.11-slim
 
 # Set the working directory
-WORKDIR /main
+WORKDIR /app
 
 # Copy the current directory contents into the container
-COPY . /main
+COPY . /app
 
 # Install the dependencies
 RUN pip install --no-cache-dir -r requirements.txt
 
-# Expose Streamlit default port
-EXPOSE 8501
+# Expose the port Flask runs on
+EXPOSE 5000
 
-# Correct command to run Streamlit app
-CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
+# Command to run the Flask app
+CMD ["python", "app.py"]
````

README.md

Lines changed: 127 additions & 114 deletions
````diff
@@ -1,141 +1,160 @@
-# 🧠 AutoDocThinker: Intelligent Search Engine with Reasoning + Tool Usage Logic
+# 🧠 AutoDocThinker: Agentic RAG System with Intelligent Search Engine
 
-[![AutoDocThinker](https://github.com/user-attachments/assets/8305c81b-2d33-43fc-ab70-b4b036399355)](https://github.com/user-attachments/assets/8305c81b-2d33-43fc-ab70-b4b036399355)
-
----
+[![AutoDocThinker](https://github.com/user-attachments/assets/8d5c8a4c-cdc8-4569-8ade-af06b8318db9)](https://github.com/user-attachments/assets/8d5c8a4c-cdc8-4569-8ade-af06b8318db9)
 
 ## 🎯 **Project Overview**
 
-This is document search engine project presents an **Agentic AI system** built using the **LangGraph** framework and **LLaMA-3** model via **Groq** API. The system leverages **modular agents** (planner, executor, tools) with short-term memory and tool reasoning to solve user queries using:
-
-The system is capable of:
-
-* Dynamically deciding between document search vs. web search
-* Summarizing and responding to user queries in bullet points
-* Retaining the last 3 interactions for continuity
-
----
-
-## 📌 **Problem Statement**
-
-In the modern information era, users are overwhelmed with documents (PDFs, resumes, research papers) and an ever-growing web of online content. Searching, filtering, and understanding these sources efficiently remains a major challenge—especially when users need:
-
-* Answers extracted *only* from uploaded documents (e.g., resumes, proposals).
-* Fresh and real-time information from the web (e.g., recent news, trends).
-* Condensed summaries rather than raw search results.
-* A system that can *reason*, plan, and decide how to answer.
+The Agentic RAG System is an AI-powered document intelligence platform that enables users to extract insights from uploaded files (PDFs, Word docs, text) or web URLs through natural language queries. Built with Python/Flask and LangChain, the system uses a multi-agent workflow to process documents, retrieve relevant information from a vector database (ChromaDB), and generate human-like answers, seamlessly falling back to Wikipedia when needed. The responsive web interface (HTML/CSS/Bootstrap) allows users to ask questions conversationally, while the modular backend provides robust error handling, logging, and secure file processing.
 
 ---
 
+## 🚀 **Live Demo**
 
-## 🚀 Live Demo
-
-🖥️ **Try it now**: [AutoDocThinker: Intelligent Search Engine with Reasoning + Tool Usage Logic](https://autodocthinker.onrender.com/)
+🖥️ **Try it now**: [AutoDocThinker: Agentic RAG System with Intelligent Search Engine](https://autodocthinker.onrender.com/)
 
 ---
 
````
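The overview above says uploads can be PDF, Word, or plain text, and the feature table later in this diff attributes parsing to PyMuPDF and python-docx. A rough sketch of that combination (an illustration under those assumptions, not the repository's `document_processor.py`):

```python
# Illustrative text extraction for the upload types named in the README.
from pathlib import Path

import fitz  # PyMuPDF
from docx import Document  # python-docx


def extract_text(path: str) -> str:
    """Return the raw text of a PDF, DOCX, or TXT file."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        with fitz.open(path) as pdf:
            return "\n".join(page.get_text() for page in pdf)
    if suffix == ".docx":
        return "\n".join(p.text for p in Document(path).paragraphs)
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8", errors="ignore")
    raise ValueError(f"Unsupported file type: {suffix}")
```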
## ⚙️ Features & Functionalities
38-
39-
| ✅ Step | 🧠 Feature | ⚙️ Tech Stack / Tool Used | 📝 Implementation Details |
40-
|---------|------------|--------------------------|--------------------------|
41-
| 1️⃣ | **LLM-based Query Understanding** | Groq (LLaMA-3-70B) | `ChatGroq` initialized with temperature=0.2 |
42-
| 2️⃣ | **Document Processing** | PyPDFLoader + RecursiveTextSplitter | PDF chunking (500 chars with 100 overlap) |
43-
| 3️⃣ | **Vector Embeddings** | HuggingFace (all-MiniLM-L6-v2) | Sentence transformers for semantic search |
44-
| 4️⃣ | **Vector Database** | ChromaDB | Persistent storage at `../chroma_db` |
45-
| 5️⃣ | **Web Search Tool** | DuckDuckGoSearchRun | Real-time information fallback |
46-
| 6️⃣ | **Tool Routing** | Custom `tool_router()` | Keyword-based tool selection |
47-
| 7️⃣ | **Short-Term Memory** | `deque(maxlen=3)` | Last 3 contexts tracking |
48-
| 8️⃣ | **Planner Agent** | LangGraph Planner Node | Generates execution plans |
49-
| 9️⃣ | **Executor Agent** | LangGraph Node | Orchestrates tool calls |
50-
| 🔟 | **Summarization** | Groq LLM | Context condensation |
51-
| 🖼️ | **Streamlit UI** | Streamlit | Interactive web interface |
52-
| 🐳 | **Containerization** | Docker | Portable deployment |
53-
| 🔁 | **CI/CD Pipeline** | GitHub Actions | Automated linting/testing |
17+
## ⚙️ **Features & Functionalities**
18+
19+
| # | Module | Technology Stack | Your Implementation Details |
20+
|----|----------------------|------------------------------|------------------------------------------|
21+
| 1 | **LLM Processing** | Groq + LLaMA-3-70B | Configured with optimal temperature (0.2) and token limits |
22+
| 2 | **Document Parsing** | PyMuPDF + python-docx | Handled PDF, DOCX, TXT with metadata preservation |
23+
| 3 | **Text Chunking** | RecursiveCharacterTextSplitter| 500-character chunks with 20% overlap for context |
24+
| 4 | **Vector Embeddings**| all-MiniLM-L6-v2 | Efficient 384-dimensional embeddings |
25+
| 5 | **Vector Database** | ChromaDB | Local persistent storage with cosine similarity |
26+
| 6 | **Agent Workflow** | LangGraph | 7 specialized nodes with conditional routing |
27+
| 7 | **Planner Agent** | LangGraph Planner Node | Generates execution plans |
28+
| 8 | **Executor Agent** | LangGraph Node | Orchestrates tool calls |
29+
| 9 | **Web Fallback** | Wikipedia API | Auto-triggered when document confidence < threshold |
30+
| 10 | **Memory System** | deque(maxlen=3) | Maintained conversation history buffer |
31+
| 11 | **User Interface** | HTML, CSS, Bootstrap, JS | Interactive web app with file, URL, Text upload |
32+
| 12 | **Containerization** | Docker | Portable deployment |
33+
| 13 | **CI/CD Pipeline** | GitHub Actions | Automated linting/testing |
5434

5535
---
5636

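Rows 3 to 5 of the new table describe the ingestion pipeline: 500-character chunks with 20% overlap, all-MiniLM-L6-v2 embeddings, and a persistent ChromaDB collection with cosine similarity. A minimal sketch of that configuration (collection name and storage path are assumptions, not taken from this commit):

```python
# Illustrative chunk -> embed -> store -> retrieve pipeline.
import chromadb
from langchain_text_splitters import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)  # 20% overlap
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional vectors

client = chromadb.PersistentClient(path="vector_db")
collection = client.get_or_create_collection(
    "chroma_collection", metadata={"hnsw:space": "cosine"}  # cosine similarity
)


def ingest(text: str, source: str) -> None:
    """Split raw text into chunks, embed them, and store them with metadata."""
    chunks = splitter.split_text(text)
    collection.add(
        ids=[f"{source}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embedder.encode(chunks).tolist(),
        metadatas=[{"source": source} for _ in chunks],
    )


def retrieve(query: str, k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query."""
    result = collection.query(
        query_embeddings=[embedder.encode(query).tolist()], n_results=k
    )
    return result["documents"][0]
```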
````diff
-## 🧱 Project Structure
+## 🧱 **Project Structure**
 
 ```
 AutoDocThinker/
 ├── .github/
-│ └── workflows/
-│ └── ci.yml # GitHub Actions CI/CD pipeline
-|
+│   └── workflows/
+│       └── main.yml
+
+├── agents/
+│   ├── __init__.py
+│   ├── document_processor.py
+│   └── orchestration.py
+
 ├── data/
-│ └── sample.pdf # Sample document
-|
-├── notebook/
-│ └── experiment.ipynb # Full application logic
-
-├── chroma_db/
-
+│   └── sample.pdf
+
+├── notebooks/
+│   └── experiment.ipynb
+
+├── static/
+│   ├── css/
+│   │   └── style.css
+│   └── js/
+│       └── script.js
+
+├── templates/
+│   └── index.html
+
 ├── tests/
-│ └──test_app.py # Conversation memory
-|
-├── app.py # Streamlit UI
-├── __init__.py
-├── vectorstore.py # ChromaDB initialization
-├── agents.py # LangGraph-based agent
-|── logger.py # Logger configuration
-├── logging_config.py # Configuration
-├── setup.py # Package setup
-├── main.py # Main py file
-├── gitignore # Git ignore file
-├── Dockerfile # Docker image
-├── app.png # Demo picture
-├── demo.webm # Demo video
-├── requirements.txt # Python dependencies
-├── README.md # Project documentation
-├── LICENSE # Project license
+│   └── test_app.py
+
+├── uploads/
+
+├── vector_db/
+│   └── chroma_collection/
+│       └── chroma.sqlite3
+
+├── app.log
+├── app.py
+├── demo.mp4
+├── demo.png
+├── Dockerfile
+├── LICENSE
+├── render.yaml
+├── README.md
+├── requirements.txt
+└── setup.py
 ```
 
 ---
 
````
````diff
-## 🧱 System Architecture
+## 🧱 **System Architecture**
 
 ```mermaid
-flowchart TD
-    %% Main Flow
-    A[User Query] --> B(Planner Agent)
-    B --> C["Generate Plan\n(SelectTool → Retrieve → Summarize)"]
-    C --> D{Tool Router}
-    D -->|Document Query| E[Document Retriever]
-    D -->|Web Search| F[DuckDuckGo]
-    E --> G[ChromaDB]
-    F --> H[Web Results]
-    G & H --> I[Summarizer]
-    I --> J[Update Memory]
-    J --> K[Final Answer]
-
-    %% Components
-    subgraph "Core System"
-        B -->|LangGraph| C
-        C -->|LangGraph| D
-        I -->|Groq LLaMA3| K
-    end
-
-    subgraph "Data Sources"
-        G[(ChromaDB\nVector Store)]
-        H[[Live Web]]
-    end
-
-    subgraph "Memory"
-        J[(Short-Term\nMemory)]
-    end
-
-    %% Style
-    linkStyle 0,1,2,3,4,5,6,7,8 stroke:#555,stroke-width:2px
-    style A fill:#4CAF50,color:white
-    style B fill:#2196F3,color:white
-    style D fill:#FF9800,color:black
-    style G fill:#9C27B0,color:white
-    style H fill:#009688,color:white
-    style J fill:#607D8B,color:white
+%% Agentic RAG System Architecture - Colorful Version
+graph TD
+    A[User Interface]:::ui -->|Upload/Input| B[Flask Web Server]:::server
+    B --> C[Tool Router Agent]:::router
+    C -->|File| D[Document Processor]:::processor
+    C -->|URL| E[Web Scraper]:::scraper
+    C -->|Text| F[Text Preprocessor]:::preprocessor
+
+    D --> G[PDF/DOCX/TXT Parser]:::parser
+    E --> H[URL Content Extractor]:::extractor
+    F --> I[Text Chunker]:::chunker
+
+    G --> J[Chunking & Embedding]:::embedding
+    H --> J
+    I --> J
+
+    J --> K[Vector Database]:::database
+
+    B -->|Query| L[Planner Agent]:::planner
+    L -->|Has Documents| M[Retriever Agent]:::retriever
+    L -->|No Documents| N[Fallback Agent]:::fallback
+
+    M --> K
+    K --> O[LLM Answer Agent]:::llm
+    N --> P[Wikipedia API]:::api
+    P --> O
+
+    O --> Q[Response Formatter]:::formatter
+    Q --> B
+    B --> A
+
+    classDef ui fill:#4e79a7,color:white,stroke:#333;
+    classDef server fill:#f28e2b,color:white,stroke:#333;
+    classDef router fill:#e15759,color:white,stroke:#333;
+    classDef processor fill:#76b7b2,color:white,stroke:#333;
+    classDef scraper fill:#59a14f,color:white,stroke:#333;
+    classDef preprocessor fill:#edc948,color:#333,stroke:#333;
+    classDef parser fill:#b07aa1,color:white,stroke:#333;
+    classDef extractor fill:#ff9da7,color:#333,stroke:#333;
+    classDef chunker fill:#9c755f,color:white,stroke:#333;
+    classDef embedding fill:#bab0ac,color:#333,stroke:#333;
+    classDef database fill:#8cd17d,color:#333,stroke:#333;
+    classDef planner fill:#499894,color:white,stroke:#333;
+    classDef retriever fill:#86bcb6,color:#333,stroke:#333;
+    classDef fallback fill:#f1ce63,color:#333,stroke:#333;
+    classDef llm fill:#d37295,color:white,stroke:#333;
+    classDef api fill:#a0d6e5,color:#333,stroke:#333;
+    classDef formatter fill:#b3b3b3,color:#333,stroke:#333;
 ```
 
 ---
 
````
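The new diagram shows a planner that routes a query to either the retriever or a Wikipedia fallback before a single answer step, and the feature table describes the workflow as LangGraph nodes with conditional routing. A rough sketch of how that routing could be wired in LangGraph (node bodies are stubs; state fields and node names are assumptions, not this repository's `orchestration.py`):

```python
# Illustrative LangGraph workflow: planner -> retriever | wikipedia -> answer.
from typing import TypedDict

from langgraph.graph import END, StateGraph


class AgentState(TypedDict):
    question: str
    context: str
    answer: str
    has_documents: bool


def planner(state: AgentState) -> AgentState:
    # Decide whether indexed documents can answer the question (stub).
    return state


def retriever(state: AgentState) -> AgentState:
    # Query the ChromaDB collection for relevant chunks (stub).
    return {**state, "context": "<chunks from the vector store>"}


def wikipedia_fallback(state: AgentState) -> AgentState:
    # Fetch background from the Wikipedia API when no documents match (stub).
    return {**state, "context": "<summary fetched from Wikipedia>"}


def answer(state: AgentState) -> AgentState:
    # Call the Groq-hosted LLaMA-3 model with the gathered context (stub).
    return {**state, "answer": f"Answer based on: {state['context']}"}


graph = StateGraph(AgentState)
graph.add_node("planner", planner)
graph.add_node("retriever", retriever)
graph.add_node("wikipedia", wikipedia_fallback)
graph.add_node("answer", answer)

graph.set_entry_point("planner")
graph.add_conditional_edges(
    "planner", lambda s: "retriever" if s["has_documents"] else "wikipedia"
)
graph.add_edge("retriever", "answer")
graph.add_edge("wikipedia", "answer")
graph.add_edge("answer", END)

workflow = graph.compile()
result = workflow.invoke(
    {"question": "What is this document about?", "context": "", "answer": "", "has_documents": True}
)
```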
````diff
+### 🌍 **Real-World Applications**
+
+1. **Corporate HR Automation**
+2. **Legal Document Review**
+3. **Academic Research**
+4. **Customer Support**
+5. **Healthcare Compliance**
+6. **Financial Analysis**
+7. **Media Monitoring**
+8. **Education**
+9. **Technical Documentation**
+10. **Government Transparency**
+
+---
+
 ## 📥 Installation
 
 ```bash
@@ -213,10 +232,4 @@ jobs:
 🔗 Facebook: [mdemon.hasan2001/](https://www.facebook.com/mdemon.hasan2001/)
 🔗 WhatsApp: [8801834363533](https://wa.me/8801834363533)
 
----
-
-## 📄 License
-
-MIT License — Free to use, share, and contribute.
-
 ---
````
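The feature table also lists a short-term memory system built on `deque(maxlen=3)`. A minimal sketch of how such a buffer might be kept (the stored turn format is an assumption):

```python
# Illustrative rolling conversation buffer: only the last 3 turns survive.
from collections import deque

memory: deque[dict] = deque(maxlen=3)


def remember(question: str, answer: str) -> None:
    """Append a turn; beyond three turns the oldest is dropped automatically."""
    memory.append({"question": question, "answer": answer})


def conversation_context() -> str:
    """Render the retained turns as context for the next prompt."""
    return "\n".join(f"Q: {t['question']}\nA: {t['answer']}" for t in memory)
```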

__pycache__/agents.cpython-311.pyc

-5.35 KB
Binary file not shown.

__pycache__/config.cpython-311.pyc

-455 Bytes
Binary file not shown.

__pycache__/tools.cpython-311.pyc

-3.96 KB
Binary file not shown.
-2.46 KB
Binary file not shown.
