Md-Emon-Hasan
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
Lines changed: 38 additions & 0 deletions
diff --git a/Dockerfile b/Dockerfile
Lines changed: 17 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 221 additions & 0 deletions
diff --git a/__pycache__/config.cpython-311.pyc b/__pycache__/config.cpython-311.pyc
377 Bytes
diff --git a/__pycache__/logger.cpython-311.pyc b/__pycache__/logger.cpython-311.pyc
1.37 KB
diff --git a/app.png b/app.png
84.9 KB
diff --git a/app.py b/app.py
Lines changed: 46 additions & 0 deletions
diff --git a/config.py b/config.py
Lines changed: 7 additions & 0 deletions
diff --git a/data/Invoice.pdf b/data/Invoice.pdf
98.7 KB
diff --git a/ingestion/__pycache__/loader.cpython-311.pyc b/ingestion/__pycache__/loader.cpython-311.pyc
2.02 KB
@@ -0,0 +1,38 @@
+name: Docker Image CI-CD
+
+on:
+  push:
+    branches: [ "master" ]
+  pull_request:
+    branches: [ "master" ]
+
+jobs:
+  build:
+
+    runs-on: ubuntu-latest
+
+    steps:
+    - name: Checkout code
+      uses: actions/checkout@v3
+
+    - name: Set up Python
+      uses: actions/setup-python@v4
+      with:
+        python-version: '3.x'
+
+    - name: Set up Docker Buildx
+      uses: docker/setup-buildx-action@v2
+
+    - name: Cache Docker layers
+      uses: actions/cache@v3
+      with:
+        path: /tmp/.buildx-cache
+        key: ${{ runner.os }}-buildx-${{ github.sha }}
+        restore-keys: |
+          ${{ runner.os }}-buildx-
+
+    - name: Build Docker image
+      run: docker build -t auto-doc-thinker .
+
+    - name: Test the application (Run tests inside container)
+      run: docker run --rm auto-doc-thinker pytest tests/
@@ -0,0 +1,17 @@
+# Use an official Python runtime as a parent image
+FROM python:3.11-slim
+
+# Set the working directory
+WORKDIR /main
+
+# Copy the current directory contents into the container
+COPY . /main
+
+# Install the dependencies
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Expose Streamlit default port
+EXPOSE 8501
+
+# Correct command to run Streamlit app
+CMD ["streamlit", "run", "main.py", "--server.port=8501", "--server.address=0.0.0.0"]
@@ -0,0 +1,221 @@
+# 🧠 AutoDocThinker: Intelligent Search Engine with Reasoning + Tool Usage Logic
+
+## 📌 Overview
+
+This project is an **AI-powered Question Answering Web App** that allows users to upload documents (PDF, DOCX, HTML) or provide URLs, and ask questions about the content. The system uses a **Retrieval-Augmented Generation (RAG)** architecture combined with **LangGraph** for structured reasoning, **Gemini LLM** for responses, and **ChromaDB** for vector storage.
+
+✅ File & URL ingestion
+✅ Chunking with Recursive Splitter
+✅ SentenceTransformer Embeddings
+✅ ChromaDB Vector Store
+✅ LangGraph Reasoning Pipeline
+✅ DuckDuckGo External Tool
+✅ Streamlit Interface
+✅ Logger Integration
+✅ Short-term Memory
+✅ 🐳 Docker + CI/CD ready
+
+---
+
+## 🧱 Project Structure
+
+```
+AutoDocThinker/
+├── main.py                        # Main Streamlit UI
+├── data/
+│   └── Invoice.pdf                # Sample document
+├── ingestion/
+│   └── loader.py                  # Handles document & URL ingestion
+├── processing/
+│   ├── chunker.py                 # Chunk documents
+│   └── embeddings.py              # Load HuggingFace embeddings
+├── vectorstore/
+│   └── chroma_store.py            # ChromaDB initialization
+├── reasoning/
+│   └── langgraph_chain.py         # LangGraph-based RAG pipeline
+├── utils/
+│   └── memory.py                  # Conversation memory
+├── tests/
+│   └── test_app.py                # Tests (run with pytest)
+├── logger.py                      # Logger configuration
+├── config.py                      # Configuration for secrets and constants
+├── setup.py                       # Package setup
+├── .gitignore                     # Git ignore file
+├── Dockerfile                     # Docker image
+├── .github/
+│   └── workflows/
+│       └── main.yml               # GitHub Actions CI/CD pipeline
+├── requirements.txt               # Python dependencies
+└── README.md                      # Project documentation
+```
+
+---
+
+## ⚙️ Features
+
+| Feature                       | Description                                      |
+| ----------------------------- | ------------------------------------------------ |
+| 📄 Multi-format ingestion     | Supports PDF, DOCX, HTML, and URLs               |
+| 🔄 Modular design             | Organized into reusable components               |
+| 🧩 LangGraph planner-executor | Custom planner & executor with LLM               |
+| 🧠 Memory                     | Maintains short-term conversational context      |
+| 🌐 DuckDuckGo tool            | External real-time info search                   |
+| 📦 ChromaDB                   | Embedded document storage & retrieval            |
+| 🖼️ Streamlit UI              | Elegant and interactive web interface            |
+| 🐳 Docker                     | Containerized for easy deployment                |
+| ✅ CI/CD                       | GitHub Actions pipeline for linting and testing  |
+
+---
+
+## 📥 Installation
+
+```bash
+# 1. Clone the repository
+git clone https://github.com/Md-Emon-Hasan/AutoDocThinker.git
+cd AutoDocThinker
+
+# 2. Install dependencies
+pip install -r requirements.txt
+```
+
+Or with Docker:
+
+```bash
+# Build Docker Image
+docker build -t auto-doc-thinker .
+
+# Run the container
+docker run -p 8501:8501 auto-doc-thinker
+```
+
+---
+
+## 🔑 Configuration
+
+Set the `GOOGLE_API_KEY` environment variable to your Google Gemini API key. The remaining settings live in `config.py`:
+
+```python
+import os
+
+PERSIST_DIR = "./chroma_db"
+EMBEDDING_MODEL = "all-MiniLM-L6-v2"
+CHUNK_SIZE = 500
+CHUNK_OVERLAP = 100
+GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
+```
+
+---
+
+## 🧠 How It Works
+
+1. **Ingestion**
+
+   * Accepts file upload or URL.
+   * Loads the content with the loader that matches the input type (PDF, DOCX, HTML, or URL).
+
+2. **Chunking**
+
+   * Breaks documents using recursive splitting.
+
+3. **Embeddings + Vector Store**
+
+   * Converts chunks into embeddings via SentenceTransformers.
+   * Stores in ChromaDB.
+
+4. **LangGraph Reasoning**
+
+   * Uses `planner → executor` structure.
+   * Planner routes user query to executor.
+   * Executor uses retriever to fetch documents, then Gemini LLM to generate response.
+
+5. **External Tools**
+
+   * If RAG fails or needs additional info, falls back to DuckDuckGo tool.
+
+6. **Conversation Memory**
+
+   * Short-term memory for context in multi-turn dialogue.
+
+---
+
+## 🐳 Docker Setup
+
+**Dockerfile:**
+
+```dockerfile
+FROM python:3.11-slim
+
+WORKDIR /app
+
+COPY . .
+
+RUN pip install --upgrade pip && \
+    pip install -r requirements.txt
+
+EXPOSE 8501
+
+CMD ["streamlit", "run", "main.py", "--server.port=8501", "--server.address=0.0.0.0"]
+```
+
+---
+
+## 🔁 GitHub Actions CI/CD
+
+**.github/workflows/main.yml**
+
+```yaml
+name: CI
+
+on:
+  push:
+    branches: [ master ]
+  pull_request:
+    branches: [ master ]
+
+jobs:
+  build-and-test:
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.11"
+      - name: Install dependencies
+        run: |
+          pip install -r requirements.txt
+      - name: Lint with flake8
+        run: |
+          pip install flake8
+          flake8 .
+```
+
+---
+
+## 📝 Future Enhancements
+
+* ✅ Multilingual document ingestion
+* ✅ Audio document ingestion + whisper
+* ⏳ Long-term memory + history viewer
+* ⏳ MongoDB/FAISS alternative for Chroma
+* ✅ More tools (WolframAlpha, SerpAPI)
+* ⏳ Model selection dropdown (Gemini, LLaMA, GPT-4)
+
+---
+
+## 👨‍💻 Author
+
+**Md Emon Hasan**
+📧 [iconicemon01@gmail.com](mailto:iconicemon01@gmail.com)
+🔗 [LinkedIn](https://www.linkedin.com/in/md-emon-hasan)
+🔗 [GitHub](https://github.com/Md-Emon-Hasan)
+🔗 [Facebook](https://www.facebook.com/mdemon.hasan2001/)
+🔗 [WhatsApp](https://wa.me/8801834363533)
+
+---
+
+## 📄 License
+
+MIT License — Free to use, share, and contribute.
+
+---
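The chunking step described in the README above relies on a chunk size and an overlap (`CHUNK_SIZE = 500`, `CHUNK_OVERLAP = 100`). The sketch below is not the project's `chunk_documents` and is not a recursive splitter; it is a simplified fixed-size character window, and the function name `chunk_text` is hypothetical. It only illustrates how the two settings interact: each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so neighbouring chunks share `chunk_overlap` characters.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration; a recursive splitter would also
    try to break on paragraph and sentence boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # distance between chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(1200))
    pieces = chunk_text(sample)
    print([len(p) for p in pieces])
```

A larger overlap lowers the chance that a sentence relevant to a query is cut in half at a chunk boundary, at the cost of storing more embeddings.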
@@ -0,0 +1,46 @@
+from ingestion.loader import load_documents
+from processing.chunker import chunk_documents
+from processing.embeddings import get_embedding_model
+from vectorstore.chroma_store import build_vector_store, get_retriever
+from reasoning.langgraph_chain import create_langgraph
+from reasoning.tools import get_tools
+from utils.memory import get_memory
+from config import GOOGLE_API_KEY
+from logger import setup_logger
+
+import sys
+sys.stdout.reconfigure(encoding='utf-8')
+
+from langchain_google_genai import ChatGoogleGenerativeAI
+from langchain.agents import initialize_agent
+from langchain.agents.agent_types import AgentType
+
+logger = setup_logger(__name__)
+
+def main():
+    logger.info("Starting Document Search AI System")
+    llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", google_api_key=GOOGLE_API_KEY)
+    
+    docs = load_documents("data/Invoice.pdf")
+    chunks = chunk_documents(docs)
+    embeddings = get_embedding_model()
+    vectordb = build_vector_store(chunks, embeddings)
+    retriever = get_retriever(vectordb)
+    
+    buffer_memory, _ = get_memory(llm)
+    tools = get_tools()
+    
+    agent_executor = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, memory=buffer_memory)
+    graph = create_langgraph(llm)
+
+    question = "How to get jobs in Generative AI?"
+    state = {"question": question, "retriever": retriever}
+    
+    logger.info("Running reasoning chain via LangGraph")
+    result = graph.invoke(state)
+    print("\n📌 Answer:", result["answer"])
+
+    print("\n🌐 Tool Result:", agent_executor.run("Find latest GenAI job postings."))
+
+if __name__ == "__main__":
+    main()
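The README describes the DuckDuckGo tool as a fallback for when retrieval fails, while `main()` above calls the graph and the agent one after the other. A minimal sketch of the conditional version follows. Every name in it (`answer_with_fallback`, `retrieve`, `generate`, `web_search`) is hypothetical, and the three callables are stand-ins for the retriever, the Gemini call, and the search tool rather than the project's actual interfaces.

```python
from typing import Callable, Dict, List


def answer_with_fallback(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    web_search: Callable[[str], str],
) -> Dict[str, str]:
    """Answer from retrieved chunks; use web search only when retrieval finds nothing."""
    docs = retrieve(question)
    if docs:
        # Retrieval succeeded: let the LLM answer from the document chunks.
        return {"answer": generate(question, docs), "source": "documents"}
    # Retrieval returned nothing: fall back to the external tool.
    return {"answer": web_search(question), "source": "web"}


if __name__ == "__main__":
    # Stub callables so the sketch runs without any API key or vector store.
    result = answer_with_fallback(
        "What is the invoice total?",
        retrieve=lambda q: [],
        generate=lambda q, docs: f"Based on {len(docs)} chunk(s)",
        web_search=lambda q: f"Web results for: {q}",
    )
    print(result["source"], "->", result["answer"])
```

An empty result list is only one possible failure signal; a similarity-score threshold would be another, but that depends on what the retriever exposes.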
@@ -0,0 +1,7 @@
+import os
+
+PERSIST_DIR = "./chroma_db"
+EMBEDDING_MODEL = "all-MiniLM-L6-v2"
+CHUNK_SIZE = 500
+CHUNK_OVERLAP = 100
+GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
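`os.getenv` returns `None` when the variable is not set, so a missing key would only surface later, when the LLM client is constructed or first called. One way to fail early is sketched below; `require_env` is a hypothetical helper, not part of this commit.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


if __name__ == "__main__":
    try:
        key = require_env("GOOGLE_API_KEY")
        print("GOOGLE_API_KEY is configured")
    except RuntimeError as exc:
        print(exc)
```

With such a helper, `config.py` could assign `GOOGLE_API_KEY = require_env("GOOGLE_API_KEY")`, with the trade-off that importing `config` then fails in environments that never need the key, such as a lint-only CI job.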