Outline RAG

A Retrieval Augmented Generation (RAG) system that transforms your Outline knowledge base into an intelligent, queryable AI assistant. This project ingests documentation from Outline, processes it using advanced chunking techniques, and provides a conversational interface powered by large language models.

⚠️ Disclaimer: This is not a production-ready project. Read the Project Reflection & Recommendations section before using it in a real-world scenario.

🎯 What This Project Does

This system creates a bridge between your Outline documentation and AI, enabling you to:

  • Query your knowledge base using natural language questions
  • Get contextual answers based on your actual documentation content
  • Access your knowledge through multiple interfaces (chat, MCP server)
  • Maintain up-to-date embeddings of your documentation

Key Features

  • 🔄 Ingestion: Fetches and processes documents from the Outline API
  • 🧠 Intelligent Chunking: Uses agentic chunking to create semantically meaningful document segments
  • 🔍 Vector Search: Leverages PostgreSQL with pgvector for fast similarity search
  • 🤖 AI-Powered Responses: Uses LLM models for generation and embeddings
  • 🔌 MCP Integration: Exposes functionality through the Model Context Protocol for external clients
  • 💬 Interactive Chat: Provides a CLI-based chat interface for querying your RAG locally

πŸ—οΈ Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Outline API   │───▶│   Ingestion      │───▶│  Vector Store   │
│  (Documents)    │    │ (Agentic Chunker)│    │(PostgreSQL +    │
│                 │    │                  │    │  pgvector)      │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
                                                        ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Chat Interface │◄───│   RAG Workflow   │◄───│   Retriever     │
│  or MCP Client  │    │ (LangGraph +     │    │                 │
│                 │    │  LangChain)      │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘

🚀 Quick Start

Prerequisites

  • Node.js 18+ and npm
  • PostgreSQL database with pgvector extension
  • Outline instance with API access
  • API keys for Groq and Hugging Face

1. Installation

git clone https://github.com/mateodevia/agentic-rag-outline.git
cd agentic-rag-outline
npm install

2. Environment Setup

Create a .env file with the following variables:

# Database Configuration
PG_CONNECTION_STRING=postgresql://user:password@localhost:5432/your_db

# Outline API Configuration
OUTLINE_URL=https://your-outline-instance.com
OUTLINE_API_KEY=your_outline_api_key

# AI Model Configuration
GROQ_API_KEY=your_groq_api_key
HUGGINGFACEHUB_API_KEY=your_huggingface_api_key

# RAG Configuration (New!)
LANGUAGE=english
COMPANY_CONTEXT=Your Company Name and relevant context for better responses

# Optional: LangSmith Tracing
LANGSMITH_TRACING=false
LANGSMITH_API_KEY=your_langsmith_api_key
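Since a missing variable usually surfaces only as a confusing runtime error deep in the pipeline, it can help to fail fast at startup. A minimal sketch, assuming the variable names from the .env example above (the `missingEnvVars` helper itself is illustrative, not part of the project):

```typescript
// Return the names of required environment variables that are unset or empty.
// The required list mirrors the .env example above; LANGSMITH_* vars are
// optional and therefore not checked.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  const required = [
    "PG_CONNECTION_STRING",
    "OUTLINE_URL",
    "OUTLINE_API_KEY",
    "GROQ_API_KEY",
    "HUGGINGFACEHUB_API_KEY",
  ];
  return required.filter((name) => !env[name]);
}

// Typical usage at process startup:
const missing = missingEnvVars(process.env);
if (missing.length > 0) {
  console.error(`Missing required env vars: ${missing.join(", ")}`);
}
```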

3. Database Setup

Ensure your PostgreSQL database has the pgvector extension installed:

CREATE EXTENSION IF NOT EXISTS vector;
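For intuition, pgvector's `<=>` operator (the one used for similarity search, e.g. `SELECT ... ORDER BY embedding <=> $1 LIMIT 5`) computes cosine distance. A plain-TypeScript sketch of that computation (table and column names in SQL queries would be project-specific):

```typescript
// Cosine distance between two equal-length vectors: 0 means identical
// direction (most similar), 1 means orthogonal, 2 means opposite.
// This mirrors what pgvector's `<=>` operator computes inside PostgreSQL.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```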

4. Ingest Your Data

# Development mode
npm run dev:ingest

# Production mode
npm run start:ingest

5. Start Querying

Option A: Chat Interface

# Development mode
npm run dev:chat

# Production mode
npm run start:chat
╔════════════════════════════════════════╗
║          AI Assistant Terminal         ║
╚════════════════════════════════════════╝

Welcome to the interactive chat interface!
Type your questions or use the following commands:
/context - Toggle context visibility
/exit    - Quit the application
/help    - Show this help message

───────────────────────────────────────────

You: How do I set up API authentication in Outline?
Assistant: To set up API authentication for your Outline instance, you need to configure the following environment variables in your .env file:

1. OUTLINE_URL - Your Outline instance URL (e.g., https://your-outline-instance.com)
2. OUTLINE_API_KEY - Your Outline API key

You can obtain your API key from your Outline settings under the API section. Make sure the API key has the necessary permissions to read documents and collections.

───────────────────────────────────────────

You: /context

Context visibility is now ON

You: What are the main features?
Assistant: Based on your documentation, the main features include:
- Intelligent document ingestion from Outline
- Advanced agentic chunking for better context
- Vector search using PostgreSQL + pgvector
- AI-powered responses using LLM models
- Multiple interfaces (chat and MCP server)

Context: [Retrieved documents would appear here in magenta when context is enabled]

───────────────────────────────────────────

You: 

Option B: MCP Server (for integration with MCP clients like Claude Desktop or Cursor)

# Development mode
npm run dev:mcp

# Production mode
npm run start:mcp

For more information on how to set up your MCP server, check the MCP-README.md file.

🔄 How It Works

1. Data Ingestion (src/rag/ingestion.ts)

  • Fetches documents from your Outline instance using the API
  • Enriches documents with semantic context (parent documents, collections)
  • Intelligent Chunking (src/rag/agentic-chunker.ts)
    • Uses an agentic approach to create semantically coherent chunks
    • Maintains context and relationships between document sections
    • Optimizes chunk size for embedding and retrieval performance
  • Generates embeddings and stores them in PostgreSQL with pgvector
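To picture the chunking step, here is a deliberately naive fixed-size splitter. This is a sketch only: the project's agentic chunker uses an LLM to choose semantic boundaries rather than fixed windows, and `splitIntoChunks` is a hypothetical name:

```typescript
// A chunk keeps a back-reference to its source document so retrieval
// results can be traced to the original Outline page.
type Chunk = { docId: string; text: string };

// Naive fixed-size splitting for illustration. The agentic chunker
// replaces this with LLM-chosen, semantically coherent boundaries.
function splitIntoChunks(docId: string, text: string, maxLen = 200): Chunk[] {
  const chunks: Chunk[] = [];
  for (let i = 0; i < text.length; i += maxLen) {
    chunks.push({ docId, text: text.slice(i, i + maxLen) });
  }
  return chunks;
}
```

Each resulting chunk is then embedded and stored as one pgvector row.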

2. RAG Workflow (src/rag/rag-workflow.ts)

  • Retrieval: Performs similarity search to find relevant documents
  • Generation: Uses retrieved context to generate answers with LLM
  • Built with LangGraph for robust pipeline management
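The retrieve-then-generate flow can be sketched in a few lines. This is a simplified stand-in, not the actual LangGraph pipeline: `retrieve` and `generate` are hypothetical placeholders for the pgvector similarity search and the Groq LLM call:

```typescript
type Doc = { id: string; text: string };

// Two-step RAG: fetch relevant chunks, then answer grounded in them.
async function ragAnswer(
  question: string,
  retrieve: (q: string) => Promise<Doc[]>,
  generate: (q: string, context: string) => Promise<string>
): Promise<string> {
  const docs = await retrieve(question);               // 1. similarity search
  const context = docs.map((d) => d.text).join("\n\n"); // 2. assemble context
  return generate(question, context);                  // 3. grounded generation
}
```

LangGraph adds state management, retries, and branching on top of this basic shape.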

3. Multiple Interfaces

  • Chat Interface: Direct conversation with your knowledge base
  • MCP Server: Standardized protocol for integration with AI assistants

🛠️ Technology Stack

  • TypeScript/Node.js: Core runtime and type safety
  • LangChain & LangGraph: RAG workflow orchestration and LLM management
  • PostgreSQL + pgvector: Vector database for embeddings
  • Model Context Protocol: Standardized AI tool integration
  • Multiple LLMs: Easily interchangeable LLMs for each process
    • Groq: Used for chunking and RAG querying
    • Hugging Face: Used for vector embedding

πŸ“ Project Structure

src/
├── database/                    # Database connection and utilities
│   ├── database.ts             # PostgreSQL + pgvector setup
│   └── singleton-connection.ts # Connection management
├── interfaces/                  # User interfaces
│   ├── chat.ts                 # Interactive CLI chat interface
│   └── mcp-server.ts           # Model Context Protocol server
├── outline-api/                 # Outline API integration
│   ├── collection-service.ts   # Collection operations
│   ├── document-service.ts     # Document operations & simplification
│   └── types.ts                # API response types
└── rag/                        # RAG pipeline components
    ├── ingestion.ts            # Document ingestion workflow
    ├── rag-workflow.ts         # Enhanced RAG pipeline with full document context
    ├── rag-prompt.ts           # Custom RAG prompt templates (New!)
    ├── agentic-chunker.ts      # Intelligent document chunking
    ├── agentic-chunker-prompt.ts # Chunking prompt templates
    ├── document-retriever.ts   # Document retrieval utilities (New!)
    ├── retriever.ts            # Vector search configuration
    ├── llm-config.ts           # LLM model configurations
    └── types.ts                # RAG-specific type definitions

πŸ“ Project Reflection & Recommendations

This project was developed as an educational exploration of advanced RAG (Retrieval Augmented Generation) architectures, focusing on learning and experimentation rather than production deployment.

Key Learnings

Through this implementation, we discovered that simpler approaches often yield better results with significantly less complexity. For production use cases requiring Outline integration, we recommend evaluating simpler alternatives such as this MCP Outline implementation, which offers:

  • Reduced complexity: Easier to understand, implement, and maintain
  • Faster setup: Minimal configuration requirements
  • Lower barrier to entry: Less technical overhead for teams and lower deployment costs

Recommendation

Before implementing this solution, we encourage you to evaluate simpler alternatives that may better suit your specific use case and technical requirements. This project serves as a valuable learning resource for understanding advanced RAG patterns, but may be over-engineered for many practical applications.

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit your changes: git commit -m 'Add amazing feature'
  4. Push to the branch: git push origin feature/amazing-feature
  5. Open a Pull Request
