Commit 2c59459

update README with complete project features and architecture
1 parent 88ede2a commit 2c59459

2 files changed

+89
-15
lines changed

README.md

Lines changed: 18 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -1,23 +1,26 @@
11
# Mimir
22

3-
A comprehensive documentation RAG (Retrieval Augmented Generation) system with MCP (Model Context Protocol) integration. Mimir ingests documentation from GitHub repositories into a Supabase vector store and provides powerful querying capabilities through both REST API and MCP protocol.
3+
A comprehensive **contextual RAG** (Retrieval Augmented Generation) system with MCP (Model Context Protocol) integration for both **documentation and TypeScript codebases**. Mimir ingests documentation and TypeScript code from GitHub repositories into a Supabase vector store and provides powerful querying capabilities through both REST API and MCP protocol. Unlike basic RAG, contextual RAG provides rich context around each code entity, including full file content, imports, and surrounding code.
44

55
## Projects
66

77
This repository contains two main components:
88

99
### [mimir-rag](./mimir-rag)
1010

11-
The core RAG server that handles documentation ingestion and querying.
11+
The core RAG server that handles ingestion and querying of both **documentation (MDX)** and **TypeScript codebases**.
1212

1313
**Features:**
14-
- Ingests documentation from GitHub repositories into Supabase vector store
14+
- Ingests documentation and TypeScript code from GitHub repositories into Supabase vector store
15+
- Supports separate repositories for code and documentation
16+
- Automatically extracts TypeScript entities (functions, classes, interfaces, exported const functions)
1517
- Supports multiple LLM providers (OpenAI, Anthropic, Google, Mistral)
1618
- OpenAI-compatible chat completions endpoint (`/v1/chat/completions`)
1719
- MCP endpoint for semantic document search (`/mcp/ask`)
18-
- GitHub webhook integration for automatic re-ingestion
20+
- GitHub webhook integration for automatic re-ingestion when code or MDX files change in the repository
1921
- Streaming responses support (OpenAI-compatible and custom SSE)
2022
- Configurable chunking and embedding strategies
23+
- Flexible parser configuration (exclude test files, control entity extraction)
2124

2225
**Quick Start:**
2326
```bash
@@ -87,18 +90,20 @@ See the [mimir-mcp README](./mimir-mcp/README.md) for detailed setup instruction
8790
┌─────────────────────┐ ┌─────────────────────┐
8891
│ mimir-mcp │◄──────┤ AI Assistant │
8992
│ (MCP Server) │ │ - Claude Code │
90-
└─────────────────────┘ │ - Cline
93+
└─────────────────────┘ │ - VSCode
9194
│ - Claude Desktop │
9295
└─────────────────────┘
9396
```
9497

9598
## Workflow
9699

97100
1. **Ingestion Phase:**
98-
- mimir-rag fetches documentation from configured GitHub repository
99-
- Documents are chunked into smaller segments
101+
- mimir-rag fetches documentation (MDX) and TypeScript code from the configured GitHub repository or repositories
102+
- TypeScript files are parsed to extract entities (functions, classes, interfaces, exported const functions)
103+
- **Contextual RAG**: Each entity is enriched with surrounding context - full file content, imports, parent classes, and related code
104+
- Documents are chunked into smaller segments with rich contextual information
100105
- Chunks are embedded using your chosen LLM provider
101-
- Embeddings are stored in Supabase vector database
106+
- Embeddings are stored in Supabase vector database with source URLs pointing to GitHub
102107

103108
2. **Query Phase (via MCP):**
104109
- User asks a question in their AI assistant
@@ -116,18 +121,21 @@ See the [mimir-mcp README](./mimir-mcp/README.md) for detailed setup instruction
116121

117122
## Use Cases
118123

124+
- **AI-Powered Code Assistant**: Let your AI coding assistant query your TypeScript codebase in real-time - find functions, classes, and understand code structure
119125
- **AI-Powered Documentation Assistant**: Let your AI coding assistant query your docs in real-time
126+
- **Codebase Understanding**: Index your entire TypeScript project - functions, classes, interfaces, and exported const functions
120127
- **Internal Knowledge Base**: Index internal wikis, API docs, or technical documentation
121128
- **Customer Support**: Provide accurate, context-aware answers from your documentation
122-
- **Developer Onboarding**: Help new developers quickly find information in your codebase docs
129+
- **Developer Onboarding**: Help new developers quickly find information in your codebase and documentation
123130
- **API Documentation**: Make API documentation instantly queryable
131+
- **Code Reference**: Ask questions about your codebase and get answers with direct links to GitHub source code
124132

125133
## Requirements
126134

127135
- **Node.js**: 20 or later
128136
- **Supabase**: Vector store for embeddings and document storage
129137
- **LLM Provider**: API key for OpenAI, Anthropic, Google, or Mistral
130-
- **GitHub**: Repository with documentation to ingest (optional)
138+
- **GitHub**: Repository with documentation (MDX) and/or TypeScript code to ingest (optional)
131139

132140
## Getting Started
133141

mimir-rag/README.md

Lines changed: 71 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# mimir-rag
22

3-
Utility CLI + API that ingests docs into Supabase and exposes OpenAI-compatible chat completions, MCP endpoints, and ingestion endpoints.
3+
Utility CLI + API that ingests **documentation (MDX) and TypeScript codebases** into Supabase using **contextual RAG** and exposes OpenAI-compatible chat completions, MCP endpoints, and ingestion endpoints. Perfect for making your entire codebase and documentation queryable by AI assistants with rich contextual understanding.
44

55
## Quick Start
66

@@ -138,20 +138,86 @@ All configuration is managed through environment variables in the `.env` file. S
138138
Key configuration variables include:
139139

140140
- **Server**: `MIMIR_SERVER_API_KEY` (required), `MIMIR_SERVER_GITHUB_WEBHOOK_SECRET`, `MIMIR_SERVER_FALLBACK_INGEST_INTERVAL_MINUTES`
141-
- **Supabase**: `MIMIR_SUPABASE_URL` (required), `MIMIR_SUPABASE_SERVICE_ROLE_KEY` (required), `MIMIR_SUPABASE_TABLE`
142-
- **GitHub**: `MIMIR_GITHUB_URL`, `MIMIR_GITHUB_TOKEN`, `MIMIR_GITHUB_DIRECTORY`, `MIMIR_GITHUB_BRANCH`
141+
- **Supabase**: `MIMIR_SUPABASE_URL` (required), `MIMIR_SUPABASE_SERVICE_ROLE_KEY` (required), `MIMIR_SUPABASE_TABLE` (optional, default: "docs")
142+
- **GitHub**:
143+
- `MIMIR_GITHUB_URL` - Main repository URL (fallback if separate repos not set)
144+
- `MIMIR_GITHUB_CODE_URL` - Separate repository for TypeScript code (optional)
145+
- `MIMIR_GITHUB_DOCS_URL` - Separate repository for MDX documentation (optional)
146+
- `MIMIR_GITHUB_TOKEN`, `MIMIR_GITHUB_DIRECTORY`, `MIMIR_GITHUB_BRANCH`
147+
- `MIMIR_GITHUB_CODE_DIRECTORY`, `MIMIR_GITHUB_CODE_INCLUDE_DIRECTORIES` - Code repo specific settings
148+
- `MIMIR_GITHUB_DOCS_DIRECTORY`, `MIMIR_GITHUB_DOCS_INCLUDE_DIRECTORIES` - Docs repo specific settings
149+
- **Parser**:
150+
- `MIMIR_EXTRACT_VARIABLES` - Extract top-level variables (default: false)
151+
- `MIMIR_EXTRACT_METHODS` - Extract class methods (default: true)
152+
- `MIMIR_EXCLUDE_PATTERNS` - Comma-separated patterns to exclude (e.g., "*.test.ts,test/,__tests__/")
143153
- **LLM Embedding**: `MIMIR_LLM_EMBEDDING_PROVIDER`, `MIMIR_LLM_EMBEDDING_MODEL`, `MIMIR_LLM_EMBEDDING_API_KEY`
144154
- **LLM Chat**: `MIMIR_LLM_CHAT_PROVIDER`, `MIMIR_LLM_CHAT_MODEL`, `MIMIR_LLM_CHAT_API_KEY`, `MIMIR_LLM_CHAT_TEMPERATURE`
155+
- **Documentation**: `MIMIR_DOCS_BASE_URL`, `MIMIR_DOCS_CONTENT_PATH` - For generating docs URLs
145156

146157
### LLM Providers
147158

148159
`MIMIR_LLM_EMBEDDING_PROVIDER` supports `openai`, `google`, and `mistral`. The chat provider (`MIMIR_LLM_CHAT_PROVIDER`) can be set independently to `openai`, `google`, `anthropic`, or `mistral`, letting you mix providers (e.g., OpenAI embeddings with Mistral chat completions). Provide the appropriate API key/endpoint per provider. Anthropic currently lacks an embeddings API, so embeddings still need to come from OpenAI, Google, or Mistral.
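
The provider rules above can be checked at startup. The following is a minimal sketch, not the project's actual code: the provider names and variable names come from this README, while `resolveProviders`, `isOneOf`, and the `Env` type are hypothetical helpers introduced for illustration.

```typescript
// Provider names are taken from the README; all helper names are hypothetical.
const EMBEDDING_PROVIDERS = ["openai", "google", "mistral"] as const;
const CHAT_PROVIDERS = ["openai", "google", "anthropic", "mistral"] as const;

type EmbeddingProvider = (typeof EMBEDDING_PROVIDERS)[number];
type ChatProvider = (typeof CHAT_PROVIDERS)[number];

// A plain key/value view of the environment (process.env has this shape in Node.js).
type Env = Record<string, string | undefined>;

interface ProviderChoice {
  embedding: EmbeddingProvider;
  chat: ChatProvider;
}

// Type guard: true when `value` is one of the allowed provider names.
function isOneOf<T extends string>(allowed: readonly T[], value: string): value is T {
  return (allowed as readonly string[]).indexOf(value) !== -1;
}

// Reads both provider variables and rejects unsupported or missing values.
function resolveProviders(env: Env): ProviderChoice {
  const embedding = env["MIMIR_LLM_EMBEDDING_PROVIDER"] ?? "";
  const chat = env["MIMIR_LLM_CHAT_PROVIDER"] ?? "";

  if (!isOneOf(EMBEDDING_PROVIDERS, embedding)) {
    // Anthropic lands here: it is valid for chat but not for embeddings.
    throw new Error(`Unsupported embedding provider: "${embedding}"`);
  }
  if (!isOneOf(CHAT_PROVIDERS, chat)) {
    throw new Error(`Unsupported chat provider: "${chat}"`);
  }
  return { embedding, chat };
}
```

In a Node.js process this sketch could be called as `resolveProviders(process.env)`. Mixing providers, such as OpenAI embeddings with Mistral chat, passes the check, while `anthropic` as the embedding provider throws.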
149160

161+
### Separate Code and Documentation Repositories
162+
163+
You can configure separate repositories for TypeScript code and MDX documentation:
164+
165+
```bash
166+
# Main repository (fallback)
167+
MIMIR_GITHUB_URL=https://github.com/user/main-repo
168+
169+
# Separate code repository
170+
MIMIR_GITHUB_CODE_URL=https://github.com/user/code-repo
171+
MIMIR_GITHUB_CODE_DIRECTORY=src
172+
MIMIR_GITHUB_CODE_INCLUDE_DIRECTORIES=src,lib
173+
174+
# Separate documentation repository
175+
MIMIR_GITHUB_DOCS_URL=https://github.com/user/docs-repo
176+
MIMIR_GITHUB_DOCS_DIRECTORY=docs
177+
MIMIR_GITHUB_DOCS_INCLUDE_DIRECTORIES=docs,guides
178+
```
179+
180+
When configured, TypeScript files will be ingested from the code repository and MDX files from the docs repository. Source URLs for TypeScript files will automatically use the code repository URL.
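
One way to picture the fallback behavior is the sketch below. It assumes that `MIMIR_GITHUB_URL` is used whenever the code-specific or docs-specific URL is unset, as the variable descriptions suggest. The function `resolveRepoSources` and the `RepoSource` shape are hypothetical and are not taken from the mimir-rag source.

```typescript
// A plain key/value view of the environment (process.env has this shape in Node.js).
type Env = Record<string, string | undefined>;

// Hypothetical shape describing where one kind of content is fetched from.
interface RepoSource {
  url: string;
  directory: string | undefined;
  includeDirectories: string[];
}

interface RepoSources {
  code: RepoSource | undefined;
  docs: RepoSource | undefined;
}

// Splits a comma-separated value such as "src,lib" into trimmed, non-empty entries.
function splitList(value: string | undefined): string[] {
  if (!value) return [];
  return value
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}

// Assumption: the specific URL wins, otherwise the main repository URL is used.
function resolveRepoSources(env: Env): RepoSources {
  const mainUrl = env["MIMIR_GITHUB_URL"];
  const codeUrl = env["MIMIR_GITHUB_CODE_URL"] ?? mainUrl;
  const docsUrl = env["MIMIR_GITHUB_DOCS_URL"] ?? mainUrl;

  return {
    code: codeUrl
      ? {
          url: codeUrl,
          directory: env["MIMIR_GITHUB_CODE_DIRECTORY"],
          includeDirectories: splitList(env["MIMIR_GITHUB_CODE_INCLUDE_DIRECTORIES"]),
        }
      : undefined,
    docs: docsUrl
      ? {
          url: docsUrl,
          directory: env["MIMIR_GITHUB_DOCS_DIRECTORY"],
          includeDirectories: splitList(env["MIMIR_GITHUB_DOCS_INCLUDE_DIRECTORIES"]),
        }
      : undefined,
  };
}
```

With the example configuration above, this sketch would resolve TypeScript ingestion to `code-repo` and MDX ingestion to `docs-repo`. With only `MIMIR_GITHUB_URL` set, both would point at the main repository.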
181+
182+
### Parser Configuration
183+
184+
Control what gets extracted from your codebase:
185+
186+
- **`MIMIR_EXTRACT_VARIABLES`** (default: `false`): Extract top-level variable declarations. Note: Exported `const` functions are always extracted regardless of this setting.
187+
- **`MIMIR_EXTRACT_METHODS`** (default: `true`): Extract class methods as separate entities.
188+
- **`MIMIR_EXCLUDE_PATTERNS`**: Comma-separated list of patterns to exclude:
189+
- File patterns: `*.test.ts`, `*.spec.ts`
190+
- Directory patterns: `test/`, `__tests__/`, `tests/`
191+
192+
Example: `MIMIR_EXCLUDE_PATTERNS=*.test.ts,*.spec.ts,test/,__tests__/,tests/`
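
To illustrate how such a pattern list might be applied, here is a small sketch. The matching rules are an assumption, since the README does not spell them out: a pattern ending in `/` is treated as a directory name that may appear anywhere in the path, and any other pattern is treated as a `*` glob matched against the file name. The functions `parsePatterns`, `globToRegExp`, and `isExcluded` are hypothetical.

```typescript
// Splits the comma-separated value into trimmed, non-empty patterns.
function parsePatterns(raw: string): string[] {
  return raw
    .split(",")
    .map((pattern) => pattern.trim())
    .filter((pattern) => pattern.length > 0);
}

// Converts a simple glob ("*" as the only wildcard) into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  // Escape every regex metacharacter except "*", then expand "*" to ".*".
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Assumed semantics: "name/" excludes any file under a directory called "name";
// other patterns are matched against the file name only.
function isExcluded(filePath: string, patterns: string[]): boolean {
  const segments = filePath.split("/");
  const fileName = segments[segments.length - 1] ?? "";
  const directories = segments.slice(0, -1);

  return patterns.some((pattern) => {
    if (pattern.charAt(pattern.length - 1) === "/") {
      const directoryName = pattern.slice(0, -1);
      return directories.indexOf(directoryName) !== -1;
    }
    return globToRegExp(pattern).test(fileName);
  });
}
```

Under these assumed rules, `src/utils/math.test.ts` and `src/__tests__/math.ts` would be skipped, while `src/latest/math.ts` would be kept because `latest` is not an exact match for the `test/` directory pattern.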
193+
194+
### TypeScript Entity Extraction
195+
196+
mimir-rag automatically extracts and indexes TypeScript entities from your codebase:
197+
198+
- **Functions**: `export function myFunction() {}`
199+
- **Exported Const Functions**: `export const myFunction = () => {}` (always extracted)
200+
- **Classes**: `export class MyClass {}`
201+
- **Interfaces**: `export interface MyInterface {}`
202+
- **Types**: `export type MyType = ...`
203+
- **Enums**: `export enum MyEnum {}`
204+
- **Methods**: Class methods (if `MIMIR_EXTRACT_METHODS=true`)
205+
206+
Each entity is stored as a separate chunk with **rich contextual information**:
207+
- Full code snippet
208+
- **Contextual RAG**: Surrounding file content, imports, and parent class context
209+
- JSDoc comments (if present)
210+
- Parameters and return types
211+
- Line numbers for source linking
212+
- GitHub URL for direct code access
213+
214+
This contextual RAG approach lets the AI understand not just the entity itself but also how it fits into the larger codebase: what it imports, what it is part of, and how it is used. This enables more accurate, contextually aware answers with direct links to source code.
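
As a rough illustration of what "one chunk per entity with context" can look like, the sketch below assembles the text of a chunk and a GitHub link with a line range. The `CodeEntity` and `ContextualChunk` shapes, the function names, and the text layout are all hypothetical and not mimir-rag's actual storage format. Only the GitHub `/blob/<branch>/<path>#L<start>-L<end>` link format is standard.

```typescript
// Hypothetical description of one extracted TypeScript entity.
interface CodeEntity {
  kind: "function" | "class" | "interface" | "type" | "enum" | "method";
  name: string;
  code: string;
  filePath: string;
  startLine: number;
  endLine: number;
  imports: string[];
  parentClass?: string;
  jsDoc?: string;
}

// Hypothetical chunk: the text to embed plus a link back to the source.
interface ContextualChunk {
  text: string;
  sourceUrl: string;
}

// Builds a GitHub link that highlights the entity's line range.
function buildGithubUrl(repoUrl: string, branch: string, entity: CodeEntity): string {
  const base = repoUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/blob/${branch}/${entity.filePath}#L${entity.startLine}-L${entity.endLine}`;
}

// Prepends file, parent class, and import context to the entity's own code.
function buildContextualChunk(
  entity: CodeEntity,
  repoUrl: string,
  branch: string
): ContextualChunk {
  const parts: string[] = [];
  parts.push(`File: ${entity.filePath}`);
  parts.push(`Entity: ${entity.kind} ${entity.name}`);
  if (entity.parentClass) {
    parts.push(`Parent class: ${entity.parentClass}`);
  }
  if (entity.imports.length > 0) {
    parts.push("Imports:\n" + entity.imports.join("\n"));
  }
  if (entity.jsDoc) {
    parts.push(entity.jsDoc);
  }
  parts.push(entity.code);

  return {
    text: parts.join("\n\n"),
    sourceUrl: buildGithubUrl(repoUrl, branch, entity),
  };
}
```

Embedding the assembled `text` rather than the bare snippet is what lets a query about, say, a method's dependencies match on its imports or parent class, and the `sourceUrl` gives the assistant a link to cite.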
215+
150216
## API Endpoints
151217

152218
### POST /v1/chat/completions
153219

154-
OpenAI-compatible chat completions endpoint that queries your documentation with RAG. Requires API key authentication.
220+
OpenAI-compatible chat completions endpoint that queries your documentation and codebase using contextual RAG. Requires API key authentication.
155221

156222
**Headers:**
157223
- `x-api-key: <MIMIR_SERVER_API_KEY>` or `Authorization: Bearer <MIMIR_SERVER_API_KEY>`
@@ -209,7 +275,7 @@ Semantic search endpoint via MCP (Model Context Protocol) that returns matching
209275
}
210276
```
211277

212-
**Note:** This endpoint performs semantic search using OpenAI embeddings and returns document chunks with their full content. The calling AI assistant can then synthesize answers from the retrieved content, avoiding additional LLM API calls on the server side.
278+
**Note:** This endpoint performs contextual RAG: a semantic search using OpenAI embeddings that returns document chunks with their full content and surrounding context. The calling AI assistant can then synthesize answers from the retrieved content, avoiding additional LLM API calls on the server side.
213279

214280
### POST /ingest
215281
