|
1 | 1 | # mimir-rag |
2 | 2 |
|
3 | | -Utility CLI + API that ingests docs into Supabase and exposes OpenAI-compatible chat completions, MCP endpoints, and ingestion endpoints. |
| 3 | +Utility CLI + API that ingests **MDX documentation and TypeScript codebases** into Supabase using **contextual RAG** and exposes OpenAI-compatible chat completions, MCP endpoints, and ingestion endpoints. It makes your codebase and documentation queryable by AI assistants, with each indexed chunk carrying its surrounding context. |
4 | 4 |
|
5 | 5 | ## Quick Start |
6 | 6 |
|
@@ -138,20 +138,86 @@ All configuration is managed through environment variables in the `.env` file. S |
138 | 138 | Key configuration variables include: |
139 | 139 |
|
140 | 140 | - **Server**: `MIMIR_SERVER_API_KEY` (required), `MIMIR_SERVER_GITHUB_WEBHOOK_SECRET`, `MIMIR_SERVER_FALLBACK_INGEST_INTERVAL_MINUTES` |
141 | | -- **Supabase**: `MIMIR_SUPABASE_URL` (required), `MIMIR_SUPABASE_SERVICE_ROLE_KEY` (required), `MIMIR_SUPABASE_TABLE` |
142 | | -- **GitHub**: `MIMIR_GITHUB_URL`, `MIMIR_GITHUB_TOKEN`, `MIMIR_GITHUB_DIRECTORY`, `MIMIR_GITHUB_BRANCH` |
| 141 | +- **Supabase**: `MIMIR_SUPABASE_URL` (required), `MIMIR_SUPABASE_SERVICE_ROLE_KEY` (required), `MIMIR_SUPABASE_TABLE` (optional, default: "docs") |
| 142 | +- **GitHub**: |
| 143 | + - `MIMIR_GITHUB_URL` - Main repository URL (fallback if separate repos not set) |
| 144 | + - `MIMIR_GITHUB_CODE_URL` - Separate repository for TypeScript code (optional) |
| 145 | + - `MIMIR_GITHUB_DOCS_URL` - Separate repository for MDX documentation (optional) |
| 146 | + - `MIMIR_GITHUB_TOKEN`, `MIMIR_GITHUB_DIRECTORY`, `MIMIR_GITHUB_BRANCH` |
| 147 | + - `MIMIR_GITHUB_CODE_DIRECTORY`, `MIMIR_GITHUB_CODE_INCLUDE_DIRECTORIES` - Settings specific to the code repository |
| 148 | + - `MIMIR_GITHUB_DOCS_DIRECTORY`, `MIMIR_GITHUB_DOCS_INCLUDE_DIRECTORIES` - Settings specific to the docs repository |
| 149 | +- **Parser**: |
| 150 | + - `MIMIR_EXTRACT_VARIABLES` - Extract top-level variables (default: false) |
| 151 | + - `MIMIR_EXTRACT_METHODS` - Extract class methods (default: true) |
| 152 | + - `MIMIR_EXCLUDE_PATTERNS` - Comma-separated patterns to exclude (e.g., "*.test.ts,test/,__tests__/") |
143 | 153 | - **LLM Embedding**: `MIMIR_LLM_EMBEDDING_PROVIDER`, `MIMIR_LLM_EMBEDDING_MODEL`, `MIMIR_LLM_EMBEDDING_API_KEY` |
144 | 154 | - **LLM Chat**: `MIMIR_LLM_CHAT_PROVIDER`, `MIMIR_LLM_CHAT_MODEL`, `MIMIR_LLM_CHAT_API_KEY`, `MIMIR_LLM_CHAT_TEMPERATURE` |
| 155 | +- **Documentation**: `MIMIR_DOCS_BASE_URL`, `MIMIR_DOCS_CONTENT_PATH` - For generating docs URLs |
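
The variables marked "(required)" above can be checked before the server starts. The following is a minimal sketch, not mimir-rag's own startup code: the helper name `findMissingVars` and the idea of treating blank values as missing are assumptions made for illustration.

```typescript
// Hypothetical pre-flight check; mimir-rag's real validation may differ.
// The three names below are the variables the list above marks as required.
const REQUIRED_VARS = [
  "MIMIR_SERVER_API_KEY",
  "MIMIR_SUPABASE_URL",
  "MIMIR_SUPABASE_SERVICE_ROLE_KEY",
];

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function findMissingVars(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => (env[name] ?? "").trim() === "");
}

// Example with a literal object standing in for the parsed .env file.
const missing = findMissingVars({
  MIMIR_SERVER_API_KEY: "secret",
  MIMIR_SUPABASE_URL: "",
});
console.log(missing);
```

Passing the environment in as a plain object keeps the check easy to test without touching the real process environment.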
145 | 156 |
|
146 | 157 | ### LLM Providers |
147 | 158 |
|
148 | 159 | `MIMIR_LLM_EMBEDDING_PROVIDER` supports `openai`, `google`, and `mistral`. The chat provider (`MIMIR_LLM_CHAT_PROVIDER`) can be set independently to `openai`, `google`, `anthropic`, or `mistral`, letting you mix providers (e.g., OpenAI embeddings with Mistral chat completions). Provide the appropriate API key/endpoint per provider. Anthropic currently lacks an embeddings API, so embeddings still need to come from OpenAI, Google, or Mistral. |
149 | 160 |
|
| 161 | +### Separate Code and Documentation Repositories |
| 162 | + |
| 163 | +You can configure separate repositories for TypeScript code and MDX documentation: |
| 164 | + |
| 165 | +```bash |
| 166 | +# Main repository (fallback) |
| 167 | +MIMIR_GITHUB_URL=https://github.com/user/main-repo |
| 168 | + |
| 169 | +# Separate code repository |
| 170 | +MIMIR_GITHUB_CODE_URL=https://github.com/user/code-repo |
| 171 | +MIMIR_GITHUB_CODE_DIRECTORY=src |
| 172 | +MIMIR_GITHUB_CODE_INCLUDE_DIRECTORIES=src,lib |
| 173 | + |
| 174 | +# Separate documentation repository |
| 175 | +MIMIR_GITHUB_DOCS_URL=https://github.com/user/docs-repo |
| 176 | +MIMIR_GITHUB_DOCS_DIRECTORY=docs |
| 177 | +MIMIR_GITHUB_DOCS_INCLUDE_DIRECTORIES=docs,guides |
| 178 | +``` |
| 179 | + |
| 180 | +When both are configured, TypeScript files are ingested from the code repository and MDX files from the docs repository. If either is unset, `MIMIR_GITHUB_URL` is used as the fallback. Source URLs for TypeScript files automatically use the code repository URL. |
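
The fallback behaviour can be pictured with a short sketch. This is an illustration under assumptions, not the project's actual resolution code: `resolveRepoTargets` and `RepoTargets` are hypothetical names, and treating an empty string as "unset" is a guess.

```typescript
// Hypothetical helper; mimir-rag's real resolution logic may differ.
type Env = Record<string, string | undefined>;

interface RepoTargets {
  codeUrl: string | undefined; // where TypeScript files would come from
  docsUrl: string | undefined; // where MDX files would come from
}

function resolveRepoTargets(env: Env): RepoTargets {
  const fallback = env.MIMIR_GITHUB_URL;
  // Assumption: an empty or whitespace-only value counts as unset.
  const pick = (value: string | undefined): string | undefined =>
    value !== undefined && value.trim() !== "" ? value : fallback;
  return {
    codeUrl: pick(env.MIMIR_GITHUB_CODE_URL),
    docsUrl: pick(env.MIMIR_GITHUB_DOCS_URL),
  };
}

// Only the code repository is set separately, so docs use the main repository.
const targets = resolveRepoTargets({
  MIMIR_GITHUB_URL: "https://github.com/user/main-repo",
  MIMIR_GITHUB_CODE_URL: "https://github.com/user/code-repo",
});
console.log(targets);
```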
| 181 | + |
| 182 | +### Parser Configuration |
| 183 | + |
| 184 | +Control what gets extracted from your codebase: |
| 185 | + |
| 186 | +- **`MIMIR_EXTRACT_VARIABLES`** (default: `false`): Extract top-level variable declarations. Note: Exported `const` functions are always extracted regardless of this setting. |
| 187 | +- **`MIMIR_EXTRACT_METHODS`** (default: `true`): Extract class methods as separate entities. |
| 188 | +- **`MIMIR_EXCLUDE_PATTERNS`**: Comma-separated list of patterns to exclude: |
| 189 | + - File patterns: `*.test.ts`, `*.spec.ts` |
| 190 | + - Directory patterns: `test/`, `__tests__/`, `tests/` |
| 191 | + |
| 192 | + Example: `MIMIR_EXCLUDE_PATTERNS=*.test.ts,*.spec.ts,test/,__tests__/,tests/` |
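
How such a pattern list might be applied to file paths can be sketched as follows. The matching rules here are assumptions for illustration only (a trailing `/` matches a directory segment anywhere in the path, a leading `*` matches a file-name suffix); the parser's real semantics may differ, and `parseExcludePatterns` and `isExcluded` are hypothetical names.

```typescript
// Hypothetical matcher; the actual exclusion semantics are an assumption.

// Splits the comma-separated value and drops blanks.
function parseExcludePatterns(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}

function isExcluded(filePath: string, patterns: string[]): boolean {
  const segments = filePath.split("/");
  const fileName = segments[segments.length - 1] ?? "";
  const dirs = segments.slice(0, -1);
  return patterns.some((pattern) => {
    if (pattern.endsWith("/")) {
      // Directory pattern such as "test/": match a whole directory segment.
      return dirs.includes(pattern.slice(0, -1));
    }
    if (pattern.startsWith("*")) {
      // File pattern such as "*.test.ts": match the file-name suffix.
      return fileName.endsWith(pattern.slice(1));
    }
    // Anything else: exact file-name match.
    return fileName === pattern;
  });
}

const patterns = parseExcludePatterns("*.test.ts,*.spec.ts,test/,__tests__/,tests/");
console.log(isExcluded("src/utils/math.test.ts", patterns));
console.log(isExcluded("src/utils/math.ts", patterns));
```

Matching directory patterns against whole path segments avoids accidentally excluding directories such as `latest/` when the pattern is `test/`.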
| 193 | + |
| 194 | +### TypeScript Entity Extraction |
| 195 | + |
| 196 | +mimir-rag automatically extracts and indexes TypeScript entities from your codebase: |
| 197 | + |
| 198 | +- **Functions**: `export function myFunction() {}` |
| 199 | +- **Exported Const Functions**: `export const myFunction = () => {}` (always extracted) |
| 200 | +- **Classes**: `export class MyClass {}` |
| 201 | +- **Interfaces**: `export interface MyInterface {}` |
| 202 | +- **Types**: `export type MyType = ...` |
| 203 | +- **Enums**: `export enum MyEnum {}` |
| 204 | +- **Methods**: Class methods (if `MIMIR_EXTRACT_METHODS=true`) |
| 205 | + |
| 206 | +Each entity is stored as a separate chunk with **rich contextual information**: |
| 207 | +- Full code snippet |
| 208 | +- **Contextual RAG**: Surrounding file content, imports, and parent class context |
| 209 | +- JSDoc comments (if present) |
| 210 | +- Parameters and return types |
| 211 | +- Line numbers for source linking |
| 212 | +- GitHub URL for direct code access |
| 213 | + |
| 214 | +Because each chunk carries this context, the AI can see not just the entity itself but also how it fits into the larger codebase: what it imports, what it belongs to, and how it is used. This enables more accurate, context-aware answers with direct links to source code. |
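
To make the entity kinds concrete, here is a small illustrative source file annotated with the entity each declaration would be expected to produce, based on the list above. The file contents and all names in it are invented for illustration, and enums are left out to keep the sketch short.

```typescript
// Type entity
export type LogLevel = "info" | "error";

// Interface entity
export interface LogEntry {
  level: LogLevel;
  message: string;
}

/** Formats an entry for display. JSDoc like this would be stored with the chunk. */
// Exported const function entity (extracted regardless of MIMIR_EXTRACT_VARIABLES)
export const formatEntry = (entry: LogEntry): string =>
  `[${entry.level}] ${entry.message}`;

// Function entity
export function createEntry(level: LogLevel, message: string): LogEntry {
  return { level, message };
}

// Class entity
export class Logger {
  private entries: LogEntry[] = [];

  // Method entity (only when MIMIR_EXTRACT_METHODS=true)
  log(entry: LogEntry): void {
    this.entries.push(entry);
  }

  // Method entity (only when MIMIR_EXTRACT_METHODS=true)
  dump(): string[] {
    return this.entries.map(formatEntry);
  }
}
```

With the documented defaults, a file like this would be expected to yield seven chunks: one type, one interface, one const function, one function, one class, and two methods.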
| 215 | + |
150 | 216 | ## API Endpoints |
151 | 217 |
|
152 | 218 | ### POST /v1/chat/completions |
153 | 219 |
|
154 | | -OpenAI-compatible chat completions endpoint that queries your documentation with RAG. Requires API key authentication. |
| 220 | +OpenAI-compatible chat completions endpoint that queries your documentation and codebase using contextual RAG. Requires API key authentication. |
155 | 221 |
|
156 | 222 | **Headers:** |
157 | 223 | - `x-api-key: <MIMIR_SERVER_API_KEY>` or `Authorization: Bearer <MIMIR_SERVER_API_KEY>` |
@@ -209,7 +275,7 @@ Semantic search endpoint via MCP (Model Context Protocol) that returns matching |
209 | 275 | } |
210 | 276 | ``` |
211 | 277 |
|
212 | | -**Note:** This endpoint performs semantic search using OpenAI embeddings and returns document chunks with their full content. The calling AI assistant can then synthesize answers from the retrieved content, avoiding additional LLM API calls on the server side. |
| 278 | +**Note:** This endpoint performs the retrieval step of contextual RAG: a semantic search using OpenAI embeddings that returns document chunks with their full content and surrounding context. The calling AI assistant can then synthesize answers from the retrieved content, avoiding additional LLM API calls on the server side. |
213 | 279 |
|
214 | 280 | ### POST /ingest |
215 | 281 |
|
|