| 1 | +# OpenDeepWiki Development Guide for AI Agents |
| 2 | + |
| 3 | +## Project Overview |
| 4 | + |
| 5 | +**OpenDeepWiki** is an AI-driven code knowledge base system built on **.NET 9** and **Semantic Kernel**. It analyzes code repositories, generates documentation, creates directory structures, and supports MCP (Model Context Protocol) for AI integration. |
| 6 | + |
| 7 | +### Core Purpose |
| 8 | +- Convert GitHub/GitLab/Gitee repositories into searchable knowledge bases |
| 9 | +- Auto-generate documentation, READMEs, and code analysis via LLM |
| 10 | +- Support multiple AI providers (OpenAI and AzureOpenAI today; Anthropic is planned) |
| 11 | +- Provide MCP endpoints for AI agents to query repository knowledge |
| 12 | + |
| 13 | +--- |
| 14 | + |
| 15 | +## Architecture |
| 16 | + |
| 17 | +### Full-Stack Structure |
| 18 | +``` |
| 19 | +Backend: .NET 9 ASP.NET Core + Entity Framework Core + Semantic Kernel |
| 20 | +Frontend: React 19 + TypeScript + Vite + TailwindCSS + Shadcn/ui |
| 21 | +Database: SQLite/PostgreSQL/MySQL/SQL Server (configurable) |
| 22 | +Deployment: Docker Compose or Sealos |
| 23 | +``` |
| 24 | + |
| 25 | +### Backend Layer Breakdown |
| 26 | + |
| 27 | +**`src/KoalaWiki/`** - Main ASP.NET Core application |
| 28 | +- **`BackendService/`** - Background task orchestration (warehouse sync, document processing) |
| 29 | +- **`KoalaWarehouse/`** - Core document analysis engine: |
| 30 | + - **`Pipeline/`** - Resilient document processing pipeline with 5 ordered steps |
| 31 | + - **`GenerateThinkCatalogue/`** - AI-powered directory structure generation |
| 32 | + - **`DocumentPending/`** - Incomplete document task handling |
| 33 | + - **`MiniMapService.cs`** - Generates knowledge graphs via Mermaid |
| 34 | + |
| 35 | +**`KoalaWiki.Core/`** - Data access layer |
| 36 | +- **`DataAccess/IKoalaWikiContext.cs`** - DbSet definitions for 18+ entity types |
| 37 | +- **`ServiceExtensions.cs`** - DI registration for database providers |
| 38 | + |
| 39 | +**`KoalaWiki.Domains/`** - Domain models |
| 40 | +- **`Warehouse.cs`** - Repository metadata and configuration |
| 41 | +- **`Document.cs`** - Document content and metadata |
| 42 | +- **`DocumentFile/`** - File structure and catalog definitions |
| 43 | +- **`FineTuning/`** - Training dataset generation |
| 44 | +- **`MCP/`** - Model Context Protocol entities |
| 45 | + |
| 46 | +**`Provider/`** - Database implementations |
| 47 | +- `KoalaWiki.Provider.PostgreSQL` |
| 48 | +- `KoalaWiki.Provider.MySQL` |
| 49 | +- `KoalaWiki.Provider.SqlServer` |
| 50 | + |
| 51 | +### Frontend Layer Breakdown |
| 52 | + |
| 53 | +**`web-site/src/`** - React application |
| 54 | +- **`pages/`** - Route-based page components: `home`, `auth`, `admin`, `repository`, `chat` |
| 55 | +- **`components/`** - Reusable UI components (RepositoryLayout, AdminLayout) |
| 56 | +- **`services/`** - HTTP API clients and API wrappers |
| 57 | +- **`stores/`** - Zustand state management stores |
| 58 | +- **`i18n/`** - Internationalization (Chinese, English, French) |
| 59 | +- **`routes/`** - React Router configuration with lazy loading |
| 60 | + |
| 61 | +--- |
| 62 | + |
| 63 | +## Critical Data Flows |
| 64 | + |
| 65 | +### 1. Repository Analysis Flow (as described in README.md) |
| 66 | +``` |
| 67 | +Clone Repository → .gitignore Filtering → Directory Scanning → |
| 68 | +AI Smart Filter (if file count > threshold) → Directory JSON → |
| 69 | +Generate README → Project Classification → Project Overview → |
| 70 | +Save to Database → Generate Task List (Think Catalogue) → |
| 71 | +Process Documents Recursively → Generate Commit Log |
| 72 | +``` |
| 73 | + |
| 74 | +### 2. Document Processing Pipeline (5-Step Architecture) |
| 75 | +The steps are registered in `KoalaWarehouse/Extensions/ServiceCollectionExtensions.cs`: |
| 76 | + |
| 77 | +**Execution Order:** |
| 78 | +1. **ReadmeGenerationStep** - Generate README.md |
| 79 | +2. **CatalogueGenerationStep** - Create directory structure |
| 80 | +3. **ProjectClassificationStep** - Classify project type |
| 81 | +4. **DocumentStructureGenerationStep** - Build document TOC |
| 82 | +5. **DocumentContentGenerationStep** - Generate document content |
| 83 | + |
| 84 | +**Key Classes:** |
| 85 | +- `ResilientDocumentProcessingPipeline` - Wraps pipeline with retry/fallback logic |
| 86 | +- `DocumentProcessingContext` - Carries data through steps |
| 87 | +- `DocumentProcessingOrchestrator` - Orchestrates with OpenTelemetry tracing |
| 88 | + |
| 89 | +### 3. AI Kernel Initialization (KernelFactory Pattern) |
| 90 | +`KernelFactory.GetKernel()` initializes Semantic Kernel with: |
| 91 | +- **LLM Provider Selection**: OpenAI or AzureOpenAI via `OpenAIOptions.ModelProvider` |
| 92 | +- **Plugins Loaded**: |
| 93 | + - Code Analysis plugins (in `plugins/CodeAnalysis/`) with `skprompt.txt` prompts |
| 94 | + - FileTool plugin - reads repository files with token limits |
| 95 | + - AgentTool plugin - MCP integration |
| 96 | + - Dynamic MCP service loading from `DocumentOptions.McpStreamable` |
| 97 | +- **Custom HttpClient** - Handles gzip/brotli decompression |
| 98 | + |
| 99 | +--- |
| 100 | + |
| 101 | +## Key Development Workflows |
| 102 | + |
| 103 | +### Build & Run |
| 104 | + |
| 105 | +**Frontend:** |
| 106 | +```bash |
| 107 | +cd web-site |
| 108 | +npm install |
| 109 | +npm run dev # Dev server at localhost:5173 |
| 110 | +npm run build # Build to ../src/KoalaWiki/wwwroot |
| 111 | +npm run build:analyze # Bundle analysis |
| 112 | +npm run lint # ESLint check |
| 113 | +``` |
| 114 | + |
| 115 | +**Backend:** |
| 116 | +```bash |
| 117 | +dotnet build KoalaWiki.sln |
| 118 | +dotnet run --project src/KoalaWiki/KoalaWiki.csproj |
| 119 | +# API at http://localhost:5085, OpenAPI at /scalar |
| 120 | +``` |
| 121 | + |
| 122 | +**Docker (with make/Makefile):** |
| 123 | +```bash |
| 124 | +make build # Build all images |
| 125 | +make build-frontend # Frontend only |
| 126 | +make dev # Run all services with logs |
| 127 | +make dev-backend # Backend only |
| 128 | +make build-arm # ARM64 architecture |
| 129 | +make build-amd # AMD64 architecture |
| 130 | +``` |
| 131 | + |
| 132 | +### Database Migrations |
| 133 | + |
| 134 | +Entity Framework Core migrations (in `KoalaWiki.Core/`): |
| 135 | +```bash |
| 136 | +dotnet ef migrations add <MigrationName> --project KoalaWiki.Core --startup-project src/KoalaWiki/KoalaWiki.csproj |
| 137 | +dotnet ef database update --project KoalaWiki.Core --startup-project src/KoalaWiki/KoalaWiki.csproj |
| 138 | +``` |
| 139 | + |
| 140 | +### Environment Configuration |
| 141 | + |
| 142 | +Critical environment variables in `docker-compose.yml`: |
| 143 | +- **`CHAT_MODEL`** (required) - Must support function calling (e.g., DeepSeek-V3, GPT-4-turbo) |
| 144 | +- **`ANALYSIS_MODEL`** (optional) - Defaults to `CHAT_MODEL`; GPT-4.1 is recommended for better directory structure generation |
| 145 | +- **`CHAT_API_KEY`** - LLM API credential |
| 146 | +- **`ENDPOINT`** - API base URL (e.g., https://api.openai.com/v1) |
| 147 | +- **`MODEL_PROVIDER`** - OpenAI or AzureOpenAI |
| 148 | +- **`DB_TYPE`** - sqlite, postgres, mysql, sqlserver |
| 149 | +- **`DB_CONNECTION_STRING`** - Database connection |
| 150 | +- **`LANGUAGE`** - Document generation language (default: Chinese) |
| 151 | +- **`READ_MAX_TOKENS`** - Token limit for file reading (recommended: about 70% of the model's maximum token count) |
| 152 | +- **`MCP_STREAMABLE`** - Format: `serviceName=url` (e.g., `claude=http://localhost:8080/api/mcp`) |
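
The `serviceName=url` format of `MCP_STREAMABLE` can be illustrated with a small parser. This is a minimal TypeScript sketch for illustration only, not the project's C# parsing in `DocumentOptions.McpStreamable`; the names `McpEntry` and `parseMcpEntry` are hypothetical, and it handles a single entry because the guide does not say how multiple entries are separated.

```typescript
// Hypothetical shape for one parsed MCP_STREAMABLE entry.
interface McpEntry {
  name: string;
  url: string;
}

// Hypothetical helper: splits one "serviceName=url" entry.
function parseMcpEntry(entry: string): McpEntry {
  // Split on the first "=" only, because the URL may contain "=" in a query string.
  const separator = entry.indexOf("=");
  if (separator <= 0) {
    throw new Error(`Expected "serviceName=url", got "${entry}"`);
  }
  const name = entry.slice(0, separator).trim();
  const url = entry.slice(separator + 1).trim();
  if (name.length === 0 || url.length === 0) {
    throw new Error(`Expected "serviceName=url", got "${entry}"`);
  }
  return { name, url };
}

const example = parseMcpEntry("claude=http://localhost:8080/api/mcp");
console.log(example.name, example.url);
```

Splitting on the first `=` rather than every `=` keeps URLs with query parameters intact.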
| 153 | + |
| 154 | +--- |
| 155 | + |
| 156 | +## Project-Specific Patterns & Conventions |
| 157 | + |
| 158 | +### 1. FastService (`FastApi`) Service Pattern |
| 159 | +Services inherit from `FastApi` (from FastService NuGet): |
| 160 | +```csharp |
| 161 | +public class RepositoryService(IKoalaWikiContext db) : FastApi |
| 162 | +{ |
| 163 | + [HttpGet("/repos")] |
| 164 | + public async Task<List<Warehouse>> GetRepositories() |
| 165 | + { |
| 166 | + return await db.Warehouses.ToListAsync(); // Illustrative: assumes a Warehouses DbSet; endpoint auto-exposed via FastService |
| 167 | + } |
| 168 | +} |
| 169 | +``` |
| 170 | +- Automatically registers routes without explicit Route attributes |
| 171 | +- DI via constructor parameters |
| 172 | +- Response mapping via Mapster |
| 173 | + |
| 174 | +### 2. Entity & Domain Model Structure |
| 175 | +Base entity in `KoalaWiki.Domains/Entity.cs`: |
| 176 | +```csharp |
| 177 | +public class Entity<TKey> : IEntity<TKey>, ICreateEntity |
| 178 | +{ |
| 179 | + public TKey Id { get; set; } |
| 180 | + public DateTime CreatedAt { get; set; } |
| 181 | +} |
| 182 | +``` |
| 183 | +- All domain entities inherit from this base class with a generic `TKey` (usually `int` or `string`) |
| 184 | +- `ICreateEntity` marks automatic timestamp tracking |
| 185 | +- Models in `KoalaWiki.Domains/` mapped to database via EF Core |
| 186 | + |
| 187 | +### 3. Semantic Kernel Prompt Files |
| 188 | +Located in `src/KoalaWiki/plugins/CodeAnalysis/`: |
| 189 | +``` |
| 190 | +plugins/CodeAnalysis/ |
| 191 | +├── GenerateReadme/ |
| 192 | +│ ├── config.json # Plugin metadata |
| 193 | +│ └── skprompt.txt # Semantic Kernel prompt template |
| 194 | +├── CommitAnalyze/ |
| 195 | +├── GenerateDescription/ |
| 196 | +└── FunctionPrompt/ |
| 197 | +``` |
| 198 | +- `config.json` - Describes the function: its description, input variables, and execution settings |
| 199 | +- `skprompt.txt` - Template with `{{$variable}}` syntax (Semantic Kernel format, NOT Handlebars) |
| 200 | +- Loaded dynamically in `KernelFactory.GetKernel()` |
| 201 | + |
| 202 | +### 4. Pipeline Context Flow Pattern |
| 203 | +```csharp |
| 204 | +// DocumentProcessingContext carries state through pipeline steps |
| 205 | +public class DocumentProcessingContext |
| 206 | +{ |
| 207 | + public Document Document { get; init; } |
| 208 | + public Warehouse Warehouse { get; init; } |
| 209 | + public IKoalaWikiContext DbContext { get; init; } |
| 210 | + public Kernel? KernelInstance { get; set; } // Set in pipeline |
| 211 | + public string? GeneratedReadme { get; set; } |
| 212 | + public DocumentCatalog? Catalogue { get; set; } |
| 213 | +} |
| 214 | +``` |
| 215 | +- Each step reads its input from the context, updates the context, and passes it to the next step |
| 216 | +- The stored kernel instance is reused across steps to avoid repeated initialization |
| 217 | + |
| 218 | +### 5. i18n Convention (Frontend) |
| 219 | +`web-site/src/i18n/` structure: |
| 220 | +- **`locales/`** - JSON translation files (en.json, zh.json, fr.json) |
| 221 | +- **`mergeBundles.ts`** - Combines namespace bundles into single files |
| 222 | +- **`i18n.ts`** - i18next initialization |
| 223 | +- Usage: `const { t } = useTranslation('common')` |
| 224 | +- Build command: `npm run merge-i18n` |
| 225 | + |
| 226 | +### 6. Component Lazy Loading (Frontend) |
| 227 | +Routes use `lazy()` + `Suspense`: |
| 228 | +```tsx |
| 229 | +const RepositoryLayout = lazy(() => import('@/components/layout/RepositoryLayout')) |
| 230 | + |
| 231 | +<Suspense fallback={<Loading />}> |
| 232 | + <RepositoryLayout /> |
| 233 | +</Suspense> |
| 234 | +``` |
| 235 | +- Reduces initial bundle size |
| 236 | +- Fallback component shows during load |
| 237 | + |
| 238 | +### 7. State Management (Frontend) |
| 239 | +Zustand stores in `web-site/src/stores/`: |
| 240 | +```typescript |
| 241 | +const useAuthStore = create((set) => ({ |
| 242 | + isAuthenticated: false, |
| 243 | + setAuthenticated: (value) => set({ isAuthenticated: value }), |
| 244 | +})) |
| 245 | +``` |
| 246 | +- Lightweight state management with minimal boilerplate |
| 247 | +- Avoids the setup overhead of Redux |
| 248 | + |
| 249 | +### 8. MCP Integration Points |
| 250 | +- **Backend MCP Server**: `src/KoalaWiki/MCP/` exposes repository knowledge |
| 251 | +- **MCP Client Tools**: `KernelFactory.GetKernel()` loads tools from external MCPs |
| 252 | +- **Streamable Config**: `DocumentOptions.McpStreamable` parses `MCP_STREAMABLE` env var |
| 253 | + |
| 254 | +--- |
| 255 | + |
| 256 | +## Integration Points & External Dependencies |
| 257 | + |
| 258 | +### LLM Providers |
| 259 | +- **OpenAI / AzureOpenAI** - Via Semantic Kernel connectors |
| 260 | +- **Anthropic** - Planned support |
| 261 | +- **DeepSeek** - Tested with DeepSeek-V3 model |
| 262 | +- **Custom Endpoints** - Use `ENDPOINT` env var for API-compatible services |
| 263 | + |
| 264 | +### Git Integration |
| 265 | +- **LibGit2Sharp** - Clone, read .gitignore, commit history |
| 266 | +- **Octokit** - GitHub API for repo metadata (optional) |
| 267 | +- Repositories are cloned to the directory set by `KOALAWIKI_REPOSITORIES` |
| 268 | + |
| 269 | +### Data Storage |
| 270 | +- **Entity Framework Core** - ORM with provider abstraction |
| 271 | +- **4 Database Backends** - SQLite, PostgreSQL, MySQL, and SQL Server; implemented via Provider projects and selected with `DB_TYPE` |
| 272 | + |
| 273 | +### Frontend UI Framework |
| 274 | +- **Shadcn/ui** - Copy-in component collection built on Radix UI primitives and styled with TailwindCSS |
| 275 | +- **TailwindCSS** - Utility-first styling with Vite plugin |
| 276 | +- **Lucide React** - Icon library |
| 277 | +- **React Hook Form** + **Zod** - Form handling & validation |
| 278 | + |
| 279 | +### Build Tools |
| 280 | +- **Vite 7.x** - Frontend bundler with gzip/brotli compression |
| 281 | +- **SWC** - Faster TypeScript compilation (via `@vitejs/plugin-react-swc`) |
| 282 | +- **.NET 9** - C# 13 language features |
| 283 | +- **Docker** - Multi-stage builds for production |
| 284 | + |
| 285 | +--- |
| 286 | + |
| 287 | +## Common Commands Quick Reference |
| 288 | + |
| 289 | +| Task | Command | |
| 290 | +|------|---------| |
| 291 | +| **Frontend dev** | `cd web-site && npm run dev` | |
| 292 | +| **Frontend build** | `cd web-site && npm run build` | |
| 293 | +| **Backend run** | `dotnet run --project src/KoalaWiki/KoalaWiki.csproj` | |
| 294 | +| **Build all Docker** | `make build` (or `docker-compose build`) | |
| 295 | +| **Run all services** | `make dev` (shows logs) | |
| 296 | +| **Stop services** | `docker-compose down` | |
| 297 | +| **View logs** | `docker-compose logs -f` | |
| 298 | +| **DB migration** | `dotnet ef migrations add MigrationName --project KoalaWiki.Core --startup-project src/KoalaWiki/KoalaWiki.csproj` | |
| 299 | +| **Lint frontend** | `cd web-site && npm run lint` | |
| 300 | +| **Clean build** | `make clean` | |
| 301 | + |
| 302 | +--- |
| 303 | + |
| 304 | +## Debugging & Tracing |
| 305 | + |
| 306 | +### OpenTelemetry Integration |
| 307 | +- **`DocumentProcessingOrchestrator`** uses `ActivitySource` for tracing |
| 308 | +- **Dashboard**: Aspire Dashboard at `http://localhost:18888` (in docker-compose) |
| 309 | +- Tags automatically captured: warehouse ID, document ID, processing duration |
| 310 | + |
| 311 | +### Logging |
| 312 | +- **Serilog** configured in `Program.cs` |
| 313 | +- **Sinks**: Console, File |
| 314 | +- **Configuration**: `appsettings.json`, `appsettings.Development.json` |
| 315 | +- View backend logs with `docker-compose logs -f koalawiki` |
| 316 | + |
| 317 | +### Frontend DevTools |
| 318 | +- **React DevTools** - Component inspection |
| 319 | +- **Network tab** - API calls to `/api/` proxied to backend |
| 320 | +- **Console** - Error/warning output |
| 321 | +- **Vite HMR** - Hot module replacement on file save |
| 322 | + |
| 323 | +--- |
| 324 | + |
| 325 | +## File Structure Reference |
| 326 | + |
| 327 | +**Key Files for Common Tasks:** |
| 328 | +- **Add database entity**: `KoalaWiki.Domains/` + migration in `KoalaWiki.Core/` |
| 329 | +- **Add API endpoint**: Create `Services/*.cs` inheriting `FastApi` |
| 330 | +- **Add frontend page**: Create in `web-site/src/pages/` + route in `web-site/src/routes/index.tsx` |
| 331 | +- **Update prompts**: Edit `src/KoalaWiki/plugins/CodeAnalysis/*/skprompt.txt` |
| 332 | +- **Add i18n strings**: Update `web-site/src/i18n/locales/*.json` |
| 333 | +- **Configure build**: `web-site/vite.config.ts` for frontend, `src/KoalaWiki/KoalaWiki.csproj` for backend |
| 334 | + |
| 335 | +--- |
| 336 | + |
| 337 | +## Notes for AI Agents |
| 338 | + |
| 339 | +1. **Token Budget**: Set `READ_MAX_TOKENS` to 70% of model max tokens to leave headroom for processing |
| 340 | +2. **Model Requirements**: CHAT_MODEL must support function calling (GPT-4, DeepSeek-V3, Claude 3.5) |
| 341 | +3. **MCP Extensibility**: Add tools to pipeline by registering MCPs in `DocumentOptions.McpStreamable` |
| 342 | +4. **Database Flexibility**: Each database provider is a separate project; switch databases by setting `DB_TYPE` and `DB_CONNECTION_STRING` |
| 343 | +5. **Frontend Caching**: The built frontend is served as static files from `wwwroot/`, so backend-only changes do not require a frontend rebuild |
| 344 | +6. **Async-First**: Most services use `async/await`; pipeline steps must be async |
| 345 | +7. **Error Handling**: The pipeline runs inside a resilient wrapper (`ResilientDocumentProcessingPipeline`); step failures are logged, and the pipeline may retry or fall back |
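
The 70% rule for `READ_MAX_TOKENS` in note 1 is simple arithmetic, sketched below in TypeScript. The helper name `readMaxTokens` and the 128,000-token context size are assumptions chosen for illustration; check your model's actual limit before setting the variable.

```typescript
// Hypothetical helper: derives a READ_MAX_TOKENS value from a model's maximum token count.
function readMaxTokens(modelMaxTokens: number, percent: number = 70): number {
  if (!Number.isInteger(modelMaxTokens) || modelMaxTokens <= 0) {
    throw new Error("modelMaxTokens must be a positive integer");
  }
  if (percent <= 0 || percent > 100) {
    throw new Error("percent must be in the range (0, 100]");
  }
  // Multiplying before dividing keeps the result exact for whole-number percentages.
  return Math.floor((modelMaxTokens * percent) / 100);
}

// Example figure only: a model with a 128,000-token limit.
console.log(`READ_MAX_TOKENS=${readMaxTokens(128000)}`); // prints "READ_MAX_TOKENS=89600"
```

The remaining 30% is the headroom the note refers to.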