| 1 | +# GitProxy Architecture |
| 2 | + |
| 3 | +**Version**: 2.0.0-rc.3 |
| 4 | +**Last Updated**: 2025-01-10 |
| 5 | + |
| 6 | +## Overview |
| 7 | + |
| 8 | +GitProxy is a security-focused Git proxy that intercepts push operations between developers and Git remote endpoints (GitHub, GitLab, etc.) to enforce security policies, compliance rules, and workflows. It supports both **HTTP/HTTPS** and **SSH** protocols with identical security scanning through a shared processor chain. |
| 9 | + |
| 10 | +## High-Level Architecture |
| 11 | + |
| 12 | +```mermaid |
| 13 | +graph TB |
| 14 | + subgraph "Client Side" |
| 15 | + DEV[Developer] |
| 16 | + GIT[Git Client] |
| 17 | + end |
| 18 | + |
| 19 | + subgraph "GitProxy" |
| 20 | + subgraph "Protocol Handlers" |
| 21 | + HTTP[HTTP/HTTPS Handler] |
| 22 | + SSH[SSH Handler] |
| 23 | + end |
| 24 | + |
| 25 | + subgraph "Core Processing" |
| 26 | + PACK[Pack Data Capture] |
| 27 | + CHAIN[Security Processor Chain] |
| 28 | + AUTH[Authorization Engine] |
| 29 | + end |
| 30 | + |
| 31 | + subgraph "Storage" |
| 32 | + DB[(Database)] |
| 33 | + CACHE[(Cache)] |
| 34 | + end |
| 35 | + end |
| 36 | + |
| 37 | + subgraph "Remote Side" |
| 38 | + GITHUB[GitHub/GitLab/etc] |
| 39 | + end |
| 40 | + |
| 41 | + DEV --> GIT |
| 42 | + GIT --> HTTP |
| 43 | + GIT --> SSH |
| 44 | + HTTP --> PACK |
| 45 | + SSH --> PACK |
| 46 | + PACK --> CHAIN |
| 47 | + CHAIN --> AUTH |
| 48 | + AUTH --> GITHUB |
| 49 | + CHAIN --> DB |
| 50 | + AUTH --> CACHE |
| 51 | +``` |
| 52 | + |
| 53 | +## Core Components |
| 54 | + |
| 55 | +### 1. Protocol Handlers |
| 56 | + |
| 57 | +#### HTTP/HTTPS Handler (`src/proxy/routes/index.ts`) |
| 58 | + |
| 59 | +- **Purpose**: Handles HTTP/HTTPS Git operations |
| 60 | +- **Entry Point**: Express middleware |
| 61 | +- **Key Features**: |
| 62 | + - Pack data extraction via `getRawBody` middleware |
| 63 | + - Request validation and routing |
| 64 | + - Error response formatting (Git protocol) |
| 65 | + - Streaming support up to 1GB |
| 66 | + |
| 67 | +#### SSH Handler (`src/proxy/ssh/server.ts`) |
| 68 | + |
| 69 | +- **Purpose**: Handles SSH Git operations |
| 70 | +- **Entry Point**: SSH2 server |
| 71 | +- **Key Features**: |
| 72 | + - SSH key-based authentication |
| 73 | + - Stream-based pack data capture |
| 74 | + - SSH user context preservation |
| 75 | + - Error response formatting (stderr) |
| 76 | + |
| 77 | +### 2. Security Processor Chain (`src/proxy/chain.ts`) |
| 78 | + |
| 79 | +The heart of GitProxy's security model is a shared 17-processor chain used by both protocols: |
| 80 | + |
| 81 | +```typescript |
| 82 | +const pushActionChain = [ |
| 83 | + proc.push.parsePush, // Extract commit data from pack |
| 84 | + proc.push.checkEmptyBranch, // Validate branch is not empty |
| 85 | + proc.push.checkRepoInAuthorisedList, // Repository authorization |
| 86 | + proc.push.checkCommitMessages, // Commit message validation |
| 87 | + proc.push.checkAuthorEmails, // Author email validation |
| 88 | + proc.push.checkUserPushPermission, // User push permissions |
| 89 | + proc.push.pullRemote, // Clone remote repository |
| 90 | + proc.push.writePack, // Write pack data locally |
| 91 | + proc.push.checkHiddenCommits, // Hidden commit detection |
| 92 | + proc.push.checkIfWaitingAuth, // Check authorization status |
| 93 | + proc.push.preReceive, // Pre-receive hooks |
| 94 | + proc.push.getDiff, // Generate diff |
| 95 | + proc.push.gitleaks, // Secret scanning |
| 96 | + proc.push.clearBareClone, // Cleanup |
| 97 | + proc.push.scanDiff, // Diff analysis |
| 98 | + proc.push.captureSSHKey, // SSH key capture |
| 99 | + proc.push.blockForAuth, // Authorization workflow |
| 100 | +]; |
| 101 | +``` |
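
To make the idea of a sequential processor chain concrete, here is a minimal sketch of an executor. The `Action` fields, the `Processor` type, `runChain`, and the stop-on-first-failure behaviour are assumptions for illustration only; the real types and control flow live in `src/proxy/chain.ts` and may differ.

```typescript
// Hypothetical shapes for illustration; not the real GitProxy Action type.
interface Action {
  steps: string[];   // names of the processors that have run so far
  error: boolean;    // set when a processor rejects the push
  blocked: boolean;  // set when the push must wait for approval
}

type Processor = (req: unknown, action: Action) => Promise<Action>;

// Run processors in order and stop at the first one that errors or blocks.
async function runChain(chain: Processor[], req: unknown, action: Action): Promise<Action> {
  let current = action;
  for (const processor of chain) {
    current = await processor(req, current);
    if (current.error || current.blocked) {
      break;
    }
  }
  return current;
}

// Small factory used only to build demo processors.
const step = (name: string, patch: Partial<Action> = {}): Processor =>
  async (_req, action) => ({ ...action, ...patch, steps: [...action.steps, name] });

const demoChain: Processor[] = [
  step('parsePush'),
  step('checkCommitMessages', { error: true }), // simulated failure
  step('gitleaks'),                             // not reached in this demo
];
```

Calling `runChain(demoChain, {}, { steps: [], error: false, blocked: false })` resolves to an action whose `steps` list ends at the failing processor, which is the property that lets both protocols share one verdict.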
| 102 | + |
| 103 | +### 3. Database Abstraction (`src/db/index.ts`) |
| 104 | + |
| 105 | +Two implementations for different deployment scenarios: |
| 106 | + |
| 107 | +#### NeDB (Development) |
| 108 | + |
| 109 | +- **File-based**: Local JSON files |
| 110 | +- **Use Case**: Development and testing |
| 111 | +- **Performance**: Good for small to medium datasets |
| 112 | + |
| 113 | +#### MongoDB (Production) |
| 114 | + |
| 115 | +- **Document-based**: Full-featured database |
| 116 | +- **Use Case**: Production deployments |
| 117 | +- **Performance**: Scalable for large datasets |
| 118 | + |
| 119 | +### 4. Configuration Management (`src/config/`) |
| 120 | + |
| 121 | +Hierarchical configuration system: |
| 122 | + |
| 123 | +1. **Schema Definition**: `config.schema.json` |
| 124 | +2. **Generated Types**: `src/config/generated/config.ts` |
| 125 | +3. **User Config**: `proxy.config.json` |
| 126 | +4. **Configuration Loader**: `src/config/index.ts` |
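
As a rough illustration of layered configuration, the sketch below merges user-supplied values over defaults. The `ProxyConfig` fields, the `defaults` object, and `loadConfig` are hypothetical names chosen for this example; the actual shape is generated from `config.schema.json`, and the real loader may validate and merge differently.

```typescript
// Hypothetical config shape; the real one is generated from config.schema.json.
interface ProxyConfig {
  sshEnabled: boolean;
  maxPackSizeBytes: number;
  authorisedList: string[];
}

const defaults: ProxyConfig = {
  sshEnabled: false,
  maxPackSizeBytes: 1024 * 1024 * 1024, // 1 GiB
  authorisedList: [],
};

// User values win; any key the user omits falls back to the default.
function loadConfig(userConfig: Partial<ProxyConfig>): ProxyConfig {
  return { ...defaults, ...userConfig };
}

const merged = loadConfig({ sshEnabled: true });
```

A shallow spread like this is enough for flat settings; nested sections would need a deep merge or schema-driven defaults.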
| 127 | + |
| 128 | +## Request Flow |
| 129 | + |
| 130 | +### HTTP/HTTPS Flow |
| 131 | + |
| 132 | +```mermaid |
| 133 | +sequenceDiagram |
| 134 | + participant Client |
| 135 | + participant Express |
| 136 | + participant Middleware |
| 137 | + participant Chain |
| 138 | + participant Remote |
| 139 | + |
| 140 | + Client->>Express: POST /repo.git/git-receive-pack |
| 141 | + Express->>Middleware: extractRawBody() |
| 142 | + Middleware->>Middleware: Capture pack data (1GB limit) |
| 143 | + Middleware->>Chain: Execute security chain |
| 144 | + Chain->>Chain: Run 17 processors |
| 145 | + Chain->>Remote: Forward if approved |
| 146 | + Remote->>Client: Response |
| 147 | +``` |
| 148 | + |
| 149 | +### SSH Flow |
| 150 | + |
| 151 | +```mermaid |
| 152 | +sequenceDiagram |
| 153 | + participant Client |
| 154 | + participant SSH Server |
| 155 | + participant Stream Handler |
| 156 | + participant Chain |
| 157 | + participant Remote |
| 158 | + |
| 159 | + Client->>SSH Server: git-receive-pack 'repo' |
| 160 | + SSH Server->>Stream Handler: Capture pack data |
| 161 | + Stream Handler->>Stream Handler: Buffer chunks (500MB limit) |
| 162 | + Stream Handler->>Chain: Execute security chain |
| 163 | + Chain->>Chain: Run 17 processors |
| 164 | + Chain->>Remote: Forward if approved |
| 165 | + Remote->>Client: Response |
| 166 | +``` |
| 167 | + |
| 168 | +## Security Model |
| 169 | + |
| 170 | +### Pack Data Processing |
| 171 | + |
| 172 | +Both protocols follow the same pattern: |
| 173 | + |
| 174 | +1. **Capture**: Extract pack data from request/stream |
| 175 | +2. **Parse**: Extract commit information and ref updates |
| 176 | +3. **Clone**: Create local repository copy |
| 177 | +4. **Analyze**: Run security scans and validations |
| 178 | +5. **Authorize**: Apply approval workflow |
| 179 | +6. **Forward**: Send to remote if approved |
| 180 | + |
| 181 | +### Security Scans |
| 182 | + |
| 183 | +#### Gitleaks Integration |
| 184 | + |
| 185 | +- **Purpose**: Detect secrets, API keys, passwords |
| 186 | +- **Implementation**: External gitleaks binary |
| 187 | +- **Scope**: Full pack data scanning |
| 188 | +- **Performance**: Optimized for large repositories |
| 189 | + |
| 190 | +#### Diff Analysis |
| 191 | + |
| 192 | +- **Purpose**: Analyze code changes for security issues |
| 193 | +- **Implementation**: Custom pattern matching |
| 194 | +- **Scope**: Only changed files |
| 195 | +- **Performance**: Fast incremental analysis |
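
Pattern matching over changed lines can be sketched as follows. This is not the `scanDiff` implementation: `DiffRule`, `Finding`, `scanAddedLines`, and the sample rule are hypothetical, and the sketch assumes a plain unified diff as input.

```typescript
// Hypothetical rule shape; GitProxy's actual scanDiff configuration may differ.
interface DiffRule {
  name: string;
  pattern: RegExp; // must not use the "g" flag, so test() stays stateless
}

interface Finding {
  rule: string;
  line: string;
}

// Check only added lines of a unified diff: "+" prefix, skipping "+++" file headers.
// Simplification: an added line whose own text starts with "++" is also skipped.
function scanAddedLines(diff: string, rules: DiffRule[]): Finding[] {
  const findings: Finding[] = [];
  for (const rawLine of diff.split('\n')) {
    if (!rawLine.startsWith('+') || rawLine.startsWith('+++')) continue;
    const line = rawLine.slice(1);
    for (const rule of rules) {
      if (rule.pattern.test(line)) {
        findings.push({ rule: rule.name, line });
      }
    }
  }
  return findings;
}

const sampleDiff = [
  '--- a/config.js',
  '+++ b/config.js',
  '@@ -1,2 +1,2 @@',
  '-const token = process.env.TOKEN;',
  '+const token = "AKIAABCDEFGHIJKLMNOP";',
  ' module.exports = { token };',
].join('\n');

const sampleRules: DiffRule[] = [{ name: 'aws-access-key-id', pattern: /AKIA[0-9A-Z]{16}/ }];
const sampleFindings = scanAddedLines(sampleDiff, sampleRules);
```

Restricting the scan to added lines is what keeps the analysis incremental: removed and context lines are never matched.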
| 196 | + |
| 197 | +#### Hidden Commit Detection |
| 198 | + |
| 199 | +- **Purpose**: Detect manipulated or hidden commits |
| 200 | +- **Implementation**: Pack data integrity checks |
| 201 | +- **Scope**: Full commit history validation |
| 202 | +- **Performance**: Minimal overhead |
| 203 | + |
| 204 | +### Authorization Workflow |
| 205 | + |
| 206 | +#### Auto-Approval |
| 207 | + |
| 208 | +- **Trigger**: All security checks pass |
| 209 | +- **Process**: Automatic approval and forwarding |
| 210 | +- **Logging**: Full audit trail maintained |
| 211 | + |
| 212 | +#### Manual Approval |
| 213 | + |
| 214 | +- **Trigger**: Security check failure or policy requirement |
| 215 | +- **Process**: Human review via web interface |
| 216 | +- **Logging**: Detailed approval/rejection reasons |
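
The two triggers above reduce to a small decision function. The sketch below is an assumption about how they combine; `CheckResult`, `Decision`, and `decideApproval` are illustrative names, not GitProxy APIs.

```typescript
type Decision = 'auto-approve' | 'manual-review';

// Hypothetical summary of one security check's outcome.
interface CheckResult {
  name: string;
  passed: boolean;
}

// Auto-approve only when every check passed and no policy forces a human review.
function decideApproval(results: CheckResult[], policyRequiresReview: boolean): Decision {
  const allPassed = results.every((result) => result.passed);
  return allPassed && !policyRequiresReview ? 'auto-approve' : 'manual-review';
}

const cleanPush: CheckResult[] = [
  { name: 'gitleaks', passed: true },
  { name: 'scanDiff', passed: true },
];
```

With `cleanPush`, the decision is `auto-approve` unless `policyRequiresReview` is `true`; a single failed check always routes the push to manual review.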
| 217 | + |
| 218 | +## Plugin System |
| 219 | + |
| 220 | +### Architecture (`src/plugin.ts`) |
| 221 | + |
| 222 | +Extensible processor system for custom validation: |
| 223 | + |
| 224 | +```typescript |
| 225 | +class MyPlugin { |
| 226 | + async exec(req: any, action: Action): Promise<Action> { |
| 227 | + // Custom validation logic |
| 228 | + return action; |
| 229 | + } |
| 230 | +} |
| 231 | +``` |
| 232 | + |
| 233 | +### Plugin Types |
| 234 | + |
| 235 | +- **Push Plugins**: Inserted after `parsePush` (position 1) |
| 236 | +- **Pull Plugins**: Inserted at start (position 0) |
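
The insertion positions can be illustrated with two small helpers. `withPushPlugins` and `withPullPlugins` are hypothetical names, and strings stand in for processor functions; the sketch only shows where plugins land relative to the built-in processors.

```typescript
// Push plugins go in at index 1, directly after the first processor (parsePush).
function withPushPlugins<T>(chain: T[], plugins: T[]): T[] {
  return [...chain.slice(0, 1), ...plugins, ...chain.slice(1)];
}

// Pull plugins go in at index 0, ahead of every built-in processor.
function withPullPlugins<T>(chain: T[], plugins: T[]): T[] {
  return [...plugins, ...chain];
}

const pushOrder = withPushPlugins(['parsePush', 'checkEmptyBranch'], ['myPlugin']);
const pullOrder = withPullPlugins(['checkRepoInAuthorisedList'], ['myPlugin']);
```

Placing push plugins after `parsePush` means a plugin can rely on commit data already being extracted from the pack.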
| 237 | + |
| 238 | +### Plugin Lifecycle |
| 239 | + |
| 240 | +1. **Loading**: Discovered from configuration |
| 241 | +2. **Initialization**: Constructor called with config |
| 242 | +3. **Execution**: `exec()` called for each request |
| 243 | +4. **Cleanup**: Resources cleaned up on shutdown |
| 244 | + |
| 245 | +## Error Handling |
| 246 | + |
| 247 | +### Protocol-Specific Error Responses |
| 248 | + |
| 249 | +#### HTTP/HTTPS |
| 250 | + |
| 251 | +```typescript |
| 252 | +res.set('content-type', 'application/x-git-receive-pack-result'); |
| 253 | +res.status(200).send(handleMessage(errorMessage)); |
| 254 | +``` |
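
The body of `handleMessage` is not shown here. As an illustration of Git-protocol error formatting, the sketch below encodes a message as a pkt-line on side-band channel 2 followed by a flush packet. `pktLine` and `formatGitError` are hypothetical helpers, and the sketch assumes the client negotiated a side-band capability.

```typescript
// Encode one pkt-line: four hex digits giving the total length,
// counting the payload bytes plus the four length bytes themselves.
function pktLine(payload: string): string {
  const totalLength = new TextEncoder().encode(payload).length + 4;
  if (totalLength > 65520) {
    throw new Error('pkt-line payload too large');
  }
  return totalLength.toString(16).padStart(4, '0') + payload;
}

// "0000" is the flush-pkt that terminates a sequence of pkt-lines.
const FLUSH_PKT = '0000';

// Hypothetical stand-in for handleMessage: side-band channel 2 carries
// progress text that the Git client prints to the user's terminal.
function formatGitError(message: string): string {
  return pktLine(`\x02${message}\n`) + FLUSH_PKT;
}
```

The HTTP status stays at 200 because the Git client reads the failure from the response body rather than from the status code.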
| 255 | + |
| 256 | +#### SSH |
| 257 | + |
| 258 | +```typescript |
| 259 | +stream.stderr.write(`Error: ${errorMessage}\n`); |
| 260 | +stream.exit(1); |
| 261 | +stream.end(); |
| 262 | +``` |
| 263 | + |
| 264 | +### Error Categories |
| 265 | + |
| 266 | +- **Validation Errors**: Invalid requests or data |
| 267 | +- **Authorization Errors**: Access denied or insufficient permissions |
| 268 | +- **Security Errors**: Policy violations or security issues |
| 269 | +- **System Errors**: Internal errors or resource exhaustion |
| 270 | + |
| 271 | +## Performance Characteristics |
| 272 | + |
| 273 | +### Memory Management |
| 274 | + |
| 275 | +#### HTTP/HTTPS |
| 276 | + |
| 277 | +- **Streaming**: Native Express streaming |
| 278 | +- **Memory**: PassThrough streams minimize buffering |
| 279 | +- **Size Limit**: 1GB (configurable) |
| 280 | + |
| 281 | +#### SSH |
| 282 | + |
| 283 | +- **Streaming**: Custom buffer management |
| 284 | +- **Memory**: In-memory buffering up to 500MB |
| 285 | +- **Size Limit**: 500MB (configurable) |
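
In-memory buffering with a hard size limit can be sketched like this. `BoundedPackBuffer` is a hypothetical class, not the SSH handler's actual code, and it uses `Uint8Array` so the example runs without Node-specific types.

```typescript
// Collects stream chunks in memory and refuses data beyond a fixed limit.
class BoundedPackBuffer {
  private chunks: Uint8Array[] = [];
  private size = 0;
  private readonly limitBytes: number;

  constructor(limitBytes: number) {
    this.limitBytes = limitBytes;
  }

  // Returns false when the chunk would exceed the limit;
  // the caller should then abort the stream and report an error.
  push(chunk: Uint8Array): boolean {
    if (this.size + chunk.length > this.limitBytes) {
      return false;
    }
    this.chunks.push(chunk);
    this.size += chunk.length;
    return true;
  }

  // Join all accepted chunks into one contiguous buffer for the chain.
  concat(): Uint8Array {
    const out = new Uint8Array(this.size);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.length;
    }
    return out;
  }
}

const demoBuffer = new BoundedPackBuffer(4);
const acceptedFirst = demoBuffer.push(new Uint8Array([1, 2, 3]));
const acceptedSecond = demoBuffer.push(new Uint8Array([4, 5])); // would reach 5 bytes
```

Checking the limit before storing a chunk keeps peak memory at or below the configured size, at the cost of rejecting the whole push once the limit is hit.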
| 286 | + |
| 287 | +### Performance Optimizations |
| 288 | + |
| 289 | +#### Caching |
| 290 | + |
| 291 | +- **Repository Clones**: Temporary local clones |
| 292 | +- **Configuration**: Cached configuration values |
| 293 | +- **Authentication**: Cached user sessions |
| 294 | + |
| 295 | +#### Concurrency |
| 296 | + |
| 297 | +- **HTTP/HTTPS**: Express handles multiple requests |
| 298 | +- **SSH**: One command per SSH session |
| 299 | +- **Processing**: Async processor chain execution |
| 300 | + |
| 301 | +## Monitoring and Observability |
| 302 | + |
| 303 | +### Logging |
| 304 | + |
| 305 | +- **Structured Logging**: JSON-formatted logs |
| 306 | +- **Log Levels**: Debug, Info, Warn, Error |
| 307 | +- **Context**: Request ID, user, repository tracking |
| 308 | + |
| 309 | +### Metrics |
| 310 | + |
| 311 | +- **Request Counts**: Total requests by protocol |
| 312 | +- **Processing Time**: Chain execution duration |
| 313 | +- **Error Rates**: Failed requests by category |
| 314 | +- **Resource Usage**: Memory and CPU utilization |
| 315 | + |
| 316 | +### Audit Trail |
| 317 | + |
| 318 | +- **User Actions**: All user operations logged |
| 319 | +- **Security Events**: Policy violations and approvals |
| 320 | +- **System Events**: Configuration changes and errors |
| 321 | + |
| 322 | +## Deployment Architecture |
| 323 | + |
| 324 | +### Development |
| 325 | + |
| 326 | +``` |
| 327 | +Developer → GitProxy (NeDB) → GitHub |
| 328 | +``` |
| 329 | + |
| 330 | +### Production |
| 331 | + |
| 332 | +``` |
| 333 | +Developer → Load Balancer → GitProxy (MongoDB) → GitHub |
| 334 | +``` |
| 335 | + |
| 336 | +### High Availability |
| 337 | + |
| 338 | +``` |
| 339 | +Developer → Load Balancer → Multiple GitProxy Instances → GitHub |
| 340 | +``` |
| 341 | + |
| 342 | +## Security Considerations |
| 343 | + |
| 344 | +### Data Protection |
| 345 | + |
| 346 | +- **Encryption**: SSH keys encrypted at rest |
| 347 | +- **Transit**: HTTPS/TLS for all communications |
| 348 | +- **Secrets**: No secrets in logs or configuration |
| 349 | + |
| 350 | +### Access Control |
| 351 | + |
| 352 | +- **Authentication**: Multiple provider support |
| 353 | +- **Authorization**: Granular permission system |
| 354 | +- **Audit**: Complete operation logging |
| 355 | + |
| 356 | +### Compliance |
| 357 | + |
| 358 | +- **Regulatory**: Financial services compliance |
| 359 | +- **Standards**: Industry security standards |
| 360 | +- **Reporting**: Detailed compliance reports |
| 361 | + |
| 362 | +## Future Enhancements |
| 363 | + |
| 364 | +### Planned Features |
| 365 | + |
| 366 | +- **Rate Limiting**: Per-user and per-repository limits |
| 367 | +- **Streaming to Disk**: For very large pack files |
| 368 | +- **Performance Monitoring**: Real-time metrics |
| 369 | +- **Advanced Caching**: Repository and diff caching |
| 370 | + |
| 371 | +### Scalability |
| 372 | + |
| 373 | +- **Horizontal Scaling**: Multiple instance support |
| 374 | +- **Database Sharding**: Large-scale data distribution |
| 375 | +- **CDN Integration**: Global content distribution |
| 376 | + |
| 377 | +--- |
| 378 | + |
| 379 | +**Architecture Status**: ✅ **Production Ready** |
| 380 | +**Scalability**: ✅ **Horizontal Scaling Supported** |
| 381 | +**Security**: ✅ **Enterprise Grade** |
| 382 | +**Maintainability**: ✅ **Well Documented** |