Commit 2cc7553

feat: add comprehensive performance tests for HTTP/HTTPS and SSH protocols
1 parent 3150f5d commit 2cc7553

4 files changed (+1077, -2 lines)

ARCHITECTURE.md

Lines changed: 382 additions & 0 deletions

# GitProxy Architecture

**Version**: 2.0.0-rc.3
**Last Updated**: 2025-01-10

## Overview

GitProxy is a security-focused Git proxy that intercepts push operations between developers and Git remote endpoints (GitHub, GitLab, etc.) to enforce security policies, compliance rules, and workflows. It supports both **HTTP/HTTPS** and **SSH** protocols with identical security scanning through a shared processor chain.

## High-Level Architecture

```mermaid
graph TB
    subgraph "Client Side"
        DEV[Developer]
        GIT[Git Client]
    end

    subgraph "GitProxy"
        subgraph "Protocol Handlers"
            HTTP[HTTP/HTTPS Handler]
            SSH[SSH Handler]
        end

        subgraph "Core Processing"
            PACK[Pack Data Capture]
            CHAIN[Security Processor Chain]
            AUTH[Authorization Engine]
        end

        subgraph "Storage"
            DB[(Database)]
            CACHE[(Cache)]
        end
    end

    subgraph "Remote Side"
        GITHUB[GitHub/GitLab/etc]
    end

    DEV --> GIT
    GIT --> HTTP
    GIT --> SSH
    HTTP --> PACK
    SSH --> PACK
    PACK --> CHAIN
    CHAIN --> AUTH
    AUTH --> GITHUB
    CHAIN --> DB
    AUTH --> CACHE
```

## Core Components

### 1. Protocol Handlers

#### HTTP/HTTPS Handler (`src/proxy/routes/index.ts`)

- **Purpose**: Handles Git operations over HTTP/HTTPS
- **Entry Point**: Express middleware
- **Key Features**:
  - Pack data extraction via `getRawBody` middleware
  - Request validation and routing
  - Error responses formatted for the Git protocol
  - Pack data capture up to 1GB

#### SSH Handler (`src/proxy/ssh/server.ts`)

- **Purpose**: Handles Git operations over SSH
- **Entry Point**: `ssh2` server
- **Key Features**:
  - SSH key-based authentication
  - Stream-based pack data capture
  - SSH user context preservation
  - Error responses written to stderr

### 2. Security Processor Chain (`src/proxy/chain.ts`)

The heart of GitProxy's security model is a single 17-processor chain shared by both protocols. Processors run in the order listed:

```typescript
import * as proc from './processors'; // module path assumed

const pushActionChain = [
  proc.push.parsePush, // Extract commit data from the pack
  proc.push.checkEmptyBranch, // Validate that the branch is not empty
  proc.push.checkRepoInAuthorisedList, // Repository authorization
  proc.push.checkCommitMessages, // Commit message validation
  proc.push.checkAuthorEmails, // Author email validation
  proc.push.checkUserPushPermission, // User push permissions
  proc.push.pullRemote, // Clone the remote repository
  proc.push.writePack, // Write pack data locally
  proc.push.checkHiddenCommits, // Hidden commit detection
  proc.push.checkIfWaitingAuth, // Check authorization status
  proc.push.preReceive, // Pre-receive hooks
  proc.push.getDiff, // Generate diff
  proc.push.gitleaks, // Secret scanning
  proc.push.clearBareClone, // Remove the temporary bare clone
  proc.push.scanDiff, // Diff analysis
  proc.push.captureSSHKey, // SSH key capture
  proc.push.blockForAuth, // Authorization workflow
];
```
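
The following is a minimal sketch of how a chain like this could be executed sequentially. It assumes a simplified `Action` with `error` and `blocked` flags. `Action`'s fields, `Processor`, `executeChain`, and `step` are hypothetical names, not GitProxy's actual types or runner. Each processor is awaited in order and receives the action returned by the previous one. The run stops at the first processor that fails or blocks the push.

```typescript
// Simplified, hypothetical stand-in for GitProxy's Action type.
interface Action {
  steps: string[]; // names of processors that have run
  error: boolean; // set when a check fails
  blocked: boolean; // set when the push must wait for approval
}

type Processor = (req: unknown, action: Action) => Promise<Action>;

// Run processors in order, passing each one the action returned by the previous one.
// Stops early once a processor reports an error or blocks the push.
async function executeChain(req: unknown, chain: Processor[], initial: Action): Promise<Action> {
  let action = initial;
  for (const processor of chain) {
    action = await processor(req, action);
    if (action.error || action.blocked) {
      break; // remaining processors are skipped
    }
  }
  return action;
}

// Usage: a failing second processor prevents the third from running.
const step = (name: string, fail = false): Processor => async (_req, action) => ({
  ...action,
  steps: [...action.steps, name],
  error: action.error || fail,
});

executeChain({}, [step('parsePush'), step('checkCommitMessages', true), step('pullRemote')], {
  steps: [],
  error: false,
  blocked: false,
}).then((result) => console.log(result.steps)); // [ 'parsePush', 'checkCommitMessages' ]
```

Stopping early keeps later, more expensive steps (cloning, secret scanning) from running on a push that has already failed a cheap check.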

### 3. Database Abstraction (`src/db/index.ts`)

Two implementations cover different deployment scenarios:

#### NeDB (Development)

- **File-based**: Local JSON files
- **Use Case**: Development and testing
- **Performance**: Good for small to medium datasets

#### MongoDB (Production)

- **Document-based**: Full-featured database
- **Use Case**: Production deployments
- **Performance**: Scalable for large datasets

### 4. Configuration Management (`src/config/`)

Configuration is defined and loaded in four layers:

1. **Schema Definition**: `config.schema.json`
2. **Generated Types**: `src/config/generated/config.ts`
3. **User Config**: `proxy.config.json`
4. **Configuration Loader**: `src/config/index.ts`
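
A loader like this commonly combines user settings with defaults through a recursive merge. The following is a minimal sketch of that step. It assumes that values from `proxy.config.json` override defaults key by key. `mergeConfig` and the example keys are illustrative, not GitProxy's actual loader or schema.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isPlainObject(value: Json): value is { [key: string]: Json } {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Merge user config over defaults: nested objects merge key by key,
// while scalars and arrays from the user config replace the default outright.
function mergeConfig(defaults: Json, user: Json): Json {
  if (!isPlainObject(defaults) || !isPlainObject(user)) {
    return user;
  }
  const merged: { [key: string]: Json } = { ...defaults };
  for (const [key, value] of Object.entries(user)) {
    const base = defaults[key];
    merged[key] = base === undefined ? value : mergeConfig(base, value);
  }
  return merged;
}

// Usage with illustrative keys: `database.path` keeps its default value.
const defaults: Json = { database: { type: 'nedb', path: './.data' }, plugins: ['example-plugin'] };
const userConfig: Json = JSON.parse('{ "database": { "type": "mongodb" }, "plugins": [] }');
console.log(JSON.stringify(mergeConfig(defaults, userConfig)));
// {"database":{"type":"mongodb","path":"./.data"},"plugins":[]}
```

Replacing arrays instead of concatenating them lets a user config remove default entries. Validation against `config.schema.json` would sensibly run on the merged result.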

## Request Flow

### HTTP/HTTPS Flow

```mermaid
sequenceDiagram
    participant Client
    participant Express
    participant Middleware
    participant Chain
    participant Remote

    Client->>Express: POST /repo.git/git-receive-pack
    Express->>Middleware: extractRawBody()
    Middleware->>Middleware: Capture pack data (1GB limit)
    Middleware->>Chain: Execute security chain
    Chain->>Chain: Run 17 processors
    Chain->>Remote: Forward if approved
    Remote->>Client: Response
```

### SSH Flow

```mermaid
sequenceDiagram
    participant Client
    participant SSH Server
    participant Stream Handler
    participant Chain
    participant Remote

    Client->>SSH Server: git-receive-pack 'repo'
    SSH Server->>Stream Handler: Capture pack data
    Stream Handler->>Stream Handler: Buffer chunks (500MB limit)
    Stream Handler->>Chain: Execute security chain
    Chain->>Chain: Run 17 processors
    Chain->>Remote: Forward if approved
    Remote->>Client: Response
```

## Security Model

### Pack Data Processing

Both protocols follow the same pattern:

1. **Capture**: Extract pack data from request/stream
2. **Parse**: Extract commit information and ref updates
3. **Clone**: Create local repository copy
4. **Analyze**: Run security scans and validations
5. **Authorize**: Apply approval workflow
6. **Forward**: Send to remote if approved

### Security Scans

#### Gitleaks Integration

- **Purpose**: Detect secrets, API keys, passwords
- **Implementation**: External gitleaks binary
- **Scope**: Full pack data scanning
- **Performance**: Optimized for large repositories

#### Diff Analysis

- **Purpose**: Analyze code changes for security issues
- **Implementation**: Custom pattern matching
- **Scope**: Only changed files
- **Performance**: Fast incremental analysis
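
The following is a minimal sketch of pattern matching restricted to changed lines. It scans only the added (`+`) lines of `git diff` output against a list of regular expressions. The rule shape, `scanAddedLines`, and the example AWS-key-style pattern are hypothetical; GitProxy's actual `scanDiff` rules may differ.

```typescript
interface DiffRule {
  name: string;
  pattern: RegExp; // avoid the `g` flag: `test()` on a global regex is stateful
}

interface DiffFinding {
  rule: string;
  file: string;
  text: string;
}

// Scan only lines added by the push. Assumes `git diff` output, where each
// file section starts with "diff --git" and hunks start with "@@".
function scanAddedLines(diff: string, rules: DiffRule[]): DiffFinding[] {
  const findings: DiffFinding[] = [];
  let file = '';
  let inHunk = false;
  for (const line of diff.split('\n')) {
    if (line.startsWith('diff --git ')) {
      inHunk = false; // new file section: header lines follow
    } else if (!inHunk && line.startsWith('+++ ')) {
      file = line.slice(4).replace(/^b\//, ''); // post-image path
    } else if (line.startsWith('@@')) {
      inHunk = true;
    } else if (inHunk && line.startsWith('+')) {
      const text = line.slice(1);
      for (const rule of rules) {
        if (rule.pattern.test(text)) {
          findings.push({ rule: rule.name, file, text });
        }
      }
    }
  }
  return findings;
}
```

The function tracks whether it is inside a hunk, so a `+++ b/...` file header is never mistaken for an added line. Removed (`-`) and context lines are ignored, so only content the push introduces is flagged. For example, `scanAddedLines(diffText, [{ name: 'aws-access-key-id', pattern: /AKIA[0-9A-Z]{16}/ }])` returns one finding per matching added line.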

#### Hidden Commit Detection

- **Purpose**: Detect manipulated or hidden commits
- **Implementation**: Pack data integrity checks
- **Scope**: Full commit history validation
- **Performance**: Minimal overhead
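
One way to frame this check is sketched below, under stated assumptions. It compares the commit IDs actually contained in the pushed pack with the commits in the pushed ref range, for example the output of `git rev-list <oldOid>..<newOid>`. A commit that arrives in the pack but falls outside that range never appears in the history reviewers see. `findHiddenCommits` and its inputs are hypothetical; this is not necessarily how `checkHiddenCommits` is implemented.

```typescript
// packCommits: commit IDs found in the pushed pack data.
// rangeCommits: commit IDs in the pushed range (e.g. from `git rev-list old..new`).
function findHiddenCommits(packCommits: string[], rangeCommits: string[]): string[] {
  const visible = new Set(rangeCommits);
  // Anything shipped in the pack but outside the reviewed range is suspicious.
  return packCommits.filter((oid) => !visible.has(oid));
}

// Usage with shortened IDs: "c3" was in the pack but not in the pushed range.
console.log(findHiddenCommits(['a1', 'b2', 'c3'], ['a1', 'b2'])); // [ 'c3' ]
```

Using a `Set` keeps the comparison linear in the number of commits. That is consistent with the low-overhead goal listed above.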

### Authorization Workflow

#### Auto-Approval

- **Trigger**: All security checks pass
- **Process**: Automatic approval and forwarding
- **Logging**: Full audit trail maintained

#### Manual Approval

- **Trigger**: Security check failure or policy requirement
- **Process**: Human review via web interface
- **Logging**: Detailed approval/rejection reasons
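
The two triggers above amount to a simple routing rule. The following is a minimal sketch. It assumes each security check reports a pass/fail result and that a policy flag can force human review. `CheckResult`, `ApprovalRoute`, and `routePush` are hypothetical names.

```typescript
interface CheckResult {
  name: string;
  passed: boolean;
}

type ApprovalRoute = 'auto-approve' | 'manual-review';

// Auto-approve only when every check passed and no policy requires a human reviewer.
function routePush(checks: CheckResult[], policyRequiresReview: boolean): ApprovalRoute {
  const allPassed = checks.every((check) => check.passed);
  return allPassed && !policyRequiresReview ? 'auto-approve' : 'manual-review';
}

console.log(routePush([{ name: 'gitleaks', passed: true }], false)); // auto-approve
console.log(routePush([{ name: 'gitleaks', passed: false }], false)); // manual-review
```

`every` returns `true` for an empty list, so this sketch auto-approves when no checks ran. A real implementation would likely want to treat that case as a failure.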

## Plugin System

### Architecture (`src/plugin.ts`)

Extensible processor system for custom validation:

```typescript
import { Action } from './proxy/actions'; // import path assumed

class MyPlugin {
  async exec(req: any, action: Action): Promise<Action> {
    // Custom validation logic; return the (possibly updated) action
    return action;
  }
}
```

### Plugin Types

- **Push Plugins**: Inserted after `parsePush` (position 1)
- **Pull Plugins**: Inserted at start (position 0)
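
The insertion positions above can be expressed as a splice into the base chain. The following is a minimal sketch that uses processor names as strings. `insertPlugins` is a hypothetical helper, and `myPushPlugin`, `myPullPlugin`, and `pullCheck` are placeholder names. `parsePush` and `checkEmptyBranch` are the first two processors of the push chain listed earlier. GitProxy's loader may assemble the chain differently.

```typescript
// Return a new chain with plugins inserted at `position`, leaving the base chain untouched.
function insertPlugins<T>(chain: readonly T[], plugins: readonly T[], position: number): T[] {
  const result = [...chain];
  result.splice(position, 0, ...plugins);
  return result;
}

// Push plugins: position 1, directly after parsePush.
console.log(insertPlugins(['parsePush', 'checkEmptyBranch'], ['myPushPlugin'], 1));
// [ 'parsePush', 'myPushPlugin', 'checkEmptyBranch' ]

// Pull plugins: position 0, ahead of every built-in processor.
console.log(insertPlugins(['pullCheck'], ['myPullPlugin'], 0));
// [ 'myPullPlugin', 'pullCheck' ]
```

Placing push plugins after `parsePush` means they run on an action that already carries the parsed commit data.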

### Plugin Lifecycle

1. **Loading**: Discovered from configuration
2. **Initialization**: Constructor called with config
3. **Execution**: `exec()` called for each request
4. **Cleanup**: Resources cleaned up on shutdown

## Error Handling

### Protocol-Specific Error Responses

#### HTTP/HTTPS

```typescript
// Reply with a receive-pack result (status 200) so the Git client reads the body and shows the message.
// `handleMessage` (not shown) is expected to encode `errorMessage` in Git's pkt-line wire format.
res.set('content-type', 'application/x-git-receive-pack-result');
res.status(200).send(handleMessage(errorMessage));
```
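
Git's smart HTTP protocol frames messages as pkt-lines. Each pkt-line starts with a 4-digit hexadecimal length that counts the 4 length bytes themselves, followed by the payload. A `0000` flush packet ends the sequence. When the client has negotiated a side-band capability, the first payload byte selects the channel, and Git prints band 2 messages to the user with a `remote:` prefix. The following is a minimal sketch of such an encoder. `encodeSidebandMessage` is a hypothetical name, and this is not necessarily how `handleMessage` is implemented.

```typescript
// Frame `message` as one side-band pkt-line followed by a flush packet ("0000").
// Band 2 = progress/informational messages; band 3 = fatal error.
function encodeSidebandMessage(message: string, band: 2 | 3 = 2): string {
  const payload = `${message}\n`;
  // Length counts the 4-byte hex prefix, the 1-byte band number, and the UTF-8 payload.
  const length = 4 + 1 + new TextEncoder().encode(payload).length;
  if (length > 65520) {
    // 65520 bytes is the largest pkt-line allowed with side-band-64k.
    throw new Error('message too long for a single pkt-line');
  }
  const prefix = length.toString(16).padStart(4, '0');
  return `${prefix}${String.fromCharCode(band)}${payload}0000`;
}

// "0008" = 4 (prefix) + 1 (band byte) + 3 ("hi\n").
console.log(JSON.stringify(encodeSidebandMessage('hi'))); // "0008\u0002hi\n0000"
```

The length is computed from UTF-8 bytes rather than JavaScript string length, so non-ASCII messages are framed correctly.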

#### SSH

```typescript
// `stream` is the ssh2 channel for the git-receive-pack session.
stream.stderr.write(`Error: ${errorMessage}\n`); // relayed to the user's terminal
stream.exit(1); // report a non-zero exit status to the client
stream.end();
```

### Error Categories

- **Validation Errors**: Invalid requests or data
- **Authorization Errors**: Access denied or insufficient permissions
- **Security Errors**: Policy violations or security issues
- **System Errors**: Internal errors or resource exhaustion

## Performance Characteristics

### Memory Management

#### HTTP/HTTPS

- **Streaming**: Native Express streaming
- **Memory**: PassThrough streams minimize extra buffering, although `getRawBody` still holds the captured pack in memory for scanning
- **Size Limit**: 1GB (configurable)

#### SSH

- **Streaming**: Custom buffer management
- **Memory**: In-memory buffering up to 500MB
- **Size Limit**: 500MB (configurable)
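
The following is a minimal sketch of size-limited in-memory buffering for SSH pack data. `BoundedChunkBuffer` and `MAX_SSH_PACK_BYTES` are hypothetical names. Reading "500MB" as 500 × 1024 × 1024 bytes is an assumption. Chunks are typed as `Uint8Array`, which Node's `Buffer` extends, so ssh2 data chunks can be passed in directly.

```typescript
// Collects pack data chunks in memory and refuses any chunk that would push the
// total past `maxBytes`, so an oversized push can be rejected early.
class BoundedChunkBuffer {
  private readonly maxBytes: number;
  private readonly chunks: Uint8Array[] = [];
  private total = 0;

  constructor(maxBytes: number) {
    this.maxBytes = maxBytes;
  }

  // Returns false (and stores nothing) if the chunk would exceed the limit.
  push(chunk: Uint8Array): boolean {
    if (this.total + chunk.length > this.maxBytes) {
      return false;
    }
    this.chunks.push(chunk);
    this.total += chunk.length;
    return true;
  }

  get size(): number {
    return this.total;
  }

  // Concatenate the stored chunks into one contiguous array for the processor chain.
  toBytes(): Uint8Array {
    const out = new Uint8Array(this.total);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.length;
    }
    return out;
  }
}

// Hypothetical limit constant; the exact byte value of "500MB" is an assumption.
const MAX_SSH_PACK_BYTES = 500 * 1024 * 1024;
const packBuffer = new BoundedChunkBuffer(MAX_SSH_PACK_BYTES);
```

In a `data` handler, a `false` return from `push` is the natural point to send the SSH error response described under Error Handling (write to `stream.stderr`, `stream.exit(1)`, `stream.end()`). Rejecting at that point avoids buffering the rest of an oversized pack.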

### Performance Optimizations

#### Caching

- **Repository Clones**: Temporary local clones, removed later in the chain by `clearBareClone`
- **Configuration**: Cached configuration values
- **Authentication**: Cached user sessions

#### Concurrency

- **HTTP/HTTPS**: Express handles multiple requests concurrently
- **SSH**: One command per SSH session
- **Processing**: Async processor chain execution

## Monitoring and Observability

### Logging

- **Structured Logging**: JSON-formatted logs
- **Log Levels**: Debug, Info, Warn, Error
- **Context**: Request ID, user, repository tracking

### Metrics

- **Request Counts**: Total requests by protocol
- **Processing Time**: Chain execution duration
- **Error Rates**: Failed requests by category
- **Resource Usage**: Memory and CPU utilization

### Audit Trail

- **User Actions**: All user operations logged
- **Security Events**: Policy violations and approvals
- **System Events**: Configuration changes and errors

## Deployment Architecture

### Development

```
Developer → GitProxy (NeDB) → GitHub
```

### Production

```
Developer → Load Balancer → GitProxy (MongoDB) → GitHub
```

### High Availability

```
Developer → Load Balancer → Multiple GitProxy Instances → GitHub
```

## Security Considerations

### Data Protection

- **Encryption**: SSH keys encrypted at rest
- **Transit**: HTTPS/TLS or SSH encryption for all communications
- **Secrets**: No secrets in logs or configuration

### Access Control

- **Authentication**: Multiple provider support
- **Authorization**: Granular permission system
- **Audit**: Complete operation logging

### Compliance

- **Regulatory**: Financial services compliance
- **Standards**: Industry security standards
- **Reporting**: Detailed compliance reports

## Future Enhancements

### Planned Features

- **Rate Limiting**: Per-user and per-repository limits
- **Streaming to Disk**: For very large pack files
- **Performance Monitoring**: Real-time metrics
- **Advanced Caching**: Repository and diff caching

### Scalability

- **Horizontal Scaling**: Multiple instance support
- **Database Sharding**: Large-scale data distribution
- **CDN Integration**: Global content distribution

---

**Architecture Status**: ✅ **Production Ready**
**Scalability**: ✅ **Horizontal Scaling Supported**
**Security**: ✅ **Enterprise Grade**
**Maintainability**: ✅ **Well Documented**
