1+ ---
2+ description: Server architecture of the project.
3+ globs:
4+ alwaysApply: false
5+ ---
6+ ## 1. System Overview
7+
8+ The Political Social Media Analysis Platform uses a containerized microservices architecture designed for scalability, resilience, and maintainability. This document describes the overall system architecture, deployment strategy, and service interaction patterns.
9+
10+ ```
11+ ┌───────────────────────┐ ┌───────────────────────┐
12+ │ │ │ │
13+ │ Frontend (React/TS) │◄────┤ Backend (FastAPI) │
14+ │ │ │ │
15+ └───────────────────────┘ └───────────┬───────────┘
16+ │
17+ ▼
18+ ┌───────────────────────┐ ┌───────────────────────┐
19+ │ Database Layer │ │ Task Processing │
20+ │ │ │ │
21+ │ ┌─────────────────┐ │ │ ┌─────────────────┐ │
22+ │ │ PostgreSQL │ │ │ │ Celery Worker │ │
23+ │ │ (Relational) │ │ │ │ (Tasks) │ │
24+ │ └─────────────────┘ │ │ └─────────────────┘ │
25+ │ │ │ │
26+ │ ┌─────────────────┐ │ │ ┌─────────────────┐ │
27+ │ │ MongoDB │ │ │ │ Celery Beat │ │
28+ │ │ (Document) │ │ │ │ (Scheduling) │ │
29+ │ └─────────────────┘ │ │ └─────────────────┘ │
30+ │ │ │ │
31+ │ ┌─────────────────┐ │ │ ┌─────────────────┐ │
32+ │ │ Redis │ │ │ │ RabbitMQ │ │
33+ │ │ (Cache) │ │ │ │ (Message Queue)│ │
34+ │ └─────────────────┘ │ │ └─────────────────┘ │
35+ │ │ │ │
36+ │ ┌─────────────────┐ │ │ ┌─────────────────┐ │
37+ │ │ Pinecone │ │ │ │ Kafka │ │
38+ │ │ (Vector) │ │ │ │ (Streaming) │ │
39+ │ └─────────────────┘ │ │ └─────────────────┘ │
40+ │ │ │ │
41+ └───────────────────────┘ └───────────────────────┘
42+ ```
43+
44+ ## 2. Containerization Strategy
45+
46+ ### 2.1 Docker Compose Architecture
47+
48+ The system uses Docker Compose for container orchestration with a dual-file approach:
49+
50+ | File | Purpose | Usage |
51+ |------|---------|-------|
52+ | `docker-compose.yml` | Production-ready base configuration | Primary service definitions |
53+ | `docker-compose.override.yml` | Development environment customizations | Merged automatically by Docker Compose when no `-f` option is given |
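
The effect of layering the override file on the base file can be pictured as a recursive dictionary merge. The sketch below is a simplified illustration, not Docker Compose's actual algorithm: Compose applies field-specific rules (for example, multi-value options such as `ports` are combined rather than replaced). The service definition and its values are hypothetical.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `override` into a copy of `base`.

    Nested mappings are merged key by key; any other value in `override`
    replaces the value from `base`. This simplifies Compose's real,
    field-specific merge rules.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical excerpts standing in for the two Compose files.
base = {"services": {"backend": {"image": "backend:latest", "restart": "always"}}}
override = {"services": {"backend": {"restart": "no", "volumes": ["./backend:/app"]}}}

# Development view: base settings with the override layered on top.
effective = merge_config(base, override)
```

In this sketch the development view keeps the base `image`, switches `restart` to `"no"`, and gains the source-code volume mount, while the base dictionary itself is left untouched.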
54+
55+ ### 2.2 Service Organization
56+
57+ Services are organized into logical groups:
58+
59+ 1. **Frontend Services**
60+ - React frontend application
61+
62+ 2. **Backend Services**
63+ - FastAPI application
64+ - Prestart initialization service
65+
66+ 3. **Database Services**
67+ - PostgreSQL (relational data)
68+ - MongoDB (document data)
69+ - Redis (caching and real-time operations)
70+ - Pinecone (vector embeddings)
71+
72+ 4. **Message Processing**
73+ - RabbitMQ (message broker)
74+ - Celery Worker (task execution)
75+ - Celery Beat (task scheduling)
76+
77+ 5. **Stream Processing**
78+ - Kafka (event streaming)
79+ - Zookeeper (Kafka coordination)
80+
81+ 6. **Development Tools**
82+ - Adminer (PostgreSQL management)
83+ - MongoDB Express (MongoDB management)
84+ - Traefik Proxy (API gateway)
85+ - Mailcatcher (email testing)
86+ - Celery Flower (task monitoring)
87+
88+ ### 2.3 Development vs. Production
89+
90+ | Aspect | Development | Production |
91+ |--------|------------|------------|
92+ | Restart Policy | `restart: "no"` | `restart: always` |
93+ | Port Exposure | Ports exposed to host | Only necessary ports exposed |
94+ | Volume Mounts | Source code mounted | Built artifacts only |
95+ | Network Configuration | Local networks | External Traefik network |
96+ | Health Checks | Simple checks | Comprehensive checks with retries |
97+ | Environment | Development settings | Production settings |
98+ | Logging | Verbose logging | Production logging levels |
99+
100+ ## 3. Network Architecture
101+
102+ ### 3.1 Network Configuration
103+
104+ ```
105+ ┌─────────────────────────────────────────────────────────────┐
106+ │ traefik-public │
107+ │ │
108+ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
109+ │ │ Traefik │ │ Frontend│ │ Backend │ │ Adminer │ │
110+ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
111+ │ │
112+ └─────────────────────────────────────────────────────────────┘
113+
114+ ┌─────────────────────────────────────────────────────────────┐
115+ │ default │
116+ │ │
117+ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
118+ │ │PostgreSQL│ │ MongoDB │ │ Redis │ │RabbitMQ │ │
119+ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
120+ │ │
121+ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
122+ │ │ Kafka │ │Zookeeper│ │ Celery │ │ Celery │ │
123+ │ │ │ │ │ │ Worker │ │ Beat │ │
124+ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
125+ │ │
126+ └─────────────────────────────────────────────────────────────┘
127+ ```
128+
129+ ### 3.2 Traefik Integration
130+
131+ - **Production**: Joins the external `traefik-public` network; Traefik terminates TLS
132+ - **Development**: Includes a local Traefik instance with the insecure dashboard enabled
133+ - Requests are routed by hostname: `{service}.{domain}` → matching container
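
The hostname pattern above can be sketched as a small helper that builds the Traefik labels for one service. This assumes Traefik v2-style Docker labels; the service name `api` and the domain are illustrative values, not taken from the project's Compose files.

```python
from typing import Dict


def service_host(service: str, domain: str) -> str:
    """Build the public hostname for a service: {service}.{domain}."""
    return f"{service}.{domain}"


def router_labels(service: str, domain: str, port: int) -> Dict[str, str]:
    """Return Docker labels that ask Traefik to route a hostname to a container port."""
    host = service_host(service, domain)
    return {
        "traefik.enable": "true",
        # Router rule: match requests whose Host header equals the hostname.
        f"traefik.http.routers.{service}.rule": f"Host(`{host}`)",
        # Port inside the container that Traefik forwards traffic to.
        f"traefik.http.services.{service}.loadbalancer.server.port": str(port),
    }


# Hypothetical example: expose a backend called "api" on port 8000.
labels = router_labels("api", "example.com", 8000)
```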
134+
135+ ### 3.3 HTTPS Configuration
136+
137+ - Automatic TLS certificate issuance via Let's Encrypt
138+ - HTTP to HTTPS redirection enforced
139+ - Custom middleware for security headers
140+
141+ ## 4. Data Architecture
142+
143+ ### 4.1 Hybrid Database Strategy
144+
145+ The system employs a polyglot persistence approach using specialized databases:
146+
147+ | Database | Purpose | Data Types |
148+ |----------|---------|------------|
149+ | PostgreSQL | Relational data, user accounts, structured entities | Users, political entities, relationships, configuration |
150+ | MongoDB | Document storage, social media content | Posts, comments, media items, engagement metrics |
151+ | Redis | Caching, real-time operations, task management | Session data, counters, leaderboards, task results |
152+ | Pinecone | Vector embeddings for semantic search | Text embeddings, similarity models |
153+
154+ ### 4.2 Data Flow Patterns
155+
156+ ```
157+ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
158+ │ │ │ │ │ │
159+ │ API Request │────►│ FastAPI │────►│ PostgreSQL │
160+ │ │ │ Handler │ │ │
161+ └─────────────┘ └──────┬──────┘ └─────────────┘
162+ │
163+ ▼
164+ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
165+ │ │ │ │ │ │
166+ │ Celery Task │◄────┤ Task Queue │◄────┤ RabbitMQ │
167+ │ │ │ │ │ │
168+ └──────┬──────┘ └─────────────┘ └─────────────┘
169+ │
170+ ▼
171+ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
172+ │ │ │ │ │ │
173+ │ MongoDB │ │ Redis Cache │ │ Pinecone │
174+ │ Storage │ │ │ │ Vectors │
175+ │ │ │ │ │ │
176+ └─────────────┘ └─────────────┘ └─────────────┘
177+ ```
178+
179+ ### 4.3 Data Persistence
180+
181+ - Volume mapping for all databases to ensure data persistence
182+ - Standardized volume naming: `{service-name}_data`
183+ - Consistent backup solutions for each database type
184+
185+ ## 5. Task Processing Architecture
186+
187+ ### 5.1 Celery Integration
188+
189+ ```
190+ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
191+ │ │ │ │ │ │
192+ │ FastAPI │────►│ RabbitMQ │────►│ Celery │
193+ │ Backend │ │ Broker │ │ Worker │
194+ │ │ │ │ │ │
195+ └─────────────┘ └─────────────┘ └─────────────┘
196+ ▲ │
197+ │ │
198+ │ ┌─────────────┐ │
199+ │ │ │ │
200+ └───────────┤ Redis │◄────────────┘
201+ │ Results │
202+ │ │
203+ └─────────────┘
204+
205+ ┌─────────────┐ ┌─────────────┐
206+ │ │ │ │
207+ │ Celery Beat │────►│ Task │
208+ │ Scheduler │ │ Schedule │
209+ │ │ │ │
210+ └─────────────┘ └─────────────┘
211+ ```
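
The request/worker/result loop in the diagram can be illustrated with in-memory stand-ins. This is a sketch only: a `queue.Queue` plays the role of the RabbitMQ broker, a dictionary plays the role of the Redis result backend, and `submit`, `worker`, and `word_count` are hypothetical names. A real deployment would use Celery's own task decorator and result API instead.

```python
import queue
import threading
import uuid
from typing import Any, Callable, Dict, Tuple

broker: queue.Queue = queue.Queue()      # stand-in for the RabbitMQ broker
results: Dict[str, Tuple[str, Any]] = {}  # stand-in for the Redis result backend


def submit(func: Callable[..., Any], *args: Any) -> str:
    """Backend side: enqueue a task and return its ID immediately."""
    task_id = str(uuid.uuid4())
    broker.put((task_id, func, args))
    return task_id


def worker() -> None:
    """Worker side: consume tasks and store each outcome under its task ID."""
    while True:
        message = broker.get()
        if message is None:  # sentinel value: shut the worker down
            break
        task_id, func, args = message
        try:
            results[task_id] = ("SUCCESS", func(*args))
        except Exception as exc:  # record failures instead of crashing the worker
            results[task_id] = ("FAILURE", repr(exc))


def word_count(text: str) -> int:
    """Hypothetical analysis task."""
    return len(text.split())


thread = threading.Thread(target=worker, daemon=True)
thread.start()
task_id = submit(word_count, "votes counted in three districts")
broker.put(None)  # ask the worker to stop after the queued task
thread.join()

# The backend later looks the result up by task ID.
status, value = results[task_id]
```

The point of the split is that the backend returns as soon as the task is queued and only later reads the outcome from the result store.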
212+
213+ ### 5.2 Task Types
214+
215+ - **Data Collection Tasks**: Social media scraping, data acquisition
216+ - **Analysis Tasks**: Content analysis, sentiment scoring, entity extraction
217+ - **Reporting Tasks**: Report generation, alert/notification creation
218+ - **Maintenance Tasks**: Database cleanup, analytics generation
219+
220+ ### 5.3 Kafka Stream Processing
221+
222+ - Event-driven architecture for real-time data streams
223+ - Topic-based segregation of event types
224+ - Consumer groups for scalable processing
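
Consumer groups scale processing by dividing a topic's partitions among the group's members. The sketch below shows a simple round-robin division; it is an illustration under that assumption, not Kafka's implementation, whose built-in assignors and rebalance protocol are more involved. The consumer names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Distribute partitions across the consumers of one group, round-robin."""
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    members = sorted(consumers)
    assignment: Dict[str, List[int]] = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return assignment


# Six partitions shared by two, then three, hypothetical workers.
two_workers = assign_partitions([0, 1, 2, 3, 4, 5], ["worker-a", "worker-b"])
three_workers = assign_partitions([0, 1, 2, 3, 4, 5], ["worker-a", "worker-b", "worker-c"])
```

Each partition is owned by exactly one member of the group, so adding a worker reduces the load per worker until the number of workers exceeds the number of partitions; beyond that point extra workers stay idle.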
225+
226+ ## 6. Security Architecture
227+
228+ ### 6.1 Authentication and Authorization
229+
230+ - JWT-based authentication with appropriate expiration
231+ - Role-based access control (RBAC)
232+ - OAuth2 password flow with secure password hashing
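
To make the token mechanics concrete, the following sketch signs and verifies an HS256 JWT with an expiry and a role claim, using only the standard library. It is an illustration, not the project's implementation: the claim name `role`, the function names, and the secret are assumptions, and production code would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used by JWT."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def create_token(subject: str, role: str, secret: bytes, ttl_seconds: int,
                 now: Optional[float] = None) -> str:
    """Create a signed token that expires `ttl_seconds` after issuance."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "role": role, "iat": issued_at, "exp": issued_at + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Check signature, algorithm, and expiry; return the payload or raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if current >= payload["exp"]:
        raise ValueError("token expired")
    return payload


# Hypothetical usage: a 60-second token for an analyst, checked 30 seconds later.
token = create_token("alice", "analyst", b"demo-secret", ttl_seconds=60, now=1000)
claims = verify_token(token, b"demo-secret", now=1030)
```

A role-based check then reduces to comparing `claims["role"]` against the roles allowed for an endpoint.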
233+
234+ ### 6.2 Network Security
235+
236+ - Traefik as edge gateway with TLS termination
237+ - Internal network isolation
238+ - Minimal port exposure
239+
240+ ### 6.3 Secret Management
241+
242+ - Environment variable-based secret injection
243+ - No hardcoded credentials
244+ - Support for container secrets in production
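
One way to support both injection styles is a helper that prefers a secret file and falls back to a plain environment variable. The sketch below assumes the `<NAME>_FILE` convention used by several official Docker images, where the variable holds the path of a mounted secret; the helper name `read_secret` is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def read_secret(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve a secret from `<NAME>_FILE` (container secret) or `<NAME>` (env var)."""
    env = os.environ if env is None else env
    file_path = env.get(f"{name}_FILE")
    if file_path:
        # Container secrets are mounted as files; strip the trailing newline.
        return Path(file_path).read_text(encoding="utf-8").strip()
    value = env.get(name)
    if value is None:
        raise KeyError(f"secret {name!r} is not configured")
    return value


# Development style: the value comes straight from the environment mapping.
dev_value = read_secret("SECRET_KEY", {"SECRET_KEY": "dev-only-value"})
```

Failing loudly when neither source is set keeps a missing secret from silently turning into an empty or default credential.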
245+
246+ ## 7. Deployment Strategy
247+
248+ ### 7.1 Development Workflow
249+
250+ ```
251+ Local Development → CI/CD Pipeline → Staging → Production
252+ ```
253+
254+ - **Local**: Docker Compose with override file
255+ - **CI/CD**: Automated testing and container building
256+ - **Staging**: Production-like environment for validation
257+ - **Production**: Optimized for performance and security
258+
259+ ### 7.2 Scaling Strategy
260+
261+ - Horizontal scaling of stateless services
262+ - Vertical scaling of database services
263+ - Load balancing through Traefik
264+
265+ ## 8. Monitoring and Observability
266+
267+ ### 8.1 Logging
268+
269+ - Structured logging format
270+ - Log aggregation across services
271+ - Sentry integration for error tracking
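
Structured logging usually means emitting one machine-parseable record per event so an aggregator can index fields across services. A minimal sketch with the standard `logging` module follows; the field names and the `service` attribute are assumptions, not the project's actual log schema.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Optional field supplied through `extra={"service": ...}`.
        service = getattr(record, "service", None)
        if service is not None:
            entry["service"] = service
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output through the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


# Hypothetical usage: capture one record in memory and parse it back.
buffer = io.StringIO()
log = build_logger("backend.tasks", buffer)
log.info("collected %d posts", 42, extra={"service": "celery-worker"})
record = json.loads(buffer.getvalue())
```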
272+
273+ ### 8.2 Metrics
274+
275+ - Health check endpoints for all services
276+ - Prometheus-compatible metrics endpoints
277+ - Celery Flower for task monitoring
278+
279+ ## 9. Resilience Features
280+
281+ ### 9.1 Health Checks
282+
283+ - Database connectivity checks
284+ - API endpoint checks
285+ - Appropriate retry policies
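
The checks above can be combined into a single report of the kind a health endpoint might return. The sketch below runs each dependency check a bounded number of times and summarizes the outcome; the check functions, status strings, and report shape are hypothetical.

```python
from typing import Callable, Dict


def run_health_checks(checks: Dict[str, Callable[[], bool]], retries: int = 3) -> dict:
    """Run each check up to `retries` times and report per-dependency status."""
    report: Dict[str, str] = {}
    for name, check in checks.items():
        healthy = False
        for _ in range(retries):
            try:
                if check():
                    healthy = True
                    break
            except Exception:
                # An exception counts as a failed attempt, not a crash.
                pass
        report[name] = "ok" if healthy else "unavailable"
    overall = "ok" if all(state == "ok" for state in report.values()) else "degraded"
    return {"status": overall, "checks": report}


# Hypothetical dependencies: a database that is ready on the second attempt
# and a cache that never responds.
attempts = {"count": 0}


def flaky_database() -> bool:
    attempts["count"] += 1
    if attempts["count"] < 2:
        raise ConnectionError("database not ready yet")
    return True


health = run_health_checks({"postgres": flaky_database, "redis": lambda: False})
```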
286+
287+ ### 9.2 Failover Strategy
288+
289+ - Restart policies for critical services
290+ - Connection retry logic
291+ - Graceful degradation when components are unavailable
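
Connection retry logic and graceful degradation can be combined in one wrapper: retry with exponential backoff, then fall back to a reduced result instead of failing the request. The sketch below is an illustration; the function name, delays, and the choice to retry only on `ConnectionError` are assumptions.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `operation` up to `attempts` times, doubling the wait between tries.

    If every attempt raises ConnectionError, return `fallback()` so the caller
    can degrade gracefully instead of propagating the failure.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1.0s, 2.0s, ...
    return fallback()


# Hypothetical usage: the cache is down, so an empty list is served instead.
# Delays are recorded rather than slept so the example runs instantly.
delays: List[float] = []


def fetch_cached_trends() -> List[str]:
    raise ConnectionError("cache unavailable")


trends = call_with_retry(fetch_cached_trends, fallback=lambda: [], sleep=delays.append)
```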
292+
293+ ## 10. Development Environment
294+
295+ ### 10.1 Local Setup
296+
297+ - Simple startup with `docker-compose up`
298+ - Hot-reloading for backend and frontend
299+ - Development admin interfaces for PostgreSQL (Adminer) and MongoDB (MongoDB Express)
300+
301+ ### 10.2 Testing
302+
303+ - Environment-specific testing configuration
304+ - Integration tests with in-memory databases
305+ - E2E testing with Playwright
306+
307+ ## 11. Future Considerations
308+
309+ ### 11.1 Kubernetes Migration Path
310+
311+ - Current Docker Compose structure designed for easy K8s migration
312+ - Service definitions align with Kubernetes patterns
313+ - Volume definitions compatible with persistent volume claims
314+
315+ ### 11.2 Service Mesh Integration
316+
317+ - Prepared for Istio or Linkerd integration
318+ - Service-to-service communication patterns established
319+ - Observability foundations in place
320+
321+ ## Appendix A: Environment Variables
322+
323+ | Variable | Purpose | Example |
324+ |----------|---------|---------|
325+ | `DOMAIN` | Base domain for all services | `example.com` |
326+ | `POSTGRES_*` | PostgreSQL configuration | `POSTGRES_USER=postgres` |
327+ | `MONGO_*` | MongoDB configuration | `MONGO_USER=mongo` |
328+ | `RABBITMQ_*` | RabbitMQ configuration | `RABBITMQ_USER=guest` |
329+ | `REDIS_*` | Redis configuration | `REDIS_PORT=6379` |
330+ | `SECRET_KEY` | Application encryption key | `supersecretkey` |
331+ | `SENTRY_DSN` | Sentry error tracking | `https://...` |
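
As an illustration of how these variables might be consumed, the sketch below loads a few of them into a typed settings object. Only `DOMAIN`, `POSTGRES_USER`, and `REDIS_PORT` appear in the table above; `POSTGRES_PASSWORD`, `POSTGRES_SERVER`, `POSTGRES_PORT`, `POSTGRES_DB`, and the defaults are assumed names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import quote


@dataclass(frozen=True)
class Settings:
    domain: str
    postgres_dsn: str
    redis_port: int


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build settings from an environment mapping such as `os.environ`."""
    user = env["POSTGRES_USER"]
    password = env["POSTGRES_PASSWORD"]             # assumed variable name
    host = env.get("POSTGRES_SERVER", "localhost")  # assumed variable name
    port = env.get("POSTGRES_PORT", "5432")         # assumed variable name
    database = env.get("POSTGRES_DB", "app")        # assumed variable name
    # Percent-encode credentials so characters like "@" cannot break the URL.
    dsn = (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )
    return Settings(
        domain=env["DOMAIN"],
        postgres_dsn=dsn,
        redis_port=int(env.get("REDIS_PORT", "6379")),
    )


# Hypothetical environment for demonstration.
settings = load_settings(
    {"DOMAIN": "example.com", "POSTGRES_USER": "postgres", "POSTGRES_PASSWORD": "p@ss"}
)
```

Required variables raise `KeyError` when missing, which surfaces configuration mistakes at startup rather than at first use.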
332+
333+ ## Appendix B: Network Ports
334+
335+ | Service | Port | Purpose |
336+ |---------|------|---------|
337+ | Traefik | 80, 443 | HTTP/HTTPS |
338+ | PostgreSQL | 5432 | Database access |
339+ | MongoDB | 27017 | Database access |
340+ | Redis | 6379 | Cache access |
341+ | RabbitMQ | 5672, 15672 | AMQP and management |
342+ | Kafka | 9092 | Stream processing |
343+ | FastAPI | 8000 | API access |
344+ | Frontend | 5173 | Web UI (development) |
345+
346+ ## Appendix C: Related Documentation
347+
348+ - `backend-technical-stack.mdc` - Backend technology details
349+ - `database-architecture.mdc` - Detailed database design
350+ - `data-processing-architecture.mdc` - Data processing pipeline details