Commit b38f784

committed
docker modifications and prestart modification.
1 parent 33caf2d commit b38f784

7 files changed: +687 -10 lines changed
Lines changed: 350 additions & 0 deletions
@@ -0,0 +1,350 @@
1+
---
2+
description: Server architecture of the project.
3+
globs:
4+
alwaysApply: false
5+
---
6+
## 1. System Overview
7+
8+
The Political Social Media Analysis Platform follows a containerized microservices architecture designed for scalability, resilience, and maintainability. This document outlines the overall system architecture, deployment strategy, and service interaction patterns.
9+
10+
```
11+
┌───────────────────────┐ ┌───────────────────────┐
12+
│ │ │ │
13+
│ Frontend (React/TS) │◄────┤ Backend (FastAPI) │
14+
│ │ │ │
15+
└───────────────────────┘ └───────────┬───────────┘
16+
17+
18+
┌───────────────────────┐ ┌───────────────────────┐
19+
│ Database Layer │ │ Task Processing │
20+
│ │ │ │
21+
│ ┌─────────────────┐ │ │ ┌─────────────────┐ │
22+
│ │ PostgreSQL │ │ │ │ Celery Worker │ │
23+
│ │ (Relational) │ │ │ │ (Tasks) │ │
24+
│ └─────────────────┘ │ │ └─────────────────┘ │
25+
│ │ │ │
26+
│ ┌─────────────────┐ │ │ ┌─────────────────┐ │
27+
│ │ MongoDB │ │ │ │ Celery Beat │ │
28+
│ │ (Document) │ │ │ │ (Scheduling) │ │
29+
│ └─────────────────┘ │ │ └─────────────────┘ │
30+
│ │ │ │
31+
│ ┌─────────────────┐ │ │ ┌─────────────────┐ │
32+
│ │ Redis │ │ │ │ RabbitMQ │ │
33+
│ │ (Cache) │ │ │ │ (Message Queue)│ │
34+
│ └─────────────────┘ │ │ └─────────────────┘ │
35+
│ │ │ │
36+
│ ┌─────────────────┐ │ │ ┌─────────────────┐ │
37+
│ │ Pinecone │ │ │ │ Kafka │ │
38+
│ │ (Vector) │ │ │ │ (Streaming) │ │
39+
│ └─────────────────┘ │ │ └─────────────────┘ │
40+
│ │ │ │
41+
└───────────────────────┘ └───────────────────────┘
42+
```
43+
44+
## 2. Containerization Strategy
45+
46+
### 2.1 Docker Compose Architecture
47+
48+
The system uses Docker Compose for container orchestration with a dual-file approach:
49+
50+
| File | Purpose | Usage |
51+
|------|---------|-------|
52+
| `docker-compose.yml` | Production-ready base configuration | Primary service definitions |
53+
| `docker-compose.override.yml` | Development environment customizations | Automatically merged during development |
54+
55+
### 2.2 Service Organization
56+
57+
Services are organized into logical groups:
58+
59+
1. **Frontend Services**
60+
- React frontend application
61+
62+
2. **Backend Services**
63+
- FastAPI application
64+
- Prestart initialization service
65+
66+
3. **Database Services**
67+
- PostgreSQL (relational data)
68+
- MongoDB (document data)
69+
- Redis (caching and real-time operations)
70+
- Pinecone (vector embeddings)
71+
72+
4. **Message Processing**
73+
- RabbitMQ (message broker)
74+
- Celery Worker (task execution)
75+
- Celery Beat (task scheduling)
76+
77+
5. **Stream Processing**
78+
- Kafka (event streaming)
79+
- Zookeeper (Kafka coordination)
80+
81+
6. **Development Tools**
82+
- Adminer (PostgreSQL management)
83+
- MongoDB Express (MongoDB management)
84+
- Traefik Proxy (API gateway)
85+
- Mailcatcher (email testing)
86+
- Celery Flower (task monitoring)
87+
88+
### 2.3 Development vs. Production
89+
90+
| Aspect | Development | Production |
91+
|--------|------------|------------|
92+
| Restart Policy | `restart: "no"` | `restart: always` |
93+
| Port Exposure | Ports exposed to host | Only necessary ports exposed |
94+
| Volume Mounts | Source code mounted | Built artifacts only |
95+
| Network Configuration | Local networks | External Traefik network |
96+
| Health Checks | Simple checks | Comprehensive checks with retries |
97+
| Environment | Development settings | Production settings |
98+
| Logging | Verbose logging | Production logging levels |
99+
100+
## 3. Network Architecture
101+
102+
### 3.1 Network Configuration
103+
104+
```
105+
┌─────────────────────────────────────────────────────────────┐
106+
│ traefik-public │
107+
│ │
108+
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
109+
│ │ Traefik │ │ Frontend│ │ Backend │ │ Adminer │ │
110+
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
111+
│ │
112+
└─────────────────────────────────────────────────────────────┘
113+
114+
┌─────────────────────────────────────────────────────────────┐
115+
│ default │
116+
│ │
117+
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
118+
│ │PostgreSQL│ │ MongoDB │ │ Redis │ │RabbitMQ │ │
119+
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
120+
│ │
121+
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
122+
│ │ Kafka │ │Zookeeper│ │ Celery │ │ Celery │ │
123+
│ │ │ │ │ │ Worker │ │ Beat │ │
124+
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
125+
│ │
126+
└─────────────────────────────────────────────────────────────┘
127+
```
128+
129+
### 3.2 Traefik Integration
130+
131+
- **Production**: Uses external Traefik network with proper TLS termination
132+
- **Development**: Includes local Traefik instance with insecure dashboard
133+
- Routing follows pattern: `{service}.{domain}` → appropriate container
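
The `{service}.{domain}` routing pattern can be illustrated with a short Python sketch. The helper names `service_host` and `traefik_host_rule` are hypothetical and not part of this commit; in a Compose stack the rule would normally be declared as a Traefik label on each container rather than generated in code.

```python
def service_host(service: str, domain: str) -> str:
    """Build the public hostname for a service under the {service}.{domain} pattern."""
    service = service.strip().lower()
    domain = domain.strip().lower().strip(".")
    if not service or not domain:
        raise ValueError("service and domain must be non-empty")
    return f"{service}.{domain}"


def traefik_host_rule(service: str, domain: str) -> str:
    """Render a Traefik v2-style Host rule string for that hostname."""
    return f"Host(`{service_host(service, domain)}`)"


if __name__ == "__main__":
    # DOMAIN would come from the environment (see Appendix A); hard-coded here.
    for name in ("api", "dashboard", "adminer"):  # illustrative service names
        print(traefik_host_rule(name, "example.com"))
```

Keeping the hostname logic in one place makes it easier to check that every routed service resolves to the same base domain.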
134+
135+
### 3.3 HTTPS Configuration
136+
137+
- Automatic TLS certificate issuance via Let's Encrypt
138+
- HTTP to HTTPS redirection enforced
139+
- Custom middleware for security headers
140+
141+
## 4. Data Architecture
142+
143+
### 4.1 Hybrid Database Strategy
144+
145+
The system employs a polyglot persistence approach using specialized databases:
146+
147+
| Database | Purpose | Data Types |
148+
|----------|---------|------------|
149+
| PostgreSQL | Relational data, user accounts, structured entities | Users, political entities, relationships, configuration |
150+
| MongoDB | Document storage, social media content | Posts, comments, media items, engagement metrics |
151+
| Redis | Caching, real-time operations, task management | Session data, counters, leaderboards, task queues |
152+
| Pinecone | Vector embeddings for semantic search | Text embeddings, similarity models |
153+
154+
### 4.2 Data Flow Patterns
155+
156+
```
157+
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
158+
│ │ │ │ │ │
159+
│ API Request │────►│ FastAPI │────►│ PostgreSQL │
160+
│ │ │ Handler │ │ │
161+
└─────────────┘ └──────┬──────┘ └─────────────┘
162+
163+
164+
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
165+
│ │ │ │ │ │
166+
│ Celery Task │◄────┤ Task Queue │◄────┤ RabbitMQ │
167+
│ │ │ │ │ │
168+
└──────┬──────┘ └─────────────┘ └─────────────┘
169+
170+
171+
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
172+
│ │ │ │ │ │
173+
│ MongoDB │ │ Redis Cache │ │ Pinecone │
174+
│ Storage │ │ │ │ Vectors │
175+
│ │ │ │ │ │
176+
└─────────────┘ └─────────────┘ └─────────────┘
177+
```
178+
179+
### 4.3 Data Persistence
180+
181+
- Volume mapping for all databases to ensure data persistence
182+
- Standardized volume naming: `{service-name}_data`
183+
- Consistent backup solutions for each database type
184+
185+
## 5. Task Processing Architecture
186+
187+
### 5.1 Celery Integration
188+
189+
```
190+
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
191+
│ │ │ │ │ │
192+
│ FastAPI │────►│ RabbitMQ │────►│ Celery │
193+
│ Backend │ │ Broker │ │ Worker │
194+
│ │ │ │ │ │
195+
└─────────────┘ └─────────────┘ └─────────────┘
196+
▲ │
197+
│ │
198+
│ ┌─────────────┐ │
199+
│ │ │ │
200+
└───────────┤ Redis │◄────────────┘
201+
│ Results │
202+
│ │
203+
└─────────────┘
204+
205+
┌─────────────┐ ┌─────────────┐
206+
│ │ │ │
207+
│ Celery Beat │────►│ Task │
208+
│ Scheduler │ │ Schedule │
209+
│ │ │ │
210+
└─────────────┘ └─────────────┘
211+
```
212+
213+
### 5.2 Task Types
214+
215+
- **Data Collection Tasks**: Social media scraping, data acquisition
216+
- **Analysis Tasks**: Content analysis, sentiment scoring, entity extraction
217+
- **Reporting Tasks**: Report generation, alert/notification creation
218+
- **Maintenance Tasks**: Database cleanup, analytics generation
219+
220+
### 5.3 Kafka Stream Processing
221+
222+
- Event-driven architecture for real-time data streams
223+
- Topic-based segregation of event types
224+
- Consumer groups for scalable processing
225+
226+
## 6. Security Architecture
227+
228+
### 6.1 Authentication and Authorization
229+
230+
- JWT-based authentication with appropriate expiration
231+
- Role-based access control (RBAC)
232+
- OAuth2 password flow with secure password hashing
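
To make "JWT-based authentication with appropriate expiration" concrete, here is a minimal standard-library sketch of issuing and verifying an HS256 token with an `exp` claim. The function names `create_token` and `verify_token` are hypothetical, and this is not the project's implementation; a real backend would more likely rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def create_token(subject: str, secret: str, ttl_seconds: int,
                 now: Optional[float] = None) -> str:
    """Issue a signed token that expires ttl_seconds after it is issued."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": issued, "exp": issued + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the payload if the signature is valid and the token is not expired."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

The verifier always recomputes the signature with HS256 and ignores the `alg` header, which sidesteps algorithm-confusion problems in this simplified form.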
233+
234+
### 6.2 Network Security
235+
236+
- Traefik as edge gateway with TLS termination
237+
- Internal network isolation
238+
- Minimal port exposure
239+
240+
### 6.3 Secret Management
241+
242+
- Environment variable-based secret injection
243+
- No hardcoded credentials
244+
- Support for container secrets in production
245+
246+
## 7. Deployment Strategy
247+
248+
### 7.1 Development Workflow
249+
250+
```
251+
Local Development → CI/CD Pipeline → Staging → Production
252+
```
253+
254+
- **Local**: Docker Compose with override file
255+
- **CI/CD**: Automated testing and container building
256+
- **Staging**: Production-like environment for validation
257+
- **Production**: Optimized for performance and security
258+
259+
### 7.2 Scaling Strategy
260+
261+
- Horizontal scaling of stateless services
262+
- Vertical scaling of database services
263+
- Load balancing through Traefik
264+
265+
## 8. Monitoring and Observability
266+
267+
### 8.1 Logging
268+
269+
- Structured logging format
270+
- Log aggregation across services
271+
- Sentry integration for error tracking
272+
273+
### 8.2 Metrics
274+
275+
- Health check endpoints for all services
276+
- Prometheus-compatible metrics endpoints
277+
- Celery Flower for task monitoring
278+
279+
## 9. Resilience Features
280+
281+
### 9.1 Health Checks
282+
283+
- Database connectivity checks
284+
- API endpoint checks
285+
- Appropriate retry policies
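
A connectivity check of the kind listed above can be as simple as attempting a TCP connection. The sketch below is an assumption-laden illustration: `tcp_check`, `check_services`, and the hostnames in the usage example are hypothetical, while the port numbers are taken from Appendix B. A successful connect only shows that the port is open, not that the service is ready to serve queries.

```python
import socket
from typing import Dict, Tuple


def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_services(targets: Dict[str, Tuple[str, int]],
                   timeout: float = 1.0) -> Dict[str, bool]:
    """Check every named (host, port) target and report which ones are reachable."""
    return {name: tcp_check(host, port, timeout)
            for name, (host, port) in targets.items()}


if __name__ == "__main__":
    # Hostnames are assumed Compose service names; ports follow Appendix B.
    print(check_services({
        "postgres": ("db", 5432),
        "mongodb": ("mongodb", 27017),
        "redis": ("redis", 6379),
        "rabbitmq": ("rabbitmq", 5672),
    }))
```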
286+
287+
### 9.2 Failover Strategy
288+
289+
- Restart policies for critical services
290+
- Connection retry logic
291+
- Graceful degradation when components are unavailable
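
Connection retry logic and graceful degradation can be combined in one small helper: retry with exponential backoff, then return a fallback value instead of failing the request. This is a sketch under assumptions; `call_with_retry` and its parameters are hypothetical names, not code from this commit.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(func: Callable[[], T], fallback: T, attempts: int = 3,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying on ConnectionError with exponential backoff.

    If every attempt fails, return the fallback so the caller can degrade
    gracefully (for example, serve uncached data when the cache is down).
    """
    for attempt in range(attempts):
        try:
            return func()
        except ConnectionError:
            if attempt < attempts - 1:
                # Delays grow as base_delay, 2*base_delay, 4*base_delay, ...
                sleep(base_delay * 2 ** attempt)
    return fallback
```

The `sleep` parameter is injectable so the backoff schedule can be tested without real waiting.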
292+
293+
## 10. Development Environment
294+
295+
### 10.1 Local Setup
296+
297+
- Simple startup with `docker-compose up`
298+
- Hot-reloading for backend and frontend
299+
- Development admin interfaces for all databases
300+
301+
### 10.2 Testing
302+
303+
- Environment-specific testing configuration
304+
- Integration tests with in-memory databases
305+
- E2E testing with Playwright
306+
307+
## 11. Future Considerations
308+
309+
### 11.1 Kubernetes Migration Path
310+
311+
- Current Docker Compose structure designed for easy K8s migration
312+
- Service definitions align with Kubernetes patterns
313+
- Volume definitions compatible with persistent volume claims
314+
315+
### 11.2 Service Mesh Integration
316+
317+
- Prepared for Istio or Linkerd integration
318+
- Service-to-service communication patterns established
319+
- Observability foundations in place
320+
321+
## Appendix A: Environment Variables
322+
323+
| Variable | Purpose | Example |
324+
|----------|---------|---------|
325+
| `DOMAIN` | Base domain for all services | `example.com` |
326+
| `POSTGRES_*` | PostgreSQL configuration | `POSTGRES_USER=postgres` |
327+
| `MONGO_*` | MongoDB configuration | `MONGO_USER=mongo` |
328+
| `RABBITMQ_*` | RabbitMQ configuration | `RABBITMQ_USER=guest` |
329+
| `REDIS_*` | Redis configuration | `REDIS_PORT=6379` |
330+
| `SECRET_KEY` | Application encryption key | `supersecretkey` |
331+
| `SENTRY_DSN` | Sentry error tracking | `https://...` |
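
As an illustration of how these variables might be consumed, the following sketch assembles connection URLs from the environment. The function name `build_service_urls` and the default hostnames (`mongodb`, `rabbitmq`, `redis`) are assumptions standing in for the Compose service names; the variable names and ports come from this appendix, Appendix B, and the `.env` changes in this commit.

```python
import os
from typing import Dict, Mapping, Optional
from urllib.parse import quote


def build_service_urls(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Assemble connection URLs from environment-style settings.

    Credentials are required (a missing one raises KeyError); host names fall
    back to assumed Compose service names when *_SERVER is not set.
    """
    env = os.environ if env is None else env

    def cred(name: str) -> str:
        # Percent-encode so characters such as '@' or ':' cannot break the URL.
        return quote(env[name], safe="")

    mongo_host = env.get("MONGO_SERVER", "mongodb")
    rabbit_host = env.get("RABBITMQ_SERVER", "rabbitmq")
    redis_host = env.get("REDIS_SERVER", "redis")
    redis_port = env.get("REDIS_PORT", "6379")

    return {
        "mongo": (
            f"mongodb://{cred('MONGO_USER')}:{cred('MONGO_PASSWORD')}"
            f"@{mongo_host}:27017/{env['MONGO_DB']}"
        ),
        # The trailing "//" selects RabbitMQ's default "/" virtual host.
        "rabbitmq": (
            f"amqp://{cred('RABBITMQ_USER')}:{cred('RABBITMQ_PASSWORD')}"
            f"@{rabbit_host}:5672//"
        ),
        "redis": f"redis://{redis_host}:{redis_port}/0",
    }
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise in tests and outside Docker.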
332+
333+
## Appendix B: Network Ports
334+
335+
| Service | Port | Purpose |
336+
|---------|------|---------|
337+
| Traefik | 80, 443 | HTTP/HTTPS |
338+
| PostgreSQL | 5432 | Database access |
339+
| MongoDB | 27017 | Database access |
340+
| Redis | 6379 | Cache access |
341+
| RabbitMQ | 5672, 15672 | AMQP and management |
342+
| Kafka | 9092 | Stream processing |
343+
| FastAPI | 8000 | API access |
344+
| Frontend | 5173 | Web UI (development) |
345+
346+
## Appendix C: Related Documentation
347+
348+
- `backend-technical-stack.mdc` - Backend technology details
349+
- `database-architecture.mdc` - Detailed database design
350+
- `data-processing-architecture.mdc` - Data processing pipeline details

.env

Lines changed: 18 additions & 2 deletions
@@ -13,8 +13,8 @@ FRONTEND_HOST=http://localhost:5173
 # Environment: local, staging, production
 ENVIRONMENT=local
 
-PROJECT_NAME="Full Stack FastAPI Project"
-STACK_NAME=full-stack-fastapi-project
+PROJECT_NAME="Political Social Media Analysis Platform"
+STACK_NAME=political-analysis-local
 
 # Backend
 BACKEND_CORS_ORIGINS="http://localhost,http://localhost:5173,https://localhost,https://localhost:5173,http://localhost.tiangolo.com"
@@ -38,6 +38,22 @@ POSTGRES_DB=app
 POSTGRES_USER=postgres
 POSTGRES_PASSWORD=changethis
 
+# MongoDB
+MONGO_USER=mongouser
+MONGO_PASSWORD=mongopassword
+MONGO_DB=socialmediadb
+
+# RabbitMQ
+RABBITMQ_USER=rabbitmquser
+RABBITMQ_PASSWORD=rabbitmqpassword
+
+# Add for local development when not using Docker
+# These will be automatically set in Docker based on service names
+# MONGO_SERVER=localhost
+# REDIS_SERVER=localhost
+# RABBITMQ_SERVER=localhost
+# KAFKA_BOOTSTRAP_SERVERS=localhost:9092
+
 SENTRY_DSN=
 
 # Configure these with your own Docker registry images

backend/app/alembic/env.py

Lines changed: 3 additions & 1 deletion
@@ -18,7 +18,9 @@
 # target_metadata = mymodel.Base.metadata
 # target_metadata = None
 
-from app.models import SQLModel # noqa
+# Updated import for new architecture
+from sqlmodel import SQLModel # noqa
+from app.db.models import user, item # Import all models to register them
 from app.core.config import settings # noqa
 
 target_metadata = SQLModel.metadata
