1+ ---
2+ description: Technical Stack Specification for the /backend.
3+ globs: backend/*
4+ alwaysApply: false
5+ ---
6+ ## 1. Technology Stack Overview
7+
8+ | Component | Technology | Version | Purpose |
9+ |-----------|------------|---------|---------|
10+ | Framework | FastAPI | 0.114.2+ | Web API framework |
11+ | ORM | SQLModel | 0.0.21+ | Database ORM |
12+ | Primary Database | PostgreSQL | 13+ | Relational database |
13+ | Document Database | MongoDB | 6.0+ | Social media content storage |
14+ | In-memory Database | Redis | 7.0+ | Caching and real-time operations |
15+ | Vector Database | Pinecone | Latest | Semantic content analysis |
16+ | Authentication | JWT (PyJWT) | 2.8.0+ | User authentication |
17+ | Password Hashing | Passlib + Bcrypt | 1.7.4+ | Secure password storage |
18+ | Dependency Management | uv | 0.5.11+ | Package management |
19+ | Migrations | Alembic | 1.12.1+ | Database schema migrations |
20+ | API Documentation | OpenAPI/Swagger | Built-in | API documentation |
21+ | Error Tracking | Sentry (sentry-sdk) | 1.40.6+ | Error reporting |
22+ | Email Delivery | emails | 0.6+ | Email notifications |
23+ | Testing | pytest | 7.4.3+ | Unit and integration testing |
24+ | Linting | ruff | 0.2.2+ | Code quality |
25+ | Type Checking | mypy | 1.8.0+ | Static type checking |
26+ | Task Queue | Celery | 5.3.0+ | Asynchronous task processing |
27+ | Message Broker | RabbitMQ | 3.12+ | Task distribution |
28+ | Stream Processing | Apache Kafka | 3.4+ | Real-time data streaming |
29+ | NLP Processing | spaCy + Transformers | 3.6+ / 4.28+ | Content analysis |
30+
31+ ## 2. Architecture
32+
33+ ### 2.1 Architectural Pattern
34+
35+ The application follows a clean architecture pattern with clear separation of concerns:
36+
37+ ```
38+ Client Request → API Layer → Service Layer → Repository Layer → Database
39+                      ↑
40+                 Schema Layer
41+ ```
42+
43+ ### 2.2 Key Components
44+
45+ - **API Layer**: Handles HTTP requests/responses, input validation, and routing
46+ - **Service Layer**: Contains business logic and orchestrates repositories
47+ - **Repository Layer**: Provides data access patterns and database operations
48+ - **Schema Layer**: Defines data contracts for API input/output
49+ - **Model Layer**: Defines database models and relationships
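
The layering above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: it uses only the standard library, an in-memory repository stands in for a real database, and every name (`EntityCreate`, `InMemoryEntityRepository`, `EntityService`, `create_entity_endpoint`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID, uuid4


# Schema layer: the data contract accepted by the API (hypothetical fields).
@dataclass(frozen=True)
class EntityCreate:
    name: str


# Model layer: the persisted shape of an entity.
@dataclass
class PoliticalEntity:
    id: UUID
    name: str


# Repository layer: data access only, no business rules.
class InMemoryEntityRepository:
    def __init__(self) -> None:
        self._rows: dict[UUID, PoliticalEntity] = {}

    def add(self, entity: PoliticalEntity) -> PoliticalEntity:
        self._rows[entity.id] = entity
        return entity

    def get_by_name(self, name: str) -> Optional[PoliticalEntity]:
        return next((e for e in self._rows.values() if e.name == name), None)


# Service layer: business logic that orchestrates the repository.
class EntityService:
    def __init__(self, repo: InMemoryEntityRepository) -> None:
        self._repo = repo

    def create(self, payload: EntityCreate) -> PoliticalEntity:
        name = payload.name.strip()
        if not name:
            raise ValueError("name must not be empty")
        if self._repo.get_by_name(name) is not None:
            raise ValueError(f"entity {name!r} already exists")
        return self._repo.add(PoliticalEntity(id=uuid4(), name=name))


# API layer: translates a request body into a service call and a response.
def create_entity_endpoint(body: dict, service: EntityService) -> dict:
    entity = service.create(EntityCreate(name=body["name"]))
    return {"id": str(entity.id), "name": entity.name}
```

Because the service depends only on the repository's methods, a database-backed repository could replace the in-memory one without touching the service or API code.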
50+
51+ ### 2.3 Directory Structure
52+
53+ ```
54+ /app
55+ ├── api/ # API endpoints and routing
56+ │ ├── api_v1/ # API version 1
57+ │ │ ├── endpoints/ # Resource endpoints
58+ │ │ └── api.py # Router configuration
59+ │ ├── deps.py # Dependency injection
60+ │ └── main.py # Main router
61+ ├── core/ # Core components
62+ │ ├── config.py # Environment configuration
63+ │ ├── security.py # Authentication
64+ │ └── errors.py # Error handling
65+ ├── db/ # Database configuration
66+ │ ├── models/ # SQLModel definitions
67+ │ └── session.py # Database session management
68+ ├── schemas/ # Pydantic models for API
69+ ├── services/ # Business logic
70+ │ └── repositories/ # Data access layer
71+ ├── tasks/ # Celery tasks for background processing
72+ │ ├── scraping/ # Social media scraping tasks
73+ │ ├── analysis/ # Content analysis tasks
74+ │ └── notifications/ # Alert and notification tasks
75+ ├── processing/ # Data processing components
76+ │ ├── models/ # ML model wrappers
77+ │ ├── streams/ # Kafka stream processors
78+ │ └── embeddings/ # Vector embedding generators
79+ ├── worker.py # Celery worker configuration
80+ └── main.py # Application entry point
81+ ```
82+
83+ ## 3. Database Design
84+
85+ ### 3.1 Hybrid Database Technology
86+
87+ The application employs a hybrid database architecture to address the diverse data requirements of political social media analysis:
88+
89+ | Component | Technology | Version | Purpose |
90+ |-----------|------------|---------|---------|
91+ | Relational Database | PostgreSQL | 13+ | Entity data and relationships |
92+ | Document Database | MongoDB | 6.0+ | Social media content and engagement |
93+ | In-memory Database | Redis | 7.0+ | Caching and real-time operations |
94+ | Vector Database | Pinecone | Latest | Semantic similarity analysis |
95+
96+ Refer to `database-architecture.mdc` for detailed implementation specifications.
97+
98+ ### 3.2 Primary Domain Models
99+
100+ - **User**: Authentication and user profile data
101+ - **PoliticalEntity**: Political figures, organizations, and campaigns
102+ - **SocialMediaAccount**: Platform account connections
103+ - **EntityRelationship**: Relationships between political entities
104+
105+ ### 3.3 Key Design Decisions
106+
107+ - **UUID Primary Keys**: All entities use UUID primary keys for security and distributed system compatibility
108+ - **Cross-database References**: Consistent reference patterns between databases
109+ - **Relationship Management**: Foreign key constraints with cascade delete, so dependent rows are removed together with their parent
110+ - **String Field Constraints**: Appropriate length limits on all VARCHAR fields
111+ - **Migration Strategy**: Alembic for version-controlled schema changes
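
Several of these decisions can be demonstrated with a small, self-contained sketch. It deliberately swaps in the standard library's `sqlite3` module for PostgreSQL and SQLModel so that it runs anywhere; the table names, column names, and length limits are assumptions chosen for illustration only.

```python
import sqlite3
from uuid import uuid4


def create_schema(conn: sqlite3.Connection) -> None:
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE political_entity (
            id   TEXT PRIMARY KEY,                          -- UUID stored as text
            name TEXT NOT NULL CHECK (length(name) <= 255)  -- length limit
        );
        CREATE TABLE social_media_account (
            id        TEXT PRIMARY KEY,
            entity_id TEXT NOT NULL
                      REFERENCES political_entity(id) ON DELETE CASCADE,
            handle    TEXT NOT NULL CHECK (length(handle) <= 100)
        );
        """
    )


def add_entity(conn: sqlite3.Connection, name: str) -> str:
    entity_id = str(uuid4())  # UUID primary key generated by the application
    conn.execute(
        "INSERT INTO political_entity (id, name) VALUES (?, ?)", (entity_id, name)
    )
    return entity_id


def add_account(conn: sqlite3.Connection, entity_id: str, handle: str) -> str:
    account_id = str(uuid4())
    conn.execute(
        "INSERT INTO social_media_account (id, entity_id, handle) VALUES (?, ?, ?)",
        (account_id, entity_id, handle),
    )
    return account_id


conn = sqlite3.connect(":memory:")
create_schema(conn)
entity_id = add_entity(conn, "Example Entity")
add_account(conn, entity_id, "@example")

# Deleting the parent row cascades to its accounts.
conn.execute("DELETE FROM political_entity WHERE id = ?", (entity_id,))
remaining_accounts = conn.execute(
    "SELECT COUNT(*) FROM social_media_account"
).fetchone()[0]
```

In the real stack the same constraints would normally be declared on the SQLModel models and rolled out through Alembic migrations rather than raw DDL.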
112+
113+ ### 3.4 Additional Dependencies
114+
115+ | Dependency | Version | Purpose |
116+ |------------|---------|---------|
117+ | motor | 3.2.0+ | Async MongoDB driver |
118+ | redis | 4.6.0+ | Redis client |
119+ | pinecone-client | 2.2.1+ | Pinecone Vector DB client |
120+ | pymongo | 4.5.0+ | MongoDB client |
121+
122+ Refer to `data-processing-architecture.mdc` for details on processing pipelines and analysis components.
123+
124+ ## 4. API Design
125+
126+ ### 4.1 API Structure
127+
128+ - RESTful API design principles
129+ - Resource-based URL structure
130+ - Version prefixing (`/api/v1/...`)
131+ - OpenAPI documentation
132+
133+ ### 4.2 Response Format
134+
135+ All API responses follow a standardized format:
136+
137+ ```json
138+ {
139+ "data": { ... }, // Actual response payload
140+ "success": true, // Success indicator
141+ "message": "Description" // Human-readable message
142+ }
143+ ```
144+
145+ Error responses:
146+
147+ ```json
148+ {
149+ "success": false,
150+ "error": "Error message",
151+ "details": { ... } // Additional error details
152+ }
153+ ```
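
One way to keep both envelopes consistent is to build them through two small helpers. The following is a sketch under that assumption; the function names `success_response` and `error_response` and the default message are hypothetical, and only the field names come from the formats above.

```python
from typing import Any, Optional


def success_response(data: Any, message: str = "OK") -> dict[str, Any]:
    """Wrap a payload in the standard success envelope."""
    return {"data": data, "success": True, "message": message}


def error_response(
    error: str, details: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    """Wrap an error message in the standard error envelope."""
    return {"success": False, "error": error, "details": details or {}}
```

Endpoints would return `success_response(...)` on the happy path, while exception handlers would map errors to `error_response(...)`.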
154+
155+ ### 4.3 Authentication
156+
157+ - JWT token-based authentication
158+ - OAuth2 password flow with Bearer token
159+ - Role-based access control (standard user vs superuser)
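
To make the token flow concrete, here is a stdlib-only sketch of issuing and verifying an HS256-signed JWT with an expiry, plus a superuser check. It is an illustration of the mechanism, not the project's implementation: a production service would rely on an established JWT library, and the claim name `is_superuser`, the function names, and the 15-minute default lifetime are all assumptions.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(raw: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()


def create_access_token(
    subject: str, is_superuser: bool, secret: str, ttl_seconds: int = 900
) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "sub": subject,
        "is_superuser": is_superuser,  # hypothetical role claim
        "exp": int(time.time()) + ttl_seconds,
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_access_token(token: str, secret: str) -> dict:
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] < time.time():
        raise ValueError("token expired")
    return payload


def require_superuser(payload: dict) -> None:
    if not payload.get("is_superuser"):
        raise PermissionError("superuser privileges required")
```

In the OAuth2 password flow, the login endpoint would return such a token after checking credentials, and protected endpoints would verify the Bearer token before applying the role check.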
160+
161+ ## 5. Performance Considerations
162+
163+ ### 5.1 Database Optimization
164+
165+ - Efficient query patterns via repositories
166+ - Appropriate indexing of frequently queried fields
167+ - Pagination for list endpoints
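
Pagination for list endpoints can be sketched as a helper that clamps the caller's `skip` and `limit` and reports the total count. The `Page` type, the `paginate` function, and the cap of 100 items are assumptions for illustration; against a real database the slice would become an `OFFSET`/`LIMIT` query in the repository.

```python
from dataclasses import dataclass
from typing import Generic, Sequence, TypeVar

T = TypeVar("T")

MAX_LIMIT = 100  # hypothetical upper bound on page size


@dataclass(frozen=True)
class Page(Generic[T]):
    items: list[T]
    total: int
    skip: int
    limit: int


def paginate(rows: Sequence[T], skip: int = 0, limit: int = 20) -> Page[T]:
    # Clamp inputs so a client cannot request negative offsets or huge pages.
    skip = max(skip, 0)
    limit = min(max(limit, 1), MAX_LIMIT)
    return Page(
        items=list(rows[skip : skip + limit]),
        total=len(rows),
        skip=skip,
        limit=limit,
    )
```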
168+
169+ ### 5.2 API Performance
170+
171+ - Async/await support for I/O-bound operations
172+ - Connection pooling for database operations
173+ - Request validation at the edge
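
The benefit of async/await for I/O-bound work shows up when independent calls run concurrently. In the sketch below, `asyncio.sleep` stands in for awaited database queries, and the function names are hypothetical.

```python
import asyncio


async def fetch_entity(entity_id: str) -> dict:
    await asyncio.sleep(0.05)  # stands in for an awaited database query
    return {"id": entity_id}


async def fetch_recent_posts(entity_id: str) -> list[str]:
    await asyncio.sleep(0.05)  # stands in for a second, independent query
    return [f"post-for-{entity_id}"]


async def load_entity_overview(entity_id: str) -> dict:
    # Both lookups wait concurrently, so the total latency is roughly that
    # of the slower call rather than the sum of the two.
    entity, posts = await asyncio.gather(
        fetch_entity(entity_id), fetch_recent_posts(entity_id)
    )
    return {**entity, "recent_posts": posts}


overview = asyncio.run(load_entity_overview("abc"))
```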
174+
175+ ## 6. Security Features
176+
177+ ### 6.1 Authentication & Authorization
178+
179+ - Password hashing with bcrypt
180+ - JWT with appropriate expiration
181+ - Principle of least privilege in endpoint permissions
182+
183+ ### 6.2 Data Protection
184+
185+ - Input validation using Pydantic
186+ - CORS protection
187+ - Email-based password recovery flow
188+
189+ ## 7. Development Workflow
190+
191+ ### 7.1 Environment Setup
192+
193+ - Containerized development with Docker
194+ - Environment-specific configuration
195+ - Virtual environment management with uv
196+
197+ ### 7.2 Quality Assurance
198+
199+ - Type checking with mypy
200+ - Linting with ruff
201+ - Automated testing with pytest
202+ - Pre-commit hooks
203+
204+ ## 8. Deployment Strategy
205+
206+ ### 8.1 Containerization
207+
208+ - Docker-based deployment
209+ - Multi-stage build for optimized image size
210+ - Environment variable configuration
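
Environment variable configuration can be sketched as a frozen settings object built from a mapping. The variable names (`POSTGRES_DSN`, `REDIS_URL`, `SENTRY_DSN`, `ENVIRONMENT`) and the defaults are assumptions, not names confirmed by this specification, and the real project may use a settings library instead.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    environment: str
    postgres_dsn: str
    redis_url: str
    sentry_dsn: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Fail fast at startup when a required variable is missing.
    if "POSTGRES_DSN" not in env:
        raise RuntimeError("POSTGRES_DSN must be set")
    return Settings(
        environment=env.get("ENVIRONMENT", "local"),
        postgres_dsn=env["POSTGRES_DSN"],
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
        sentry_dsn=env.get("SENTRY_DSN") or None,  # empty string means disabled
    )
```

Accepting the mapping as a parameter keeps the loader easy to test: a plain dictionary can be passed in place of `os.environ`.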
211+
212+ ### 8.2 Monitoring
213+
214+ - Sentry integration for error tracking
215+ - Health check endpoints
216+ - Structured logging
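
Structured logging and a health check payload can be sketched with the standard library alone. The JSON field names, the `JsonFormatter` class, and the `health_check` helper with its `ok`/`degraded` statuses are assumptions made for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(level: int = logging.INFO) -> None:
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)


def health_check(dependencies: dict[str, bool]) -> dict:
    # Report "ok" only when every dependency probe succeeded.
    healthy = all(dependencies.values())
    return {
        "status": "ok" if healthy else "degraded",
        "dependencies": dependencies,
    }
```

A health endpoint would call `health_check` with the results of lightweight probes, for example a ping against each database.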
217+
218+ ## 9. Scalability Considerations
219+
220+ ### 9.1 Horizontal Scaling
221+
222+ - Stateless API design
223+ - Database connection pooling
224+ - External state management
225+
226+ ### 9.2 Future Extensibility
227+
228+ - Modular service architecture
229+ - Clear separation of concerns
230+ - Version-prefixed API endpoints