Commit 71be88b

Merge pull request #1 from Astrea-Technologies/base-requirements
API Structure Reorganization and Backend Fixes
2 parents 445b5a4 + 7f8b67e commit 71be88b

61 files changed, +3820 -674 lines changed

.cursor/.DS_Store

0 Bytes
Binary file not shown.

.cursor/rules/backend-overview.mdc

Lines changed: 0 additions & 86 deletions
This file was deleted.
Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
---
description: Technical Stack Specification for the /backend.
globs: backend/*
alwaysApply: false
---
## 1. Technology Stack Overview

| Component | Technology | Version | Purpose |
|-----------|------------|---------|---------|
| Framework | FastAPI | 0.114.2+ | Web API framework |
| ORM | SQLModel | 0.0.21+ | Database ORM |
| Primary Database | PostgreSQL | 13+ | Relational database |
| Document Database | MongoDB | 6.0+ | Social media content storage |
| In-memory Database | Redis | 7.0+ | Caching and real-time operations |
| Vector Database | Pinecone | Latest | Semantic content analysis |
| Authentication | JWT | 2.8.0+ | User authentication |
| Password Hashing | Passlib + Bcrypt | 1.7.4+ | Secure password storage |
| Dependency Management | uv | 0.5.11+ | Package management |
| Migrations | Alembic | 1.12.1+ | Database schema migrations |
| API Documentation | OpenAPI/Swagger | Built-in | API documentation |
| Error Tracking | Sentry | 1.40.6+ | Error reporting |
| Email Delivery | emails | 0.6+ | Email notifications |
| Testing | pytest | 7.4.3+ | Unit and integration testing |
| Linting | ruff | 0.2.2+ | Code quality |
| Type Checking | mypy | 1.8.0+ | Static type checking |
| Task Queue | Celery | 5.3.0+ | Asynchronous task processing |
| Message Broker | RabbitMQ | 3.12+ | Task distribution |
| Stream Processing | Apache Kafka | 3.4+ | Real-time data streaming |
| NLP Processing | spaCy + Transformers | 3.6+ / 4.28+ | Content analysis |

## 2. Architecture

### 2.1 Architectural Pattern

The application follows a clean architecture pattern with clear separation of concerns:

```
Client Request → API Layer → Service Layer → Repository Layer → Database

Schema Layer
```

### 2.2 Key Components

- **API Layer**: Handles HTTP requests/responses, input validation, and routing
- **Service Layer**: Contains business logic and orchestrates repositories
- **Repository Layer**: Provides data access patterns and database operations
- **Schema Layer**: Defines data contracts for API input/output
- **Model Layer**: Defines database models and relationships
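
The layering above can be sketched in plain Python. This is a minimal, framework-free illustration rather than the project's actual code: the class and function names (`EntityCreate`, `EntityRepository`, `EntityService`, `create_entity_endpoint`) are hypothetical, and an in-memory dictionary stands in for the database.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from uuid import UUID, uuid4


@dataclass
class PoliticalEntity:
    """Model layer: the stored representation."""
    id: UUID
    name: str


@dataclass
class EntityCreate:
    """Schema layer: the contract for incoming data."""
    name: str


class EntityRepository:
    """Repository layer: data access only (a dict stands in for a database)."""

    def __init__(self) -> None:
        self._rows: Dict[UUID, PoliticalEntity] = {}

    def add(self, entity: PoliticalEntity) -> PoliticalEntity:
        self._rows[entity.id] = entity
        return entity

    def get(self, entity_id: UUID) -> Optional[PoliticalEntity]:
        return self._rows.get(entity_id)


class EntityService:
    """Service layer: business rules, delegating storage to the repository."""

    def __init__(self, repo: EntityRepository) -> None:
        self._repo = repo

    def create(self, payload: EntityCreate) -> PoliticalEntity:
        name = payload.name.strip()
        if not name:
            raise ValueError("name must not be empty")
        return self._repo.add(PoliticalEntity(id=uuid4(), name=name))


def create_entity_endpoint(body: dict, service: EntityService) -> dict:
    """API layer: parse input, call the service, shape the output."""
    entity = service.create(EntityCreate(name=body["name"]))
    return {"id": str(entity.id), "name": entity.name}
```

Each layer depends only on the one below it, so the repository could be swapped for a real database implementation without touching the endpoint function.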

### 2.3 Directory Structure

```
/app
├── api/                      # API endpoints and routing
│   ├── api_v1/               # API version 1
│   │   ├── endpoints/        # Resource endpoints
│   │   └── api.py            # Router configuration
│   ├── deps.py               # Dependency injection
│   └── main.py               # Main router
├── core/                     # Core components
│   ├── config.py             # Environment configuration
│   ├── security.py           # Authentication
│   └── errors.py             # Error handling
├── db/                       # Database configuration
│   ├── models/               # SQLModel definitions
│   └── session.py            # Database session management
├── schemas/                  # Pydantic models for API
├── services/                 # Business logic
│   └── repositories/         # Data access layer
├── tasks/                    # Celery tasks for background processing
│   ├── scraping/             # Social media scraping tasks
│   ├── analysis/             # Content analysis tasks
│   └── notifications/        # Alert and notification tasks
├── processing/               # Data processing components
│   ├── models/               # ML model wrappers
│   ├── streams/              # Kafka stream processors
│   └── embeddings/           # Vector embedding generators
├── worker.py                 # Celery worker configuration
└── main.py                   # Application entry point
```

## 3. Database Design

### 3.1 Hybrid Database Technology

The application employs a hybrid database architecture to address the diverse data requirements of political social media analysis:

| Component | Technology | Version | Purpose |
|-----------|------------|---------|---------|
| Relational Database | PostgreSQL | 13+ | Entity data and relationships |
| Document Database | MongoDB | 6.0+ | Social media content and engagement |
| In-memory Database | Redis | 7.0+ | Caching and real-time operations |
| Vector Database | Pinecone | Latest | Semantic similarity analysis |

Refer to `database-architecture.mdc` for detailed implementation specifications.

### 3.2 Primary Domain Models

- **User**: Authentication and user profile data
- **PoliticalEntity**: Political figures, organizations, and campaigns
- **SocialMediaAccount**: Platform account connections
- **EntityRelationship**: Relationships between political entities

### 3.3 Key Design Decisions

- **UUID Primary Keys**: All entities use UUID primary keys for security and distributed system compatibility
- **Cross-database References**: Consistent reference patterns between databases
- **Relationship Management**: Proper foreign key constraints with cascade delete
- **String Field Constraints**: Appropriate length limits on all VARCHAR fields
- **Migration Strategy**: Alembic for version-controlled schema changes
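
Three of these decisions (UUID primary keys, cascade delete, and string length limits) can be illustrated with a small runnable sketch. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so that it runs without a server; the table and column names are assumptions made for the example, not the project's schema.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript(
    """
    CREATE TABLE political_entity (
        id   TEXT PRIMARY KEY,                           -- UUID stored as text
        name TEXT NOT NULL CHECK (length(name) <= 255)   -- length limit
    );
    CREATE TABLE social_media_account (
        id        TEXT PRIMARY KEY,
        entity_id TEXT NOT NULL
                  REFERENCES political_entity(id) ON DELETE CASCADE,
        handle    TEXT NOT NULL CHECK (length(handle) <= 100)
    );
    """
)

entity_id = str(uuid.uuid4())
conn.execute("INSERT INTO political_entity VALUES (?, ?)", (entity_id, "Example Party"))
conn.execute(
    "INSERT INTO social_media_account VALUES (?, ?, ?)",
    (str(uuid.uuid4()), entity_id, "@example"),
)

# Deleting the parent row removes its dependent accounts through the cascade.
conn.execute("DELETE FROM political_entity WHERE id = ?", (entity_id,))
remaining = conn.execute("SELECT COUNT(*) FROM social_media_account").fetchone()[0]
```

In PostgreSQL the same ideas would typically use a native `UUID` column type and `VARCHAR(n)` in place of the `CHECK` constraints shown here.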

### 3.4 Additional Dependencies

| Dependency | Version | Purpose |
|------------|---------|---------|
| motor | 3.2.0+ | Async MongoDB driver |
| redis | 4.6.0+ | Redis client |
| pinecone-client | 2.2.1+ | Pinecone Vector DB client |
| pymongo | 4.5.0+ | MongoDB client |

Refer to `data-processing-architecture.mdc` for details on processing pipelines and analysis components.

## 4. API Design

### 4.1 API Structure

- RESTful API design principles
- Resource-based URL structure
- Version prefixing (`/api/v1/...`)
- OpenAPI documentation

### 4.2 Response Format

All API responses follow a standardized format:

```json
{
  "data": { ... },         // Actual response payload
  "success": true,         // Success indicator
  "message": "Description" // Human-readable message
}
```

Error responses:

```json
{
  "success": false,
  "error": "Error message",
  "details": { ... }       // Additional error details
}
```
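
One way to keep every endpoint consistent with these two envelopes is to build them through a pair of helpers. The sketch below is an assumption about how that could look; the function names `success_response` and `error_response` are hypothetical and do not come from the codebase.

```python
from typing import Any, Dict, Optional


def success_response(data: Any, message: str = "OK") -> Dict[str, Any]:
    """Build the success envelope: payload, success flag, message."""
    return {"data": data, "success": True, "message": message}


def error_response(error: str, details: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build the error envelope: success flag, error text, optional details."""
    return {"success": False, "error": error, "details": details or {}}
```

For example, `error_response("Entity not found", {"id": "abc"})` returns a dictionary with `success` set to `False` and the supplied details attached.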

### 4.3 Authentication

- JWT token-based authentication
- OAuth2 password flow with Bearer token
- Role-based access control (standard user vs superuser)
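
To show what a signed, expiring token involves, here is a minimal HS256 sketch built only on the standard library. It is an illustration of the mechanism, not the project's implementation: a real deployment would normally rely on a maintained JWT library, and the function names and the `sub`/`exp` payload layout here are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def create_access_token(subject: str, secret: str, expires_in: int = 3600) -> str:
    """Sign a header and payload with HMAC-SHA256 and join them in JWT form."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "exp": int(time.time()) + expires_in}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_access_token(token: str, secret: str) -> Optional[dict]:
    """Return the payload if the signature is valid and the token is unexpired, else None."""
    try:
        signing_input, signature = token.rsplit(".", 1)
        expected = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(signature)):
            return None
        payload = json.loads(_b64url_decode(signing_input.split(".")[1]))
    except (ValueError, IndexError):
        return None
    if not isinstance(payload, dict) or payload.get("exp", 0) < time.time():
        return None
    return payload
```

In the OAuth2 password flow, the login endpoint would return such a token and later requests would send it in an `Authorization: Bearer <token>` header.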

## 5. Performance Considerations

### 5.1 Database Optimization

- Efficient query patterns via repositories
- Appropriate indexing of frequently queried fields
- Pagination for list endpoints
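
Pagination for list endpoints can be sketched as follows. The `Page` container, the `skip`/`limit` parameter names, and the `max_limit` cap are assumptions for illustration; in a real repository the slicing would be pushed down to the database as `OFFSET`/`LIMIT` instead of being done in memory.

```python
from dataclasses import dataclass
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class Page(Generic[T]):
    items: List[T]
    total: int
    skip: int
    limit: int


def paginate(rows: Sequence[T], skip: int = 0, limit: int = 100, max_limit: int = 500) -> Page[T]:
    """Return one page of rows, clamping limit so a client cannot request an unbounded list."""
    skip = max(skip, 0)
    limit = max(1, min(limit, max_limit))
    return Page(items=list(rows[skip : skip + limit]), total=len(rows), skip=skip, limit=limit)
```

Returning `total` alongside the items lets a client compute how many pages remain without issuing a second request.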

### 5.2 API Performance

- Async/await support for I/O bound operations
- Connection pooling for database operations
- Request validation at the edge

## 6. Security Features

### 6.1 Authentication & Authorization

- Password hashing with bcrypt
- JWT with appropriate expiration
- Principle of least privilege in endpoint permissions

### 6.2 Data Protection

- Input validation using Pydantic
- CORS protection
- Email-based password recovery flow

## 7. Development Workflow

### 7.1 Environment Setup

- Containerized development with Docker
- Environment-specific configuration
- Virtual environment management with uv

### 7.2 Quality Assurance

- Type checking with mypy
- Linting with ruff
- Automated testing with pytest
- Pre-commit hooks

## 8. Deployment Strategy

### 8.1 Containerization

- Docker-based deployment
- Multi-stage build for optimized image size
- Environment variable configuration
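
Environment variable configuration can be sketched with a small settings loader. The variable names (`DATABASE_URL`, `REDIS_URL`, `ENVIRONMENT`, `ACCESS_TOKEN_EXPIRE_MINUTES`) and their defaults are hypothetical; the project's `core/config.py` may define different ones.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    environment: str
    database_url: str
    redis_url: str
    access_token_expire_minutes: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from environment variables, failing fast on a missing required value."""
    try:
        database_url = env["DATABASE_URL"]
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set") from None
    return Settings(
        environment=env.get("ENVIRONMENT", "local"),
        database_url=database_url,
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "60")),
    )
```

Accepting the environment as a parameter keeps the loader easy to test: a plain dictionary can be passed in place of `os.environ`.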

### 8.2 Monitoring

- Sentry integration for error tracking
- Health check endpoints
- Structured logging
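
Structured logging and a health check can both be sketched with the standard library. The `JsonFormatter` class, the field names in each log entry, and the `health_check` helper are assumptions for illustration, not the project's code.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def health_check(checks: dict) -> dict:
    """Run named dependency checks (callables returning bool) and summarize the result."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a failing probe marks the dependency as down
    return {"status": "ok" if all(results.values()) else "degraded", "checks": results}
```

The formatter would be attached with `handler.setFormatter(JsonFormatter())`, and a health endpoint could return the dictionary from `health_check` with one probe per backing service.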

## 9. Scalability Considerations

### 9.1 Horizontal Scaling

- Stateless API design
- Database connection pooling
- External state management

### 9.2 Future Extensibility

- Modular service architecture
- Clear separation of concerns
- Version-prefixed API endpoints
