34 commits
f0b4833
Merge pull request #4 from Astrea-Technologies/mongo-repository
andrade94 Mar 24, 2025
0c8c5ed
redis repository.
andrade94 Mar 24, 2025
49b8da6
Merge pull request #5 from Astrea-Technologies/redis-repository
andrade94 Mar 24, 2025
c8b1843
Merge branch 'master' into redis-takeoff
andrade94 Mar 24, 2025
d23ba40
Merge pull request #6 from Astrea-Technologies/redis-takeoff
andrade94 Mar 24, 2025
5062009
topic logic added.
andrade94 Mar 24, 2025
6f3e29a
Merge pull request #7 from Astrea-Technologies/topic-repository
andrade94 Mar 24, 2025
e99f89b
pinecone fix and vector embedding implementation.
andrade94 Mar 24, 2025
8a1885a
fix extra error.
andrade94 Mar 24, 2025
cdd67ab
localhost.
andrade94 Mar 24, 2025
4d6fa60
fixes in mongodb.
andrade94 Mar 25, 2025
881cfda
pinecone fix.
andrade94 Mar 25, 2025
3e6a241
Merge pull request #8 from Astrea-Technologies/vector-repository
andrade94 Mar 25, 2025
1f0c820
task manager.
andrade94 Mar 26, 2025
092db0b
changes in docs.
andrade94 Mar 26, 2025
5c01a36
Merge pull request #9 from Astrea-Technologies/celery-4.1
andrade94 Mar 26, 2025
724e001
instagram post modification.
andrade94 Mar 27, 2025
4b752bc
documents fix.
andrade94 Mar 27, 2025
07dbe2c
changes to comments schema.
andrade94 Mar 27, 2025
f3e6b33
instagram collector fixed.
andrade94 Mar 27, 2025
00f3748
instagram finished.
andrade94 Mar 27, 2025
75902b5
response data from different actors.
andrade94 Mar 27, 2025
f4b9f38
facebook testing.
andrade94 Mar 27, 2025
5fec53c
x testing.
andrade94 Mar 27, 2025
4f291c7
changes in metadata.
andrade94 Mar 27, 2025
f4fed52
added tiktok.
andrade94 Mar 27, 2025
6e4a1cc
Merge pull request #10 from Astrea-Technologies/apify-tasks
andrade94 Mar 28, 2025
8d643fb
changes in documentation.
andrade94 Mar 28, 2025
e7b0d93
architecture modification.
andrade94 Mar 28, 2025
a4430a5
Merge pull request #11 from Astrea-Technologies/apify-tasks
andrade94 Mar 28, 2025
4125eca
changes in requirements from mvp.
andrade94 Apr 3, 2025
48faf84
changes.
andrade94 Apr 3, 2025
fc42b9e
Merge pull request #12 from Astrea-Technologies/less-mvp
andrade94 Apr 3, 2025
d40d40b
implementation.
andrade94 Apr 4, 2025
Binary file added .DS_Store
Binary file not shown.
Binary file modified .cursor/.DS_Store
Binary file not shown.
117 changes: 73 additions & 44 deletions .cursor/rules/backend-technical-stack.mdc
@@ -3,30 +3,32 @@ description: Technical Stack Specification for the /backend.
globs: backend/*
alwaysApply: false
---
# Technical Stack Specification for the /backend

## 1. Technology Stack Overview

| Component | Technology | Version | Purpose |
|-----------|------------|---------|---------|
| Framework | FastAPI | 0.114.2+ | Web API framework |
| ORM | SQLModel | 0.0.21+ | Database ORM |
| Primary Database | PostgreSQL | 13+ | Relational database |
| Document Database | MongoDB | 6.0+ | Social media content storage |
| In-memory Database | Redis | 7.0+ | Caching and real-time operations |
| Vector Database | Pinecone | Latest | Semantic content analysis |
| Authentication | JWT | 2.8.0+ | User authentication |
| Password Hashing | Passlib + Bcrypt | 1.7.4+ | Secure password storage |
| Dependency Management | uv | 0.5.11+ | Package management |
| Migrations | Alembic | 1.12.1+ | Database schema migrations |
| API Documentation | OpenAPI/Swagger | Built-in | API documentation |
| Error Tracking | Sentry | 1.40.6+ | Error reporting |
| Email Delivery | emails | 0.6+ | Email notifications |
| Testing | pytest | 7.4.3+ | Unit and integration testing |
| Linting | ruff | 0.2.2+ | Code quality |
| Type Checking | mypy | 1.8.0+ | Static type checking |
| Task Queue | Celery | 5.3.0+ | Asynchronous task processing |
| Message Broker | RabbitMQ | 3.12+ | Task distribution |
| Stream Processing | Apache Kafka | 3.4+ | Real-time data streaming |
| NLP Processing | spaCy + Transformers | 3.6+ / 4.28+ | Content analysis |
| Component | Technology | Version | Purpose | MVP Status |
|-----|---|---|---|---|
| Framework | FastAPI | 0.114.2+ | Web API framework | ✅ Included |
| ORM | SQLModel | 0.0.21+ | Database ORM | ✅ Included |
| Primary Database | PostgreSQL | 13+ | Relational database | ✅ Included |
| Document Database | MongoDB | 6.0+ | Social media content storage | ✅ Included |
| In-memory Database | Redis | 7.0+ | Caching and real-time operations | ❌ **NOT in MVP** |
| Vector Database | Pinecone | Latest | Semantic content analysis | ✅ Included |
| Authentication | JWT | 2.8.0+ | User authentication | ✅ Included |
| Password Hashing | Passlib + Bcrypt | 1.7.4+ | Secure password storage | ✅ Included |
| Dependency Management | uv | 0.5.11+ | Package management | ✅ Included |
| Migrations | Alembic | 1.12.1+ | Database schema migrations | ✅ Included |
| API Documentation | OpenAPI/Swagger | Built-in | API documentation | ✅ Included |
| Error Tracking | Sentry | 1.40.6+ | Error reporting | ✅ Included |
| Email Delivery | emails | 0.6+ | Email notifications | ✅ Included |
| Testing | pytest | 7.4.3+ | Unit and integration testing | ✅ Included |
| Linting | ruff | 0.2.2+ | Code quality | ✅ Included |
| Type Checking | mypy | 1.8.0+ | Static type checking | ✅ Included |
| Task Queue | Celery | 5.3.0+ | Asynchronous task processing | ❌ **NOT in MVP** |
| Message Broker | RabbitMQ | 3.12+ | Task distribution | ❌ **NOT in MVP** |
| Stream Processing | Apache Kafka | 3.4+ | Real-time data streaming | ❌ **NOT in MVP** |
| NLP Processing | spaCy + Transformers | 3.6+ / 4.28+ | Content analysis | ✅ Included |

## 2. Architecture

@@ -51,7 +53,7 @@ Client Request → API Layer → Service Layer → Repository Layer → Database
### 2.3 Directory Structure

```
/app
backend/app
├── api/ # API endpoints and routing
│ ├── api_v1/ # API version 1
│ │ ├── endpoints/ # Resource endpoints
@@ -68,15 +70,13 @@ Client Request → API Layer → Service Layer → Repository Layer → Database
├── schemas/ # Pydantic models for API
├── services/ # Business logic
│ └── repositories/ # Data access layer
├── tasks/ # Celery tasks for background processing
│ ├── scraping/ # Social media scraping tasks
│ ├── analysis/ # Content analysis tasks
│ └── notifications/ # Alert and notification tasks
├── tasks/ # Task processing system (MVP version)
│ ├── task_manager.py # In-memory task management
│ ├── task_types.py # Task type definitions
│ └── README.md # Task system documentation
├── processing/ # Data processing components
│ ├── models/ # ML model wrappers
│ ├── streams/ # Kafka stream processors
│ └── embeddings/ # Vector embedding generators
├── worker.py # Celery worker configuration
└── main.py # Application entry point
```

@@ -86,14 +86,14 @@ Client Request → API Layer → Service Layer → Repository Layer → Database

The application employs a hybrid database architecture to address the diverse data requirements of political social media analysis:

| Component | Technology | Version | Purpose |
|-----------|------------|---------|---------|
| Relational Database | PostgreSQL | 13+ | Entity data and relationships |
| Document Database | MongoDB | 6.0+ | Social media content and engagement |
| In-memory Database | Redis | 7.0+ | Caching and real-time operations |
| Vector Database | Pinecone | Latest | Semantic similarity analysis |
| Component | Technology | Version | Purpose | MVP Status |
|-----|---|---|---|---|
| Relational Database | PostgreSQL | 13+ | Entity data and relationships | ✅ Included |
| Document Database | MongoDB | 6.0+ | Social media content and engagement | ✅ Included |
| In-memory Database | Redis | 7.0+ | Caching and real-time operations | ❌ **NOT in MVP** |
| Vector Database | Pinecone | Latest | Semantic similarity analysis | ✅ Included |

Refer to `database-architecture.mdc` for detailed implementation specifications.
Refer to `database-architecture.md` for detailed implementation specifications.

### 3.2 Primary Domain Models

@@ -112,14 +112,14 @@ Refer to `database-architecture.mdc` for detailed implementation specifications.

### 3.4 Additional Dependencies

| Dependency | Version | Purpose |
|------------|---------|---------|
| motor | 3.2.0+ | Async MongoDB driver |
| redis | 4.6.0+ | Redis client |
| pinecone-client | 2.2.1+ | Pinecone Vector DB client |
| pymongo | 4.5.0+ | MongoDB client |
| Dependency | Version | Purpose | MVP Status |
|---|---|---|---|
| motor | 3.2.0+ | Async MongoDB driver | ✅ Included |
| redis | 4.6.0+ | Redis client | ❌ **NOT in MVP** |
| pinecone-client | 2.2.1+ | Pinecone Vector DB client | ✅ Included |
| pymongo | 4.5.0+ | MongoDB client | ✅ Included |

Refer to `data-processing-architecture.mdc` for details on processing pipelines and analysis components.
Refer to `data-processing-architecture.md` for details on processing pipelines and analysis components.

## 4. API Design

@@ -227,4 +227,33 @@ Error responses:

- Modular service architecture
- Clear separation of concerns
- Version-prefixed API endpoints
- Version-prefixed API endpoints

## 10. Task Processing System (MVP)

### 10.1 MVP Implementation

The MVP version uses a simplified approach for task processing:

- **TaskManager**: In-memory task management system
- **FastAPI BackgroundTasks**: Used for asynchronous execution
- **Task Status Tracking**: Maintains task state (pending, running, completed, failed)
- **Simple API**: Endpoints for task creation, status checking, and listing
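
The bullets above could be realized roughly as follows. This is a minimal sketch, not the contents of `task_manager.py` in this PR: the names `TaskStatus`, `TaskRecord`, `create`, `run`, `get`, and `list_tasks` are assumptions chosen for illustration, and only the four status values come from the list above.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from enum import Enum
from threading import Lock
from typing import Any, Callable, Optional


class TaskStatus(str, Enum):
    """The four task states named in the spec."""
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class TaskRecord:
    """Hypothetical record shape; the real schema may differ."""
    task_id: str
    status: TaskStatus = TaskStatus.PENDING
    result: Any = None
    error: Optional[str] = None


class TaskManager:
    """Hypothetical in-memory registry: everything is lost on restart."""

    def __init__(self) -> None:
        self._tasks: dict[str, TaskRecord] = {}
        self._lock = Lock()  # guards the dict when tasks run in threads

    def create(self) -> TaskRecord:
        """Register a new task in the PENDING state."""
        record = TaskRecord(task_id=uuid.uuid4().hex)
        with self._lock:
            self._tasks[record.task_id] = record
        return record

    def run(self, task_id: str, func: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        """Execute func and record the outcome on the task record."""
        record = self._tasks[task_id]
        record.status = TaskStatus.RUNNING
        try:
            record.result = func(*args, **kwargs)
            record.status = TaskStatus.COMPLETED
        except Exception as exc:  # store the failure instead of raising
            record.error = str(exc)
            record.status = TaskStatus.FAILED

    def get(self, task_id: str) -> Optional[TaskRecord]:
        """Return the record for task_id, or None if unknown."""
        with self._lock:
            return self._tasks.get(task_id)

    def list_tasks(self) -> list[TaskRecord]:
        """Return a snapshot of all known tasks."""
        with self._lock:
            return list(self._tasks.values())
```

In an endpoint, the handler would presumably call `create()`, hand `run` to FastAPI via `background_tasks.add_task(manager.run, record.task_id, some_callable)`, and return the `task_id` so the client can poll the status endpoint.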

### 10.2 MVP Limitations

- **No Persistent Storage**: Tasks stored in memory only, lost on server restart
- **No Distributed Processing**: All tasks run on the same server instance
- **No Scheduled Tasks**: No mechanism for recurring tasks
- **No Task Queue**: Tasks execute in the order they're received
- **Limited Scaling**: Cannot handle a high volume of concurrent tasks

### 10.3 Post-MVP Task Processing

Post-MVP versions will upgrade the system with:

- **Celery**: Robust task queue
- **Redis**: Task result storage and caching
- **RabbitMQ**: Reliable message broker
- **Scheduled Tasks**: Recurring operations
- **Distributed Processing**: Scalable task execution