
# 🚀 TweetPulse Pro - Advanced Real-Time Twitter Sentiment Analytics


## 🎯 Overview

TweetPulse Pro is a modern, production-ready platform for real-time sentiment analysis of Twitter data: tweets are ingested, classified, stored, and visualized in near real time. It leverages industry-standard technologies (Apache Kafka, Apache Spark, MongoDB, Django, Flask REST API, and Docker) to deliver scalable, reliable, and extensible analytics and visualization, presented in a modern black & green dashboard aesthetic and optimized for enterprise use.

## Project Architecture

See imgs/Flow_DIagram.png for the end-to-end architecture diagram.

**Author:** Manula Fernando
**Last Updated:** August 15, 2025


## Key Features

- **Real-Time Data Pipeline:** Kafka ingests tweets, Spark Streaming processes and classifies sentiment, and MongoDB stores the results (see the sketch below).
- **RESTful Analytics API:** A Flask-based API exposes analytics endpoints for dashboards and external integrations.
- **Modern Dashboard:** Django + Bootstrap 5 dashboard with advanced, interactive visualizations (Chart.js, matplotlib, seaborn).
- **Modular, Configurable Code:** All scripts use YAML config, logging, and CLI overrides for easy customization and deployment.
- **Full Docker Orchestration:** One-command startup with Docker Compose for all services (Kafka, Zookeeper, MongoDB, Producer, Consumer, API, Dashboard).
- **Production-Ready Practices:** Error handling, logging, environment variables, and a clear separation of concerns.
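
To make the pipeline concrete, here is a minimal sketch of the consume-classify-store stage. It is illustrative only: the topic, database, collection, and field names are assumptions, and the real logic (including the trained model) lives in tweetpulse-pipeline/kafka_spark_consumer.py.

```python
# Minimal sketch of the Kafka -> Spark -> MongoDB stage (illustrative).
# Requires the spark-sql-kafka connector on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("tweetpulse-consumer-sketch").getOrCreate()

# Read raw tweets from an assumed "tweets" topic on localhost:9092.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "tweets")
       .load())

# Placeholder classifier; the real project applies a trained
# logistic regression model instead of this keyword rule.
@F.udf(StringType())
def classify(text):
    return "positive" if text and "good" in text.lower() else "negative"

scored = (raw
          .select(F.col("value").cast("string").alias("text"))
          .withColumn("sentiment", classify(F.col("text"))))

def write_batch(batch_df, batch_id):
    # Persist each micro-batch to an assumed tweetpulse.tweets collection.
    from pymongo import MongoClient
    docs = [row.asDict() for row in batch_df.collect()]
    if docs:
        MongoClient("mongodb://localhost:27017")["tweetpulse"]["tweets"].insert_many(docs)

(scored.writeStream
 .foreachBatch(write_batch)
 .option("checkpointLocation", "/tmp/tweetpulse-checkpoint")
 .start()
 .awaitTermination())
```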

## Repository Structure

```
Real-Time-Twitter-Sentiment-Analysis/
├── tweetpulse-dashboard/       # Django dashboard (Bootstrap, Chart.js, user features)
│   ├── manage.py
│   ├── dashboard/              # Django app code
│   ├── templates/              # HTML templates
│   └── logistic_regression_model.pkl/  # Model for dashboard
├── tweetpulse-pipeline/        # Kafka producer & Spark consumer (YAML-configurable)
│   ├── kafka_producer.py
│   ├── kafka_spark_consumer.py
│   ├── producer_config.yaml
│   ├── consumer_config.yaml
│   ├── analytics_api.py        # Flask REST API for analytics
│   ├── Dockerfile.producer
│   ├── Dockerfile.consumer
│   ├── Dockerfile.api
│   └── docker-compose.analytics.yml
├── tweetpulse-ml-model/        # Jupyter notebooks, datasets, trained models
│   ├── Big_Data.ipynb
│   ├── twitter_training.csv
│   ├── twitter_validation.csv
│   └── logistic_regression_model.pkl/
├── imgs/                       # Architecture and dashboard images
│   ├── Flow_DIagram.png
│   ├── Dashboard_1.png, Dashboard_2.png, Dashboard_3.png, Dashboard_4.png
│   ├── Login_Page.png, Register_Page.png
│   ├── MongoDB_Connection.png, Docker_Container.png
│   └── Confusion_matrix.png, Text_Classifer.png
├── requirements.txt            # Python dependencies
├── zk-single-kafka-single.yml  # Kafka/Zookeeper Docker Compose
└── README.md                   # Project documentation
```

## Quick Start (Recommended: Dockerized Workflow)

Use Docker Compose for a reproducible, production-like environment; all dependencies and services are containerized.

### 1. Prerequisites

- Docker Desktop (with Docker Compose) installed and running
- Git

### 2. Clone the Repository

```bash
git clone <your-repo-url>
cd TweetPulse-Pro
```

### 3. Build and Start the Full Analytics Stack

```bash
docker compose -f tweetpulse-pipeline/docker-compose.analytics.yml up --build
```

This will launch:

- Zookeeper & Kafka (real-time ingestion)
- MongoDB (storage)
- Producer (tweets to Kafka)
- Consumer (Spark streaming, sentiment analysis)
- REST API (analytics endpoints)
- Django Dashboard (visualization)

### 4. Access the Platform

Once the containers are up, the dashboard and the API are served on the ports mapped in docker-compose.analytics.yml. By default Django's development server listens on port 8000 and Flask on port 5000, but check the compose file for the exact host mappings.


## Full Setup & Manual Steps (For Advanced Users)

### 1. Python Environment (Windows)

Install Python 3.10+ and create a virtual environment:

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

### 2. Kafka & Zookeeper

Start both with Docker Compose:

```bash
docker compose -f zk-single-kafka-single.yml up -d
```
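
To confirm the broker is reachable before starting the pipeline, a quick check along these lines works (assumes the default localhost:9092 mapping):

```python
# Broker reachability check using kafka-python.
from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
print("Broker reachable; topics:", consumer.topics())
consumer.close()
```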

### 3. MongoDB

Start MongoDB (via Docker or a local install); MongoDB Compass is handy as a GUI.
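
A similar one-liner confirms MongoDB is up (assumes the default port 27017):

```python
# MongoDB connectivity check using pymongo.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017", serverSelectionTimeoutMS=3000)
print(client.admin.command("ping"))  # {'ok': 1.0} on success
```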

### 4. Producer & Consumer

Edit tweetpulse-pipeline/producer_config.yaml and consumer_config.yaml as needed, then run the producer:

```powershell
python tweetpulse-pipeline/kafka_producer.py --config tweetpulse-pipeline/producer_config.yaml
```

Run the consumer (Spark requires Java 17):

```powershell
$env:JAVA_HOME = "C:\Program Files\Java\jdk-17"  # adjust if needed
$env:PATH = "$env:JAVA_HOME\bin;$env:PATH"
python tweetpulse-pipeline/kafka_spark_consumer.py --config tweetpulse-pipeline/consumer_config.yaml
```
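
For orientation, a stripped-down producer might look like the sketch below. Everything here is an assumption for illustration (the broker address, the "tweets" topic, the CSV used, and its column name); the real producer is driven by producer_config.yaml.

```python
# Hypothetical CSV -> Kafka producer sketch (kafka-python).
import csv
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Replay rows from the validation set as if they were live tweets.
with open("tweetpulse-ml-model/twitter_validation.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        producer.send("tweets", {"text": row.get("text", "")})
        time.sleep(0.1)  # throttle to mimic a live stream

producer.flush()
```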

### 5. Analytics API

Run the Flask API:

```powershell
python tweetpulse-pipeline/analytics_api.py
```
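
As a rough illustration of the kind of endpoint the API can expose (the route, database, and field names below are assumptions, not the actual analytics_api.py contract):

```python
# Hypothetical sentiment-counts endpoint (Flask + pymongo).
from flask import Flask, jsonify
from pymongo import MongoClient

app = Flask(__name__)
tweets = MongoClient("mongodb://localhost:27017")["tweetpulse"]["tweets"]

@app.route("/api/sentiment-counts")
def sentiment_counts():
    # Group stored tweets by their predicted sentiment label.
    pipeline = [{"$group": {"_id": "$sentiment", "count": {"$sum": 1}}}]
    return jsonify({doc["_id"]: doc["count"] for doc in tweets.aggregate(pipeline)})

if __name__ == "__main__":
    app.run(port=5000)
```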

### 6. Django Dashboard

Collect static files:

```powershell
python tweetpulse-dashboard/manage.py collectstatic --noinput
```

Run the server:

```powershell
python tweetpulse-dashboard/manage.py runserver
```

## Notes for Windows

- Ensure Docker Desktop is running and the WSL2 backend is enabled.
- If running services outside Docker, install Java 17 (required by Spark) and set JAVA_HOME.
- If Kafka runs inside Docker while the apps run on the host, connect via localhost:9092; if the apps also run inside Docker, they reach the broker at kafka:9092 on the Compose network. A snippet for handling both cases follows.
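
One simple way to keep the same code working in both setups is to read the broker address from an environment variable (the variable name here is an assumption, not something the project necessarily defines):

```python
# Resolve the Kafka broker address from the environment, falling back
# to the host-side default when the variable is unset.
import os

bootstrap_servers = os.environ.get("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092")
# In docker-compose.analytics.yml the services would set
# KAFKA_BOOTSTRAP_SERVERS=kafka:9092; on the host the default applies.
```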


## Advanced Usage & Manual Workflow

For development, debugging, or custom deployments, you can run individual services and scripts manually. See each folder's README or the script docstrings for details.


## Best Practices & Industry Standards

- **Containerization:** All services are Dockerized for reproducibility and scalability.
- **Configuration Management:** Use YAML config files and environment variables for all scripts and services (see the sketch after this list).
- **Logging & Monitoring:** All components use structured logging; integrate with ELK/Prometheus for production.
- **Modular Codebase:** Producer, consumer, and API are fully modular and independently deployable.
- **Security:** Never commit secrets; use .env files and Docker secrets for credentials.
- **Testing:** Unit/integration tests are recommended for all modules (see /tests if present).
- **Documentation:** Keep this README and all configs up to date; use docstrings and comments in code.
- **Naming Consistency:** Use the project name "TweetPulse Pro" in all documentation, scripts, and UI for clarity and branding.
- **Author:** Manula Fernando (2025)
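
The pipeline scripts combine a YAML file with CLI overrides; a minimal version of that pattern looks like this (key and flag names are illustrative, not the exact ones the scripts use):

```python
# Hypothetical config loader: YAML file first, CLI flags win.
import argparse

import yaml

parser = argparse.ArgumentParser()
parser.add_argument("--config", required=True, help="path to a YAML config file")
parser.add_argument("--topic", help="optional override for the Kafka topic")
args = parser.parse_args()

with open(args.config) as f:
    cfg = yaml.safe_load(f)

if args.topic:
    cfg["topic"] = args.topic  # CLI flag takes precedence over the file
```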

## Data & Model

The tweetpulse-ml-model/ folder holds the training and validation datasets (twitter_training.csv, twitter_validation.csv), the model-development notebook (Big_Data.ipynb), and the trained logistic regression model (logistic_regression_model.pkl), which the dashboard loads for sentiment classification.


## Screenshots

All screenshots are in the imgs/ folder:

- Dashboard Home
- Sentiment Distribution & Trends
- Docker Containers
- MongoDB Connection
- Model Confusion Matrix

## Author

- Manula Fernando

For previous contributors and academic context, see the project history.


## Support & Contribution

- Open issues or pull requests for improvements, bugfixes, or questions.
- For custom deployments, advanced analytics, or consulting, contact the author via GitHub.

Happy coding! Explore, extend, and build on TweetPulse Pro for your own analytics needs.
