TweetPulse Pro is a production-ready platform for real-time sentiment analysis of Twitter data, built with a modern black & green aesthetic and designed for enterprise use. It combines industry-standard technologies (Apache Kafka, Apache Spark, MongoDB, Django, Flask, and Docker) to deliver scalable, reliable, and extensible analytics and visualization.
Author: Manula Fernando
Last Updated: August 15, 2025
- Real-Time Data Pipeline: Kafka ingests tweets, Spark Streaming processes and classifies sentiment, and MongoDB stores the results (a minimal sketch follows this list).
- RESTful Analytics API: Flask-based API exposes analytics endpoints for dashboards and external integrations.
- Modern Dashboard: Django + Bootstrap 5 dashboard with advanced, interactive visualizations (Chart.js, matplotlib, seaborn).
- Modular, Configurable Code: All scripts use YAML config, logging, and CLI overrides for easy customization and deployment.
- Full Docker Orchestration: One-command startup with Docker Compose for all services (Kafka, Zookeeper, MongoDB, Producer, Consumer, API, Dashboard).
- Production-Ready Practices: Error handling, logging, environment variables, and clear separation of concerns.
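For orientation, here is a self-contained sketch of that data path: a producer pushes tweet JSON into Kafka, and a consumer classifies each message and writes the result to MongoDB. It assumes the `kafka-python` and `pymongo` packages, a broker on `localhost:9092`, a topic named `tweets`, and a scikit-learn-style pipeline loaded from a pickle; the repository's actual consumer uses Spark Structured Streaming, so treat this as a conceptual outline rather than the project's code.

```python
import json
import pickle

from kafka import KafkaConsumer, KafkaProducer
from pymongo import MongoClient

# Producer side: push a tweet onto the topic (topic name assumed).
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("tweets", {"id": 1, "text": "TweetPulse Pro makes streaming easy!"})
producer.flush()

# Consumer side: classify each tweet and persist the result.
with open("logistic_regression_model.pkl", "rb") as f:
    model = pickle.load(f)  # assumes a scikit-learn-style text pipeline

collection = MongoClient("mongodb://localhost:27017")["tweetpulse"]["sentiments"]
consumer = KafkaConsumer(
    "tweets",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    tweet = message.value
    label = model.predict([tweet["text"]])[0]  # e.g. 'Positive' / 'Negative'
    collection.insert_one({"tweet": tweet, "sentiment": str(label)})
```

The project is laid out as follows: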
```
Real-Time-Twitter-Sentiment-Analysis/
├── tweetpulse-dashboard/              # Django dashboard (Bootstrap, Chart.js, user features)
│   ├── manage.py
│   ├── dashboard/                     # Django app code
│   ├── templates/                     # HTML templates
│   └── logistic_regression_model.pkl/ # Model for dashboard
├── tweetpulse-pipeline/               # Kafka producer & Spark consumer (YAML-configurable)
│   ├── kafka_producer.py
│   ├── kafka_spark_consumer.py
│   ├── producer_config.yaml
│   ├── consumer_config.yaml
│   ├── analytics_api.py               # Flask REST API for analytics
│   ├── Dockerfile.producer
│   ├── Dockerfile.consumer
│   ├── Dockerfile.api
│   └── docker-compose.analytics.yml
├── tweetpulse-ml-model/               # Jupyter notebooks, datasets, trained models
│   ├── Big_Data.ipynb
│   ├── twitter_training.csv
│   ├── twitter_validation.csv
│   └── logistic_regression_model.pkl/
├── imgs/                              # Architecture and dashboard images
│   ├── Flow_DIagram.png
│   ├── Dashboard_1.png, Dashboard_2.png, Dashboard_3.png, Dashboard_4.png
│   ├── Login_Page.png, Register_Page.png
│   ├── MongoDB_Connection.png, Docker_Container.png
│   └── Confusion_matrix.png, Text_Classifer.png
├── requirements.txt                   # Python dependencies
├── zk-single-kafka-single.yml         # Kafka/Zookeeper Docker Compose
└── README.md                          # Project documentation
```
Recommended: use Docker Compose for a reproducible, production-like environment. All dependencies and services are containerized. You will need:
- Docker Desktop (Windows/Mac/Linux)
- Git
```bash
git clone <your-repo-url>
cd TweetPulse-Pro
docker compose -f tweetpulse-pipeline/docker-compose.analytics.yml up --build
```

This will launch:
- Zookeeper & Kafka (real-time ingestion)
- MongoDB (storage)
- Producer (tweets to Kafka)
- Consumer (Spark streaming, sentiment analysis)
- REST API (analytics endpoints)
- Django Dashboard (visualization)
Once everything is running, you can reach:
- Dashboard: http://localhost:8000
- REST API: http://localhost:5000 (smoke test below)
- MongoDB Compass: connect to `mongodb://localhost:27017`
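A quick request from the host confirms the API is serving. The route below is hypothetical; substitute an actual endpoint from `analytics_api.py`:

```python
import requests

# Hypothetical route; check analytics_api.py for the real endpoint names.
resp = requests.get("http://localhost:5000/api/sentiment/summary", timeout=5)
resp.raise_for_status()
print(resp.json())
```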
To run everything manually instead (useful for development and debugging):
- Install Python 3.10+ and create a virtual environment:

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
```
- Start Kafka and Zookeeper with Docker Compose:

```bash
docker compose -f zk-single-kafka-single.yml up -d
```
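To confirm the broker is reachable from the host before wiring up the pipeline, a one-liner with the `kafka-python` package (install it via `pip install kafka-python` if it is not already in your environment) will do:

```python
from kafka import KafkaConsumer

# .topics() returns the set of topic names the broker currently knows about;
# an exception here usually means the broker is not reachable on localhost:9092.
print(KafkaConsumer(bootstrap_servers="localhost:9092").topics())
```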
- Start MongoDB (Docker or local install). Use MongoDB Compass for a GUI.
- Edit `tweetpulse-pipeline/producer_config.yaml` and `tweetpulse-pipeline/consumer_config.yaml` as needed.
- Run the producer:

```bash
python tweetpulse-pipeline/kafka_producer.py --config tweetpulse-pipeline/producer_config.yaml
```
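The `--config` flag follows the pattern described in the features list: YAML values act as defaults, and CLI flags override them. A stripped-down sketch of that pattern (the `topic` key and `--topic` flag are illustrative, not necessarily the script's real options):

```python
import argparse
import logging

import yaml

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("kafka_producer")

parser = argparse.ArgumentParser(description="YAML-configured producer with CLI overrides")
parser.add_argument("--config", required=True, help="path to the YAML config file")
parser.add_argument("--topic", help="override the topic defined in the YAML file")
args = parser.parse_args()

with open(args.config) as f:
    cfg = yaml.safe_load(f)

# CLI flags take precedence over YAML values, which act as defaults.
topic = args.topic or cfg.get("topic", "tweets")
log.info("Using topic %s from %s", topic, args.config)
```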
- Run the consumer (requires Java 17 for Spark):

```powershell
$env:JAVA_HOME = "C:\Program Files\Java\jdk-17"   # adjust if needed
$env:PATH = "$env:JAVA_HOME\bin;$env:PATH"
python tweetpulse-pipeline/kafka_spark_consumer.py --config tweetpulse-pipeline/consumer_config.yaml
```
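For reference, reading a Kafka topic with Spark Structured Streaming generally follows the shape below; the topic name and connection details are assumptions, and the real `kafka_spark_consumer.py` adds the sentiment model and a MongoDB sink on top of this skeleton.

```python
from pyspark.sql import SparkSession

# The Kafka source needs the spark-sql-kafka package on the classpath,
# e.g. --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0
spark = SparkSession.builder.appName("tweetpulse-consumer").getOrCreate()

tweets = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "tweets")  # topic name assumed
    .load()
    .selectExpr("CAST(value AS STRING) AS raw_json")
)

# Print each micro-batch to the console; a real consumer would classify
# the text and write the results to MongoDB instead.
query = tweets.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```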
- Run the Flask API:

```bash
python tweetpulse-pipeline/analytics_api.py
```
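An analytics endpoint of this kind typically boils down to one MongoDB aggregation per route. A minimal hypothetical sketch (the database, collection, and route names are assumptions, not the project's actual ones):

```python
from flask import Flask, jsonify
from pymongo import MongoClient

app = Flask(__name__)
collection = MongoClient("mongodb://localhost:27017")["tweetpulse"]["sentiments"]

@app.route("/api/sentiment/summary")
def sentiment_summary():
    # Count stored documents per sentiment label.
    counts = collection.aggregate(
        [{"$group": {"_id": "$sentiment", "count": {"$sum": 1}}}]
    )
    return jsonify({doc["_id"]: doc["count"] for doc in counts})

if __name__ == "__main__":
    app.run(port=5000)
```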
- Collect static files:

```bash
python tweetpulse-dashboard/manage.py collectstatic --noinput
```

- Run the Django development server:

```bash
python tweetpulse-dashboard/manage.py runserver
```
- Ensure Docker Desktop is running and WSL2 backend is enabled.
- If running services outside Docker, install Java 17 (required by Spark) and set JAVA_HOME.
- If Kafka runs inside Docker and your apps run on the host, connect to `localhost:9092`. If the apps also run inside Docker, they reach the broker at `kafka:9092` over the Compose network (see the snippet below).
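One way to keep a single codebase working in both setups is to read the broker address from an environment variable and fall back to the host address; the variable name here is an assumption:

```python
import os

# In Docker Compose, set KAFKA_BOOTSTRAP_SERVERS=kafka:9092 for each service;
# on the host, the default below matches the broker's advertised listener.
bootstrap_servers = os.environ.get("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092")
```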
For development, debugging, or custom deployments, you can run individual services/scripts manually. See each folder's README or script docstrings for details.
- Containerization: All services are Dockerized for reproducibility and scalability.
- Configuration Management: Use YAML config files and environment variables for all scripts/services.
- Logging & Monitoring: All components use structured logging (example after this list); integrate with ELK/Prometheus for production.
- Modular Codebase: Producer, consumer, and API are fully modular and independently deployable.
- Security: Never commit secrets; use `.env` files and Docker secrets for credentials.
- Testing: Unit/integration tests are recommended for all modules (see `/tests` if present).
- Documentation: Keep this README and all configs up to date; use docstrings and comments in code.
- Naming Consistency: Use the project name "TweetPulse Pro" in all documentation, scripts, and UI for clarity and branding.
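As a concrete example of the logging convention, each script can configure a timestamped, level-tagged format once at startup; the format string is illustrative:

```python
import logging

# One-time setup per script; emits timestamped, level-tagged lines that
# log shippers (e.g. Filebeat feeding ELK) can parse.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(name)s | %(levelname)s | %(message)s",
)
log = logging.getLogger("tweetpulse.producer")
log.info("pipeline component started")
```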
- Author: Manula Fernando (2025)
- Dataset: Kaggle Twitter Entity Sentiment Analysis
- ML Model: Trained with PySpark; see `tweetpulse-ml-model/` for notebooks and details.
- Manula Fernando
For previous contributors and academic context, see project history.
- Open issues or pull requests for improvements, bugfixes, or questions.
- For custom deployments, advanced analytics, or consulting, contact the author via GitHub.
Happy coding! Explore, extend, and build on TweetPulse Pro for your own analytics needs.