This repository contains code and data for a data science project that uses natural language processing techniques to analyze tweets related to disasters and detect which ones describe real disaster events.
The dataset used for this project is the Natural Language Processing with Disaster Tweets competition dataset from Kaggle. It consists of a collection of tweets related to disasters, where each tweet is pre-labelled to indicate whether it refers to a real disaster.
The dataset includes 4,342 tweets labeled as not disaster-related and 3,271 labeled as disaster-related. The tweets cover a range of disaster events, including natural disasters like hurricanes and earthquakes, as well as man-made disasters like terrorist attacks and oil spills.
The main goals of this project are:
- To perform exploratory data analysis on the dataset to gain insights into the characteristics of disaster-related tweets.
- To perform topic modeling on the dataset to identify common themes and topics in disaster-related tweets.
- To build a classification model to detect whether a given tweet is related to a real disaster or not.
In the exploratory data analysis phase, we'll examine the dataset to gain insights into the characteristics of disaster-related tweets. This will include analyzing the distribution of disaster-related tweets, examining the most common words and phrases used in disaster-related tweets, looking for patterns in the data that could be useful for building a classification model, and possibly scraping real-time data for retraining purposes (closed-loop feedback).
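As a rough illustration of this step, the sketch below loads the training data, checks the class balance, and counts the most frequent words in disaster-related tweets. It assumes the Kaggle train.csv (with its "text" and "target" columns) lives in a data/ folder; adjust the path and tokenization to match the actual notebook.

```python
# Minimal EDA sketch (assumed file location: data/train.csv).
from collections import Counter
import re

import pandas as pd

df = pd.read_csv("data/train.csv")

# Class balance: target == 1 marks tweets about real disasters.
print(df["target"].value_counts())

# Most common words in disaster-related tweets (naive tokenization;
# real preprocessing would also strip URLs, mentions, and stop words).
disaster_text = " ".join(df.loc[df["target"] == 1, "text"].str.lower())
tokens = re.findall(r"[a-z']+", disaster_text)
print(Counter(tokens).most_common(20))
```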
In the topic modeling phase, we'll use unsupervised machine learning techniques to identify common themes and topics in disaster-related tweets. This will involve using algorithms like Latent Dirichlet Allocation (LDA) to identify groups of words that tend to appear together in disaster-related tweets.
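A minimal LDA sketch with scikit-learn is shown below; it is illustrative rather than the exact configuration used in the notebook, and it reuses the `df` DataFrame loaded in the EDA sketch above.

```python
# Illustrative topic modeling sketch using Latent Dirichlet Allocation.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

disaster_tweets = df.loc[df["target"] == 1, "text"]

# LDA works on raw term counts, so we build a bag-of-words matrix first.
vectorizer = CountVectorizer(stop_words="english", max_features=5000)
doc_term = vectorizer.fit_transform(disaster_tweets)

lda = LatentDirichletAllocation(n_components=5, random_state=42)
lda.fit(doc_term)

# Print the top words for each discovered topic.
words = vectorizer.get_feature_names_out()
for topic_idx, topic in enumerate(lda.components_):
    top = [words[i] for i in topic.argsort()[-10:][::-1]]
    print(f"Topic {topic_idx}: {', '.join(top)}")
```

The number of topics (here 5) is an assumption for illustration; in practice it would be tuned, for example by inspecting topic coherence.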
In the classification model phase, we'll use supervised machine learning techniques to build a model that can detect whether a given tweet is related to a real disaster or not. We'll use a variety of feature extraction techniques, including bag-of-words and word embeddings, and evaluate the performance of different algorithms, such as Naive Bayes, Logistic Regression, and Random Forest.
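The sketch below shows one plausible baseline for this phase: a TF-IDF (bag-of-words) pipeline evaluated with cross-validation across the three algorithms mentioned above. It again assumes the `df` DataFrame from the EDA sketch and is not necessarily the final model used in the notebook.

```python
# Baseline classification sketch: TF-IDF features + three candidate models.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

X, y = df["text"], df["target"]

models = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# Compare models with 5-fold cross-validated F1 score.
for name, model in models.items():
    pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), model)
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```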
To use the code in this repository, you'll need to have the following software installed:
- Python 3
- Jupyter Notebook
- The required Python packages (listed in requirements.txt)
To get started, clone this repository to your local machine and run the Jupyter Notebook (.ipynb) file. It contains all the code necessary to perform the project tasks outlined above.
The results of this project will be presented in a blog post, which will be linked to in this README once it is published. We expect that the project will provide valuable insights into the characteristics of disaster-related tweets and will produce a classification model that can accurately detect real disaster tweets. This could be of great help to businesses in the disaster relief space, allowing them to provide physical and psychological help more efficiently and quickly.
Overall, the goal of this project is to use natural language processing techniques to analyze tweets related to disasters and detect which ones describe real disasters. We hope that this project will provide useful insights and models that can be applied in real-world disaster relief efforts.