This project uses the Sentiment140 dataset, a widely used collection of 1.6 million tweets, each labeled as positive or negative. Because the labels are already provided, the dataset is well suited to binary sentiment classification, and it serves here as training data for machine learning models that predict the sentiment of unseen tweets.
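As a rough illustration of how the labeled tweets might be read, the sketch below parses rows in the layout commonly distributed for Sentiment140: six comma-separated, quoted fields with no header row, where a target of `0` means negative and `4` means positive. The column names, the `read_tweets` helper, and the two sample rows are assumptions made up for this example; they are not taken from the project's code.

```python
import csv
import io

# Assumed column order of the Sentiment140 CSV, which ships without a header row.
COLUMNS = ["target", "id", "date", "query", "user", "text"]

# Assumed label encoding: "0" is negative, "4" is positive.
LABELS = {"0": "negative", "4": "positive"}


def read_tweets(handle):
    """Yield (text, label) pairs from an open Sentiment140-style CSV handle."""
    for row in csv.reader(handle):
        if len(row) != len(COLUMNS):
            continue  # skip malformed rows
        record = dict(zip(COLUMNS, row))
        label = LABELS.get(record["target"])
        if label is not None:  # ignore any target value outside the binary set
            yield record["text"], label


# Two invented rows in the assumed layout, used only to exercise the helper.
sample = io.StringIO(
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","so tired today"\n'
    '"4","2","Mon Apr 06 22:20:00 PDT 2009","NO_QUERY","user_b","loving this weather"\n'
)
pairs = list(read_tweets(sample))
print(pairs)
```

To read the real file, the same helper could be given a handle from `open(path, encoding="latin-1", newline="")`, where `path` is wherever the CSV was downloaded; the Latin-1 encoding is likewise an assumption about the distributed file.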
Sentiment analysis is performed with a Logistic Regression model. The model is trained on preprocessed tweets, with text features extracted using techniques such as TF-IDF (Term Frequency-Inverse Document Frequency). Given a new tweet, the trained model predicts whether it expresses positive or negative sentiment.
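The approach described above can be sketched with scikit-learn, assuming that library is what the project uses (the text does not name it). The toy tweets, the `0`/`1` label encoding, and the hyperparameters are placeholders chosen for illustration, not the project's actual data or settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented corpus standing in for preprocessed tweets.
texts = [
    "i love this phone",
    "what a great day",
    "best concert ever",
    "i hate waiting in line",
    "this is awful",
    "worst service ever",
]
labels = [1, 1, 1, 0, 0, 0]  # assumed encoding: 1 = positive, 0 = negative

# TF-IDF turns each tweet into a weighted term vector;
# Logistic Regression then learns a linear decision boundary over those vectors.
model = make_pipeline(
    TfidfVectorizer(lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

new_tweets = ["such a great phone", "awful service"]
predictions = model.predict(new_tweets)          # one class label per tweet
probabilities = model.predict_proba(new_tweets)  # one row of class probabilities per tweet
print(predictions)
print(probabilities)
```

Wrapping both steps in a single pipeline keeps the vocabulary and IDF weights learned during training attached to the classifier, so unseen tweets are vectorized the same way at prediction time.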