This project performs sentiment classification on Twitter data using TF-IDF vectorization and machine learning classifiers. The notebook includes data cleaning, vectorization, training, and performance evaluation using classic models like Logistic Regression and Naive Bayes.
TFIDF-Sentiment-Analysis/
├── TF-IDFSentimentAnalysis.ipynb # Main notebook
├── NLP TF-IDF Sentiment Analysis.pdf # Project summary
├── sample_sentiment_dataset.csv # 5K sample of original dataset
├── requirements.txt # Python dependencies
├── README.md # Project documentation
└── .gitignore # Git exclusion rules
The original dataset is Sentiment140 from Kaggle, which contains 1.6M tweets labeled with sentiment polarity (0 = Negative, 2 = Neutral, 4 = Positive).
⚠️ Due to GitHub file size limits, only a 5,000-row sample is included, as sample_sentiment_dataset.csv, for testing and demo purposes.
Each record includes:
- Polarity (0/2/4)
- Tweet ID
- Date
- Query
- Username
- Cleaned tweet text
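As a rough sketch of how these fields might be read, the snippet below parses two made-up rows with Python's csv module. The column names, the lack of a header row, and the sample rows themselves are assumptions for illustration only; check them against the actual sample_sentiment_dataset.csv before relying on them.

```python
import csv
import io

# Assumed column order, following the field list above; the names are illustrative.
COLUMNS = ["polarity", "tweet_id", "date", "query", "username", "text"]

# Polarity codes as described for Sentiment140.
POLARITY_LABELS = {0: "negative", 2: "neutral", 4: "positive"}

# Two made-up rows standing in for the real file (assumes no header row).
SAMPLE_CSV = (
    '0,1001,Mon Apr 06 22:19:45 PDT 2009,NO_QUERY,user_a,"my phone died again"\n'
    '4,1002,Mon Apr 06 22:20:10 PDT 2009,NO_QUERY,user_b,"loving the sunshine today"\n'
)


def load_records(handle):
    """Read rows into dicts and attach a readable sentiment label."""
    records = []
    for row in csv.DictReader(handle, fieldnames=COLUMNS):
        row["polarity"] = int(row["polarity"])
        row["label"] = POLARITY_LABELS[row["polarity"]]
        records.append(row)
    return records


records = load_records(io.StringIO(SAMPLE_CSV))
for record in records:
    print(record["label"], "|", record["text"])
```

To read the real file, an open file handle can be passed to load_records in place of the io.StringIO object; the encoding and header handling may need adjusting to match the file.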
- Text cleaning (lowercasing, punctuation and stopword removal)
- TF-IDF vectorization with TfidfVectorizer
- Machine learning classifiers:
  - Logistic Regression
  - Naive Bayes
  - SVM (optional)
- Evaluation:
  - Accuracy, precision, recall, F1-score
  - Confusion matrix
  - Visual plots
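The steps listed above can be sketched end to end with scikit-learn. This is a minimal illustration, not the notebook's actual code: the clean_text helper, the use of scikit-learn's built-in English stopword list, the default hyperparameters, and the six-sentence toy corpus are all assumptions made for the example.

```python
import re

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.pipeline import make_pipeline


def clean_text(text):
    """Lowercase, keep only letters and whitespace, and drop English stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(w for w in text.split() if w not in ENGLISH_STOP_WORDS)


# Tiny made-up corpus; labels use the 0 = negative, 4 = positive encoding.
texts = [
    "I love this phone, it is great!",
    "What a great and happy day",
    "Love the new update, great work",
    "I hate this phone, it is awful",
    "What an awful and sad day",
    "Hate the new update, awful work",
]
labels = [4, 4, 4, 0, 0, 0]

cleaned = [clean_text(t) for t in texts]

# TF-IDF features feeding a Logistic Regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(cleaned, labels)

# Evaluated on the training data only to show the API calls;
# a real run would hold out a test split.
predictions = model.predict(cleaned)
print(classification_report(labels, predictions))
print(confusion_matrix(labels, predictions, labels=[0, 4]))
```

Swapping LogisticRegression() for sklearn.naive_bayes.MultinomialNB() in the pipeline gives the Naive Bayes variant without changing the rest of the sketch.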
git clone https://github.com/your-username/TFIDF-Sentiment-Analysis.git
cd TFIDF-Sentiment-Analysis
pip install -r requirements.txt
jupyter notebook TF-IDFSentimentAnalysis.ipynb
MIT License © 2025 Sanjana Shah
Sanjana Shah
✨ Machine Learning & Generative AI Enthusiast
📫 Connect on LinkedIn
GitHub: @shahsanjanav
⭐ If you like this project, consider starring it on GitHub!