Merge pull request #2602 from srujana-16/sentiment_analysis

avinashkranjan · web-flow · commit 36e9f9118f93 · 2023-08-10T08:07:49.000+05:30
Sentiment analysis script added - GSSOC'23
diff --git a/SCRIPTS.md b/SCRIPTS.md
@@ -119,11 +119,11 @@
 | 109\.     | Domain Name Availability Checker | This script is a Python tool that allows you to check the availability of domain names using the GoDaddy API. | [Take Me](./Domain_Name_Availability/) | [Sabhi Sharma](https//github.com/sabhisharma-ise)
 | 110\.     | Automatic Spelling Checker and Corrector | This Script is used to detect spelling errors in a text and correct them if the user wishes to do so. | [Take Me](./Automatic_Spelling_Checker_Corrector/) | [Sabhi Sharma](https//github.com/sabhisharma-ise)
 | 111\.     | File Searcher | The File Search script is a Python tool that allows you to search for files with a specific extension in a directory. It recursively searches through all subdirectories of the specified directory and returns a list of files that match the provided file extension. | [Take Me](https://github.com/avinashkranjan/Amazing-Python-Scripts/tree/master/File\Search)  | [Srujana Vanka](https://github.com/srujana-16) 
+| 112\.     | Sentiment Analysis | This Python script performs sentiment analysis on text data using a Support Vector Machine (SVM) classifier. It reads data from a CSV file, preprocesses the text, and trains an SVM model to classify each text by its sentiment label (for example positive, negative, or neutral). | [Take Me](https://github.com/avinashkranjan/Amazing-Python-Scripts/tree/master/Sentiment%20Analysis)  | [Srujana Vanka](https://github.com/srujana-16) 
 | 112\.     | Data Scraping | A Python script that retrieves data from various platforms such as social media, weather services, and financial data providers using APIs. | [Take Me](https://github.com/avinashkranjan/Amazing-Python-Scripts/tree/master/DataScraping)  | [Shraddha Singh](https://github.com/shraddha761) 
 | 112\.     | Automated Data Reporting | A Python script that automates the process of generating data reports from CSV files. | [Take Me](https://github.com/avinashkranjan/Amazing-Python-Scripts/tree/master/AutomatedDataReporting)  | [Shraddha Singh](https://github.com/shraddha761) 
 | 112\.     | Ludo Game | This python script will create a ludo game for the user to interact with it and play the game and enjoy the game. | [Take Me](./Ludo_Game) | [Avdhesh Varshney](https://github.com/Avdhesh-Varshney) 
 | 112\.     | Web Server Log Analysis Script | A Python script to parse and analyze web server logs to extract useful information such as visitor statistics, popular pages, and potential security threats. | [Take Me](https://github.com/avinashkranjan/Amazing-Python-Scripts/tree/master/WebServer)  | [Shraddha Singh](https://github.com/shraddha761) 
 | 112\.     | Open Websites By Speaking | This python script can allow user to open any website just by speaking. | [Take Me](./Websites_Automation) | [Avdhesh Varshney](https://github.com/Avdhesh-Varshney) 
 | 112\.     | Advisor App | This python script allows user to give so many advices and motivation quote lines. | [Take Me](./Advisor_App) | [Avdhesh Varshney](https://github.com/Avdhesh-Varshney) 
 | 113\.      | Fake News Detection | The Fake News Detection is Python based ML script which allows you to check if your news article is Real or Fake. | [Take me](https://github.com/Parul1606/Amazing-Python-Scripts/tree/testing/Fake-News-Detection) | [Parul Pandey](https://github.com/Parul1606)
-
diff --git a/Sentiment Analysis/README.md b/Sentiment Analysis/README.md
@@ -0,0 +1,30 @@
+# Sentiment Analysis using Support Vector Machine (SVM)
+
+This Python script performs sentiment analysis on text data using a Support Vector Machine (SVM) classifier. It reads data from a CSV file, preprocesses the text, and trains an SVM model to classify each text by its sentiment label (for example positive, negative, or neutral).
+
+## Requirements
+
+- Python 3.x
+- scikit-learn
+- numpy
+- pandas
+
+Install the required libraries using the following command:
+`pip install scikit-learn numpy pandas`
+
+## Usage
+
+1. Prepare your data: Create a CSV file (`data.csv`) with two columns: `text`, containing the text data (sentences, reviews, etc.), and `label`, containing the corresponding sentiment labels (e.g., positive, negative, or neutral). The script reads `data.csv` from the current working directory.
+
+2. Run the script: From the directory that contains `data.csv`, run `python Sentiment_Analysis.py`.
+
+
+## Output
+
+The script prints the accuracy and the classification report of the SVM model on a held-out test set (20% of the rows).
+
+## Author(s)
+
+Srujana
+
+
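The README's data-preparation step asks for a CSV with `text` and `label` columns. A minimal sketch of a pre-flight check for that layout follows. It uses only the standard library, and `check_columns` and `REQUIRED_COLUMNS` are hypothetical names that are not part of this commit.

```python
import csv
import io

# Column names taken from the README; the set name itself is an assumption.
REQUIRED_COLUMNS = {"text", "label"}


def check_columns(csv_file):
    """Return all rows of an open CSV file, or raise if a column is missing.

    Hypothetical helper for illustration; the committed script does not
    perform this check.
    """
    reader = csv.DictReader(csv_file)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")
    return list(reader)


# In-memory stand-in for data.csv so the sketch runs without a file on disk.
sample = io.StringIO(
    "text,label\n"
    '"I love this product!",positive\n'
    '"Not bad, but not great either.",neutral\n'
)
rows = check_columns(sample)
print(len(rows), rows[0]["label"])
```

With a real file, the same helper could be called as `check_columns(open("data.csv", newline=""))` before running `Sentiment_Analysis.py`.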
diff --git a/Sentiment Analysis/Sentiment_Analysis.py b/Sentiment Analysis/Sentiment_Analysis.py
@@ -0,0 +1,63 @@
+import pandas as pd
+import numpy as np
+from sklearn.feature_extraction.text import TfidfVectorizer
+from sklearn.model_selection import train_test_split
+from sklearn.svm import SVC
+from sklearn.metrics import accuracy_score, classification_report
+
+
+def sentiment_analysis():
+    """
+    Perform sentiment analysis using an SVM classifier.
+
+    The function reads the data from a CSV file, preprocesses it, and trains an SVM classifier
+    for sentiment analysis on the 'text' column with the 'label' column as the target.
+
+    Prints the accuracy and classification report on the test data.
+    """
+    # Load data from a CSV file (replace 'data.csv' with your data file)
+    data = pd.read_csv('data.csv')
+
+    # Preprocess text (replace newlines, tabs, and hyphens with spaces; lowercase)
+    data['text'] = data['text'].apply(preprocess_text)
+
+    # Split the data into features (X) and labels (y)
+    X = data['text']
+    y = data['label']
+
+    # Split the raw text first so the test rows stay unseen by the vectorizer
+    X_train, X_test, y_train, y_test = train_test_split(
+        X, y, test_size=0.2, random_state=42)
+    # Fit TF-IDF on the training text only, then apply it to the test text
+    vectorizer = TfidfVectorizer()
+    X_train = vectorizer.fit_transform(X_train)
+    X_test = vectorizer.transform(X_test)
+
+    # Train an SVM classifier
+    svm_classifier = SVC(kernel='linear')
+    svm_classifier.fit(X_train, y_train)
+
+    # Make predictions on the test set
+    y_pred = svm_classifier.predict(X_test)
+
+    # Calculate and print accuracy and classification report
+    accuracy = accuracy_score(y_test, y_pred)
+    print("Accuracy:", accuracy)
+    print("Classification Report:")
+    print(classification_report(y_test, y_pred, zero_division=1))
+
+
+def preprocess_text(text):
+    # Replace newlines, tabs, and hyphens with spaces
+    text = text.replace('\n', ' ')
+    text = text.replace('\t', ' ')
+    text = text.replace('-', ' ')
+
+    # Convert to lowercase
+    text = text.lower()
+
+    return text
+
+
+if __name__ == '__main__':
+    sentiment_analysis()
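The script above vectorizes with TF-IDF and then trains a linear SVM. One way to bundle those two steps so the fitted model can also label new text is scikit-learn's `make_pipeline`, sketched below. This is an illustrative variant, not the code in this commit: the training rows are copied from the sample `data.csv`, `new_texts` is made up for the example, and `preprocess_text` is re-implemented here to keep the block self-contained.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def preprocess_text(text):
    # Same normalization as the script: separators to spaces, then lowercase.
    for char in ("\n", "\t", "-"):
        text = text.replace(char, " ")
    return text.lower()


# A few rows copied from the data.csv added in this commit.
texts = [
    "I love this product!",
    "This is amazing!",
    "Terrible experience. Would not recommend.",
    "I regret buying this.",
    "Not bad, but not great either.",
    "Not the best, but okay.",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# The pipeline fits TF-IDF and the SVM together, so the vectorizer only
# ever sees the text passed to fit().
model = make_pipeline(
    TfidfVectorizer(preprocessor=preprocess_text),
    SVC(kernel="linear"),
)
model.fit(texts, labels)

# Hypothetical unseen inputs; the predicted labels depend on the training data.
new_texts = ["What a great product", "I would not buy this again"]
predictions = model.predict(new_texts)
print(list(zip(new_texts, predictions)))
```

Passing `preprocess_text` as the vectorizer's `preprocessor` replaces its built-in lowercasing step, which is why the function lowercases the text itself.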
diff --git a/Sentiment Analysis/data.csv b/Sentiment Analysis/data.csv
@@ -0,0 +1,11 @@
+text,label
+"I love this product!",positive
+"This is amazing!",positive
+"Terrible experience. Would not recommend.",negative
+"Not bad, but not great either.",neutral
+"The best purchase I've made!",positive
+"I regret buying this.",negative
+"This is fantastic!",positive
+"Not the best, but okay.",neutral
+"Great value for money.",positive
+"Awful quality. Do not buy.",negative
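The sample file has ten rows and three labels, so the script's 20% test split is very small. The sketch below tallies the labels and works out the test-set size. The label list is copied from the `data.csv` above, and rounding up with `math.ceil` reflects how scikit-learn is understood to size a fractional `test_size`, which is an assumption here.

```python
import math
from collections import Counter

# The label column of the data.csv added in this commit, in file order.
labels = [
    "positive", "positive", "negative", "neutral", "positive",
    "negative", "positive", "neutral", "positive", "negative",
]

counts = Counter(labels)
# test_size=0.2 as in the script; rounding up is an assumption about
# how train_test_split sizes the test set.
test_rows = math.ceil(0.2 * len(labels))

print(counts)     # Counter({'positive': 5, 'negative': 3, 'neutral': 2})
print(test_rows)  # 2
```

With only two test rows, the printed accuracy and classification report are best read as a smoke test of the script rather than a measure of model quality.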