Merge pull request #2442 from andoriyaprashant/branch12

avinashkranjan · web-flow · commit 40f7843a6433 · 2023-08-09T16:19:11.000+05:30
Fraud Detection Script Added
diff --git a/Fraud Detection/README.md b/Fraud Detection/README.md
@@ -0,0 +1,48 @@
+# Fraud Detection Script
+
+This Python script trains a Random Forest classifier on labeled historical data to detect fraudulent transactions. It standardizes the features and handles class imbalance with SMOTE (Synthetic Minority Over-sampling Technique).
+
+## Requirements
+
+To run this script, you need the following dependencies:
+
+- Python 3
+- pandas
+- scikit-learn
+- imbalanced-learn
+- numpy
+
+You can install the required dependencies using pip:
+
+```bash
+pip install pandas scikit-learn imbalanced-learn numpy
+```
+
+## Usage
+
+1. Prepare your dataset: Ensure your historical data is in CSV format with a 'Class' column containing the target variable (0 for normal, 1 for fraudulent). All other columns are used as features and must be numeric.
+
+2. Place your dataset: Name the file `data.csv` and place it in the same directory as the `fraud_detection.py` script. The script reads it by relative path, so run the script from that directory.
+
+3. Run the script: Execute the script using the following command:
+
+```bash
+python fraud_detection.py
+```
+
+4. Results: The script prints the accuracy, confusion matrix, and classification report computed on the held-out test set (20% of the data).
+
+## Data Preprocessing
+
+The script performs the following preprocessing steps on the dataset:
+
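+- Standard Scaling: Features are standardized to a mean of 0 and a standard deviation of 1 with `StandardScaler`. Random Forests are largely insensitive to feature scale, so this mainly benefits SMOTE, which relies on nearest-neighbor distances.
+
+- Class Imbalance Handling: SMOTE (Synthetic Minority Over-sampling Technique) generates synthetic samples for the minority class. It is applied to the training split only, so the test set keeps its original class distribution.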
+
+## Model
+
+The script uses a Random Forest classifier with 100 trees (`n_estimators=100`). The model is trained on the SMOTE-resampled training data and evaluated on the original, non-resampled test set.
+
+Feel free to modify the script according to your dataset and experiment with different machine learning models for comparison.
+
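The README describes SMOTE as generating synthetic samples for the minority class. As a rough illustration of that idea, here is a minimal standard-library sketch that interpolates between a minority point and one of its nearest minority neighbors. It is a simplification, not the imbalanced-learn implementation the script calls; the function name `smote_like_oversample`, its parameters, and the sample points are all hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=42):
    """Create n_new synthetic points by interpolating between minority-class neighbors.

    Simplified illustration of the SMOTE idea; not the imbalanced-learn implementation.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (excluding the point itself)
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # Pick a random point on the segment between base and neighbor
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical, already-scaled minority-class (fraud) samples with two features
fraud_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(fraud_points, n_new=6)
print(len(new_points))
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region the minority class already occupies. The neighbor search is distance-based, which is why feature scaling affects the result.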
diff --git a/Fraud Detection/fraud_detection.py b/Fraud Detection/fraud_detection.py
@@ -0,0 +1,37 @@
+import pandas as pd
+from sklearn.ensemble import RandomForestClassifier
+from sklearn.model_selection import train_test_split
+from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
+from imblearn.over_sampling import SMOTE
+from sklearn.preprocessing import StandardScaler
+
+# Load the historical data from a CSV file into a pandas DataFrame
+# Replace 'data.csv' with the path to your own data file
+data = pd.read_csv('data.csv')
+
+# Data preprocessing
+X = data.drop('Class', axis=1)
+y = data['Class']
+
+# Split first, then standardize using statistics from the training set only (avoids test-set leakage)
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
+
+scaler = StandardScaler()
+X_train = scaler.fit_transform(X_train)
+X_test = scaler.transform(X_test)
+
+# Handle class imbalance using SMOTE
+smote = SMOTE(sampling_strategy='auto', random_state=42)
+X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)
+
+# Create and train the Random Forest model
+model = RandomForestClassifier(n_estimators=100, random_state=42)
+model.fit(X_train_resampled, y_train_resampled)
+
+# Make predictions on the test set
+y_pred = model.predict(X_test)
+
+# Evaluate the model
+print("Accuracy:", accuracy_score(y_test, y_pred))
+print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
+print("Classification Report:\n", classification_report(y_test, y_pred))
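The script reports accuracy alongside the confusion matrix and classification report. On imbalanced data, accuracy alone can mislead, and the small standard-library sketch below shows why. The labels are made up for illustration, and `confusion_counts` is a hypothetical helper, not part of scikit-learn.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 marks fraud."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Hypothetical test set: 95 normal transactions, 5 fraudulent ones
y_true = [0] * 95 + [1] * 5
# A useless model that labels everything as normal
y_pred = [0] * 100

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn) if (tp + fn) else 0.0
precision = tp / (tp + fp) if (tp + fp) else 0.0

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
# prints: accuracy=0.95 recall=0.00 precision=0.00
```

A model that never flags fraud still scores 95% accuracy here, so the per-class precision and recall in the classification report are the numbers to watch.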
diff --git a/Fraud Detection/requirements.txt b/Fraud Detection/requirements.txt
@@ -0,0 +1,4 @@
+pandas
+scikit-learn
+imbalanced-learn
+numpy
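Two of the packages in `requirements.txt` are imported under a different name than the one passed to pip (`scikit-learn` as `sklearn`, `imbalanced-learn` as `imblearn`). A quick way to confirm the environment is ready before running the script might look like the sketch below. The helper `missing_packages` and the `REQUIRED` mapping are hypothetical and not part of this PR.

```python
from importlib.util import find_spec

# pip package name -> module name used in `import` statements
REQUIRED = {
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "imbalanced-learn": "imblearn",
    "numpy": "numpy",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [pip_name for pip_name, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All dependencies found.")
```

Any names it reports can be installed with `pip install -r requirements.txt`.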
