
Commit 40f7843

Merge pull request #2442 from andoriyaprashant/branch12
Fraud Detection Script Added
2 parents 1a946b6 + e2cf0ec commit 40f7843


3 files changed (+89 -0 lines changed)


Fraud Detection/README.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
# Fraud Detection Script

This Python script uses machine learning to detect fraudulent transactions or activities from labeled historical data. It trains a Random Forest classifier, handles class imbalance with SMOTE, and standardizes the features before training.

## Requirements

To run this script, you need the following dependencies:

- Python 3
- pandas
- scikit-learn
- imbalanced-learn
- numpy

You can install them with pip:

```bash
pip install pandas scikit-learn imbalanced-learn numpy
```

The same packages are listed in `requirements.txt`, so `pip install -r requirements.txt` works too.

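If you want to check the installation before you have real data, the sketch below writes a small synthetic `data.csv` in the layout the script expects: numeric feature columns plus a 0/1 `Class` column. It is not part of the original script; the feature names `V1` to `V10`, the 5,000 rows and the roughly 2% fraud rate are arbitrary assumptions. Note that it overwrites any existing `data.csv` in the current directory.

```python
import pandas as pd
from sklearn.datasets import make_classification

# Hypothetical synthetic dataset: 5,000 rows, 10 numeric features.
# Sizes, feature names and class weights are illustrative assumptions only.
X, y = make_classification(
    n_samples=5000,
    n_features=10,
    n_informative=5,
    weights=[0.98, 0.02],  # ~98% normal (0), ~2% fraud (1); ~1% of labels are also randomized by default
    random_state=42,
)

data = pd.DataFrame(X, columns=[f"V{i}" for i in range(1, 11)])
data["Class"] = y  # target column the script expects: 0 = normal, 1 = fraudulent
data.to_csv("data.csv", index=False)

print(data["Class"].value_counts())
```

Passing `index=False` matters here: without it, pandas writes the row index as an extra unnamed column, which `pd.read_csv` would read back as one more feature.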
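## Usage

1. Prepare your dataset: make sure your historical data is a CSV file with a `Class` column holding the target variable (0 for normal, 1 for fraudulent). Every other column is used as a feature, so the remaining columns must be numeric and free of missing values.

2. Place your dataset: name the file `data.csv`, put it in the same directory as `fraud_detection.py`, and run the script from that directory (the script opens `data.csv` relative to the current working directory).

3. Run the script with the following command:

   ```bash
   python fraud_detection.py
   ```

4. Results: the script prints the accuracy, confusion matrix, and classification report for the model on a held-out 20% test split. Because fraud cases are rare, accuracy alone is misleading (a model that never flags fraud can still score very high); judge the model mainly by the precision and recall of class 1 in the classification report.
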
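To read the confusion matrix and the class-1 row of the classification report, it helps to see how precision and recall follow from the four counts. For 0/1 labels, scikit-learn's `confusion_matrix` puts true classes on the rows and predictions on the columns, so `ravel()` yields TN, FP, FN, TP. The label arrays below are made up for illustration.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only: 1 = fraudulent, 0 = normal
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

# For labels {0, 1}, rows are true classes and columns are predictions:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # share of flagged transactions that really are fraud
recall = tp / (tp + fn)     # share of actual fraud cases that were caught

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=6 FP=1 FN=1 TP=2
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# accuracy=0.80 precision=0.67 recall=0.67
```

A classifier that predicted 0 for every row here would still reach 0.70 accuracy while catching no fraud at all, which is why the class-1 precision and recall are the more useful numbers.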
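## Data Preprocessing

The script performs the following preprocessing steps on the dataset:

- Standard Scaling: all features are standardized to a mean of 0 and a standard deviation of 1. The Random Forest itself is insensitive to feature scale; the scaling matters mainly for SMOTE, which chooses neighbours by Euclidean distance.

- Class Imbalance Handling: SMOTE (Synthetic Minority Over-sampling Technique) creates synthetic minority-class (fraud) samples by interpolating between existing minority samples and their nearest minority-class neighbours. It is applied to the training split only, so the model is evaluated on real transactions.
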
## Model

The script uses a Random Forest classifier with 100 trees (`n_estimators=100`), trained on the SMOTE-resampled training data.

Feel free to adapt the script to your dataset and to experiment with other machine learning models for comparison.


Fraud Detection/fraud_detection.py

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import StandardScaler

# Load your historical data (CSV file or any other format) into a pandas DataFrame
# Replace 'data.csv' with the actual file path containing your historical data
data = pd.read_csv('data.csv')

# Separate the features from the 'Class' target column (0 = normal, 1 = fraudulent)
X = data.drop('Class', axis=1)
y = data['Class']

# Split first (stratified to preserve the fraud rate) so no test data leaks into preprocessing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Standardize the features using statistics from the training set only
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Handle class imbalance using SMOTE (training set only; the test set stays real)
smote = SMOTE(sampling_strategy='auto', random_state=42)
X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)

# Create and train the Random Forest model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_resampled, y_train_resampled)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model (on imbalanced data, check class-1 precision/recall, not just accuracy)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

Fraud Detection/requirements.txt

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
pandas
scikit-learn
imbalanced-learn
numpy
