Spam mail detection using SVM
Use SVM to classify emails into spam or non-spam categories and report the classification accuracy for various SVM parameters and kernel functions.
spambase.data :
Number of Instances: 4601 (1813 Spam = 39.4%)
Number of Attributes: 58 (57 continuous, 1 nominal class label)
Class Distribution:
Spam 1813 (39.4%)
Non-Spam 2788 (60.6%)
- Importing libraries
- Reading dataset and preprocessing
  - Null checking
  - Class label conversion to (1, -1)
  - Train-test split (70:30)
  - Normalization using the mean and variance of the train split
- Model building
  - Class for SVM with functions for training and testing
    - params: kernel function (linear, poly, RBF), soft margin constant C
    - methods:
      - fit(X, y): solves the SVM dual problem and stores the weights and bias of the separator
      - project(X): projects data points using the learned weights and bias
      - predict(X): applies the sign function to the projection to assign a class label
- Model training and testing
- Comparing with the sklearn library function
- Visualizing decision boundaries by performing PCA on the data
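The fit / project / predict interface described in the model building step could look roughly like the sketch below. It is not the notebook's code: the class name `DualSVM`, the kernel helpers, the default `degree` and `gamma` values, and the use of SciPy's SLSQP solver for the dual problem are all assumptions made for illustration. The sketch also keeps the support vectors and their multipliers instead of an explicit weight vector, since an explicit weight vector only exists for the linear kernel.

```python
import numpy as np
from scipy.optimize import minimize


def linear_kernel(a, b):
    return a @ b.T


def poly_kernel(a, b, degree=3):  # degree is an assumed default
    return (1.0 + a @ b.T) ** degree


def rbf_kernel(a, b, gamma=0.5):  # gamma is an assumed default
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * sq)


class DualSVM:
    """Soft-margin SVM trained on the dual problem (illustrative sketch)."""

    def __init__(self, kernel=linear_kernel, C=1.0):
        self.kernel = kernel
        self.C = C

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)  # labels are expected in {-1, +1}
        n = X.shape[0]
        K = self.kernel(X, X)
        Q = (y[:, None] * y[None, :]) * K

        # Minimize 0.5 * a^T Q a - sum(a)  subject to  0 <= a_i <= C  and  y^T a = 0.
        res = minimize(
            fun=lambda a: 0.5 * a @ Q @ a - a.sum(),
            x0=np.zeros(n),
            jac=lambda a: Q @ a - 1.0,
            bounds=[(0.0, self.C)] * n,
            constraints={"type": "eq", "fun": lambda a: a @ y, "jac": lambda a: y},
            method="SLSQP",
        )
        alpha = res.x
        sv = alpha > 1e-6  # support vectors have non-zero multipliers
        self.alpha, self.sv_X, self.sv_y = alpha[sv], X[sv], y[sv]

        # Bias: average over margin support vectors (0 < alpha < C);
        # fall back to all support vectors if none lie strictly inside the box.
        margin = sv & (alpha < self.C - 1e-6)
        idx = margin if margin.any() else sv
        self.b = np.mean(y[idx] - (alpha[sv] * y[sv]) @ K[np.ix_(sv, idx)])
        return self

    def project(self, X):
        # Signed score: sum_i alpha_i * y_i * k(x_i, x) + b
        return (self.alpha * self.sv_y) @ self.kernel(self.sv_X, X) + self.b

    def predict(self, X):
        return np.where(self.project(X) >= 0, 1, -1)
```

A general-purpose solver such as SLSQP keeps the sketch short but scales poorly with the number of training points, so it is only practical on small subsets; a dedicated QP or SMO solver would be the usual choice for the full dataset.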
1. Install Jupyter Notebook or use Google Colab.
2. Open the file ML_Assignment_2.ipynb in Jupyter Notebook or Google Colab.
3. Run all the cells.
Required libraries: pandas, numpy, matplotlib, sklearn, seaborn
Our SVM results

| Kernel | C = 1    | C = 10   | C = 100  |
|--------|----------|----------|----------|
| linear | 0.923968 | 0.923244 | 0.919623 |
| poly   | 0.902969 | 0.897900 | 0.898624 |
| rbf    | 0.837075 | 0.849385 | 0.847212 |
Scikit-learn results

| Kernel | C = 1    | C = 10   | C = 100  |
|--------|----------|----------|----------|
| linear | 0.923968 | 0.923244 | 0.918899 |
| poly   | 0.843592 | 0.924692 | 0.915279 |
| rbf    | 0.934830 | 0.934106 | 0.920348 |
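A kernel-by-C sweep like the scikit-learn comparison could be set up along the lines below. This is a sketch under assumptions, not the notebook's code: the helper names `prepare` and `sweep`, the random seed, the stratified split, and the use of scikit-learn's default `degree` and `gamma` settings are all illustrative choices, so it is not expected to reproduce the reported numbers exactly. It assumes the class column holds 0/1 labels with 1 meaning spam.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def prepare(X, y01, seed=0):
    """Map labels {0, 1} to {-1, +1}, split 70:30, scale with train statistics."""
    y = np.where(y01 == 1, 1, -1)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    # Mean and standard deviation come from the train split only,
    # so no information from the test split leaks into the scaling.
    mean, std = X_tr.mean(axis=0), X_tr.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled
    return (X_tr - mean) / std, (X_te - mean) / std, y_tr, y_te


def sweep(X_tr, X_te, y_tr, y_te,
          kernels=("linear", "poly", "rbf"), Cs=(1, 10, 100)):
    """Return {(kernel, C): test accuracy} for every combination."""
    return {
        (k, C): SVC(kernel=k, C=C).fit(X_tr, y_tr).score(X_te, y_te)
        for k in kernels
        for C in Cs
    }
```

With the dataset loaded into a feature matrix `X` and a 0/1 label vector `y01` (for example via pandas), `sweep(*prepare(X, y01))` returns one accuracy per table cell.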