Folders and files
.ipynb_checkpoints
ML_Assignment_2.ipynb
ProblemStatement.pdf
README.md
Spam Filtering using SVM.ipynb
alpha_constraints_SVM.png
spambase.DOCUMENTATION
spambase.data
spambase.names
svm-notes-long-08.pdf
test.py
test.txt

Spam mail detection using SVM

Use a support vector machine (SVM) to classify emails as spam or non-spam, and report the classification accuracy for several values of the soft margin constant and several kernel functions (linear, poly, RBF).

Dataset

spambase.data:
- Number of instances: 4601
- Number of attributes: 58 (57 continuous features, 1 nominal class label)
- Class distribution: 1813 spam (39.4%), 2788 non-spam (60.6%)
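
As a quick sanity check on these figures, the sketch below counts instances, attributes, and classes in a loaded array. It assumes the usual layout of `spambase.data` (comma-separated, no header row, class label in the last column with 1 = spam and 0 = non-spam). `summarize` is a hypothetical helper for illustration, not a function from the notebook.

```python
import numpy as np

def summarize(data):
    """Return (n_instances, n_attributes, n_spam, n_non_spam).

    Assumes a 2-D array whose last column is the class label
    (1 = spam, 0 = non-spam).
    """
    labels = data[:, -1]
    n_spam = int(np.sum(labels == 1))
    n_rows, n_cols = data.shape
    return n_rows, n_cols, n_spam, n_rows - n_spam
```

With the file in the working directory, `summarize(np.loadtxt("spambase.data", delimiter=","))` should reproduce the counts listed above if the layout assumption holds.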

More info

Code walkthrough

- Importing libraries
- Reading dataset and preprocessing
  - Null checking
  - Class label conversion to (1, -1)
  - Train-test split (70:30)
  - Normalization using the mean and variance of the training split
- Model building
  - Class for SVM with functions for training and testing
  - params: kernel function (linear, poly, RBF), soft margin constant
  - methods:
    - fit(X, y): solves the SVM dual problem and stores the weights and bias of the separator
    - project(X): projects data points using the learned weights and bias
    - predict(X): applies the sign function to the projection to obtain the class label
- Model training and testing
- Comparison with the sklearn library implementation
- Visualization of decision boundaries after performing PCA on the data
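
The preprocessing and model-building steps above can be sketched roughly as follows. This is an illustrative sketch, not the notebook's code. The README does not say which solver the notebook uses for the dual problem, so a simplified SMO loop is swapped in here. The polynomial degree, RBF `gamma`, tolerance, and seeds are assumed values. For non-linear kernels there is no explicit weight vector, so the sketch stores the support vectors and their `alpha * y` coefficients instead.

```python
import numpy as np

def preprocess(data, train_frac=0.7, seed=0):
    """Map labels to +1/-1, shuffle, split 70:30, standardize with train statistics."""
    X, y = data[:, :-1], np.where(data[:, -1] == 1, 1.0, -1.0)
    idx = np.random.default_rng(seed).permutation(len(y))
    cut = int(round(train_frac * len(y)))
    tr, te = idx[:cut], idx[cut:]
    mean, std = X[tr].mean(axis=0), X[tr].std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return (X[tr] - mean) / std, y[tr], (X[te] - mean) / std, y[te]

# Kernel hyperparameters (degree, gamma) are assumptions, not taken from the notebook.
KERNELS = {
    "linear": lambda A, B: A @ B.T,
    "poly": lambda A, B, degree=3: (1.0 + A @ B.T) ** degree,
    "rbf": lambda A, B, gamma=0.5: np.exp(
        -gamma * (np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T)),
}

class SVM:
    def __init__(self, kernel="linear", C=1.0, tol=1e-3, max_passes=10, seed=0):
        self.kernel, self.C, self.tol = KERNELS[kernel], C, tol
        self.max_passes = max_passes
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        """Approximately solve the dual with simplified SMO (pairwise alpha updates)."""
        n, C = len(y), self.C
        K = self.kernel(X, X)
        alpha, b, passes, sweeps = np.zeros(n), 0.0, 0, 0
        while passes < self.max_passes and sweeps < 1000:
            changed, sweeps = 0, sweeps + 1
            for i in range(n):
                E_i = (alpha * y) @ K[:, i] + b - y[i]
                violates = (y[i] * E_i < -self.tol and alpha[i] < C) or \
                           (y[i] * E_i > self.tol and alpha[i] > 0)
                if not violates:
                    continue
                j = int(self.rng.integers(n - 1))
                j = j + 1 if j >= i else j  # random partner j != i
                E_j = (alpha * y) @ K[:, j] + b - y[j]
                a_i, a_j = alpha[i], alpha[j]
                if y[i] != y[j]:
                    lo, hi = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
                else:
                    lo, hi = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if lo == hi or eta >= 0:
                    continue
                new_j = float(np.clip(a_j - y[j] * (E_i - E_j) / eta, lo, hi))
                if abs(new_j - a_j) < 1e-5:
                    continue
                alpha[j] = new_j
                alpha[i] = a_i + y[i] * y[j] * (a_j - new_j)  # keeps sum(alpha * y) = 0
                b1 = b - E_i - y[i] * (alpha[i] - a_i) * K[i, i] - y[j] * (new_j - a_j) * K[i, j]
                b2 = b - E_j - y[i] * (alpha[i] - a_i) * K[i, j] - y[j] * (new_j - a_j) * K[j, j]
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
            passes = passes + 1 if changed == 0 else 0
        sv = alpha > 1e-8  # keep only support vectors
        self.alpha_y, self.sv_X, self.b = alpha[sv] * y[sv], X[sv], b
        return self

    def project(self, X):
        """Signed decision value for each row of X."""
        return self.kernel(X, self.sv_X) @ self.alpha_y + self.b

    def predict(self, X):
        """Class label in {+1, -1} from the sign of the projection."""
        return np.where(self.project(X) >= 0, 1.0, -1.0)
```

A typical call would be `X_tr, y_tr, X_te, y_te = preprocess(data)` followed by `SVM(kernel="rbf", C=10).fit(X_tr, y_tr).predict(X_te)`. Simplified SMO recomputes errors from the full kernel matrix on every step, so it is slow on the full dataset and is meant only to show the structure of `fit`, `project`, and `predict`.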

Steps to run the code

Install Jupyter Notebook or use Google Colab.
Open the file ML_Assignment_2.ipynb in Jupyter Notebook or Google Colab.
Run all the cells.

Libraries used:

pandas
numpy
matplotlib
sklearn
seaborn

Results (Accuracies)

Our SVM results

Kernel	C = 1	C = 10	C = 100
linear	0.923968	0.923244	0.919623
poly	0.902969	0.897900	0.898624
rbf	0.837075	0.849385	0.847212

scikit-learn results

Kernel	C = 1	C = 10	C = 100
linear	0.923968	0.923244	0.918899
poly	0.843592	0.924692	0.915279
rbf	0.934830	0.934106	0.920348
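
A grid like the scikit-learn table can be produced with a loop over kernels and values of C, sketched below. The README does not state which kernel settings (for example `degree` or `gamma`) the notebook passes, so this sketch leaves `SVC` at its defaults. `sklearn_accuracy_grid` is a hypothetical helper name, and the numbers it returns will depend on the split and on those settings.

```python
from sklearn.svm import SVC

def sklearn_accuracy_grid(X_train, y_train, X_test, y_test,
                          kernels=("linear", "poly", "rbf"), Cs=(1, 10, 100)):
    """Return {(kernel, C): test accuracy} for scikit-learn's SVC.

    Kernel hyperparameters are left at SVC defaults (an assumption).
    """
    results = {}
    for kernel in kernels:
        for C in Cs:
            clf = SVC(kernel=kernel, C=C).fit(X_train, y_train)
            results[(kernel, C)] = clf.score(X_test, y_test)  # mean accuracy
    return results
```

The inputs are expected to be the standardized train and test splits, so that both implementations are compared on the same data.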
