---
title: "KNN TO 414 Project"
author: "Angel He"
date: "2025-11-18"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library("tidyverse")
library(neuralnet)
library(caret)
library(class)
library(C50)
library(knitr)
library(kableExtra)
library(kernlab)
library(randomForest)
```
## KNN
This is the Rmd file for the KNN model, covering steps 4 and 5. We will run a basic KNN model and then try to improve it as much as possible.
## Basic Model
```{r}
set.seed(12345)
knn_basic <- knn(train = marketing_train[, -35],
                 test = marketing_test[, -35],
                 cl = marketing_train[, 35],
                 k = 10)
```
```{r}
knn_basic_cm <- confusionMatrix(as.factor(knn_basic),
                                as.factor(marketing_test[, 35]),
                                positive = "1")
knn_basic_cm
```
## Testing Different K-Value
To improve the KNN model, we will test different k-values. I will use a loop to fit the model at each value and compare the results on the following metrics: accuracy, kappa, sensitivity, and specificity.
```{r}
k_values <- seq(1, 20, by = 1)
detailed_results <- list()

for (k in k_values) {
  set.seed(12345)
  knn_pred <- knn(train = marketing_train[, -35],
                  test = marketing_test[, -35],
                  cl = marketing_train[, 35],
                  k = k)
  cm <- confusionMatrix(as.factor(knn_pred),
                        as.factor(marketing_test[, 35]),
                        positive = "1")
  detailed_results[[as.character(k)]] <- list(
    k = k,
    CM = cm,
    Accuracy = cm$overall["Accuracy"],
    Sensitivity = cm$byClass["Sensitivity"],
    Specificity = cm$byClass["Specificity"],
    Kappa = cm$overall["Kappa"]
  )
}

summary_results <- do.call(rbind, lapply(detailed_results, function(x) {
  data.frame(
    k = x$k,
    Accuracy = x$Accuracy,
    Sensitivity = x$Sensitivity,
    Specificity = x$Specificity,
    Kappa = x$Kappa
  )
}))
print(summary_results)
```
Looking at the results in this table, k = 6 is a good middle ground among the k-values tested. It provides the best trade-off between sensitivity (20.0%) and specificity (97.2%). With a Kappa of 0.233 and 86.2% accuracy, k = 6 outperforms the alternatives in balanced classification performance.
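Rather than reading the best k off the table by eye, it can also be selected programmatically. A minimal sketch, assuming the `summary_results` data frame built in the chunk above, that picks the k with the highest Kappa:

```r
# Select the row with the highest Kappa; which.max() returns the first
# maximum, so ties are broken in favor of the smallest k
best_row <- summary_results[which.max(summary_results$Kappa), ]
best_k <- best_row$k
best_k
```

Kappa is used here because it corrects for chance agreement, which matters when the classes are imbalanced; the same pattern works with any other column of `summary_results`.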
## Advanced KNN Model
```{r}
set.seed(12345)
knn_advanced <- knn(train = marketing_train[, -35],
                    test = marketing_test[, -35],
                    cl = marketing_train[, 35],
                    k = 6,
                    prob = TRUE)
```
```{r}
knn_advanced_cm <- confusionMatrix(as.factor(knn_advanced),
                                   as.factor(marketing_test[, 35]),
                                   positive = "1")
knn_advanced_cm
```
## Creating Values for the Combined Data
```{r}
# class::knn() with prob = TRUE stores the proportion of neighbor votes
# for the *winning* class in the "prob" attribute
knn_advance_probs <- attr(knn_advanced, "prob")
knn_pred <- knn_advanced
knn_prob_raw <- knn_advance_probs

# Convert to the probability of class "1": when the prediction is "0",
# the vote proportion belongs to class "0", so flip it
knn_prob_1 <- ifelse(knn_pred == "1", knn_prob_raw, 1 - knn_prob_raw)
```
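These outputs can then be gathered into a small data frame for the combined dataset. A minimal sketch, assuming `knn_pred`, `knn_prob_1`, and `marketing_test` from the chunks above; the name `knn_for_combine` and its column names are illustrative, not part of the project:

```r
# Hypothetical: collect the KNN prediction, its class-"1" probability,
# and the true label per test observation, for later combination
knn_for_combine <- data.frame(
  knn_pred   = knn_pred,           # predicted class from the k = 6 model
  knn_prob_1 = knn_prob_1,         # estimated probability of class "1"
  actual     = marketing_test[, 35] # true label from the test set
)
head(knn_for_combine)
```

Keeping the probability column (rather than only the hard prediction) preserves more information for whatever model is fit on the combined data.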