---
title: "ClassWork1106"
author: "Akanksha Rai"
date: "2025-11-06"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

### Step 0: What's the point? What and why?

The regional manager at Meijer’s faces a challenge during the most important revenue-generating period of the year, Black Friday. Current marketing approaches fail to capitalize on this opportunity: the company cannot yet identify which customers are most likely to respond to promotional offers, which results in inefficient spending and missed revenue. Our model and analysis aim to help move Meijer's Black Friday marketing from a generalized, mass-market broadcast approach to a precision-targeted, customer-centric strategy.


We will build six individual models and combine their predictions in a second-level stacked model.

### Step 1 - Read Data

```{r}
marketing_campaign <- read.csv("marketing_campaign.csv", sep = "\t", stringsAsFactors = TRUE)
summary(marketing_campaign)
```

### Step 2: Clean data

```{r}
# Impute missing Income values with the column mean
marketing_campaign$Income <- ifelse(is.na(marketing_campaign$Income),
                                    mean(marketing_campaign$Income, na.rm = TRUE),
                                    marketing_campaign$Income)

# Extract the last four characters of Dt_Customer as the customer's year
marketing_campaign$Dt_Customer <- as.character(marketing_campaign$Dt_Customer)

marketing_campaign$Year_Customer <- substr(marketing_campaign$Dt_Customer,
                                  nchar(marketing_campaign$Dt_Customer) - 3,
                                  nchar(marketing_campaign$Dt_Customer))

# Drop the raw date and the ID column
marketing_campaign$Dt_Customer <- NULL
marketing_campaign$ID <- NULL

marketing_campaign$Year_Customer <- as.factor(marketing_campaign$Year_Customer)

# Drop Z_CostContact and Z_Revenue before encoding
marketing_campaign$Z_CostContact <- NULL
marketing_campaign$Z_Revenue <- NULL

summary(marketing_campaign)

# Convert factors to dummy variables ("- 1" removes the intercept column)
marketing_campaign_dummies <- as.data.frame(model.matrix(~ . - 1, data = marketing_campaign))

# Min-max normalization: rescale every column to the [0, 1] range
minmax <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

marketing_s <- as.data.frame(lapply(marketing_campaign_dummies, minmax))

```
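
To make the encoding and scaling step concrete, here is a minimal sketch on a toy data frame. The `toy` data and its values are made up for illustration and are not taken from `marketing_campaign.csv`. It uses a plain `r` fence so it is displayed but not run when the document is knit.

```r
# Toy data (hypothetical values, not from marketing_campaign.csv)
toy <- data.frame(
  Income = c(20000, 50000, 80000),
  Education = factor(c("Basic", "PhD", "Basic"))
)

# "- 1" removes the intercept, so the factor keeps one column per level
toy_dummies <- as.data.frame(model.matrix(~ . - 1, data = toy))

# Same min-max formula as in the chunk above
minmax <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

toy_s <- as.data.frame(lapply(toy_dummies, minmax))
print(toy_s)
```

One thing to watch for: a column whose values are all identical has `max(x) - min(x)` equal to zero, so `minmax` returns `NaN` for it. Constant columns should be removed before scaling.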

### Step 3 - Train/Test Split

```{r}
train_ratio <- 0.5
set.seed(12345)
train_rows <- sample(1:nrow(marketing_s), train_ratio*nrow(marketing_s))

marketing_train <- marketing_s[train_rows, ]
marketing_test <- marketing_s[-train_rows, ]

```
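
As a quick sanity check on this kind of split, the sketch below repeats the same indexing pattern on a toy data frame (the `toy_*` names are hypothetical) and confirms that the train and test rows do not overlap and together cover every row.

```r
set.seed(12345)

# Toy data frame with an id column so rows can be traced (hypothetical)
toy_df <- data.frame(id = 1:10)

toy_ratio <- 0.5
toy_rows <- sample(1:nrow(toy_df), toy_ratio * nrow(toy_df))

# drop = FALSE keeps a one-column data frame from collapsing to a vector
toy_train <- toy_df[toy_rows, , drop = FALSE]
toy_test <- toy_df[-toy_rows, , drop = FALSE]

# No row appears in both sets, and no row is lost
stopifnot(length(intersect(toy_train$id, toy_test$id)) == 0)
stopifnot(nrow(toy_train) + nrow(toy_test) == nrow(toy_df))
```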

Our team built the individual models in separate files to keep this document clean. We tuned each individual model as far as we could, so that the second-level model combines the best predictions available from each one, which should give the final model its best chance of performing well.

Here is what we will pass from each individual model to the second-level model:

-   Logistic Regression: "lr_prob_interactions"

-   KNN Model: "knn_prob_1"

-   ANN Model: "ann_probs"

-   Decision Tree: "dtree_probs_1"

-   SVM: "svm_probs_1"

-   Random Forest: "rf_prob_1"
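
Because these vectors are produced in separate files, it may be worth checking that each one has exactly one prediction per test row before combining them. The sketch below shows one way to do that on toy data. The `*_toy` names and values are hypothetical stand-ins for the objects listed above.

```r
# Hypothetical stand-ins for the test set and two first-level probability vectors
toy_test <- data.frame(Response = c(0, 1, 0, 1))
lr_prob_toy <- c(0.10, 0.80, 0.30, 0.60)
knn_prob_toy <- c(0.20, 0.70, 0.10, 0.90)

first_level <- list(glm = lr_prob_toy, knn = knn_prob_toy)

# Every first-level vector needs one prediction per test row
lengths_ok <- vapply(first_level,
                     function(p) length(p) == nrow(toy_test),
                     logical(1))
stopifnot(all(lengths_ok))

# Assemble the second-level data set: one column per model plus the true label
combined_toy <- data.frame(first_level, actual = toy_test$Response)
print(combined_toy)
```

A length check does not catch a vector that was predicted on a differently ordered test set, so the individual files should also use the same seed and split as this document.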

### Step 4 & 5: Stacked Model

```{r}
combined <- data.frame(
  glm = as.numeric(lr_prob_interactions),
  knn = as.numeric(knn_prob_1),
  ann = as.numeric(ann_probs),
  dtree = as.numeric(dtree_probs_1),
  svm = as.numeric(svm_probs_1),
  rf = as.numeric(rf_prob_1),
  actual = as.numeric(marketing_test$Response)
)
```

```{r}
set.seed(12345)
combine_index <- sample(1:nrow(combined), 0.7 * nrow(combined))
combined_train <- combined[combine_index, ]
combined_test  <- combined[-combine_index, ]
```


```{r}
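library(C50)  # provides C5.0()

# Second-level model: a decision tree trained on the first-level predictions
stacked_model <- C5.0(as.factor(actual) ~ ., data = combined_train)
summary(stacked_model)

plot(stacked_model)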
```

```{r}
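library(caret)  # provides confusionMatrix()

# Evaluate the stacked model on the held-out part of the combined data
stacked_pred <- predict(stacked_model, combined_test)

confusionMatrix(as.factor(stacked_pred), as.factor(combined_test$actual), positive = "1")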
```

Cost Matrix: Our marketing campaign targets true positives, so we want to capture as many of them as possible. To do this, we use a cost matrix that places a heavier cost on false negatives.
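
The sketch below spells out how such a matrix is laid out. It assumes, based on our reading of the C50 documentation, that rows correspond to the predicted class and columns to the actual class. The confusion counts in `toy_counts` are made up for illustration.

```r
# matrix() fills column by column, so 1 lands in [2, 1] and 4 lands in [1, 2]
toy_cost <- matrix(c(0, 1,
                     4, 0), nrow = 2,
                   dimnames = list(predicted = c("0", "1"),
                                   actual = c("0", "1")))
print(toy_cost)

# Assumed orientation: predicted "0" when actual is "1" is a false negative
false_negative_cost <- toy_cost["0", "1"]
false_positive_cost <- toy_cost["1", "0"]

# Hypothetical confusion counts in the same layout (predicted x actual)
toy_counts <- matrix(c(50, 5,
                       10, 20), nrow = 2,
                     dimnames = dimnames(toy_cost))

# Total cost = sum over cells of (count * cost); correct cells cost 0
total_cost <- sum(toy_cost * toy_counts)
print(total_cost)
```

Naming the rows and columns this way makes the orientation explicit, so the heavier value visibly sits on the false-negative cell.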

```{r, warning = FALSE }
# Candidate misclassification costs to try
cost_01_values <- c(1, 2, 3)
cost_10_values <- c(2, 3, 4, 5, 6)

results <- data.frame(cost_01 = numeric(),
                      cost_10 = numeric(),
                      Accuracy = numeric(),
                      Kappa = numeric(),
                      Sensitivity = numeric())

for (c01 in cost_01_values) {
  for (c10 in cost_10_values) {
    # Only keep combinations where the second cost outweighs the first
    if (c10 > c01) {
      # Filled column by column: c01 goes to row 2, column 1; c10 to row 1, column 2
      cost_matrix <- matrix(c(0, c01,
                              c10, 0), nrow = 2)
      cost_model <- C5.0(as.factor(actual) ~ ., data = combined_train, cost = cost_matrix)
      pred <- predict(cost_model, combined_test)
      cm <- confusionMatrix(as.factor(pred), as.factor(combined_test$actual), positive = "1")
      # [[ ]] drops the element names so the result rows are not labeled "Accuracy1", ...
      results <- rbind(results, data.frame(cost_01 = c01,
                                           cost_10 = c10,
                                           Accuracy = cm$overall[['Accuracy']],
                                           Kappa = cm$overall[['Kappa']],
                                           Sensitivity = cm$byClass[['Sensitivity']]))
    }
  }
}

print(results)
```
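
One possible way to pick a cost combination from a table like `results` is to require a minimum Kappa and then take the row with the highest Sensitivity. The sketch below illustrates that rule only. The numbers in `toy_results` and the `kappa_floor` threshold are hypothetical, not output from our models, and this is not necessarily the rule behind the matrix chosen in the next chunk.

```r
# Hypothetical grid results (made-up numbers for illustration only)
toy_results <- data.frame(
  cost_01 = c(1, 1, 2),
  cost_10 = c(2, 4, 5),
  Accuracy = c(0.88, 0.85, 0.80),
  Kappa = c(0.45, 0.48, 0.35),
  Sensitivity = c(0.40, 0.62, 0.70)
)

# Hypothetical threshold: discard combinations whose Kappa drops too far
kappa_floor <- 0.40
eligible <- toy_results[toy_results$Kappa >= kappa_floor, ]

# Among the remaining rows, prefer the highest Sensitivity
best <- eligible[which.max(eligible$Sensitivity), ]
print(best)
```

The floor keeps the search from choosing a matrix that raises Sensitivity simply by predicting almost everyone as a responder.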

```{r}
#Cost Matrix
cost_matrix <- matrix(c(0,1,4,0), nrow = 2)

#Cost Model
cost_stacked <- C5.0(as.factor(actual) ~ ., data = combined_train, cost = cost_matrix)

plot(cost_stacked)

cost_pred <- predict(cost_stacked, combined_test)
cost_cm <- confusionMatrix(as.factor(cost_pred), as.factor(combined_test$actual), positive = "1")
cost_cm
```