<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Basil Latif Portfolio</title>
<meta name="description" content="BL Comscore Project" />
<link rel="stylesheet" href="styles.css" />
</head>
<div align="center">
<h1>Web Behavior Dataset: Demographics, Transactions, and Web Sessions</h1>
</div>
<div class="topnav">
<a href="index.html">Home</a>
<a href="articles.html">Data Science Projects</a>
<a href="teaching.html">Teaching Portfolio</a>
<a href="freelancing.html">Consulting Portfolio</a>
<a href="books.html">Top 20 Books</a>
<a href="contact.html">Contact</a>
</div><br>
<div align="center">
<h1>Comscore Data Project</h1>
</div>
<div align="center">
<img src="/cscore/comscore-logo.jpg" width="300" height="200" alt="Logo Trim" /><br />
<a class = "two" href="https://www.comscore.com/" target="_blank" rel="noopener noreferrer">Link to Comscore Website</a><br>
</div>
<div align="center">
<h1>Project Goal:</h1>
<p>
A Comscore dataset was given to the project team by the UC Irvine Business School. The goal of this project was to understand which factors impact customers'
purchasing behavior, and how we could increase a customer's basket total by
recommending items they are more likely to purchase. Although Comscore provided us
with user data from over 20 different domains (countries), our interest was in the types of
products customers purchased rather than the domain they purchased on, so domain information was filtered out.
</p>
</div>
<div align="center">
<h1>Exploratory Data Analysis</h1>
<img src="/cscore/pagxdur.jpeg" alt="Scatter plot of pages_viewed versus duration" /><br />
<p>
As the plot shows, there is no clear relationship between pages_viewed and
duration. For the same number of pages_viewed, the duration can be either very
small or very large. One possible confounding factor is idle time: people may
leave a page open without actively viewing it, while the duration variable
keeps counting.
</p><br>
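<p>A quick numeric check can complement the scatter plot. The sketch below is an illustration only, not the project's code: it assumes each session is a row with numeric <code>pages_viewed</code> and <code>duration</code> values, and computes the Pearson correlation alongside the Spearman rank correlation, which is less sensitive to extreme durations such as idle time. Values near zero for both would be consistent with the lack of relationship seen in the plot.</p>

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical toy sessions: (pages_viewed, duration).
sessions = [(3, 12), (5, 2), (8, 40), (2, 30), (6, 5)]
pages = [p for p, _ in sessions]
durations = [d for _, d in sessions]
print(round(pearson(pages, durations), 3), round(spearman(pages, durations), 3))
```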
</div>
<div align="center">
<img src="/cscore/pagxdurxtran.jpeg" alt="Scatter plot of pages_viewed versus duration for sessions with a purchase" /><br />
<p>
To investigate the relationship between pages_viewed and duration further, we
join the sessions table with the transactions table to see how the two
variables relate for people who made purchases. In the graph above,
pages_viewed and duration show a roughly linear relationship. This suggests
that for serious buyers, who are in the middle of a purchase decision, the
number of pages viewed and the overall time spent on the website increase
together.
</p>
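<p>The join step can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the project's code: both tables are assumed to be lists of dictionaries sharing a hypothetical <code>session_id</code> key, and <code>basket_total</code> is a hypothetical transaction column. An inner join keeps only sessions with at least one matching transaction, which restricts the analysis to people who made purchases.</p>

```python
from collections import defaultdict


def inner_join(sessions, transactions, key="session_id"):
    """Inner join two lists of dicts on `key`; sessions without a match are dropped."""
    # Index transactions by key so each session is matched with one lookup.
    by_key = defaultdict(list)
    for txn in transactions:
        by_key[txn[key]].append(txn)
    joined = []
    for session in sessions:
        # A session with several transactions yields one joined row per transaction;
        # on a column-name clash the transaction's value wins.
        for txn in by_key.get(session[key], []):
            joined.append({**session, **txn})
    return joined


# Hypothetical toy tables; session_id and basket_total are assumed column names.
sessions = [
    {"session_id": 1, "pages_viewed": 4, "duration": 10},
    {"session_id": 2, "pages_viewed": 9, "duration": 25},
    {"session_id": 3, "pages_viewed": 2, "duration": 50},
]
transactions = [
    {"session_id": 2, "basket_total": 59.99},
    {"session_id": 3, "basket_total": 12.50},
]
buyers = inner_join(sessions, transactions)
print([(row["pages_viewed"], row["duration"]) for row in buyers])
```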
</div>
<div align="center">
<h1>Clustering on this Dataset</h1>
<p>
Clustering is a powerful technique for exploring patterns and structure within
data and has wide applications in business analytics. We decided to use
clustering to segment the customers in our large dataset into smaller groups
based on their attributes. Based on the cluster outputs, companies can better
target specific customer segments.
</p>
<p>
We clean the data by removing irrelevant columns and making sure the values are
consistent. Then we plot the sum of squared errors (SSE) for different numbers
of clusters to decide how many clusters to form (the elbow method). We decide
to use 4 clusters and generate the following cluster plot:
</p>
<img src="/cscore/cscore_clust.png" alt="Cluster plot of the four k-means clusters" /><br />
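<p>The SSE curve and the final fit can be sketched with a small k-means implementation (Lloyd's algorithm). This is not the project's code: it assumes the cleaned data is a list of numeric feature tuples that have already been scaled, and it uses a simple random initialization with a fixed seed. SSE is the sum of squared distances from each point to its assigned centroid; computing it for increasing k and looking for the "elbow" where the improvement levels off gives a candidate number of clusters.</p>

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means. Returns (labels, centroids, sse)."""
    rng = random.Random(seed)
    # Initialize centroids with k points sampled from the data without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Elbow curve: SSE for each candidate number of clusters.
sse_by_k = {k: kmeans(points, k)[2] for k in range(1, 5)}
print(sse_by_k)

# Final fit with the chosen k (2 for this toy data), then observations per cluster.
labels, centroids, sse = kmeans(points, 2)
print(Counter(labels))
```

<p>With real data the final call would use the k read off the elbow curve (4 in this project). Random initialization can land in a poor local optimum, so in practice the fit is usually repeated with several seeds and the lowest-SSE run kept.</p>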
<p>
Our cluster output shows 99 observations in cluster 1, 73 in cluster 2, 63 in
cluster 3, and 12 in cluster 4. Three of the four clusters computed by the
k-means method therefore have roughly similar numbers of observations.
Generally speaking, the clusters have clear boundaries, although there is some
minor overlap between clusters 1 and 2 and between clusters 2 and 3. Cluster 4,
the smallest, is the most distinct of the four clusters.
</p>
<p>
We explored customer segment personas by comparing the average duration of
site visits and household education across clusters. Cluster 3 stands out:
it is characterized by users with high school diplomas who spend less time
browsing websites compared with individuals who may be more educated. This
suggests that less educated users might be less likely to convert and make a
transaction on a website. Cluster 2 has a higher average duration than the
other clusters.
</p>
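<p>The per-cluster comparison amounts to a group-by over the cluster labels. In this sketch, which is an illustration rather than the project's code, <code>duration</code> and <code>education</code> are hypothetical placeholders for the session-duration and household-education columns, and the labels are assumed to be the k-means cluster assignments, one per row.</p>

```python
from collections import Counter, defaultdict
from statistics import mean


def profile_clusters(rows, labels):
    """Summarize each cluster: size, mean duration, and education-level counts."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: {
            "size": len(members),
            "mean_duration": mean(r["duration"] for r in members),
            "education": Counter(r["education"] for r in members),
        }
        for label, members in sorted(groups.items())
    }


# Hypothetical toy rows and cluster labels.
rows = [
    {"duration": 5, "education": "high school"},
    {"duration": 7, "education": "high school"},
    {"duration": 30, "education": "bachelor"},
    {"duration": 42, "education": "graduate"},
]
labels = [3, 3, 2, 2]
for label, summary in profile_clusters(rows, labels).items():
    print(label, summary)
```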
</div>
<div align="center">
<h1>Recommender Using Collaborative Filtering</h1>
<p>
The purpose of a recommender system is to suggest relevant items to users.
Building a recommender system model requires three inputs: 1) users’
information, 2) the purchased items, and 3) the interactions between users and
items in each transaction. Collaborative filtering methods rely solely on
these three inputs: they look through past interactions recorded between users
and items in order to produce new recommendations. Those interactions can be
ratings, reviews, or purchase frequency, all of which can represent buyers’
preferences.
</p>
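<p>The project's own recommender is in the GitHub repository linked below; the sketch here is a generic illustration, not that code. It implements item-based collaborative filtering on purchase frequency, one of the interaction types mentioned above: it builds a user-item matrix of purchase counts from hypothetical <code>(user_id, item_id)</code> pairs, measures cosine similarity between items, and scores each item a user has not bought by its similarity to the items they have bought.</p>

```python
import math
from collections import defaultdict


def build_matrix(purchases):
    """Map user -> {item: purchase count} from (user_id, item_id) pairs."""
    matrix = defaultdict(lambda: defaultdict(int))
    for user, item in purchases:
        matrix[user][item] += 1
    return matrix


def item_similarity(matrix, a, b):
    """Cosine similarity between two items' purchase-count vectors over all users."""
    dot = sum(counts.get(a, 0) * counts.get(b, 0) for counts in matrix.values())
    norm_a = math.sqrt(sum(counts.get(a, 0) ** 2 for counts in matrix.values()))
    norm_b = math.sqrt(sum(counts.get(b, 0) ** 2 for counts in matrix.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(matrix, user, top_n=3):
    """Rank items the user has not bought by similarity to items the user has bought."""
    owned = matrix.get(user, {})
    all_items = {item for counts in matrix.values() for item in counts}
    scores = {
        candidate: sum(count * item_similarity(matrix, candidate, item)
                       for item, count in owned.items())
        for candidate in all_items - set(owned)
    }
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


# Hypothetical toy purchase log of (user_id, item_id) pairs.
purchases = [("u1", "A"), ("u1", "B"), ("u2", "A"), ("u2", "B"),
             ("u3", "A"), ("u3", "C"), ("u4", "B")]
matrix = build_matrix(purchases)
print(recommend(matrix, "u4"))
```

<p>Items with a zero score are dropped, so a user with no purchase history gets no recommendations from this sketch. For a large catalog, the item-item similarities would normally be computed once and cached rather than recomputed on every request.</p>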
</div>
<div align="center">
<a href="https://github.com/shujiangalex/MSBA277-recommender-system"
>Link to GitHub</a><br />
<a href="index.html">Go Back to Home Page</a><br><br>
</div>