<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>Basil Latif Portfolio</title>
    <meta name="description" content="BL Comscore Project" />
    <link rel="stylesheet" href="styles.css" />
  </head>
  <body>

<div align="center">
  <h1>Web Behavior Dataset: Demographics, Transactions, and Web Sessions</h1>
</div>
<div class="topnav">
	  <a href="index.html">Home</a>
	  <a href="articles.html">Data Science Projects</a>
	  <a href="teaching.html">Teaching Portfolio</a>
	  <a href="freelancing.html">Consulting Portfolio</a>
	  <a href="books.html">Top 20 Books</a>
	  <a href="contact.html">Contact</a>
</div><br>

<div align="center">
  <h1>Comscore Data Project</h1>
</div>

<div align="center">
  <img src="/cscore/comscore-logo.jpg" width="300" height="200" alt="Logo Trim" /><br />
	<a class = "two" href="https://www.comscore.com/" target="_blank" rel="noopener noreferrer">Link to Comscore Website</a><br>
</div>

<div align="center">
<h1>Project Goal:</h1>
<p>
  The UC Irvine Business School provided the project team with a Comscore dataset. The goal of this project was to understand which factors influenced customers'
  purchasing behavior, and how to increase a customer's basket total by
  recommending items they are more likely to purchase. Although Comscore provided
  user data from over 20 different domains (countries), our interest was in the types of
  products customers purchased rather than the domain they purchased on, so domain information was filtered out.
</p>
</div>
<div align="center">
<h1>Exploratory Data Analysis</h1>
<img src="/cscore/pagxdur.jpeg" alt="Views" /><br />
<p>
  As the plot above shows, there is no clear relationship between pages_viewed and
  duration. For the same number of pages_viewed, the duration can be either very
  small or very large. One possible confounding factor is that people may
  leave their computers on without actively viewing a page, while the duration
  variable continues to be tracked.
</p><br>
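<p>
  One way to put a number on the visual impression above is a Pearson
  correlation coefficient. The following is a minimal sketch in Python, not the
  project's actual analysis: the session values are made up for illustration,
  and only the column names pages_viewed and duration come from the dataset
  description.
</p>

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up sessions for illustration only (not Comscore data).
pages_viewed = [1, 2, 3, 4, 5, 6]
duration = [30, 2, 45, 1, 20, 25]

r = pearson_r(pages_viewed, duration)
print(round(r, 3))
```

<p>
  A value near 0 is consistent with a scatter plot that shows no clear linear
  relationship, while values near 1 or -1 would indicate a strong linear one.
</p>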
</div>
<div align="center">
  <img src="/cscore/pagxdurxtran.jpeg" alt="Views" /><br />
<p>
  We wanted to investigate the relationship between pages_viewed and
  duration further. We joined the sessions table with the transactions table to
  see what the relationship is for people who made purchases. In the above
  graph, pages_viewed and duration have a roughly linear relationship
  for sessions that include a purchase. This suggests
  that for serious buyers, the number of pages viewed and the total time spent
  on the website increase together.
</p>
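<p>
  The join described above can be sketched as a plain inner join that keeps
  only sessions with a matching transaction. This is an illustrative sketch
  under assumptions: the key name session_id, the basket_total field, and all
  row values are hypothetical and are not taken from the Comscore schema.
</p>

```python
# Hypothetical rows; the key name session_id is an assumption.
sessions = [
    {"session_id": 1, "pages_viewed": 3, "duration": 6},
    {"session_id": 2, "pages_viewed": 10, "duration": 21},
    {"session_id": 3, "pages_viewed": 4, "duration": 90},
    {"session_id": 4, "pages_viewed": 7, "duration": 15},
]
transactions = [
    {"session_id": 1, "basket_total": 19.99},
    {"session_id": 2, "basket_total": 54.50},
    {"session_id": 4, "basket_total": 12.00},
]


def inner_join(left, right, key):
    """Return merged rows for every pair of left/right rows sharing the key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in index.get(l_row[key], [])
    ]


# Sessions without a purchase (session 3 here) drop out of the result.
purchase_sessions = inner_join(sessions, transactions, "session_id")
for row in purchase_sessions:
    print(row["session_id"], row["pages_viewed"], row["duration"])
```

<p>
  Restricting the plot to the joined rows is what separates browsing-only
  sessions, including idle ones, from sessions that ended in a purchase.
</p>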
</div>
<div align="center">
  <h1>Clustering on this Dataset</h1>
<p>
  Clustering is a powerful technique for exploring patterns and structures within data
  and has wide applications in business analytics. We decided to use clustering to
  break our large dataset into smaller customer segments based on
  their attributes. Based on the cluster outputs, companies can better target
  each customer segment.
</p>
<p>
  We cleaned the data by removing irrelevant columns and making sure the data was
  consistent. Then, we plotted an SSE (sum of squared errors) curve to decide how many clusters to
  form. We chose 4 clusters and generated the following cluster plot:
</p>
<img src="/cscore/cscore_clust.png" alt="Views" /><br />
<p>
  Our cluster output shows that there are 99 observations in cluster 1, 73
  observations in cluster 2, 63 observations in cluster 3, and 12 observations
  in cluster 4. The clusters computed by the k-means method show that three
  out of the four clusters have roughly similar numbers of observations.
  Generally speaking, the clusters have clear boundaries, although there are some minor
  overlaps between clusters 1 and 2, and between clusters 2 and 3. Cluster 4 is
  the most distinct group of the four.
</p>
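<p>
  For readers unfamiliar with the SSE curve used to pick the number of
  clusters, the idea can be sketched as follows: run k-means for several values
  of k, record the sum of squared distances from each point to its nearest
  centroid, and look for the point where the curve flattens. This is a minimal
  pure-Python sketch on toy two-dimensional points, not the project's code or
  data. The function names, the seed, and the sample points are all
  assumptions.
</p>

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sse(points, k, iterations=50, seed=0):
    """Run Lloyd's k-means and return the final sum of squared errors."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


# Toy points forming three loose groups (illustrative only).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (1, 9), (2, 9)]

sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 6)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

<p>
  Plotting SSE against k and choosing the value where the drop levels off is
  the usual "elbow" heuristic. A single random start can land in a local
  minimum, so in practice the run is typically repeated with several seeds.
</p>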
<p>
  We explored customer segment personas by comparing the
  average duration of site visits and household education across
  clusters. Cluster 3 stands out as characterizing users with high school
  diplomas who do not spend a long time browsing websites compared with
  individuals who might be more educated. This suggests that less educated users
  might have a lower likelihood of converting and making a transaction on a
  website. Cluster 2 has a higher average duration than the other
  clusters.
</p>
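<p>
  The per-cluster comparison above amounts to grouping observations by cluster
  label and averaging an attribute. A small sketch of that step is shown below.
  The labels and durations are hypothetical values chosen for illustration and
  do not reproduce the project's results.
</p>

```python
from collections import defaultdict

# Hypothetical (cluster_label, duration) pairs, for illustration only.
labelled_sessions = [(1, 12.0), (2, 40.0), (1, 8.0), (3, 5.0), (2, 50.0), (3, 7.0)]


def mean_by_group(pairs):
    """Average the second element of each pair, grouped by the first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for label, value in pairs:
        totals[label] += value
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


avg_duration = mean_by_group(labelled_sessions)
print(avg_duration)
```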
</div>
<div align="center">
  <h1>Recommender Using Collaborative Filtering</h1>
<p>
  The purpose of a recommender system is to suggest relevant items to users. Building
  such a model requires three inputs: 1) users’ information, 2) the purchased items, and 3)
  the interaction between them for each transaction. Collaborative filtering methods for
  recommender systems are based solely on these three
  variables. Specifically, they look through past interactions recorded between
  users and items in order to produce new recommendations. Those interactions
  can be ratings, reviews, or purchase frequencies that represent buyers’
  preferences for each transaction.
</p>
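<p>
  To make the idea concrete, here is a minimal sketch of one common variant,
  user-based collaborative filtering with cosine similarity over purchase
  counts. It is an illustration under assumptions and is not necessarily the
  approach used in the linked repository: the users, items, counts, and the
  function names cosine and recommend are all hypothetical.
</p>

```python
from math import sqrt

# Hypothetical purchase counts: user -> {item: times purchased}.
purchases = {
    "u1": {"laptop": 1, "mouse": 2},
    "u2": {"laptop": 1, "mouse": 1, "keyboard": 1},
    "u3": {"mouse": 1, "keyboard": 2},
    "u4": {"novel": 3},
}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user, purchases, top_n=2):
    """Score items the user has not bought by similarity-weighted counts."""
    scores = {}
    for other, basket in purchases.items():
        if other == user:
            continue
        sim = cosine(purchases[user], basket)
        if sim == 0:
            continue  # users with no items in common contribute nothing
        for item, count in basket.items():
            if item not in purchases[user]:
                scores[item] = scores.get(item, 0.0) + sim * count
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("u1", purchases))  # prints ['keyboard']
```

<p>
  Here u1 shares items with u2 and u3, who both bought a keyboard, so that is
  the only candidate. The user u4 has no overlap with anyone and therefore
  neither receives nor contributes recommendations, which illustrates the
  cold-start limitation of methods that rely only on past interactions.
</p>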
</div>
<div align="center">
  <a href="https://github.com/shujiangalex/MSBA277-recommender-system"
  >Link to GitHub</a><br />

<a href="index.html">Go Back to Home Page</a><br><br>
</div>
  </body>
</html>