_posts/2020-11-29-segmentation.md
43 additions & 13 deletions
@@ -1,8 +1,8 @@
---
layout: post
-title: "Not A Regular RFM Analysis"
-subtitle: "Why limit to Recency, Frequency and Monetary measures during Customer Segmentation?"
-description: "Customer segmentation with limited data: learn a proven 5D RFM+ approach using k-means to segment responders/non-responders and drive targeted in-game marketing."
+title: "Customer Segmentation with RFM+ and K-Means: 7 Segments from Gaming Data"
+subtitle: "Build a 5D RFM+ framework, engineer metrics, and segment responders/non-responders with k-means to power targeted in‑game marketing"
+description: "Customer segmentation with limited data. Learn a 5D RFM+ framework, engineer metrics, and use k-means to create 7 segments—apply insights now."
+<img src="/images/posts/2020-11-29-segmentation/data.jpg" alt="Sample of gaming user-level dataset with purchase dates for base game, expansion packs, and downloadable content" title="User-Level Gaming Dataset Features" loading="lazy" style="max-width: 100%; height: auto; border: 1px solid #ddd; border-radius: 4px;" />
+<figcaption><p>Snapshot of available features used for segmentation: base game, expansions, and DLC install dates</p></figcaption>
+</figure>

So the last 8 features are the names of either an expansion pack or a piece of downloadable content (DLC). The dataset has 500k rows. That's good because it means we can make more segments, right!?

@@ -53,7 +56,10 @@ I tag the users as responders or non-responders based on whether they buy any ad

Now I can begin defining my key metrics for segmenting the responders:
+<img src="/images/posts/2020-11-29-segmentation/recency.jpg" alt="Recency distribution showing user activity recency across years with higher activity in 2019" title="Recency Distribution of Player Activity" loading="lazy" style="max-width: 100%; height: auto; border: 1px solid #ddd; border-radius: 4px;" />
+<figcaption><p>Recency metric: days since last activity, highlighting more recent engagement in 2019</p></figcaption>
+</figure>


### Recency
@@ -62,7 +68,10 @@ This is the number of days passed since the user was seen active on the gaming p

The chart shows that more users were active in 2019 than in 2017.
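
To make the Recency definition concrete, here is a minimal sketch of how the metric could be computed. The player names, the dates, and the reference date `as_of` are hypothetical, and the function is an illustration rather than the code used for the post.

```python
from datetime import date


def recency_days(last_active: date, as_of: date) -> int:
    """Days elapsed between a player's last active day and the reference date."""
    if last_active > as_of:
        raise ValueError("last_active cannot be later than as_of")
    return (as_of - last_active).days


# Hypothetical players and last-seen dates (not taken from the dataset).
last_seen = {
    "player_a": date(2019, 11, 1),
    "player_b": date(2017, 6, 15),
}
as_of = date(2019, 12, 31)  # assumed snapshot date

recency = {player: recency_days(seen, as_of) for player, seen in last_seen.items()}
print(recency)
```

A smaller value means the player was seen more recently, so a player last active in 2019 gets a much lower Recency than one last active in 2017.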
+<img src="/images/posts/2020-11-29-segmentation/frequency.jpg" alt="Frequency distribution of days played since installation, skewed toward fewer active days" title="Frequency of Gameplay Days" loading="lazy" style="max-width: 100%; height: auto; border: 1px solid #ddd; border-radius: 4px;" />
+<figcaption><p>Frequency metric: number of active days since install, skewed toward fewer days for most players</p></figcaption>
+</figure>

### Frequency

@@ -71,7 +80,10 @@ Since the day a player installed the game, how many days did he play the game?
The chart is concentrated towards the left, meaning that most players are active for only a few days. Note, however, that newer players have had fewer days in which they could be active than older players.
+<img src="/images/posts/2020-11-29-segmentation/monetary-value.png" alt="Monetary value distribution of player spending based on mapped add-on prices" title="Monetary Value of Player Spend" loading="lazy" style="max-width: 100%; height: auto; border: 1px solid #ddd; border-radius: 4px;" />
+<figcaption><p>Monetary value metric: spend estimated by mapping store prices to user add-on purchases</p></figcaption>
+</figure>


### Monetary Value
@@ -80,7 +92,10 @@ Since this information is not available in the data, I went to the game store we
Most players spend less than a hundred bucks. This is expected because the base game costs 55 bucks. And the downloadable content is generally cheap!
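
The price mapping can be sketched roughly as follows. Only the base game price of 55 comes from the post. The add-on names and prices in `ADDON_PRICES` are hypothetical placeholders, and counting the base game towards the total is my assumption.

```python
BASE_GAME_PRICE = 55  # stated in the post

# Hypothetical add-on names and prices; the real ones came from the game store.
ADDON_PRICES = {
    "expansion_1": 30,
    "expansion_2": 30,
    "dlc_1": 5,
}


def monetary_value(owned_addons, prices=ADDON_PRICES, base_price=BASE_GAME_PRICE):
    """Estimate a player's spend: base game plus the price of every owned add-on."""
    unknown = set(owned_addons) - set(prices)
    if unknown:
        raise KeyError(f"no price known for: {sorted(unknown)}")
    return base_price + sum(prices[addon] for addon in owned_addons)


print(monetary_value([]))                          # base game only
print(monetary_value(["expansion_1", "dlc_1"]))    # base game plus two add-ons
```

Under these assumed prices, a player needs at least two expansions before crossing the hundred mark, which matches the shape described above.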
+<img src="/images/posts/2020-11-29-segmentation/responses.png" alt="Distribution of number of add-ons purchased per player showing most buyers purchase one" title="Responses: Add-ons Purchased per Player" loading="lazy" style="max-width: 100%; height: auto; border: 1px solid #ddd; border-radius: 4px;" />
+<figcaption><p>Responses metric: count of prior add-on purchases per player; most buyers purchase only one</p></figcaption>
+</figure>


### Responses
@@ -89,7 +104,10 @@ How many add-ons did the player buy previously? This will not be correlated with

Most players who bought any add-on bought only one.
+<img src="/images/posts/2020-11-29-segmentation/purchase-frequency.png" alt="Histogram of purchase intervals showing peaks near expansion launch windows" title="Purchase Frequency Over Time" loading="lazy" style="max-width: 100%; height: auto; border: 1px solid #ddd; border-radius: 4px;" />
+<figcaption><p>Purchase frequency metric: intervals between purchases with peaks around expansion release periods</p></figcaption>
+</figure>


### Purchase Frequency
@@ -104,7 +122,10 @@ While most players buy everything soon after they buy the game, we see other hig
Using the 5 key metrics, I apply k-means clustering to segment the users.
+<img src="/images/posts/2020-11-29-segmentation/elbow.jpg" alt="Elbow method chart indicating optimal k around five clusters for k-means" title="Elbow Method for Optimal k" loading="lazy" style="max-width: 100%; height: auto; border: 1px solid #ddd; border-radius: 4px;" />
+<figcaption><p>Elbow plot suggests k=5 as a balanced choice for k-means clustering complexity and cohesion</p></figcaption>
+</figure>

Looking at the chart, I select 5 as the optimum number of clusters/segments. This gives me a balance between homogeneity within clusters and complexity of the analysis.
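
For readers who want to see what sits behind an elbow chart, here is a small, dependency-free sketch of the idea: run k-means for several values of k and record the within-cluster sum of squares (inertia). It uses a plain Lloyd's algorithm on made-up 2D points rather than the five metrics, so it is an illustration and not the code behind the chart. In practice a library implementation would be the usual choice, and the features would typically be standardized first because k-means is sensitive to scale.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(points, k, n_iter=50, seed=0):
    """Run basic Lloyd's k-means and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]  # keep the old centroid if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Made-up data: three well-separated blobs of 30 points each.
rng = random.Random(42)
blob_centers = [(0, 0), (10, 10), (20, 0)]
points = [
    (cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
    for cx, cy in blob_centers
    for _ in range(30)
]

inertias = {k: kmeans_inertia(points, k) for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(value, 1))
```

Plotting these values against k gives the elbow curve: the point after which adding clusters stops reducing the inertia by much is the candidate for the number of segments.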
@@ -113,23 +134,32 @@ Looking at the chart, I select 5 as the optimum number of clusters/segments. Thi
Since these are the users who have not interacted much, we only have two measures to judge them: Recency and Frequency.
+<img src="/images/posts/2020-11-29-segmentation/recency-vs-frequency.jpg" alt="Scatter plot of recency versus frequency used to segment non-responders by activity threshold" title="Recency vs Frequency for Non-Responders" loading="lazy" style="max-width: 100%; height: auto; border: 1px solid #ddd; border-radius: 4px;" />
+<figcaption><p>Non-responder segmentation using a recency threshold to separate recently active from lapsed users</p></figcaption>
+</figure>

As the chart above shows, I segment such users by a threshold of 1000 days. That is, those who have been active in the last 200 days are in Cluster 6, and the others are in Cluster 5 (Clusters 0–4 being the responders).
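
The threshold rule for non-responders is simple enough to sketch in a few lines. The function name is hypothetical, the cutoff is left as a parameter rather than hard-coded, and treating the boundary value as "recent" is my assumption.

```python
def segment_non_responder(recency_days: int, threshold_days: int) -> int:
    """Return Cluster 6 for recently active non-responders, Cluster 5 otherwise."""
    if recency_days < 0:
        raise ValueError("recency_days must be non-negative")
    # Assumption: a player exactly at the threshold still counts as recent.
    return 6 if recency_days <= threshold_days else 5


# Hypothetical recency values, with the cutoff passed in explicitly.
for days in (30, 200, 750):
    print(days, segment_non_responder(days, threshold_days=200))
```

Because Clusters 0–4 are taken by the responders, the two non-responder groups simply continue the numbering.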
## Analysis and Strategy

The following table gives the means of all the features across the user segments.
+<img src="/images/posts/2020-11-29-segmentation/segments.jpg" alt="Table of means for key metrics across identified customer segments" title="Segment Means Across Metrics" loading="lazy" style="max-width: 100%; height: auto; border: 1px solid #ddd; border-radius: 4px;" />
+<figcaption><p>Summary statistics by segment for recency, frequency, responses, monetary value, and purchase cadence</p></figcaption>
+</figure>

Look at the first row. On average, players in Cluster 0 were active for nearly 15 days, bought 1.5 add-ons, were last active 477 days ago (long ago), spent 65 bucks, and purchased an add-on every 33 days. Since these players were last active so long ago, they have probably forgotten about the game. So, in-game marketing may not work on them! On the other hand, email marketing might!

Now look at the second row. On average, players in Cluster 1 were active for a whopping 92 days, bought nearly 3 add-ons, were active fairly recently, have spent much more than others have, but purchase relatively rarely. These could be the players who have recently bought an add-on. These are the customers who seem to be loyal. We could target them with more exciting features!

The following figure gives a similar summary of each cluster/segment.
_posts/2020-12-07-practical-guide-better-code.md
47 additions & 13 deletions
---
layout: post
-title: "A practical guide for better-looking python code"
-description: "Setting up a CI/CD pipeline using GitHub"
+title: "Python CI/CD with GitHub Actions: Pre-commit, Linters, and Pytest Guide"
+subtitle: "Step-by-step workflow to secure branches, automate linting, and run tests using GitHub Actions, pre-commit, black/isort/flake8/mypy, and pytest."
+description: "Python CI/CD with GitHub Actions: Discover branch protection, pre-commit, black, isort, flake8, mypy, and pytest to enforce essential, tested code—start now."
+<img src="/images/posts/2020-12-07-practical-guide-better-code/empty-repo.png" alt="New GitHub repository with only README on main branch" title="Empty Repository on GitHub" loading="lazy" style="max-width: 100%; height: auto; border: 1px solid #ddd; border-radius: 4px;" />
+<figcaption><p>Starting point: an empty repo with a single README on main</p></figcaption>
+<figcaption><p>Required check “build (3.7)” appears after configuring the workflow</p></figcaption>
+</figure>

Notice that `build (3.7)` has appeared among the status checks. This corresponds to the name of the job (`build`) and the Python version (`3.7`). I made a small modification to the `README.md` file; let’s see if I can push it to the main branch now. Here is the error I get:
@@ -151,13 +164,19 @@ git push origin dev
A new branch called `dev` is created on the remote repository. What’s left is to create a pull request, and merge it to the `main` branch.
+<img src="/images/posts/2020-12-07-practical-guide-better-code/status-check-passed.png" alt="GitHub PR showing all status checks have passed" title="Status Checks Passed" loading="lazy" style="max-width: 100%; height: auto; border: 1px solid #ddd; border-radius: 4px;" />
+<figcaption><p>All required checks pass—your PR is ready to merge</p></figcaption>
+</figure>


We would like to introduce actions or tests that must run before the pull request is ready to be approved, so let’s provide some code that will actually be checked. We will solve the `FizzBuzz` problem; see the next section.
@@ -264,7 +283,10 @@ jobs:
Let’s now try to push the solution above to the repository.
+<img src="/images/posts/2020-12-07-practical-guide-better-code/fail.png" alt="GitHub Actions CI job failing due to linter or formatter issues" title="CI Job Failing Example" loading="lazy" style="max-width: 100%; height: auto; border: 1px solid #ddd; border-radius: 4px;" />
+<figcaption><p>Example of a failing CI run—fix issues locally and push updates</p></figcaption>
+</figure>


And we see that it fails on the first check. When it fails it does not proceed to the next steps, but it turns out that the code above for solving the `FizzBuzz` problem will fail on every check.
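
For contrast, here is a minimal sketch of a `FizzBuzz` written with the checks in mind: type hints for mypy, a pytest-style test, and formatting that follows black's conventions as far as I can tell. It is not the post's own `fizzbuzz.py`, and the function names are my own choice.

```python
from typing import List


def fizzbuzz(number: int) -> str:
    """Return the FizzBuzz word for a single positive integer."""
    if number % 15 == 0:
        return "FizzBuzz"
    if number % 3 == 0:
        return "Fizz"
    if number % 5 == 0:
        return "Buzz"
    return str(number)


def fizzbuzz_sequence(limit: int) -> List[str]:
    """Return the FizzBuzz words for 1..limit inclusive."""
    return [fizzbuzz(number) for number in range(1, limit + 1)]


def test_fizzbuzz() -> None:
    """A pytest-style test; pytest collects functions whose names start with test_."""
    assert fizzbuzz(3) == "Fizz"
    assert fizzbuzz(5) == "Buzz"
    assert fizzbuzz(15) == "FizzBuzz"
    assert fizzbuzz(7) == "7"


if __name__ == "__main__":
    print("\n".join(fizzbuzz_sequence(15)))
```

Keeping the logic in small, typed functions gives each tool something concrete to check, and the test can live in the same file or in a separate test module.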
@@ -360,12 +382,18 @@ After the file is created in the repository, run `pre-commit install` to install
Here is a small test: let’s change the neat `fizzbuzz.py` code to get back to the one that does not pass the checks and see what happens. Here is a part of the result: we see where it fails. Note that the pre-commit hook modifies files for some commands (like black or isort).