Commit 92578d0

committed
2 parents cf96301 + 2dc82ae commit 92578d0

4 files changed

+68
-48
lines changed

projects/analyze-twitch-data-with-sqlite/analyze-twitch-data-with-sqlite.mdx

Lines changed: 67 additions & 47 deletions
@@ -37,8 +37,8 @@ In this project tutorial, we will explore a dataset about top Twitch streamers a
3737

3838
The two CSV files we have are:
3939

40-
- [Top 1000 Twitch Streamers (2024)](https://www.kaggle.com/datasets/hibrahimag1/top-1000-twitch-streamers-data-may-2024) featuring streamers like [KaiCenat](https://www.twitch.tv/kaicenat), [Jynxzi](https://www.twitch.tv/jynxzi).
41-
- [Top 1000 Twitch Streamers (2021)](https://www.kaggle.com/datasets/aayushmishra1512/twitchdata) featuring streamers like [xQc](https://www.twitch.tv/xqc), [summit1g](https://www.twitch.tv/summit1g), [TimTheTatman](https://www.twitch.tv/timthetatman), [pokimane](https://www.twitch.tv/pokimane).
40+
- [Top 1000 Twitch Streamers (2024)](https://www.kaggle.com/datasets/hibrahimag1/top-1000-twitch-streamers-data-may-2024) featuring names like [KaiCenat](https://www.twitch.tv/kaicenat), [Jynxzi](https://www.twitch.tv/jynxzi), [shroud](https://www.twitch.tv/shroud).
41+
- [Top 1000 Twitch Streamers (2021)](https://www.kaggle.com/datasets/aayushmishra1512/twitchdata) featuring names like [xQc](https://www.twitch.tv/xqc), [summit1g](https://www.twitch.tv/summit1g), [TimTheTatman](https://www.twitch.tv/timthetatman), [pokimane](https://www.twitch.tv/pokimane).
4242

4343
Do you recognize any of the names?
4444

@@ -56,6 +56,8 @@ sqlite3
5656

5757
You should see a prompt that says `sqlite>`.
5858

59+
![sqlite3 GIF](https://raw.githubusercontent.com/codedex-io/projects/refs/heads/main/projects/analyze-twitch-data-with-sqlite/sqlite3.gif)
60+
5961
Download one of the CSV files and open it up to make sure the CSV file is working:
6062

6163
- **[streamers2024.csv](https://github.com/codedex-io/projects/blob/main/projects/analyze-twitch-data-with-sqlite/streamers2024.csv)**
@@ -70,7 +72,7 @@ So the **streamers2021.csv** data looks like:
7072
But behind the scenes, the plain text is just:
7173

7274
```output
73-
Channel, Watch time, Stream time, Peak viewers, Average viewers, Followers, Followers gained, Views gained, Partnered, Mature, Language
75+
channel, watch_time, stream_time, peak_viewers, average_viewers, followers, followers_gained, views_gained, partnered, mature, language
7476
xQcOW, 6196161750, 215250, 222720, 27716, 3246298, 1734810, 93036735, True, False, English
7577
summit1g, 6091677300, 211845, 310998, 25610, 5310163, 1370184, 89705964, True, False, English
7678
Gaules, 5644590915, 515280, 387315, 109761767635, 1023779102611607, True
@@ -81,17 +83,17 @@ Tfue, 3671000070, 123660, 285644, 29602, 8938903, 206842478998587, False
8183

8284
The column names are:
8385

84-
- Channel
85-
- Watch time
86-
- Stream time
87-
- Peak viewers
88-
- Average viewers
89-
- Followers
90-
- Followers gained
91-
- Views gained
92-
- Partnered
93-
- Mature
94-
- Language
86+
- `channel`
87+
- `watch_time`
88+
- `stream_time`
89+
- `peak_viewers`
90+
- `average_viewers`
91+
- `followers`
92+
- `followers_gained`
93+
- `views_gained`
94+
- `partnered`
95+
- `mature`
96+
- `language`
9597

9698
And the **streamers2024.csv** data looks like:
9799

@@ -107,23 +109,25 @@ RANK, NAME, LANGUAGE, TYPE, MOST_STREAMED_GAME, 2ND_MOST_STREAMED_GAME, AVERAGE_
107109

108110
The column names are:
109111

110-
- RANK
111-
- NAME
112-
- LANGUAGE
113-
- TYPE
114-
- MOST_STREAMED_GAME
115-
- 2ND_MOST_STREAMED_GAME
116-
- AVERAGE_STREAM_DURATION
117-
- FOLLOWERS_GAINED_PER_STREAM
118-
- AVG_VIEWERS_PER_STREAM
119-
- AVG_GAMES_PER_STREAM
120-
- TOTAL_TIME_STREAMED
121-
- TOTAL_FOLLOWERS
122-
- TOTAL_VIEWS
123-
- TOTAL_GAMES_STREAMED
124-
- ACTIVE_DAYS_PER_WEEK
125-
- MOST_ACTIVE_DAY
126-
- DAY_WITH_MOST_FOLLOWERS_GAINED
112+
- `RANK`
113+
- `NAME`
114+
- `LANGUAGE`
115+
- `TYPE`
116+
- `MOST_STREAMED_GAME`
117+
- `2ND_MOST_STREAMED_GAME`
118+
- `AVERAGE_STREAM_DURATION`
119+
- `FOLLOWERS_GAINED_PER_STREAM`
120+
- `AVG_VIEWERS_PER_STREAM`
121+
- `AVG_GAMES_PER_STREAM`
122+
- `TOTAL_TIME_STREAMED`
123+
- `TOTAL_FOLLOWERS`
124+
- `TOTAL_VIEWS`
125+
- `TOTAL_GAMES_STREAMED`
126+
- `ACTIVE_DAYS_PER_WEEK`
127+
- `MOST_ACTIVE_DAY`
128+
- `DAY_WITH_MOST_FOLLOWERS_GAINED`
129+
130+
Go in and update all the column names to lowercase.
127131

128132
There are 1,001 rows in each of the datasets because there is 1 row for the headings and 1,000 streamers each!
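
If you want to check the row count yourself, here is a minimal Python sketch. It assumes the first line of the CSV is the header row, and it uses a three-line excerpt of **streamers2021.csv** (the `SAMPLE` string and the `count_rows` helper are made up for illustration) instead of the full file:

```python
import csv
import io

# Small excerpt of streamers2021.csv: 1 header row + 2 streamers.
SAMPLE = (
    "channel,watch_time,stream_time,peak_viewers,average_viewers,"
    "followers,followers_gained,views_gained,partnered,mature,language\n"
    "xQcOW,6196161750,215250,222720,27716,3246298,1734810,93036735,True,False,English\n"
    "summit1g,6091677300,211845,310998,25610,5310163,1370184,89705964,True,False,English\n"
)


def count_rows(text):
    """Return (total_rows, data_rows) for CSV text whose first row is a header."""
    rows = list(csv.reader(io.StringIO(text)))
    return len(rows), max(len(rows) - 1, 0)


total, data = count_rows(SAMPLE)
print(total, data)  # 3 2
```

On the full file, the same function should report 1,001 total rows and 1,000 streamers.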
129133

@@ -137,33 +141,35 @@ In the terminal, type:
137141
sqlite3 twitch.db
138142
```
139143

140-
Inside the SQLite prompt, create a table called `streams`:
144+
Inside the SQLite prompt, create a table called `streamers`:
141145

142146
```sql
143147
CREATE TABLE streamers (
144148
channel TEXT PRIMARY KEY,
145149
watch_time INTEGER,
146150
stream_time INTEGER,
147151
peak_viewers INTEGER,
152+
average_viewers INTEGER,
148153
followers INTEGER,
149154
followers_gained INTEGER,
155+
views_gained INTEGER,
150156
partnered TEXT,
151157
mature TEXT,
152158
language TEXT
153159
);
154160
```
155161

156-
- Channel
157-
- Watch time
158-
- Stream time
159-
- Peak viewers
160-
- Average viewers
161-
- Followers
162-
- Followers gained
163-
- Views gained
164-
- Partnered
165-
- Mature
166-
- Language
162+
- `channel`
163+
- `watch_time`
164+
- `stream_time`
165+
- `peak_viewers`
166+
- `average_viewers`
167+
- `followers`
168+
- `followers_gained`
169+
- `views_gained`
170+
- `partnered`
171+
- `mature`
172+
- `language`
167173

168174
The column names and data types here need to match the columns in the CSV file, or else there will be an error later.
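
One way to double-check the match before importing is sketched below in Python. It builds the same `streamers` table in a throwaway in-memory database and compares its column names against the CSV header. The helper names (`table_columns`, `csv_header`) are made up for this example, and the header line is copied from the updated **streamers2021.csv**:

```python
import csv
import io
import sqlite3

CREATE_SQL = """
CREATE TABLE streamers (
  channel TEXT PRIMARY KEY,
  watch_time INTEGER,
  stream_time INTEGER,
  peak_viewers INTEGER,
  average_viewers INTEGER,
  followers INTEGER,
  followers_gained INTEGER,
  views_gained INTEGER,
  partnered TEXT,
  mature TEXT,
  language TEXT
);
"""

# Header row of the updated streamers2021.csv.
HEADER_LINE = (
    "channel,watch_time,stream_time,peak_viewers,average_viewers,"
    "followers,followers_gained,views_gained,partnered,mature,language\n"
)


def table_columns(conn, table):
    """Column names of a table, in declaration order."""
    # PRAGMA table_info returns one row per column; index 1 is the name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def csv_header(text):
    """First row of the CSV text, as a list of column names."""
    return next(csv.reader(io.StringIO(text)))


conn = sqlite3.connect(":memory:")  # temporary database, nothing is saved
conn.execute(CREATE_SQL)

columns = table_columns(conn, "streamers")
header = csv_header(HEADER_LINE)
print(columns == header)  # True
```

If this prints `False`, comparing the two lists side by side usually shows the misspelled or missing column.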
169175

@@ -173,25 +179,39 @@ It’s time to move our data from the CSV file into a SQL table.
173179

174180
Make sure your CSV file is in the same folder. Then in the SQLite prompt:
175181

176-
```
182+
```terminal
177183
.mode csv
178-
.import twitch_data.csv streams
184+
.import streamers2021.csv streamers
179185
```
180186

181187
To make sure it’s working:
182188

183189
```
184190
SELECT *
185-
FROM streams
191+
FROM streamers
186192
LIMIT 10;
187193
```
188194

189195
You should see something like:
190196

191-
[screenshot image]
197+
```output
198+
xQcOW,6196161750,215250,222720,27716,3246298,1734810,93036735,True,False,English
199+
summit1g,6091677300,211845,310998,25610,5310163,1370184,89705964,True,False,English
200+
Gaules,5644590915,515280,387315,10976,1767635,1023779,102611607,True,True,Portuguese
201+
ESL_CSGO,3970318140,517740,300575,7714,3944850,703986,106546942,True,False,English
202+
Tfue,3671000070,123660,285644,29602,8938903,2068424,78998587,True,False,English
203+
Asmongold,3668799075,82260,263720,42414,1563438,554201,61715781,True,False,English
204+
NICKMERCS,3360675195,136275,115633,24181,4074287,1089824,46084211,True,False,English
205+
Fextralife,3301867485,147885,68795,18985,508816,425468,670137548,True,False,English
206+
loltyler1,2928356940,122490,89387,22381,3530767,951730,51349926,True,False,English
207+
208+
sqlite>
209+
```
192210

193211
The data is now in a table, and we are ready to analyze it. ✅
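
For comparison, here is a rough Python sketch of the same import-then-check flow using the standard `sqlite3` module. It is not what the tutorial runs. It uses an in-memory database and a three-streamer excerpt of the CSV (`SAMPLE` and `load_streamers` are made-up names), and it skips the header row explicitly. In recent versions of the sqlite3 shell, the `--skip 1` option of `.import` does the same job when the table already exists.

```python
import csv
import io
import sqlite3

# Excerpt of streamers2021.csv: header row plus three streamers.
SAMPLE = (
    "channel,watch_time,stream_time,peak_viewers,average_viewers,"
    "followers,followers_gained,views_gained,partnered,mature,language\n"
    "xQcOW,6196161750,215250,222720,27716,3246298,1734810,93036735,True,False,English\n"
    "summit1g,6091677300,211845,310998,25610,5310163,1370184,89705964,True,False,English\n"
    "Gaules,5644590915,515280,387315,10976,1767635,1023779,102611607,True,True,Portuguese\n"
)


def load_streamers(conn, text):
    """Insert every data row of the CSV text into streamers; return the row count."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row so it is not stored as a streamer
    rows = list(reader)
    conn.executemany(
        "INSERT INTO streamers VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    return len(rows)


conn = sqlite3.connect(":memory:")  # temporary database, nothing is saved
conn.execute(
    "CREATE TABLE streamers ("
    "channel TEXT PRIMARY KEY, watch_time INTEGER, stream_time INTEGER, "
    "peak_viewers INTEGER, average_viewers INTEGER, followers INTEGER, "
    "followers_gained INTEGER, views_gained INTEGER, "
    "partnered TEXT, mature TEXT, language TEXT)"
)

loaded = load_streamers(conn, SAMPLE)
first_rows = conn.execute("SELECT * FROM streamers LIMIT 10").fetchall()

print(loaded)
for row in first_rows:
    print(row)
```

The CSV values arrive as text, but because the numeric columns are declared `INTEGER`, SQLite stores them as integers, so they can be sorted and summed later.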
194212

213+
**Note:** If you ever want to start over with a new table, you can delete a table with `DROP TABLE streamers;`
214+
195215
### Getting a Feel for the Dataset
196216

197217
So usually, I like to start by selecting the first 10-20 rows from the table to see the column names:

projects/analyze-twitch-data-with-sqlite/streamers2021.csv

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
Channel,Watch time(Minutes),Stream time(minutes),Peak viewers,Average viewers,Followers,Followers gained,Views gained,Partnered,Mature,Language
1+
channel,watch_time,stream_time,peak_viewers,average_viewers,followers,followers_gained,views_gained,partnered,mature,language
22
xQcOW,6196161750,215250,222720,27716,3246298,1734810,93036735,True,False,English
33
summit1g,6091677300,211845,310998,25610,5310163,1370184,89705964,True,False,English
44
Gaules,5644590915,515280,387315,10976,1767635,1023779,102611607,True,True,Portuguese