@@ -37,8 +37,8 @@ In this project tutorial, we will explore a dataset about top Twitch streamers a
3737
3838The two CSV files we have are:
3939
40- - [Top 1000 Twitch Streamers (2024)](https://www.kaggle.com/datasets/hibrahimag1/top-1000-twitch-streamers-data-may-2024) featuring streamers like [KaiCenat](https://www.twitch.tv/kaicenat), [Jynxzi](https://www.twitch.tv/jynxzi).
41- - [Top 1000 Twitch Streamers (2021)](https://www.kaggle.com/datasets/aayushmishra1512/twitchdata) featuring streamers like [xQc](https://www.twitch.tv/xqc), [summit1g](https://www.twitch.tv/summit1g), [TimTheTatman](https://www.twitch.tv/timthetatman), [pokimane](https://www.twitch.tv/pokimane).
40+ - [Top 1000 Twitch Streamers (2024)](https://www.kaggle.com/datasets/hibrahimag1/top-1000-twitch-streamers-data-may-2024) featuring names like [KaiCenat](https://www.twitch.tv/kaicenat), [Jynxzi](https://www.twitch.tv/jynxzi), [shroud](https://www.twitch.tv/shroud).
41+ - [Top 1000 Twitch Streamers (2021)](https://www.kaggle.com/datasets/aayushmishra1512/twitchdata) featuring names like [xQc](https://www.twitch.tv/xqc), [summit1g](https://www.twitch.tv/summit1g), [TimTheTatman](https://www.twitch.tv/timthetatman), [pokimane](https://www.twitch.tv/pokimane).
4242
4343Do you recognize any of the names?
4444
@@ -56,6 +56,8 @@ sqlite3
5656
5757You should see a prompt that says `sqlite>`.
5858
59+ ![sqlite3 GIF](https://raw.githubusercontent.com/codedex-io/projects/refs/heads/main/projects/analyze-twitch-data-with-sqlite/sqlite3.gif)
60+
5961Download one of the CSV files and open it up to make sure the CSV file is working:
6062
6163- **[streamers2024.csv](https://github.com/codedex-io/projects/blob/main/projects/analyze-twitch-data-with-sqlite/streamers2024.csv)**
@@ -70,7 +72,7 @@ So the **streamers2021.csv** data looks like:
7072But behind the scenes, the plain text is just:
7173
7274``` output
73- Channel, Watch time, Stream time, Peak viewers, Average viewers, Followers, Followers gained, Views gained, Partnered, Mature, Language
75+ channel, watch_time, stream_time, peak_viewers, average_viewers, followers, followers_gained, views_gained, partnered, mature, language
7476xQcOW, 6196161750, 215250, 222720, 27716, 3246298, 1734810, 93036735, True, False, English
7577summit1g, 6091677300, 211845, 310998, 25610, 5310163, 1370184, 89705964, True, False, English
7678Gaules, 5644590915, 515280, 387315, 10976, 1767635, 1023779, 102611607, True, True, Portuguese
@@ -81,17 +83,17 @@ Tfue, 3671000070, 123660, 285644, 29602, 8938903, 206842478998587, False
8183
8284The column names are:
8385
84- - Channel
85- - Watch time
86- - Stream time
87- - Peak viewers
88- - Average viewers
89- - Followers
90- - Followers gained
91- - Views gained
92- - Partnered
93- - Mature
94- - Language
86+ - ` channel `
87+ - ` watch_time `
88+ - ` stream_time `
89+ - ` peak_viewers `
90+ - ` average_viewers `
91+ - ` followers `
92+ - ` followers_gained `
93+ - ` views_gained `
94+ - ` partnered `
95+ - ` mature `
96+ - ` language `
9597
9698And the **streamers2024.csv** data looks like:
9799
@@ -107,23 +109,25 @@ RANK, NAME, LANGUAGE, TYPE, MOST_STREAMED_GAME, 2ND_MOST_STREAMED_GAME, AVERAGE_
107109
108110The column names are:
109111
110- - RANK
111- - NAME
112- - LANGUAGE
113- - TYPE
114- - MOST_STREAMED_GAME
115- - 2ND_MOST_STREAMED_GAME
116- - AVERAGE_STREAM_DURATION
117- - FOLLOWERS_GAINED_PER_STREAM
118- - AVG_VIEWERS_PER_STREAM
119- - AVG_GAMES_PER_STREAM
120- - TOTAL_TIME_STREAMED
121- - TOTAL_FOLLOWERS
122- - TOTAL_VIEWS
123- - TOTAL_GAMES_STREAMED
124- - ACTIVE_DAYS_PER_WEEK
125- - MOST_ACTIVE_DAY
126- - DAY_WITH_MOST_FOLLOWERS_GAINED
112+ - ` RANK `
113+ - ` NAME `
114+ - ` LANGUAGE `
115+ - ` TYPE `
116+ - ` MOST_STREAMED_GAME `
117+ - ` 2ND_MOST_STREAMED_GAME `
118+ - ` AVERAGE_STREAM_DURATION `
119+ - ` FOLLOWERS_GAINED_PER_STREAM `
120+ - ` AVG_VIEWERS_PER_STREAM `
121+ - ` AVG_GAMES_PER_STREAM `
122+ - ` TOTAL_TIME_STREAMED `
123+ - ` TOTAL_FOLLOWERS `
124+ - ` TOTAL_VIEWS `
125+ - ` TOTAL_GAMES_STREAMED `
126+ - ` ACTIVE_DAYS_PER_WEEK `
127+ - ` MOST_ACTIVE_DAY `
128+ - ` DAY_WITH_MOST_FOLLOWERS_GAINED `
129+
130+ Go in and update all the column names to lowercase.
127131
128132There are 1,001 rows in each of the datasets because there is 1 row for the headings and 1,000 rows of streamers!
129133
@@ -137,33 +141,35 @@ In the terminal, type:
137141sqlite3 twitch.db
138142```
139143
140- Inside the SQLite prompt, create a table called ` streams ` :
144+ Inside the SQLite prompt, create a table called `streamers`:
141145
142146``` sql
143147CREATE TABLE streamers (
144148 channel TEXT PRIMARY KEY,
145149 watch_time INTEGER,
146150 stream_time INTEGER,
147151 peak_viewers INTEGER,
152+ average_viewers INTEGER,
148153 followers INTEGER,
149154 followers_gained INTEGER,
155+ views_gained INTEGER,
150156 partnered TEXT,
151157 mature TEXT,
152158 language TEXT
153159);
154160```
155161
156- - Channel
157- - Watch time
158- - Stream time
159- - Peak viewers
160- - Average viewers
161- - Followers
162- - Followers gained
163- - Views gained
164- - Partnered
165- - Mature
166- - Language
162+ - ` channel `
163+ - ` watch_time `
164+ - ` stream_time `
165+ - ` peak_viewers `
166+ - ` average_viewers `
167+ - ` followers `
168+ - ` followers_gained `
169+ - ` views_gained `
170+ - ` partnered `
171+ - ` mature `
172+ - ` language `
167173
168174These column names and data types have to line up with the CSV columns, in the same order, or else the import will run into problems later.
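One way to double-check the table definition before importing is to ask SQLite to list the table's columns. This is only a sketch, and it assumes the `streamers` table from the `CREATE TABLE` statement above already exists in `twitch.db`:

```sql
-- List each column of the streamers table with its position, name, and declared type.
-- Compare the names and their order against the header line of streamers2021.csv.
PRAGMA table_info(streamers);
```

Each row of the result describes one column, so the list should have one entry for every column in the CSV header.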
169175
@@ -173,25 +179,39 @@ It’s time to move our data from the CSV file into a SQL table.
173179
174180Make sure your CSV file is in the same folder. Then in the SQLite prompt:
175181
176- ```
182+ ``` terminal
177183.mode csv
178- .import twitch_data.csv streams
184+ .import streamers2021.csv streamers
179185```
180186
181187To make sure it’s working:
182188
183189```
184190SELECT *
185- FROM streams
191+ FROM streamers
186192LIMIT 10;
187193```
188194
189195You should see something like:
190196
191- [ screenshot image]
197+ ``` output
198+ xQcOW,6196161750,215250,222720,27716,3246298,1734810,93036735,True,False,English
199+ summit1g,6091677300,211845,310998,25610,5310163,1370184,89705964,True,False,English
200+ Gaules,5644590915,515280,387315,10976,1767635,1023779,102611607,True,True,Portuguese
201+ ESL_CSGO,3970318140,517740,300575,7714,3944850,703986,106546942,True,False,English
202+ Tfue,3671000070,123660,285644,29602,8938903,2068424,78998587,True,False,English
203+ Asmongold,3668799075,82260,263720,42414,1563438,554201,61715781,True,False,English
204+ NICKMERCS,3360675195,136275,115633,24181,4074287,1089824,46084211,True,False,English
205+ Fextralife,3301867485,147885,68795,18985,508816,425468,670137548,True,False,English
206+ loltyler1,2928356940,122490,89387,22381,3530767,951730,51349926,True,False,English
207+
208+ sqlite>
209+ ```
192210
193211The data is now in a table, and we are ready to analyze it. ✅
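Before analyzing, a quick sanity check can help. When a CSV file is imported into a table that already exists, the sqlite3 shell may treat the header line as an ordinary row of data. The sketch below assumes the `streamers` table and the lowercase `channel` header described above. It counts the rows, looks for a stray header row, and removes it if one is there:

```sql
-- Count the imported rows; 1,000 is expected if only streamers were loaded.
SELECT COUNT(*) FROM streamers;

-- Look for a stray header row (present only if the header line was imported as data).
SELECT * FROM streamers WHERE channel = 'channel';

-- If the query above returned a row, remove it. This deletes nothing otherwise.
DELETE FROM streamers WHERE channel = 'channel';
```

Recent versions of the sqlite3 shell also document a `--skip 1` option for `.import` that skips the header line up front. Whether it is available depends on the installed version, so the check above is the more portable route.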
194212
213+ **Note:** If you ever want to start over with a new table, you can delete the old one with `DROP TABLE streamers;`.
214+
195215### Getting a Feel for the Dataset
196216
197217So usually, I like to start by selecting the first 10-20 rows from the table to see the column names: