This project is building a transcript of every episode of "Državni Posao" (roughly "Government Job" in English), one of the most popular TV shows in the region. There is no similar project or dataset out there.
Goal
Possibilities are unlimited, for both analysis and general curiosity.
Do you want to find your favorite joke?
Just search through the dataset.
Wanna watch all the occurrences of a certain character?
Again, just search through the dataset.
Interested in doing your own data analysis of the show?
Once more, just go through the dataset.
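As a rough illustration of what "searching through the dataset" could look like, here is a minimal Python sketch. The dataset format is not published yet, so the Episode fields (number, title, transcript), the search helper, and the sample records are all hypothetical placeholders, not real transcript text.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical schema; the real dataset layout may differ.
    number: int
    title: str
    transcript: str


def search(episodes, phrase):
    """Return (episode number, title, line) for every transcript line
    that contains `phrase`, ignoring case."""
    needle = phrase.casefold()
    hits = []
    for ep in episodes:
        for line in ep.transcript.splitlines():
            if needle in line.casefold():
                hits.append((ep.number, ep.title, line.strip()))
    return hits


# Placeholder records for illustration only (not actual show dialogue).
EPISODES = [
    Episode(1, "Placeholder title A",
            "CHARACTER_A: first placeholder line\nCHARACTER_B: a joke about coffee"),
    Episode(2, "Placeholder title B",
            "CHARACTER_B: second placeholder line\nCHARACTER_A: more Coffee talk"),
]

if __name__ == "__main__":
    for number, title, line in search(EPISODES, "coffee"):
        print(number, title, line)
```

If transcript lines carried a speaker label, the same helper would cover the "all occurrences of a character" case by searching for that label, for example search(EPISODES, "CHARACTER_A:").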
Help of the Community
As a "one man" project, this would probably take 1-2 years to finish.
This is an open-source project, and any help from the community is welcome.
Methodology
So far, the working steps are the following:
Scrape titles and links of each episode from YouTube. (Scrapy)
Download each episode, manually or with automation. (Selenium)
Upload each episode, or provide a link to it, to a 3rd party tool that generates a transcript, manually or with automation. (Selenium)
Store episodes and extract valuable data from them into the main dataset.
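The last step above could be sketched as follows. This is only one possible approach, under assumptions that are not part of the project: the dataset is kept as a JSON Lines file, each record has title, url, transcript, and word_count fields, and the helper names append_episode and load_dataset are made up for illustration. The scraping, download, and transcription steps are not shown.

```python
import json
from pathlib import Path


def append_episode(dataset_path, title, url, transcript):
    """Append one episode record to a JSON Lines file and return it.

    The field names are assumptions, not the project's actual schema.
    """
    record = {
        "title": title,
        "url": url,
        "transcript": transcript,
        # A simple derived value as an example of "extracted" data.
        "word_count": len(transcript.split()),
    }
    with open(dataset_path, "a", encoding="utf-8") as f:
        # ensure_ascii=False keeps characters such as "ž" readable in the file.
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def load_dataset(dataset_path):
    """Read all records back; an absent file yields an empty list."""
    path = Path(dataset_path)
    if not path.exists():
        return []
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Placeholder values; the URL and transcript are not real episode data.
    append_episode("episodes.jsonl", "Placeholder title",
                   "https://example.com/episode-1", "placeholder transcript text")
    print(len(load_dataset("episodes.jsonl")))
```

An append-only file like this keeps each episode independent, so a failed download or transcription for one episode does not affect records that are already stored.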
About future contributions: Guidelines and detailed documentation will be uploaded in the near future.
**In Progress** A transcript of every episode of one of the most popular TV shows in the region, Državni Posao (translated as Government Job). The goal of this project is to create a valuable dataset for the whole series, containing general data and an accurate transcript of each episode.