
jovanovic-djo/dp-tv-show-transcript


This project is still in progress.

Idea Behind the Project

The idea is to produce a transcript of every episode of "Državni Posao" (roughly "Government Job" in English), one of the most popular TV shows in the region. To the author's knowledge, no similar project or dataset exists.

Goal

The dataset opens up many possibilities, for both analysis and general curiosity.

Do you want to find your favorite joke?

  • Just search through the dataset.

Want to find every appearance of a certain character?

  • Again, just search through the dataset.

Interested in doing data analysis of the show?

  • Once more, just go through the dataset.
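As a rough illustration of what such a search could look like, here is a minimal sketch. The dataset format has not been published yet, so the `Line` record, its `episode`, `speaker`, and `text` fields, the `search` helper, and the sample rows are all hypothetical placeholders rather than the project's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Line:
    episode: int   # hypothetical: episode number
    speaker: str   # hypothetical: who says the line
    text: str      # hypothetical: the transcribed line


def search(lines, phrase=None, speaker=None):
    """Return lines that match a phrase and/or a speaker, ignoring case."""
    results = []
    for line in lines:
        # Skip lines that do not contain the requested phrase.
        if phrase is not None and phrase.casefold() not in line.text.casefold():
            continue
        # Skip lines spoken by someone else.
        if speaker is not None and speaker.casefold() != line.speaker.casefold():
            continue
        results.append(line)
    return results


# Made-up sample rows, only to show the shape of the data.
SAMPLE = [
    Line(1, "Speaker A", "Good morning, colleagues."),
    Line(1, "Speaker B", "Is the coffee ready?"),
    Line(2, "Speaker A", "Where is the coffee machine?"),
]
```

Calling `search(SAMPLE, phrase="coffee")` would cover the "favorite joke" case, and `search(SAMPLE, speaker="Speaker A")` the "certain character" case. Both filters can be combined.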

Help of the Community

Working alone, this "one-man" project could take 1-2 years to finish.

This is an open-source project, and any help from the community is welcome.

Methodology

So far, one workable sequence of steps is the following:

  • Scrape titles and links of each episode from YouTube. (Scrapy)
  • Download each episode, either manually or with automation. (Selenium)
  • Upload each episode, or provide a link to it, to a third-party tool that generates a transcript, either manually or with automation. (Selenium)
  • Store episodes and extract valuable data from them into the main dataset.
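The last step above, storing episodes in the main dataset, could be sketched as follows. This is only an assumption about how storage might work: the JSON Lines file layout, the `EpisodeRecord` fields, and the `append_record` and `load_records` helpers are hypothetical and not taken from the project. Only the Python standard library is used; the scraping and transcription steps are left out.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class EpisodeRecord:
    title: str        # hypothetical: title scraped from the episode listing
    url: str          # hypothetical: link to the episode
    transcript: str   # hypothetical: text returned by the transcription tool


def append_record(dataset_path, record):
    """Append one episode as a JSON line, keeping non-ASCII letters readable."""
    with Path(dataset_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")


def load_records(dataset_path):
    """Read the dataset back into a list of EpisodeRecord objects."""
    with Path(dataset_path).open(encoding="utf-8") as f:
        return [EpisodeRecord(**json.loads(line)) for line in f if line.strip()]
```

Appending one line per episode means an interrupted run loses at most the episode in progress. Passing `ensure_ascii=False` keeps Serbian letters such as "Š" and "ž" legible in the file.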

About future contributions: Guidelines and detailed documentation will be uploaded in the near future.

About the TV Show

Main data about the show:

  • First aired: 2012
  • Number of episodes: 2300+
  • Views: 790,000,000+
  • Primary language: Serbian
  • Secondary languages: German, Hungarian, English, Chinese, Slovak, Greek
  • Number of seasons: 13

About

**In Progress** A transcript of every episode of Državni Posao (translated as Government Job), one of the most popular TV shows in the region. The goal of this project is to create a valuable dataset for the whole series, containing general data and an accurate transcript of each episode.
