Last updated: November 27, 2023
- OCW Video Lectures: results.csv
This is a simple crawler to save the available courses on MIT OpenCourseWare. This crawler will export the courses with video lectures as a CSV file.
You can crawl for courses other than video lectures by changing the @start_urls in crawler.rb.
Docker is the simplest way to run the crawler. The commands below build the image, run the crawler, and save the results in results.csv through a Docker volume.
$ docker build -t ocw-crawl:1.0 .
$ docker run --volume $(pwd)/results.csv:/app/results.csv \
    --rm \
    --name ocw-crawl \
    ocw-crawl:1.0

To run the crawler without Docker, you'll need to install an older version of Ruby that's compatible with kimurai. You'll also need geckodriver and Firefox. If you run into trouble, see the kimurai setup documentation.
Install Ruby 2.5.0 and run bundle install.
$ asdf install ruby 2.5.0
$ asdf global ruby 2.5.0
$ gem install bundler
$ bundle install # install dependencies

$ ruby crawler.rb
...

- Use OCW Sitemaps to crawl all courses
- Get more information about each course from the sitemap
- Course materials often follow these patterns:
  - Syllabus: /pages/syllabus/
  - Course download: /download/
  - Resources: /resources/*/
    - PDFs, slides, lecture notes, etc.
  - Course pages: /pages/*/
    - Readings: /pages/readings/
- Turn the data into an app or API
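
The URL patterns listed above could be used to label each sitemap URL by the kind of material it points to. The following is a minimal sketch, not part of crawler.rb: the `classify_material` method, the `MATERIAL_PATTERNS` constant, the category symbols, and the `example-course` slug are all hypothetical names chosen for illustration. It assumes the patterns appear at the end of the URL path.

```ruby
require "uri"

# Hypothetical mapping from the path patterns listed above to category labels.
# Ordered from most to least specific, so /pages/syllabus/ and /pages/readings/
# are checked before the generic /pages/*/ pattern.
MATERIAL_PATTERNS = [
  [:syllabus,        %r{/pages/syllabus/?\z}],
  [:readings,        %r{/pages/readings/?\z}],
  [:course_download, %r{/download/?\z}],
  [:resource,        %r{/resources/[^/]+/?\z}],
  [:course_page,     %r{/pages/[^/]+/?\z}]
].freeze

# Returns the category symbol for a URL (or bare path), or :other when no
# pattern matches.
def classify_material(url)
  path = URI.parse(url).path
  match = MATERIAL_PATTERNS.find { |_category, pattern| path.match?(pattern) }
  match ? match.first : :other
end

# "example-course" is a made-up slug used only for this demonstration.
puts classify_material("https://ocw.mit.edu/courses/example-course/pages/syllabus/")
```

Checking the specific patterns first matters because a syllabus URL also matches the generic course-page pattern; with the order above it is labeled as a syllabus rather than a plain course page.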