Open
Description
GitHub appears to rate-limit scraping, at least for repo homepages.
From one IP address we get around 2 repos per second, but from 20 different IP addresses (all in the same datacenter, toolkit) it is only 2-3 times faster. We see a lot of status code 429 responses, i.e. rate-limiting events. I wonder whether this is general GitHub policy or something specific to our datacenter?
The experiment code is here: https://github.com/bigcode-project/bigcode-analysis/blob/github_scraping_test/data_analysis/github_scraping_test/github_scrapping_test.ipynb
Could someone run this experiment on their Ray cluster, or repeat the test some other way from their own range of IP addresses?
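
For anyone who wants to repeat the test without the notebook, here is a minimal single-machine sketch of the kind of measurement I mean. It is not the notebook's code: `fetch_status` and `measure` are hypothetical names, the User-Agent string is a placeholder, and it only uses the Python standard library. It fetches a list of URLs sequentially and reports how many came back 200 vs 429 and the resulting successful requests per second.

```python
import time
from collections import Counter
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None on a network error."""
    # Placeholder User-Agent; an assumption, not what the notebook sends.
    request = Request(url, headers={"User-Agent": "rate-limit-probe"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # urlopen raises for 4xx/5xx; the code (e.g. 429) is on the exception.
        return err.code
    except OSError:
        # Covers URLError, timeouts and other connection failures.
        return None


def measure(urls, fetch=fetch_status, clock=time.monotonic):
    """Fetch each URL once, in order, and summarise status codes and throughput."""
    counts = Counter()
    start = clock()
    for url in urls:
        counts[fetch(url)] += 1
    elapsed = clock() - start
    ok = counts[200]
    return {
        "requests": sum(counts.values()),
        "ok": ok,
        "rate_limited": counts[429],
        "elapsed_s": elapsed,
        "ok_per_s": ok / elapsed if elapsed > 0 else float("nan"),
    }
```

Calling `measure(...)` with a list of repo homepage URLs from each machine and comparing `ok_per_s` and `rate_limited` across IP ranges should show whether the limit is per IP or something broader. `fetch` and `clock` are parameters only so the function can be checked without touching the network.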