GitHub API
##Available Data
###GitHub data is composed of three kinds of data: User info, Repo info, and Org info
- User info
  - username
  - payment info
  - repo list
  - org list
  - following
  - followers
- Repo info:
  - Code, i.e. actual files
  - Commits:
    - code
    - timestamps
    - users
    - messages
  - Wiki pages:
    - essentially equivalent to code files but stored in a separate repo named .wiki
  - Issues:
    - issue numbers
    - users
    - timestamps
    - messages
    - status
    - milestones
    - features
  - Forks
  - pull requests
  - watchers
  - Contributor stats
  - User/org ownership
- Org info
  - repos
  - users
###The GitHub API provides a convenient way to access all of this data. It is documented in GitHub's official API documentation and is fairly straightforward.
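As a rough illustration of how the three kinds of data map onto API calls, the sketch below builds requests for a handful of REST endpoints. The helper names (`ENDPOINTS`, `build_request`, `fetch_json`) and the `User-Agent` string are made up for this example, and the endpoint list is a small sample rather than a complete mapping of the data listed above.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"

# A small sample of REST paths for user, org, and repo data.
# The dictionary keys are hypothetical labels chosen for this example.
ENDPOINTS = {
    "user": "/users/{name}",
    "user_repos": "/users/{name}/repos",
    "org": "/orgs/{name}",
    "org_repos": "/orgs/{name}/repos",
    "repo": "/repos/{owner}/{repo}",
    "repo_issues": "/repos/{owner}/{repo}/issues",
    "repo_commits": "/repos/{owner}/{repo}/commits",
}


def build_request(kind, token=None, **params):
    """Build (but do not send) a GET request for one of the endpoints above."""
    url = API_ROOT + ENDPOINTS[kind].format(**params)
    headers = {
        "Accept": "application/vnd.github+json",
        # Placeholder client name; GitHub expects some User-Agent to be set.
        "User-Agent": "github-api-notes-example",
    }
    if token:
        # An OAuth or personal access token identifies the user we act for.
        headers["Authorization"] = "Bearer " + token
    return urllib.request.Request(url, headers=headers)


def fetch_json(request):
    """Send the request and decode the JSON body (performs network I/O)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `fetch_json(build_request("repo_issues", owner="octocat", repo="Hello-World"))` would request the issue list for that repository; passing `token=...` makes the call on behalf of a specific user.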
##Rate Limiting
###Every authenticated user is limited to 5000 requests/hour.
This limitation is not as bad as it seems, however, since the rate is per user, not per application. In particular, if a user authorizes our site to access GitHub on their behalf, we can make 5000 requests in an hour on behalf of that user, and another 5000 for each other user.
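To make the per-user accounting concrete, here is a minimal sketch that reads the rate-limit headers GitHub attaches to API responses and sums the remaining budget across authorized users. The `RateLimit` class and the function names are hypothetical; only the three `X-RateLimit-*` header names come from the API itself.

```python
from dataclasses import dataclass


@dataclass
class RateLimit:
    limit: int        # requests allowed per window (5000 for an authorized user)
    remaining: int    # requests left in the current window
    reset_epoch: int  # Unix time at which the window resets


def parse_rate_limit(headers):
    """Read the rate-limit headers from a response's header mapping.

    With a plain dict the keys must match exactly; the header object that
    urllib returns does case-insensitive lookups.
    """
    return RateLimit(
        limit=int(headers["X-RateLimit-Limit"]),
        remaining=int(headers["X-RateLimit-Remaining"]),
        reset_epoch=int(headers["X-RateLimit-Reset"]),
    )


def total_budget(limits_by_user):
    """Sum the requests still available across all authorized users."""
    return sum(limit.remaining for limit in limits_by_user.values())


def can_spend(limits_by_user, user, cost):
    """Check whether `user` still has room for `cost` more requests."""
    return limits_by_user[user].remaining >= cost
```

With two authorized users who have not made any requests yet, `total_budget` would report 10000 for the hour, which is the point of the per-user limit described above.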
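The best ways to avoid running into rate limits include caching as much as possible, relying on GitHub's precomputed statistics instead of recomputing them from raw data, and using "free" conditional requests, which do not count against the limit when the response is 304 Not Modified. We have written to GitHub requesting additional access, but have not heard back yet.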
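Caching and conditional requests combine naturally: store each response body with its `ETag`, then send that value back in `If-None-Match` on the next request. The sketch below is one possible shape for this, not a tested production cache. `ETagCache`, the injectable `send` callable, and the `User-Agent` string are assumptions introduced for the example.

```python
import urllib.error
import urllib.request


class ETagCache:
    """Cache response bodies by URL and revalidate them with If-None-Match."""

    def __init__(self, send=None):
        # `send(url, headers)` must return (status, response_headers, body).
        # It is injectable so the cache can be exercised without the network.
        self._send = send or _send_with_urllib
        self._entries = {}  # url -> (etag, body)

    def get(self, url, token=None):
        headers = {"User-Agent": "github-api-notes-example"}  # placeholder name
        if token:
            headers["Authorization"] = "Bearer " + token
        cached = self._entries.get(url)
        if cached:
            headers["If-None-Match"] = cached[0]
        status, response_headers, body = self._send(url, headers)
        if status == 304 and cached:
            # Not modified: reuse the stored body.
            return cached[1]
        etag = response_headers.get("ETag")
        if status == 200 and etag:
            self._entries[url] = (etag, body)
        return body


def _send_with_urllib(url, headers):
    """Default transport; urllib reports 304 as an HTTPError, so unwrap it."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.headers, response.read()
    except urllib.error.HTTPError as error:
        if error.code == 304:
            return 304, error.headers, b""
        raise
```

The in-memory dictionary is the simplest possible store; a shared or persistent store would be needed if several processes make requests for the same user.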
The other major way of accessing GitHub data is through git itself, by cloning repos locally. This gives us all files, commits, timestamps, and wikis, but not issues, forks, or per-user/org lists of repos. We also still need authentication for private repos. Still, this might be worth looking into if we want to compute statistics that GitHub doesn't have, or to work around certain rate-limit issues.
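A minimal sketch of the git-based route, assuming `git` is installed and the repo has already been cloned locally: it derives the clone URLs for a repo and its wiki, then reads commit hashes, authors, timestamps, and messages from `git log`. The function names and the choice of the `0x1f` byte as a field separator are illustrative choices, not something the project has settled on.

```python
import subprocess

# Commit hash, author name, ISO 8601 author date, and subject line,
# separated by the 0x1f byte so that spaces in messages are harmless.
LOG_FORMAT = "%H%x1f%an%x1f%aI%x1f%s"


def clone_urls(owner, repo):
    """Return HTTPS clone URLs for a repo and its separate wiki repo."""
    base = f"https://github.com/{owner}/{repo}"
    return {"code": base + ".git", "wiki": base + ".wiki.git"}


def parse_log(output):
    """Turn `git log` output in LOG_FORMAT into a list of dicts."""
    commits = []
    for line in output.split("\n"):
        if not line:
            continue
        sha, author, timestamp, message = line.split("\x1f", 3)
        commits.append(
            {"sha": sha, "author": author, "timestamp": timestamp, "message": message}
        )
    return commits


def read_commits(repo_dir):
    """Run `git log` in an existing local clone and parse the result."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "--pretty=format:" + LOG_FORMAT],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_log(result.stdout)
```

Because everything here runs against the local clone, none of it counts toward the API rate limit; only the initial clone and later fetches touch GitHub.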