|
1 | | -# GitHub and Travis mining utility |
| 1 | +# GitHub API Mining Utility |
| 2 | + |
| 3 | +This is a simplified repository miner, based on [caiusb/miner-utils](https://github.com/caiusb/miner-utils), that targets the GitHub REST API (v3). |
2 | 4 |
|
3 | 5 | ## Installation |
4 | 6 |
|
5 | | -Run `pip install "git+https://github.com/caiusb/miner-utils"` |
| 7 | +### Requirements |
| 8 | +The following must be installed and available for the mining utility: |
| 9 | + * [Python 3](https://www.python.org/downloads/) |
| 10 | + * [`pip`](https://pypi.org/project/pip/) |
| 11 | + |
| 12 | +To verify that these packages are installed and up to date, run the following commands in a terminal/console: |
| 13 | +```bash |
| 14 | +python --version  # on some systems, use: python3 --version |
| 15 | +# example: |
| 16 | +# > python3 --version |
| 17 | +# Python 3.7.4 |
| 18 | + |
| 19 | +pip --version  # on some systems, use: pip3 --version |
| 20 | +# example: |
| 21 | +# > pip --version |
| 22 | +# pip 20.2.3 from /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pip (python 3.7) |
| 23 | +``` |
| 24 | + |
| 25 | +### Installing the mining utility |
| 26 | +To install the mining utility into the global Python environment, run the following command in a terminal/console: |
| 27 | +```bash |
| 28 | +pip install "git+https://github.com/EPICLab/miner-utils" |
| 29 | +``` |
| 30 | + |
| 31 | +To install the mining utility from an interactive shell such as IPython or a Jupyter notebook, run the following command in a code cell: |
| 32 | +```python |
| 33 | +!pip install 'git+https://github.com/EPICLab/miner-utils' |
| 34 | +``` |
6 | 35 |
|
7 | 36 | ## Usage |
8 | 37 |
|
9 | | -### Instantiating a miner |
| 38 | +The GitHub REST API (v3) limits the number of requests that can be made within a given time window. |
10 | 39 |
|
11 | | -To instantiate a GitHub miner, simply call the constructor: |
| 40 | +For API requests using Basic Authentication or OAuth, you can make up to 5000 requests per hour. Authenticated requests are associated with the authenticated user, regardless of whether Basic Authentication or an OAuth token was used. This means that all OAuth applications authorized by a user share the same quota of 5000 requests per hour when they authenticate with different tokens owned by the same user. |
12 | 41 |
|
13 | | -``` |
14 | | -gh = GitHub(); |
15 | | -``` |
| 42 | +For unauthenticated requests, the rate limit allows for up to 60 requests per hour. Unauthenticated requests are associated with the originating IP address, and not the user making the requests. |
16 | 43 |
|
17 | | -The contructor takes 2 optional arguments, a username and a token. It is recommended that you use them, in order to greatly reduce the time it takes to collect the data: |
| 44 | +For more information on GitHub's rate limiting policy, see the [rate limiting documentation](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting). |
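To get a feel for what these quotas mean in practice, the sketch below estimates how many hourly rate-limit windows a paginated crawl needs. It assumes one request per page of results, a page size of 100 (the `perPage` default listed for `get`), and that no other calls share the quota; `estimate_hours` is a hypothetical helper, not part of the utility.

```python
import math

# Hourly request quotas quoted in the rate-limit description above.
AUTHENTICATED_LIMIT = 5000   # Basic Authentication or OAuth, per user
UNAUTHENTICATED_LIMIT = 60   # per originating IP address


def estimate_hours(total_items, per_page=100, authenticated=True):
    """Estimate how many hourly rate-limit windows a paginated crawl needs.

    Assumes one API request per page of results and that nothing else
    consumes the same quota during the crawl.
    """
    requests_needed = math.ceil(total_items / per_page)
    hourly_limit = AUTHENTICATED_LIMIT if authenticated else UNAUTHENTICATED_LIMIT
    return math.ceil(requests_needed / hourly_limit)


# 30,000 items at 100 per page is 300 requests.
print(estimate_hours(30_000))                       # → 1
print(estimate_hours(30_000, authenticated=False))  # → 5
```

The same 300 requests fit within a single authenticated hour but need five unauthenticated windows, which is why authenticating is strongly recommended for any non-trivial mining job.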
18 | 45 |
|
19 | | -``` |
20 | | -gh = GitHub(username, token) |
21 | | -``` |
| 46 | +### Obtaining a GitHub authentication token |
| 47 | +The GitHub REST API (v3) originally supported Basic Authentication using either a username/password or a username/token pair. However, username/password authentication was deprecated and was removed on November 13, 2020 at 16:00 UTC ([GitHub Developer release note](https://developer.github.com/changes/2020-02-14-deprecating-password-auth/)), so authenticated requests must use a token. |
22 | 48 |
|
23 | | -To instantiate a Travis miner, simply call the constructor: |
| 49 | +Following the GitHub documentation, ["Creating a personal access token"](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token), obtain a personal access token (PAT) with no scopes selected (i.e. leave all scope checkboxes unchecked); such a token grants read-only access to public information. |
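Under Basic Authentication, the username and token are joined as `username:token`, base64-encoded, and sent in an `Authorization` header (the HTTP Basic scheme from RFC 7617). The sketch below shows that encoding with Python's standard library; `basic_auth_header` is a hypothetical helper, and this README does not say whether the utility builds the header itself or delegates that to an HTTP library.

```python
import base64


def basic_auth_header(username, token):
    """Build an HTTP Basic `Authorization` header from a username/token pair."""
    # RFC 7617: base64-encode "username:credential"; here the credential is a PAT.
    credentials = "{}:{}".format(username, token).encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": "Basic " + encoded}
```

Base64 is an encoding, not encryption, so the resulting header is exactly as sensitive as the token itself and should never be logged or shared.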
24 | 50 |
|
25 | | -``` |
26 | | -tr = Travis() |
27 | | -``` |
| 51 | +> **WARNING**: Treat your tokens like passwords and keep them secret. When using the GitHub API Mining Utility, set the token during instantiation, but do not publish the token in any Python programs or IPython/Jupyter notebooks. |
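One way to follow this advice is to keep the token out of the code entirely and read it from environment variables at runtime. The variable names `GITHUB_USERNAME` and `GITHUB_TOKEN` and the helper `load_github_credentials` below are assumptions for illustration; the utility does not read these variables on its own.

```python
import os


def load_github_credentials(user_var="GITHUB_USERNAME", token_var="GITHUB_TOKEN"):
    """Return (username, token) read from environment variables.

    The default variable names are hypothetical; use whatever names your
    environment defines. Raises RuntimeError if either value is missing.
    """
    username = os.environ.get(user_var)
    token = os.environ.get(token_var)
    if not username or not token:
        raise RuntimeError(
            "Set the {} and {} environment variables first.".format(user_var, token_var)
        )
    return username, token
```

With both variables exported in the shell before launching Python or Jupyter, an instance could then be created with `gh = GitHub(*load_github_credentials())`, assuming the `(username, token)` argument order shown in the instantiation example. In a notebook, `getpass.getpass()` from the standard library is another way to enter the token without saving it in a cell.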
28 | 52 |
|
29 | | -The constructor also takes 1 optional authentication token: |
| 53 | +### Instantiating the GitHub API Mining Utility |
| 54 | +To create an instance of the GitHub API Mining Utility in either a Python environment or an IPython/Jupyter notebook, run the following commands: |
| 55 | +```python |
| 56 | +from minerutils import GitHub |
30 | 57 |
|
| 58 | +gh = GitHub(username, token) |
| 59 | + |
| 60 | +# example: |
| 61 | +# gh = GitHub(username='nelsonni', token='b123c123d123e123') |
31 | 62 | ``` |
32 | | -tr = Travis(token) |
| 63 | + |
| 64 | +### Interacting with the GitHub API Mining Utility |
| 65 | +Once the GitHub API Mining Utility has been instantiated, you can interact with the GitHub REST API through GET requests that take the following format: |
| 66 | +```python |
| 67 | +gh.get(url, params, headers) |
| 68 | + |
| 69 | +# example (these are equivalent): |
| 70 | +# gh.get("/repos/scala/scala/pulls", params={'state': 'all'}) |
| 71 | +# gh.get("/repos/scala/scala/pulls?state=all") |
33 | 72 | ``` |
34 | 73 |
|
35 | | -### Calling the API |
| 74 | +The examples above get all of the pull requests (open and closed) for the specified project (e.g. `scala/scala`). The `params` and `headers` arguments are optional: `params` supplies query parameters for the requested resource, and `headers` supplies additional HTTP request headers. Both take a dictionary of key-value pairs to pass along to the GitHub API endpoint. As an alternative to `params`, query parameters can be embedded directly in the `url` (as demonstrated in the second example above). |
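The two forms are equivalent because a `params` dictionary is ultimately serialized into the URL's query string. The sketch below reproduces that serialization with `urllib.parse.urlencode`. `build_request_url` is hypothetical, and prefixing relative paths with `https://api.github.com` is an assumption about how the utility resolves `url`, not its actual code. `headers` have no query-string equivalent; they travel as HTTP request headers.

```python
from urllib.parse import urlencode

# Assumption: relative paths such as "/repos/scala/scala/pulls" are resolved
# against GitHub's REST API root.
API_ROOT = "https://api.github.com"


def build_request_url(path, params=None):
    """Return the full request URL with `params` serialized as a query string.

    Assumes `path` does not already contain a query string when `params` is given.
    """
    url = API_ROOT + path
    if params:
        url += "?" + urlencode(params)
    return url


# Both calls produce https://api.github.com/repos/scala/scala/pulls?state=all
print(build_request_url("/repos/scala/scala/pulls", {"state": "all"}))
print(build_request_url("/repos/scala/scala/pulls?state=all"))
```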
36 | 75 |
|
37 | | -Both miners have a similar API. To perfom a get request, use: |
| 76 | +For all available GitHub REST API (v3) resources, including `url` and `params` values, refer to the [GitHub Docs: REST API](https://docs.github.com/en/rest/reference) site. |
38 | 77 |
|
39 | | -``` |
40 | | -gh.get("/repos/scala/scala/pulls", params={'state': 'all'}) |
41 | | -``` |
| 78 | +### Python 3 |
| 79 | +This miner is written in Python 3 and must be run in a Python 3.x environment. In a Python 2 environment, the import fails with an error reporting that the `urllib.parse` module cannot be imported (Python 2's `urlparse` module was renamed to `urllib.parse` in Python 3). |
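A script that depends on the miner can fail fast with a clearer message than that import error. The guard below is a hypothetical sketch (`require_python3` is not part of the utility); it deliberately avoids f-strings and type annotations so that Python 2 can still parse it and reach the error message.

```python
import sys


def require_python3(version_info=sys.version_info):
    """Raise a descriptive error when running under Python 2.

    Written without f-strings or type annotations so that Python 2 can
    still parse this function and reach the error message.
    """
    if version_info[0] < 3:
        raise RuntimeError(
            "This miner requires Python 3 (found Python %d.%d)" % tuple(version_info[:2])
        )


# Call this before importing the miner so the check runs first.
require_python3()
```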
42 | 80 |
|
43 | | -The example above, gets all the pull requests for the specified project. Consult the documentation of the service that you are using to determine what resources are available. If you need to pass a parameter, or a query, use the `params` argument. It takes a map (key, value pairs) of the arguments that you want to pass. Alternatively, you can pass the parameters in the url directly, like this: |
| 81 | +## Commands Documentation |
44 | 82 |
|
45 | | -``` |
46 | | -gh.get("/repos/scala/scala/pulls?state=all") |
47 | | -``` |
| 83 | +| Command | Return Type | Description | |
| 84 | +| :------ | :---------- | :---------- | |
| 85 | +|`printConfig()` | `None` | Prints the symbols table associated with the GitHub API Mining Utility instance, including authentication values. | |
| 86 | +| `get(url, params={}, headers={}, perPage=100)` | `list` | Calls the GitHub REST API (v3) using GET requests that include the authentication credentials (if provided during instantiation), any `params` pairs (if provided), and any `headers` pairs (if provided), and paginates through the results using `perPage` items per page. When a request fails with a 403 status code because the rate limit has been exhausted, this call sleeps until the limit resets and then continues. |
| 87 | +| `getRepoRoot(repo)` | `str` | Accepts a `repo` parameter in the form of a dictionary containing `username` and `repo` key-value pairs, and returns a GitHub URL of the form `https://api.github.com/{username}/{repo}`. |
| 88 | +| `getRemainingRateLimit()` | `int` | Obtains the numerical count of the remaining GitHub REST API (v3) calls allowed before reaching the rate limit. | |
| 89 | +| `printRemainingRateLimit()` | `None` | Prints the numerical count of the remaining GitHub REST API (v3) calls allowed before reaching the rate limit. |
| 90 | +| `repoExists(user, repo)` | `bool` | Calls the GitHub REST API (v3) using a GET request with a URL of the form `https://api.github.com/repos/{user}/{repo}` and indicates whether that response was successful (i.e. whether the repository exists on GitHub). | |
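The rate-limit handling and pagination described for `get` can be sketched with two small helpers that read GitHub's response headers: `X-RateLimit-Remaining` and `X-RateLimit-Reset` (a UTC epoch timestamp) give the sleep duration, and the `Link` header's `rel="next"` entry gives the next page. This is an illustration under assumptions, not the utility's implementation: the helper names are hypothetical, and the functions take a plain dictionary with exact header names, whereas real HTTP clients usually expose headers case-insensitively.

```python
import re
import time

# Matches one "<url>; rel="next"" entry in a GitHub Link header.
_NEXT_LINK = re.compile(r'<([^>]+)>;\s*rel="next"')


def seconds_until_reset(headers, now=None):
    """Return how long to sleep before retrying, based on rate-limit headers.

    Returns 0.0 while requests remain (or if the header is absent); otherwise
    the seconds until the X-RateLimit-Reset epoch time, never negative.
    """
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        return 0.0
    now = time.time() if now is None else now
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)


def next_page_url(headers):
    """Return the rel="next" URL from a Link header, or None on the last page."""
    match = _NEXT_LINK.search(headers.get("Link", ""))
    return match.group(1) if match else None
```

A crawl loop built on these helpers would call `time.sleep(seconds_until_reset(response_headers))` whenever a 403 response reports zero remaining requests, and keep requesting `next_page_url(response_headers)` until it returns `None`.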
48 | 91 |
|
49 | 92 | ## License |
50 | 93 |
|
51 | | -This project is licensed under the MIT License - see the LICENSE.md file for details |
| 94 | +This project is licensed under the MIT License - see the LICENSE.md file for details. |