Merged
6 changes: 3 additions & 3 deletions .github/workflows/daily_collection.yaml
@@ -25,9 +25,9 @@ jobs:
run: |
uv pip install -U pip
uv pip install .
- name: Collect Github Data
- name: Collect GitHub Data
run: |
uv run github-analytics collect -v -q -t ${{ secrets.GITHUB_TOKEN }} -m -c daily.yaml
uv run gitmetrics collect -v -q -t ${{ secrets.GITHUB_TOKEN }} -m -c daily.yaml
env:
PYDRIVE_CREDENTIALS: ${{ secrets.PYDRIVE_CREDENTIALS }}

@@ -47,6 +47,6 @@ jobs:
uv pip install -U pip
uv pip install .[dev]
- name: Slack alert if failure
run: python -m github_analytics.slack_utils -r ${{ github.run_id }} -c ${{ github.event.inputs.slack_channel || 'sdv-alerts' }}
run: python -m gitmetrics.slack_utils -r ${{ github.run_id }} -c ${{ github.event.inputs.slack_channel || 'sdv-alerts' }}
env:
SLACK_TOKEN: ${{ secrets.SLACK_TOKEN }}
6 changes: 3 additions & 3 deletions .github/workflows/daily_summarize.yaml
@@ -28,7 +28,7 @@ jobs:
uv pip install .
- name: Run Summarize
run: |
uv run github-analytics summarize \
uv run gitmetrics summarize \
--input-folder gdrive://1ZvsuVbFAUk3BN-n6Pv_lUBLwviHZSxM2
env:
PYDRIVE_CREDENTIALS: ${{ secrets.PYDRIVE_CREDENTIALS }}
@@ -66,9 +66,9 @@ jobs:
uv pip install .[dev]
- name: Slack alert if failure
run: |
uv run python -m github_analytics.slack_utils \
uv run python -m gitmetrics.slack_utils \
-r ${{ github.run_id }} \
-c ${{ github.event.inputs.slack_channel || 'sdv-alerts' }} \
-m 'Summarize GitHub Analytics build failed :fire: :dumpster-fire: :fire:'
-m 'Summarize GitMetrics build failed :fire: :dumpster-fire: :fire:'
env:
SLACK_TOKEN: ${{ secrets.SLACK_TOKEN }}
4 changes: 2 additions & 2 deletions .github/workflows/manual.yaml
@@ -33,9 +33,9 @@ jobs:
run: |
python -m pip install --upgrade pip
python -m pip install .
- name: Collect Github Data
- name: Collect GitHub Data
run: |
github-analytics collect -v -q \
gitmetrics collect -v -q \
-t ${{ secrets.GITHUB_TOKEN }} \
-p ${{ github.event.inputs.project }} \
-r ${{ github.event.inputs.repositories }} \
10 changes: 5 additions & 5 deletions .github/workflows/traffic_collection.yaml
@@ -12,7 +12,7 @@ on:
- cron: "0 0 */14 * *" # Runs every 14 days at midnight UTC

jobs:
collect_traffic:
daily_traffic_collection:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
@@ -25,13 +25,13 @@ jobs:
run: |
uv pip install -U pip
uv pip install .
- name: Collect Github Traffic Data
- name: Collect GitHub Traffic Data
run: |
uv run github-analytics traffic -v -t ${{ secrets.GH_TRAFFIC_TOKEN }} -c traffic_config.yaml
uv run gitmetrics traffic -v -t ${{ secrets.GH_TRAFFIC_TOKEN }} -c traffic_config.yaml
env:
PYDRIVE_CREDENTIALS: ${{ secrets.PYDRIVE_CREDENTIALS }}
alert:
needs: [daily_github_collection]
needs: [daily_traffic_collection]
runs-on: ubuntu-latest
if: failure()
steps:
@@ -46,6 +46,6 @@ jobs:
uv pip install -U pip
uv pip install .[dev]
- name: Slack alert if failure
run: python -m github_analytics.slack_utils -r ${{ github.run_id }} -c ${{ github.event.inputs.slack_channel || 'sdv-alerts' }}
run: python -m gitmetrics.slack_utils -r ${{ github.run_id }} -c ${{ github.event.inputs.slack_channel || 'sdv-alerts' }}
env:
SLACK_TOKEN: ${{ secrets.SLACK_TOKEN }}
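
Besides the rename, this file changes the `alert` job's `needs` entry from `daily_github_collection` to `daily_traffic_collection`, so it now names a job that the hunk above defines. As far as the visible hunk shows, the old value did not match the old job id `collect_traffic`. A minimal sketch of how such a mismatch could be caught, assuming the workflow's `jobs` mapping has already been loaded into a plain dictionary. The `find_dangling_needs` helper is hypothetical and not part of gitmetrics:

```python
def find_dangling_needs(jobs):
    """Return {job_name: [missing dependencies]} for a workflow's jobs mapping."""
    dangling = {}
    for name, spec in jobs.items():
        needs = spec.get('needs', [])
        # GitHub Actions accepts either a single job id or a list of job ids.
        if isinstance(needs, str):
            needs = [needs]
        missing = [dep for dep in needs if dep not in jobs]
        if missing:
            dangling[name] = missing
    return dangling


# Job ids as they appear in the visible hunk before and after this change.
before = {
    'collect_traffic': {'runs-on': 'ubuntu-latest'},
    'alert': {'needs': ['daily_github_collection'], 'runs-on': 'ubuntu-latest'},
}
after = {
    'daily_traffic_collection': {'runs-on': 'ubuntu-latest'},
    'alert': {'needs': ['daily_traffic_collection'], 'runs-on': 'ubuntu-latest'},
}

print(find_dangling_needs(before))  # {'alert': ['daily_github_collection']}
print(find_dangling_needs(after))   # {}
```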
6 changes: 3 additions & 3 deletions .github/workflows/weekly_collection.yaml
@@ -27,12 +27,12 @@ jobs:
uv pip install .
- name: Collect GitHub Data
run: |
uv run github-analytics collect -v -q -t ${{ secrets.GITHUB_TOKEN }} -m -c weekly.yaml
uv run gitmetrics collect -v -q -t ${{ secrets.GITHUB_TOKEN }} -m -c weekly.yaml
env:
PYDRIVE_CREDENTIALS: ${{ secrets.PYDRIVE_CREDENTIALS }}
- name: Consolidate GitHub Data
run: |
uv run github-analytics consolidate -v -c weekly.yaml
uv run gitmetrics consolidate -v -c weekly.yaml
env:
PYDRIVE_CREDENTIALS: ${{ secrets.PYDRIVE_CREDENTIALS }}
alert:
@@ -51,6 +51,6 @@ jobs:
uv pip install -U pip
uv pip install .[dev]
- name: Slack alert if failure
run: python -m github_analytics.slack_utils -r ${{ github.run_id }} -c ${{ github.event.inputs.slack_channel || 'sdv-alerts' }}
run: python -m gitmetrics.slack_utils -r ${{ github.run_id }} -c ${{ github.event.inputs.slack_channel || 'sdv-alerts' }}
env:
SLACK_TOKEN: ${{ secrets.SLACK_TOKEN }}
1 change: 1 addition & 0 deletions .gitignore
@@ -3,6 +3,7 @@ bigquery_creds.json
client_secrets.json
credentials.json
sdv-dev.github.io/*
uv.lock

notebooks
*.xlsx
20 changes: 10 additions & 10 deletions CONFIGURATION.md
@@ -1,14 +1,14 @@
# Github Analytics Configuration
# GitMetrics Configuration

The Github Analytics script can be configured using a YAML file that indicates which repositories
The GitMetrics script can be configured using a YAML file that indicates which repositories
to collect and where to store the collected data.

Additionally, [Github Actions Workflows](.github/workflows) are being used to trigger the
Additionally, [GitMetrics Workflows](.github/workflows) are being used to trigger the
collection of such projects either manually or on a scheduled basis.

## Configuration file format

The configuration file for `github-analytics` must have the following contents:
The configuration file for `gitmetrics` must have the following contents:

* `output_folder`: Folder where results will be written. Can be a Google Drive in the format
gdrive://<folder-name>
@@ -48,7 +48,7 @@ projects:
### Adding an entire organization

Optionally, an organization or user name can be added to the configuration instead of the
individual repositories, and then `github-analytics` will translate that into the list of
individual repositories, and then `gitmetrics` will translate that into the list of
repositories owned by that user or organization *which are not forks of other repositories*.

For example, this configuration file would include all the repositories listed above,
@@ -78,12 +78,12 @@ projects:

## Default Configuration File

By default, Github Analytics collects the projects configured in the [config.yaml](config.yaml)
By default, GitMetrics collects the projects configured in the [config.yaml](config.yaml)
file included in the project. However, passing a different configuration file when running the
command line script is possible via the `-c` flag, as shown in the example above:

```bash
$ github-analytics collect -c my_config_file.yaml ...
$ gitmetrics collect -c my_config_file.yaml ...
```

### Importing other configuration files
@@ -142,11 +142,11 @@ projects:

## Daily and Weekly Collection

Github Analytics is configured to collect data daily and weekly via the
GitMetrics is configured to collect data daily and weekly via the
[.github/workflow/daily.yaml](.github/workflow/daily.yaml) and [.github/workflow/weekly.yaml](
.github/workflow/weekly.yaml) Github Action Workflows.
.github/workflow/weekly.yaml) GitHub Action Workflows.

These workflows are configured to execute the `github-analytics collect` command using the
These workflows are configured to execute the `gitmetrics collect` command using the
[daily.yaml](daily.yaml) and [weekly.yaml](weekly.yaml) configuration files respectively,
which:
- Import the [config.yaml](config.yaml) file, where all the project repositories are listed
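
The "Adding an entire organization" section of CONFIGURATION.md says an organization or user name is translated into the repositories it owns that are not forks. A minimal sketch of that filtering step, assuming repository records shaped like the GitHub REST API's repository objects (with `full_name` and `fork` fields). The `expand_owner` helper and the sample records are hypothetical, and this is not the gitmetrics implementation:

```python
def expand_owner(repo_records):
    """Return the `org/repo` names of all records that are not forks."""
    return [
        record['full_name']
        for record in repo_records
        # Treat a missing `fork` field as "not a fork" (an assumption).
        if not record.get('fork', False)
    ]


# Hypothetical records; only the two fields used above are included.
records = [
    {'full_name': 'sdv-dev/SDV', 'fork': False},
    {'full_name': 'sdv-dev/some-fork', 'fork': True},
    {'full_name': 'sdv-dev/RDT', 'fork': False},
]

print(expand_owner(records))  # ['sdv-dev/SDV', 'sdv-dev/RDT']
```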
12 changes: 0 additions & 12 deletions MANIFEST.in

This file was deleted.

50 changes: 25 additions & 25 deletions README.md
@@ -1,30 +1,30 @@
# Github Analytics
# GitMetrics

Scripts to extract multiple metrics from Github Projects.
Scripts to extract multiple metrics from GitHub Projects.

## Install

```bash
pip install git+ssh://[email protected]/datacebo/github-analytics
pip install git+ssh://[email protected]/datacebo/gitmetrics
```

### Development

For development, clone the repository and install `dev-requirements.txt`:

```bash
git clone [email protected]:datacebo/github-analytics
cd github-analytics
pip install -r dev-requirements.txt
git clone [email protected]:datacebo/gitmetrics
cd gitmetrics
pip install -e .[test,dev]
```

# Local Usage

To collect metrics from github by running `github-analytics` on your computer you need to provide:
To collect metrics from GitHub by running `gitmetrics` on your computer you need to provide:

1. A Github Token. Documentation about how to create a Personal Access Token can be found
1. A GitHub Token. Documentation about how to create a Personal Access Token can be found
[here](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token)
2. A list of Github Repositories for which to collect the metrics. The repositories need
2. A list of GitHub Repositories for which to collect the metrics. The repositories need
to be given as `{org-name}/{repo-name}`, like `sdv-dev/SDV`.
3. (Optional) A filename where the output will be stored. If a name containing the `.xlsx`
extension is given (like `path/to/my-filename.xlsx`), it will be used as provided.
@@ -36,44 +36,44 @@ To collect metrics from github by running `github-analytics` on your computer yo
## Python Interface

In order to run the collection script from python, the `collect_project_metrics` function
needs to be imported from the `github_analytics` package and executed passing the values
needs to be imported from the `gitmetrics` package and executed passing the values
indicated above.

**NOTE**: For detailed output, logging must be enabled as shown in the example below.

```python3
>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> from github_analytics import collect_project_metrics
>>> from gitmetrics import collect_project_metrics
>>> repositories = ['sdv-dev/RDT', 'sdv-dev/SDV', 'sdv-dev/Copulas', 'sdv-dev/CTGAN']
>>> output_name = 'sdv-dev'
>>> token = '<my-github-token>'
>>> collect_project_metrics(token, repositories, output_name)
INFO:github_analytics.main:Getting information for repository sdv-dev/RDT
INFO:gitmetrics.main:Getting information for repository sdv-dev/RDT
100%|███████████████████████████████████████████████████████████████| 143/143 [00:00<00:00, 195.00it/s]
100%|███████████████████████████████████████████████████████████████| 182/182 [00:00<00:00, 364.64it/s]
100%|███████████████████████████████████████████████████████████████| 37/37 [00:00<00:00, 91020.09it/s]
INFO:github_analytics.main:Getting information for repository sdv-dev/SDV
INFO:gitmetrics.main:Getting information for repository sdv-dev/SDV
100%|███████████████████████████████████████████████████████████████| 389/389 [00:02<00:00, 193.20it/s]
100%|███████████████████████████████████████████████████████████████| 219/219 [00:00<00:00, 231.17it/s]
100%|███████████████████████████████████████████████████████████████| 561/561 [00:03<00:00, 158.39it/s]
INFO:github_analytics.main:Getting information for repository sdv-dev/Copulas
INFO:gitmetrics.main:Getting information for repository sdv-dev/Copulas
100%|███████████████████████████████████████████████████████████████| 138/138 [00:00<00:00, 333.27it/s]
100%|███████████████████████████████████████████████████████████████| 143/143 [00:00<00:00, 287.29it/s]
100%|███████████████████████████████████████████████████████████████| 245/245 [00:01<00:00, 204.88it/s]
INFO:github_analytics.main:Getting information for repository sdv-dev/CTGAN
INFO:gitmetrics.main:Getting information for repository sdv-dev/CTGAN
100%|███████████████████████████████████████████████████████████████| 113/113 [00:00<00:00, 287.26it/s]
100%|██████████████████████████████████████████████████████████████| 64/64 [00:00<00:00, 134824.44it/s]
100%|███████████████████████████████████████████████████████████████| 498/498 [00:02<00:00, 171.11it/s]
INFO:github_analytics.main:Getting 164 missing users
INFO:gitmetrics.main:Getting 164 missing users
99%|██████████████████████████████████████████████████████████████▌| 163/164 [00:01<00:00, 121.99it/s]
INFO:github_analytics.output:Creating file github-metrics-sdv-dev-2021-11-12.xlsx
INFO:gitmetrics.output:Creating file github-metrics-sdv-dev-2021-11-12.xlsx
```


## Command Line Interface

In order to run the collection script from the command line, the `github-analytics collect` command
In order to run the collection script from the command line, the `gitmetrics collect` command
must be called passing the following optional arguments:

- `-c / --config-file CONFIG_FILE`: Path to the config file to use. Defaults to `config.yaml`.
@@ -91,13 +91,13 @@ must be called passing the following optional arguments:
spreadsheet.
- `-n / --not-incremental`: If indicated, collect data from scratch instead of doing it
incrementally over the existing data.
- `-t / --token`: Github token to use. If not given, it will be requested in a prompt.
- `-t / --token`: GitHub token to use. If not given, it will be requested in a prompt.
- `-l / --logfile LOGFILE`: Write logs to the indicated logfile.
- `-v / --verbose`: Be more verbose.

```bash
$ github-analytics github -p sdv-dev -c config.yaml
Please input your Github Token: <my-github-token>
$ gitmetrics github -p sdv-dev -c config.yaml
Please input your GitHub Token: <my-github-token>
2021-11-12 15:42:43,100 - INFO - Getting information for repository sdv-dev/RDT
100%|███████████████████████████████████████████████████████████████| 143/143 [00:00<00:00, 300.87it/s]
100%|███████████████████████████████████████████████████████████████| 182/182 [00:00<00:00, 324.25it/s]
@@ -144,7 +144,7 @@ aggregation metrics for the entire project.

## Google Drive Integration

Github Analytics is capable of reading and writing results in Google Spreadsheets.
GitMetrics is capable of reading and writing results in Google Spreadsheets.

For this to work, the following things are required:

@@ -156,11 +156,11 @@ For this to work, the following things are required:
the corresponding `settings.yaml` file, or passed via the `PYDRIVE_CREDENTIALS` environment
variable.

# Github Analytics Configuration
# GitMetrics Configuration

The Github Analytics script can be configured using a YAML file that indicates which repositories
The GitMetrics script can be configured using a YAML file that indicates which repositories
to collect and where to store the collected data, as well as when to execute the collection
of data using Github Actions.
of data using GitHub Actions.

For more details about how to configure this, check the [CONFIGURATION.md](CONFIGURATION.md)
document.
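
The README's "Local Usage" list asks for repositories in the form `{org-name}/{repo-name}`, like `sdv-dev/SDV`. A minimal sketch of checking that shape before passing names on, assuming a simple character set for both segments. The `split_repository` helper and the pattern are hypothetical and not part of gitmetrics:

```python
import re

# One owner segment and one repository segment separated by a single slash.
# The allowed characters are an assumption made for illustration only.
REPO_PATTERN = re.compile(r'^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$')


def split_repository(name):
    """Split an `org/repo` string into (org, repo), or raise ValueError."""
    if not REPO_PATTERN.match(name):
        raise ValueError(f'Expected org-name/repo-name, got {name!r}')
    owner, repo = name.split('/')
    return owner, repo


print(split_repository('sdv-dev/SDV'))  # ('sdv-dev', 'SDV')
```

Failing early on a malformed name keeps the error close to the input instead of surfacing later as a failed API request.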
5 changes: 0 additions & 5 deletions github_analytics/__init__.py

This file was deleted.

1 change: 0 additions & 1 deletion github_analytics/github/__init__.py

This file was deleted.

5 changes: 5 additions & 0 deletions gitmetrics/__init__.py
@@ -0,0 +1,5 @@
"""Scripts to extract multiple metrics from GitHub projects."""

from gitmetrics.main import collect_project_metrics, collect_projects

__all__ = ['collect_project_metrics', 'collect_projects']
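
Since this change renames the package from `github_analytics` to `gitmetrics` across workflows, docs, and modules, a quick scan for leftover old names can help in review. A minimal sketch, assuming the old spellings are `github-analytics`, `github_analytics`, and `Github Analytics` as seen in the removed lines above. The `find_stale_names` helper is hypothetical and not part of the project:

```python
import re

# Matches the old CLI name, module name, and title, in any letter case.
OLD_NAMES = re.compile(r'github[-_ ]analytics', re.IGNORECASE)


def find_stale_names(text):
    """Return (line_number, line) pairs that still mention the old project name."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if OLD_NAMES.search(line)
    ]


sample = (
    'uv run gitmetrics collect -c daily.yaml\n'
    'python -m github_analytics.slack_utils -r 1\n'
    '# Github Analytics Configuration\n'
)

for number, line in find_stale_names(sample):
    print(number, line)
```

The pattern deliberately does not match `github-metrics`, so output file names such as `github-metrics-sdv-dev-2021-11-12.xlsx` in the README logs are not flagged.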