Merged
6 changes: 3 additions & 3 deletions .github/workflows/daily_collection.yaml
Original file line number Diff line number Diff line change
@@ -40,7 +40,7 @@ jobs:
uv pip install .
- name: Collect PyPI Downloads
run: |
uv run download-analytics collect-pypi \
uv run pymetrics collect-pypi \
--verbose \
--max-days ${{ inputs.max_days_pypi || 30 }} \
--add-metrics \
@@ -50,7 +50,7 @@ jobs:
BIGQUERY_CREDENTIALS: ${{ secrets.BIGQUERY_CREDENTIALS }}
- name: Collect Anaconda Downloads
run: |
uv run download-analytics collect-anaconda \
uv run pymetrics collect-anaconda \
--output-folder gdrive://1UnDYovLkL4gletOF5328BG1X59mSHF-Z \
--max-days ${{ inputs.max_days_anaconda || 90 }} \
--verbose
@@ -72,6 +72,6 @@ jobs:
uv pip install -U pip
uv pip install -e .[dev]
- name: Slack alert if failure
run: uv run python -m download_analytics.slack_utils -r ${{ github.run_id }} -c ${{ github.event.inputs.slack_channel || 'sdv-alerts' }}
run: uv run python -m pymetrics.slack_utils -r ${{ github.run_id }} -c ${{ github.event.inputs.slack_channel || 'sdv-alerts' }}
env:
SLACK_TOKEN: ${{ secrets.SLACK_TOKEN }}
4 changes: 2 additions & 2 deletions .github/workflows/daily_summarize.yaml
@@ -30,7 +30,7 @@ jobs:
uv pip install .
- name: Run Summarize
run: |
uv run download-analytics summarize \
uv run pymetrics summarize \
--output-folder gdrive://10QHbqyvptmZX4yhu2Y38YJbVHqINRr0n
env:
PYDRIVE_CREDENTIALS: ${{ secrets.PYDRIVE_CREDENTIALS }}
@@ -67,7 +67,7 @@ jobs:
uv pip install .[dev]
- name: Slack alert if failure
run: |
uv run python -m download_analytics.slack_utils \
uv run python -m pymetrics.slack_utils \
-r ${{ github.run_id }} \
-c ${{ github.event.inputs.slack_channel || 'sdv-alerts' }} \
-m 'Summarize Analytics build failed :fire: :dumpster-fire: :fire:'
6 changes: 3 additions & 3 deletions .github/workflows/dryrun.yaml
@@ -28,7 +28,7 @@ jobs:
uv pip install .
- name: Collect PyPI Downloads - Dry Run
run: |
uv run download-analytics collect-pypi \
uv run pymetrics collect-pypi \
--verbose \
--max-days 30 \
--add-metrics \
@@ -39,7 +39,7 @@ jobs:
BIGQUERY_CREDENTIALS: ${{ secrets.BIGQUERY_CREDENTIALS }}
- name: Collect Anaconda Downloads - Dry Run
run: |
uv run download-analytics collect-anaconda \
uv run pymetrics collect-anaconda \
--output-folder gdrive://1UnDYovLkL4gletOF5328BG1X59mSHF-Z \
--max-days 90 \
--verbose \
@@ -48,7 +48,7 @@ jobs:
PYDRIVE_CREDENTIALS: ${{ secrets.PYDRIVE_CREDENTIALS }}
- name: Summarize - Dry Run
run: |
uv run download-analytics summarize \
uv run pymetrics summarize \
--verbose \
--output-folder gdrive://10QHbqyvptmZX4yhu2Y38YJbVHqINRr0n \
--dry-run
2 changes: 1 addition & 1 deletion .github/workflows/manual.yaml
@@ -36,7 +36,7 @@ jobs:
uv pip install .
- name: Collect Downloads Data
run: |
uv run download-analytics collect-pypi \
uv run pymetrics collect-pypi \
--verbose \
--projects ${{ github.event.inputs.projects }} \
${{ github.event.inputs.max_days && '--max-days ' || '' }} \
6 changes: 3 additions & 3 deletions README.md
@@ -1,11 +1,11 @@
# Download Analytics
# PyMetrics

The Download Analytics project allows you to extract download metrics for Python libraries published on [PyPI](https://pypi.org/) and [Anaconda](https://www.anaconda.com/).
The PyMetrics project allows you to extract download metrics for Python libraries published on [PyPI](https://pypi.org/) and [Anaconda](https://www.anaconda.com/).

The DataCebo team uses these scripts to report download counts for the libraries in the [SDV ecosystem](https://sdv.dev/) and other libraries.

## Overview
The Download Analytics project is a collection of scripts and tools to extract information
The PyMetrics project is a collection of scripts and tools to extract information
about OSS project downloads from different sources and to analyze them to produce user
engagement metrics.

8 changes: 4 additions & 4 deletions docs/COLLECTED_DATA.md
@@ -1,13 +1,13 @@
# Data collected by Download Analytics
# Data collected by PyMetrics

The Download Analytics project collects data about downloads from multiple sources.
The PyMetrics project collects data about downloads from multiple sources.

This guide explains the exact data that is being collected from each source, as well as
the aggregation metrics that are computed on them.

## PyPI Downloads

Download Analytics collects information about the downloads from PyPI by making queries to the
PyMetrics collects information about the downloads from PyPI by making queries to the
[public PyPI download statistics dataset on BigQuery](https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=pypi&page=dataset)
by running the following query:

@@ -51,7 +51,7 @@ the given time period, with the following columns:

## Aggregation Metrics

If the `--add-metrics` option is passed to `download-analytics`, a spreadsheet with aggregation
If the `--add-metrics` option is passed to `pymetrics`, a spreadsheet with aggregation
metrics will be created alongside the raw PyPI downloads CSV file for each individual project.

The aggregation metrics spreadsheets contain the following tabs:
24 changes: 12 additions & 12 deletions docs/DEVELOPMENT.md
@@ -1,46 +1,46 @@
# Download Analytics Development Guide
# PyMetrics Development Guide

This guide covers how to download and install **Download Analytics** to run it locally and
This guide covers how to download and install **PyMetrics** to run it locally and
modify its code.

## Install

**Download Analytics** is not released to any public Python package repository, so the only
**PyMetrics** is not released to any public Python package repository, so the only
way to run it is to download the code from Github and install from source.

1. Clone the [github repository](https://github.com/datacebo/download-analytics)
1. Clone the [github repository](https://github.com/datacebo/pymetrics)

```bash
git clone git@github.com:datacebo/download-analytics
git clone git@github.com:datacebo/pymetrics
```

2. Create a `virtualenv` (or `conda` env) to host the project and its dependencies. The example
below covers the creation of a `virtualenv` using `virtualenvwrapper` with Python 3.8.

```bash
mkvirtualenv download-analytics -p $(which python3.8)
mkvirtualenv pymetrics -p $(which python3.8)
```

3. Enter the project folder and install the project:

```bash
cd download-analytics
cd pymetrics
make install
```

For development, run `make install-develop` instead.

## Command Line Interface

After the installation, a new `download-analytics` command will have been registered inside your
After the installation, a new `pymetrics` command will have been registered inside your
`virtualenv`. This command can be used in conjunction with the `collect-pypi` action to collect
downloads data from BigQuery and store the output locally or in Google Drive.

Here is the entire list of arguments that the command line has:

```bash
$ download-analytics collect-pypi --help
usage: download-analytics collect-pypi [-h] [-v] [-l LOGFILE] [-o OUTPUT_FOLDER] [-a AUTHENTICATION_CREDENTIALS]
$ pymetrics collect-pypi --help
usage: pymetrics collect-pypi [-h] [-v] [-l LOGFILE] [-o OUTPUT_FOLDER] [-a AUTHENTICATION_CREDENTIALS]
[-c CONFIG_FILE] [-p [PROJECTS [PROJECTS ...]]] [-s START_DATE]
[-m MAX_DAYS] [-d] [-f] [-M]

@@ -73,7 +73,7 @@ and store the downloads data into a Google Drive folder alongside the correspond
metric spreadsheets would look like this:

```bash
$ download-analytics collect-pypi --verbose --projects sdv ctgan --start-date 2021-01-01 \
$ pymetrics collect-pypi --verbose --projects sdv ctgan --start-date 2021-01-01 \
--add-metrics --output-folder gdrive://10QHbqyvptmZX4yhu2Y38YJbVHqINRr0n
```

@@ -83,7 +83,7 @@ have a look at the [COLLECTED_DATA.md](COLLECTED_DATA.md) document.
## Python Interface

The Python entry point that is equivalent to the CLI explained above is the function
`download_analytics.main.collect_downloads`.
`pymetrics.main.collect_downloads`.

This function has the following interface:

16 changes: 8 additions & 8 deletions docs/SETUP.md
@@ -1,6 +1,6 @@
# Download Analytics Setup
# PyMetrics Setup

The Download Analytics project requires privileged access to the following resources:
The PyMetrics project requires privileged access to the following resources:
- Google Drive, which is accessed via the `PyDrive` library.
- Google Big Query, which is accessed via the `google-cloud-bigquery` library.

@@ -31,10 +31,10 @@ if contains the application KEY which should never be made public.

Once the file is created, you can follow these steps:

1. Run the `download-analytics collect-pypi` command. If the `settings.yaml` file has been properly
1. Run the `pymetrics collect-pypi` command. If the `settings.yaml` file has been properly
created, this will **open a new tab on your web browser**, where you need to authenticate.

| ![pydrive-collect](imgs/pydrive-collect.png "Run the `download-analytics collect-pypi` Command") |
| ![pydrive-collect](imgs/pydrive-collect.png "Run the `pymetrics collect-pypi` Command") |
| - |

2. Click on the Google account that you wish to authenticate with. Notice that the account that
@@ -67,7 +67,7 @@ be provided to you by a privileged admin.
Once you have this JSON file, you have two options:

1. Pass the path to the authentication file with the `-a` or `--authentication-credentials`
argument to the `download-analytics collect-pypi` command.
argument to the `pymetrics collect-pypi` command.

| ![bigquery-a](imgs/bigquery-a.png "Pass the credentials on command line") |
| - |
@@ -80,12 +80,12 @@ Once you have this JSON file, you have two options:

## Github Actions Setup

When using Download Analytics via Github Actions, the authentication credentials for Google
When using PyMetrics via Github Actions, the authentication credentials for Google
Drive and Big Query must be passed as repository `secrets`, which will later on be declared
as environment variables.

1. Open the [Settings page of the Download Analytics repository](
https://github.com/datacebo/download-analytics/settings/secrets/actions) and click on `Secrets`.
1. Open the [Settings page of the PyMetrics repository](
https://github.com/datacebo/pymetrics/settings/secrets/actions) and click on `Secrets`.

| ![secrets](imgs/secrets.png "Open the secrets page of the repository") |
| - |
10 changes: 5 additions & 5 deletions docs/WORKFLOWS.md
@@ -1,4 +1,4 @@
# Download Analytics Workflows
# PyMetrics Workflows

This document describes how to perform the two most common workflows:

@@ -65,13 +65,13 @@ after it is merged, the downloads of the new library will start to be added to t

## One Shot - Collecting data over a specific period.

Download Analytics is prepared to collect data for one or more libraries over a specific period
using a [GitHub Actions Workflow](https://github.com/datacebo/download-analytics/actions/workflows/manual.yaml).
PyMetrics is prepared to collect data for one or more libraries over a specific period
using a [GitHub Actions Workflow](https://github.com/datacebo/pymetrics/actions/workflows/manual.yaml).

In order to do this, you will need to follow these steps:

1. Enter the [GitHub Actions Section of the repository](https://github.com/datacebo/download-analytics/actions)
and click on the [Manual Collection Workflow](https://github.com/datacebo/download-analytics/actions/workflows/manual.yaml).
1. Enter the [GitHub Actions Section of the repository](https://github.com/datacebo/pymetrics/actions)
and click on the [Manual Collection Workflow](https://github.com/datacebo/pymetrics/actions/workflows/manual.yaml).

| ![manual-collection](imgs/manual-collection.png "Manual Collection Workflow") |
| - |
File renamed without changes.
16 changes: 8 additions & 8 deletions download_analytics/__main__.py → pymetrics/__main__.py
@@ -1,4 +1,4 @@
"""Download Analytics CLI."""
"""PyMetrics CLI."""

import argparse
import logging
@@ -9,9 +9,9 @@

import yaml

from download_analytics.anaconda import collect_anaconda_downloads
from download_analytics.main import collect_downloads
from download_analytics.summarize import summarize_downloads
from pymetrics.anaconda import collect_anaconda_downloads
from pymetrics.main import collect_downloads
from pymetrics.summarize import summarize_downloads

LOGGER = logging.getLogger(__name__)

@@ -22,7 +22,7 @@ def _env_setup(logfile, verbosity):
format_ = '%(asctime)s - %(levelname)s - %(message)s'
level = (3 - verbosity) * 10
logging.basicConfig(filename=logfile, level=level, format=format_)
logging.getLogger('download_analytics').setLevel(level)
logging.getLogger('pymetrics').setLevel(level)
logging.getLogger().setLevel(logging.WARN)
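
Note on the `_env_setup` hunk above: the expression `(3 - verbosity) * 10` maps a verbosity count onto the numeric levels of the standard `logging` module. The sketch below spells that mapping out. It assumes `verbosity` is the number of `-v` flags passed on the command line, and `verbosity_to_level` is a hypothetical helper name used only for illustration; the code in this PR computes the value inline.

```python
import logging


def verbosity_to_level(verbosity):
    """Map a -v count to a logging level, mirroring `(3 - verbosity) * 10`.

    Hypothetical helper for illustration only; not part of the PR.
    """
    return (3 - verbosity) * 10


if __name__ == "__main__":
    # Print the level number and its standard name for 0, 1 and 2 flags.
    for count in range(3):
        level = verbosity_to_level(count)
        print(count, level, logging.getLevelName(level))
```

Under that assumption, no flag gives `WARNING`, one `-v` gives `INFO`, and two give `DEBUG`. A count of 3 would yield 0 (`logging.NOTSET`), which for a named logger means deferring to its parent's level; whether the CLI allows that many flags is not visible in this hunk.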


@@ -119,8 +119,8 @@ def _get_parser():
help='Do not upload the results. Just calculate them.',
)
parser = argparse.ArgumentParser(
prog='download-analytics',
description='Download Analytics Command Line Interface',
prog='pymetrics',
description='PyMetrics Command Line Interface',
parents=[logging_args],
)
parser.set_defaults(action=None)
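
Note on the `_get_parser` hunk above: the renamed parser inherits its logging flags through `parents=[logging_args]`, a standard `argparse` pattern for sharing options between a top-level parser and its subcommands. The following is a minimal sketch of that pattern, not the PR's actual parser. The `-v` count action, the single `collect-pypi` subcommand, and the string stored in `action` are assumptions made for illustration; most of the real `_get_parser` body is not shown in this diff.

```python
import argparse


def build_parser():
    """Build a CLI parser that shares logging flags through a parent parser."""
    # A parent parser must disable -h, otherwise the flag would be defined twice.
    logging_args = argparse.ArgumentParser(add_help=False)
    logging_args.add_argument('-v', '--verbose', action='count', default=0)
    logging_args.add_argument('-l', '--logfile')

    parser = argparse.ArgumentParser(
        prog='pymetrics',
        description='PyMetrics Command Line Interface',
        parents=[logging_args],
    )
    # With no subcommand given, action stays None so the caller can print help.
    parser.set_defaults(action=None)

    subparsers = parser.add_subparsers(title='action')
    collect = subparsers.add_parser('collect-pypi', parents=[logging_args])
    # Placeholder value (assumption); a real CLI would likely store a callable.
    collect.set_defaults(action='collect-pypi')
    return parser
```

Because `prog` is set explicitly, the usage string printed by `--help` follows the new `pymetrics` name regardless of how the module is invoked.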
@@ -255,7 +255,7 @@ def _get_parser():


def main():
"""Run the Download Analytics CLI."""
"""Run the PyMetrics CLI."""
parser = _get_parser()
if len(sys.argv) < 2:
parser.print_help()
4 changes: 2 additions & 2 deletions download_analytics/anaconda.py → pymetrics/anaconda.py
@@ -9,8 +9,8 @@
import requests
from tqdm import tqdm

from download_analytics.output import append_row, create_csv, get_path, load_csv
from download_analytics.time_utils import drop_duplicates_by_date
from pymetrics.output import append_row, create_csv, get_path, load_csv
from pymetrics.time_utils import drop_duplicates_by_date

LOGGER = logging.getLogger(__name__)
dir_path = os.path.dirname(os.path.realpath(__file__))
File renamed without changes.
File renamed without changes.
8 changes: 4 additions & 4 deletions download_analytics/main.py → pymetrics/main.py
@@ -2,10 +2,10 @@

import logging

from download_analytics.metrics import compute_metrics
from download_analytics.output import create_csv, get_path
from download_analytics.pypi import get_pypi_downloads
from download_analytics.summarize import get_previous_pypi_downloads
from pymetrics.metrics import compute_metrics
from pymetrics.output import create_csv, get_path
from pymetrics.pypi import get_pypi_downloads
from pymetrics.summarize import get_previous_pypi_downloads

LOGGER = logging.getLogger(__name__)

2 changes: 1 addition & 1 deletion download_analytics/metrics.py → pymetrics/metrics.py
@@ -5,7 +5,7 @@

import pandas as pd

from download_analytics.output import create_spreadsheet
from pymetrics.output import create_spreadsheet

LOGGER = logging.getLogger(__name__)

6 changes: 3 additions & 3 deletions download_analytics/output.py → pymetrics/output.py
@@ -6,7 +6,7 @@

import pandas as pd

from download_analytics import drive
from pymetrics import drive

LOGGER = logging.getLogger(__name__)

@@ -118,7 +118,7 @@ def create_csv(output_path, data):


def load_spreadsheet(spreadsheet):
"""Load a spreadsheet previously created by download-analytics.
"""Load a spreadsheet previously created by pymetrics.

Args:
spreadsheet (str or stream):
@@ -154,7 +154,7 @@ def load_spreadsheet(spreadsheet):


def load_csv(csv_path, read_csv_kwargs=None):
"""Load a CSV previously created by download-analytics.
"""Load a CSV previously created by pymetrics.

Args:
csv_path (str):
2 changes: 1 addition & 1 deletion download_analytics/pypi.py → pymetrics/pypi.py
@@ -5,7 +5,7 @@

import pandas as pd

from download_analytics.bq import run_query
from pymetrics.bq import run_query

LOGGER = logging.getLogger(__name__)
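
Since this change is a mechanical rename across workflows, docs, and modules, one way to check for leftovers is to scan text for the old spellings (`download_analytics`, `download-analytics`, `Download Analytics`). The sketch below is a hypothetical helper, not something included in this PR; the function name and the regular expression are assumptions chosen to cover the three spellings that appear in the diff.

```python
import re

# Matches the old module name, the old CLI command, and the old project title.
OLD_NAME = re.compile(r'download[-_ ]analytics', re.IGNORECASE)


def find_stale_references(text):
    """Return (line_number, line) pairs that still mention the old name."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if OLD_NAME.search(line)
    ]
```

Applied to the contents of each tracked file, an empty result for every file would suggest that the rename is complete.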
