On a daily basis, this workflow collects download data from PyPI and Anaconda. The data is then published in CSV format (`pypi.csv`). In addition, it computes metrics for the PyPI downloads (see below).
55
-
56
-
#### Metrics
57
-
This PyPI download metrics are computed along several dimensions:
79
+
On a daily basis, this workflow collects download data from PyPI and Anaconda. The data is then published in CSV format (`pypi.csv`). In addition, it computes metrics for the PyPI downloads (see [Aggregation Metrics](#aggregation-metrics)).
58
80
59
-
-**By Month**: The number of downloads per month.
60
-
-**By Version**: The number of downloads per version of the software, as determined by the software maintainers.
61
-
-**By Python Version**: The number of downloads per minor Python version (eg. 3.8).
62
-
-**And more!**
63
-
64
-
### Daily Summarize
81
+
### Daily Summarization
65
82
66
83
On a daily basis, this workflow summarizes the PyPI download data from `pypi.csv` and calculates downloads for libraries. The summarized data is published to a GitHub repo:
@@ -77,5 +94,55 @@ Installing the main SDV library also installs all the other libraries as depende
77
94
78
95
This methodology prevents double-counting downloads while providing an accurate representation of SDV usage.
79
96
97
+
## PyPI Data
98
+
PyMetrics collects download information from PyPI by querying the [public PyPI download statistics dataset on BigQuery](https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=pypi&page=dataset). The following data fields are captured for each download event:
99
+
100
+
**Temporal & Geographic Data:**
101
+
* `timestamp`: The timestamp at which the download happened
102
+
* `country_code`: The 2-letter country code
103
+
104
+
**Package Information:**
105
+
* `project`: The name of the PyPI project (library) that is being downloaded
106
+
* `version`: The downloaded version
107
+
* `type`: The type of file that was downloaded (source or wheel)
108
+
109
+
**Installation Environment:**
110
+
* `installer_name`: The installer used for the download, such as `pip`, `bandersnatch`, or `uv`
111
+
* `implementation_name`: The name of the Python implementation, such as `cpython`
112
+
* `implementation_version`: The Python version
113
+
* `ci`: A boolean flag indicating whether the download originated from a CI system (True, False, or null). This is determined by checking for specific environment variables set by CI platforms such as Azure Pipelines (`BUILD_BUILDID`), Jenkins (`BUILD_ID`), or general CI indicators (`CI`, `PIP_IS_CI`)
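
The environment-variable check behind the `ci` field can be sketched roughly as follows. This is a minimal illustration that assumes only the four variables named above; the function name `looks_like_ci` and its `environ` parameter are chosen here for illustration, and the installer that actually reports the flag may check more variables.

```python
import os

# Environment variables named in the field description above.
# Assumption: the real check may cover additional CI platforms.
CI_VARIABLES = ("BUILD_BUILDID", "BUILD_ID", "CI", "PIP_IS_CI")


def looks_like_ci(environ=None):
    """Return True if any of the known CI environment variables is set."""
    if environ is None:
        environ = os.environ
    return any(name in environ for name in CI_VARIABLES)
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try out, for example `looks_like_ci({"CI": "true"})`.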
114
+
115
+
**System Information:**
116
+
* `distro_name`: Name of the Linux or Mac distribution (empty if Windows)
117
+
* `distro_version`: Distribution version (empty for Windows)
118
+
* `system_name`: Type of OS, like Linux, Darwin (for Mac), or Windows
119
+
* `system_release`: OS version in case of Windows, kernel version in case of Unix
120
+
* `cpu`: CPU architecture used
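
As a rough sketch of how these fields could be requested from BigQuery, the snippet below builds a SQL string that selects them for a set of projects and a date range. The table name `bigquery-public-data.pypi.file_downloads` and the nested column paths are assumptions based on the public dataset's schema, and `build_query` is a hypothetical helper, not the function PyMetrics uses.

```python
from datetime import date

# Output field name -> column path in the public dataset.
# Assumption: paths follow the schema of the public PyPI downloads table.
FIELDS = {
    "timestamp": "timestamp",
    "country_code": "country_code",
    "project": "file.project",
    "version": "file.version",
    "type": "file.type",
    "installer_name": "details.installer.name",
    "implementation_name": "details.implementation.name",
    "implementation_version": "details.implementation.version",
    "ci": "details.ci",
    "distro_name": "details.distro.name",
    "distro_version": "details.distro.version",
    "system_name": "details.system.name",
    "system_release": "details.system.release",
    "cpu": "details.cpu",
}

TABLE = "bigquery-public-data.pypi.file_downloads"  # assumed table name


def build_query(projects, start, end):
    """Build a SQL string for downloads of `projects` in [start, end)."""
    columns = ",\n    ".join(f"{path} AS {alias}" for alias, path in FIELDS.items())
    # Plain string interpolation keeps the sketch short; real code should
    # prefer BigQuery query parameters over formatting values into SQL.
    project_list = ", ".join(f"'{name}'" for name in projects)
    return (
        f"SELECT\n    {columns}\n"
        f"FROM `{TABLE}`\n"
        f"WHERE file.project IN ({project_list})\n"
        f"  AND timestamp >= TIMESTAMP('{start.isoformat()}')\n"
        f"  AND timestamp < TIMESTAMP('{end.isoformat()}')"
    )


query = build_query(["sdv"], date(2024, 1, 1), date(2024, 2, 1))
```

Restricting the `timestamp` range matters in practice, since the dataset is large and BigQuery bills by the amount of data scanned.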
121
+
122
+
## Aggregation Metrics
123
+
124
+
If the `--add-metrics` option is passed to `pymetrics`, a spreadsheet with aggregation
125
+
metrics will be created alongside the raw PyPI downloads CSV file for each individual project.
126
+
127
+
The aggregation metrics spreadsheets contain the following tabs:
128
+
129
+
* **By Month:** Number of downloads per month and increase in the number of downloads from month to month.
130
+
* **By Version:** Absolute and relative number of downloads per version.
131
+
* **By Country Code:** Absolute and relative number of downloads per Country.
132
+
* **By Python Version:** Absolute and relative number of downloads per minor Python Version (X.Y, like 3.8).
133
+
* **By Full Python Version:** Absolute and relative number of downloads per Python Version, including
134
+
the patch number (X.Y.Z, like 3.8.1).
135
+
* **By Installer Name:** Absolute and relative number of downloads per Installer (e.g. pip).
136
+
* **By Distro Name:** Absolute and relative number of downloads per Distribution Name (e.g. Ubuntu).
137
+
* **By Distro Version:** Absolute and relative number of downloads per Distribution Name AND Version (e.g. Ubuntu 20.04).
138
+
* **By Distro Kernel:** Absolute and relative number of downloads per Distribution Name, Version AND Kernel (e.g. Ubuntu 18.04 - 5.4.104+).
139
+
* **By OS Type:** Absolute and relative number of downloads per OS Type (e.g. Linux).
140
+
* **By Cpu:** Absolute and relative number of downloads per CPU architecture (e.g. AMD64).
141
+
* **By CI:** Absolute and relative number of downloads by CI status (automated vs. manual installations).
142
+
* **By Month and Version:** Absolute number of downloads per month and version.
143
+
* **By Month and Python Version:** Absolute number of downloads per month and Python version.
144
+
* **By Month and Country Code:** Absolute number of downloads per month and country.
145
+
* **By Month and Installer Name:** Absolute number of downloads per month and Installer.
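
To make the tab definitions above concrete, here is a small standard-library sketch of three of them: absolute and relative counts per version, the same per minor Python version, and monthly counts with the change from the previous month. The sample rows are made up, the helper names are hypothetical, and treating the month-to-month "increase" as an absolute difference is an assumption; the actual `pymetrics` implementation may differ.

```python
from collections import Counter
from datetime import datetime

# Hypothetical download rows using field names from the PyPI data section.
downloads = [
    {"timestamp": datetime(2024, 1, 5), "version": "1.0.0", "implementation_version": "3.8.1"},
    {"timestamp": datetime(2024, 1, 20), "version": "1.0.0", "implementation_version": "3.10.4"},
    {"timestamp": datetime(2024, 2, 2), "version": "1.1.0", "implementation_version": "3.10.4"},
    {"timestamp": datetime(2024, 2, 9), "version": "1.1.0", "implementation_version": "3.8.1"},
    {"timestamp": datetime(2024, 2, 15), "version": "1.0.0", "implementation_version": "3.10.12"},
]


def absolute_and_relative(values):
    """Map each distinct value to (count, share of all downloads)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: (count, count / total) for key, count in counts.items()}


def minor_version(full_version):
    """Reduce a full version X.Y.Z to its minor version X.Y."""
    return ".".join(full_version.split(".")[:2])


def by_month(rows):
    """Map each month (YYYY-MM) to (downloads, change from the previous month seen)."""
    counts = Counter(row["timestamp"].strftime("%Y-%m") for row in rows)
    result = {}
    previous = None
    for month in sorted(counts):
        # Assumption: "increase" is the absolute difference in downloads.
        increase = None if previous is None else counts[month] - previous
        result[month] = (counts[month], increase)
        previous = counts[month]
    return result


by_version = absolute_and_relative(row["version"] for row in downloads)
by_python_version = absolute_and_relative(
    minor_version(row["implementation_version"]) for row in downloads
)
monthly = by_month(downloads)
```

The "By Month and ..." tabs follow the same pattern, with a `(month, value)` pair used as the counting key.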
146
+
80
147
## Known Issues
81
148
1. The conda package download data for Anaconda does not match the download count shown on the website, because some downloads are missing from the conda package download dataset. See https://github.com/anaconda/anaconda-package-data/issues/45