@abarciauskas-bgse
Contributor

PyPI stats only go back 180 days, so unfortunately I don't think we can generate a comparable report for zarr 3.0, since it was released in April.

I looked into the GitHub API for repo downloads for zarr-python, but I can't find an actual downloads metric there.

Thoughts @maxrjones ?

@maxrjones
Collaborator

We could use BigQuery to get the older stats. Zarr Python 3 was first released in January, but releases 3.0 through 3.0.7 were yanked.

SELECT COUNT(*) as downloads
FROM `bigquery-public-data.pypi.file_downloads`
WHERE file.project = 'zarr'
  AND DATE(timestamp) BETWEEN '2024-12-01' AND '2024-12-31'

1076076

SELECT COUNT(*) as downloads
FROM `bigquery-public-data.pypi.file_downloads`
WHERE file.project = 'zarr'
  AND DATE(timestamp) BETWEEN '2025-09-01' AND '2025-09-30'

2340374

@abarciauskas-bgse
Contributor Author

abarciauskas-bgse commented Nov 12, 2025

@maxrjones You said zarr-python 3.0 was first released in January but the last 2.x release, 2.18.7, was released April 9th, so I'm not sure we can associate downloads prior to April 9th with release 2.18.x or 3.0.x. (NB: There were also all of the 3.0.0-alpha releases in 2024)

I think there are 2 options:

  1. Report daily average downloads for the 3 months before and after each major release (zarr-python 3.0.7, released on April 21st, 2025, and virtualizarr 2.0.0, released on July 21st, 2025).

  2. Compare against the last minor release series instead:

  • zarr-python 2.18.x releases (May 7, 2024 to April 20, 2025), compared with daily average downloads of zarr-python 3 since the 3.0.7 release (April 21, 2025 to present)
  • virtualizarr 1.3.x releases (Feb 3, 2025 to July 20th, 2025), compared with July 21st, 2025 to present

Also, thank you for sharing that BigQuery option! Much easier than the PyPI stats API.

@maxrjones
Collaborator

@maxrjones You said zarr-python 3.0 was first released in January but the last 2.x release, 2.18.7, was released April 9th, so I'm not sure we can associate downloads prior to April 9th with release 2.18.x or 3.0.x. (NB: There were also all of the 3.0.0-alpha releases in 2024)

Zarr Python is still supporting 2.x for a period of time after 3.0 was released to facilitate migration, meaning bug-fix releases are coming out. Since the PyPI stats aggregate V2 and V3 downloads, I think the most helpful comparison would be x downloads per day in the month preceding v3.0.0 release (December 9, 2024-January 8, 2025) vs x downloads per day in the last complete month. I would prefer this solution because I think you'd already be catching some of the V3 upswing (if there is any) if using April as the baseline.

Perhaps it would be even easier to explain a comparison between October 2025 and October 2024, which would also include any signal from the V3 release.

@abarciauskas-bgse
Contributor Author

abarciauskas-bgse commented Nov 12, 2025

For zarr:

  • October 2024 downloads: 1,458,177
  • October 2025 downloads: 2,785,324
  • December 9, 2024 to January 8, 2025: 996,392
  • January 9, 2025 to February 8, 2025: 1,433,948
  • So I think we could say zarr-python's 3.0 release has resulted in nearly twice (about 1.9x) as many monthly downloads as the same month a year prior. Or we could say the release in January resulted in a 44% increase in downloads month over month.
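
For anyone who wants to double check the two framings, here is a minimal Python sketch that recomputes them from the zarr counts listed above. The helper names `ratio` and `percent_change` are made up for illustration and are not part of any tooling used in this thread.

```python
def ratio(later: int, earlier: int) -> float:
    """How many times larger the later count is than the earlier one."""
    return later / earlier


def percent_change(later: int, earlier: int) -> float:
    """Relative change from the earlier count to the later one, in percent."""
    return (later - earlier) / earlier * 100


# Monthly zarr download counts quoted in this comment.
oct_2024 = 1_458_177
oct_2025 = 2_785_324
pre_release = 996_392      # December 9, 2024 to January 8, 2025
post_release = 1_433_948   # January 9, 2025 to February 8, 2025

zarr_yoy = ratio(oct_2025, oct_2024)
zarr_mom = percent_change(post_release, pre_release)

print(f"Year over year: {zarr_yoy:.2f}x")
print(f"Month over month: {zarr_mom:.0f}%")
```

The year-over-year ratio comes out just above 1.9, which is why "nearly twice" is a safer wording than a flat 2x.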

Virtualizarr:

  • October 2024 downloads: 518
  • October 2025 downloads: 4,813
  • June 19 - July 20, 2025: 2,640
  • July 21 - August 21, 2025: 4,590

Virtualizarr's 2.0 release has resulted in over 9x more downloads than the same month a year prior (should we say this? It could also be attributed to general knowledge of the project), OR it saw a 74% increase in downloads month over month from the release.
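
Since the earlier suggestion was to report daily averages, here is a rough Python sketch that normalizes the virtualizarr windows by their length before comparing them. It assumes both date ranges are inclusive at both ends (that is my reading of the bullets above, not something confirmed), and the helper names are hypothetical.

```python
from datetime import date


def days_inclusive(start: date, end: date) -> int:
    """Number of days in a window, counting both endpoints (an assumption)."""
    return (end - start).days + 1


def daily_average(total: int, start: date, end: date) -> float:
    """Average downloads per day over an inclusive date window."""
    return total / days_inclusive(start, end)


# Virtualizarr download counts quoted in this comment.
before = daily_average(2640, date(2025, 6, 19), date(2025, 7, 20))
after = daily_average(4590, date(2025, 7, 21), date(2025, 8, 21))

# Percent change in the per-day rate across the 2.0.0 release.
vz_mom = (after - before) / before * 100

# October 2025 vs October 2024; both months have 31 days, so raw totals compare directly.
vz_yoy = 4813 / 518

print(f"Before: {before:.1f}/day, after: {after:.1f}/day ({vz_mom:.0f}% change)")
print(f"Year over year: {vz_yoy:.2f}x")
```

Both windows happen to span the same number of days, so the per-day comparison gives the same percentage as comparing the raw totals. The normalization only starts to matter if the windows differ in length, as they would under the "release date to present" option.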

@maxrjones
Collaborator

Virtualizarr's 2.0 release has resulted in over 9x more downloads than the same month a year prior (should we say this? It could also be attributed to general knowledge of the project), OR it saw a 74% increase in downloads month over month from the release.

Maybe we could say "virtualizarr 2.0 development" to account for both the release and the outreach about the release.

These are some pretty sweet stats
