Add scripts for PR metrics from github API #13
Open: mpg wants to merge 29 commits into main from dev/mpg/pr-metrics
Changes from 1 commit
Commits (29):
b10b809 Add scripts for PR metrics from github API (mpg)
c531cae Update requirements & allow use from venv (mpg)
a715053 Add comments about Ubuntu 20.04 (mpg)
bae9af3 Make get-pr-data 10x faster (mpg)
00d8499 Avoid potential better-than-reality lifetime figures (mpg)
28dffa7 Adjust pr_dates() to reduce risk of misuse (mpg)
3d7880c Adapt detection of community PRs (mpg)
37844d4 Add warning about making this work on 16.04 (mpg)
cf9e41d Avoid repeating the start date in many places (mpg)
f06becf Update outdated comment (mpg)
cc05d6a Make first and last date configurable (mpg)
ed1adea Fix flake8 warnings (mpg)
08c0b7c Rotate labels for quarters (mpg)
b2ee775 Clarify community detection (mpg)
3feb297 Smarter handling of p.mergeable in get-pr-data (mpg)
1d58093 Update pending-mergeability (mpg)
5f6d268 We no longer use labels for community PRs (mpg)
94533e1 Update list of core contributors (mpg)
b7f7f76 Update Readme (PR last date) (mpg)
e69fb3a Shift one month for quarterly PR lifetime (mpg)
cd9c1f6 Update list of team member (mpg)
4d58ba0 Revert "Shift one month for quarterly PR lifetime" (mpg)
45fa6ce Use statistics.median (mpg)
f1b54e1 Handle uncertainty about lifetimes (mpg)
ac21a51 Update Readme about incomplete results (mpg)
e095fc3 Update team members with current reviewers (mpg)
ce08049 Draw error bars, don't skip uncertain quarters (mpg)
c86237c New script pr-backlog.py (mpg)
b7a02f6 Cosmetic adjustments (mpg)
@@ -0,0 +1,72 @@

    #!/usr/bin/env python3
    # coding: utf-8

    """Produce analysis of PR backlog over time"""

    from prs import pr_dates, first, last, quarter

    from datetime import datetime, timedelta
    from collections import Counter
    from itertools import chain

    import matplotlib.pyplot as plt

    new_days = 90
    old_days = 365

    new = Counter()
    med = Counter()
    old = Counter()

    for beg, end, com in pr_dates():
        if end is None:
            tomorrow = datetime.now().date() + timedelta(days=1)
            n_days = (tomorrow - beg).days
        else:
            n_days = (end - beg).days
        for i in range(n_days):
            q = quarter(beg + timedelta(days=i))
            q1 = quarter(beg + timedelta(days=i+1))
            # Only count on each quarter's last day
            if q == q1:
                continue
            if i <= new_days:
                new[q] += 1
            elif i <= old_days:
                med[q] += 1
            else:
                old[q] += 1

    first_q = quarter(first)
    last_q = quarter(last)

    quarters = (q for q in chain(new, med, old) if first_q <= q <= last_q)
    quarters = tuple(sorted(set(quarters)))

    new_y = tuple(new[q] for q in quarters)
    med_y = tuple(med[q] for q in quarters)
    old_y = tuple(old[q] for q in quarters)
    sum_y = tuple(old[q] + med[q] for q in quarters)

    old_name = "older than {} days".format(old_days)
    med_name = "medium"
    new_name = "recent (less than {} days old)".format(new_days)

    width = 0.9
    fig, ax = plt.subplots()
    ax.bar(quarters, old_y, width, label=old_name)
    ax.bar(quarters, med_y, width, label=med_name, bottom=old_y)
    ax.bar(quarters, new_y, width, label=new_name, bottom=sum_y)
    ax.legend(loc="upper left")
    ax.grid(True)
    ax.set_xlabel("quarter")
    ax.set_ylabel("Number of PRs pending")
    ax.tick_params(axis="x", labelrotation=90)
    fig.suptitle("State of the PR backlog at the end of each quarter")
    fig.set_size_inches(12.8, 7.2)  # default 100 dpi -> 720p
    fig.savefig("prs-backlog.png")

    print("Quarter,recent,medium,old,total")
    for q in quarters:
        print("{},{},{},{},{}".format(q, new[q], med[q], old[q],
                                      new[q] + med[q] + old[q]))
There was a bit of a push to standardise ages for what is considered recent/old in OSS a while back. The thresholds picked were 15 and 90.
It might be useful to have a few extra ranges, e.g. <15, 15-90, 90-365, >365 to align better with this? If I have time I'll push an update.
Actually, thinking more, this is a lot like the median lifetime graph, but with a couple of thresholds which are based on age rather than percentiles. Would it be better to have a graph that shows e.g., median, 75th percentile, 95th percentile? Pros and cons either way I think, maybe worth exploring it if it's quick/easy to do but I'm not sure if it would be better or not?
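For reference, a minimal sketch of what the percentile variant could compute, using only the standard library (`statistics.quantiles` needs Python 3.8 or later). `lifetime_percentiles` is a hypothetical helper, not something from this PR, the lifetimes are made-up sample data, and the sketch assumes every lifetime is already known, which ignores PRs that are still open.

```python
import statistics


def lifetime_percentiles(lifetimes, percents=(50, 75, 95)):
    """Return {percent: value} for a list of PR lifetimes in days."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    # method="inclusive" treats the data as the whole population, so the
    # results never fall outside [min(lifetimes), max(lifetimes)].
    cuts = statistics.quantiles(lifetimes, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in percents}


# Made-up lifetimes (in days) of PRs created in one quarter.
sample = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
result = lifetime_percentiles(sample)
print(result)
```

The 50th percentile returned here matches `statistics.median(sample)`, so the existing median graph would be one of the three curves.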
Oh, I didn't know about the push for standardised thresholds, indeed it would be good to align with that (and add some of our own if needed). Unfortunately the way the script is structured currently makes it a very manual change (you can't just give a list of thresholds at the top and have everything else work automagically).
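One possible restructuring, as a rough sketch only: keep a single list of thresholds and derive the buckets from it with `bisect`. The names `thresholds`, `bucket_names`, `bucket_index` and `counts` are hypothetical, and the quarter labels and ages are made-up data; the real script gets quarters from `quarter()` in `prs`.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds (inclusive, in days) of every bucket except the last one.
thresholds = [15, 90, 365]


def bucket_names(thresholds):
    """Build one legend label per bucket from the threshold list."""
    names = ["<= {} days".format(thresholds[0])]
    for lo, hi in zip(thresholds, thresholds[1:]):
        names.append("{}-{} days".format(lo + 1, hi))
    names.append("> {} days".format(thresholds[-1]))
    return names


def bucket_index(age_days, thresholds):
    """Index of the bucket an age falls into (0 = most recent)."""
    # bisect_left keeps an age equal to a threshold in the lower bucket,
    # matching the `i <= new_days` style comparisons in the script.
    return bisect_left(thresholds, age_days)


# One Counter per bucket instead of the hard-coded new/med/old trio.
counts = [Counter() for _ in range(len(thresholds) + 1)]

# Made-up (quarter, age in days) observations, for illustration only.
observations = [("20q1", 3), ("20q1", 15), ("20q1", 16),
                ("20q1", 400), ("20q2", 90)]
for q, age in observations:
    counts[bucket_index(age, thresholds)][q] += 1

for name, counter in zip(bucket_names(thresholds), counts):
    print(name, dict(counter))
```

Plotting would then loop over `counts` from oldest to newest, accumulating the `bottom` argument of `ax.bar` instead of naming `old_y` and `sum_y` by hand.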
Regarding adding other percentiles to the median one, that's something I've been considering for a while, but the problem is over what set. Currently it's over "PRs created this quarter", which means, considering Q1 for the sake of concreteness, we can only compute the median (or an upper bound for it) after we've closed at least 50% of PRs created in Q1. Fortunately, that's usually the case at the very beginning of Q2 when we prepare our report. (Even then, we might get only a range, as the value could still get lower if we closed a lot of PRs created near the end of Q1 at the very beginning of Q2.)
But for the 3rd quartile, resp. 95th percentile, we'd need to have closed 75%, resp. 95% of PRs created in Q1 by the time we produce our report in early Q2, which realistically is not going to happen most of the time.
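To make the incomplete-data problem concrete, here is a hedged sketch of one way to bound a percentile when some PRs are still open; it is not necessarily how the scripts in this PR handle uncertainty. It assumes a nearest-rank percentile, takes the lower bound from pretending every open PR closes today, and the upper bound from pretending they never close. `nearest_rank` and `percentile_bounds` are hypothetical helpers and the numbers are made up.

```python
import math


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct% of N)."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def percentile_bounds(closed_lifetimes, open_ages, pct):
    """Return (low, high) bounds, in days, for the pct-th percentile.

    closed_lifetimes: final lifetimes of PRs that are already closed.
    open_ages: current ages of PRs that are still open.
    """
    # Best case: every open PR closes today, so its lifetime is its age.
    low = sorted(list(closed_lifetimes) + list(open_ages))
    # Worst case: open PRs stay open forever.
    high = sorted(list(closed_lifetimes) + [math.inf] * len(open_ages))
    return nearest_rank(low, pct), nearest_rank(high, pct)


# Made-up data: 6 of the 10 PRs created in the quarter are closed.
closed = [2, 5, 9, 14, 30, 41]
still_open = [3, 20, 60, 75]

for pct in (50, 75, 95):
    print(pct, percentile_bounds(closed, still_open, pct))
```

With 60% of the PRs closed, the median comes out as a finite range, while the 75th and 95th percentiles have no finite upper bound yet, which is the situation described above.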
We could avoid the problem with incomplete data entirely by considering instead the set of PRs we closed this quarter - there all the lifetimes are known for sure and we can do stats without uncertainty. But it doesn't really tell the same thing: for example, doing a lot of historical review one quarter would raise the median age of PRs closed this quarter, but that's still a good thing. So the data might become more difficult to interpret.
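A small sketch of the "closed this quarter" selection, to show the effect described here. `quarter_label` is a hypothetical stand-in for `quarter()` from `prs` (whose actual label format is not shown in this diff), `median_age_by_closing_quarter` is likewise illustrative, and the dates are made up.

```python
import statistics
from collections import defaultdict
from datetime import date


def quarter_label(d):
    """Hypothetical quarter label, e.g. date(2020, 5, 1) -> '2020q2'."""
    return "{}q{}".format(d.year, (d.month - 1) // 3 + 1)


def median_age_by_closing_quarter(prs):
    """prs: iterable of (created, closed) dates; closed is None if open."""
    lifetimes = defaultdict(list)
    for created, closed in prs:
        if closed is None:
            continue  # still open: belongs to no "closed this quarter" set
        lifetimes[quarter_label(closed)].append((closed - created).days)
    return {q: statistics.median(v) for q, v in sorted(lifetimes.items())}


# Made-up data: the PR from 2019 is the "historical review" case.
prs = [
    (date(2020, 4, 1), date(2020, 4, 11)),  # lived 10 days
    (date(2020, 5, 1), date(2020, 5, 21)),  # lived 20 days
    (date(2019, 4, 1), date(2020, 6, 1)),   # lived 427 days
    (date(2020, 6, 20), None),              # still open, ignored
]
print(median_age_by_closing_quarter(prs))
```

Without the 2019 PR the median for the quarter would be 15 days; closing it raises the median to 20, even though clearing an old PR is an improvement.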
So, I think there are basically three ways to select PRs over which we make stats / grouping / etc for one quarter:
and each set will give slightly different information.
Wdyt?
Agree, there isn't an easy answer or obvious better way. So let's leave as-is for now.