29 commits
b10b809
Add scripts for PR metrics from github API
mpg Jun 30, 2020
c531cae
Update requirements & allow use from venv
mpg Sep 30, 2020
a715053
Add comments about Ubuntu 20.04
mpg Dec 24, 2020
bae9af3
Make get-pr-data 10x faster
mpg Dec 24, 2020
00d8499
Avoid potential better-than-reality lifetime figures
mpg Dec 30, 2020
28dffa7
Adjust pr_dates() to reduce risk of misuse
mpg Dec 30, 2020
3d7880c
Adapt detection of community PRs
mpg Apr 2, 2021
37844d4
Add warning about making this work on 16.04
mpg Apr 2, 2021
cf9e41d
Avoid repeating the start date in many places
mpg Apr 2, 2021
f06becf
Update outdated comment
mpg Apr 2, 2021
cc05d6a
Make first and last date configurable
mpg Apr 2, 2021
ed1adea
Fix flake8 warnings
mpg Apr 2, 2021
08c0b7c
Rotate labels for quarters
mpg Apr 2, 2021
b2ee775
Clarify community detection
mpg May 19, 2021
3feb297
Smarter handling of p.mergeable in get-pr-data
mpg May 20, 2021
1d58093
Update pending-mergeability
mpg May 20, 2021
5f6d268
We no longer use labels for community PRs
mpg Sep 30, 2022
94533e1
Update list of core contributors
mpg Oct 12, 2022
b7f7f76
Update Readme (PR last date)
mpg Oct 12, 2022
e69fb3a
Shift one month for quarterly PR lifetime
mpg Jan 11, 2023
cd9c1f6
Update list of team member
mpg Jan 11, 2023
4d58ba0
Revert "Shift one month for quarterly PR lifetime"
mpg Jan 11, 2023
45fa6ce
Use statistics.median
mpg Jan 11, 2023
f1b54e1
Handle uncertainty about lifetimes
mpg Jan 11, 2023
ac21a51
Update Readme about incomplete results
mpg Jan 12, 2023
e095fc3
Update team members with current reviewers
mpg Apr 6, 2023
ce08049
Draw error bars, don't skip uncertain quarters
mpg Apr 6, 2023
c86237c
New script pr-backlog.py
mpg Apr 6, 2023
b7a02f6
Cosmetic adjustments
mpg Apr 6, 2023
2 changes: 1 addition & 1 deletion pr-metrics/do.sh
@@ -2,7 +2,7 @@

 set -eu

-for topic in created closed pending lifetime; do
+for topic in created closed pending lifetime backlog; do
     echo "PRs $topic..."
     rm -f prs-${topic}.png prs-${topic}.csv
     ./pr-${topic}.py > prs-${topic}.csv
72 changes: 72 additions & 0 deletions pr-metrics/pr-backlog.py
@@ -0,0 +1,72 @@
#!/usr/bin/env python3
# coding: utf-8

"""Produce analysis of PR backlog over time"""

from prs import pr_dates, first, last, quarter

from datetime import datetime, timedelta
from collections import Counter
from itertools import chain

import matplotlib.pyplot as plt

new_days = 90
Contributor

There was a bit of a push to standardise ages for what is considered recent/old in OSS a while back. The thresholds picked were 15 and 90.

It might be useful to have a few extra ranges, e.g. <15, 15-90, 90-365, >365 to align better with this? If I have time I'll push an update.

Contributor

Actually, thinking more, this is a lot like the median lifetime graph, but with a couple of thresholds which are based on age rather than percentiles. Would it be better to have a graph that shows e.g., median, 75th percentile, 95th percentile? Pros and cons either way I think, maybe worth exploring it if it's quick/easy to do but I'm not sure if it would be better or not?

Contributor Author

Oh, I didn't know about the push for standardised thresholds; indeed it would be good to align with that (and add some of our own if needed). Unfortunately, the way the script is currently structured makes it a very manual change (you can't just give a list of thresholds at the top and have everything else work automagically).
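For illustration only, here is a minimal sketch of what a threshold-list-driven bucketing could look like. It is not part of this PR: `thresholds`, `bucket_index` and `bucket_names` are hypothetical names, the quarter labels are placeholders, and the upper limits are treated as inclusive to mirror the `i <= new_days` comparison in the script.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical thresholds in days, giving buckets <=15, 16-90, 91-365, >365.
thresholds = [15, 90, 365]


def bucket_index(age_days, limits):
    """Return the index of the bucket that an age in days falls into.

    bisect_left makes each upper limit inclusive, like `i <= new_days`.
    """
    return bisect_left(limits, age_days)


def bucket_names(limits):
    """Build one legend label per bucket from the list of limits."""
    names = ["<= {} days".format(limits[0])]
    names += ["{}-{} days".format(lo + 1, hi)
              for lo, hi in zip(limits, limits[1:])]
    names.append("> {} days".format(limits[-1]))
    return names


# One counter keyed by (quarter, bucket) replaces the separate new/med/old ones.
counts = Counter()
sample = [("Q1", 3), ("Q1", 90), ("Q1", 400), ("Q2", 91)]  # placeholder data
for q, age in sample:
    counts[q, bucket_index(age, thresholds)] += 1
```

With something along these lines, the plotting code could loop over `bucket_names(thresholds)` and stack one bar series per bucket instead of hard-coding three.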

Contributor Author

Regarding adding other percentiles to the median one, that's something I've been considering for a while, but the problem is over what set. Currently it's over "PRs created this quarter", which means that, taking Q1 for the sake of concreteness, we can only compute the median (or an upper bound for it) after we've closed at least 50% of the PRs created in Q1. Fortunately, that's usually the case at the very beginning of Q2, when we prepare our report. (Even then, we might get only a range, as the value could still get lower if we closed a lot of PRs created near the end of Q1 at the very beginning of Q2.)

But for the 75th percentile, resp. 95th percentile, we'd need to have closed 75%, resp. 95%, of the PRs created in Q1 by the time we produce our report in early Q2, which realistically is not going to happen most of the time.
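To make the uncertainty concrete, here is a rough sketch (not the code used by the lifetime script) that brackets a percentile when some PRs are still open. It assumes a nearest-rank percentile, treats a still-open PR as closing today for the lower bound, and as never closing for the upper bound; `percentile` and `lifetime_bounds` are hypothetical names and the numbers are made up.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def lifetime_bounds(closed_lifetimes, open_ages, p):
    """Return (low, high) bounds in days for the p-th percentile lifetime.

    low:  every open PR closes today, so its lifetime equals its current age.
    high: no open PR ever closes, so its lifetime is infinite.
    """
    low = percentile(closed_lifetimes + open_ages, p)
    high = percentile(closed_lifetimes + [math.inf] * len(open_ages), p)
    return low, high


# Made-up quarter: 6 PRs closed (lifetimes in days), 4 still open (ages in days).
closed = [1, 2, 3, 5, 8, 13]
still_open = [4, 6, 7, 9]

median_range = lifetime_bounds(closed, still_open, 50)  # (5, 8)
p75_range = lifetime_bounds(closed, still_open, 75)     # (8, inf)
p95_range = lifetime_bounds(closed, still_open, 95)     # (13, inf)
```

With 60% of the PRs closed, the median is already bracketed by finite values, while the 75th and 95th percentiles have no finite upper bound yet.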

We could avoid the problem with incomplete data entirely by considering instead the set of PRs we closed this quarter: there, all the lifetimes are known for sure and we can do stats without uncertainty. But it doesn't really tell the same thing: for example, doing a lot of historical review one quarter would raise the median age of PRs closed this quarter, but that's still a good thing. So the data might become more difficult to interpret.

So, I think there are basically three ways to select PRs over which we make stats / grouping / etc for one quarter:

  • PRs created this quarter;
  • PRs still open at the end of the quarter (or any specific date);
  • PRs closed this quarter;
and each set will give slightly different information.
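As a sketch of how the three selections differ, assuming each PR is reduced to a `(created, closed)` pair of dates with `closed` set to `None` while the PR is open (`select_sets` is a hypothetical helper, not something in `prs.py`, and the dates are invented):

```python
from datetime import date


def select_sets(prs, q_start, q_end):
    """Split PRs into the three candidate sets for one quarter.

    prs: iterable of (created, closed) dates; closed is None while open.
    q_start, q_end: first and last day of the quarter, both inclusive.
    """
    prs = list(prs)
    created = [p for p in prs if q_start <= p[0] <= q_end]
    still_open = [p for p in prs
                  if p[0] <= q_end and (p[1] is None or p[1] > q_end)]
    closed = [p for p in prs
              if p[1] is not None and q_start <= p[1] <= q_end]
    return created, still_open, closed


a = (date(2022, 11, 1), date(2023, 1, 15))  # older PR, closed during Q1
b = (date(2023, 1, 10), date(2023, 2, 1))   # created and closed during Q1
c = (date(2023, 3, 20), None)               # created in Q1, still open
d = (date(2022, 6, 1), date(2023, 4, 5))    # older PR, closed after Q1

created, still_open, closed = select_sets(
    [a, b, c, d], date(2023, 1, 1), date(2023, 3, 31))
```

Here `created` is `[b, c]`, `still_open` is `[c, d]` and `closed` is `[a, b]`: the same four PRs give three different populations, and only the last one has fully known lifetimes.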

Wdyt?

Contributor

Agree, there isn't an easy answer or obvious better way. So let's leave as-is for now.

old_days = 365

new = Counter()
med = Counter()
old = Counter()

for beg, end, com in pr_dates():
    if end is None:
        tomorrow = datetime.now().date() + timedelta(days=1)
        n_days = (tomorrow - beg).days
    else:
        n_days = (end - beg).days
    for i in range(n_days):
        q = quarter(beg + timedelta(days=i))
        q1 = quarter(beg + timedelta(days=i+1))
        # Only count on each quarter's last day
        if q == q1:
            continue
        if i <= new_days:
            new[q] += 1
        elif i <= old_days:
            med[q] += 1
        else:
            old[q] += 1

first_q = quarter(first)
last_q = quarter(last)

quarters = (q for q in chain(new, med, old) if first_q <= q <= last_q)
quarters = tuple(sorted(set(quarters)))

new_y = tuple(new[q] for q in quarters)
med_y = tuple(med[q] for q in quarters)
old_y = tuple(old[q] for q in quarters)
sum_y = tuple(old[q] + med[q] for q in quarters)

old_name = "older than {} days".format(old_days)
med_name = "medium"
new_name = "recent (less than {} days old)".format(new_days)

width = 0.9
fig, ax = plt.subplots()
ax.bar(quarters, old_y, width, label=old_name)
ax.bar(quarters, med_y, width, label=med_name, bottom=old_y)
ax.bar(quarters, new_y, width, label=new_name, bottom=sum_y)
ax.legend(loc="upper left")
ax.grid(True)
ax.set_xlabel("quarter")
ax.set_ylabel("Number of PRs pending")
ax.tick_params(axis="x", labelrotation=90)
fig.suptitle("State of the PR backlog at the end of each quarter")
fig.set_size_inches(12.8, 7.2) # default 100 dpi -> 720p
fig.savefig("prs-backlog.png")

print("Quarter,recent,medium,old,total")
for q in quarters:
    print("{},{},{},{},{}".format(q, new[q], med[q], old[q],
                                  new[q] + med[q] + old[q]))