Bug 2023624 - Some older repos have comments that have empty comment body and user object is null. The ETL script needs to handle these better instead of crashing by dklawren · Pull Request #12 · mozilla-conduit/github-etl

dklawren · 2026-03-16T19:16:28Z

extract_reviewers: filters out any review where user is null, preserving empty-body reviews (e.g. approve without comment)
extract_comments: filters out any comment where user is null or body is empty
transform_data: defensive (review.get("user") or {}) so a null user won't raise AttributeError even if it somehow reaches the transform

…body and user object is null. The ETL script needs to handle these better instead of crashing

Copilot

Pull request overview

Adds defensive handling for GitHub PR reviews/comments where user is null or body is empty, preventing ETL script crashes on older repositories with malformed data.

Changes:

extract_reviewers: Filters out reviews with null user and logs skipped count
extract_comments: Filters out comments with null user or empty body and logs skipped count
transform_data: Uses (review.get("user") or {}) pattern to safely handle null user objects

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

shtrom

I don't think the error we saw will be addressed by this fix.

Is there a way you could run one iteration of the loop against a given PR locally? That would make it easier to reproduce topical issues.

shtrom · 2026-03-17T23:08:51Z

main.py


-    logger.info(f"Extracted {len(reviewers)} reviewers for PR #{pr_number}")
-    return reviewers
+    filtered = [r for r in reviewers if r.get("user") is not None]


nit: Unless you expect r["user"] to be validly False or "", you can just

Suggested change

filtered = [r for r in reviewers if r.get("user") is not None]

filtered = [r for r in reviewers if r.get("user")]

shtrom · 2026-03-17T23:09:50Z

main.py

-    logger.info(f"Extracted {len(comments)} comments for PR #{pr_number}")
-    return comments
+
+    filtered = [c for c in comments if c.get("user") is not None and c.get("body")]


Ditto.

Suggested change

filtered = [c for c in comments if c.get("user") is not None and c.get("body")]

filtered = [c for c in comments if c.get("user") and c.get("body")]

shtrom · 2026-03-17T23:12:31Z

main.py

                "date_reviewed": review.get("submitted_at"),
                "reviewer_email": None,  # TODO Placeholder for reviewer email extraction logic
-                "reviewer_username": review.get("user", {}).get("login", "None"),
+                "reviewer_username": (review.get("user") or {}).get("login"),


I think the problem with the error message we saw is not that user was empty, but that review was None, so that the initial Review.get... was a None.get..., which failed.

shtrom

Actually, I didn't consider all options.

I thought review== so that review.get raises that error
however I now think you may be right about review['user']==None, in which case your fix is correct.

In any case, I think that call for some local one-shot runability to confirm (:

Bug 2023624 - Some older repos have comments that have empty comment …

6de1476

…body and user object is null. The ETL script needs to handle these better instead of crashing

dklawren requested review from cgsheeh and Copilot March 16, 2026 19:16

Copilot started reviewing on behalf of dklawren March 16, 2026 19:16 View session

Copilot AI reviewed Mar 16, 2026

View reviewed changes

dklawren requested a review from shtrom March 17, 2026 14:06

Black formatted

93e2486

shtrom requested changes Mar 17, 2026

View reviewed changes

shtrom approved these changes Mar 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug 2023624 - Some older repos have comments that have empty comment body and user object is null. The ETL script needs to handle these better instead of crashing#12

Bug 2023624 - Some older repos have comments that have empty comment body and user object is null. The ETL script needs to handle these better instead of crashing#12
dklawren wants to merge 2 commits intomainfrom
2023624

dklawren commented Mar 16, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

shtrom left a comment

Uh oh!

shtrom Mar 17, 2026

Uh oh!

shtrom Mar 17, 2026

Uh oh!

shtrom Mar 17, 2026

Uh oh!

shtrom left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	filtered = [r for r in reviewers if r.get("user") is not None]
	filtered = [r for r in reviewers if r.get("user")]

	filtered = [c for c in comments if c.get("user") is not None and c.get("body")]
	filtered = [c for c in comments if c.get("user") and c.get("body")]

Conversation

dklawren commented Mar 16, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

shtrom left a comment

Choose a reason for hiding this comment

Uh oh!

shtrom Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

shtrom Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

shtrom Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

shtrom left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants