Skip to content

Bug 2023624 - Some older repos have comments that have empty comment body and user object is null. The ETL script needs to handle these better instead of crashing#12

Open
dklawren wants to merge 2 commits intomainfrom
2023624
Open

Bug 2023624 - Some older repos have comments that have empty comment body and user object is null. The ETL script needs to handle these better instead of crashing#12
dklawren wants to merge 2 commits intomainfrom
2023624

Conversation

@dklawren
Copy link
Contributor

  • extract_reviewers: filters out any review where user is null, preserving empty-body reviews (e.g. approve without comment)
  • extract_comments: filters out any comment where user is null or body is empty
  • transform_data: defensive (review.get("user") or {}) so a null user won't raise AttributeError even if it somehow reaches the transform

…body and user object is null. The ETL script needs to handle these better instead of crashing
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds defensive handling for GitHub PR reviews/comments where user is null or body is empty, preventing ETL script crashes on older repositories with malformed data.

Changes:

  • extract_reviewers: Filters out reviews with null user and logs skipped count
  • extract_comments: Filters out comments with null user or empty body and logs skipped count
  • transform_data: Uses (review.get("user") or {}) pattern to safely handle null user objects

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@dklawren dklawren requested a review from shtrom March 17, 2026 14:06
Copy link
Member

@shtrom shtrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the error we saw will be addressed by this fix.

Is there a way you could run one iteration of the loop against a given PR locally? That would make it easier to reproduce topical issues.


logger.info(f"Extracted {len(reviewers)} reviewers for PR #{pr_number}")
return reviewers
filtered = [r for r in reviewers if r.get("user") is not None]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Unless you expect r["user"] to be validly False or "", you can just

Suggested change
filtered = [r for r in reviewers if r.get("user") is not None]
filtered = [r for r in reviewers if r.get("user")]

logger.info(f"Extracted {len(comments)} comments for PR #{pr_number}")
return comments

filtered = [c for c in comments if c.get("user") is not None and c.get("body")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Suggested change
filtered = [c for c in comments if c.get("user") is not None and c.get("body")]
filtered = [c for c in comments if c.get("user") and c.get("body")]

"date_reviewed": review.get("submitted_at"),
"reviewer_email": None, # TODO Placeholder for reviewer email extraction logic
"reviewer_username": review.get("user", {}).get("login", "None"),
"reviewer_username": (review.get("user") or {}).get("login"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem with the error message we saw is not that user was empty, but that review was None, so that the initial Review.get... was a None.get..., which failed.

Copy link
Member

@shtrom shtrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I didn't consider all options.

I thought review== so that review.get raises that error
however I now think you may be right about review['user']==None, in which case your fix is correct.

In any case, I think that call for some local one-shot runability to confirm (:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants