Description
Hi!
I'm encountering an issue with PyDriller's only_authors parameter. The documentation states:
only_authors (List[str]): only analyses commits that are made by these authors. The check is made on the username, NOT the email.
However, when I pass a list of GitHub usernames to only_authors, no commits are returned. If I instead provide the authors' full names (as they appear in the commit metadata), I get results.
The problem is that some users have multiple full names recorded across different commits (possibly due to different Git configurations), which makes it difficult to track a single user reliably across repositories. This creates logistical challenges when analyzing a large number of commits, as I need to manually map all name variations to their corresponding GitHub username.
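A minimal sketch of the manual mapping described above, assuming only_authors is compared against the author name stored in each commit (which is what I observe, not something the documentation confirms). The alias table, the names in it, and the helpers names_for and username_for are hypothetical and only illustrate the workaround:

```python
# Hypothetical alias table: GitHub username -> every author name
# seen for that person in commit metadata. The entries are made up.
AUTHOR_ALIASES = {
    "jdoe": ["Jane Doe", "jane.doe", "Jane D."],
    "asmith": ["Alex Smith"],
}


def names_for(usernames, aliases=AUTHOR_ALIASES):
    """Expand GitHub usernames into the Git author names to filter on."""
    names = []
    for username in usernames:
        # Usernames without a known alias contribute nothing.
        for name in aliases.get(username, []):
            if name not in names:
                names.append(name)
    return names


def username_for(author_name, aliases=AUTHOR_ALIASES):
    """Map a commit's author name back to a GitHub username, or None."""
    for username, names in aliases.items():
        if author_name in names:
            return username
    return None
```

With this in place, names_for(copilot_authors) could be passed as only_authors, and username_for(commit.author.name) could be used for the "User" field so that all name variations collapse to one user. The table still has to be maintained by hand, which is the logistical problem.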
Given the documentation, I expected PyDriller to match GitHub usernames when filtering commits using only_authors.
Here's a simplified version of my code:
from pydriller import Repository

# repo_list, since, to, and copilot_authors are defined elsewhere.
for commit in Repository(
    path_to_repo=repo_list,
    since=since,
    to=to,
    only_authors=copilot_authors,
).traverse_commits():
    record = {
        "User": commit.author.name,
        "Date": commit.committer_date.date(),
        "Repository": commit.project_name,
        "Insertions": commit.insertions,
        "Deletions": commit.deletions,
        "Total lines": commit.lines,
        "Files changed": commit.files,
        "Unit size": unit_size,  # computed elsewhere, omitted here
        "Complexity": complexity,  # computed elsewhere, omitted here
    }
Does only_authors actually check usernames, or does it filter based on the commit author’s full name as recorded in Git?
Thank you!