3 changes: 3 additions & 0 deletions .env-example
@@ -4,6 +4,9 @@ END_DATE = ""
ORGANIZATION = "organization"
REPOSITORY = "organization/repository"
START_DATE = ""
SPONSOR_INFO = "False"
LINK_TO_PROFILE = "True"
ACKNOWLEDGE_COAUTHORS = "True"

# GITHUB APP
GH_APP_ID = ""
21 changes: 12 additions & 9 deletions README.md
@@ -84,20 +84,23 @@ This action can be configured to authenticate with GitHub App Installation or Pe

#### Other Configuration Options

| field | required | default | description |
| ------------------- | ----------------------------------------------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `GH_ENTERPRISE_URL` | False | "" | The `GH_ENTERPRISE_URL` is used to connect to an enterprise server instance of GitHub. github.com users should not enter anything here. |
| `ORGANIZATION` | Required to have `ORGANIZATION` or `REPOSITORY` | | The name of the GitHub organization which you want the contributor information of all repos from. ie. github.com/github would be `github` |
| `REPOSITORY` | Required to have `ORGANIZATION` or `REPOSITORY` | | The name of the repository and organization which you want the contributor information from. ie. `github/contributors` or a comma separated list of multiple repositories `github/contributor,super-linter/super-linter` |
| `START_DATE` | False | Beginning of time | The date from which you want to start gathering contributor information. ie. Aug 1st, 2023 would be `2023-08-01`. |
| `END_DATE` | False | Current Date | The date at which you want to stop gathering contributor information. Must be later than the `START_DATE`. ie. Aug 2nd, 2023 would be `2023-08-02` |
| `SPONSOR_INFO` | False | False | If you want to include sponsor information in the output. This will include the sponsor count and the sponsor URL. This will impact action performance. ie. SPONSOR_INFO = "False" or SPONSOR_INFO = "True" |
| `LINK_TO_PROFILE` | False | True | If you want to link usernames to their GitHub profiles in the output. ie. LINK_TO_PROFILE = "True" or LINK_TO_PROFILE = "False" |
| field | required | default | description |
| ----------------------- | ----------------------------------------------- | ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `GH_ENTERPRISE_URL` | False | "" | The `GH_ENTERPRISE_URL` is used to connect to an enterprise server instance of GitHub. github.com users should not enter anything here. |
| `ORGANIZATION` | Required to have `ORGANIZATION` or `REPOSITORY` | | The name of the GitHub organization which you want the contributor information of all repos from. ie. github.com/github would be `github` |
| `REPOSITORY` | Required to have `ORGANIZATION` or `REPOSITORY` | | The name of the repository and organization which you want the contributor information from. ie. `github/contributors` or a comma separated list of multiple repositories `github/contributor,super-linter/super-linter` |
| `START_DATE` | False | Beginning of time | The date from which you want to start gathering contributor information. ie. Aug 1st, 2023 would be `2023-08-01`. |
| `END_DATE` | False | Current Date | The date at which you want to stop gathering contributor information. Must be later than the `START_DATE`. ie. Aug 2nd, 2023 would be `2023-08-02` |
| `SPONSOR_INFO` | False | False | If you want to include sponsor information in the output. This will include the sponsor count and the sponsor URL. This will impact action performance. ie. SPONSOR_INFO = "False" or SPONSOR_INFO = "True" |
| `LINK_TO_PROFILE` | False | True | If you want to link usernames to their GitHub profiles in the output. ie. LINK_TO_PROFILE = "True" or LINK_TO_PROFILE = "False" |
| `ACKNOWLEDGE_COAUTHORS` | False | True | If you want to include co-authors from commit messages as contributors. Co-authors are identified via the `Co-authored-by:` trailer in commit messages. The action will extract GitHub usernames from GitHub noreply emails (e.g., `username@users.noreply.github.com`) or use the full email address for other email domains. This will impact action performance as it requires scanning all commits. ie. ACKNOWLEDGE_COAUTHORS = "True" or ACKNOWLEDGE_COAUTHORS = "False" |

**Note**: If `start_date` and `end_date` are specified then the action will determine if the contributor is new. A new contributor is one that has contributed in the date range specified but not before the start date.

**Performance Note:** Using start and end dates will reduce speed of the action by approximately 63X. ie without dates if the action takes 1.7 seconds, it will take 1 minute and 47 seconds.

**Co-authors Note:** When `ACKNOWLEDGE_COAUTHORS` is enabled, the action will scan commit messages for `Co-authored-by:` trailers and include those users as contributors. For GitHub noreply email addresses (e.g., `username@users.noreply.github.com`), the username will be extracted. For other email addresses (e.g., `[email protected]`), the full email address will be used as the contributor identifier. See [GitHub's documentation on creating commits with multiple authors](https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors).
Comment on lines +96 to +102
Copilot AI Dec 31, 2025

The documentation is incomplete. It mentions that GitHub noreply emails extract the username and other emails use the full email address, but it doesn't document that @github.com email addresses also extract the username (part before @), or that the action attempts to use the GitHub Search Users API to find usernames for other email addresses before falling back to the email address.

Suggested change
| `ACKNOWLEDGE_COAUTHORS` | False | True | If you want to include co-authors from commit messages as contributors. Co-authors are identified via the `Co-authored-by:` trailer in commit messages. The action will extract GitHub usernames from GitHub noreply emails (e.g., `username@users.noreply.github.com`) or use the full email address for other email domains. This will impact action performance as it requires scanning all commits. ie. ACKNOWLEDGE_COAUTHORS = "True" or ACKNOWLEDGE_COAUTHORS = "False" |
**Note**: If `start_date` and `end_date` are specified then the action will determine if the contributor is new. A new contributor is one that has contributed in the date range specified but not before the start date.
**Performance Note:** Using start and end dates will reduce speed of the action by approximately 63X. ie without dates if the action takes 1.7 seconds, it will take 1 minute and 47 seconds.
**Co-authors Note:** When `ACKNOWLEDGE_COAUTHORS` is enabled, the action will scan commit messages for `Co-authored-by:` trailers and include those users as contributors. For GitHub noreply email addresses (e.g., `username@users.noreply.github.com`), the username will be extracted. For other email addresses (e.g., `[email protected]`), the full email address will be used as the contributor identifier. See [GitHub's documentation on creating commits with multiple authors](https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors).
| `ACKNOWLEDGE_COAUTHORS` | False | True | If you want to include co-authors from commit messages as contributors. Co-authors are identified via the `Co-authored-by:` trailer in commit messages. The action will extract GitHub usernames from GitHub noreply emails (e.g., `username@users.noreply.github.com`) and from `@github.com` email addresses (using the part before `@`), and for other email domains it will first attempt to resolve the email to a GitHub username via the GitHub Search Users API before falling back to using the full email address. This will impact action performance as it requires scanning all commits. ie. ACKNOWLEDGE_COAUTHORS = "True" or ACKNOWLEDGE_COAUTHORS = "False" |
**Note**: If `start_date` and `end_date` are specified then the action will determine if the contributor is new. A new contributor is one that has contributed in the date range specified but not before the start date.
**Performance Note:** Using start and end dates will reduce speed of the action by approximately 63X. ie without dates if the action takes 1.7 seconds, it will take 1 minute and 47 seconds.
**Co-authors Note:** When `ACKNOWLEDGE_COAUTHORS` is enabled, the action will scan commit messages for `Co-authored-by:` trailers and include those users as contributors. For GitHub noreply email addresses (e.g., `username@users.noreply.github.com`), the username will be extracted. For `@github.com` email addresses (e.g., `[email protected]`), the part before `@` will be treated as the GitHub username. For other email addresses (e.g., `[email protected]`), the action will first attempt to resolve the email to a GitHub username using the GitHub Search Users API and, if no matching user is found, will fall back to using the full email address as the contributor identifier. See [GitHub's documentation on creating commits with multiple authors](https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors).

Copilot uses AI. Check for mistakes.

### Example workflows

**Be sure to change at least these values: `<YOUR_ORGANIZATION_GOES_HERE>`, `<YOUR_GITHUB_HANDLE_HERE>`**
176 changes: 173 additions & 3 deletions contributors.py
@@ -1,6 +1,7 @@
# pylint: disable=broad-exception-caught
"""This file contains the main() and other functions needed to get contributor information from the organization or repository"""

import re
from typing import List

import auth
@@ -27,6 +28,7 @@ def main():
        end_date,
        sponsor_info,
        link_to_profile,
        acknowledge_coauthors,
    ) = env.get_env_vars()

    # Auth to GitHub.com
@@ -46,7 +48,13 @@ def main():

    # Get the contributors
    contributors = get_all_contributors(
        organization, repository_list, start_date, end_date, github_connection, ghe
        organization,
        repository_list,
        start_date,
        end_date,
        github_connection,
        ghe,
        acknowledge_coauthors,
    )

    # Check for new contributor if user provided start_date and end_date
@@ -60,6 +68,7 @@ def main():
            end_date=start_date,
            github_connection=github_connection,
            ghe=ghe,
            acknowledge_coauthors=acknowledge_coauthors,
        )
        for contributor in contributors:
            contributor.new_contributor = contributor_stats.is_new_contributor(
@@ -103,6 +112,7 @@ def get_all_contributors(
    end_date: str,
    github_connection: object,
    ghe: str,
    acknowledge_coauthors: bool,
):
    """
    Get all contributors from the organization or repository
@@ -113,6 +123,8 @@ def get_all_contributors(
        start_date (str): The start date of the date range for the contributor list.
        end_date (str): The end date of the date range for the contributor list.
        github_connection (object): The authenticated GitHub connection object from PyGithub
        ghe (str): The GitHub Enterprise URL to use for authentication
        acknowledge_coauthors (bool): Whether to acknowledge co-authors from commit messages

    Returns:
        all_contributors (list): A list of ContributorStats objects
@@ -130,7 +142,14 @@ def get_all_contributors(
    all_contributors = []
    if repos:
        for repo in repos:
            repo_contributors = get_contributors(repo, start_date, end_date, ghe)
            repo_contributors = get_contributors(
                repo,
                start_date,
                end_date,
                ghe,
                acknowledge_coauthors,
                github_connection,
            )
            if repo_contributors:
                all_contributors.append(repo_contributors)

@@ -140,20 +159,91 @@ def get_all_contributors(
    return all_contributors


def get_contributors(repo: object, start_date: str, end_date: str, ghe: str):
def get_coauthors_from_message(
    commit_message: str, github_connection: object = None
) -> List[str]:
    """
    Extract co-author identifiers from a commit message.

    Co-authored-by trailers follow the format:
        Co-authored-by: Name <email>

    For GitHub noreply emails (username@users.noreply.github.com), extracts the username.
    For @github.com emails, extracts the username (part before @).
    For other emails, uses GitHub Search Users API to find the username, or falls back to email.

    Args:
        commit_message (str): The commit message to parse
        github_connection (object): The authenticated GitHub connection object from PyGithub

    Returns:
        List[str]: List of co-author identifiers (GitHub usernames or email addresses)
    """
    # Match Co-authored-by trailers - case insensitive
    # Format: Co-authored-by: Name <email>
    pattern = r"Co-authored-by:\s*[^<]*<([^>]+)>"
    matches = re.findall(pattern, commit_message, re.IGNORECASE)

    identifiers = []
    for email in matches:
        # Check if it's a GitHub noreply email format: username@users.noreply.github.com
        noreply_pattern = r"^(\d+\+)?([^@]+)@users\.noreply\.github\.com$"
        noreply_match = re.match(noreply_pattern, email)
        if noreply_match:
            # For GitHub noreply emails, extract just the username
            identifiers.append(noreply_match.group(2))
        elif email.endswith("@github.com"):
            # For @github.com emails, extract the username (part before @)
            username = email.split("@")[0]
            identifiers.append(username)
Comment on lines +195 to +198
Copilot AI Dec 31, 2025

Test coverage is missing for the @github.com email handling path. Add a test case that verifies co-authors with @github.com email addresses are correctly parsed to extract the username before the @ symbol.
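
A minimal sketch of such a test, assuming pytest-style test functions. So that it runs on its own, it carries a trimmed local copy of the parsing logic from this diff (the Search Users API branch is left out and replaced by the plain email fallback). The addresses and test names are made up for illustration and are not part of the PR.

```python
import re
from typing import List


def get_coauthors_from_message(commit_message: str) -> List[str]:
    """Trimmed local copy of the parser in this diff (offline branches only)."""
    pattern = r"Co-authored-by:\s*[^<]*<([^>]+)>"
    identifiers = []
    for email in re.findall(pattern, commit_message, re.IGNORECASE):
        noreply = re.match(r"^(\d+\+)?([^@]+)@users\.noreply\.github\.com$", email)
        if noreply:
            # noreply address: keep only the username, dropping any "ID+" prefix
            identifiers.append(noreply.group(2))
        elif email.endswith("@github.com"):
            # @github.com address: the part before "@" is treated as the username
            identifiers.append(email.split("@")[0])
        else:
            # Assumption for this sketch: no API lookup, fall back to the email
            identifiers.append(email)
    return identifiers


def test_github_com_email_yields_username():
    message = "Fix bug\n\nCo-authored-by: Mona Lisa <mona@github.com>"
    assert get_coauthors_from_message(message) == ["mona"]


def test_noreply_email_with_id_prefix_yields_username():
    message = "Fix bug\n\nCo-authored-by: Mona <12345+mona@users.noreply.github.com>"
    assert get_coauthors_from_message(message) == ["mona"]
```

In the real test module the function would be imported from `contributors` rather than copied.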

        else:
            # For other emails, try to find GitHub username using Search Users API
            if github_connection:
                try:
                    # Search for users by email
                    search_result = github_connection.search_users(f"email:{email}")
                    if search_result.totalCount > 0:
                        # Use the first matching user's login
                        identifiers.append(search_result[0].login)
                    else:
                        # If no user found, fall back to email address
                        identifiers.append(email)
                except Exception:
                    # If API call fails, fall back to email address
                    identifiers.append(email)
Comment on lines +200 to +213
Copilot AI Dec 31, 2025

The GitHub Users Search API call at line 204 may experience rate limiting when processing repositories with many co-authors using non-GitHub email addresses. Each unique email requires an API call, which could significantly impact performance and potentially exhaust API rate limits. Consider adding rate limit handling or caching of email-to-username mappings to mitigate this issue.
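
One possible shape for the caching idea, offered as a sketch rather than a drop-in patch. `EmailResolver` and its `lookup` argument are hypothetical names that do not appear in the PR; `lookup` stands for whatever callable wraps the search request and returns a login, or `None` when nothing matches. Rate-limit handling (waiting or backing off) is not shown.

```python
from typing import Callable, Dict, Optional


class EmailResolver:
    """Memoize email -> identifier lookups so each address is searched at most once."""

    def __init__(self, lookup: Callable[[str], Optional[str]]):
        # Hypothetical callable wrapping the search request; returns a login or None.
        self._lookup = lookup
        self._cache: Dict[str, str] = {}

    def resolve(self, email: str) -> str:
        key = email.lower()
        if key not in self._cache:
            try:
                login = self._lookup(email)
            except Exception:
                # Mirror the diff's behavior: any API failure falls back to the email.
                login = None
            # Store the fallback too, so a failed address is not retried in this run.
            self._cache[key] = login or email
        return self._cache[key]
```

A single resolver instance would need to be shared across all commits (and ideally all repositories) in a run for the cache to pay off. Note that this sketch also remembers failures, so a transient API error is not retried later; whether that trade-off is acceptable is a design decision for the PR author.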

Comment on lines +200 to +213
Copilot AI Dec 31, 2025

Test coverage is missing for the GitHub Search Users API fallback path (lines 200-213). Add test cases that verify: 1) successful username lookup via the search API when a matching user is found, 2) fallback to email address when no user is found (totalCount = 0), and 3) fallback to email address when the API call raises an exception.
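
The three cases could be covered roughly as follows, assuming pytest-style tests and `unittest.mock`. To keep the sketch self-contained, `resolve_email` is a hypothetical stand-in that mirrors only the fallback branch shown in this diff; `make_connection` and the addresses are likewise invented for illustration. The mock exposes `totalCount` and index access because that is what the code in the diff reads from the search result.

```python
from unittest.mock import MagicMock


def resolve_email(email: str, github_connection=None) -> str:
    """Stand-in mirroring the diff's branch for non-GitHub email addresses."""
    if not github_connection:
        return email
    try:
        search_result = github_connection.search_users(f"email:{email}")
        if search_result.totalCount > 0:
            return search_result[0].login
        return email
    except Exception:
        return email


def make_connection(total: int, login: str = "") -> MagicMock:
    """Build a mock connection whose search returns `total` hits with `login` first."""
    connection = MagicMock()
    result = connection.search_users.return_value
    result.totalCount = total
    result.__getitem__.return_value.login = login
    return connection


def test_username_used_when_search_finds_a_user():
    connection = make_connection(1, "octocat")
    assert resolve_email("dev@example.com", connection) == "octocat"
    connection.search_users.assert_called_once_with("email:dev@example.com")


def test_email_used_when_search_finds_nothing():
    assert resolve_email("dev@example.com", make_connection(0)) == "dev@example.com"


def test_email_used_when_search_raises():
    connection = MagicMock()
    connection.search_users.side_effect = RuntimeError("rate limited")
    assert resolve_email("dev@example.com", connection) == "dev@example.com"
```

In the repository's own tests the same mocks would be passed to `get_coauthors_from_message` together with a commit message containing a `Co-authored-by:` trailer.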

            else:
                # If no GitHub connection available, use the full email address
                identifiers.append(email)
    return identifiers


def get_contributors(
    repo: object,
    start_date: str,
    end_date: str,
    ghe: str,
    acknowledge_coauthors: bool,
    github_connection: object,
):
    """
    Get contributors from a single repository and filter by start end dates if present.

    Args:
        repo (object): The repository object from PyGithub
        start_date (str): The start date of the date range for the contributor list.
        end_date (str): The end date of the date range for the contributor list.
        ghe (str): The GitHub Enterprise URL to use for authentication
        acknowledge_coauthors (bool): Whether to acknowledge co-authors from commit messages
        github_connection (object): The authenticated GitHub connection object from PyGithub

    Returns:
        contributors (list): A list of ContributorStats objects
    """
    all_repo_contributors = repo.contributors()
    contributors = []
    # Track usernames already added as contributors
    contributor_usernames = set()
Copilot AI Dec 31, 2025

The contributor_usernames set is created and populated at line 280 but never actually used. If the intention was to prevent duplicate contributors when a user is both a regular contributor and a co-author in the same repository, the set should be used to filter co-authors. Otherwise, this variable should be removed as it serves no purpose.


    try:
        for user in all_repo_contributors:
            # Ignore contributors with [bot] in their name
@@ -187,6 +277,19 @@ def get_contributors(repo: object, start_date: str, end_date: str, ghe: str):
                "",
            )
            contributors.append(contributor)
            contributor_usernames.add(user.login)

        # Get co-authors from commit messages if enabled
        if acknowledge_coauthors:
            coauthor_contributors = get_coauthor_contributors(
                repo,
                start_date,
                end_date,
                ghe,
                github_connection,
            )
            contributors.extend(coauthor_contributors)
Copilot AI Dec 31, 2025

When acknowledge_coauthors is True and a user is both a regular contributor and a co-author on some commits in the same repository, they will appear twice in the results list. While the merge_contributors function will later merge these duplicates, this creates unnecessary processing. Consider checking if a username already exists in contributor_usernames before adding co-authors to avoid creating duplicates within a single repository.

Suggested change
            contributors.extend(coauthor_contributors)
            # Avoid adding duplicate contributors for the same username within this repository
            filtered_coauthors = []
            for coauthor in coauthor_contributors:
                username = getattr(coauthor, "username", None) or getattr(
                    coauthor, "login", None
                )
                if username and username not in contributor_usernames:
                    filtered_coauthors.append(coauthor)
                    contributor_usernames.add(username)
            contributors.extend(filtered_coauthors)


    except Exception as e:
        print(f"Error getting contributors for repository: {repo.full_name}")
        print(e)
@@ -195,5 +298,72 @@ def get_contributors(repo: object, start_date: str, end_date: str, ghe: str):
    return contributors


def get_coauthor_contributors(
    repo: object,
    start_date: str,
    end_date: str,
    ghe: str,
    github_connection: object,
) -> List[contributor_stats.ContributorStats]:
    """
    Get contributors who were co-authors on commits in the repository.

    Args:
        repo (object): The repository object
        start_date (str): The start date of the date range for the contributor list.
        end_date (str): The end date of the date range for the contributor list.
        ghe (str): The GitHub Enterprise URL
        github_connection (object): The authenticated GitHub connection object from PyGithub

    Returns:
        List[ContributorStats]: A list of ContributorStats objects for co-authors
    """
    coauthor_counts: dict = {}  # username -> count
    endpoint = ghe if ghe else "https://github.com"

    try:
        # Get all commits in the date range
        if start_date and end_date:
            commits = repo.commits(since=start_date, until=end_date)
        else:
            commits = repo.commits()

        for commit in commits:
            # Get commit message from the commit object
            commit_message = commit.commit.message if commit.commit else ""
            if not commit_message:
                continue

            # Extract co-authors from commit message
            coauthors = get_coauthors_from_message(commit_message, github_connection)
            for username in coauthors:
Copilot AI Dec 31, 2025

Bot accounts listed as co-authors via Co-authored-by: trailers will be included in the contributor list, while bot accounts that are regular contributors are filtered out (line 250). Consider applying the same bot filtering logic to co-authors for consistency by checking if "[bot]" is in the username before adding them to coauthor_counts.

Suggested change
            for username in coauthors:
            for username in coauthors:
                # Skip bot accounts for consistency with regular contributor filtering
                if "[bot]" in username.lower():
                    continue

                coauthor_counts[username] = coauthor_counts.get(username, 0) + 1

    except Exception as e:
        print(f"Error getting co-authors for repository: {repo.full_name}")
        print(e)
        return []

    # Create ContributorStats objects for co-authors
    coauthor_contributors = []
    for username, count in coauthor_counts.items():
        if start_date and end_date:
            commit_url = f"{endpoint}/{repo.full_name}/commits?author={username}&since={start_date}&until={end_date}"
        else:
            commit_url = f"{endpoint}/{repo.full_name}/commits?author={username}"

        contributor = contributor_stats.ContributorStats(
            username,
            False,
            "",  # No avatar URL available for co-authors
            count,
            commit_url,
            "",
        )
        coauthor_contributors.append(contributor)

    return coauthor_contributors


if __name__ == "__main__":
    main()