Commit d562d71

Copilot and zkoppert committed
Enhance docstrings with detailed parameter descriptions and return types
Co-authored-by: zkoppert <6935431+zkoppert@users.noreply.github.com>
1 parent 5f794e2 commit d562d71

5 files changed (+409 / -59 lines)

auth.py

Lines changed: 102 additions & 19 deletions
@@ -1,4 +1,24 @@
-"""This is the module that contains functions related to authenticating to GitHub with a personal access token."""
+"""GitHub authentication module for the InnerSource measurement tool.
+
+This module provides functions for authenticating with GitHub using either Personal Access
+Tokens (PAT) or GitHub App installations. It supports both GitHub.com and GitHub Enterprise
+Server installations.
+
+Authentication Methods:
+    1. Personal Access Token (PAT) - Simple token-based authentication
+    2. GitHub App Installation - More secure app-based authentication with JWT
+
+The module handles the complexity of different authentication methods and provides
+a unified interface for establishing authenticated connections to GitHub's API.
+
+Functions:
+    auth_to_github: Create an authenticated GitHub client connection
+    get_github_app_installation_token: Obtain installation tokens for GitHub Apps
+
+Dependencies:
+    - github3.py: GitHub API client library
+    - requests: HTTP library for API calls
+"""

 import github3
 import requests
@@ -13,19 +33,52 @@ def auth_to_github(
     gh_app_enterprise_only: bool,
 ) -> github3.GitHub:
     """
-    Connect to GitHub.com or GitHub Enterprise, depending on env variables.
-
+    Establish an authenticated connection to GitHub.com or GitHub Enterprise.
+
+    This function creates an authenticated GitHub client using either Personal Access Token
+    or GitHub App authentication. It supports both GitHub.com and GitHub Enterprise
+    installations.
+
+    Authentication Priority:
+        1. GitHub App authentication (if all app credentials are provided)
+        2. Personal Access Token authentication (if token is provided)
+
     Args:
-        token (str): the GitHub personal access token
-        gh_app_id (int | None): the GitHub App ID
-        gh_app_installation_id (int | None): the GitHub App Installation ID
-        gh_app_private_key_bytes (bytes): the GitHub App Private Key
-        ghe (str): the GitHub Enterprise URL
-        gh_app_enterprise_only (bool): Set this to true if the GH APP is created
-            on GHE and needs to communicate with GHE api only
+        token (str): The GitHub personal access token for authentication.
+            Can be empty if using GitHub App authentication.
+        gh_app_id (int | None): The GitHub App ID for app-based authentication.
+            Required along with other app credentials for app auth.
+        gh_app_installation_id (int | None): The GitHub App Installation ID.
+            Required for app-based authentication.
+        gh_app_private_key_bytes (bytes): The GitHub App Private Key as bytes.
+            Required for app-based authentication.
+        ghe (str): The GitHub Enterprise URL (e.g., "https://github.company.com").
+            Leave empty for GitHub.com.
+        gh_app_enterprise_only (bool): Set to True if the GitHub App is created
+            on GitHub Enterprise and should only communicate
+            with the GHE API endpoint.

     Returns:
-        github3.GitHub: the GitHub connection object
+        github3.GitHub: An authenticated GitHub client object that can be used
+            to make API calls to GitHub.
+
+    Raises:
+        ValueError: If authentication fails due to:
+            - Missing required credentials (no token or incomplete app credentials)
+            - Unable to establish connection to GitHub
+
+    Examples:
+        >>> # Using Personal Access Token
+        >>> client = auth_to_github(token="ghp_...", gh_app_id=None,
+        ...     gh_app_installation_id=None,
+        ...     gh_app_private_key_bytes=b"",
+        ...     ghe="", gh_app_enterprise_only=False)
+
+        >>> # Using GitHub App
+        >>> client = auth_to_github(token="", gh_app_id=12345,
+        ...     gh_app_installation_id=67890,
+        ...     gh_app_private_key_bytes=private_key_bytes,
+        ...     ghe="", gh_app_enterprise_only=False)
     """
     if gh_app_id and gh_app_private_key_bytes and gh_app_installation_id:
         if ghe and gh_app_enterprise_only:
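The documented priority (GitHub App credentials first, then a personal access token) can be illustrated without any network calls. The sketch below uses a hypothetical helper, `select_auth_method`, that only mirrors the credential check visible in the diff; it does not build a github3 client, and its error message is an assumption rather than the tool's actual text.

```python
from __future__ import annotations


def select_auth_method(
    token: str,
    gh_app_id: int | None,
    gh_app_installation_id: int | None,
    gh_app_private_key_bytes: bytes,
) -> str:
    """Hypothetical helper: report which credential set auth_to_github would pick."""
    # Priority 1: GitHub App auth, only when all three app credentials are truthy
    # (same condition as the `if gh_app_id and gh_app_private_key_bytes and ...` line).
    if gh_app_id and gh_app_private_key_bytes and gh_app_installation_id:
        return "app"
    # Priority 2: fall back to a personal access token when one is provided.
    if token:
        return "token"
    # Neither a token nor a complete set of app credentials (message is illustrative).
    raise ValueError("No token or incomplete GitHub App credentials provided")


# Partial app credentials do not count, so the token is used instead.
print(select_auth_method("ghp_example", 12345, None, b"key"))  # token
```

Because the check relies on truthiness, a zero ID or an empty key is treated the same as a missing one.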
@@ -58,17 +111,47 @@ def get_github_app_installation_token(
     gh_app_installation_id: str,
 ) -> str | None:
     """
-    Get a GitHub App Installation token.
-    API: https://docs.github.com/en/apps/creating-github-apps/authenticating-with-a-github-app/authenticating-as-a-github-app-installation # noqa: E501
-
+    Obtain a GitHub App Installation access token using JWT authentication.
+
+    This function creates a JWT token using the GitHub App's private key and exchanges
+    it for an installation access token that can be used to authenticate API requests
+    on behalf of the installed app.
+
+    Reference: https://docs.github.com/en/apps/creating-github-apps/authenticating-with-a-github-app/authenticating-as-a-github-app-installation
+
     Args:
-        ghe (str): the GitHub Enterprise endpoint
-        gh_app_id (str): the GitHub App ID
-        gh_app_private_key_bytes (bytes): the GitHub App Private Key
-        gh_app_installation_id (str): the GitHub App Installation ID
+        ghe (str): The GitHub Enterprise endpoint URL (e.g., "https://github.company.com").
+            Leave empty for GitHub.com.
+        gh_app_id (str): The GitHub App ID as a string.
+        gh_app_private_key_bytes (bytes): The GitHub App Private Key in bytes format.
+            This should be the complete private key including
+            the header and footer.
+        gh_app_installation_id (str): The GitHub App Installation ID as a string.
+            This identifies the specific installation of the app.

     Returns:
-        str: the GitHub App token
+        str | None: The installation access token if successful, None if the request
+            fails or if there's an error in the authentication process.
+
+    Raises:
+        No exceptions are raised directly, but request failures are handled gracefully
+        and logged to the console.
+
+    Notes:
+        - The token has a default expiration time (typically 1 hour)
+        - The token provides access to resources the app installation has been granted
+        - Network errors and API failures are handled gracefully with None return
+
+    Examples:
+        >>> private_key = b"-----BEGIN PRIVATE KEY-----\\n...\\n-----END PRIVATE KEY-----"
+        >>> token = get_github_app_installation_token(
+        ...     ghe="",
+        ...     gh_app_id="12345",
+        ...     gh_app_private_key_bytes=private_key,
+        ...     gh_app_installation_id="67890"
+        ... )
+        >>> if token:
+        ...     print("Successfully obtained installation token")
     """
     jwt_headers = github3.apps.create_jwt_headers(gh_app_private_key_bytes, gh_app_id)
     api_endpoint = f"{ghe}/api/v3" if ghe else "https://api.github.com"
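The exchange targets GitHub's REST route `POST /app/installations/{installation_id}/access_tokens`, whose JSON response carries the token in a `token` field alongside `expires_at`. The stdlib-only sketch below covers just two pieces: building that URL with the same endpoint selection as the visible `api_endpoint` line, and reading the token from a decoded response. Both helper names are hypothetical, and the JWT signing and HTTP request (done with `github3.apps.create_jwt_headers` and `requests` in the real function) are deliberately left out.

```python
from __future__ import annotations


def installation_token_url(ghe: str, gh_app_installation_id: str) -> str:
    """Hypothetical helper: build the URL used to mint an installation token."""
    # Same endpoint choice as the diff: GHE serves its REST API under /api/v3.
    api_endpoint = f"{ghe}/api/v3" if ghe else "https://api.github.com"
    return f"{api_endpoint}/app/installations/{gh_app_installation_id}/access_tokens"


def extract_token(payload: dict) -> str | None:
    """Hypothetical helper: pull the token out of the decoded JSON response."""
    token = payload.get("token")
    # Treat a missing or empty token as failure, matching the str | None contract.
    return token if isinstance(token, str) and token else None


print(installation_token_url("", "67890"))
# https://api.github.com/app/installations/67890/access_tokens
```

A trailing slash on `ghe` would yield a double slash in the URL; `ghe.rstrip("/")` is one way to guard against that, though the diff does not show whether the real code normalizes the value.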

config.py

Lines changed: 78 additions & 10 deletions
@@ -85,14 +85,27 @@ def __repr__(self):


 def get_bool_env_var(env_var_name: str, default: bool = False) -> bool:
-    """Get a boolean environment variable.
+    """Get a boolean environment variable with proper type conversion.
+
+    This function retrieves an environment variable and converts it to a boolean.
+    Only the string "true" (case-insensitive) is considered True; all other
+    values are considered False.

     Args:
-        env_var_name: The name of the environment variable to retrieve.
-        default: The default value to return if the environment variable is not set.
+        env_var_name (str): The name of the environment variable to retrieve.
+        default (bool, optional): The default value to return if the environment
+            variable is not set or is empty. Defaults to False.

     Returns:
-        The value of the environment variable as a boolean.
+        bool: True if the environment variable is set to "true" (case-insensitive),
+            False otherwise, or the default value if the variable is not set.
+
+    Examples:
+        >>> os.environ['TEST_VAR'] = 'true'
+        >>> get_bool_env_var('TEST_VAR')
+        True
+        >>> get_bool_env_var('NONEXISTENT_VAR', default=True)
+        True
     """
     ev = os.environ.get(env_var_name, "")
     if ev == "" and default:
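Only the first two statements of the function body appear in this hunk. The sketch below, `parse_bool_env_var`, is a hypothetical reimplementation of the documented behavior: the default applies when the variable is unset or empty, and otherwise only "true" in any letter case counts as True. Stripping surrounding whitespace is an assumption not confirmed by the diff.

```python
import os


def parse_bool_env_var(env_var_name: str, default: bool = False) -> bool:
    """Hypothetical sketch of the documented get_bool_env_var semantics."""
    ev = os.environ.get(env_var_name, "")
    # Unset or empty: fall back to the default (first two statements match the diff).
    if ev == "" and default:
        return default
    # Otherwise only "true", in any case, is True. Whitespace stripping is an assumption.
    return ev.strip().lower() == "true"


os.environ["SKETCH_FLAG"] = "TRUE"
print(parse_bool_env_var("SKETCH_FLAG"))  # True
os.environ["SKETCH_FLAG"] = "yes"
print(parse_bool_env_var("SKETCH_FLAG"))  # False
```

Under this reading, common truthy spellings such as "1" or "yes" evaluate to False, so configuration values need to be written as "true".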
@@ -101,13 +114,27 @@ def get_bool_env_var(env_var_name: str, default: bool = False) -> bool:


 def get_int_env_var(env_var_name: str) -> int | None:
-    """Get an integer environment variable.
+    """Get an integer environment variable with proper type conversion and validation.
+
+    This function retrieves an environment variable and attempts to convert it to an integer.
+    If the conversion fails or the variable is not set, it returns None.

     Args:
-        env_var_name: The name of the environment variable to retrieve.
+        env_var_name (str): The name of the environment variable to retrieve.

     Returns:
-        The value of the environment variable as an integer or None.
+        int | None: The value of the environment variable as an integer, or None if
+            the variable is not set, empty, or cannot be converted to an integer.
+
+    Examples:
+        >>> os.environ['PORT'] = '8080'
+        >>> get_int_env_var('PORT')
+        8080
+        >>> get_int_env_var('NONEXISTENT_VAR') is None
+        True
+        >>> os.environ['INVALID_INT'] = 'not-a-number'
+        >>> get_int_env_var('INVALID_INT') is None
+        True
     """
     env_var = os.environ.get(env_var_name)
     if env_var is None or not env_var.strip():
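The conversion path the docstring describes can be sketched as follows. The first two statements match the diff; the `try`/`except ValueError` around `int()` is an assumption about the hidden remainder, chosen because it produces the documented None for unparsable values. `parse_int_env_var` is a hypothetical name.

```python
from __future__ import annotations

import os


def parse_int_env_var(env_var_name: str) -> int | None:
    """Hypothetical sketch of the documented get_int_env_var behavior."""
    env_var = os.environ.get(env_var_name)
    # Unset or whitespace-only values are treated as missing.
    if env_var is None or not env_var.strip():
        return None
    try:
        # int() itself tolerates surrounding whitespace such as " 42 ".
        return int(env_var)
    except ValueError:
        # Non-numeric text such as "not-a-number" maps to None instead of raising.
        return None


os.environ["SKETCH_PORT"] = "8080"
print(parse_int_env_var("SKETCH_PORT"))  # 8080
```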
@@ -120,9 +147,50 @@ def get_int_env_var(env_var_name: str) -> int | None:

 def get_env_vars(test: bool = False) -> EnvVars:
     """
-    Get the environment variables for use in the script.
-
-    Returns EnvVars object with all environment variables
+    Get and validate all environment variables required for the InnerSource measurement tool.
+
+    This function loads environment variables from the system and an optional .env file,
+    validates them, and returns a structured EnvVars object containing all configuration
+    needed to run the tool.
+
+    Args:
+        test (bool, optional): If True, skip loading the .env file (used for testing).
+            Defaults to False.
+
+    Returns:
+        EnvVars: A structured object containing all validated environment variables
+            and configuration settings.
+
+    Raises:
+        ValueError: If required environment variables are missing or invalid:
+            - Missing GitHub authentication (GH_TOKEN or GitHub App credentials)
+            - Missing or invalid REPOSITORY format (must be "owner/repo")
+            - Incomplete GitHub App credentials (missing ID, key, or installation ID)
+
+    Environment Variables:
+        Authentication (choose one):
+            - GH_TOKEN: GitHub personal access token
+            - GH_APP_ID + GH_APP_PRIVATE_KEY + GH_APP_INSTALLATION_ID: GitHub App credentials
+
+        Repository:
+            - REPOSITORY: Repository to analyze in "owner/repo" format
+
+        Optional:
+            - GH_ENTERPRISE_URL: GitHub Enterprise URL (for on-premises installations)
+            - GITHUB_APP_ENTERPRISE_ONLY: Set to "true" for GHE-only GitHub Apps
+            - REPORT_TITLE: Custom title for the report (default: "InnerSource Report")
+            - OUTPUT_FILE: Output filename (default: "innersource_report.md")
+            - RATE_LIMIT_BYPASS: Set to "true" to bypass rate limiting
+            - CHUNK_SIZE: Number of items to process at once (default: 100, minimum: 10)
+
+    Examples:
+        >>> os.environ['GH_TOKEN'] = 'ghp_...'
+        >>> os.environ['REPOSITORY'] = 'octocat/Hello-World'
+        >>> env_vars = get_env_vars()
+        >>> print(env_vars.owner)
+        octocat
+        >>> print(env_vars.repo)
+        Hello-World
     """
     if not test:  # pragma: no cover
         dotenv_path = join(dirname(__file__), ".env")
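The docstring requires REPOSITORY in "owner/repo" form and exposes the parts as `env_vars.owner` and `env_vars.repo`. The real parsing code is not part of this diff, so the following is only a minimal validation sketch built around a hypothetical `parse_repository` helper; its error messages are illustrative.

```python
from __future__ import annotations


def parse_repository(repository: str | None) -> tuple[str, str]:
    """Hypothetical helper: split and validate an "owner/repo" string."""
    if repository is None or not repository.strip():
        raise ValueError("REPOSITORY environment variable is required")
    parts = repository.strip().split("/")
    # Exactly one slash, with a non-empty owner and repository name on either side.
    if len(parts) != 2 or not all(parts):
        raise ValueError('REPOSITORY must be in "owner/repo" format')
    owner, repo = parts
    return owner, repo


print(parse_repository("octocat/Hello-World"))  # ('octocat', 'Hello-World')
```

Rejecting "a/b/c" as well as "owner/" keeps the check strict; a looser variant could split only on the first slash, but that would silently accept malformed values.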

markdown_helpers.py

Lines changed: 74 additions & 13 deletions
@@ -1,17 +1,49 @@
-"""Helper functions for working with markdown files."""
+"""Markdown file processing utilities for the InnerSource measurement tool.
+
+This module provides helper functions for working with markdown files, particularly
+for handling large files that may exceed GitHub's character limits for issue bodies.
+
+GitHub issues have a maximum character limit of 65,535 characters for the body content.
+When InnerSource reports are large, they need to be split into smaller files that can
+fit within this limit.
+
+Functions:
+    markdown_too_large_for_issue_body: Check if a markdown file is too large for GitHub issues
+    split_markdown_file: Split large markdown files into smaller, manageable chunks
+
+Common Use Cases:
+    - Splitting large InnerSource reports for GitHub issue compatibility
+    - Managing file sizes for various markdown-based systems with character limits
+    - Preparing reports for different output formats with size constraints
+"""


 def markdown_too_large_for_issue_body(file_path: str, max_char_count: int) -> bool:
     """
-    Check if the markdown file is too large to fit into a github issue.
-
-    Inputs:
-        file_path: str - the path to the markdown file to check
-        max_char_count: int - the maximum number of characters allowed in a github issue body
+    Check if a markdown file exceeds GitHub's issue body character limit.
+
+    GitHub issues have a maximum character limit for the body content. This function
+    reads a markdown file and determines if it would exceed this limit.
+
+    Args:
+        file_path (str): The path to the markdown file to check. Must be a valid
+            file path that exists and is readable.
+        max_char_count (int): The maximum number of characters allowed in a GitHub
+            issue body. For GitHub.com, this is typically 65,535.

     Returns:
-        bool - True if the file is too large, False otherwise
-
+        bool: True if the file contents exceed the character limit, False otherwise.
+
+    Raises:
+        FileNotFoundError: If the specified file does not exist.
+        PermissionError: If the file cannot be read due to permission issues.
+        UnicodeDecodeError: If the file contains invalid UTF-8 encoding.
+
+    Examples:
+        >>> # Check if a report is too large for GitHub issues
+        >>> is_too_large = markdown_too_large_for_issue_body("report.md", 65535)
+        >>> if is_too_large:
+        ...     print("File needs to be split for GitHub issues")
     """
     with open(file_path, "r", encoding="utf-8") as file:
         file_contents = file.read()
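Because the file is decoded as UTF-8 before measuring, the comparison counts Python string characters (Unicode code points), not bytes; the diff does not say whether GitHub's limit is counted the same way. The sketch below mirrors the check under that reading. `is_too_large` is a hypothetical name, and the final `len(...) > max_char_count` comparison is assumed from the docstring, since the hunk ends before it.

```python
import os
import tempfile


def is_too_large(file_path: str, max_char_count: int) -> bool:
    """Hypothetical sketch: True if the decoded text exceeds max_char_count."""
    with open(file_path, "r", encoding="utf-8") as file:
        file_contents = file.read()
    # len() counts code points of the decoded str, not encoded bytes.
    return len(file_contents) > max_char_count


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.md")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\u00e9" * 10)  # 10 characters, 20 bytes in UTF-8
    print(is_too_large(path, 10))  # False
    print(os.path.getsize(path))   # 20
```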
@@ -20,12 +52,41 @@ def markdown_too_large_for_issue_body(file_path: str, max_char_count: int) -> bo

 def split_markdown_file(file_path: str, max_char_count: int) -> None:
     """
-    Split the markdown file into smaller files.
-
-    Inputs:
-        file_path: str - the path to the markdown file to split
-        max_char_count: int - the maximum number of characters allowed before splitting markdown file
+    Split a large markdown file into smaller files that fit within size limits.
+
+    This function reads a markdown file and splits it into multiple smaller files
+    when the original file is too large for GitHub issues or other systems with
+    character limits.
+
+    Args:
+        file_path (str): The path to the markdown file to split. The file must exist
+            and be readable. The function will create new files with
+            numbered suffixes in the same directory.
+        max_char_count (int): The maximum number of characters allowed in each split
+            file. Content will be split at this boundary.

+    Returns:
+        None: This function performs file operations and creates new split files.
+
+    Side Effects:
+        - Creates new files with names like "{original_name}_0.md", "{original_name}_1.md", etc.
+        - Each new file contains a portion of the original content
+        - Files are created in the same directory as the original file
+        - The original file is not modified or deleted
+
+    File Naming:
+        - Original file: "report.md"
+        - Split files: "report_0.md", "report_1.md", "report_2.md", etc.
+
+    Raises:
+        FileNotFoundError: If the specified file does not exist.
+        PermissionError: If the file cannot be read or new files cannot be created.
+        UnicodeDecodeError: If the file contains invalid UTF-8 encoding.
+
+    Examples:
+        >>> # Split a large report into smaller files
+        >>> split_markdown_file("large_report.md", 65535)
+        >>> # This creates: large_report_0.md, large_report_1.md, etc.
     """
     with open(file_path, "r", encoding="utf-8") as file:
         file_contents = file.read()
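The Side Effects and File Naming sections describe fixed-size chunking. A sketch under those assumptions follows; `split_into_chunks` is hypothetical, and unlike `split_markdown_file` it returns the paths it wrote so the result is easy to inspect. Slicing purely by character count can cut through a line, a table row, or a link, so an individual chunk is not guaranteed to be well-formed markdown.

```python
from __future__ import annotations

import os
import tempfile


def split_into_chunks(file_path: str, max_char_count: int) -> list[str]:
    """Hypothetical sketch: write <stem>_<n>.md files of at most max_char_count characters."""
    with open(file_path, "r", encoding="utf-8") as file:
        file_contents = file.read()
    stem, _ext = os.path.splitext(file_path)
    written = []
    # Walk the text in fixed steps; the final chunk may be shorter.
    for index, start in enumerate(range(0, len(file_contents), max_char_count)):
        chunk_path = f"{stem}_{index}.md"
        with open(chunk_path, "w", encoding="utf-8") as chunk_file:
            chunk_file.write(file_contents[start : start + max_char_count])
        written.append(chunk_path)
    # The original file is only read, never modified or deleted.
    return written


with tempfile.TemporaryDirectory() as tmp:
    report = os.path.join(tmp, "report.md")
    with open(report, "w", encoding="utf-8") as f:
        f.write("x" * 25)
    paths = split_into_chunks(report, 10)
    print([os.path.basename(p) for p in paths])  # ['report_0.md', 'report_1.md', 'report_2.md']
```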
