Add traffic collection and slack messaging if those fail #9
Merged
12 commits, all by pvk-developer:

- `78f1c92` Add traffic collection and slack messaging if those fail
- `36011b6` Fix lint
- `4a030e8` Update weekly config
- `8291149` Test workflow
- `ddd8965` Update cancelled status
- `7f03c2a` Fix github.job
- `0371756` Use py313
- `c129ee8` Undo workflow changes
- `a9bd174` Add clones, views and timeframe
- `614e4b7` Make gdrive in drive.py constant
- `22f04b4` Update colum names and spreadsheets
- `ce69353` Fix lint
New file (`@@ -0,0 +1,49 @@`):

```yaml
name: Biweekly Traffic collection

on:
  workflow_dispatch:
    inputs:
      slack_channel:
        description: Slack channel to post the error message to if the builds fail.
        required: false
        default: "sdv-alerts-debug"

  schedule:
    - cron: "0 0 */14 * *" # 00:00 UTC on days 1, 15 and 29 of each month (roughly biweekly)

jobs:
  collect_traffic:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.13
        uses: actions/setup-python@v5
        with:
          python-version: '3.13'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install .
      - name: Collect GitHub Traffic Data
        run: |
          github-analytics traffic -v -t ${{ secrets.PERSONAL_ACCESS_TOKEN }} -c traffic_config.yaml
        env:
          PYDRIVE_CREDENTIALS: ${{ secrets.PYDRIVE_CREDENTIALS }}
  alert:
    needs: [collect_traffic]
    runs-on: ubuntu-latest
    if: failure()
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.13'
      - name: Install slack dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install invoke
          python -m pip install -e .[dev]
      - name: Slack alert if failure
        run: python -m github_analytics.slack_utils -r ${{ github.run_id }} -c ${{ github.event.inputs.slack_channel || 'sdv-alerts' }}
        env:
          SLACK_TOKEN: ${{ secrets.SLACK_TOKEN }}
```
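In standard cron syntax, which the GitHub Actions `schedule` trigger uses, a step such as `*/14` in the day-of-month field matches days 1, 15 and 29 and restarts every month, so `0 0 */14 * *` is not a strict 14-day interval. The sketch below is plain Python; the helper names `expand_step`, `run_dates` and `gaps_in_days` are made up for illustration, and they only handle the `*` and `*/N` forms, assuming the month and weekday fields are `*`. It lists the dates the schedule fires on in a given year and the gaps between consecutive runs.

```python
from datetime import date, timedelta


def expand_step(field: str, low: int, high: int) -> list:
    """Return the values matched by a cron field of the form '*' or '*/N'."""
    if field == '*':
        return list(range(low, high + 1))
    if field.startswith('*/'):
        step = int(field[2:])
        # A step starts at the field's lowest value: '*/14' over 1..31 -> 1, 15, 29.
        return list(range(low, high + 1, step))
    raise ValueError(f'unsupported cron field: {field!r}')


def run_dates(year: int, day_of_month: str = '*/14') -> list:
    """Dates in `year` on which '0 0 <day_of_month> * *' fires (other fields assumed '*')."""
    matching_days = set(expand_step(day_of_month, 1, 31))
    current = date(year, 1, 1)
    dates = []
    while current.year == year:
        if current.day in matching_days:
            dates.append(current)
        current += timedelta(days=1)
    return dates


def gaps_in_days(dates: list) -> list:
    """Number of days between each pair of consecutive run dates."""
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]


runs = run_dates(2025)
gaps = gaps_in_days(runs)
print(expand_step('*/14', 1, 31))  # [1, 15, 29]
print(len(runs), min(gaps), max(gaps))  # 35 2 14
```

Under these assumptions, runs are never more than 14 days apart, which lines up with the 14-day window GitHub documents for its traffic views, clones, referrers and paths endpoints. Around month ends, however, runs can be only 1 to 3 days apart (1 day from 29 February to 1 March in a leap year), so consecutive collections can overlap heavily.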
New file (`@@ -0,0 +1,197 @@`):

```python
"""Traffic client for retrieving GitHub information."""

import logging

import pandas as pd
import requests

logging.basicConfig(level=logging.INFO)
LOGGER = logging.getLogger(__name__)

GITHUB_API_URL = 'https://api.github.com'


class TrafficClient:
    """Client to fetch traffic data (referrers, paths, views, clones) for a repository.

    Args:
        token (str):
            GitHub personal access token for authentication.
    """

    def __init__(self, token):
        self.token = token
        self.headers = {
            'Authorization': f'token {token}',
            'Accept': 'application/vnd.github.v3+json',
        }

    def _get_traffic_data(self, repo: str, endpoint: str) -> list | dict:
        """Helper method to fetch traffic data from GitHub's REST API.

        Args:
            repo (str):
                The repository in the format "owner/repo".
            endpoint (str):
                The traffic API endpoint (e.g., "popular/referrers", "popular/paths", "views" or
                "clones").

        Returns:
            list | dict:
                The decoded JSON response: a list for the "popular/*" endpoints, and a dict
                holding a per-day "views" or "clones" list for the "views" and "clones" endpoints.

        Raises:
            RuntimeError:
                If the API request fails.
        """
        url = f'{GITHUB_API_URL}/repos/{repo}/traffic/{endpoint}'
        LOGGER.info(f'Fetching traffic data from: {url}')

        response = requests.get(url, headers=self.headers)

        if response.status_code == 200:
            LOGGER.info(f'Successfully retrieved {endpoint} data for {repo}.')
            return response.json()

        # Use the raw body so a non-JSON error response does not hide the real failure.
        message = f'GitHub API Error ({response.status_code}): {response.text}'
        LOGGER.error(message)
        raise RuntimeError(message)

    def get_traffic_referrers(self, repo: str) -> pd.DataFrame:
        """Fetches the top referring domains that send traffic to the given repository.

        Args:
            repo (str):
                The repository in the format "owner/repo".

        Returns:
            pd.DataFrame:
                DataFrame containing referrer traffic details with columns:
                - `site`: Source domain (GitHub's `referrer`).
                - `views`: Number of views (GitHub's `count`).
                - `unique_visitors`: Number of unique visitors (GitHub's `uniques`).
        """
        LOGGER.info(f'Fetching traffic referrers for {repo}.')
        data = self._get_traffic_data(repo, 'popular/referrers')
        df = pd.DataFrame(data, columns=['referrer', 'count', 'uniques'])

        df = df.rename(
            columns={'referrer': 'site', 'count': 'views', 'uniques': 'unique_visitors'},
        )
        LOGGER.info(f'Retrieved {len(df)} referrer records for {repo}.')
        return df

    def get_traffic_paths(self, repo: str) -> pd.DataFrame:
        """Fetches the most visited paths in the given repository.

        Args:
            repo (str):
                The repository in the format "owner/repo".

        Returns:
            pd.DataFrame:
                DataFrame containing popular paths with columns:
                - `content`: The visited path (GitHub's `path`).
                - `title`: Page title.
                - `views`: Number of views (GitHub's `count`).
                - `unique_visitors`: Number of unique visitors (GitHub's `uniques`).
        """
        LOGGER.info(f'Fetching traffic paths for {repo}.')
        data = self._get_traffic_data(repo, 'popular/paths')
        df = pd.DataFrame(data, columns=['path', 'title', 'count', 'uniques'])

        df = df.rename(
            columns={'path': 'content', 'count': 'views', 'uniques': 'unique_visitors'},
        )
        LOGGER.info(f'Retrieved {len(df)} path records for {repo}.')
        return df

    def get_traffic_views(self, repo: str) -> pd.DataFrame:
        """Fetches the number of views for the given repository over time.

        Args:
            repo (str):
                The repository in the format "owner/repo".

        Returns:
            pd.DataFrame:
                DataFrame containing repository views with columns:
                - `timestamp`: Date of views.
                - `views`: Number of views.
                - `unique_visitors`: Number of unique visitors.
        """
        data = self._get_traffic_data(repo, 'views')
        df = pd.DataFrame(data['views'], columns=['timestamp', 'count', 'uniques'])
        df = df.rename(columns={'count': 'views', 'uniques': 'unique_visitors'})
        LOGGER.info(f'Retrieved {len(df)} view records for {repo}.')
        return df

    def get_traffic_clones(self, repo: str) -> pd.DataFrame:
        """Fetches the number of repository clones over time.

        Args:
            repo (str):
                The repository in the format "owner/repo".

        Returns:
            pd.DataFrame:
                DataFrame containing repository clones with columns:
                - `timestamp`: Date of clones.
                - `clones`: Number of clones.
                - `unique_cloners`: Number of unique cloners.
        """
        data = self._get_traffic_data(repo, 'clones')
        df = pd.DataFrame(data['clones'], columns=['timestamp', 'count', 'uniques'])
        df = df.rename(columns={'count': 'clones', 'uniques': 'unique_cloners'})
        LOGGER.info(f'Retrieved {len(df)} clone records for {repo}.')
        return df

    @classmethod
    def generate_timeframe(cls, traffic_data):
        """Generates a timeframe DataFrame with the start and end timestamps from traffic data.

        Args:
            traffic_data (dict[str, pd.DataFrame]):
                Dictionary containing traffic data, including "Traffic Visitors" and
                "Traffic Git Clones".

        Returns:
            pd.DataFrame:
                A DataFrame with a single row containing 'Start Date' and 'End Date'
                (both None if neither table has any rows).
        """
        start_date = None
        end_date = None
        all_timestamps = []

        if 'Traffic Visitors' in traffic_data and not traffic_data['Traffic Visitors'].empty:
            all_timestamps.extend(traffic_data['Traffic Visitors']['timestamp'].tolist())

        if 'Traffic Git Clones' in traffic_data and not traffic_data['Traffic Git Clones'].empty:
            all_timestamps.extend(traffic_data['Traffic Git Clones']['timestamp'].tolist())

        if all_timestamps:
            start_date = min(all_timestamps)
            end_date = max(all_timestamps)

        return pd.DataFrame({'Start Date': [start_date], 'End Date': [end_date]})

    def get_all_traffic(self, repo: str) -> dict[str, pd.DataFrame]:
        """Fetches all available traffic data for the given repository.

        Args:
            repo (str):
                The repository in the format "owner/repo".

        Returns:
            dict[str, pd.DataFrame]:
                A dictionary containing traffic data:
                - `"Traffic Referring Sites"`: DataFrame with referrer traffic.
                - `"Traffic Popular Content"`: DataFrame with popular paths.
                - `"Traffic Visitors"`: DataFrame with repository views over time.
                - `"Traffic Git Clones"`: DataFrame with repository clones over time.
                - `"Timeframe"`: Single-row DataFrame with the start and end dates.
        """
        traffic_data = {
            'Traffic Referring Sites': self.get_traffic_referrers(repo),
            'Traffic Popular Content': self.get_traffic_paths(repo),
            'Traffic Visitors': self.get_traffic_views(repo),
            'Traffic Git Clones': self.get_traffic_clones(repo),
        }
        traffic_data['Timeframe'] = self.generate_timeframe(traffic_data)
        return traffic_data
```
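To make the reshaping done by `get_traffic_views`, `get_traffic_clones` and `generate_timeframe` concrete without network access or pandas, here is a minimal stdlib sketch. It assumes the response shape GitHub documents for the `/traffic/views` and `/traffic/clones` endpoints (top-level `count` and `uniques` plus a per-day list under `views` or `clones`). The helper names `flatten` and `timeframe` and the sample numbers are made up for illustration; they are not part of the PR.

```python
# Renames mirroring get_traffic_views and get_traffic_clones.
VIEW_RENAMES = {'count': 'views', 'uniques': 'unique_visitors'}
CLONE_RENAMES = {'count': 'clones', 'uniques': 'unique_cloners'}


def flatten(payload: dict, key: str, renames: dict) -> list:
    """Turn the per-day entries under payload[key] into row dicts with renamed fields."""
    rows = []
    for entry in payload.get(key, []):
        row = {'timestamp': entry['timestamp']}
        for old, new in renames.items():
            row[new] = entry[old]
        rows.append(row)
    return rows


def timeframe(*tables: list) -> tuple:
    """Earliest and latest timestamp across all tables, or (None, None) if all are empty."""
    stamps = [row['timestamp'] for table in tables for row in table]
    if not stamps:
        return None, None
    # Same-format ISO 8601 UTC strings sort chronologically, so min/max on str works.
    return min(stamps), max(stamps)


# Made-up sample payloads in the documented response shape.
views_payload = {
    'count': 7,
    'uniques': 4,
    'views': [
        {'timestamp': '2025-01-02T00:00:00Z', 'count': 5, 'uniques': 3},
        {'timestamp': '2025-01-03T00:00:00Z', 'count': 2, 'uniques': 1},
    ],
}
clones_payload = {
    'count': 1,
    'uniques': 1,
    'clones': [{'timestamp': '2025-01-01T00:00:00Z', 'count': 1, 'uniques': 1}],
}

views = flatten(views_payload, 'views', VIEW_RENAMES)
clones = flatten(clones_payload, 'clones', CLONE_RENAMES)
print(views[0])  # {'timestamp': '2025-01-02T00:00:00Z', 'views': 5, 'unique_visitors': 3}
print(timeframe(views, clones))  # ('2025-01-01T00:00:00Z', '2025-01-03T00:00:00Z')
```

Taking `min`/`max` directly on the timestamp strings mirrors what `generate_timeframe` does on its `timestamp` columns. This gives chronological results only while every value uses the same ISO 8601 layout and time zone, as GitHub's `Z`-suffixed timestamps do; mixed formats would need parsing into `datetime` objects first.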