@pvk-developer
Member

Resolves #5
CU-86b40hf4k

Workflow: https://github.com/datacebo/github-analytics/actions/runs/13764032556
The workflow failed because the token doesn't have access to the repo data. I'm not sure whether we have to add it to the organization or something similar (it is the [email protected] account).

Results are stored here, organized in folders, one per sdv-dev repo. Each filename is a timestamp. Here is an SDV example.
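
For reference, the layout described above could be sketched as follows. This is only an illustration: `build_output_path` and the `.xlsx` extension are assumptions, not names taken from this PR. The timestamp format matches the one used in the code under review.

```python
import datetime


def build_output_path(output_folder, repo, now=None, extension='.xlsx'):
    """Return '<output_folder>/<repo name>/<timestamp><extension>'.

    Hypothetical helper: the function name and the default extension
    are assumptions made for this sketch.
    """
    now = now or datetime.datetime.now()
    # Same timestamp format as the code under review.
    timestamp = now.strftime('%Y-%m-%d_%H-%M-%S')
    # 'sdv-dev/SDV' -> 'SDV', so each repo gets its own folder.
    repo_name = repo.split('/')[-1]
    return f"{output_folder.rstrip('/')}/{repo_name}/{timestamp}{extension}"
```

Plain string joining is used instead of `pathlib` so that a prefix such as `gdrive://` in `output_folder` is left untouched.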

@pvk-developer pvk-developer self-assigned this Mar 10, 2025
run: |
  python -m pip install --upgrade pip
  python -m pip install invoke
  python -m pip install -e .[dev]
Collaborator

Suggested change
- python -m pip install -e .[dev]
+ python -m pip install .[dev]

repo (str):
The repository in the format "owner/repo".
endpoint (str):
The traffic API endpoint (e.g., "popular/referrers" or "popular/paths").
Collaborator

Let's list the values this could be.

Suggested change
- The traffic API endpoint (e.g., "popular/referrers" or "popular/paths").
+ The traffic API endpoint (e.g., "popular/referrers", "popular/paths", "views", "clones").
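
To make the four accepted values explicit, here is a minimal sketch of how they map onto the GitHub REST API traffic routes (`/repos/{owner}/{repo}/traffic/...`). The names `TRAFFIC_ENDPOINTS` and `build_traffic_url` are hypothetical and not part of this PR.

```python
GITHUB_API_URL = 'https://api.github.com'

# The four traffic endpoints listed in the docstring suggestion.
TRAFFIC_ENDPOINTS = ('popular/referrers', 'popular/paths', 'views', 'clones')


def build_traffic_url(repo, endpoint):
    """Build the traffic URL for a repo given as "owner/repo".

    Hypothetical helper for illustration only.
    """
    if endpoint not in TRAFFIC_ENDPOINTS:
        raise ValueError(
            f'Unknown traffic endpoint {endpoint!r}; expected one of {TRAFFIC_ENDPOINTS}.'
        )
    return f'{GITHUB_API_URL}/repos/{repo}/traffic/{endpoint}'
```

Validating the endpoint up front gives a clearer error than a 404 from the API.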

output_folder (str):
Folder in which the metrics will be stored.
"""
timestamp = datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
Collaborator

Is it possible to save the timeframe for the data as another sheet? So the start and end timestamp (last 2 weeks)?
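
A rough sketch of what that extra sheet could contain, assuming the window is the 14 days before the collection time. `get_traffic_timeframe` is a hypothetical name, and the exact boundaries GitHub uses may differ slightly.

```python
import datetime


def get_traffic_timeframe(now=None, days=14):
    """Return a one-row record with the start and end of the traffic window.

    Hypothetical helper: assumes the data covers the `days` days before `now`.
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)
    start = now - datetime.timedelta(days=days)
    return [{
        'start': start.strftime('%Y-%m-%d'),
        'end': now.strftime('%Y-%m-%d'),
    }]
```

The list-of-records shape is chosen so it can be turned into a single-row sheet next to the other traffic sheets.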

Contributor

@amontanez24 amontanez24 left a comment

Looking good to me!

Comment on lines 140 to 141
if parent_folder.startswith('gdrive://'):
    parent_folder = parent_folder.replace('gdrive://', '')
Contributor

minor: can we make 'gdrive://' a constant?
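
Something along these lines, as a sketch only. `GDRIVE_PREFIX` and `strip_gdrive_prefix` are placeholder names, not necessarily what ends up in the file.

```python
# Placeholder name for the constant.
GDRIVE_PREFIX = 'gdrive://'


def strip_gdrive_prefix(folder):
    """Remove a leading 'gdrive://' from the folder, if present."""
    if folder.startswith(GDRIVE_PREFIX):
        # Slicing drops only the leading prefix, whereas str.replace
        # would also remove any later occurrence in the string.
        return folder[len(GDRIVE_PREFIX):]
    return folder
```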

Member Author

@pvk-developer pvk-developer Mar 11, 2025

This file was updated in 614e4b7.

@pvk-developer
Member Author

Addressed @gsheni's feedback; here are the results: https://docs.google.com/spreadsheets/d/1ggFsHydE7csoL95qJHRtR2LXQMTotYi07SkoH3lEpmE/edit?usp=sharing
Also addressed @amontanez24's feedback in this same commit: a9bd174

@pvk-developer pvk-developer force-pushed the issue-5-save-traffic-from-sdv-repos branch from 444938b to 614e4b7 Compare March 11, 2025 12:46
Collaborator

@gsheni gsheni left a comment

Will approve after renaming for readability is addressed. I want us to match the headers/title seen on the GitHub page:
[Image: Screenshot 2025-03-11 at 10 09 39 AM]

"""
LOGGER.info(f'Fetching traffic referrers for {repo}.')
data = self._get_traffic_data(repo, 'popular/referrers')
df = pd.DataFrame(data, columns=['referrer', 'count', 'uniques'])
Collaborator

Let's match what we see on GitHub. These titles will make it easier for Kalyan and non-engineering people to understand the data.
[Image: Screenshot 2025-03-11 at 10 03 25 AM]

Suggested change
- df = pd.DataFrame(data, columns=['referrer', 'count', 'uniques'])
+ df = pd.DataFrame(data, columns=['site', 'views', 'unique_visitors'])
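
One thing worth double-checking with this rename: if `data` is a list of dicts keyed by the API field names (`referrer`, `count`, `uniques`), the `columns` argument of `pd.DataFrame` selects those keys rather than renaming them, so passing the new names directly may produce empty columns. A minimal sketch of an explicit rename step, in plain Python and with hypothetical names (`REFERRER_HEADERS`, `rename_keys`):

```python
# Hypothetical mapping from API field names to the headers shown on GitHub.
REFERRER_HEADERS = {
    'referrer': 'site',
    'count': 'views',
    'uniques': 'unique_visitors',
}


def rename_keys(records, mapping):
    """Return copies of the records with keys renamed according to mapping.

    Keys that are not in the mapping are kept unchanged.
    """
    return [
        {mapping.get(key, key): value for key, value in record.items()}
        for record in records
    ]
```

The same mapping could instead be passed to `DataFrame.rename(columns=...)` after building the frame with the original field names.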

"""
LOGGER.info(f'Fetching traffic paths for {repo}.')
data = self._get_traffic_data(repo, 'popular/paths')
df = pd.DataFrame(data, columns=['path', 'title', 'count', 'uniques'])
Collaborator

Suggested change
- df = pd.DataFrame(data, columns=['path', 'title', 'count', 'uniques'])
+ df = pd.DataFrame(data, columns=['content', 'title', 'views', 'unique_visitors'])

- `uniques`: Number of unique visitors.
"""
data = self._get_traffic_data(repo, 'views')
return pd.DataFrame(data['views'], columns=['timestamp', 'count', 'uniques'])
Collaborator

Suggested change
- return pd.DataFrame(data['views'], columns=['timestamp', 'count', 'uniques'])
+ return pd.DataFrame(data['views'], columns=['timestamp', 'views', 'unique_visitors'])

- `uniques`: Number of unique cloners.
"""
data = self._get_traffic_data(repo, 'clones')
return pd.DataFrame(data['clones'], columns=['timestamp', 'count', 'uniques'])
Collaborator

Suggested change
- return pd.DataFrame(data['clones'], columns=['timestamp', 'count', 'uniques'])
+ return pd.DataFrame(data['clones'], columns=['timestamp', 'clones', 'unique_cloners'])

Comment on lines 178 to 181
'Traffic Referrers': self.get_traffic_referrers(repo),
'Traffic Paths': self.get_traffic_paths(repo),
'Traffic Views': self.get_traffic_views(repo),
'Traffic Clones': self.get_traffic_clones(repo),
Collaborator

Suggested change
- 'Traffic Referrers': self.get_traffic_referrers(repo),
- 'Traffic Paths': self.get_traffic_paths(repo),
- 'Traffic Views': self.get_traffic_views(repo),
- 'Traffic Clones': self.get_traffic_clones(repo),
+ 'Traffic Referring Sites': self.get_traffic_referrers(repo),
+ 'Traffic Popular Content': self.get_traffic_paths(repo),
+ 'Traffic Visitors': self.get_traffic_views(repo),
+ 'Traffic Git Clones': self.get_traffic_clones(repo),

@pvk-developer pvk-developer force-pushed the issue-5-save-traffic-from-sdv-repos branch from 584c7ec to 22f04b4 Compare March 11, 2025 15:20
@pvk-developer
Member Author

> Will approve after renaming for readability is addressed. I want us to match the headers/title seen on the GitHub page: Screenshot 2025-03-11 at 10 09 39 AM

New format.

@pvk-developer pvk-developer merged commit 653234d into main Mar 11, 2025
1 check passed
@pvk-developer pvk-developer deleted the issue-5-save-traffic-from-sdv-repos branch March 11, 2025 22:27

Development

Successfully merging this pull request may close these issues.

Save traffic data from SDV repos

4 participants