feat: contributor metric #3213

Open
officialasishkumar wants to merge 10 commits into augurlabs:main from officialasishkumar:d0-contributor-metric

@officialasishkumar officialasishkumar commented Jul 6, 2025

Description

This PR introduces a contributor metric feature that records information about contributors at different engagement levels and stores it in the database, using a background process run with Celery.

  • d0: first fork or star/watch event.
  • d1: first issue opened, first PR opened, first PR comment (within a configurable window).
  • d2: longer-term flags and counts—PR merge, >5 issues opened, total comments, PRs with >3 commits, comments on >2 distinct PRs.
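
As an illustration of how the three levels might be derived from raw activity, the sketch below summarizes a list of events for one contributor in one repository. The `Event` type, the event kind strings, the result keys, the 30-day default, and the choice to measure the d1 window from the d0 timestamp are all assumptions made for this example, not the PR's actual implementation. Only the thresholds (>5 issues, >3 commits, >2 distinct PRs) come from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical flattened activity record (not an Augur type)."""
    kind: str                        # "fork", "star", "issue_opened", "pr_opened", "pr_comment", "pr_merged"
    at: datetime
    pr_number: Optional[int] = None  # set for PR-related events
    commit_count: int = 0            # set for "pr_opened"


def first(events, kinds, not_after=None):
    """Earliest timestamp among events of the given kinds, or None."""
    times = [e.at for e in events
             if e.kind in kinds and (not_after is None or e.at <= not_after)]
    return min(times, default=None)


def summarize(events, d1_window=timedelta(days=30)):
    # d0: first fork or star/watch event.
    d0_at = first(events, {"fork", "star"})
    # Assumption: the configurable d1 window starts at the d0 timestamp.
    # Without a d0 event, no cutoff is applied.
    cutoff = d0_at + d1_window if d0_at else None

    issues = [e for e in events if e.kind == "issue_opened"]
    comments = [e for e in events if e.kind == "pr_comment"]
    return {
        "d0_at": d0_at,
        "d1_first_issue_at": first(events, {"issue_opened"}, cutoff),
        "d1_first_pr_at": first(events, {"pr_opened"}, cutoff),
        "d1_first_pr_comment_at": first(events, {"pr_comment"}, cutoff),
        "d2_has_merged_pr": any(e.kind == "pr_merged" for e in events),
        "d2_many_issues": len(issues) > 5,
        # Assumption: "total comments" is counted over PR comments only.
        "d2_comment_count": len(comments),
        "d2_has_large_pr": any(e.kind == "pr_opened" and e.commit_count > 3
                               for e in events),
        "d2_comments_on_many_prs": len({e.pr_number for e in comments}) > 2,
    }


t0 = datetime(2025, 1, 1)
events = [
    Event("star", t0),
    Event("issue_opened", t0 + timedelta(days=2)),
    Event("pr_opened", t0 + timedelta(days=45), pr_number=7, commit_count=4),
    Event("pr_comment", t0 + timedelta(days=46), pr_number=7),
]
summary = summarize(events)
```

In this example the issue falls inside the d1 window, while the PR and its comment (opened after 45 days) do not, so they only contribute to the d2 fields.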

The final table includes:

  • PK engagement_id, FKs repo_id, cntrb_id
  • Contributor identity (username, full_name, country, platform)
  • d0 flags and timestamp; d1 timestamps; d2 booleans and comment count
  • Metadata columns (tool_source, tool_version, data_source, data_collection_date)

Indexes are on repo_id, cntrb_id, username, and platform.
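
To make the shape of the table concrete, here is a minimal sketch of the schema using Python's built-in `sqlite3`. SQLite is used only so that the example runs standalone; it is not how the PR defines the table. The table name `contributor_engagement`, the key columns, the identity columns, the metadata columns, and the four indexed columns come from this PR. The d0/d1/d2 column names, all column types, and the index names are placeholders.

```python
import sqlite3

SCHEMA = """
CREATE TABLE contributor_engagement (
    engagement_id INTEGER PRIMARY KEY,   -- PK
    repo_id       INTEGER NOT NULL,      -- FK to the repository
    cntrb_id      TEXT    NOT NULL,      -- FK to the contributor
    -- contributor identity
    username      TEXT NOT NULL,
    full_name     TEXT,
    country       TEXT,
    platform      TEXT,
    -- d0 flags and timestamp (placeholder names)
    d0_forked     INTEGER DEFAULT 0,
    d0_starred    INTEGER DEFAULT 0,
    d0_at         TEXT,
    -- d1 timestamps (placeholder names)
    d1_first_issue_at      TEXT,
    d1_first_pr_at         TEXT,
    d1_first_pr_comment_at TEXT,
    -- d2 booleans and comment count (placeholder names)
    d2_has_merged_pr        INTEGER DEFAULT 0,
    d2_many_issues          INTEGER DEFAULT 0,
    d2_comment_count        INTEGER DEFAULT 0,
    d2_has_large_pr         INTEGER DEFAULT 0,
    d2_comments_on_many_prs INTEGER DEFAULT 0,
    -- metadata
    tool_source          TEXT,
    tool_version         TEXT,
    data_source          TEXT,
    data_collection_date TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_engagement_repo_id  ON contributor_engagement (repo_id);
CREATE INDEX idx_engagement_cntrb_id ON contributor_engagement (cntrb_id);
CREATE INDEX idx_engagement_username ON contributor_engagement (username);
CREATE INDEX idx_engagement_platform ON contributor_engagement (platform);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Read the schema back to confirm which columns ended up indexed.
index_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'index' AND tbl_name = 'contributor_engagement'"
)]
indexed_columns = sorted(
    conn.execute(f"PRAGMA index_info({name})").fetchone()[2]
    for name in index_names
)
column_names = [row[1] for row in conn.execute(
    "PRAGMA table_info(contributor_engagement)"
)]
```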

This PR fixes #2992

Notes for Reviewers

Signed commits

  • Yes, I signed my commits.

@officialasishkumar officialasishkumar changed the title from "D0 contributor metric" to "feat: contributor metric" Jul 6, 2025
@officialasishkumar officialasishkumar marked this pull request as ready for review July 14, 2025 18:58
@MoralCode
Collaborator

Hello @officialasishkumar, thanks for this contribution (and apologies for the delay in getting around to this).

We've made some fairly substantial changes to the repo (notably using uv for python dependency management). Could you rebase this PR on top of the current main branch? This should also fix a lot of the linter warnings.

@officialasishkumar
Author

Sure @MoralCode

Will do by the EOD

@MoralCode
Collaborator

@officialasishkumar Rebasing would probably be a better way to update this PR. It makes the diffs easier to review because they only contain the changes you made, rather than also containing code from other people's unrelated PRs that already exist on the main branch.

Are you familiar with the process of rebasing in git? Happy to provide guidance if you would like.

Akshatb2006 and others added 7 commits July 18, 2025 23:50
Signed-off-by: Akshat Baranwal <kysuakshat23@gmail.com>
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
Signed-off-by: Akshat Baranwal <kysuakshat23@gmail.com>
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
Signed-off-by: Akshat Baranwal <kysuakshat23@gmail.com>
@Akshatb2006 Akshatb2006 force-pushed the d0-contributor-metric branch from 4f525e2 to 91a2bf8 Compare July 18, 2025 18:21
@Akshatb2006

Hey @MoralCode
Could you please review this once??

@sgoggins
Collaborator

FYI -- @MoralCode is out of office until early next week.

repo = relationship("Repo")

class ContributorEngagement(Base):
Collaborator

@officialasishkumar : New database objects should be in a file in {repo root}/augur/application/schema/alembic/versions

I think with the PR open for the other GSOC team the next number in sequence is 35.

That enables alembic upgrades and downgrades.

Author

@sgoggins updated with commit 299bd90

Co-authored-by: Akshat <kysuakshat23@gmail.com>
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
Collaborator

@sgoggins sgoggins left a comment

Waiting for confirmation from @Ulincsys and @ABrain7710, but I think we don't modify the main database creation script and we do modify the version script, so a person creating a new install gets all the old tables first and your new ones last.

repo = relationship("Repo")

class ContributorEngagement(Base):
__tablename__ = "contributor_engagement"
Collaborator

@ABrain7710 / @Ulincsys : Can you confirm that it's our practice not to modify the main script for table creation, but to have the versioning script also included so that new installs just get "all the upgrades"?

Collaborator

The way I've done it in the past for other projects is that both the main schema gets modified AND migrations get created. Then, if someone creates a new database, they get the latest schema (this has required a small bit of code that stamps the database with the current alembic version when augur detects a new DB and creates the tables). That database can then be upgraded as time goes on, but new DBs always start out on the latest version.
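
A minimal sketch of the approach described in this comment, assuming a much simplified setup: `sqlite3` stands in for the real database, and the `MIGRATIONS` list, `LATEST_SCHEMA`, the table `some_earlier_table`, and the function names are hypothetical. The only Alembic detail relied on is that it records the current revision in a table named `alembic_version` with a `version_num` column. With Alembic itself, the stamping step corresponds to `alembic stamp head`.

```python
import sqlite3

# Hypothetical migration chain: (revision id, SQL applied on upgrade).
MIGRATIONS = [
    ("38", "CREATE TABLE some_earlier_table (id INTEGER PRIMARY KEY)"),
    ("39", "CREATE TABLE contributor_engagement (engagement_id INTEGER PRIMARY KEY)"),
]
HEAD = MIGRATIONS[-1][0]

# "Main schema": what a brand-new install gets, already equivalent to HEAD.
LATEST_SCHEMA = """
CREATE TABLE some_earlier_table (id INTEGER PRIMARY KEY);
CREATE TABLE contributor_engagement (engagement_id INTEGER PRIMARY KEY);
"""


def current_revision(conn):
    """Return the stamped revision, or None if the database is unstamped."""
    has_table = conn.execute(
        "SELECT 1 FROM sqlite_master "
        "WHERE type = 'table' AND name = 'alembic_version'"
    ).fetchone()
    if not has_table:
        return None
    row = conn.execute("SELECT version_num FROM alembic_version").fetchone()
    return row[0] if row else None


def set_revision(conn, revision):
    """Record `revision` as the database's current version (the "stamp")."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alembic_version (version_num TEXT NOT NULL)"
    )
    conn.execute("DELETE FROM alembic_version")
    conn.execute("INSERT INTO alembic_version VALUES (?)", (revision,))


def init_or_upgrade(conn):
    table_count = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
    ).fetchone()[0]
    if table_count == 0:
        # New database: create the latest schema directly, then stamp it so
        # later upgrades know no older migration needs to run.
        conn.executescript(LATEST_SCHEMA)
        set_revision(conn, HEAD)
        return "stamped"

    # Existing database: apply only the migrations after its stamped revision.
    revisions = [rev for rev, _ in MIGRATIONS]
    current = current_revision(conn)
    if current not in revisions:
        raise RuntimeError(f"unknown or missing revision: {current!r}")
    for rev, sql in MIGRATIONS[revisions.index(current) + 1:]:
        conn.execute(sql)
        set_revision(conn, rev)
    return "upgraded"


fresh = sqlite3.connect(":memory:")
fresh_result = init_or_upgrade(fresh)

existing = sqlite3.connect(":memory:")
existing.execute(MIGRATIONS[0][1])   # simulate a database created at revision 38
set_revision(existing, "38")
existing_result = init_or_upgrade(existing)
```

Both databases end up at the same revision and with the same tables; the difference is only in how they got there.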

@sgoggins sgoggins added the add-feature Adds new features label Oct 16, 2025
@MoralCode MoralCode added the database Related to Augur's unifed data model label Oct 16, 2025
Resolve conflict in augur/application/db/models/augur_data.py by keeping
both the ContributorEngagement model (from this branch) and the
TopicModelMeta/TopicModelEvent models (from upstream).
Upstream main now has alembic versions 35-38 (topic_model_meta,
topic_model_event, sync migrations, historical_repo_urls). Renumber
the contributor_engagement migration from 35 to 39 so it chains
correctly after the latest upstream migration (38).
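
The renumbering matters because each Alembic revision names its parent through `down_revision`, and the chain is expected to end in a single head with no duplicate revision ids. The sketch below checks that property on plain tuples. The revision ids mirror the numbers mentioned in the commit message above, but the `check_chain` helper is illustrative only and not part of Augur or Alembic, and the assumption that the old migration 35 pointed at 34 is a guess.

```python
from collections import Counter


def check_chain(revisions):
    """Given (revision, down_revision) pairs, return (duplicate ids, heads).

    A head is a revision that no other revision names as its parent.
    """
    counts = Counter(rev for rev, _ in revisions)
    duplicates = sorted(rev for rev, n in counts.items() if n > 1)
    parents = {down for _, down in revisions}
    heads = sorted({rev for rev, _ in revisions if rev not in parents})
    return duplicates, heads


# Upstream main: versions 35-38, each chained to the previous one.
upstream = [("35", "34"), ("36", "35"), ("37", "36"), ("38", "37")]

# Before renumbering: this branch's migration is also numbered 35
# (assumed here to have pointed at 34), which collides with upstream's 35.
before = check_chain(upstream + [("35", "34")])

# After renumbering: 39 chains onto 38 and becomes the single head.
after = check_chain(upstream + [("39", "38")])
```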
Labels

add-feature Adds new features database Related to Augur's unifed data model

Development

Successfully merging this pull request may close these issues.

Conversion Rate

4 participants