@guptapratykshh
Contributor

Description
I have introduced a platform column to the contributors table, creating a source of truth for the origin of contributor data. This resolves the data contamination where GitLab data was being stuffed into GitHub-specific columns (gh_login, gh_id, etc.) and makes way for flexible multi-platform support (GitHub, GitLab, Forgejo, etc.).

I have added platform (default: 'github') and platform_username columns to the contributors table via a new Alembic migration and updated the extraction logic to populate these new fields. GitLab tasks now correctly tag data as platform='gitlab' and leave the gh_ columns as NULL. I refactored data_parse.py to use a helper function (_extract_base_contributor_data), which reduces code duplication and makes it easier to add new platforms in the future. The migration also automatically populates platform='github' and platform_username for all existing records.
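
For readers who want to see the shape of the schema change and backfill, here is a minimal sketch. It is not the migration from this PR: it uses the standard-library sqlite3 module in place of PostgreSQL and Alembic, the table is cut down to two columns, the function name add_platform_columns is hypothetical, and it assumes platform_username is backfilled from gh_login.

```python
import sqlite3


def add_platform_columns(conn: sqlite3.Connection) -> None:
    """Add platform and platform_username, then backfill existing rows."""
    # Existing rows receive the column default, so they are tagged 'github'.
    conn.execute(
        "ALTER TABLE contributors "
        "ADD COLUMN platform TEXT NOT NULL DEFAULT 'github'"
    )
    conn.execute("ALTER TABLE contributors ADD COLUMN platform_username TEXT")
    # Assumption: the platform-agnostic username is copied from gh_login.
    conn.execute(
        "UPDATE contributors SET platform_username = gh_login "
        "WHERE platform = 'github' AND gh_login IS NOT NULL"
    )


# Simplified stand-in for the real contributors table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contributors (cntrb_id INTEGER PRIMARY KEY, gh_login TEXT)"
)
conn.executemany(
    "INSERT INTO contributors (gh_login) VALUES (?)",
    [("octocat",), (None,)],
)

add_platform_columns(conn)
rows = conn.execute(
    "SELECT gh_login, platform, platform_username "
    "FROM contributors ORDER BY cntrb_id"
).fetchall()
```

Rows without a gh_login keep a NULL platform_username, so the backfill never invents a handle for records that did not have one.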

This PR fixes #3469

Signed commits

  • Yes, I signed my commits.

…gic, and refactor extraction (chaoss#3469)

Signed-off-by: guptapratykshh <pratykshgupta9999@gmail.com>
Contributor

@MoralCode MoralCode left a comment

I left a longer version of this in #3517 (comment)

TL;DR: while the issue suggests the fix is to put GitLab data in the GitLab columns, I think it's actually better for our future plans of adding more platforms if we use this as an opportunity to make the columns where usernames are stored platform-independent. That's going to be an easier database migration and a better setup for a less platform-dependent future of Augur.

Once things are a little more planned out as far as how platform support should work schema-wise (and the PR backlog is lower, and we aren't backlogged on planned database migrations that may conflict with your migration 39), I think adding the platform column is a good step towards this.

Thanks for making this contribution though!

server_default=text("'github'::character varying"),
nullable=False
)
platform_username = Column(String)
Contributor

What is platform_username for?

Contributor Author

platform_username is the platform-agnostic column intended to serve as the unified source of truth for a contributor's handle/login, regardless of source.

A row with platform='github' stores the login in platform_username, and a row with platform='gitlab' also stores the login in platform_username.

This decouples our data model from specific platforms. When we add Forgejo or Bitbucket later, we won't need to add forgejo_username columns; we will just use platform='forgejo' and platform_username.
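
A minimal sketch of that idea, for illustration only. The helper name build_contributor_row is hypothetical and is not the _extract_base_contributor_data function from this PR; the row is reduced to three fields, and the sketch assumes GitHub rows keep filling gh_login while every other platform leaves it as None.

```python
from typing import Any, Dict


def build_contributor_row(platform: str, username: str) -> Dict[str, Any]:
    """Build a reduced contributor row keyed by (platform, platform_username)."""
    row: Dict[str, Any] = {
        "platform": platform,
        "platform_username": username,
        # GitHub-specific column stays empty unless the data is from GitHub.
        "gh_login": None,
    }
    if platform == "github":
        row["gh_login"] = username
    return row


github_row = build_contributor_row("github", "octocat")
gitlab_row = build_contributor_row("gitlab", "alice")
# A new platform needs no new column, only a new platform value.
forgejo_row = build_contributor_row("forgejo", "bob")
```

With this shape, a lookup by handle filters on platform and platform_username together, so the same username on two platforms stays two distinct contributors.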

@MoralCode MoralCode marked this pull request as draft January 16, 2026 19:40
@sgoggins sgoggins requested a review from ABrain7710 January 20, 2026 21:36
@sgoggins sgoggins added the deployed version, database, and admin labels Jan 20, 2026
Development

Successfully merging this pull request may close these issues.

Cross-contamination between github and gitlab columns in contributors table

3 participants