Exploring anonymization approach for GlossAPI GSoC idea #85
raja-jaloka asked this question in Q&A
Dear Mentors (@nikostsekos @myrsiniioannou @jimmmyss),
Hi, I'm Yuvraj, a B.Tech CSE student at KIIT University, India, and I'm exploring the ML-assisted Anonymization Layer idea for GSoC'26. I've built a small prototype (https://github.com/raja-jaloka/Hugging-Face-NER-Anonymizers-Regex-etc-GSOC-26-Practice-) that combines:
1. Regex-based masking for structured identifiers (email, phone)
2. NER-based masking for semantic entities (names, organizations, locations)
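For context, a minimal sketch of what the regex stage can look like. The two patterns below are simplified illustrations (not the ones in the linked prototype) and will miss many real-world email and phone formats, especially locale-specific ones:

```python
import re

# Simplified, illustrative patterns (assumptions, not exhaustive).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")  # 9+ chars of digits/spaces/dashes

def regex_mask(text: str) -> str:
    """Replace structured identifiers with type placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

print(regex_mask("Contact maria@example.gr or +30 210 123 4567."))
# Contact [EMAIL] or [PHONE].
```

Because this stage is purely deterministic, its output is reproducible and easy to unit-test, which is one reason to keep it apart from the model-based stage.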
I'm also improving the prototype with GLiNER, and experimenting with models such as XLM-R, mT5, and ByT5 to decide which one to integrate into the final project. I would really appreciate any pointers on which direction to pursue.
The idea is to keep deterministic masking separate from ML-driven recognition before integrating either into a dataset-processing pipeline.
Right now I'm trying to understand how anonymization is envisioned within GlossAPI. Would it be better suited as a standalone preprocessing stage, or integrated directly into the existing pipeline?
Thanks, much appreciated!
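One way to express that separation is a two-stage function: regex spans are computed first and take priority, and any overlapping spans from the ML recognizer are dropped. This is a sketch under stated assumptions: `ner_fn` is a hypothetical pluggable callable returning `(start, end, label)` character offsets, so GLiNER, an XLM-R token classifier, or any other model would need a small adapter to that shape. The `toy_ner` function is a stand-in, not a real model.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label) as character offsets

# Simplified, illustrative patterns (assumptions, not exhaustive).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")

def regex_spans(text: str) -> List[Span]:
    """Deterministic stage: locate structured identifiers."""
    spans = [(m.start(), m.end(), "EMAIL") for m in EMAIL_RE.finditer(text)]
    spans += [(m.start(), m.end(), "PHONE") for m in PHONE_RE.finditer(text)]
    return spans

def merge_spans(spans: List[Span]) -> List[Span]:
    """Keep spans in priority order, dropping any that overlap a kept one."""
    kept: List[Span] = []
    for s in spans:
        if all(s[1] <= k[0] or s[0] >= k[1] for k in kept):
            kept.append(s)
    return sorted(kept)  # sort by start offset for left-to-right rewriting

def anonymize(text: str, ner_fn: Callable[[str], List[Span]]) -> str:
    """Regex spans are listed first, so they win over overlapping NER spans."""
    spans = merge_spans(regex_spans(text) + ner_fn(text))
    out, pos = [], 0
    for start, end, label in spans:
        out.append(text[pos:start])
        out.append(f"[{label}]")
        pos = end
    out.append(text[pos:])
    return "".join(out)

def toy_ner(text: str) -> List[Span]:
    """Hypothetical stand-in for a real NER model."""
    spans = []
    for word, label in (("Maria", "PER"), ("example", "ORG")):
        i = text.find(word)
        if i >= 0:
            spans.append((i, i + len(word), label))
    return spans

print(anonymize("Maria wrote to maria@example.gr.", toy_ner))
# [PER] wrote to [EMAIL].
```

Here `toy_ner` also tags "example" inside the email address, but that span overlaps the regex EMAIL span and is discarded, so the deterministic result is never overridden by the model.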