GSoC 2026 Interest: ML-assisted Anonymization Layer for Greek Datasets #83
Closed
GovindhKishore
started this conversation in
Show and tell
Hi everyone!
I’m Govindh Kishore, a Mathematics and Computing student at BIT Mesra. I am very interested in contributing to the ML-assisted Anonymization Layer for GlossAPI.
I specialize in building search and retrieval pipelines that handle complex, unstructured data. My relevant technical background includes:
Advanced Search & Indexing: Built a Semantic Code Search Engine using Python AST for structural analysis.
RAG & NLP Pipelines: Developed an Assessment Recommendation System using Sentence-Transformers for embeddings and a two-stage retrieval process (candidate generation + LLM reranking) served via FastAPI.
Data Engineering: Experience cleaning dirty data and OCR noise, gained while building large-scale web scrapers (BeautifulSoup).
Regarding the project:
Since GlossAPI handles diverse Greek datasets, I believe my experience with PyLucene will be useful for building efficient lookups for rule-based anonymization (e.g., blacklists of sensitive organizations).
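To make the blacklist idea concrete, here is a minimal sketch of what I have in mind. It uses Python's standard `re` module as a stand-in for a PyLucene index, and the organization names, the `[ORG]` placeholder, and the function names are all my own hypothetical choices, not anything taken from the GlossAPI codebase.

```python
import re

# Hypothetical blacklist for illustration only; a real list would be
# curated and loaded from a file maintained by the project.
BLACKLIST = ["Εθνική Τράπεζα", "Τράπεζα Πειραιώς", "ACME Hellas"]


def build_pattern(terms):
    """Compile one case-insensitive pattern that matches any blacklisted term."""
    # Longest terms first, so overlapping names prefer the longer match.
    ordered = sorted(set(terms), key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    # Lookarounds keep a term from matching inside a longer word.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def anonymize(text, pattern, placeholder="[ORG]"):
    """Replace every blacklisted term in `text` with `placeholder`."""
    return pattern.sub(placeholder, text)


PATTERN = build_pattern(BLACKLIST)
sample = "Ο λογαριασμός στην Εθνική Τράπεζα έκλεισε."
print(anonymize(sample, PATTERN))
```

One limitation I am aware of: exact matching like this will miss inflected Greek forms (for example a genitive ending) and unaccented uppercase spellings, so I assume the real module would need normalization or lemmatization on top of it.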
Questions for the mentors (@nikostsekos @myrsiniioannou @jimmmyss):
I don't speak Greek, but I am proficient in working with Unicode/UTF-8 and utilizing pre-trained NLP models. Will this be an issue for the ML-assisted part of the project?
Are there any specific issues or "warm-up" tasks in the glossAPI repo related to data preprocessing or regex filtering where I could start contributing?
I’ve explored the repository and would love to start working on a small Proof of Concept (PoC) for the anonymization module.
Best regards,
Govindh Kishore
GitHub Profile