GSoC 2026 Interest: ML-assisted Anonymization Layer for Greek Datasets #83

GovindhKishore · 2026-02-21T11:21:38Z


Hi everyone!

I’m Govindh Kishore, a Mathematics and Computing student at BIT Mesra. I am very interested in contributing to the ML-assisted Anonymization Layer for GlossAPI.

I specialize in building search and retrieval pipelines that handle complex, unstructured data. My relevant technical background includes:

Advanced Search & Indexing: Built a Semantic Code Search Engine using Python AST for structural analysis.

RAG & NLP Pipelines: Developed an Assessment Recommendation System using Sentence-Transformers for embeddings and a two-stage retrieval process (candidate generation + LLM reranking) served via FastAPI.

Data Engineering: Experienced in cleaning dirty data and OCR noise, gained while building large-scale web scrapers (BeautifulSoup).

Regarding the project:
Since GlossAPI handles diverse Greek datasets, I believe my experience with PyLucene will be useful for building efficient lookups for rule-based anonymization (e.g., blacklists of sensitive organizations).
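
To make the lookup idea concrete, here is a minimal sketch of blacklist-based redaction that tolerates case and accent differences in Greek text. It is only an illustration under stated assumptions: it uses Python's standard library (`re`, `unicodedata`) rather than PyLucene, and the blacklist entries, the `[ORG]` placeholder, and the function names are hypothetical, not anything taken from the GlossAPI codebase.

```python
import re
import unicodedata

# Hypothetical blacklist; a real one would be supplied by the project.
BLACKLIST = ["Εθνική Τράπεζα", "Υπουργείο Υγείας"]


def normalize_with_map(text):
    """Casefold and strip diacritics, remembering which original
    character each normalized character came from."""
    chars, index_map = [], []
    for i, ch in enumerate(text):
        for base in unicodedata.normalize("NFD", ch):
            if unicodedata.combining(base):
                continue  # drop accents such as the Greek tonos
            for folded in base.casefold():
                chars.append(folded)
                index_map.append(i)
    return "".join(chars), index_map


def redact(text, blacklist=BLACKLIST, placeholder="[ORG]"):
    """Replace every blacklisted term in `text` with `placeholder`."""
    norm, index_map = normalize_with_map(text)
    terms = {normalize_with_map(term)[0] for term in blacklist}
    # Longest terms first so overlapping entries prefer the longer match.
    terms = sorted((t for t in terms if t), key=len, reverse=True)
    if not terms:
        return text
    pattern = re.compile("|".join(re.escape(t) for t in terms))

    pieces, cursor = [], 0
    for match in pattern.finditer(norm):
        start = index_map[match.start()]
        end = index_map[match.end() - 1] + 1
        # Swallow combining marks that trail the last matched character.
        while end < len(text) and unicodedata.combining(text[end]):
            end += 1
        pieces.append(text[cursor:start])
        pieces.append(placeholder)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


print(redact("Η ΕΘΝΙΚΗ ΤΡΑΠΕΖΑ ανακοίνωσε νέα μέτρα."))
```

This matches exact (normalized) substrings only, so inflected forms of a name would be missed; handling Greek declension is presumably where the ML-assisted part or a proper index would come in.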

Questions for the mentors (@nikostsekos @myrsiniioannou @jimmmyss):

I don't speak Greek, but I am proficient in working with Unicode/UTF-8 and utilizing pre-trained NLP models. Will this be an issue for the ML-assisted part of the project?

Are there any specific issues or "warm-up" tasks in the glossAPI repo related to data preprocessing or regex filtering where I could start contributing?

I’ve explored the repository and would love to start working on a small Proof of Concept (PoC) for the anonymization module.

Best regards,
Govindh Kishore
GitHub Profile
