Labels: area/core, area/research, area/user-experience, help wanted (Extra attention is needed), priority/P0 (Critical / Must-Have)
Description
Is your feature request related to a problem? Please describe.
The existing ModernBERT-based PII and Prompt Guard classification models are trained on English datasets, although the PII training scripts already support AI4Privacy, a multilingual dataset collection.
Qwen3 Guard recently published its methodology for multilingual dataset generation and model training. Several of those steps could help us build multilingual datasets for our classification models.
Describe the solution you'd like
- Identify processes for generating multilingual datasets
- Train and evaluate the classification models on these datasets