Support Multilingual PII and Prompt Guard filter #215

@rootfs

Description

Is your feature request related to a problem? Please describe.
The existing ModernBERT-based PII and Prompt Guard classification models are trained on English datasets, although the PII training scripts already support AI4Privacy, a collection of multilingual datasets.

Qwen3 Guard recently published its methodology for multilingual dataset generation and model training. Several of its steps could help build datasets for these classification models.

Describe the solution you'd like

  • Identify processes to generate multilingual datasets
  • Train and evaluate the classification models on the multilingual datasets
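
For the evaluation step, one possible starting point is to break the score down by language, so that regressions in non-English inputs are not hidden by strong English results. The snippet below is only a minimal sketch using the Python standard library: the function name `per_language_f1`, the record layout, and the `"jailbreak"`/`"benign"` labels are assumptions made for illustration, not part of the existing training scripts.

```python
from collections import defaultdict


def per_language_f1(records, positive_label):
    """Compute F1 for `positive_label` separately for each language.

    `records` is an iterable of (language, gold_label, predicted_label)
    tuples. This layout is hypothetical and chosen for illustration.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for lang, gold, pred in records:
        c = counts[lang]  # also registers languages with no positives
        if pred == positive_label and gold == positive_label:
            c["tp"] += 1
        elif pred == positive_label:
            c["fp"] += 1
        elif gold == positive_label:
            c["fn"] += 1

    scores = {}
    for lang, c in counts.items():
        denom = 2 * c["tp"] + c["fp"] + c["fn"]
        # Report 0.0 when a language has no positive gold or predicted labels.
        scores[lang] = 2 * c["tp"] / denom if denom else 0.0
    return scores


# Toy data with hypothetical labels; real records would come from model output.
records = [
    ("en", "jailbreak", "jailbreak"),
    ("en", "benign", "benign"),
    ("zh", "jailbreak", "benign"),
    ("zh", "jailbreak", "jailbreak"),
    ("zh", "benign", "jailbreak"),
]
scores = per_language_f1(records, positive_label="jailbreak")
print(scores)
```

The same breakdown could be applied per PII entity type if token-level labels are flattened into records first.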
