repo for "An Invasive Embedding Model in Favor of Low-Resource Languages Understanding" (2025)

saedeht/Cross-Lingual-NLU_Invasive-Embedding-Model


An Invasive Embedding Model in Favor of Low-Resource Languages Understanding

πŸ“Œ If you use this work, please cite our paper as follows:

```bibtex
@article{tahery2025,
  author    = {Saedeh Tahery and Saeed Farzi},
  title     = {An Invasive Embedding Model in Favor of Low-Resource Languages Understanding},
  journal   = {ACM Transactions on Asian and Low-Resource Language Information Processing},
  year      = {2025},
  url       = {https://dl.acm.org/doi/10.1145/3771926}
}
```

ACM Author-Izer Service: Download Paper (PDF)

πŸ—ΊοΈ Overview

Cross-lingual natural language understanding (NLU) tasks, such as intent detection (ID) and slot filling (SF), suffer performance degradation because multilingual pre-trained models embed language-specific information in their representations. This is especially problematic in data-scarce scenarios.
To tackle this issue, we propose an encoder-decoder model trained with adversarial learning that eliminates language-specific information while preserving semantic meaning. Our approach enhances knowledge transferability across languages, leading to better zero-shot cross-lingual performance.

πŸ‹οΈβ€β™‚οΈ Training and Data Utilization

The training process consists of a strategic adversarial learning phase, where three key components interact dynamically:

  1. Generator β†’ Creates language-independent contextual representations.
  2. Discriminator β†’ Evaluates representations to detect language identity.
  3. Decoder β†’ Reconstructs the original input, ensuring semantic preservation.
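The interaction of the three components above can be sketched as a single adversarial training step. This is a minimal illustration, not the paper's implementation: the module sizes, the simple linear layers, and the use of a gradient-reversal trick to pit the generator against the language discriminator are all assumptions.

```python
# Hypothetical sketch of the generator/discriminator/decoder interplay.
# Dimensions and architectures are illustrative, not the paper's.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips gradients in the backward pass,
    so minimizing the discriminator loss trains the generator to fool it."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()
    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

class Generator(nn.Module):
    """Maps contextual embeddings to a language-independent space."""
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, d), nn.Tanh())
    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Tries to detect language identity from the representation."""
    def __init__(self, d, n_langs):
        super().__init__()
        self.net = nn.Linear(d, n_langs)
    def forward(self, h):
        return self.net(h)

class Decoder(nn.Module):
    """Reconstructs the original input, enforcing semantic preservation."""
    def __init__(self, d):
        super().__init__()
        self.net = nn.Linear(d, d)
    def forward(self, h):
        return self.net(h)

d, n_langs, batch = 16, 3, 8
gen, disc, dec = Generator(d), Discriminator(d, n_langs), Decoder(d)
opt = torch.optim.Adam(
    list(gen.parameters()) + list(disc.parameters()) + list(dec.parameters()),
    lr=1e-3,
)

x = torch.randn(batch, d)                   # stand-in for contextual embeddings
lang = torch.randint(0, n_langs, (batch,))  # language-identity labels

h = gen(x)
# Adversarial term: discriminator learns language ID, generator (via the
# reversed gradients) learns to erase it.
adv_loss = nn.functional.cross_entropy(disc(GradReverse.apply(h, 1.0)), lang)
# Reconstruction term: the decoder must recover the input from h.
rec_loss = nn.functional.mse_loss(dec(h), x)
loss = adv_loss + rec_loss
opt.zero_grad()
loss.backward()
opt.step()
```

In this sketch a single optimizer updates all three components jointly; the gradient-reversal layer is what makes the generator and discriminator objectives adversarial within one backward pass.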

πŸ“Š Main Results

Our model demonstrates strong zero-shot performance across multiple languages on Facebook-multilingual (XTOD) and Persian-ATIS datasets:

| Language | ID Accuracy | SF F1-Score |
|----------|-------------|-------------|
| Spanish  | 94.15       | 70.44       |
| Thai     | 82.61       | 17.60       |
| Persian  | 86.45       | 59.60       |
