Weaponization of Migration Flows (WOM) Classification

Ever wondered how transit states turn migrant flows into a diplomatic bargaining chip? In this project, we develop and compare two NLP pipelines to automatically detect when political leaders threaten to “open the borders” or criticize destination countries’ migration policies—a tactic we term the Weaponization of Migration Flows (WOM).

Project Overview

I focus on all English‐language speeches by Turkish President Erdoğan (2014–2025), web‐scraped from the official presidency website. After manually annotating a balanced sample, I train:

Logistic Regression + TF–IDF (with spaCy/NLTK preprocessing & SMOTE)
Fine-tuned DistilBERT (with Hugging Face Transformers)

and evaluate both on a held-out test set as well as via 5-fold cross-validation.

Goals

Quantify WOM by detecting “threat” or “criticism” rhetoric in presidential speeches
Compare classical vs. transformer-based models
Demonstrate how contextual embeddings improve detection recall (critical for catching true threats)
Provide a reusable pipeline for analyzing WOM in other transit-migration contexts

Methods

Data Acquisition

Web-scraped 242 speeches → 3,976 paragraphs
Filtered out “visit”/“phone call” notices & salutations

Annotation

Hand-label 472 random paragraphs → only 16 positive WOM
Zero-shot classification (facebook/bart-large-mnli) → hand-verify → 338 WOM-positive + 500 WOM-negative

Baseline Model

Preprocessing: spaCy tokenization & lemmatization + NLTK stop-word removal + custom NER tokens
Features: TfidfVectorizer (grid-searched n-grams & min_df)
Balancing: SMOTE
Classifier: LogisticRegression

Advanced Model

Preprocessing: spaCy/NLTK pipeline → BERT tokenization
Model: Fine-tune distilbert-base-uncased for 3 epochs
Library: Hugging Face Transformers

Evaluation

Hold-out (20% data): Accuracy, Precision, Recall, F₁
5-Fold CV: mean ± std accuracy

Key Results

Model	Accuracy	Precision	Recall	F₁
Logistic Regression with TF–IDF	0.7143	0.7143	0.7224	0.7117
Advanced DistilBERT Model	0.8095	0.7195	0.8676	0.7867

Recall Boost: +14.5 pts (72.2 → 86.8) reduces missed threats
F₁ Improvement: +7.5 pts (71.2 → 78.7) for balanced detection
5-Fold CV (DistilBERT): 84.3 % ± 2.9 % accuracy (stable performance)

Repository Structure

sp2025-migration/
├── notebooks/
│   ├── wom_data_webscraping.ipynb
│   ├── wom_zeroshot_classification.ipynb
│   └── wom_nlp_model.ipynb
├── models/
│   └── wom_nlp_model.ipynb
├── README.Rmd            
└── report/
    └── report.pdf

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
docs		docs
nlp		nlp
notebooks		notebooks
report		report
tests		tests
.editorconfig		.editorconfig
.gitignore		.gitignore
.gitmodules		.gitmodules
.travis.yml		.travis.yml
GettingStarted.md		GettingStarted.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
credentials.json		credentials.json
requirements.txt		requirements.txt
requirements_dev.txt		requirements_dev.txt
setup.cfg		setup.cfg
setup.py		setup.py
tox.ini		tox.ini
web.png		web.png
wom_presentation.pptx		wom_presentation.pptx

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Weaponization of Migration Flows (WOM) Classification

Project Overview

Goals

Methods

Data Acquisition

Annotation

Baseline Model

Advanced Model

Evaluation

Key Results

Repository Structure

Further Reading

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Weaponization of Migration Flows (WOM) Classification

Project Overview

Goals

Methods

Data Acquisition

Annotation

Baseline Model

Advanced Model

Evaluation

Key Results

Repository Structure

Further Reading

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages