Group project — UCD Connected Politics Lab
Alp Dikmen · Kun Dong · Mohamed Moheeb · Moises A. Silva Servin
This project investigates whether the SDG topics that countries emphasise in their UN General Debate (UNGD) speeches align with how they implement the SDGs in practice. Using a fine-tuned BERT classifier, we scored diplomatic speeches against all 17 UN Sustainable Development Goals and compared those scores against real-world implementation data.
Research question: Which SDG topics are mentioned in UNGD speeches, and how do the topics mentioned differ from the SDGs prioritised in national implementation?
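One simple way to make "how do the topics mentioned differ from the SDGs prioritised" concrete is to rank the SDGs twice for a country, once by share of speech content and once by implementation score, and look at the rank difference. The sketch below is only an illustration of that idea and is not taken from the project's code: the function names, the tie-breaking rule, and the three-SDG toy numbers are all assumptions.

```python
def rank_desc(values):
    """Rank keys from 1 (highest value) downward; ties are broken by key order."""
    ordered = sorted(values, key=lambda k: (-values[k], k))
    return {key: pos for pos, key in enumerate(ordered, start=1)}


def rank_gaps(speech_share, implementation_score):
    """Speech rank minus implementation rank for each SDG.

    A negative gap means the SDG ranks higher in speeches than in implementation.
    """
    speech_rank = rank_desc(speech_share)
    impl_rank = rank_desc(implementation_score)
    return {sdg: speech_rank[sdg] - impl_rank[sdg] for sdg in speech_share}


# Toy, made-up numbers for one hypothetical country (not real results).
speech_share = {13: 0.40, 16: 0.35, 4: 0.25}          # share of classified segments
implementation_score = {13: 55.0, 16: 62.0, 4: 90.0}  # implementation score per SDG

gaps = rank_gaps(speech_share, implementation_score)
```

In this toy case SDG 13 is the most discussed goal but the weakest on implementation, so its gap is negative, while SDG 4 shows the opposite pattern.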
Existing SDG classifiers had a critical limitation: the widely used BERT-based baseline covered only SDGs 1–16 and had no residual "No SDG" category, so it over-classified almost all content as SDG-relevant.
We addressed this by fine-tuning a new classifier on the OSDG Community Dataset, augmented with a synthetic corpus of non-SDG-related speech segments. The synthetic data gave the model negative examples to learn from, allowing it to distinguish genuine SDG discourse from general diplomatic language.
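The augmentation step described above can be sketched as merging two sources of labelled text before fine-tuning. This is a minimal illustration, not the project's actual preprocessing script: the `build_training_set` helper, the `(text, sdg_number)` row layout, the `"No SDG"` label string, and the example sentences are all assumptions.

```python
import random

NO_SDG = "No SDG"  # assumed label name for the residual class


def build_training_set(osdg_rows, synthetic_negatives, seed=0):
    """Combine labelled OSDG rows with synthetic non-SDG segments.

    osdg_rows: iterable of (text, sdg_number) pairs.
    synthetic_negatives: iterable of texts with no SDG content.
    """
    combined = [(text, f"SDG {sdg}") for text, sdg in osdg_rows]
    combined += [(text, NO_SDG) for text in synthetic_negatives]
    # Shuffle with a fixed seed so negatives are not clustered at the end.
    random.Random(seed).shuffle(combined)
    return combined


# Hypothetical examples, written for illustration only.
osdg_rows = [
    ("Access to clean water remains limited in rural areas.", 6),
    ("Extreme poverty fell sharply over the decade.", 1),
]
synthetic_negatives = [
    "I congratulate the President on his election to this Assembly.",
]

train = build_training_set(osdg_rows, synthetic_negatives)
```

The point of the negatives is that ceremonial or procedural language gets an explicit target class instead of being forced into the nearest SDG.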
The final model classifies text into 18 categories: SDGs 1–17 plus a "No SDG" residual class.
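As a rough sketch of what an 18-way label space looks like in practice, the snippet below builds the label mappings and decodes one vector of logits into a label. The ordering (SDGs 1–17 first, "No SDG" last) and the helper names are assumptions, and the softmax is written in plain Python so the example runs without PyTorch. Mappings of this shape could be passed as `num_labels`, `id2label`, and `label2id` when loading a Hugging Face sequence-classification model.

```python
import math

# Assumed ordering: SDG 1..17 at indices 0..16, residual class at index 17.
LABELS = [f"SDG {i}" for i in range(1, 18)] + ["No SDG"]
id2label = dict(enumerate(LABELS))
label2id = {label: idx for idx, label in id2label.items()}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Made-up logits in which the residual class scores highest.
logits = [0.0] * 18
logits[label2id["No SDG"]] = 3.0
label, prob = decode(logits)
```

Segments decoded as "No SDG" can then be excluded before SDG shares are computed for a speech.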
Stack: Python · Hugging Face Transformers · PyTorch · Azure ML
| Dataset | Source | Coverage |
|---|---|---|
| UN General Debate Corpus | Baturo, Dasandi & Mikhaylov (2017) | 1946–2024 |
| SDG Implementation Scores | United Nations | 2015–2023 |
| GDP Rankings (PPP-based) | World Bank | 2023 |
├── Base papers/ # Reference literature + SDG keyword dictionary
├── Datasets and code/ # Preprocessing, merging, and modelling scripts
├── UN Corpus/ # UN General Assembly speeches corpus (2008–2023)
└── Methodology.docx # Detailed methodology documentation
Developed as part of the UCD Connected Politics Lab module, 2024–2025.