
Commit 430f215

Bulk metadata corrections 2025-08-18 (#5777)
* Process metadata corrections for 2025.acl-long.69 (closes #5773)
* Process metadata corrections for 2025.acl-long.1098 (closes #5772)
* Process metadata corrections for 2025.findings-naacl.426 (closes #5771)
* Process metadata corrections for 2025.findings-acl.971 (closes #5770)
* Process metadata corrections for 2025.naacl-short.81 (closes #5769)
* Process metadata corrections for 2025.semeval-1.232 (closes #5767)
* Process metadata corrections for 2025.semeval-1.279 (closes #5762)
* Process metadata corrections for 2022.cogalex-1.8 (closes #5761)
* Process metadata corrections for 2023.findings-emnlp.178 (closes #5758)
* Process metadata corrections for 2025.loresmt-1.14 (closes #5757)
* Process metadata corrections for 2025.acl-srw.84 (closes #5756)
* Process metadata corrections for 2025.naacl-long.583 (closes #5750)
* Process metadata corrections for 2025.acl-long.925 (closes #5749)
* Process metadata corrections for 2025.semeval-1.208 (closes #5743)
* Process metadata corrections for 2024.acl-srw.10 (closes #5410)
* Remove stray file
1 parent d26d56f commit 430f215


8 files changed (+65, -61 lines changed)


data/xml/2022.cogalex.xml

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@
 </paper>
 <paper id="8">
 <title>A Frame-Based Model of Inherent Polysemy, Copredication and Argument Coercion</title>
-<author><first>Chen</first><last>Long</last></author>
+<author><first>Long</first><last>Chen</last></author>
 <author><first>Laura</first><last>Kallmeyer</last></author>
 <author><first>Rainer</first><last>Osswald</last></author>
 <pages>58–67</pages>

data/xml/2023.findings.xml

Lines changed: 1 addition & 1 deletion
@@ -16972,7 +16972,7 @@
 <title>Ask To The Point: Open-Domain Entity-Centric Question Generation</title>
 <author><first>Yuxiang</first><last>Liu</last></author>
 <author><first>Jie</first><last>Huang</last></author>
-<author><first>Kevin</first><last>Chang</last></author>
+<author><first>Kevin Chen-Chuan</first><last>Chang</last></author>
 <pages>2703-2716</pages>
 <abstract>We introduce a new task called *entity-centric question generation* (ECQG), motivated by real-world applications such as topic-specific learning, assisted reading, and fact-checking. The task aims to generate questions from an entity perspective. To solve ECQG, we propose a coherent PLM-based framework GenCONE with two novel modules: content focusing and question verification. The content focusing module first identifies a focus as “what to ask” to form draft questions, and the question verification module refines the questions afterwards by verifying the answerability. We also construct a large-scale open-domain dataset from SQuAD to support this task. Our extensive experiments demonstrate that GenCONE significantly and consistently outperforms various baselines, and two modules are effective and complementary in generating high-quality questions.</abstract>
 <url hash="fcd671ff">2023.findings-emnlp.178</url>

data/xml/2024.acl.xml

Lines changed: 1 addition & 1 deletion
@@ -13885,7 +13885,7 @@
 <author><first>Takehito</first><last>Utsuro</last><affiliation>University of Tsukuba</affiliation></author>
 <author><first>Masaaki</first><last>Nagata</last><affiliation>NTT Corporation</affiliation></author>
 <pages>51-61</pages>
-<abstract>Acquiring large-scale parallel corpora is crucial for NLP tasks such asNeural Machine Translation, and web crawling has become a popularmethodology for this purpose. Previous studies have been conductedbased on sentence-based segmentation (SBS) when aligning documents invarious languages which are obtained through web crawling. Among them,the TK-PERT method (Thompson and Koehn, 2020) achieved state-of-the-artresults and addressed the boilerplate text in web crawling data wellthrough a down-weighting approach. However, there remains a problemwith how to handle long-text encoding better. Thus, we introduce thestrategy of Overlapping Fixed-Length Segmentation (OFLS) in place ofSBS, and observe a pronounced enhancement when performing the sameapproach for document alignment. In this paper, we compare the SBS andOFLS using three previous methods, Mean-Pool, TK-PERT (Thompson andKoehn, 2020), and Optimal Transport (Clark et al., 2019; El- Kishky andGuzman, 2020), on the WMT16 document alignment shared task forFrench-English, as well as on our self-established Japanese-Englishdataset MnRN. As a result, for the WMT16 task, various SBS basedmethods showed an increase in recall by 1% to 10% after reproductionwith OFLS. For MnRN data, OFLS demonstrated notable accuracyimprovements and exhibited faster document embedding speed.</abstract>
+<abstract>Acquiring large-scale parallel corpora is crucial for NLP tasks such as Neural Machine Translation, and web crawling has become a popular methodology for this purpose. Previous studies have been conducted based on sentence-based segmentation (SBS) when aligning documents in various languages which are obtained through web crawling. Among them, the TK-PERT method (Thompson and Koehn, 2020) achieved state-of-the-art results and addressed the boilerplate text in web crawling data well through a down-weighting approach. However, there remains a problem with how to handle long-text encoding better. Thus, we introduce the strategy of Overlapping Fixed-Length Segmentation (OFLS) in place of SBS, and observe a pronounced enhancement when performing the same approach for document alignment. In this paper, we compare the SBS and OFLS using three previous methods, Mean-Pool, TK-PERT (Thompson and Koehn, 2020), and Optimal Transport (Clark et al., 2019; El-Kishky and Guzman, 2020), on the WMT16 document alignment shared task for French-English, as well as on our self-established Japanese-English dataset MnRN. As a result, for the WMT16 task, various SBS based methods showed an increase in recall by 1% to 10% after reproduction with OFLS. For MnRN data, OFLS demonstrated notable accuracy improvements and exhibited faster document embedding speed.</abstract>
 <url hash="84902172">2024.acl-srw.10</url>
 <bibkey>wang-etal-2024-document</bibkey>
 <doi>10.18653/v1/2024.acl-srw.10</doi>

data/xml/2025.acl.xml

Lines changed: 25 additions & 25 deletions
@@ -1005,12 +1005,12 @@
 <paper id="69">
 <title>Capturing Author Self Beliefs in Social Media Language</title>
 <author><first>Siddharth</first><last>Mangalik</last></author>
-<author><first>Adithya</first><last>V Ganesan</last><affiliation>, State University of New York, Stony Brook</affiliation></author>
-<author><first>Abigail B.</first><last>Wheeler</last><affiliation>University of Pennsylvania, University of Pennsylvania</affiliation></author>
+<author><first>Adithya</first><last>V. Ganesan</last></author>
+<author><first>Abigail</first><last>Wheeler</last></author>
 <author><first>Nicholas</first><last>Kerry</last></author>
-<author><first>Jeremy D. W.</first><last>Clifton</last><affiliation>University of Pennsylvania, University of Pennsylvania</affiliation></author>
-<author><first>H.</first><last>Schwartz</last><affiliation>Stony Brook University (SUNY)</affiliation></author>
-<author><first>Ryan L.</first><last>Boyd</last><affiliation>University of Texas at Dallas</affiliation></author>
+<author><first>Jeremy D. W.</first><last>Clifton</last></author>
+<author><first>H. Andrew</first><last>Schwartz</last></author>
+<author><first>Ryan L.</first><last>Boyd</last></author>
 <pages>1362-1376</pages>
 <abstract>Measuring the prevalence and dimensions of self beliefs is essential for understanding human self-perception and various psychological outcomes. In this paper, we develop a novel task for classifying language that contains explicit or implicit mentions of the author’s self beliefs. We contribute a set of 2,000 human-annotated self beliefs, 100,000 LLM-labeled examples, and 10,000 surveyed self belief paragraphs. We then evaluate several encoder-based classifiers and training routines for this task. Our trained model, SelfAwareNet, achieved an AUC of 0.944, outperforming 0.839 from OpenAI’s state-of-the-art GPT-4o model. Using this model we derive data-driven categories of self beliefs and demonstrate their ability to predict valence, depression, anxiety, and stress. We release the resulting self belief classification model and annotated datasets for use in future research.</abstract>
 <url hash="ef456e0d">2025.acl-long.69</url>
@@ -13586,11 +13586,11 @@
 <title><fixed-case>CULEMO</fixed-case>: Cultural Lenses on Emotion - Benchmarking <fixed-case>LLM</fixed-case>s for Cross-Cultural Emotion Understanding</title>
 <author><first>Tadesse Destaw</first><last>Belay</last></author>
 <author><first>Ahmed Haj</first><last>Ahmed</last></author>
-<author><first>Alvin C</first><last>Grissom Ii</last><affiliation>Haverford College</affiliation></author>
+<author><first>Alvin</first><last>Grissom II</last></author>
 <author><first>Iqra</first><last>Ameer</last></author>
-<author><first>Grigori</first><last>Sidorov</last><affiliation>Instituto Politécnico Nacional</affiliation></author>
-<author><first>Olga</first><last>Kolesnikova</last><affiliation>Instituto Politécnico Nacional</affiliation></author>
-<author><first>Seid Muhie</first><last>Yimam</last><affiliation>Universität Hamburg</affiliation></author>
+<author><first>Grigori</first><last>Sidorov</last></author>
+<author><first>Olga</first><last>Kolesnikova</last></author>
+<author><first>Seid Muhie</first><last>Yimam</last></author>
 <pages>18894-18909</pages>
 <abstract>NLP research has increasingly focused on subjective tasks such as emotion analysis. However, existing emotion benchmarks suffer fromtwo major shortcomings: (1) they largely rely on keyword-based emotion recognition, overlooking crucial cultural dimensions required fordeeper emotion understanding, and (2) many are created by translating English-annotated data into other languages, leading to potentially unreliable evaluation. To address these issues, we introduce Cultural Lenses on Emotion (CuLEmo), the first benchmark designedto evaluate culture-aware emotion prediction across six languages: Amharic, Arabic, English, German, Hindi, and Spanish. CuLEmocomprises 400 crafted questions per language, each requiring nuanced cultural reasoning and understanding. We use this benchmark to evaluate several state-of-the-art LLMs on culture-aware emotion prediction and sentiment analysis tasks. Our findings reveal that (1) emotion conceptualizations vary significantly across languages and cultures, (2) LLMs performance likewise varies by language and cultural context, and (3) prompting in English with explicit country context often outperforms in-language prompts for culture-aware emotion and sentiment understanding. The dataset and evaluation code is available.</abstract>
 <url hash="00bdf0d8">2025.acl-long.925</url>
@@ -16014,18 +16014,18 @@
 <paper id="1098">
 <title><fixed-case>W</fixed-case>hi<fixed-case>SPA</fixed-case>: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning</title>
 <author><first>Rajath</first><last>Rao</last></author>
-<author><first>Adithya</first><last>V Ganesan</last><affiliation>, State University of New York, Stony Brook</affiliation></author>
-<author><first>Oscar</first><last>Kjell</last><affiliation>Stony Brook University and Lund University</affiliation></author>
+<author><first>Adithya</first><last>V Ganesan</last></author>
+<author><first>Oscar</first><last>Kjell</last></author>
 <author><first>Jonah</first><last>Luby</last></author>
 <author><first>Akshay</first><last>Raghavan</last></author>
-<author><first>Scott M.</first><last>Feltman</last><affiliation>State University of New York at Stony Brook</affiliation></author>
-<author><first>Whitney</first><last>Ringwald</last><affiliation>University of Minnesota - Twin Cities</affiliation></author>
-<author><first>Ryan L.</first><last>Boyd</last><affiliation>University of Texas at Dallas</affiliation></author>
-<author><first>Benjamin J.</first><last>Luft</last><affiliation>State University of New York at Stony Brook</affiliation></author>
-<author><first>Camilo J.</first><last>Ruggero</last><affiliation>University of Texas at Dallas</affiliation></author>
-<author><first>Neville</first><last>Ryant</last><affiliation>Linguistic Data Consortium</affiliation></author>
-<author><first>Roman</first><last>Kotov</last><affiliation>State University of New York at Stony Brook</affiliation></author>
-<author><first>H.</first><last>Schwartz</last><affiliation>Stony Brook University (SUNY)</affiliation></author>
+<author><first>Scott</first><last>Feltman</last></author>
+<author><first>Whitney</first><last>Ringwald</last></author>
+<author><first>Ryan L.</first><last>Boyd</last></author>
+<author><first>Benjamin</first><last>Luft</last></author>
+<author><first>Camilo</first><last>Ruggero</last></author>
+<author><first>Neville</first><last>Ryant</last></author>
+<author><first>Roman</first><last>Kotov</last></author>
+<author><first>H. Andrew</first><last>Schwartz</last></author>
 <pages>22529-22544</pages>
 <abstract>Current speech encoding pipelines often rely on an additional text-based LM to get robust representations of human communication, even though SotA speech-to-text models often have a LM within. This work proposes an approach to improve the LM within an audio model such that the subsequent text-LM is unnecessary. We introduce **WhiSPA** (**Whi**sper with **S**emantic and **P**sychological **A**lignment), which leverages a novel audio training objective: contrastive loss with a language model embedding as a teacher. Using over 500k speech segments from mental health audio interviews, we evaluate the utility of aligning Whisper’s latent space with semantic representations from a text autoencoder (SBERT) and lexically derived embeddings of basic psychological dimensions: emotion and personality. Over self-supervised affective tasks and downstream psychological tasks, WhiSPA surpasses current speech encoders, achieving an average error reduction of 73.4% and 83.8%, respectively. WhiSPA demonstrates that it is not always necessary to run a subsequent text LM on speech-to-text output in order to get a rich psychological representation of human communication.</abstract>
 <url hash="41f7e9c9">2025.acl-long.1098</url>
@@ -26497,13 +26497,13 @@
 </paper>
 <paper id="84">
 <title><fixed-case>G</fixed-case>er<fixed-case>M</fixed-case>ed<fixed-case>IQ</fixed-case>: A Resource for Simulated and Synthesized Anamnesis Interview Responses in <fixed-case>G</fixed-case>erman</title>
-<author><first>Justin</first><last>Hofenbitzer</last><affiliation>Technische Universität München</affiliation></author>
+<author><first>Justin</first><last>Hofenbitzer</last></author>
 <author><first>Sebastian</first><last>Schöning</last></author>
-<author><first>Belle</first><last>Sebastian</last><affiliation>Die Universitätsmedizin Mannheim (UMM), Ruprecht-Karls-Universität Heidelberg</affiliation></author>
-<author><first>Jacqueline</first><last>Lammert</last><affiliation>Technical University of Munich and Technical University of Munich</affiliation></author>
-<author><first>Luise</first><last>Modersohn</last><affiliation>Technische Universität München and Friedrich-Schiller Universität Jena</affiliation></author>
-<author><first>Martin</first><last>Boeker</last><affiliation>Technische Universität München</affiliation></author>
-<author><first>Diego</first><last>Frassinelli</last><affiliation>Ludwig-Maximilians-Universität München</affiliation></author>
+<author><first>Sebastian</first><last>Belle</last></author>
+<author><first>Jacqueline</first><last>Lammert</last></author>
+<author><first>Luise</first><last>Modersohn</last></author>
+<author><first>Martin</first><last>Boeker</last></author>
+<author><first>Diego</first><last>Frassinelli</last></author>
 <pages>1064-1078</pages>
 <abstract>Due to strict privacy regulations, text corpora in non-English clinical contexts are scarce. Consequently, synthetic data generation using Large Language Models (LLMs) emerges as a promising strategy to address this data gap. To evaluate the ability of LLMs in generating synthetic data, we applied them to our novel German Medical Interview Questions Corpus (GerMedIQ), which consists of 4,524 unique, simulated question-response pairs in German. We augmented our corpus by prompting 18 different LLMs to generate responses to the same questions. Structural and semantic evaluations of the generated responses revealed that large-sized language models produced responses comparable to those provided by humans. Additionally, an LLM-as-a-judge study, combined with a human baseline experiment assessing response acceptability, demonstrated that human raters preferred the responses generated by Mistral (124B) over those produced by humans. Nonetheless, our findings indicate that using LLMs for data augmentation in non-English clinical contexts requires caution.</abstract>
 <url hash="c99e372a">2025.acl-srw.84</url>
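
For reference, the two kinds of corrections in this commit (reversed given/family names, e.g. "Chen Long" to "Long Chen" in 2022.cogalex-1.8, and dropped <affiliation> elements in the 2025.acl.xml hunks) can be sketched programmatically. The snippet below is a minimal illustration only, assuming nothing beyond the <paper>/<author>/<first>/<last> structure visible in these diffs; it is not the Anthology's own correction tooling, and the function names are invented for the example.

    # Minimal sketch (not the Anthology's tooling): fix a reversed given/family
    # name for one author of one <paper> in an Anthology-style XML file.
    import xml.etree.ElementTree as ET

    def swap_author_name(xml_path, paper_id, first, last):
        """Swap <first>/<last> where they currently hold the reversed values."""
        tree = ET.parse(xml_path)
        for paper in tree.getroot().iter("paper"):
            if paper.get("id") != paper_id:
                continue
            for author in paper.findall("author"):
                f, l = author.find("first"), author.find("last")
                if f is not None and l is not None and f.text == first and l.text == last:
                    f.text, l.text = l.text, f.text  # reverse the two name parts
        tree.write(xml_path, encoding="UTF-8", xml_declaration=True)

    def drop_affiliations(author):
        """Remove <affiliation> children, as in the 2025.acl.xml hunks above."""
        for aff in author.findall("affiliation"):
            author.remove(aff)

    # e.g. the 2022.cogalex-1.8 correction shown in this commit:
    # swap_author_name("data/xml/2022.cogalex.xml", "8", "Chen", "Long")

Note that ElementTree drops XML comments and may reflow the serialization on write, so this is illustrative only; the actual corrections are the direct XML edits shown in the diffs above.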
