From eb485bbdab7e717602603d0cd23e4f71da1817d2 Mon Sep 17 00:00:00 2001
From: David Stap
Date: Tue, 11 Feb 2025 13:25:13 +0100
Subject: [PATCH] added videos for papers presented at NAACL24 + plenaries
---
data/xml/2023.tacl.xml | 1 +
data/xml/2024.americasnlp.xml | 6 +
data/xml/2024.bea.xml | 55 +++
data/xml/2024.cl.xml | 1 +
data/xml/2024.clinicalnlp.xml | 24 ++
data/xml/2024.findings.xml | 216 ++++++++++++
data/xml/2024.hcinlp.xml | 6 +
data/xml/2024.insights.xml | 17 +
data/xml/2024.naacl.xml | 636 ++++++++++++++++++++++++++++++++++
data/xml/2024.semeval.xml | 47 +++
data/xml/2024.sigmorphon.xml | 4 +
data/xml/2024.starsem.xml | 11 +
data/xml/2024.tacl.xml | 8 +
data/xml/2024.trustnlp.xml | 1 +
data/xml/2024.vardial.xml | 2 +
data/xml/2024.woah.xml | 1 +
16 files changed, 1036 insertions(+)
diff --git a/data/xml/2023.tacl.xml b/data/xml/2023.tacl.xml
index 5154e6037f..80d5f5ad8d 100644
--- a/data/xml/2023.tacl.xml
+++ b/data/xml/2023.tacl.xml
@@ -850,6 +850,7 @@
1114–1131
2023.tacl-1.63
zhang-etal-2023-miracl
+
DMDD: A Large-Scale Dataset for Dataset Mentions Detection
diff --git a/data/xml/2024.americasnlp.xml b/data/xml/2024.americasnlp.xml
index c93569a866..a50e65a8ad 100644
--- a/data/xml/2024.americasnlp.xml
+++ b/data/xml/2024.americasnlp.xml
@@ -46,6 +46,7 @@
2024.americasnlp-1.2.SupplementaryMaterial.zip
prieto-etal-2024-translation
10.18653/v1/2024.americasnlp-1.2
+
Word-level prediction in Plains Cree: First steps
@@ -56,6 +57,7 @@
2024.americasnlp-1.3
kriukova-arppe-2024-word
10.18653/v1/2024.americasnlp-1.3
+
Mapping ‘when’-clauses in Latin American and Caribbean languages: an experiment in subtoken-based typology
@@ -68,6 +70,7 @@
10.18653/v1/2024.americasnlp-1.4
Compressed PDF version.
+
Comparing LLM prompting with Cross-lingual transfer performance on Indigenous and Low-resource Brazilian Languages
@@ -116,6 +119,7 @@
2024.americasnlp-1.8
karson-coto-solano-2024-morphological
10.18653/v1/2024.americasnlp-1.8
+
LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages
@@ -129,6 +133,7 @@
2024.americasnlp-1.9.SupplementaryMaterial.zip
coleman-etal-2024-llm
10.18653/v1/2024.americasnlp-1.9
+
A Concise Survey of OCR for Low-Resource Languages
@@ -139,6 +144,7 @@
2024.americasnlp-1.10
agarwal-anastasopoulos-2024-concise
10.18653/v1/2024.americasnlp-1.10
+
Unlocking Knowledge with OCR-Driven Document Digitization for Peruvian Indigenous Languages
diff --git a/data/xml/2024.bea.xml b/data/xml/2024.bea.xml
index 6ada6f412b..cc576145f5 100644
--- a/data/xml/2024.bea.xml
+++ b/data/xml/2024.bea.xml
@@ -31,6 +31,7 @@
The creation of pedagogically effective questions is a challenge for teachers and requires significant time and meticulous planning, especially in resource-constrained economies. For example, in India, assessments for social science in high schools are characterized by rote memorization without regard to higher-order skill levels. Automated educational question generation (AEQG) using large language models (LLMs) has the potential to help teachers develop assessments at scale. However, it is important to evaluate the quality and relevance of these questions. In this study, we examine the ability of different LLMs (Falcon 40B, Llama2 70B, Palm 2, GPT 3.5, and GPT 4) to generate relevant and high-quality questions of different cognitive levels, as defined by Bloom’s taxonomy. We prompt each model with the same instructions and different contexts to generate 510 questions in the social science curriculum of a state educational board in India. Two human experts used a nine-item rubric to assess linguistic correctness, pedagogical relevance and quality, and adherence to Bloom’s skill levels. Our results showed that 91.56% of the LLM-generated questions were relevant and of high quality. This suggests that LLMs can generate relevant and high-quality questions at different cognitive levels, making them useful for creating assessments for scaling education in resource-constrained economies.
2024.bea-1.1
scaria-etal-2024-good
+
Synthetic Data Generation for Low-resource Grammatical Error Correction with Tagged Corruption Models
@@ -40,6 +41,7 @@
Tagged corruption models provide precise control over the introduction of grammatical errors into clean text. This capability has made them a powerful tool for generating pre-training data for grammatical error correction (GEC) in English. In this work, we demonstrate their application to four languages with substantially fewer GEC resources than English: German, Romanian, Russian, and Spanish. We release a new tagged-corruption dataset consisting of 2.5M examples per language that was generated by a fine-tuned PaLM 2 foundation model. Pre-training on tagged corruptions yields consistent gains across all four languages, especially for small model sizes and languages with limited human-labelled data.
2024.bea-1.2
stahlberg-kumar-2024-synthetic
+
Pillars of Grammatical Error Correction: Comprehensive Inspection Of Contemporary Approaches In The Era of Large Language Models
@@ -53,6 +55,7 @@
In this paper, we carry out experimental research on Grammatical Error Correction, delving into the nuances of single-model systems, comparing the efficiency of ensembling and ranking methods, and exploring the application of large language models to GEC as single-model systems, as parts of ensembles, and as ranking methods. We set new state-of-the-art records with F_0.5 scores of 72.8 on CoNLL-2014-test and 81.4 on BEA-test, respectively. To support further advancements in GEC and ensure the reproducibility of our research, we make our code, trained models, and systems’ outputs publicly available, facilitating future findings.
2024.bea-1.3
omelianchuk-etal-2024-pillars
+
Using Adaptive Empathetic Responses for Teaching English
@@ -64,6 +67,7 @@
Existing English-teaching chatbots rarely incorporate empathy explicitly in their feedback, but empathetic feedback could help keep students engaged and reduce learner anxiety. Toward this end, we propose the task of negative emotion detection via audio, for recognizing empathetic feedback opportunities in language learning. We then build the first spoken English-teaching chatbot with adaptive, empathetic feedback. This feedback is synthesized through automatic prompt optimization of ChatGPT and is evaluated with English learners. We demonstrate the effectiveness of our system through a preliminary user study.
2024.bea-1.4
siyan-etal-2024-using
+
Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts
@@ -75,6 +79,7 @@
Using large language models (LLMs) for educational applications like dialogue-based teaching is a hot topic. Effective teaching, however, requires teachers to adapt the difficulty of content and explanations to the education level of their students. Even the best LLMs today struggle to do this well. If we want to improve LLMs on this adaptation task, we need to be able to measure adaptation success reliably. However, current Static metrics for text difficulty, like the Flesch-Kincaid Reading Ease score, are known to be crude and brittle. We, therefore, introduce and evaluate a new set of Prompt-based metrics for text difficulty. Based on a user study, we create Prompt-based metrics as inputs for LLMs. They leverage LLM’s general language understanding capabilities to capture more abstract and complex features than Static metrics. Regression experiments show that adding our Prompt-based metrics significantly improves text difficulty classification over Static metrics alone. Our results demonstrate the promise of using LLMs to evaluate text adaptation to different education levels.
2024.bea-1.5
rooein-etal-2024-beyond
+
Large Language Models Are State-of-the-Art Evaluator for Grammatical Error Correction
@@ -85,6 +90,7 @@
Large Language Models (LLMs) have been reported to outperform existing automatic evaluation metrics in some tasks, such as text summarization and machine translation. However, there has been a lack of research on LLMs as evaluators in grammatical error correction (GEC). In this study, we investigate the performance of LLMs in GEC evaluation by employing prompts designed to incorporate various evaluation criteria inspired by previous research. Our extensive experimental results demonstrate that GPT-4 achieved Kendall’s rank correlation of 0.662 with human judgments, surpassing all existing methods. Furthermore, in recent GEC evaluations, we have underscored the significance of the LLMs scale and particularly emphasized the importance of fluency among evaluation criteria.
2024.bea-1.6
kobayashi-etal-2024-large
+
Can Language Models Guess Your Identity? Analyzing Demographic Biases in AI Essay Scoring
@@ -94,6 +100,7 @@
Large language models (LLMs) are increasingly used for automated scoring of student essays. However, these models may perpetuate societal biases if not carefully monitored. This study analyzes potential biases in an LLM (XLNet) trained to score persuasive student essays, based on data from the PERSUADE corpus. XLNet achieved strong performance based on quadratic weighted kappa, standardized mean difference, and exact agreement with human scores. Using available metadata, we performed analyses of scoring differences across gender, race/ethnicity, English language learning status, socioeconomic status, and disability status. Automated scores exhibited small magnifications of marginal differences in human scoring, favoring female students over males and White students over Black students. To further probe potential biases, we found that separate XLNet classifiers and XLNet hidden states weakly predicted demographic membership. Overall, results reinforce the need for continued fairness analyses as use of LLMs expands in education.
2024.bea-1.7
kwako-ormerod-2024-language
+
Automated Scoring of Clinical Patient Notes: Findings From the Kaggle Competition and Their Translation into Practice
@@ -107,6 +114,7 @@
Scoring clinical patient notes (PNs) written by medical students is a necessary but resource-intensive task in medical education. This paper describes the organization and key lessons from a Kaggle competition on automated scoring of such notes. 1,471 teams took part in the competition and developed an extensive, publicly available code repository of varying solutions evaluated over the first public dataset for this task. The most successful approaches from this community effort are described and utilized in the development of a PN scoring system. We discuss the choice of models and system architecture with a view to operational use and scalability, and evaluate its performance on both the public Kaggle data (10 clinical cases, 43,985 PNs) and an extended internal dataset (178 clinical cases, 6,940 PNs). The results show that the system significantly outperforms a state-of-the-art existing tool for PN scoring and that task-adaptive pretraining using masked language modeling can be an effective approach even for small training samples.
2024.bea-1.8
yaneva-etal-2024-automated
+
A World CLASSE Student Summary Corpus
@@ -118,6 +126,7 @@
This paper introduces the Common Lit Augmented Student Summary Evaluation (CLASSE) corpus. The corpus comprises 11,213 summaries written over six prompts by students in grades 3-12 while using the CommonLit website. Each summary was scored by expert human raters on analytic features related to main points, details, organization, voice, paraphrasing, and language beyond the source text. The human scores were aggregated into two component scores related to content and wording. The final corpus was the focus of a Kaggle competition hosted in late 2022 and completed in 2023 in which over 2,000 teams participated. The paper includes a baseline scoring model for the corpus based on a Large Language Model (Longformer model). The paper also provides an overview of the winning models from the Kaggle competition.
2024.bea-1.9
crossley-etal-2024-world
+
Improving Socratic Question Generation using Data Augmentation and Preference Optimization
@@ -127,6 +136,7 @@
The Socratic method is a way of guiding students toward solving a problem independently without directly revealing the solution to the problem by asking incremental questions. Although this method has been shown to significantly improve student learning outcomes, it remains a complex labor-intensive task for instructors. Large language models (LLMs) can be used to augment human effort by automatically generating Socratic questions for students. However, existing methods that involve prompting these LLMs sometimes produce invalid outputs, e.g., those that directly reveal the solution to the problem or provide irrelevant or premature questions. To alleviate this problem, inspired by reinforcement learning with AI feedback (RLAIF), we first propose a data augmentation method to enrich existing Socratic questioning datasets with questions that are invalid in specific ways. Also, we propose a method to optimize open-source LLMs such as LLama 2 to prefer ground-truth questions over generated invalid ones, using direct preference optimization (DPO). Our experiments on a Socratic questions dataset for student code debugging show that a DPO-optimized LLama 2-7B model can effectively avoid generating invalid questions, and as a result, outperforms existing state-of-the-art prompting methods.
2024.bea-1.10
ashok-kumar-lan-2024-improving
+
Scoring with Confidence? – Exploring High-confidence Scoring for Saving Manual Grading Effort
@@ -139,6 +149,7 @@
A possible way to save manual grading effort in short answer scoring is to automatically score answers for which the classifier is highly confident. We explore the feasibility of this approach in a high-stakes exam setting, evaluating three different similarity-based scoring methods, where the similarity score is a direct proxy for model confidence. The decision on an appropriate level of confidence should ideally be made before scoring a new prompt. We thus probe to what extent confidence thresholds are consistent across different datasets and prompts. We find that high-confidence thresholds vary on a prompt-to-prompt basis, and that the overall potential of increased performance at a reasonable cost of additional manual effort is limited.
2024.bea-1.11
bexte-etal-2024-scoring
+
Predicting Initial Essay Quality Scores to Increase the Efficiency of Comparative Judgment Assessments
@@ -151,6 +162,7 @@
Comparative judgment (CJ) is a method that can be used to assess the writing quality of student essays based on repeated pairwise comparisons by multiple assessors. Although the assessment method is known to have high validity and reliability, it can be particularly inefficient, as assessors must make many judgments before the scores become reliable. Prior research has investigated methods to improve the efficiency of CJ, yet these methods introduce additional challenges, notably stemming from the initial lack of information at the start of the assessment, which is known as a cold-start problem. This paper reports on a study in which we predict the initial quality scores of essays to establish a warm start for CJ. To achieve this, we construct informative prior distributions for the quality scores based on the predicted initial quality scores. Through simulation studies, we demonstrate that our approach increases the efficiency of CJ: On average, assessors need to make 30% fewer judgments for each essay to reach an overall reliability level of 0.70.
2024.bea-1.12
de-vrindt-etal-2024-predicting
+
Improving Transfer Learning for Early Forecasting of Academic Performance by Contextualizing Language Models
@@ -161,6 +173,7 @@
This paper presents a cutting-edge method that harnesses contextualized language models (LMs) to significantly enhance the prediction of early academic performance in STEM fields. Our approach uniquely tackles the challenge of transfer learning with limited-domain data. Specifically, we overcome this challenge by contextualizing students’ cognitive trajectory data through the integration of both distal background factors (comprising academic information, demographic details, and socioeconomic indicators) and proximal non-cognitive factors (such as emotional engagement). By tapping into the rich prior knowledge encoded within pre-trained LMs, we effectively reframe academic performance forecasting as a task ideally suited for natural language processing. Our research rigorously examines three key aspects: the impact of data contextualization on prediction improvement, the effectiveness of our approach compared to traditional numeric-based models, and the influence of LM capacity on prediction accuracy. The results underscore the significant advantages of utilizing larger LMs with contextualized inputs, representing a notable advancement in the precision of early performance forecasts. These findings emphasize the importance of employing contextualized LMs to enhance artificial intelligence-driven educational support systems and overcome data scarcity challenges.
2024.bea-1.13
hayat-etal-2024-improving
+
Can GPT-4 do L2 analytic assessment?
@@ -172,6 +185,7 @@
Automated essay scoring (AES) to evaluate second language (L2) proficiency has been a firmly established technology used in educational contexts for decades. Although holistic scoring has seen advancements in AES that match or even exceed human performance, analytic scoring still encounters issues as it inherits flaws and shortcomings from the human scoring process. The recent introduction of large language models presents new opportunities for automating the evaluation of specific aspects of L2 writing proficiency. In this paper, we perform a series of experiments using GPT-4 in a zero-shot fashion on a publicly available dataset annotated with holistic scores based on the Common European Framework of Reference and aim to extract detailed information about their underlying analytic components. We observe significant correlations between the automatically predicted analytic scores and multiple features associated with the individual proficiency components.
2024.bea-1.14
banno-etal-2024-gpt
+
Using Program Repair as a Proxy for Language Models’ Feedback Ability in Programming Education
@@ -182,6 +196,7 @@
One of the key challenges in programming education is being able to provide high-quality feedback to learners. Such feedback often includes explanations of the issues in students’ programs coupled with suggestions on how to fix these issues. Large language models (LLMs) have recently emerged as valuable tools that can help in this effort. In this article, we explore the relationship between the program repair ability of LLMs and their proficiency in providing natural language explanations of coding mistakes. We outline a benchmarking study that evaluates leading LLMs (including open-source ones) on program repair and explanation tasks. Our experiments study the capabilities of LLMs both on a course level and on a programming concept level, allowing us to assess whether the programming concepts practised in exercises with faulty student programs relate to the performance of the models. Our results highlight that LLMs proficient in repairing student programs tend to provide more complete and accurate natural language explanations of code issues. Overall, these results enhance our understanding of the role and capabilities of LLMs in programming education. Using program repair as a proxy for explanation evaluation opens the door for cost-effective assessment methods.
2024.bea-1.15
koutcheme-etal-2024-using
+
Automated Evaluation of Teacher Encouragement of Student-to-Student Interactions in a Simulated Classroom Discussion
@@ -192,6 +207,7 @@
Leading students to engage in argumentation-focused discussions is a challenge for elementary school teachers, as doing so requires facilitating group discussions with student-to-student interaction. The Mystery Powder (MP) Task was designed to be used in online simulated classrooms to develop teachers’ skill in facilitating small group science discussions. In order to provide timely and scalable feedback to teachers facilitating a discussion in the simulated classroom, we employ a hybrid modeling approach that successfully combines fine-tuned large language models with features capturing important elements of the discourse dynamic to evaluate MP discussion transcripts. To our knowledge, this is the first application of a hybrid model to automate evaluation of teacher discourse.
2024.bea-1.16
ilagan-etal-2024-automated
+
Explainable AI in Language Learning: Linking Empirical Evidence and Theoretical Concepts in Proficiency and Readability Modeling of Portuguese
@@ -202,6 +218,7 @@
While machine learning methods have supported significantly improved results in education research, a common deficiency lies in the explainability of the result. Explainable AI (XAI) aims to fill that gap by providing transparent, conceptually understandable explanations for the classification decisions, enhancing human comprehension and trust in the outcomes. This paper explores an XAI approach to proficiency and readability assessment employing a comprehensive set of 465 linguistic complexity measures. We identify theoretical descriptions associating such measures with varying levels of proficiency and readability and validate them using cross-corpus experiments employing supervised machine learning and Shapley Additive Explanations. The results not only highlight the utility of a diverse set of complexity measures in effectively modeling proficiency and readability in Portuguese, achieving a state-of-the-art accuracy of 0.70 in the proficiency classification task and of 0.84 in the readability classification task, but they largely corroborate the theoretical research assumptions, especially in the lexical domain.
2024.bea-1.17
ribeiro-flucht-etal-2024-explainable
+
Fairness in Automated Essay Scoring: A Comparative Analysis of Algorithms on German Learner Essays from Secondary Education
@@ -214,6 +231,7 @@
Pursuing educational equity, particularly in writing instruction, requires that all students receive fair (i.e., accurate and unbiased) assessment and feedback on their texts. Automated Essay Scoring (AES) algorithms have so far focused on optimizing the mean accuracy of their scores and paid less attention to fair scores for all subgroups, although research shows that students receive unfair scores on their essays in relation to demographic variables, which in turn are related to their writing competence. We add to the literature arguing that AES should also optimize for fairness by presenting insights on the fairness of scoring algorithms on a corpus of learner texts in the German language and introduce the novelty of examining fairness on psychological and demographic differences in addition to demographic differences. We compare shallow learning, deep learning, and large language models with full and skewed subsets of training data to investigate what is needed for fair scoring. The results show that training on a skewed subset of higher and lower cognitive ability students shows no bias but very low accuracy for students outside the training set. Our results highlight the need for specific training data on all relevant user groups, not only for demographic background variables but also for cognitive abilities as psychological student characteristics.
2024.bea-1.18
schaller-etal-2024-fairness
+
Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank
@@ -226,6 +244,7 @@
Multiple-choice questions (MCQs) are commonly used across all levels of math education since they can be deployed and graded at a large scale. A critical component of MCQs is the distractors, i.e., incorrect answers crafted to reflect student errors or misconceptions. Automatically generating them in math MCQs, e.g., with large language models, has been challenging. In this work, we propose a novel method to enhance the quality of generated distractors through overgenerate-and-rank, training a ranking model to predict how likely distractors are to be selected by real students. Experimental results on a real-world dataset and human evaluation with math teachers show that our ranking model increases alignment with human-authored distractors, although human-authored ones are still preferred over generated ones.
2024.bea-1.19
scarlatos-etal-2024-improving
+
Identifying Fairness Issues in Automatically Generated Testing Content
@@ -241,6 +260,7 @@
stowe-2024-identifying
This revision corrects the count of samples for the dataset in two places, from 620 to 601.
+
Towards Automated Document Revision: Grammatical Error Correction, Fluency Edits, and Beyond
@@ -254,6 +274,7 @@
Natural language processing (NLP) technology has rapidly improved automated grammatical error correction (GEC) tasks, and the GEC community has begun to explore document-level revision. However, there are two major obstacles to going beyond automated sentence-level GEC to NLP-based document-level revision support: (1) there are few public corpora with document-level revisions annotated by professional editors, and (2) it is infeasible to obtain all possible references and evaluate revision quality using such references because there are infinite revision possibilities. To address these challenges, this paper proposes a new document revision corpus, Text Revision of ACL papers (TETRA), in which multiple professional editors have revised academic papers sampled from the ACL anthology. This corpus enables us to focus on document-level and paragraph-level edits, such as edits related to coherence and consistency. Additionally, as a case study using the TETRA corpus, we investigate reference-less and interpretable methods for meta-evaluation to detect quality improvements according to document revisions. We show the uniqueness of TETRA compared with existing document revision corpora and demonstrate that a fine-tuned pre-trained language model can discriminate the quality of documents after revision even when the difference is subtle.
2024.bea-1.21
mita-etal-2024-towards
+
Evaluating Vocabulary Usage in LLMs
@@ -263,6 +284,7 @@
The paper focuses on investigating vocabulary usage for AI and human-generated text. We define vocabulary usage in two ways: structural differences and keyword differences. Structural differences are evaluated by converting text into Vocabulary-Management Profiles (VMPs), initially used for discourse analysis. Through VMPs, we can treat the text data as a time series, allowing an evaluation by implementing Dynamic time-warping distance measures and subsequently deriving similarity scores to provide an indication of whether the structural dynamics in AI texts resemble human texts. To analyze keywords, we use a measure that emphasizes frequency and dispersion to source ‘key’ keywords. A qualitative approach is then applied, noting thematic differences between human and AI writing.
2024.bea-1.22
durward-thomson-2024-evaluating
+
Exploring LLM Prompting Strategies for Joint Essay Scoring and Feedback Generation
@@ -274,6 +296,7 @@
Individual feedback can help students improve their essay writing skills. However, the manual effort required to provide such feedback limits individualization in practice. Automatically-generated essay feedback may serve as an alternative to guide students at their own pace, convenience, and desired frequency. Large language models (LLMs) have demonstrated strong performance in generating coherent and contextually relevant text. Yet, their ability to provide helpful essay feedback is unclear. This work explores several prompting strategies for LLM-based zero-shot and few-shot generation of essay feedback. Inspired by Chain-of-Thought prompting, we study how and to what extent automated essay scoring (AES) can benefit the quality of generated feedback. We evaluate both the AES performance that LLMs can achieve with prompting only and the helpfulness of the generated essay feedback. Our results suggest that tackling AES and feedback generation jointly improves AES performance. However, while our manual evaluation emphasizes the quality of the generated essay feedback, the impact of essay scoring on the generated feedback remains low ultimately.
2024.bea-1.23
stahl-etal-2024-exploring
+
Towards Fine-Grained Pedagogical Control over English Grammar Complexity in Educational Text Generation
@@ -283,6 +306,7 @@
Teaching foreign languages and fostering language awareness in subject matter teaching requires a profound knowledge of grammar structures. Yet, while Large Language Models can act as tutors, it is unclear how effectively they can control grammar in generated text and adapt to learner needs. In this study, we investigate the ability of these models to exemplify pedagogically relevant grammar patterns, detect instances of grammar in a given text, and constrain text generation to grammar characteristic of a proficiency level. Concretely, we (1) evaluate the ability of GPT3.5 and GPT4 to generate example sentences for the standard English Grammar Profile CEFR taxonomy using few-shot in-context learning, (2) train BERT-based detectors with these generated examples of grammatical patterns, and (3) control the grammatical complexity of text generated by the open Mistral model by ranking sentence candidates with these detectors. We show that the grammar pattern instantiation quality is accurate but too homogeneous, and our classifiers successfully detect these patterns. A GPT-generated dataset of almost 1 million positive and negative examples for the English Grammar Profile is released with this work. With our method, Mistral’s output significantly increases the number of characteristic grammar constructions on the desired level, outperforming GPT4. This showcases how language domain knowledge can enhance Large Language Models for specific education needs, facilitating their effective use for intelligent tutor development and AI-generated materials. Code, models, and data are available at https://github.com/dominikglandorf/LLM-grammar.
2024.bea-1.24
glandorf-meurers-2024-towards
+
LLMs in Short Answer Scoring: Limitations and Promise of Zero-Shot and Few-Shot Approaches
@@ -293,6 +317,7 @@
In this work, we investigate the potential of Large Language Models (LLMs) for automated short answer scoring. We test zero-shot and few-shot settings, and compare with fine-tuned models and a supervised upper-bound, across three diverse datasets. Our results show that LLMs perform poorly in zero-shot and few-shot settings: LLMs have difficulty with tasks that require complex reasoning or domain-specific knowledge, while the models show promise on general knowledge tasks. The fine-tuned models come close to the supervised results but are still not feasible for application, highlighting potential overfitting issues. Overall, our study highlights the challenges and limitations of LLMs in short answer scoring and indicates that there currently seems to be no basis for applying LLMs for short answer scoring.
2024.bea-1.25
chamieh-etal-2024-llms
+
Automated Essay Scoring Using Grammatical Variety and Errors with Multi-Task Learning and Item Response Theory
@@ -303,6 +328,7 @@
This study examines the effect of grammatical features in automatic essay scoring (AES). We use two kinds of grammatical features as input to an AES model: (1) grammatical items that writers used correctly in essays, and (2) the number of grammatical errors. Experimental results show that grammatical features improve the performance of AES models that predict the holistic scores of essays. Multi-task learning with the holistic and grammar scores, alongside using grammatical features, resulted in a larger improvement in model performance. We also show that a model using grammar abilities estimated using Item Response Theory (IRT) as the labels for the auxiliary task achieved comparable performance to when we used grammar scores assigned by human raters. In addition, we weight the grammatical features using IRT to consider the difficulty of grammatical items and writers’ grammar abilities. We found that weighting grammatical features with the difficulty led to further improvement in performance.
2024.bea-1.26
doi-etal-2024-automated
+
Error Tracing in Programming: A Path to Personalised Feedback
@@ -313,6 +339,7 @@
Knowledge tracing, the process of estimating students’ mastery over concepts from their past performance and predicting future outcomes, often relies on binary pass/fail predictions. This hinders the provision of specific feedback by failing to diagnose precise errors. We present an error-tracing model for learning programming that advances traditional knowledge tracing by employing multi-label classification to forecast exact errors students may generate. Through experiments on a real student dataset, we validate our approach and compare it to two baseline knowledge-tracing methods. We demonstrate an improved ability to predict specific errors, for first attempts and for subsequent attempts at individual problems.
2024.bea-1.27
shaka-etal-2024-error
+
Improving Readability Assessment with Ordinal Log-Loss
@@ -322,6 +349,7 @@
Automatic Readability Assessment (ARA) predicts the level of difficulty of a text, e.g. at Grade 1 to Grade 12. ARA is an ordinal classification task since the predicted levels follow an underlying order, from easy to difficult. However, most neural ARA models ignore the distance between the gold level and predicted level, treating all levels as independent labels. This paper investigates whether distance-sensitive loss functions can improve ARA performance. We evaluate a variety of loss functions on neural ARA models, and show that ordinal log-loss can produce statistically significant improvement over the standard cross-entropy loss in terms of adjacent accuracy in a majority of our datasets.
2024.bea-1.28
lim-lee-2024-improving
+
Automated Sentence Generation for a Spaced Repetition Software
@@ -332,6 +360,7 @@
This paper presents and tests AllAI, an app that utilizes state-of-the-art NLP technology to assist second language acquisition through a novel method of sentence-based spaced repetition. Diverging from current single word or fixed sentence repetition, AllAI dynamically combines words due for repetition into sentences, enabling learning words in context while scheduling them independently. This research explores various suitable NLP paradigms and finds a few-shot prompting approach and retrieval of existing sentences from a corpus to yield the best correctness and scheduling accuracy. Subsequently, it evaluates these methods on 26 learners of Danish, finding a four-fold increase in the speed at which new words are learned, compared to conventional spaced repetition. Users of the retrieval method also reported significantly higher enjoyment, hinting at a higher user engagement.
2024.bea-1.29
paddags-etal-2024-automated
+
Using Large Language Models to Assess Young Students’ Writing Revisions
@@ -345,6 +374,7 @@
Although effective revision is the crucial component of writing instruction, few automated writing evaluation (AWE) systems specifically focus on the quality of the revisions students undertake. In this study we investigate the use of a large language model (GPT-4) with Chain-of-Thought (CoT) prompting for assessing the quality of young students’ essay revisions aligned with the automated feedback messages they received. Results indicate that GPT-4 has significant potential for evaluating revision quality, particularly when detailed rubrics are included that describe common revision patterns shown by young writers. However, the addition of CoT prompting did not significantly improve performance. Further examination of GPT-4’s scoring performance across various levels of student writing proficiency revealed variable agreement with human ratings. The implications for improving AWE systems focusing on young students are discussed.
2024.bea-1.30
li-etal-2024-using
+
Automatic Crossword Clues Extraction for Language Learning
@@ -357,6 +387,7 @@
Crosswords are a powerful tool that could be used in educational contexts, but they are not that easy to build. In this work, we present experiments on automatically extracting clues from simple texts that could be used to create crosswords, with the aim of using them in the context of teaching English at the beginner level. We present a series of heuristic patterns based on NLP tools for extracting clues, and use them to create a set of 2209 clues from a collection of 400 simple texts. Human annotators labeled the clues, and this dataset is used to evaluate the performance of our heuristics, and also to create a classifier that predicts if an extracted clue is correct. Our best classifier achieves an accuracy of 84%.
2024.bea-1.31
berruti-etal-2024-automatic
+
Anna Karenina Strikes Again: Pre-Trained LLM Embeddings May Favor High-Performing Learners
@@ -370,6 +401,7 @@
gurin-schleifer-etal-2024-anna
Corrected a typo.
+
Assessing Student Explanations with Large Language Models Using Fine-Tuning and Few-Shot Learning
@@ -383,6 +415,7 @@
The practice of soliciting self-explanations from students is widely recognized for its pedagogical benefits. However, the labor-intensive effort required to manually assess students’ explanations makes it impractical for classroom settings. As a result, many current solutions to gauge students’ understanding during class are often limited to multiple choice or fill-in-the-blank questions, which are less effective at exposing misconceptions or helping students to understand and integrate new concepts. Recent advances in large language models (LLMs) present an opportunity to assess student explanations in real-time, making explanation-based classroom response systems feasible for implementation. In this work, we investigate LLM-based approaches for assessing the correctness of students’ explanations in response to undergraduate computer science questions. We investigate alternative prompting approaches for multiple LLMs (i.e., Llama 2, GPT-3.5, and GPT-4) and compare their performance to FLAN-T5 models trained in a fine-tuning manner. The results suggest that the highest accuracy and weighted F1 score were achieved by fine-tuning FLAN-T5, while an in-context learning approach with GPT-4 attains the highest macro F1 score.
2024.bea-1.33
carpenter-etal-2024-assessing
+
Harnessing GPT to Study Second Language Learner Essays: Can We Use Perplexity to Determine Linguistic Competence?
@@ -393,6 +426,7 @@
Generative language models have been used to study a wide variety of phenomena in NLP. This allows us to better understand the linguistic capabilities of those models and to better analyse the texts that we are working with. However, these studies have mainly focused on text generated by L1 speakers of English. In this paper we study whether linguistic competence of L2 learners of Swedish (through their performance on essay tasks) correlates with the perplexity of a decoder-only model (GPT-SW3). We run two sets of experiments, doing both quantitative and qualitative analyses for each of them. In the first one, we analyse the perplexities of the essays and compare them with the CEFR level of the essays, both from an essay-wide level and from a token level. In our second experiment, we compare the perplexity of an L2 learner essay with a normalised version of it. We find that the perplexity of essays tends to be lower for higher CEFR levels and that normalised essays have a lower perplexity than the original versions. Moreover, we find that different factors can lead to spikes in perplexity, not all of them being related to L2 learner language.
2024.bea-1.34
munoz-sanchez-etal-2024-harnessing
+
BERT-IRT: Accelerating Item Piloting with BERT Embeddings and Explainable IRT Models
@@ -404,6 +438,7 @@
Estimating item parameters (e.g., the difficulty of a question) is an important part of modern high-stakes tests. Conventional methods require lengthy pilots to collect response data from a representative population of test-takers. The need for these pilots limits item bank size and how often those item banks can be refreshed, impacting test security, while increasing costs needed to support the test and taking up the test-taker’s valuable time. Our paper presents a novel explanatory item response theory (IRT) model, BERT-IRT, that has been used on the Duolingo English Test (DET), a high-stakes test of English, to reduce the length of pilots by a factor of 10. Our evaluation shows how the model uses BERT embeddings and engineered NLP features to accelerate item piloting without sacrificing criterion validity or reliability.
2024.bea-1.35
yancey-etal-2024-bert
+
Transfer Learning of Argument Mining in Student Essays
@@ -416,6 +451,7 @@
This paper explores the transferability of a cross-prompt argument mining model trained on argumentative essays authored by native English-speaking learners (EN-L1) across educational contexts and languages. Specifically, the adaptability of a multilingual transformer model is assessed through its application to comparable argumentative essays authored by English-as-a-foreign-language learners (EN-L2) for context transfer, and a dataset composed of essays written by native German learners (DE) for both language and task transfer. To separate language effects from educational context effects, we also perform experiments on a machine-translated version of the German dataset (DE-MT). Our findings demonstrate that, even under zero-shot conditions, a model trained on native English speakers exhibits satisfactory performance on the EN-L2/DE datasets. Machine translation does not substantially enhance this performance, suggesting that distinct writing styles across educational contexts impact performance more than language differences.
2024.bea-1.36
ding-etal-2024-transfer
+
Building Robust Content Scoring Models for Student Explanations of Social Justice Science Issues
@@ -427,6 +463,7 @@
With increased attention to connecting science topics to real-world contexts, like issues of social justice, teachers need support to assess student progress in explaining such issues. In this work, we explore the robustness of NLP-based automatic content scoring models that provide insight into student ability to integrate their science and social justice ideas in two different environmental science contexts. We leverage encoder-only transformer models to capture the degree to which students explain a science phenomenon, understand the intersecting justice issues, and integrate their understanding of science and social justice. We developed models training on data from each of the contexts as well as from a combined dataset. We found that the models developed in one context generate educationally useful scores in the other context. The model trained on the combined dataset performed as well as or better than the models trained on separate datasets in most cases. Quadratic weighted kappas demonstrate that these models are above threshold for use in classrooms.
2024.bea-1.37
bradford-etal-2024-building
+
From Miscue to Evidence of Difficulty: Analysis of Automatically Detected Miscues in Oral Reading for Feedback Potential
@@ -438,6 +475,7 @@
This research is situated in the space between an existing NLP capability and its use(s) in an educational context. We analyze oral reading data collected with a deployed automated speech analysis software and consider how the results of automated speech analysis can be interpreted and used to inform the ideation and design of a new feature – feedback to learners and teachers. Our analysis shows how the details of the system’s performance and the details of the context of use both significantly impact the ideation process.
2024.bea-1.38
beigman-klebanov-etal-2024-miscue
+
Findings from the First Shared Task on Automated Prediction of Difficulty and Response Time for Multiple-Choice Questions
@@ -465,6 +503,7 @@
This paper describes a contribution to the BEA 2024 Shared Task on Automated Prediction of Item Difficulty and Response Time. The participants in this shared task are to develop models for predicting the difficulty and response time of multiple-choice items in the medical field. These items were taken from the United States Medical Licensing Examination® (USMLE®), a high-stakes medical exam. For this purpose, we evaluated multiple BERT-like pre-trained transformer encoder models, which we combined with Scalar Mixing and two custom 2-layer classification heads using learnable Rational Activations as an activation function, each for predicting one of the two variables of interest in a multi-task setup. Our best models placed first out of 43 for predicting item difficulty and fifth out of 34 for predicting Item Response Time.
2024.bea-1.40
gombert-etal-2024-predicting
+
UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions
@@ -474,6 +513,7 @@
This work explores a novel data augmentation method based on Large Language Models (LLMs) for predicting item difficulty and response time of retired USMLE Multiple-Choice Questions (MCQs) in the BEA 2024 Shared Task. Our approach is based on augmenting the dataset with answers from zero-shot LLMs (Falcon, Meditron, Mistral) and employing transformer-based models based on six alternative feature combinations. The results suggest that predicting the difficulty of questions is more challenging. Notably, our top performing methods consistently include the question text, and benefit from the variability of LLM answers, highlighting the potential of LLMs for improving automated assessment in medical licensing exams. We make our code available at: https://github.com/ana-rogoz/BEA-2024.
2024.bea-1.41
rogoz-ionescu-2024-unibucllm
+
The British Council submission to the BEA 2024 shared task
@@ -483,6 +523,7 @@
This paper describes our submission to the item difficulty prediction track of the BEA 2024 shared task. Our submission included the output of three systems: 1) a feature-based linear regression model, 2) a RoBERTa-based model and 3) a linear regression ensemble built on the predictions of the two previous models. Our systems ranked 7th, 8th and 5th respectively, demonstrating that simple models can achieve optimal results. A closer look at the results shows that predictions are more accurate for items in the middle of the difficulty range, with no other obvious relationships between difficulty and the accuracy of predictions.
2024.bea-1.42
felice-duran-karaoz-2024-british
+
ITEC at BEA 2024 Shared Task: Predicting Difficulty and Response Time of Medical Exam Questions with Statistical, Machine Learning, and Language Models
@@ -499,6 +540,7 @@
This paper presents the results of our participation in the BEA 2024 shared task on the automated prediction of item difficulty and item response time (APIDIRT), hosted by the NBME (National Board of Medical Examiners). During this task, practice multiple-choice questions from the United States Medical Licensing Examination® (USMLE®) were shared, and research teams were tasked with devising systems capable of predicting the difficulty and average response time for new exam questions. Our team, part of the interdisciplinary itec research group, participated in the task. We extracted linguistic features and clinical embeddings from question items and tested various modeling techniques, including statistical regression, machine learning, language models, and ensemble methods. Surprisingly, simpler models such as Lasso and random forest regression, utilizing principal component features from linguistic and clinical embeddings, outperformed more complex models. In the competition, our random forest model ranked 4th out of 43 submissions for difficulty prediction, while the Lasso model secured the 2nd position out of 34 submissions for response time prediction. Further analysis suggests that had we submitted the Lasso model for difficulty prediction, we would have achieved an even higher ranking. We also observed that predicting response time is easier than predicting difficulty, with features such as item length, type, exam step, and analytical thinking influencing response time prediction more significantly.
2024.bea-1.43
tack-etal-2024-itec
+
Item Difficulty and Response Time Prediction with Large Language Models: An Empirical Analysis of USMLE Items
@@ -509,6 +551,7 @@
This paper summarizes our methodology and results for the BEA 2024 Shared Task. This competition focused on predicting item difficulty and response time for retired multiple-choice items from the United States Medical Licensing Examination® (USMLE®). We extracted linguistic features from the item stem and response options using multiple methods, including the BiomedBERT model, FastText embeddings, and Coh-Metrix. The extracted features were combined with additional features available in item metadata (e.g., item type) to predict item difficulty and average response time. The results showed that the BiomedBERT model was the most effective in predicting item difficulty, while the fine-tuned model based on FastText word embeddings was the best model for predicting response time.
2024.bea-1.44
bulut-etal-2024-item
+
Utilizing Machine Learning to Predict Question Difficulty and Response Time for Enhanced Test Construction
@@ -518,6 +561,7 @@
In this paper, we present the details of our contribution to the BEA Shared Task on Automated Prediction of Item Difficulty and Response Time. Participants in this collaborative effort are tasked with developing models to predict the difficulty and response time of multiple-choice items within the medical domain. These items are sourced from the United States Medical Licensing Examination® (USMLE®), a significant medical assessment. In order to achieve this, we experimented with two featurization techniques, one using linguistic features and the other using embeddings generated by BERT fine-tuned over the MS-MARCO dataset. Further, we tried several different machine learning models such as Linear Regression, Decision Trees, KNN and boosting models such as XGBoost and GBDT. We found that, out of all the models we experimented with, a Random Forest Regressor trained on linguistic features gave the least root mean squared error.
2024.bea-1.45
fulari-rusert-2024-utilizing
+
Leveraging Physical and Semantic Features of text item for Difficulty and Response Time Prediction of USMLE Questions
@@ -528,6 +572,7 @@
This paper presents our system developed for the Shared Task on Automated Prediction of Item Difficulty and Item Response Time for USMLE questions, organized by the Association for Computational Linguistics (ACL) Special Interest Group for Building Educational Applications (BEA SIGEDU). The Shared Task, held as a workshop at the North American Chapter of the Association for Computational Linguistics (NAACL) 2024 conference, aimed to advance the state-of-the-art in predicting item characteristics directly from item text, with implications for the fairness and validity of standardized exams. We compared various methods ranging from BERT for regression to Random Forest, Gradient Boosting (GB), Linear Regression, Support Vector Regressor (SVR), k-nearest neighbours (KNN) Regressor, and MultiLayer Perceptron (MLP), to a custom ANN using BioBERT and Word2Vec embeddings, and provided inferences on which performed better. This paper also explains the importance of data augmentation to balance the data in order to get better results. We also proposed five hypotheses regarding factors impacting difficulty and response time for a question and verified them, thereby helping researchers to derive meaningful numerical attributes for accurate prediction. We achieved an RMSE score of 0.315 for difficulty prediction and 26.945 for response time.
2024.bea-1.46
venkata-ravi-ram-etal-2024-leveraging
+
UPN-ICC at BEA 2024 Shared Task: Leveraging LLMs for Multiple-Choice Questions Difficulty Prediction
@@ -559,6 +604,7 @@
This work presents a novel framework for the automated prediction of item difficulty and response time within educational assessments. Utilizing data from the BEA 2024 Shared Task, we integrate Named Entity Recognition, Semantic Role Labeling, and linguistic features to prompt a Large Language Model (LLM). Our best approach achieves an RMSE of 0.308 for item difficulty and 27.474 for response time prediction, improving on the provided baseline. The framework’s adaptability is demonstrated on audio recordings of 3rd-8th graders from the Atlanta, Georgia area responding to the Test of Narrative Language. These results highlight the framework’s potential to enhance test development efficiency.
2024.bea-1.49
veeramani-etal-2024-large
+
UNED team at BEA 2024 Shared Task: Testing different Input Formats for predicting Item Difficulty and Response Time in Medical Exams
@@ -569,6 +615,7 @@
This paper presents the description and primary outcomes of our team’s participation in the BEA 2024 shared task. Our primary exploration involved employing transformer-based systems, particularly BERT models, due to their suitability for Natural Language Processing tasks and efficiency with computational resources. We experimented with various input formats, including concatenating all text elements and incorporating only the clinical case. Surprisingly, our results revealed different impacts on predicting difficulty versus response time, with the former favoring clinical text only and the latter benefiting from including the correct answer. Despite moderate performance in difficulty prediction, our models excelled in response time prediction, ranking highest among all participants. This study lays the groundwork for future investigations into more complex approaches and configurations, aiming to advance the automatic prediction of exam difficulty and response time.
2024.bea-1.50
rodrigo-etal-2024-uned
+
The BEA 2024 Shared Task on the Multilingual Lexical Simplification Pipeline
@@ -598,6 +645,7 @@
We report the findings of the 2024 Multilingual Lexical Simplification Pipeline shared task. We released a new dataset comprising 5,927 instances of lexical complexity prediction and lexical simplification on common contexts across 10 languages, split into trial (300) and test (5,627). 10 teams participated across 2 tracks and 10 languages with 233 runs evaluated across all systems. Five teams participated in all languages for the lexical complexity prediction task and 4 teams participated in all languages for the lexical simplification task. Teams employed a range of strategies, making use of open and closed source large language models for lexical simplification, as well as feature-based approaches for lexical complexity prediction. The highest scoring team on the combined multilingual data was able to obtain a Pearson’s correlation of 0.6241 and an ACC@1@Top1 of 0.3772, both demonstrating that there is still room for improvement on two difficult sub-tasks of the lexical simplification pipeline.
2024.bea-1.51
shardlow-etal-2024-bea
+
TMU-HIT at MLSP 2024: How Well Can GPT-4 Tackle Multilingual Lexical Simplification?
@@ -612,6 +660,7 @@
Lexical simplification (LS) is a process of replacing complex words with simpler alternatives to help readers understand sentences seamlessly. This process is divided into two primary subtasks: assessing word complexities and replacing high-complexity words with simpler alternatives. Employing task-specific supervised data to train models is a prevalent strategy for addressing these subtasks. However, such an approach cannot be employed for low-resource languages. Therefore, this paper introduces a multilingual LS pipeline system that does not rely on supervised data. Specifically, we have developed systems based on GPT-4 for each subtask. Our systems demonstrated top-class performance on both tasks in many languages. The results indicate that GPT-4 can effectively assess lexical complexity and simplify complex words in a multilingual context with high quality.
2024.bea-1.52
enomoto-etal-2024-tmu
+
ANU at MLSP-2024: Prompt-based Lexical Simplification for English and Sinhala
@@ -621,6 +670,7 @@
Lexical simplification, the process of simplifying complex content in text without any modifications to the syntactical structure of text, plays a crucial role in enhancing comprehension and accessibility. This paper presents an approach to lexical simplification that relies on the capabilities of generative Artificial Intelligence (AI) models to predict the complexity of words and substitute complex words with simpler alternatives. Early lexical simplification methods predominantly relied on rule-based approaches, transitioning gradually to machine learning and deep learning techniques, leveraging contextual embeddings from large language models. However, the emergence of generative AI models revolutionized the landscape of natural language processing, including lexical simplification. In this study, we proposed a straightforward yet effective method that employs generative AI models for both predicting lexical complexity and generating appropriate substitutions. To predict lexical complexity, we adopted three distinct types of prompt templates, while for lexical substitution, we employed three prompt templates alongside an ensemble approach. Extending our experimentation to include both English and Sinhala data, our approach demonstrated comparable performance across both languages, with particular strengths in lexical substitution.
2024.bea-1.53
seneviratne-suominen-2024-anu
+
ISEP_Presidency_University at MLSP 2024 Shared Task: Using GPT-3.5 to Generate Substitutes for Lexical Simplification
@@ -631,6 +681,7 @@
Lexical substitute generation is a task where we generate substitutes for a given word to fit in the required context. It is one of the main steps for automatic lexical simplification. In this paper, we introduce an automatic lexical simplification system using the GPT-3 large language model. The system generates simplified candidate substitutions for complex words to aid readability and comprehension for the reader. The paper describes the system that we submitted for the Multilingual Lexical Simplification Pipeline Shared Task at the 2024 BEA Workshop. During the shared task, we experimented with Catalan, English, French, Italian, Portuguese, and German for the Lexical Simplification Shared Task. We achieved the best results in Catalan and Portuguese, and were runners-up in English, French, and Italian. To further research in this domain, we also release our code upon acceptance of the paper.
2024.bea-1.54
dutilleul-etal-2024-isep
+
Archaeology at MLSP 2024: Machine Translation for Lexical Complexity Prediction and Lexical Simplification
@@ -640,6 +691,7 @@
We present the submissions of team Archaeology for the Lexical Simplification and Lexical Complexity Prediction Shared Tasks at BEA 2024. Our approach consists of creating two pipelines for generating lexical substitutions and estimating complexity: one using machine translation of the texts into English and one using the original language. For the LCP subtask, our xgb regressor is trained with engineered features (based primarily on English language resources) and shallow word-structure features. For the LS subtask, we use a locally executed quantized LLM to generate candidates and sort them by a complexity score computed using the pipeline designed for LCP. These pipelines provide distinct perspectives on the lexical simplification process, offering insights into the efficacy and limitations of employing machine translation versus processing the original-language data directly.
2024.bea-1.55
cristea-nisioi-2024-machine
+
RETUYT-INCO at MLSP 2024: Experiments on Language Simplification using Embeddings, Classifiers and Large Language Models
@@ -656,6 +708,7 @@
In this paper, we present the participation of the RETUYT-INCO team in the BEA-MLSP 2024 shared task. We followed different approaches, from multilayer perceptron models with word embeddings to large language models fine-tuned on different datasets: already existing, crowd-annotated, and synthetic. Our best models are based on fine-tuning Mistral-7B, either with a manually annotated dataset or with synthetic data.
2024.bea-1.56
sastre-etal-2024-retuyt
+
GMU at MLSP 2024: Multilingual Lexical Simplification with Transformer Models
@@ -666,6 +719,7 @@
This paper presents GMU’s submission to the Multilingual Lexical Simplification Pipeline (MLSP) shared task at the BEA 2024 workshop. The task includes Lexical Complexity Prediction (LCP) and Lexical Simplification (LS) sub-tasks across 10 languages. Our submissions achieved rankings ranging from 1st to 5th in LCP and from 1st to 3rd in LS. Our best-performing approach for LCP is a weighted ensemble, based on Pearson correlation, of language-specific transformer models trained on all languages combined. For LS, GPT-4-turbo zero-shot prompting achieved the best performance.
2024.bea-1.57
goswami-etal-2024-gmu
+
ITEC at MLSP 2024: Transferring Predictions of Lexical Difficulty from Non-Native Readers
@@ -674,6 +728,7 @@
This paper presents the results of our team’s participation in the BEA 2024 shared task on the multilingual lexical simplification pipeline (MLSP; Shardlow et al., 2024). During the task, organizers supplied data that combined two components of the simplification pipeline: lexical complexity prediction and lexical substitution. This dataset encompassed ten languages, including French. Given the absence of dedicated training data, teams were challenged with employing systems trained on pre-existing resources and evaluating their performance on unexplored test data. Our team contributed to the task using previously developed models for predicting lexical difficulty in French (Tack, 2021). These models were built on deep learning architectures and extend our participation in the CWI 2018 shared task (De Hertog and Tack, 2018). The training dataset comprised 262,054 binary decision annotations, capturing perceived lexical difficulty, collected from a sample of 56 non-native French readers. Two pre-trained neural logistic models were used: (1) a model for predicting difficulty for words within their sentence context, and (2) a model for predicting difficulty for isolated words. The findings revealed that, although the models were trained for a distinct prediction task (as indicated by a negative R2 fit), transferring their logistic predictions of lexical difficulty to continuous scores of lexical complexity yielded a positive correlation. Specifically, isolated predictions exhibited a higher correlation (r = .36) than contextualized predictions (r = .33). Moreover, isolated predictions demonstrated a markedly higher Spearman rank correlation (ρ = .50) than contextualized predictions (ρ = .35). These results align with earlier observations by Tack (2021), suggesting that the ground truth primarily captures lexical access difficulties rather than word-to-context integration problems.
2024.bea-1.58
tack-2024-itec
+
diff --git a/data/xml/2024.cl.xml b/data/xml/2024.cl.xml
index ac3f64e17c..2a0c153902 100644
--- a/data/xml/2024.cl.xml
+++ b/data/xml/2024.cl.xml
@@ -102,6 +102,7 @@
237–291
2024.cl-1.8
ziems-etal-2024-large
+
Language Model Behavior: A Comprehensive Survey
diff --git a/data/xml/2024.clinicalnlp.xml b/data/xml/2024.clinicalnlp.xml
index df661a3a13..fc068bcce3 100644
--- a/data/xml/2024.clinicalnlp.xml
+++ b/data/xml/2024.clinicalnlp.xml
@@ -29,6 +29,7 @@
2024.clinicalnlp-1.1
chen-hirschberg-2024-exploring
10.18653/v1/2024.clinicalnlp-1.1
+
Efficient Medical Question Answering with Knowledge-Augmented Question Generation
@@ -110,6 +111,7 @@
2024.clinicalnlp-1.8
burdisso-etal-2024-daic
10.18653/v1/2024.clinicalnlp-1.8
+
Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain
@@ -134,6 +136,7 @@
2024.clinicalnlp-1.10
sanchez-carmona-etal-2024-multilevel
10.18653/v1/2024.clinicalnlp-1.10
+
A Privacy-Preserving Corpus for Occupational Health in Spanish: Evaluation for NER and Classification Tasks
@@ -150,6 +153,7 @@
2024.clinicalnlp-1.11
aracena-etal-2024-privacy
10.18653/v1/2024.clinicalnlp-1.11
+
DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents
@@ -162,6 +166,7 @@
2024.clinicalnlp-1.12
nair-etal-2024-dera
10.18653/v1/2024.clinicalnlp-1.12
+
LlamaMTS: Optimizing Metastasis Detection with Llama Instruction Tuning and BERT-Based Ensemble in Italian Clinical Reports
@@ -190,6 +195,7 @@
2024.clinicalnlp-1.14
boulanger-etal-2024-using
10.18653/v1/2024.clinicalnlp-1.14
+
Large Language Models Provide Human-Level Medical Text Snippet Labeling
@@ -229,6 +235,7 @@
2024.clinicalnlp-1.17
mustafa-etal-2024-leveraging
10.18653/v1/2024.clinicalnlp-1.17
+
Revisiting Clinical Outcome Prediction for MIMIC-IV
@@ -244,6 +251,7 @@
2024.clinicalnlp-1.18
rohr-etal-2024-revisiting
10.18653/v1/2024.clinicalnlp-1.18
+
Can LLMs Correct Physicians, Yet? Investigating Effective Interaction Methods in the Medical Domain
@@ -256,6 +264,7 @@
2024.clinicalnlp-1.19
sayin-etal-2024-llms
10.18653/v1/2024.clinicalnlp-1.19
+
Leveraging pre-trained large language models for aphasia detection in English and Chinese speakers
@@ -267,6 +276,7 @@
2024.clinicalnlp-1.20
cong-etal-2024-leveraging
10.18653/v1/2024.clinicalnlp-1.20
+
Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering
@@ -281,6 +291,7 @@
2024.clinicalnlp-1.21
ha-etal-2024-fusion
10.18653/v1/2024.clinicalnlp-1.21
+
LLM-Based Section Identifiers Excel on Open Source but Stumble in Real World Applications
@@ -305,6 +316,7 @@
2024.clinicalnlp-1.23
cai-etal-2024-adapting
10.18653/v1/2024.clinicalnlp-1.23
+
SERPENT-VLM : Self-Refining Radiology Report Generation Using Vision Language Models
@@ -319,6 +331,7 @@
2024.clinicalnlp-1.24
kapadnis-etal-2024-serpent
10.18653/v1/2024.clinicalnlp-1.24
+
ERD: A Framework for Improving LLM Reasoning for Cognitive Distortion Classification
@@ -351,6 +364,7 @@
Acknowledgments update.
10.18653/v1/2024.clinicalnlp-1.26
+
Context Aggregation with Topic-focused Summarization for Personalized Medical Dialogue Generation
@@ -374,6 +388,7 @@
2024.clinicalnlp-1.28
milintsevich-etal-2024-evaluating
10.18653/v1/2024.clinicalnlp-1.28
+
Semi-automatic Construction of a Word Complexity Lexicon for Japanese Medical Terminology
@@ -453,6 +468,7 @@
2024.clinicalnlp-1.35
gundabathula-kolar-2024-promptmind-team
10.18653/v1/2024.clinicalnlp-1.35
+
Maven at MEDIQA-CORR 2024: Leveraging RAG and Medical LLM for Error Detection and Correction in Medical Notes
@@ -478,6 +494,7 @@
2024.clinicalnlp-1.37
haddadan-etal-2024-lailab
10.18653/v1/2024.clinicalnlp-1.37
+
Lexicans at Chemotimelines 2024: Chemotimeline Chronicles - Leveraging Large Language Models (LLMs) for Temporal Relations Extraction in Oncological Electronic Health Records
@@ -584,6 +601,7 @@
2024.clinicalnlp-1.46
pajaro-etal-2024-verbanexai
10.18653/v1/2024.clinicalnlp-1.46
+
HSE NLP Team at MEDIQA-CORR 2024 Task: In-Prompt Ensemble with Entities and Knowledge Graph for Medical Error Correction
@@ -675,6 +693,7 @@
2024.clinicalnlp-1.53
yao-etal-2024-overview
10.18653/v1/2024.clinicalnlp-1.53
+
IryoNLP at MEDIQA-CORR 2024: Tackling the Medical Error Detection & Correction Task on the Shoulders of Medical Agents
@@ -734,6 +753,7 @@
2024.clinicalnlp-1.58
zhao-rios-2024-utsa
10.18653/v1/2024.clinicalnlp-1.58
+
WangLab at MEDIQA-CORR 2024: Optimized LLM-based Programs for Medical Error Detection and Correction
@@ -760,6 +780,7 @@
2024.clinicalnlp-1.60
toma-etal-2024-wanglab-mediqa
10.18653/v1/2024.clinicalnlp-1.60
+
LG AI Research & KAIST at EHRSQL 2024: Self-Training Large Language Models with Pseudo-Labeled Unanswerable Questions for a Reliable Text-to-SQL System on EHRs
@@ -785,6 +806,7 @@
2024.clinicalnlp-1.62
lee-etal-2024-overview
10.18653/v1/2024.clinicalnlp-1.62
+
Saama Technologies at EHRSQL 2024: SQL Generation through Classification Answer Selector by LLM
@@ -796,6 +818,7 @@
2024.clinicalnlp-1.63
jabir-etal-2024-saama
10.18653/v1/2024.clinicalnlp-1.63
+
KU-DMIS at EHRSQL 2024 : Generating SQL query via question templatization in EHR
@@ -825,6 +848,7 @@
2024.clinicalnlp-1.65
kim-etal-2024-probgate
10.18653/v1/2024.clinicalnlp-1.65
+
LTRC-IIITH at EHRSQL 2024: Enhancing Reliability of Text-to-SQL Systems through Abstention and Confidence Thresholding
diff --git a/data/xml/2024.findings.xml b/data/xml/2024.findings.xml
index e0770479b3..6eb0828b1d 100644
--- a/data/xml/2024.findings.xml
+++ b/data/xml/2024.findings.xml
@@ -2009,6 +2009,7 @@
2024.findings-naacl.1
zhang-etal-2024-structured
10.18653/v1/2024.findings-naacl.1
+
Weight-Inherited Distillation for Task-Agnostic BERT Compression
@@ -2024,6 +2025,7 @@
2024.findings-naacl.2
wu-etal-2024-weight
10.18653/v1/2024.findings-naacl.2
+
Ignore Me But Don’t Replace Me: Utilizing Non-Linguistic Elements for Pretraining on the Cybersecurity Domain
@@ -2039,6 +2041,7 @@
2024.findings-naacl.3
jang-etal-2024-ignore
10.18653/v1/2024.findings-naacl.3
+
Extremely efficient online query encoding for dense retrieval
@@ -2050,6 +2053,7 @@
2024.findings-naacl.4
cohen-etal-2024-extremely
10.18653/v1/2024.findings-naacl.4
+
DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text
@@ -2066,6 +2070,7 @@
2024.findings-naacl.5
zhao-etal-2024-divknowqa
10.18653/v1/2024.findings-naacl.5
+
SpeedE: Euclidean Geometric Knowledge Graph Embedding Strikes Back
@@ -2076,6 +2081,7 @@
2024.findings-naacl.6
pavlovic-sallinger-2024-speede
10.18653/v1/2024.findings-naacl.6
+
Language Guided Exploration for RL Agents in Text Environments
@@ -2088,6 +2094,7 @@
2024.findings-naacl.7
golchha-etal-2024-language
10.18653/v1/2024.findings-naacl.7
+
GPT-who: An Information Density-based Machine-Generated Text Detector
@@ -2099,6 +2106,7 @@
2024.findings-naacl.8
venkatraman-etal-2024-gpt
10.18653/v1/2024.findings-naacl.8
+
DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models
@@ -2154,6 +2162,7 @@
2024.findings-naacl.12
li-etal-2024-self
10.18653/v1/2024.findings-naacl.12
+
Low-resource neural machine translation with morphological modeling
@@ -2203,6 +2212,7 @@
2024.findings-naacl.16
wang-etal-2024-leti
10.18653/v1/2024.findings-naacl.16
+
Bilateral Masking with prompt for Knowledge Graph Completion
@@ -2217,6 +2227,7 @@
2024.findings-naacl.17
kong-etal-2024-bilateral
10.18653/v1/2024.findings-naacl.17
+
MiLe Loss: a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models
@@ -2233,6 +2244,7 @@
2024.findings-naacl.18
su-etal-2024-mile
10.18653/v1/2024.findings-naacl.18
+
GOLD: Geometry Problem Solver with Natural Language Description
@@ -2254,6 +2266,7 @@
2024.findings-naacl.20
codrut-etal-2024-rodia
10.18653/v1/2024.findings-naacl.20
+
Examining Modularity in Multilingual LMs via Language-Specialized Subnetworks
@@ -2289,6 +2302,7 @@
2024.findings-naacl.23
jiqunchu-lin-2024-incorporating
10.18653/v1/2024.findings-naacl.23
+
OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models
@@ -2300,6 +2314,7 @@
2024.findings-naacl.24
kuang-etal-2024-openfmnav
10.18653/v1/2024.findings-naacl.24
+
Comparing Two Model Designs for Clinical Note Generation; Is an LLM a Useful Evaluator of Consistency?
@@ -2310,6 +2325,7 @@
2024.findings-naacl.25
brake-schaaf-2024-comparing
10.18653/v1/2024.findings-naacl.25
+
VOLTA: Improving Generative Diversity by Variational Mutual Information Maximizing Autoencoder
@@ -2324,6 +2340,7 @@
2024.findings-naacl.26
ma-etal-2024-volta
10.18653/v1/2024.findings-naacl.26
+
EcoSpeak: Cost-Efficient Bias Mitigation for Partially Cross-Lingual Speaker Verification
@@ -2333,6 +2350,7 @@
2024.findings-naacl.27
sharma-2024-ecospeak
10.18653/v1/2024.findings-naacl.27
+
Leveraging Contextual Information for Effective Entity Salience Detection
@@ -2349,6 +2367,7 @@
2024.findings-naacl.28
bhowmik-etal-2024-leveraging
10.18653/v1/2024.findings-naacl.28
+
LLM-as-a-Coauthor: Can Mixed Human-Written and Machine-Generated Text Be Detected?
@@ -2381,6 +2400,7 @@
2024.findings-naacl.30
verhoeven-etal-2024-realistic
10.18653/v1/2024.findings-naacl.30
+
Citation: A Key to Building Responsible and Accountable Large Language Models
@@ -2404,6 +2424,7 @@
2024.findings-naacl.32
zhang-etal-2024-graph
10.18653/v1/2024.findings-naacl.32
+
Narrowing the Gap between Zero- and Few-shot Machine Translation by Matching Styles
@@ -2420,6 +2441,7 @@
2024.findings-naacl.33
tan-etal-2024-narrowing
10.18653/v1/2024.findings-naacl.33
+
Which Modality should I use - Text, Motif, or Image? : Understanding Graphs with Large Language Models
@@ -2432,6 +2454,7 @@
2024.findings-naacl.34
das-etal-2024-modality
10.18653/v1/2024.findings-naacl.34
+
On-the-Fly Fusion of Large Language Models and Machine Translation
@@ -2443,6 +2466,7 @@
2024.findings-naacl.35
hoang-etal-2024-fly
10.18653/v1/2024.findings-naacl.35
+
READ: Improving Relation Extraction from an ADversarial Perspective
@@ -2454,6 +2478,7 @@
2024.findings-naacl.36
li-etal-2024-read
10.18653/v1/2024.findings-naacl.36
+
REQUAL-LM: Reliability and Equity through Aggregation in Large Language Models
@@ -2465,6 +2490,7 @@
2024.findings-naacl.37
ebrahimi-etal-2024-requal
10.18653/v1/2024.findings-naacl.37
+
Addressing Both Statistical and Causal Gender Fairness in NLP Models
@@ -2476,6 +2502,7 @@
2024.findings-naacl.38
chen-etal-2024-addressing
10.18653/v1/2024.findings-naacl.38
+
LLM-Rec: Personalized Recommendation via Prompting Large Language Models
@@ -2494,6 +2521,7 @@
2024.findings-naacl.39
lyu-etal-2024-llm
10.18653/v1/2024.findings-naacl.39
+
A Robust Semantics-based Watermark for Large Language Model against Paraphrasing
@@ -2509,6 +2537,7 @@
2024.findings-naacl.40
ren-etal-2024-robust
10.18653/v1/2024.findings-naacl.40
+
Solving Data-centric Tasks using Large Language Models
@@ -2544,6 +2573,7 @@
2024.findings-naacl.42
guo-etal-2024-novel
10.18653/v1/2024.findings-naacl.42
+
Measuring Social Norms of Large Language Models
@@ -2557,6 +2587,7 @@
2024.findings-naacl.43
yuan-etal-2024-measuring
10.18653/v1/2024.findings-naacl.43
+
Source-Free Unsupervised Domain Adaptation for Question Answering via Prompt-Assisted Self-learning
@@ -2568,6 +2599,7 @@
2024.findings-naacl.44
yin-etal-2024-source
10.18653/v1/2024.findings-naacl.44
+
Hierarchical Attention Graph for Scientific Document Summarization in Global and Local Level
@@ -2590,6 +2622,7 @@
2024.findings-naacl.46
kumar-dusek-2024-leeets
10.18653/v1/2024.findings-naacl.46
+
Efficient Dependency Tree Sampling Without Replacement
@@ -2599,6 +2632,7 @@
2024.findings-naacl.47
dobre-2024-efficient
10.18653/v1/2024.findings-naacl.47
+
Towards Better Generalization in Open-Domain Question Answering by Mitigating Context Memorization
@@ -2612,6 +2646,7 @@
2024.findings-naacl.48
zhang-etal-2024-towards
10.18653/v1/2024.findings-naacl.48
+
GEE! Grammar Error Explanation with Large Language Models
@@ -2625,6 +2660,7 @@
2024.findings-naacl.49
song-etal-2024-gee
10.18653/v1/2024.findings-naacl.49
+
AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback
@@ -2649,6 +2685,7 @@
2024.findings-naacl.51
zeng-etal-2024-divtod
10.18653/v1/2024.findings-naacl.51
+
Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training
@@ -2690,6 +2727,7 @@
2024.findings-naacl.54
sharma-etal-2024-r
10.18653/v1/2024.findings-naacl.54
+
OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning
@@ -2700,6 +2738,7 @@
2024.findings-naacl.55
yu-etal-2024-ovm
10.18653/v1/2024.findings-naacl.55
+
The Whole is Better than the Sum: Using Aggregated Demonstrations in In-Context Learning for Sequential Recommendation
@@ -2710,6 +2749,7 @@
2024.findings-naacl.56
wang-lim-2024-whole
10.18653/v1/2024.findings-naacl.56
+
Bring Your Own KG: Self-Supervised Program Synthesis for Zero-Shot KGQA
@@ -2722,6 +2762,7 @@
2024.findings-naacl.57
agarwal-etal-2024-bring
10.18653/v1/2024.findings-naacl.57
+
GraSAME: Injecting Token-Level Structural Information to Pretrained Language Models via Graph-guided Self-Attention Mechanism
@@ -2732,6 +2773,7 @@
2024.findings-naacl.58
yuan-farber-2024-grasame
10.18653/v1/2024.findings-naacl.58
+
Can Public Large Language Models Help Private Cross-device Federated Learning?
@@ -2763,6 +2805,7 @@
2024.findings-naacl.60
pan-etal-2024-langnav
10.18653/v1/2024.findings-naacl.60
+
Planning and Editing What You Retrieve for Enhanced Tool Learning
@@ -2793,6 +2836,7 @@
2024.findings-naacl.62
carbune-etal-2024-chart
10.18653/v1/2024.findings-naacl.62
+
SLiM: Speculative Decoding with Hypothesis Reduction
@@ -2819,6 +2863,7 @@
2024.findings-naacl.64
kachwala-etal-2024-rematch
10.18653/v1/2024.findings-naacl.64
+
Modeling the Sacred: Considerations when Using Religious Texts in Natural Language Processing
@@ -2828,6 +2873,7 @@
2024.findings-naacl.65
hutchinson-2024-modeling
10.18653/v1/2024.findings-naacl.65
+
Testing the Effect of Code Documentation on Large Language Model Code Understanding
@@ -2838,6 +2884,7 @@
2024.findings-naacl.66
macke-doyle-2024-testing
10.18653/v1/2024.findings-naacl.66
+
Aligning Large Language Models with Recommendation Knowledge
@@ -2854,6 +2901,7 @@
2024.findings-naacl.67
cao-etal-2024-aligning
10.18653/v1/2024.findings-naacl.67
+
OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining
@@ -2865,6 +2913,7 @@
2024.findings-naacl.68
liu-etal-2024-ofa
10.18653/v1/2024.findings-naacl.68
+
SELF-EXPERTISE: Knowledge-based Instruction Dataset Augmentation for a Legal Expert Language Model
@@ -2876,6 +2925,7 @@
2024.findings-naacl.69
kim-etal-2024-self
10.18653/v1/2024.findings-naacl.69
+
Re-evaluating the Need for Visual Signals in Unsupervised Grammar Induction
@@ -2894,6 +2944,7 @@
2024.findings-naacl.70
li-etal-2024-evaluating
10.18653/v1/2024.findings-naacl.70
+
EDEntail: An Entailment-based Few-shot Text Classification with Extensional Definition
@@ -2917,6 +2968,7 @@
2024.findings-naacl.72
srivatsa-kochmar-2024-makes
10.18653/v1/2024.findings-naacl.72
+
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
@@ -2930,6 +2982,7 @@
2024.findings-naacl.73
hyun-etal-2024-smile
10.18653/v1/2024.findings-naacl.73
+
T3M: Text Guided 3D Human Motion Synthesis from Speech
@@ -2966,6 +3019,7 @@
2024.findings-naacl.76
prasad-etal-2024-explanation
10.18653/v1/2024.findings-naacl.76
+
Low-Rank Adaptation for Multilingual Summarization: An Empirical Study
@@ -2980,6 +3034,7 @@
2024.findings-naacl.77
whitehouse-etal-2024-low
10.18653/v1/2024.findings-naacl.77
+
A Tree-of-Thoughts to Broaden Multi-step Reasoning across Languages
@@ -3005,6 +3060,7 @@
2024.findings-naacl.79
muckatira-etal-2024-emergent
10.18653/v1/2024.findings-naacl.79
+
Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems
@@ -3029,6 +3085,7 @@
2024.findings-naacl.81
zhou-etal-2024-matching
10.18653/v1/2024.findings-naacl.81
+
Instruction Tuning with Human Curriculum
@@ -3040,6 +3097,7 @@
2024.findings-naacl.82
lee-etal-2024-instruction
10.18653/v1/2024.findings-naacl.82
+
Natural Language-based State Representation in Deep Reinforcement Learning
@@ -3050,6 +3108,7 @@
2024.findings-naacl.83
rahman-xue-2024-natural
10.18653/v1/2024.findings-naacl.83
+
Learning Cross-Architecture Instruction Embeddings for Binary Code Analysis in Low-Resource Architectures
@@ -3061,6 +3120,7 @@
2024.findings-naacl.84
wang-etal-2024-learning-cross
10.18653/v1/2024.findings-naacl.84
+
ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks
@@ -3074,6 +3134,7 @@
2024.findings-naacl.85
yu-etal-2024-reeval
10.18653/v1/2024.findings-naacl.85
+
An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution
@@ -3087,6 +3148,7 @@
2024.findings-naacl.86
lo-etal-2024-effective
10.18653/v1/2024.findings-naacl.86
+
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
@@ -3102,6 +3164,7 @@
2024.findings-naacl.87
zheng-etal-2024-gpt
10.18653/v1/2024.findings-naacl.87
+
Subword Attention and Post-Processing for Rare and Unknown Contextualized Embeddings
@@ -3112,6 +3175,7 @@
2024.findings-naacl.88
patel-domeniconi-2024-subword
10.18653/v1/2024.findings-naacl.88
+
UGIF-DataSet: A New Dataset for Cross-lingual, Cross-modal Sequential actions on the UI
@@ -3123,6 +3187,7 @@
2024.findings-naacl.89
gubbi-venkatesh-etal-2024-ugif
10.18653/v1/2024.findings-naacl.89
+
SimSCOOD: Systematic Analysis of Out-of-Distribution Generalization in Fine-tuned Source Code Models
@@ -3135,6 +3200,7 @@
2024.findings-naacl.90
hajipour-etal-2024-simscood
10.18653/v1/2024.findings-naacl.90
+
Pruning as a Domain-specific LLM Extractor
@@ -3151,6 +3217,7 @@
2024.findings-naacl.91
zhang-etal-2024-pruning
10.18653/v1/2024.findings-naacl.91
+
LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback
@@ -3183,6 +3250,7 @@
2024.findings-naacl.93
xu-etal-2024-noisy
10.18653/v1/2024.findings-naacl.93
+
Composite Backdoor Attacks Against Large Language Models
@@ -3196,6 +3264,7 @@
2024.findings-naacl.94
huang-etal-2024-composite
10.18653/v1/2024.findings-naacl.94
+
Adapting Fake News Detection to the Era of Large Language Models
@@ -3242,6 +3311,7 @@
2024.findings-naacl.97
qin-etal-2024-large
10.18653/v1/2024.findings-naacl.97
+
FedLFC: Towards Efficient Federated Multilingual Modeling with LoRA-based Language Family Clustering
@@ -3255,6 +3325,7 @@
2024.findings-naacl.98
guo-etal-2024-fedlfc
10.18653/v1/2024.findings-naacl.98
+
Gaussian Process Optimization for Adaptable Multi-Objective Text Generation using Linearly-Weighted Language Models
@@ -3286,6 +3357,7 @@
2024.findings-naacl.101
moslemi-zouaq-2024-tagdebias
10.18653/v1/2024.findings-naacl.101
+
Improving Absent Keyphrase Generation with Diversity Heads
@@ -3296,6 +3368,7 @@
2024.findings-naacl.102
thomas-vajjala-2024-improving
10.18653/v1/2024.findings-naacl.102
+
mOthello: When Do Cross-Lingual Representation Alignment and Cross-Lingual Transfer Emerge in Multilingual Models?
@@ -3307,6 +3380,7 @@
2024.findings-naacl.103
hua-etal-2024-mothello
10.18653/v1/2024.findings-naacl.103
+
Discovering and Mitigating Indirect Bias in Attention-Based Model Explanations
@@ -3318,6 +3392,7 @@
2024.findings-naacl.104
haque-etal-2024-discovering
10.18653/v1/2024.findings-naacl.104
+
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data
@@ -3357,6 +3432,7 @@
2024.findings-naacl.106
qiu-etal-2024-think
10.18653/v1/2024.findings-naacl.106
+
It’s All Relative! – A Synthetic Query Generation Approach for Improving Zero-Shot Relevance Prediction
@@ -3368,6 +3444,7 @@
2024.findings-naacl.107
chaudhary-etal-2024-relative
10.18653/v1/2024.findings-naacl.107
+
RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models
@@ -3381,6 +3458,7 @@
2024.findings-naacl.108
khaki-etal-2024-rs
10.18653/v1/2024.findings-naacl.108
+
Hypernetwork-Assisted Parameter-Efficient Fine-Tuning with Meta-Knowledge Distillation for Domain Knowledge Disentanglement
@@ -3394,6 +3472,7 @@
2024.findings-naacl.109
li-etal-2024-hypernetwork
10.18653/v1/2024.findings-naacl.109
+
MICo: Preventative Detoxification of Large Language Models through Inhibition Control
@@ -3411,6 +3490,7 @@
2024.findings-naacl.110
siegelmann-etal-2024-mico
10.18653/v1/2024.findings-naacl.110
+
Reinforcement Learning with Token-level Feedback for Controllable Text Generation
@@ -3425,6 +3505,7 @@
2024.findings-naacl.111
li-etal-2024-reinforcement
10.18653/v1/2024.findings-naacl.111
+
CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving
@@ -3436,6 +3517,7 @@
2024.findings-naacl.112
chen-etal-2024-comm
10.18653/v1/2024.findings-naacl.112
+
Tokenization Matters: Navigating Data-Scarce Tokenization for Gender Inclusive Language Technologies
@@ -3453,6 +3535,7 @@
2024.findings-naacl.113
ovalle-etal-2024-tokenization
10.18653/v1/2024.findings-naacl.113
+
AdaPT: A Set of Guidelines for Hyperbolic Multimodal Multilingual NLP
@@ -3466,6 +3549,7 @@
2024.findings-naacl.114
sawhney-etal-2024-adapt
10.18653/v1/2024.findings-naacl.114
+
More Samples or More Prompts? Exploring Effective Few-Shot In-Context Learning for LLMs with In-Context Sampling
@@ -3484,6 +3568,7 @@
2024.findings-naacl.115
yao-etal-2024-samples
10.18653/v1/2024.findings-naacl.115
+
ZSEE: A Dataset based on Zeolite Synthesis Event Extraction for Automated Synthesis Platform
@@ -3499,6 +3584,7 @@
2024.findings-naacl.116
he-etal-2024-zsee
10.18653/v1/2024.findings-naacl.116
+
Mitigating Hallucination in Abstractive Summarization with Domain-Conditional Mutual Information
@@ -3511,6 +3597,7 @@
2024.findings-naacl.117
chae-etal-2024-mitigating
10.18653/v1/2024.findings-naacl.117
+
Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents
@@ -3521,6 +3608,7 @@
2024.findings-naacl.118
kim-lee-2024-adversarial
10.18653/v1/2024.findings-naacl.118
+
Prompt Space Optimizing Few-shot Reasoning Success with Large Language Models
@@ -3550,6 +3638,7 @@
2024.findings-naacl.120
wang-etal-2024-dagcn
10.18653/v1/2024.findings-naacl.120
+
Connecting the Dots: Inferring Patent Phrase Similarity with Retrieved Phrase Graphs
@@ -3560,6 +3649,7 @@
2024.findings-naacl.121
peng-yang-2024-connecting
10.18653/v1/2024.findings-naacl.121
+
Self-Regulated Sample Diversity in Large Language Models
@@ -3587,6 +3677,7 @@
2024.findings-naacl.123
lee-etal-2024-methods
10.18653/v1/2024.findings-naacl.123
+
When Quantization Affects Confidence of Large Language Models?
@@ -3599,6 +3690,7 @@
2024.findings-naacl.124
proskurina-etal-2024-quantization
10.18653/v1/2024.findings-naacl.124
+
MedCycle: Unpaired Medical Report Generation via Cycle-Consistency
@@ -3610,6 +3702,7 @@
2024.findings-naacl.125
hirsch-etal-2024-medcycle
10.18653/v1/2024.findings-naacl.125
+
Beta-LR: Interpretable Logical Reasoning based on Beta Distribution
@@ -3621,6 +3714,7 @@
2024.findings-naacl.126
ma-etal-2024-beta
10.18653/v1/2024.findings-naacl.126
+
Applications of BERT Models Towards Automation of Clinical Coding in Icelandic
@@ -3631,6 +3725,7 @@
2024.findings-naacl.127
hauksson-einarsson-2024-applications
10.18653/v1/2024.findings-naacl.127
+
“Tell me who you are and I tell you how you argue”: Predicting Stances and Arguments for Stakeholder Groups
@@ -3643,6 +3738,7 @@
2024.findings-naacl.128
heinisch-etal-2024-tell
10.18653/v1/2024.findings-naacl.128
+
Psychometric Predictive Power of Large Language Models
@@ -3654,6 +3750,7 @@
2024.findings-naacl.129
kuribayashi-etal-2024-psychometric
10.18653/v1/2024.findings-naacl.129
+
Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions
@@ -3664,6 +3761,7 @@
2024.findings-naacl.130
pezeshkpour-hruschka-2024-large
10.18653/v1/2024.findings-naacl.130
+
PEEB: Part-based Image Classifiers with an Explainable and Editable Language Bottleneck
@@ -3678,6 +3776,7 @@
2024.findings-naacl.131
pham-etal-2024-peeb
10.18653/v1/2024.findings-naacl.131
+
Ethos: Rectifying Language Models in Orthogonal Parameter Space
@@ -3691,6 +3790,7 @@
2024.findings-naacl.132
gao-etal-2024-ethos
10.18653/v1/2024.findings-naacl.132
+
Crafting In-context Examples according to LMs’ Parametric Knowledge
@@ -3703,6 +3803,7 @@
2024.findings-naacl.133
lee-etal-2024-crafting
10.18653/v1/2024.findings-naacl.133
+
ICXML: An In-Context Learning Framework for Zero-Shot Extreme Multi-Label Classification
@@ -3713,6 +3814,7 @@
2024.findings-naacl.134
zhu-zamani-2024-icxml
10.18653/v1/2024.findings-naacl.134
+
CLGSI: A Multimodal Sentiment Analysis Framework based on Contrastive Learning Guided by Sentiment Intensity
@@ -3724,6 +3826,7 @@
2024.findings-naacl.135
yang-etal-2024-clgsi
10.18653/v1/2024.findings-naacl.135
+
Interpreting Answers to Yes-No Questions in Dialogues from Multiple Domains
@@ -3735,6 +3838,7 @@
2024.findings-naacl.136
wang-etal-2024-interpreting
10.18653/v1/2024.findings-naacl.136
+
Enhancing Perception: Refining Explanations of News Claims with LLM Conversations
@@ -3761,6 +3865,7 @@
2024.findings-naacl.138
wei-jie-etal-2024-interpretable
10.18653/v1/2024.findings-naacl.138
+
Plug-in Language Model: Controlling Text Generation with a Simple Regression Model
@@ -3772,6 +3877,7 @@
2024.findings-naacl.139
yang-etal-2024-plug
10.18653/v1/2024.findings-naacl.139
+
Signer Diversity-driven Data Augmentation for Signer-Independent Sign Language Translation
@@ -3799,6 +3905,7 @@
2024.findings-naacl.141
meyer-buys-2024-systematic
10.18653/v1/2024.findings-naacl.141
+
Multi-Granularity Guided Fusion-in-Decoder
@@ -3810,6 +3917,7 @@
2024.findings-naacl.142
choi-etal-2024-multi
10.18653/v1/2024.findings-naacl.142
+
Group Fairness in Multilingual Speech Recognition Models
@@ -3861,6 +3969,7 @@
2024.findings-naacl.146
feger-dietze-2024-bertweets
10.18653/v1/2024.findings-naacl.146
+
Testing the limits of logical reasoning in neural and hybrid models
@@ -3872,6 +3981,7 @@
2024.findings-naacl.147
guzman-etal-2024-testing
10.18653/v1/2024.findings-naacl.147
+
METAL: Towards Multilingual Meta-Evaluation
@@ -3885,6 +3995,7 @@
2024.findings-naacl.148
hada-etal-2024-metal
10.18653/v1/2024.findings-naacl.148
+
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
@@ -3920,6 +4031,7 @@
2024.findings-naacl.150
siledar-etal-2024-product
10.18653/v1/2024.findings-naacl.150
+
COMEM: In-Context Retrieval-Augmented Mass-Editing Memory in Large Language Models
@@ -3943,6 +4055,7 @@
2024.findings-naacl.152
tanaka-etal-2024-content
10.18653/v1/2024.findings-naacl.152
+
Denoising Attention for Query-aware User Modeling
@@ -3966,6 +4079,7 @@
2024.findings-naacl.154
zhang-etal-2024-lightweight
10.18653/v1/2024.findings-naacl.154
+
BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models
@@ -3977,6 +4091,7 @@
2024.findings-naacl.155
wiland-etal-2024-bear
10.18653/v1/2024.findings-naacl.155
+
Conformal Intent Classification and Clarification for Fast and Accurate Intent Recognition
@@ -3989,6 +4104,7 @@
2024.findings-naacl.156
hengst-etal-2024-conformal
10.18653/v1/2024.findings-naacl.156
+
Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models in Court Decisions
@@ -4000,6 +4116,7 @@
2024.findings-naacl.157
nyffenegger-etal-2024-anonymity
10.18653/v1/2024.findings-naacl.157
+
X-LLaVA: Optimizing Bilingual Large Vision-Language Alignment
@@ -4017,6 +4134,7 @@
2024.findings-naacl.158
shin-etal-2024-x
10.18653/v1/2024.findings-naacl.158
+
Why So Gullible? Enhancing the Robustness of Retrieval-Augmented Models against Counterfactual Noise
@@ -4033,6 +4151,7 @@
Adding Acknowledgements Section.
Minor updates.
+
Heterogeneity over Homogeneity: Investigating Multilingual Speech Pre-Trained Models for Detecting Audio Deepfake
@@ -4059,6 +4178,7 @@
2024.findings-naacl.161
yang-etal-2024-identifying
10.18653/v1/2024.findings-naacl.161
+
Self-Adaptive Sampling for Accurate Video Question Answering on Image Text Models
@@ -4071,6 +4191,7 @@
2024.findings-naacl.162
han-etal-2024-self
10.18653/v1/2024.findings-naacl.162
+
Towards an On-device Agent for Text Rewriting
@@ -4089,6 +4210,7 @@
2024.findings-naacl.163
zhu-etal-2024-towards
10.18653/v1/2024.findings-naacl.163
+
Tailoring Vaccine Messaging with Common-Ground Opinions
@@ -4106,6 +4228,7 @@
2024.findings-naacl.164
stureborg-etal-2024-tailoring
10.18653/v1/2024.findings-naacl.164
+
Best of Both Worlds: A Pliable and Generalizable Neuro-Symbolic Approach for Relation Classification
@@ -4119,6 +4242,7 @@
2024.findings-naacl.165
vacareanu-etal-2024-best
10.18653/v1/2024.findings-naacl.165
+
Q-Tuning: Queue-based Prompt Tuning for Lifelong Few-shot Language Learning
@@ -4133,6 +4257,7 @@
2024.findings-naacl.166
guo-etal-2024-q
10.18653/v1/2024.findings-naacl.166
+
In-Context Example Ordering Guided by Label Distributions
@@ -4145,6 +4270,7 @@
2024.findings-naacl.167
xu-etal-2024-context
10.18653/v1/2024.findings-naacl.167
+
Beyond Surface Similarity: Detecting Subtle Semantic Shifts in Financial Narratives
@@ -4169,6 +4295,7 @@
2024.findings-naacl.169
sharma-etal-2024-laying
10.18653/v1/2024.findings-naacl.169
+
UEGP: Unified Expert-Guided Pre-training for Knowledge Rekindle
@@ -4187,6 +4314,7 @@
2024.findings-naacl.170
mou-etal-2024-uegp
10.18653/v1/2024.findings-naacl.170
+
LatticeGen: Hiding Generated Text in a Lattice for Privacy-Aware Large Language Model Generation on Cloud
@@ -4217,6 +4345,7 @@
2024.findings-naacl.172
zheng-etal-2024-hatemoderate
10.18653/v1/2024.findings-naacl.172
+
Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other
@@ -4232,6 +4361,7 @@
2024.findings-naacl.173
gao-etal-2024-compensate
10.18653/v1/2024.findings-naacl.173
+
Contrastive Preference Learning for Neural Machine Translation
@@ -4246,6 +4376,7 @@
2024.findings-naacl.174
he-etal-2024-contrastive
10.18653/v1/2024.findings-naacl.174
+
SocREval: Large Language Models with the Socratic Method for Reference-free Reasoning Evaluation
@@ -4257,6 +4388,7 @@
2024.findings-naacl.175
he-etal-2024-socreval
10.18653/v1/2024.findings-naacl.175
+
Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis
@@ -4273,6 +4405,7 @@
2024.findings-naacl.176
zhu-etal-2024-multilingual
10.18653/v1/2024.findings-naacl.176
+
Unleashing the Power of LLMs in Court View Generation by Stimulating Internal Knowledge and Incorporating External Knowledge
@@ -4305,6 +4438,7 @@
2024.findings-naacl.178
guo-etal-2024-prompting
10.18653/v1/2024.findings-naacl.178
+
Task-Agnostic Detector for Insertion-Based Backdoor Attacks
@@ -4333,6 +4467,7 @@
2024.findings-naacl.180
he-etal-2024-uncertainty
10.18653/v1/2024.findings-naacl.180
+
Exploring Language Model’s Code Generation Ability with Auxiliary Functions
@@ -4346,6 +4481,7 @@
2024.findings-naacl.181
lee-etal-2024-exploring
10.18653/v1/2024.findings-naacl.181
+
Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models
@@ -4361,6 +4497,7 @@
2024.findings-naacl.182
truong-etal-2024-crossing
10.18653/v1/2024.findings-naacl.182
+
GoT: Effective Graph-of-Thought Reasoning in Language Models
@@ -4372,6 +4509,7 @@
2024.findings-naacl.183
yao-etal-2024-got
10.18653/v1/2024.findings-naacl.183
+
Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning
@@ -4400,6 +4538,7 @@
2024.findings-naacl.185
you-etal-2024-mumath
10.18653/v1/2024.findings-naacl.185
+
Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization
@@ -4428,6 +4567,7 @@
2024.findings-naacl.187
li-etal-2024-uno
10.18653/v1/2024.findings-naacl.187
+
Evaluating Step-by-Step Reasoning through Symbolic Verification
@@ -4452,6 +4592,7 @@
2024.findings-naacl.189
slobodkin-etal-2024-multi
10.18653/v1/2024.findings-naacl.189
+
Retrieving Examples from Memory for Retrieval Augmented Neural Machine Translation: A Systematic Comparison
@@ -4463,6 +4604,7 @@
2024.findings-naacl.190
bouthors-etal-2024-retrieving
10.18653/v1/2024.findings-naacl.190
+
Extending Input Contexts of Language Models through Training on Segmented Sequences
@@ -4474,6 +4616,7 @@
2024.findings-naacl.191
karypis-etal-2024-extending
10.18653/v1/2024.findings-naacl.191
+
Reason from Fallacy: Enhancing Large Language Models’ Logical Reasoning through Logical Fallacy Understanding
@@ -4489,6 +4632,7 @@
2024.findings-naacl.192
li-etal-2024-reason
10.18653/v1/2024.findings-naacl.192
+
Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models
@@ -4520,6 +4664,7 @@
Typo fixes.
10.18653/v1/2024.findings-naacl.194
+
IruMozhi: Automatically classifying diglossia in Tamil
@@ -4567,6 +4712,7 @@
2024.findings-naacl.197
kang-etal-2024-human
10.18653/v1/2024.findings-naacl.197
+
COMMIT: Code-Mixing English-Centric Large Language Model for Multilingual Instruction Tuning
@@ -4590,6 +4736,7 @@
2024.findings-naacl.199
maekawa-etal-2024-dilm
10.18653/v1/2024.findings-naacl.199
+
MindAgent: Emergent Gaming Interaction
@@ -4608,6 +4755,7 @@
2024.findings-naacl.200
gong-etal-2024-mindagent
10.18653/v1/2024.findings-naacl.200
+
BotChat: Evaluating LLMs’ Capabilities of Having Multi-Turn Dialogues
@@ -4635,6 +4783,7 @@
2024.findings-naacl.202
wang-etal-2024-learning-mutually
10.18653/v1/2024.findings-naacl.202
+
A Novel Two-step Fine-tuning Framework for Transfer Learning in Low-Resource Neural Machine Translation
@@ -4659,6 +4808,7 @@
2024.findings-naacl.204
miao-etal-2024-enhancing
10.18653/v1/2024.findings-naacl.204
+
C^{3}LPGCN: Integrating Contrastive Learning and Cooperative Learning with Prompt into Graph Convolutional Network for Aspect-based Sentiment Analysis
@@ -4670,6 +4820,7 @@
2024.findings-naacl.205
he-etal-2024-c3lpgcn
10.18653/v1/2024.findings-naacl.205
+
Visual Enhanced Entity-Level Interaction Network for Multimodal Summarization
@@ -4683,6 +4834,7 @@
2024.findings-naacl.206
yan-etal-2024-visual
10.18653/v1/2024.findings-naacl.206
+
Knowledgeable In-Context Tuning: Exploring and Exploiting Factual Knowledge for In-Context Learning
@@ -4708,6 +4860,7 @@
2024.findings-naacl.208
drinkall-etal-2024-time
10.18653/v1/2024.findings-naacl.208
+
An End-to-End Submodular Framework for Data-Efficient In-Context Learning
@@ -4720,6 +4873,7 @@
2024.findings-naacl.209
kumari-etal-2024-end
10.18653/v1/2024.findings-naacl.209
+
Teaching Llama a New Language Through Cross-Lingual Knowledge Transfer
@@ -4732,6 +4886,7 @@
2024.findings-naacl.210
kuulmets-etal-2024-teaching
10.18653/v1/2024.findings-naacl.210
+
Simulating Opinion Dynamics with Networks of LLM-based Agents
@@ -4749,6 +4904,7 @@
2024.findings-naacl.211
chuang-etal-2024-simulating
10.18653/v1/2024.findings-naacl.211
+
Probing the Category of Verbal Aspect in Transformer Language Models
@@ -4759,6 +4915,7 @@
2024.findings-naacl.212
katinskaia-yangarber-2024-probing
10.18653/v1/2024.findings-naacl.212
+
A Measure for Transparent Comparison of Linguistic Diversity in Multilingual NLP Data Sets
@@ -4772,6 +4929,7 @@
2024.findings-naacl.213
samardzic-etal-2024-measure
10.18653/v1/2024.findings-naacl.213
+
Beyond Read-Only: Crafting a Comprehensive Chinese Text-to-SQL Dataset for Database Manipulation and Query
@@ -4784,6 +4942,7 @@
2024.findings-naacl.214
chen-etal-2024-beyond
10.18653/v1/2024.findings-naacl.214
+
Normalizing without Modernizing: Keeping Historical Wordforms of Middle French while Reducing Spelling Variants
@@ -4807,6 +4966,7 @@
2024.findings-naacl.216
sia-etal-2024-anti
10.18653/v1/2024.findings-naacl.216
+
Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning
@@ -4822,6 +4982,7 @@
2024.findings-naacl.217
zhao-etal-2024-defending
10.18653/v1/2024.findings-naacl.217
+
Select and Summarize: Scene Saliency for Movie Script Summarization
@@ -4832,6 +4993,7 @@
2024.findings-naacl.218
saxena-keller-2024-select
10.18653/v1/2024.findings-naacl.218
+
Don’t be a Fool: Pooling Strategies in Offensive Language Detection from User-Intended Adversarial Attacks
@@ -4843,6 +5005,7 @@
2024.findings-naacl.219
yu-etal-2024-dont
10.18653/v1/2024.findings-naacl.219
+
Z-GMOT: Zero-shot Generic Multiple Object Tracking
@@ -4860,6 +5023,7 @@
2024.findings-naacl.220
tran-etal-2024-z
10.18653/v1/2024.findings-naacl.220
+
NLP for Counterspeech against Hate: A Survey and How-To Guide
@@ -4872,6 +5036,7 @@
2024.findings-naacl.221
bonaldi-etal-2024-nlp
10.18653/v1/2024.findings-naacl.221
+
PRODIGy: a PROfile-based DIalogue Generation dataset
@@ -4883,6 +5048,7 @@
2024.findings-naacl.222
occhipinti-etal-2024-prodigy
10.18653/v1/2024.findings-naacl.222
+
WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models
@@ -4920,6 +5086,7 @@
2024.findings-naacl.225
ramos-etal-2024-paella
10.18653/v1/2024.findings-naacl.225
+
OSCaR: Object State Captioning and State Change Representation
@@ -4934,6 +5101,7 @@
2024.findings-naacl.226
nguyen-etal-2024-oscar
10.18653/v1/2024.findings-naacl.226
+
SumCSE: Summary as a transformation for Contrastive Learning
@@ -4949,6 +5117,7 @@
2024.findings-naacl.227
thirukovalluru-etal-2024-sumcse
10.18653/v1/2024.findings-naacl.227
+
The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text
@@ -4961,6 +5130,7 @@
2024.findings-naacl.228
guo-etal-2024-curious
10.18653/v1/2024.findings-naacl.228
+
PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits
@@ -4975,6 +5145,7 @@
2024.findings-naacl.229
jiang-etal-2024-personallm
10.18653/v1/2024.findings-naacl.229
+
FIRE: A Dataset for Financial Relation Extraction
@@ -4988,6 +5159,7 @@
2024.findings-naacl.230
hamad-etal-2024-fire
10.18653/v1/2024.findings-naacl.230
+
MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response
@@ -5004,6 +5176,7 @@
2024.findings-naacl.231
deng-etal-2024-musilingo
10.18653/v1/2024.findings-naacl.231
+
Investigating Acceleration of LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with ‘LITE’
@@ -5045,6 +5218,7 @@
2024.findings-naacl.234
tao-etal-2024-webwise
10.18653/v1/2024.findings-naacl.234
+
CodecLM: Aligning Language Models with Tailored Synthetic Data
@@ -5061,6 +5235,7 @@
2024.findings-naacl.235
wang-etal-2024-codeclm
10.18653/v1/2024.findings-naacl.235
+
Prompting Few-shot Multi-hop Question Generation via Comprehending Type-aware Semantics
@@ -5098,6 +5273,7 @@
2024.findings-naacl.238
evuru-etal-2024-coda
10.18653/v1/2024.findings-naacl.238
+
Synonym relations affect object detection learned on vision-language data
@@ -5108,6 +5284,7 @@
2024.findings-naacl.239
nebbia-kovashka-2024-synonym
10.18653/v1/2024.findings-naacl.239
+
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
@@ -5123,6 +5300,7 @@
2024.findings-naacl.240
li-etal-2024-cm
10.18653/v1/2024.findings-naacl.240
+
RobustSentEmbed: Robust Sentence Embeddings Using Adversarial Self-Supervised Contrastive Learning
@@ -5136,6 +5314,7 @@
2024.findings-naacl.241
rafiei-asl-etal-2024-robustsentembed
10.18653/v1/2024.findings-naacl.241
+
Characterizing Human and Zero-Shot GPT-3.5 Object-Similarity Judgments
@@ -5146,6 +5325,7 @@
2024.findings-naacl.242
mcknight-fyshe-2024-characterizing
10.18653/v1/2024.findings-naacl.242
+
Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models
@@ -5177,6 +5357,7 @@
2024.findings-naacl.244
fang-etal-2024-getting
10.18653/v1/2024.findings-naacl.244
+
MCECR: A Novel Dataset for Multilingual Cross-Document Event Coreference Resolution
@@ -5203,6 +5384,7 @@
2024.findings-naacl.246
zhang-etal-2024-sentiment
10.18653/v1/2024.findings-naacl.246
+
Tokenizer Choice For LLM Training: Negligible or Crucial?
@@ -5232,6 +5414,7 @@
2024.findings-naacl.247
ali-etal-2024-tokenizer
10.18653/v1/2024.findings-naacl.247
+
Think Before You Speak: Cultivating Communication Skills of Large Language Models via Inner Monologue
@@ -5244,6 +5427,7 @@
2024.findings-naacl.248
zhou-etal-2024-think
10.18653/v1/2024.findings-naacl.248
+
The Impact of Differential Privacy on Group Disparity Mitigation
@@ -5257,6 +5441,7 @@
2024.findings-naacl.249
hansen-etal-2024-impact
10.18653/v1/2024.findings-naacl.249
+
Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning
@@ -5271,6 +5456,7 @@
2024.findings-naacl.250
mhaskar-etal-2024-isometric
10.18653/v1/2024.findings-naacl.250
+
Read between the lines - Functionality Extraction From READMEs
@@ -5282,6 +5468,7 @@
2024.findings-naacl.251
kumar-etal-2024-read
10.18653/v1/2024.findings-naacl.251
+
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph
@@ -5298,6 +5485,7 @@
2024.findings-naacl.252
wang-etal-2024-abspyramid
10.18653/v1/2024.findings-naacl.252
+
Few-TK: A Dataset for Few-shot Scientific Typed Keyphrase Recognition
@@ -5311,6 +5499,7 @@
2024.findings-naacl.253
lahiri-etal-2024-tk
10.18653/v1/2024.findings-naacl.253
+
Language Models can be Deductive Solvers
@@ -5326,6 +5515,7 @@
2024.findings-naacl.254
feng-etal-2024-language
10.18653/v1/2024.findings-naacl.254
+
Interpreting User Requests in the Context of Natural Language Standing Instructions
@@ -5379,6 +5569,7 @@
2024.findings-naacl.258
mao-etal-2024-prompt
10.18653/v1/2024.findings-naacl.258
+
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning
@@ -5397,6 +5588,7 @@
2024.findings-naacl.259
zhang-etal-2024-natural
10.18653/v1/2024.findings-naacl.259
+
A Study on Scaling Up Multilingual News Framing Analysis
@@ -5407,6 +5599,7 @@
2024.findings-naacl.260
akter-anastasopoulos-2024-study
10.18653/v1/2024.findings-naacl.260
+
ViGLUE: A Vietnamese General Language Understanding Benchmark and Analysis of Vietnamese Language Models
@@ -5419,6 +5612,7 @@
2024.findings-naacl.261
tran-etal-2024-viglue
10.18653/v1/2024.findings-naacl.261
+
Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales
@@ -5430,6 +5624,7 @@
2024.findings-naacl.262
resck-etal-2024-exploring
10.18653/v1/2024.findings-naacl.262
+
Unlocking Parameter-Efficient Fine-Tuning for Low-Resource Language Translation
@@ -5444,6 +5639,7 @@
2024.findings-naacl.263
su-etal-2024-unlocking
10.18653/v1/2024.findings-naacl.263
+
ADaPT: As-Needed Decomposition and Planning with Language Models
@@ -5469,6 +5665,7 @@
2024.findings-naacl.265
ki-carpuat-2024-guiding
10.18653/v1/2024.findings-naacl.265
+
Non-contrastive sentence representations via self-supervision
@@ -5479,6 +5676,7 @@
2024.findings-naacl.266
pappadopulo-farina-2024-non
10.18653/v1/2024.findings-naacl.266
+
Semantically-Prompted Language Models Improve Visual Descriptions
@@ -5490,6 +5688,7 @@
2024.findings-naacl.267
ogezi-etal-2024-semantically
10.18653/v1/2024.findings-naacl.267
+
GenTKG: Generative Forecasting on Temporal Knowledge Graph with Large Language Models
@@ -5503,6 +5702,7 @@
2024.findings-naacl.268
liao-etal-2024-gentkg
10.18653/v1/2024.findings-naacl.268
+
A Transformer with Stack Attention
@@ -5515,6 +5715,7 @@
2024.findings-naacl.269
li-etal-2024-transformer
10.18653/v1/2024.findings-naacl.269
+
InstructEval: Systematic Evaluation of Instruction Selection Methods
@@ -5528,6 +5729,7 @@
2024.findings-naacl.270
ajith-etal-2024-instructeval
10.18653/v1/2024.findings-naacl.270
+
RecMind: Large Language Model Powered Agent For Recommendation
@@ -5546,6 +5748,7 @@
2024.findings-naacl.271
wang-etal-2024-recmind
10.18653/v1/2024.findings-naacl.271
+
GOLD: Generalized Knowledge Distillation via Out-of-Distribution-Guided Language Data Generation
@@ -5560,6 +5763,7 @@
2024.findings-naacl.272
gholami-etal-2024-gold
10.18653/v1/2024.findings-naacl.272
+
How Lexical is Bilingual Lexicon Induction?
@@ -5574,6 +5778,7 @@
2024.findings-naacl.273
kohli-etal-2024-lexical
10.18653/v1/2024.findings-naacl.273
+
Fumbling in Babel: An Investigation into ChatGPT’s Language Identification Ability
@@ -5587,6 +5792,7 @@
2024.findings-naacl.274
chen-etal-2024-fumbling
10.18653/v1/2024.findings-naacl.274
+
Targeted Augmentation for Low-Resource Event Extraction
@@ -5597,6 +5803,7 @@
2024.findings-naacl.275
wang-huang-2024-targeted
10.18653/v1/2024.findings-naacl.275
+
Asking More Informative Questions for Grounded Retrieval
@@ -5625,6 +5832,7 @@
2024.findings-naacl.277
tahaei-etal-2024-efficient
10.18653/v1/2024.findings-naacl.277
+
Addressing Healthcare-related Racial and LGBTQ+ Biases in Pretrained Language Models
@@ -5669,6 +5877,7 @@
2024.findings-naacl.280
liu-etal-2024-benchmarking
10.18653/v1/2024.findings-naacl.280
+
NeuroComparatives: Neuro-Symbolic Distillation of Comparative Knowledge
@@ -5683,6 +5892,7 @@
2024.findings-naacl.281
howard-etal-2024-neurocomparatives
10.18653/v1/2024.findings-naacl.281
+
Emotion-Anchored Contrastive Learning Framework for Emotion Recognition in Conversation
@@ -5708,6 +5918,7 @@
2024.findings-naacl.283
liu-etal-2024-suql
10.18653/v1/2024.findings-naacl.283
+
On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering
@@ -5767,6 +5978,7 @@
2024.findings-naacl.287
guo-etal-2024-sgsh
10.18653/v1/2024.findings-naacl.287
+
Biomedical Entity Representation with Graph-Augmented Multi-Objective Transformer
@@ -5779,6 +5991,7 @@
2024.findings-naacl.288
sakhovskiy-etal-2024-biomedical
10.18653/v1/2024.findings-naacl.288
+
Cross-Lingual Summarization with Pseudo-Label Regularization
@@ -5818,6 +6031,7 @@
2024.findings-naacl.291
artemiev-etal-2024-leveraging
10.18653/v1/2024.findings-naacl.291
+
LLaMA-Rider: Spurring Large Language Models to Explore the Open World
@@ -5843,6 +6057,7 @@
2024.findings-naacl.293
park-etal-2024-contrastive
10.18653/v1/2024.findings-naacl.293
+
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics
@@ -5857,6 +6072,7 @@
2024.findings-naacl.294
zhu-etal-2024-pollmgraph
10.18653/v1/2024.findings-naacl.294
+
Improving Health Question Answering with Reliable and Time-Aware Evidence Retrieval
diff --git a/data/xml/2024.hcinlp.xml b/data/xml/2024.hcinlp.xml
index 367167e991..bb2f478403 100644
--- a/data/xml/2024.hcinlp.xml
+++ b/data/xml/2024.hcinlp.xml
@@ -32,6 +32,7 @@
2024.hcinlp-1.1
jiao-etal-2024-examining
10.18653/v1/2024.hcinlp-1.1
+
Properties and Challenges of LLM-Generated Explanations
@@ -42,6 +43,7 @@
2024.hcinlp-1.2
kunz-kuhlmann-2024-properties
10.18653/v1/2024.hcinlp-1.2
+
This Reference Does Not Exist: An Exploration of LLM Citation Accuracy and Relevance
@@ -91,6 +93,7 @@
2024.hcinlp-1.6
nigam-etal-2024-interactive
10.18653/v1/2024.hcinlp-1.6
+
Sensemaking of Socially-Mediated Crisis Information
@@ -102,6 +105,7 @@
2024.hcinlp-1.7
koli-etal-2024-sensemaking
10.18653/v1/2024.hcinlp-1.7
+
Blind Spots and Biases: Exploring the Role of Annotator Cognitive Biases in NLP
@@ -112,6 +116,7 @@
2024.hcinlp-1.8
gautam-srinath-2024-blind
10.18653/v1/2024.hcinlp-1.8
+
LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools and Self-Explanations
@@ -126,6 +131,7 @@
2024.hcinlp-1.9
wang-etal-2024-llmcheckup
10.18653/v1/2024.hcinlp-1.9
+
diff --git a/data/xml/2024.insights.xml b/data/xml/2024.insights.xml
index e1249c1e10..f379582b84 100644
--- a/data/xml/2024.insights.xml
+++ b/data/xml/2024.insights.xml
@@ -32,6 +32,7 @@
2024.insights-1.1
ye-etal-2024-mosecrot
10.18653/v1/2024.insights-1.1
+
What explains the success of cross-modal fine-tuning with ORCA?
@@ -45,6 +46,7 @@
2024.insights-1.2
garcia-de-herreros-etal-2024-explains
10.18653/v1/2024.insights-1.2
+
Does Fine-tuning a Classifier Help in Low-budget Scenarios? Not Much
@@ -57,6 +59,7 @@
2024.insights-1.3
gonzalez-gutierrez-etal-2024-fine
10.18653/v1/2024.insights-1.3
+
How Well Can a Genetic Algorithm Fine-tune Transformer Encoders? A First Approach
@@ -68,6 +71,7 @@
2024.insights-1.4
sanchez-carmona-etal-2024-well
10.18653/v1/2024.insights-1.4
+
I Have an Attention Bridge to Sell You: Generalization Capabilities of Modular Translation Architectures
@@ -79,6 +83,7 @@
2024.insights-1.5
mickus-etal-2024-attention
10.18653/v1/2024.insights-1.5
+
Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget
@@ -91,6 +96,7 @@
2024.insights-1.6
bui-etal-2024-knowledge
10.18653/v1/2024.insights-1.6
+
An Analysis of BPE Vocabulary Trimming in Neural Machine Translation
@@ -104,6 +110,7 @@
2024.insights-1.7
cognetta-etal-2024-analysis
10.18653/v1/2024.insights-1.7
+
On the Limits of Multi-modal Meta-Learning with Auxiliary Task Modulation Using Conditional Batch Normalization
@@ -118,6 +125,7 @@
2024.insights-1.8
armengol-estape-etal-2024-limits
10.18653/v1/2024.insights-1.8
+
Pointer-Generator Networks for Low-Resource Machine Translation: Don’t Copy That!
@@ -129,6 +137,7 @@
2024.insights-1.9
bafna-etal-2024-pointer
10.18653/v1/2024.insights-1.9
+
Imaginary Numbers! Evaluating Numerical Referring Expressions by Neural End-to-End Surface Realization Systems
@@ -145,6 +154,7 @@
2024.insights-1.10
cunha-etal-2024-imaginary
10.18653/v1/2024.insights-1.10
+
Using Locally Learnt Word Representations for better Textual Anomaly Detection
@@ -155,6 +165,7 @@
2024.insights-1.11
breidenstein-labeau-2024-using
10.18653/v1/2024.insights-1.11
+
Can probing classifiers reveal the learning by contact center large language models?: No, it doesn’t!
@@ -166,6 +177,7 @@
2024.insights-1.12
nathan-etal-2024-probing
10.18653/v1/2024.insights-1.12
+
Can Abstract Meaning Representation Facilitate Fair Legal Judgement Predictions?
@@ -176,6 +188,7 @@
2024.insights-1.13
vijay-hershcovich-2024-abstract
10.18653/v1/2024.insights-1.13
+
WINOVIZ: Probing Visual Properties of Objects Under Different States
@@ -199,6 +212,7 @@
2024.insights-1.15
srivatsa-etal-2024-harnessing
10.18653/v1/2024.insights-1.15
+
The Paradox of Preference: A Study on LLM Alignment Algorithms and Data Acquisition Methods
@@ -210,6 +224,7 @@
2024.insights-1.16
devanathan-etal-2024-paradox
10.18653/v1/2024.insights-1.16
+
The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics
@@ -232,6 +247,7 @@
2024.insights-1.18
eichel-schulte-im-walde-2024-multi
10.18653/v1/2024.insights-1.18
+
Investigating Multi-Pivot Ensembling with Massively Multilingual Machine Translation Models
@@ -243,6 +259,7 @@
2024.insights-1.19
mohammadshahi-etal-2024-investigating
10.18653/v1/2024.insights-1.19
+
diff --git a/data/xml/2024.naacl.xml b/data/xml/2024.naacl.xml
index eecf724263..cbc2815af8 100644
--- a/data/xml/2024.naacl.xml
+++ b/data/xml/2024.naacl.xml
@@ -28,6 +28,7 @@
2024.naacl-long.1
liu-etal-2024-named
10.18653/v1/2024.naacl-long.1
+
Text Diffusion Model with Encoder-Decoder Transformers for Sequence-to-Sequence Generation
@@ -51,6 +52,7 @@
2024.naacl-long.3
mehta-goldwasser-2024-interactive
10.18653/v1/2024.naacl-long.3
+
Assessing Logical Puzzle Solving in Large Language Models: Insights from a Minesweeper Case Study
@@ -62,6 +64,7 @@
2024.naacl-long.4
li-etal-2024-assessing-logical
10.18653/v1/2024.naacl-long.4
+
TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation
@@ -74,6 +77,7 @@
2024.naacl-long.5
yun-etal-2024-telme
10.18653/v1/2024.naacl-long.5
+
Effective and Efficient Conversation Retrieval for Dialogue State Tracking with Implicit Text Summaries
@@ -87,6 +91,7 @@
2024.naacl-long.6
lee-etal-2024-effective
10.18653/v1/2024.naacl-long.6
+
Promptly Predicting Structures: The Return of Inference
@@ -98,6 +103,7 @@
2024.naacl-long.7
mehta-etal-2024-promptly
10.18653/v1/2024.naacl-long.7
+
On Linearizing Structured Data in Encoder-Decoder Language Models: Insights from Text-to-SQL
@@ -108,6 +114,7 @@
2024.naacl-long.8
shao-nakashole-2024-linearizing
10.18653/v1/2024.naacl-long.8
+
Extractive Summarization with Text Generator
@@ -118,6 +125,7 @@
2024.naacl-long.9
le-luu-2024-extractive
10.18653/v1/2024.naacl-long.9
+
Self-generated Replay Memories for Continual Neural Machine Translation
@@ -128,6 +136,7 @@
2024.naacl-long.10
resta-bacciu-2024-self
10.18653/v1/2024.naacl-long.10
+
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
@@ -141,6 +150,7 @@
2024.naacl-long.11
chen-etal-2024-measuring
10.18653/v1/2024.naacl-long.11
+
Building Knowledge-Guided Lexica to Model Cultural Variation
@@ -155,6 +165,7 @@
2024.naacl-long.12
havaldar-etal-2024-building
10.18653/v1/2024.naacl-long.12
+
Adaptive Rank Selections for Low-Rank Approximation of Language Models
@@ -168,6 +179,7 @@
2024.naacl-long.13
gao-etal-2024-adaptive
10.18653/v1/2024.naacl-long.13
+
An Empirical Study of Consistency Regularization for End-to-End Speech-to-Text Translation
@@ -181,6 +193,7 @@
2024.naacl-long.14
gao-etal-2024-empirical
10.18653/v1/2024.naacl-long.14
+
Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration
@@ -195,6 +208,7 @@
2024.naacl-long.15
wang-etal-2024-unleashing
10.18653/v1/2024.naacl-long.15
+
FPT: Feature Prompt Tuning for Few-shot Readability Assessment
@@ -207,6 +221,7 @@
2024.naacl-long.16
wang-etal-2024-fpt
10.18653/v1/2024.naacl-long.16
+
Self-Prompting Large Language Models for Zero-Shot Open-Domain QA
@@ -219,6 +234,7 @@
2024.naacl-long.17
li-etal-2024-self-prompting
10.18653/v1/2024.naacl-long.17
+
Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs?
@@ -232,6 +248,7 @@
2024.naacl-long.18
sun-etal-2024-head
10.18653/v1/2024.naacl-long.18
+
kNN-ICL: Compositional Task-Oriented Parsing Generalization with Nearest Neighbor In-Context Learning
@@ -250,6 +267,7 @@
2024.naacl-long.19
zhao-etal-2024-knn
10.18653/v1/2024.naacl-long.19
+
ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems
@@ -262,6 +280,7 @@
2024.naacl-long.20
saad-falcon-etal-2024-ares
10.18653/v1/2024.naacl-long.20
+
DEMO: A Statistical Perspective for Efficient Image-Text Matching
@@ -274,6 +293,7 @@
2024.naacl-long.21
zhang-etal-2024-demo
10.18653/v1/2024.naacl-long.21
+
SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning
@@ -291,6 +311,7 @@
Updated acknowledgement.
10.18653/v1/2024.naacl-long.22
+
Volcano: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision
@@ -316,6 +337,7 @@
Minor updates.
10.18653/v1/2024.naacl-long.24
+
Simple and effective data augmentation for compositional generalization
@@ -326,6 +348,7 @@
2024.naacl-long.25
yao-koller-2024-simple
10.18653/v1/2024.naacl-long.25
+
Rethinking Tabular Data Understanding with Large Language Models
@@ -349,6 +372,7 @@
2024.naacl-long.27
liu-etal-2024-shortcuts
10.18653/v1/2024.naacl-long.27
+
BookSQL: A Large Scale Text-to-SQL Dataset for Accounting Domain
@@ -375,6 +399,7 @@
2024.naacl-long.29
roy-etal-2024-flap
10.18653/v1/2024.naacl-long.29
+
DuRE: Dual Contrastive Self Training for Semi-Supervised Relation Extraction
@@ -385,6 +410,7 @@
2024.naacl-long.30
feng-lakshmanan-2024-dure
10.18653/v1/2024.naacl-long.30
+
Query-Efficient Textual Adversarial Example Generation for Black-Box Attacks
@@ -396,6 +422,7 @@
2024.naacl-long.31
yu-etal-2024-query
10.18653/v1/2024.naacl-long.31
+
Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles
@@ -411,6 +438,7 @@
2024.naacl-long.32
huang-etal-2024-embrace
10.18653/v1/2024.naacl-long.32
+
AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation
@@ -423,6 +451,7 @@
2024.naacl-long.33
qiu-etal-2024-amrfact
10.18653/v1/2024.naacl-long.33
+
PILOT: Legal Case Outcome Prediction with Case Law
@@ -435,6 +464,7 @@
2024.naacl-long.34
cao-etal-2024-pilot
10.18653/v1/2024.naacl-long.34
+
ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models
@@ -458,6 +488,7 @@
2024.naacl-long.36
chang-glass-2024-r
10.18653/v1/2024.naacl-long.36
+
InsCL: A Data-efficient Continual Learning Paradigm for Fine-tuning Large Language Models with Instructions
@@ -473,6 +504,7 @@
2024.naacl-long.37
wang-etal-2024-inscl
10.18653/v1/2024.naacl-long.37
+
Language Agnostic Code Embeddings
@@ -484,6 +516,7 @@
2024.naacl-long.38
utpala-etal-2024-language
10.18653/v1/2024.naacl-long.38
+
An Examination of the Compositionality of Large Generative Vision-Language Models
@@ -495,6 +528,7 @@
2024.naacl-long.39
ma-etal-2024-examination
10.18653/v1/2024.naacl-long.39
+
Two Heads are Better than One: Nested PoE for Robust Defense Against Multi-Backdoors
@@ -506,6 +540,7 @@
2024.naacl-long.40
graf-etal-2024-two
10.18653/v1/2024.naacl-long.40
+
VertAttack: Taking Advantage of Text Classifiers’ Horizontal Vision
@@ -518,6 +553,7 @@
Minor update.
10.18653/v1/2024.naacl-long.41
+
KDMCSE: Knowledge Distillation Multimodal Sentence Embeddings with Adaptive Angular margin Contrastive Learning
@@ -530,6 +566,7 @@
2024.naacl-long.42
nguyen-etal-2024-kdmcse
10.18653/v1/2024.naacl-long.42
+
The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language
@@ -542,6 +579,7 @@
2024.naacl-long.43
zhu-etal-2024-taste
10.18653/v1/2024.naacl-long.43
+
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
@@ -555,6 +593,7 @@
2024.naacl-long.44
zhang-etal-2024-think
10.18653/v1/2024.naacl-long.44
+
BeLLM: Backward Dependency Enhanced Large Language Model for Sentence Embeddings
@@ -565,6 +604,7 @@
2024.naacl-long.45
li-li-2024-bellm
10.18653/v1/2024.naacl-long.45
+
Assessing Factual Reliability of Large Language Model Knowledge
@@ -577,6 +617,7 @@
2024.naacl-long.46
wang-etal-2024-assessing
10.18653/v1/2024.naacl-long.46
+
Dial-MAE: ConTextual Masked Auto-Encoder for Retrieval-based Dialogue Systems
@@ -590,6 +631,7 @@
2024.naacl-long.47
su-etal-2024-dial
10.18653/v1/2024.naacl-long.47
+
Toolink: Linking Toolkit Creation and Using through Chain-of-Solving on Open-Source Model
@@ -602,6 +644,7 @@
2024.naacl-long.48
qian-etal-2024-toolink
10.18653/v1/2024.naacl-long.48
+
Create! Don’t Repeat: A Paradigm Shift in Multi-Label Augmentation through Label Creative Generation
@@ -613,6 +656,7 @@
2024.naacl-long.49
wang-etal-2024-create
10.18653/v1/2024.naacl-long.49
+
Neurocache: Efficient Vector Retrieval for Long-range Language Modeling
@@ -623,6 +667,7 @@
2024.naacl-long.50
safaya-yuret-2024-neurocache
10.18653/v1/2024.naacl-long.50
+
Unveiling the Generalization Power of Fine-Tuned Large Language Models
@@ -637,6 +682,7 @@
2024.naacl-long.51
yang-etal-2024-unveiling
10.18653/v1/2024.naacl-long.51
+
A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning
@@ -650,6 +696,7 @@
2024.naacl-long.52
hong-etal-2024-closer
10.18653/v1/2024.naacl-long.52
+
Exploring Self-supervised Logic-enhanced Training for Large Language Models
@@ -667,6 +714,7 @@
This revision corrects the institution of the corresponding author, Dr. Nancy F. Chen, who should be listed as a professor at Nanyang Technological University as well as a research scientist at the Institute for Infocomm Research, A*STAR.
10.18653/v1/2024.naacl-long.53
+
MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning
@@ -679,6 +727,7 @@
2024.naacl-long.54
das-etal-2024-mathsensei
10.18653/v1/2024.naacl-long.54
+
CoUDA: Coherence Evaluation via Unified Data Augmentation
@@ -693,6 +742,7 @@
2024.naacl-long.55
zhu-etal-2024-couda
10.18653/v1/2024.naacl-long.55
+
mEdIT: Multilingual Text Editing via Instruction Tuning
@@ -706,6 +756,7 @@
2024.naacl-long.56
raheja-etal-2024-medit
10.18653/v1/2024.naacl-long.56
+
Navigation as Attackers Wish? Towards Building Robust Embodied Agents under Federated Learning
@@ -719,6 +770,7 @@
2024.naacl-long.57
zhang-etal-2024-navigation
10.18653/v1/2024.naacl-long.57
+
In-context Learning and Gradient Descent Revisited
@@ -731,6 +783,7 @@
2024.naacl-long.58
deutch-etal-2024-context
10.18653/v1/2024.naacl-long.58
+
Corpus Considerations for Annotator Modeling and Scaling
@@ -745,6 +798,7 @@
2024.naacl-long.59
sarumi-etal-2024-corpus
10.18653/v1/2024.naacl-long.59
+
On Large Language Models’ Hallucination with Regard to Known Facts
@@ -762,6 +816,7 @@
2024.naacl-long.60
jiang-etal-2024-large
10.18653/v1/2024.naacl-long.60
+
“One-Size-Fits-All”? Examining Expectations around What Constitute “Fair” or “Good” NLG System Behaviors
@@ -775,6 +830,7 @@
2024.naacl-long.61
lucy-etal-2024-one
10.18653/v1/2024.naacl-long.61
+
Language Models Hallucinate, but May Excel at Fact Verification
@@ -788,6 +844,7 @@
2024.naacl-long.62
guan-etal-2024-language
10.18653/v1/2024.naacl-long.62
+
A Rationale-centric Counterfactual Data Augmentation Method for Cross-Document Event Coreference Resolution
@@ -802,6 +859,7 @@
2024.naacl-long.63
ding-etal-2024-rationale
10.18653/v1/2024.naacl-long.63
+
TrojFSP: Trojan Insertion in Few-shot Prompt Tuning
@@ -816,6 +874,7 @@
2024.naacl-long.64
zheng-etal-2024-trojfsp
10.18653/v1/2024.naacl-long.64
+
Ensuring Safe and High-Quality Outputs: A Guideline Library Approach for Language Models
@@ -834,6 +893,7 @@
2024.naacl-long.65
luo-etal-2024-ensuring
10.18653/v1/2024.naacl-long.65
+
X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs
@@ -845,6 +905,7 @@
2024.naacl-long.66
rodriguez-etal-2024-x
10.18653/v1/2024.naacl-long.66
+
Topics, Authors, and Institutions in Large Language Model Research: Trends from 17K arXiv Papers
@@ -859,6 +920,7 @@
2024.naacl-long.67
movva-etal-2024-topics
10.18653/v1/2024.naacl-long.67
+
E^5: Zero-shot Hierarchical Table Analysis using Augmented LLMs via Explain, Extract, Execute, Exhibit and Extrapolate
@@ -870,6 +932,7 @@
2024.naacl-long.68
zhang-etal-2024-e5
10.18653/v1/2024.naacl-long.68
+
S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Model
@@ -884,6 +947,7 @@
2024.naacl-long.69
lei-etal-2024-s3eval
10.18653/v1/2024.naacl-long.69
+
MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning
@@ -900,6 +964,7 @@
2024.naacl-long.70
liu-etal-2024-mmc
10.18653/v1/2024.naacl-long.70
+
Visual Grounding Helps Learn Word Meanings in Low-Data Regimes
@@ -911,6 +976,7 @@
2024.naacl-long.71
zhuang-etal-2024-visual
10.18653/v1/2024.naacl-long.71
+
Accurate Knowledge Distillation via n-best Reranking
@@ -920,6 +986,7 @@
2024.naacl-long.72
setiawan-2024-accurate
10.18653/v1/2024.naacl-long.72
+
AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition
@@ -935,6 +1002,7 @@
2024.naacl-long.73
chen-etal-2024-autoprm
10.18653/v1/2024.naacl-long.73
+
SEMQA: Semi-Extractive Multi-Source Question Answering
@@ -961,6 +1029,7 @@
2024.naacl-long.75
lang-etal-2024-fine
10.18653/v1/2024.naacl-long.75
+
A Universal Dependencies Treebank for Highland Puebla Nahuatl
@@ -984,6 +1053,7 @@
2024.naacl-long.77
wibowo-etal-2024-copal
10.18653/v1/2024.naacl-long.77
+
IterAlign: Iterative Constitutional Alignment of Large Language Models
@@ -1000,6 +1070,7 @@
2024.naacl-long.78
chen-etal-2024-iteralign
10.18653/v1/2024.naacl-long.78
+
OrchestraLLM: Efficient Orchestration of Language Models for Dialogue State Tracking
@@ -1013,6 +1084,7 @@
10.18653/v1/2024.naacl-long.79
Minor updates.
+
Multi-Operational Mathematical Derivations in Latent Space
@@ -1040,6 +1112,7 @@
2024.naacl-long.81
si-etal-2024-large
10.18653/v1/2024.naacl-long.81
+
XferBench: a Data-Driven Benchmark for Emergent Language
@@ -1050,6 +1123,7 @@
2024.naacl-long.82
boldt-mortensen-2024-xferbench
10.18653/v1/2024.naacl-long.82
+
Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation
@@ -1062,6 +1136,7 @@
2024.naacl-long.83
yoon-etal-2024-evaluating
10.18653/v1/2024.naacl-long.83
+
A Symbolic Framework for Evaluating Mathematical Reasoning and Generalisation with Transformers
@@ -1074,6 +1149,7 @@
2024.naacl-long.84
meadows-etal-2024-symbolic
10.18653/v1/2024.naacl-long.84
+
Identifying Linear Relational Concepts in Large Language Models
@@ -1085,6 +1161,7 @@
2024.naacl-long.85
chanin-etal-2024-identifying
10.18653/v1/2024.naacl-long.85
+
Benchmark Transparency: Measuring the Impact of Data on Evaluation
@@ -1095,6 +1172,7 @@
2024.naacl-long.86
kovatchev-lease-2024-benchmark
10.18653/v1/2024.naacl-long.86
+
JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models
@@ -1109,6 +1187,7 @@
2024.naacl-long.87
fisher-etal-2024-jamdec
10.18653/v1/2024.naacl-long.87
+
REST: Retrieval-Based Speculative Decoding
@@ -1122,6 +1201,7 @@
2024.naacl-long.88
he-etal-2024-rest
10.18653/v1/2024.naacl-long.88
+
Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations
@@ -1140,6 +1220,7 @@
2024.naacl-long.89
chen-etal-2024-sub
10.18653/v1/2024.naacl-long.89
+
MSciNLI: A Diverse Benchmark for Scientific Natural Language Inference
@@ -1150,6 +1231,7 @@
2024.naacl-long.90
sadat-caragea-2024-mscinli
10.18653/v1/2024.naacl-long.90
+
Causal Inference for Human-Language Model Collaboration
@@ -1161,6 +1243,7 @@
2024.naacl-long.91
zhang-etal-2024-causal
10.18653/v1/2024.naacl-long.91
+
SELF-GUARD: Empower the LLM to Safeguard Itself
@@ -1177,6 +1260,7 @@
2024.naacl-long.92
wang-etal-2024-self
10.18653/v1/2024.naacl-long.92
+
COSIGN: Contextual Facts Guided Generation for Knowledge Graph Completion
@@ -1189,6 +1273,7 @@
2024.naacl-long.93
li-etal-2024-cosign
10.18653/v1/2024.naacl-long.93
+
Toward Informal Language Processing: Knowledge of Slang in Large Language Models
@@ -1202,6 +1287,7 @@
2024.naacl-long.94
sun-etal-2024-toward
10.18653/v1/2024.naacl-long.94
+
Ghostbuster: Detecting Text Ghostwritten by Large Language Models
@@ -1214,6 +1300,7 @@
2024.naacl-long.95
verma-etal-2024-ghostbuster
10.18653/v1/2024.naacl-long.95
+
End-to-End Beam Retrieval for Multi-Hop Question Answering
@@ -1227,6 +1314,7 @@
2024.naacl-long.96
zhang-etal-2024-end
10.18653/v1/2024.naacl-long.96
+
Leveraging Generative Large Language Models with Visual Instruction and Demonstration Retrieval for Multimodal Sarcasm Detection
@@ -1239,6 +1327,7 @@
2024.naacl-long.97
tang-etal-2024-leveraging
10.18653/v1/2024.naacl-long.97
+
Multi-Scale Prompt Memory-Augmented Model for Black-Box Scenarios
@@ -1251,6 +1340,7 @@
2024.naacl-long.98
kuang-etal-2024-multi
10.18653/v1/2024.naacl-long.98
+
Ungrammatical-syntax-based In-context Example Selection for Grammatical Error Correction
@@ -1262,6 +1352,7 @@
2024.naacl-long.99
tang-etal-2024-ungrammatical
10.18653/v1/2024.naacl-long.99
+
BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer
@@ -1279,6 +1370,7 @@
2024.naacl-long.100
asai-etal-2024-buffet
10.18653/v1/2024.naacl-long.100
+
TISE: A Tripartite In-context Selection Method for Event Argument Extraction
@@ -1291,6 +1383,7 @@
2024.naacl-long.101
fu-etal-2024-tise
10.18653/v1/2024.naacl-long.101
+
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
@@ -1308,6 +1401,7 @@
2024.naacl-long.102
wu-etal-2024-reasoning
10.18653/v1/2024.naacl-long.102
+
TRUE-UIE: Two Universal Relations Unify Information Extraction Tasks
@@ -1320,6 +1414,7 @@
2024.naacl-long.103
wang-etal-2024-true
10.18653/v1/2024.naacl-long.103
+
zrLLM: Zero-Shot Relational Learning on Temporal Knowledge Graphs with Large Language Models
@@ -1335,6 +1430,7 @@
2024.naacl-long.104
ding-etal-2024-zrllm
10.18653/v1/2024.naacl-long.104
+
Embodied Executable Policy Learning with Language-based Scene Summarization
@@ -1348,6 +1444,7 @@
2024.naacl-long.105
qiu-etal-2024-embodied
10.18653/v1/2024.naacl-long.105
+
Metacognitive Prompting Improves Understanding in Large Language Models
@@ -1358,6 +1455,7 @@
2024.naacl-long.106
wang-zhao-2024-metacognitive
10.18653/v1/2024.naacl-long.106
+
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming
@@ -1374,6 +1472,7 @@
2024.naacl-long.107
ge-etal-2024-mart
10.18653/v1/2024.naacl-long.107
+
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset
@@ -1387,6 +1486,7 @@
2024.naacl-long.108
lee-etal-2024-dialogcc
10.18653/v1/2024.naacl-long.108
+
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
@@ -1414,6 +1514,7 @@
2024.naacl-long.110
liu-etal-2024-automatic
10.18653/v1/2024.naacl-long.110
+
FUN with Fisher: Improving Generalization of Adapter-Based Cross-lingual Transfer with Scheduled Unfreezing
@@ -1426,6 +1527,7 @@
2024.naacl-long.111
liu-etal-2024-fun
10.18653/v1/2024.naacl-long.111
+
Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings
@@ -1438,6 +1540,7 @@
2024.naacl-long.112
liu-etal-2024-multilingual
10.18653/v1/2024.naacl-long.112
+
The Colorful Future of LLMs: Evaluating and Improving LLMs as Emotional Supporters for Queer Youth
@@ -1453,6 +1556,7 @@
2024.naacl-long.113
lissak-etal-2024-colorful
10.18653/v1/2024.naacl-long.113
+
IPED: An Implicit Perspective for Relational Triple Extraction based on Diffusion Model
@@ -1464,6 +1568,7 @@
2024.naacl-long.114
zhao-etal-2024-iped
10.18653/v1/2024.naacl-long.114
+
QualEval: Qualitative Evaluation for Model Improvement
@@ -1479,6 +1584,7 @@
2024.naacl-long.115
murahari-etal-2024-qualeval
10.18653/v1/2024.naacl-long.115
+
Quantum-inspired Language Model with Lindblad Master Equation and Interference Measurement for Sentiment Analysis
@@ -1490,6 +1596,7 @@
2024.naacl-long.116
yan-etal-2024-quantum
10.18653/v1/2024.naacl-long.116
+
VisLingInstruct: Elevating Zero-Shot Learning in Multi-Modal Language Models with Autonomous Instruction Optimization
@@ -1509,6 +1616,7 @@
Minor update.
10.18653/v1/2024.naacl-long.117
+
A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily
@@ -1524,6 +1632,7 @@
2024.naacl-long.118
ding-etal-2024-wolf
10.18653/v1/2024.naacl-long.118
+
P^3Sum: Preserving Author’s Perspective in News Summarization with Diffusion Language Models
@@ -1539,6 +1648,7 @@
2024.naacl-long.119
liu-etal-2024-p3sum
10.18653/v1/2024.naacl-long.119
+
Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakes
@@ -1552,6 +1662,7 @@
2024.naacl-long.120
wang-etal-2024-bridging
10.18653/v1/2024.naacl-long.120
+
RST-LoRA: A Discourse-Aware Low-Rank Adaptation for Long Document Abstractive Summarization
@@ -1562,6 +1673,7 @@
2024.naacl-long.121
pu-demberg-2024-rst
10.18653/v1/2024.naacl-long.121
+
Strings from the Library of Babel: Random Sampling as a Strong Baseline for Prompt Optimisation
@@ -1575,6 +1687,7 @@
2024.naacl-long.122
lu-etal-2024-strings
10.18653/v1/2024.naacl-long.122
+
ReTA: Recursively Thinking Ahead to Improve the Strategic Reasoning of Large Language Models
@@ -1590,6 +1703,7 @@
2024.naacl-long.123
duan-etal-2024-reta
10.18653/v1/2024.naacl-long.123
+
Fact Checking Beyond Training Set
@@ -1600,6 +1714,7 @@
2024.naacl-long.124
karisani-ji-2024-fact
10.18653/v1/2024.naacl-long.124
+
Program-Aided Reasoners (Better) Know What They Know
@@ -1614,6 +1729,7 @@
2024.naacl-long.125
kabra-etal-2024-program
10.18653/v1/2024.naacl-long.125
+
The Perspectivist Paradigm Shift: Assumptions and Challenges of Capturing Human Labels
@@ -1626,6 +1742,7 @@
2024.naacl-long.126
fleisig-etal-2024-perspectivist
10.18653/v1/2024.naacl-long.126
+
Principles from Clinical Research for NLP Model Generalization
@@ -1638,6 +1755,7 @@
2024.naacl-long.127
elangovan-etal-2024-principles
10.18653/v1/2024.naacl-long.127
+
First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
@@ -1663,6 +1781,7 @@
2024.naacl-long.129
tang-etal-2024-found
10.18653/v1/2024.naacl-long.129
+
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning
@@ -1678,6 +1797,7 @@
2024.naacl-long.130
wu-etal-2024-language
10.18653/v1/2024.naacl-long.130
+
POLYIE: A Dataset of Information Extraction from Polymer Material Scientific Literature
@@ -1694,6 +1814,7 @@
2024.naacl-long.131
cheung-etal-2024-polyie
10.18653/v1/2024.naacl-long.131
+
LLM-based Medical Assistant Personalization with Short- and Long-Term Memory Coordination
@@ -1706,6 +1827,7 @@
2024.naacl-long.132
zhang-etal-2024-llm-based
10.18653/v1/2024.naacl-long.132
+
SumTra: A Differentiable Pipeline for Few-Shot Cross-Lingual Summarization
@@ -1717,6 +1839,7 @@
2024.naacl-long.133
parnell-etal-2024-sumtra
10.18653/v1/2024.naacl-long.133
+
KTRL+F: Knowledge-Augmented In-Document Search
@@ -1730,6 +1853,7 @@
2024.naacl-long.134
oh-etal-2024-ktrl
10.18653/v1/2024.naacl-long.134
+
How Well Do Large Language Models Truly Ground?
@@ -1745,6 +1869,7 @@
2024.naacl-long.135
lee-etal-2024-well
10.18653/v1/2024.naacl-long.135
+
ALBA: Adaptive Language-Based Assessments for Mental Health
@@ -1757,6 +1882,7 @@
2024.naacl-long.136
varadarajan-etal-2024-alba
10.18653/v1/2024.naacl-long.136
+
FREB-TQA: A Fine-Grained Robustness Evaluation Benchmark for Table Question Answering
@@ -1769,6 +1895,7 @@
2024.naacl-long.137
zhou-etal-2024-freb
10.18653/v1/2024.naacl-long.137
+
MILL: Mutual Verification with Large Language Models for Zero-Shot Query Expansion
@@ -1784,6 +1911,7 @@
2024.naacl-long.138
jia-etal-2024-mill
10.18653/v1/2024.naacl-long.138
+
Efficient Benchmarking (of Language Models)
@@ -1801,6 +1929,7 @@
2024.naacl-long.139
perlitz-etal-2024-efficient
10.18653/v1/2024.naacl-long.139
+
ReFACT: Updating Text-to-Image Models by Editing the Text Encoder
@@ -1812,6 +1941,7 @@
2024.naacl-long.140
arad-etal-2024-refact
10.18653/v1/2024.naacl-long.140
+
A Likelihood Ratio Test of Genetic Relationship among Languages
@@ -1822,6 +1952,7 @@
2024.naacl-long.141
akavarapu-bhattacharya-2024-likelihood
10.18653/v1/2024.naacl-long.141
+
PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning
@@ -1836,6 +1967,7 @@
2024.naacl-long.142
zhu-etal-2024-pad
10.18653/v1/2024.naacl-long.142
+
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
@@ -1855,6 +1987,7 @@
2024.naacl-long.143
ahuja-etal-2024-megaverse
10.18653/v1/2024.naacl-long.143
+
Unlocking Emergent Modularity in Large Language Models
@@ -1866,6 +1999,7 @@
2024.naacl-long.144
qiu-etal-2024-unlocking
10.18653/v1/2024.naacl-long.144
+
A School Student Essay Corpus for Analyzing Interactions of Argumentative Structure and Quality
@@ -1880,6 +2014,7 @@
2024.naacl-long.145
stahl-etal-2024-school
10.18653/v1/2024.naacl-long.145
+
Adjusting Interpretable Dimensions in Embedding Space with Human Judgments
@@ -1890,6 +2025,7 @@
2024.naacl-long.146
erk-apidianaki-2024-adjusting
10.18653/v1/2024.naacl-long.146
+
PatentEval: Understanding Errors in Patent Generation
@@ -1902,6 +2038,7 @@
2024.naacl-long.147
zuo-etal-2024-patenteval
10.18653/v1/2024.naacl-long.147
+
Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing
@@ -1914,6 +2051,7 @@
2024.naacl-long.148
koneru-etal-2024-contextual
10.18653/v1/2024.naacl-long.148
+
Metaphor Detection with Context Enhancement and Curriculum Learning
@@ -1924,6 +2062,7 @@
2024.naacl-long.149
jia-li-2024-metaphor
10.18653/v1/2024.naacl-long.149
+
What Causes the Failure of Explicit to Implicit Discourse Relation Recognition?
@@ -1935,6 +2074,7 @@
2024.naacl-long.150
liu-etal-2024-causes
10.18653/v1/2024.naacl-long.150
+
UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions
@@ -1952,6 +2092,7 @@
2024.naacl-long.151
arora-etal-2024-universlu
10.18653/v1/2024.naacl-long.151
+
How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities
@@ -1964,6 +2105,7 @@
2024.naacl-long.152
mo-etal-2024-trustworthy
10.18653/v1/2024.naacl-long.152
+
Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models
@@ -1977,6 +2119,7 @@
2024.naacl-long.153
zhou-etal-2024-paraphrase
10.18653/v1/2024.naacl-long.153
+
TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale
@@ -1991,6 +2134,7 @@
2024.naacl-long.154
jiang-etal-2024-trisum
10.18653/v1/2024.naacl-long.154
+
GenRES: Rethinking Evaluation for Generative Relation Extraction in the Era of Large Language Models
@@ -2004,6 +2148,7 @@
2024.naacl-long.155
jiang-etal-2024-genres
10.18653/v1/2024.naacl-long.155
+
Curated Datasets and Neural Models for Machine Translation of Informal Registers between Mayan and Spanish Vernaculars
@@ -2016,6 +2161,7 @@
2024.naacl-long.156
lou-etal-2024-curated
10.18653/v1/2024.naacl-long.156
+
The Effect of Data Partitioning Strategy on Model Generalizability: A Case Study of Morphological Segmentation
@@ -2026,6 +2172,7 @@
2024.naacl-long.157
liu-dorr-2024-effect
10.18653/v1/2024.naacl-long.157
+
Measuring Entrainment in Spontaneous Code-switched Speech
@@ -2038,6 +2185,7 @@
2024.naacl-long.158
bhattacharya-etal-2024-measuring
10.18653/v1/2024.naacl-long.158
+
A Survey of Meaning Representations – From Theory to Practical Utility
@@ -2052,6 +2200,7 @@
V2 adds acknowledgements, corrects a minor inaccuracy in wording, and fixes an abstract anaphoric reference in Fig 1 and 6.
10.18653/v1/2024.naacl-long.159
+
Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation
@@ -2067,6 +2216,7 @@
2024.naacl-long.160
zhao-etal-2024-mitigating
10.18653/v1/2024.naacl-long.160
+
Evaluating In-Context Learning of Libraries for Code Generation
@@ -2079,6 +2229,7 @@
2024.naacl-long.161
patel-etal-2024-evaluating
10.18653/v1/2024.naacl-long.161
+
Visually-Aware Context Modeling for News Image Captioning
@@ -2090,6 +2241,7 @@
2024.naacl-long.162
qu-etal-2024-visually
10.18653/v1/2024.naacl-long.162
+
Regularized Conventions: Equilibrium Computation as a Model of Pragmatic Reasoning
@@ -2101,6 +2253,7 @@
2024.naacl-long.163
jacob-etal-2024-regularized
10.18653/v1/2024.naacl-long.163
+
TopicGPT: A Prompt-based Topic Modeling Framework
@@ -2114,6 +2267,7 @@
2024.naacl-long.164
pham-etal-2024-topicgpt
10.18653/v1/2024.naacl-long.164
+
ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger
@@ -2127,6 +2281,7 @@
2024.naacl-long.165
li-etal-2024-chatgpt
10.18653/v1/2024.naacl-long.165
+
Social Meme-ing: Measuring Linguistic Variation in Memes
@@ -2140,6 +2295,7 @@
Update paper to camera-ready version.
10.18653/v1/2024.naacl-long.166
+
ExpertQA: Expert-Curated Questions and Attributed Answers
@@ -2154,6 +2310,7 @@
2024.naacl-long.167
malaviya-etal-2024-expertqa
10.18653/v1/2024.naacl-long.167
+
What if you said that differently?: How Explanation Formats Affect Human Feedback Efficacy and User Perception
@@ -2166,6 +2323,7 @@
2024.naacl-long.168
malaviya-etal-2024-said
10.18653/v1/2024.naacl-long.168
+
When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels
@@ -2204,6 +2362,7 @@
2024.naacl-long.170
robinson-etal-2024-kreyol
10.18653/v1/2024.naacl-long.170
+
Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models
@@ -2227,6 +2386,7 @@
2024.naacl-long.172
yang-jurgens-2024-modeling
10.18653/v1/2024.naacl-long.172
+
Native Language Identification in Texts: A Survey
@@ -2240,6 +2400,7 @@
2024.naacl-long.173
goswami-etal-2024-native
10.18653/v1/2024.naacl-long.173
+
LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models
@@ -2252,6 +2413,7 @@
2024.naacl-long.174
yang-etal-2024-loretta
10.18653/v1/2024.naacl-long.174
+
Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding
@@ -2266,6 +2428,7 @@
2024.naacl-long.175
mitra-etal-2024-one
10.18653/v1/2024.naacl-long.175
+
Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks
@@ -2290,6 +2453,7 @@
2024.naacl-long.177
zhang-etal-2024-promptfix
10.18653/v1/2024.naacl-long.177
+
Comparing Explanation Faithfulness between Multilingual and Monolingual Fine-tuned Language Models
@@ -2300,6 +2464,7 @@
2024.naacl-long.178
zhao-aletras-2024-comparing
10.18653/v1/2024.naacl-long.178
+
A Pretrainer’s Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
@@ -2319,6 +2484,7 @@
2024.naacl-long.179
longpre-etal-2024-pretrainers
10.18653/v1/2024.naacl-long.179
+
Instructional Fingerprinting of Large Language Models
@@ -2348,6 +2514,7 @@
2024.naacl-long.181
salkhordeh-ziabari-etal-2024-reinforced
10.18653/v1/2024.naacl-long.181
+
DynaMo: Accelerating Language Model Inference with Dynamic Multi-Token Sampling
@@ -2362,6 +2529,7 @@
2024.naacl-long.182
tuli-etal-2024-dynamo
10.18653/v1/2024.naacl-long.182
+
Few-shot Knowledge Graph Relational Reasoning via Subgraph Adaptation
@@ -2374,6 +2542,7 @@
2024.naacl-long.183
liu-etal-2024-shot
10.18653/v1/2024.naacl-long.183
+
Uncertainty Quantification for In-Context Learning of Large Language Models
@@ -2395,6 +2564,7 @@
2024.naacl-long.184
ling-etal-2024-uncertainty
10.18653/v1/2024.naacl-long.184
+
HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM
@@ -2414,6 +2584,7 @@
2024.naacl-long.185
wang-etal-2024-helpsteer
10.18653/v1/2024.naacl-long.185
+
A Preference-driven Paradigm for Enhanced Translation with Large Language Models
@@ -2428,6 +2599,7 @@
2024.naacl-long.186
zhu-etal-2024-preference
10.18653/v1/2024.naacl-long.186
+
Fair Abstractive Summarization of Diverse Perspectives
@@ -2448,6 +2620,7 @@
2024.naacl-long.187
zhang-etal-2024-fair
10.18653/v1/2024.naacl-long.187
+
What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases
@@ -2462,6 +2635,7 @@
2024.naacl-long.188
tiong-etal-2024-measuring
10.18653/v1/2024.naacl-long.188
+
Show Your Work with Confidence: Confidence Bands for Tuning Curves
@@ -2473,6 +2647,7 @@
2024.naacl-long.189
lourie-etal-2024-show
10.18653/v1/2024.naacl-long.189
+
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives
@@ -2490,6 +2665,7 @@
2024.naacl-long.190
prabhakaran-etal-2024-grasp
10.18653/v1/2024.naacl-long.190
+
Event Causality Is Key to Computational Story Understanding
@@ -2501,6 +2677,7 @@
2024.naacl-long.191
sun-etal-2024-event
10.18653/v1/2024.naacl-long.191
+
Subspace Representations for Soft Set Operations and Sentence Similarities
@@ -2513,6 +2690,7 @@
2024.naacl-long.192
ishibashi-etal-2024-subspace
10.18653/v1/2024.naacl-long.192
+
My Heart Skipped a Beat! Recognizing Expressions of Embodied Emotion in Natural Language
@@ -2524,6 +2702,7 @@
2024.naacl-long.193
zhuang-etal-2024-heart
10.18653/v1/2024.naacl-long.193
+
Low-Cost Generation and Evaluation of Dictionary Example Sentences
@@ -2536,6 +2715,7 @@
2024.naacl-long.194
cai-etal-2024-low
10.18653/v1/2024.naacl-long.194
+
Making Language Models Better Tool Learners with Execution Feedback
@@ -2550,6 +2730,7 @@
2024.naacl-long.195
qiao-etal-2024-making
10.18653/v1/2024.naacl-long.195
+
Complex Claim Verification with Evidence Retrieved in the Wild
@@ -2563,6 +2744,7 @@
2024.naacl-long.196
chen-etal-2024-complex
10.18653/v1/2024.naacl-long.196
+
Multimodal Multi-loss Fusion Network for Sentiment Analysis
@@ -2575,6 +2757,7 @@
2024.naacl-long.197
wu-etal-2024-multimodal
10.18653/v1/2024.naacl-long.197
+
Confronting LLMs with Traditional ML: Rethinking the Fairness of Large Language Models in Tabular Classifications
@@ -2587,6 +2770,7 @@
2024.naacl-long.198
liu-etal-2024-confronting
10.18653/v1/2024.naacl-long.198
+
Analyzing the Use of Metaphors in News Editorials for Political Framing
@@ -2599,6 +2783,7 @@
2024.naacl-long.199
sengupta-etal-2024-analyzing
10.18653/v1/2024.naacl-long.199
+
SharpSeq: Empowering Continual Event Detection through Sharpness-Aware Sequential-task Learning
@@ -2613,6 +2798,7 @@
2024.naacl-long.200
le-etal-2024-sharpseq
10.18653/v1/2024.naacl-long.200
+
Dissecting Paraphrases: The Impact of Prompt Syntax and supplementary Information on Knowledge Retrieval from Pretrained Language Models
@@ -2627,6 +2813,7 @@
2024.naacl-long.201
linzbach-etal-2024-dissecting
10.18653/v1/2024.naacl-long.201
+
Know When To Stop: A Study of Semantic Drift in Text Generation
@@ -2639,6 +2826,7 @@
2024.naacl-long.202
spataru-2024-know
10.18653/v1/2024.naacl-long.202
+
Curriculum Masking in Vision-Language Pretraining to Maximize Cross Modal Interaction
@@ -2649,6 +2837,7 @@
2024.naacl-long.203
tou-sun-2024-curriculum
10.18653/v1/2024.naacl-long.203
+
Elote, Choclo and Mazorca: on the Varieties of Spanish
@@ -2659,6 +2848,7 @@
2024.naacl-long.204
espana-bonet-barron-cedeno-2024-elote
10.18653/v1/2024.naacl-long.204
+
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
@@ -2672,6 +2862,7 @@
2024.naacl-long.205
wang-etal-2024-ada
10.18653/v1/2024.naacl-long.205
+
A Zero-Shot Monolingual Dual Stage Information Retrieval System for Spanish Biomedical Systematic Literature Reviews
@@ -2684,6 +2875,7 @@
2024.naacl-long.206
ofori-boateng-etal-2024-zero
10.18653/v1/2024.naacl-long.206
+
LayoutPointer: A Spatial-Context Adaptive Pointer Network for Visual Information Extraction
@@ -2695,6 +2887,7 @@
2024.naacl-long.207
siyuan-etal-2024-layoutpointer
10.18653/v1/2024.naacl-long.207
+
Long-form evaluation of model editing
@@ -2710,6 +2903,7 @@
2024.naacl-long.208
rosati-etal-2024-long
10.18653/v1/2024.naacl-long.208
+
Analyzing the Role of Semantic Representations in the Era of Large Language Models
@@ -2738,6 +2932,7 @@
2024.naacl-long.210
li-etal-2024-traq
10.18653/v1/2024.naacl-long.210
+
MapGuide: A Simple yet Effective Method to Reconstruct Continuous Language from Brain Activities
@@ -2752,6 +2947,7 @@
2024.naacl-long.211
zhao-etal-2024-mapguide
10.18653/v1/2024.naacl-long.211
+
On-the-fly Definition Augmentation of LLMs for Biomedical NER
@@ -2766,6 +2962,7 @@
2024.naacl-long.212
munnangi-etal-2024-fly
10.18653/v1/2024.naacl-long.212
+
This Land is Your, My Land: Evaluating Geopolitical Bias in Language Models through Territorial Disputes
@@ -2777,6 +2974,7 @@
2024.naacl-long.213
li-etal-2024-land
10.18653/v1/2024.naacl-long.213
+
Set-Aligning Framework for Auto-Regressive Event Temporal Graph Generation
@@ -2789,6 +2987,7 @@
2024.naacl-long.214
tan-etal-2024-set
10.18653/v1/2024.naacl-long.214
+
LanguageFlow: Advancing Diffusion Language Generation with Probabilistic Flows
@@ -2814,6 +3013,7 @@
2024.naacl-long.216
patel-etal-2024-towards
10.18653/v1/2024.naacl-long.216
+
Synthetic Query Generation for Privacy-Preserving Deep Retrieval Systems using Differentially Private Language Models
@@ -2828,6 +3028,7 @@
2024.naacl-long.217
carranza-etal-2024-synthetic
10.18653/v1/2024.naacl-long.217
+
Okay, Let’s Do This! Modeling Event Coreference with Generated Rationales and Knowledge Distillation
@@ -2840,6 +3041,7 @@
2024.naacl-long.218
nath-etal-2024-okay
10.18653/v1/2024.naacl-long.218
+
Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey
@@ -2852,6 +3054,7 @@
2024.naacl-long.219
agrawal-etal-2024-knowledge
10.18653/v1/2024.naacl-long.219
+
Pedagogically Aligned Objectives Create Reliable Automatic Cloze Tests
@@ -2863,6 +3066,7 @@
2024.naacl-long.220
ondov-etal-2024-pedagogically
10.18653/v1/2024.naacl-long.220
+
Take One Step at a Time to Know Incremental Utility of Demonstration: An Analysis on Reranking for Few-Shot In-Context Learning
@@ -2889,6 +3093,7 @@
2024.naacl-long.222
han-etal-2024-lm
10.18653/v1/2024.naacl-long.222
+
CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to Guardrail Models for Virtual Assistants
@@ -2901,6 +3106,7 @@
2024.naacl-long.223
sun-etal-2024-conscendi
10.18653/v1/2024.naacl-long.223
+
Advancing Beyond Identification: Multi-bit Watermark for Large Language Models
@@ -2912,6 +3118,7 @@
2024.naacl-long.224
yoo-etal-2024-advancing
10.18653/v1/2024.naacl-long.224
+
HTCCN: Temporal Causal Convolutional Networks with Hawkes Process for Extrapolation Reasoning in Temporal Knowledge Graphs
@@ -2926,6 +3133,7 @@
2024.naacl-long.225
chen-etal-2024-htccn
10.18653/v1/2024.naacl-long.225
+
SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation
@@ -2944,6 +3152,7 @@
2024.naacl-long.226
hou-etal-2024-semstamp
10.18653/v1/2024.naacl-long.226
+
Media Bias Detection Across Families of Language Models
@@ -2956,6 +3165,7 @@
2024.naacl-long.227
maab-etal-2024-media
10.18653/v1/2024.naacl-long.227
+
Better Zero-Shot Reasoning with Role-Play Prompting
@@ -2973,6 +3183,7 @@
2024.naacl-long.228
kong-etal-2024-better
10.18653/v1/2024.naacl-long.228
+
Event-Content-Oriented Dialogue Generation in Short Video
@@ -2986,6 +3197,7 @@
2024.naacl-long.229
cheng-etal-2024-event
10.18653/v1/2024.naacl-long.229
+
DoG-Instruct: Towards Premium Instruction-Tuning Data via Text-Grounded Instruction Wrapping
@@ -2999,6 +3211,7 @@
2024.naacl-long.230
chen-etal-2024-dog
10.18653/v1/2024.naacl-long.230
+
Beyond Borders: Investigating Cross-Jurisdiction Transfer in Legal Case Summarization
@@ -3011,6 +3224,7 @@
2024.naacl-long.231
t-y-s-s-etal-2024-beyond
10.18653/v1/2024.naacl-long.231
+
EDC: Effective and Efficient Dialog Comprehension For Dialog State Tracking
@@ -3022,6 +3236,7 @@
2024.naacl-long.232
lu-etal-2024-edc
10.18653/v1/2024.naacl-long.232
+
Automatic Restoration of Diacritics for Speech Data Sets
@@ -3033,6 +3248,7 @@
2024.naacl-long.233
shatnawi-etal-2024-automatic
10.18653/v1/2024.naacl-long.233
+
XNLIeu: a dataset for cross-lingual NLI in Basque
@@ -3047,6 +3263,7 @@
2024.naacl-long.234
heredia-etal-2024-xnlieu
10.18653/v1/2024.naacl-long.234
+
MDR: Model-Specific Demonstration Retrieval at Inference Time for In-Context Learning
@@ -3063,6 +3280,7 @@
2024.naacl-long.235
wang-etal-2024-mdr
10.18653/v1/2024.naacl-long.235
+
Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis
@@ -3080,6 +3298,7 @@
Minor updates.
10.18653/v1/2024.naacl-long.236
+
Enhancing Contextual Understanding in Large Language Models through Contrastive Decoding
@@ -3092,6 +3311,7 @@
2024.naacl-long.237
zhao-etal-2024-enhancing
10.18653/v1/2024.naacl-long.237
+
Generalizable Sarcasm Detection is Just Around the Corner, of Course!
@@ -3102,6 +3322,7 @@
2024.naacl-long.238
jang-frassinelli-2024-generalizable
10.18653/v1/2024.naacl-long.238
+
Encoding of lexical tone in self-supervised models of spoken language
@@ -3115,6 +3336,7 @@
2024.naacl-long.239
shen-etal-2024-encoding
10.18653/v1/2024.naacl-long.239
+
A Systematic Comparison of Contextualized Word Embeddings for Lexical Semantic Change
@@ -3137,6 +3359,7 @@
2024.naacl-long.241
xu-etal-2024-iacos
10.18653/v1/2024.naacl-long.241
+
Rectifying Demonstration Shortcut in In-Context Learning
@@ -3150,6 +3373,7 @@
2024.naacl-long.242
jang-etal-2024-rectifying
10.18653/v1/2024.naacl-long.242
+
Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark
@@ -3171,6 +3395,7 @@
2024.naacl-long.243
mayhew-etal-2024-universal
10.18653/v1/2024.naacl-long.243
+
ODD: A Benchmark Dataset for the Natural Language Processing Based Opioid Related Aberrant Behavior Detection
@@ -3189,6 +3414,7 @@
2024.naacl-long.244
kwon-etal-2024-odd
10.18653/v1/2024.naacl-long.244
+
A Comprehensive Study of Gender Bias in Chemical Named Entity Recognition Models
@@ -3200,6 +3426,7 @@
2024.naacl-long.245
zhao-etal-2024-comprehensive
10.18653/v1/2024.naacl-long.245
+
The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education
@@ -3213,6 +3440,7 @@
2024.naacl-long.246
xu-etal-2024-promises
10.18653/v1/2024.naacl-long.246
+
Differentially Private Next-Token Prediction of Large Language Models
@@ -3224,6 +3452,7 @@
2024.naacl-long.247
flemings-etal-2024-differentially
10.18653/v1/2024.naacl-long.247
+
Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset
@@ -3235,6 +3464,7 @@
2024.naacl-long.248
goldzycher-etal-2024-improving
10.18653/v1/2024.naacl-long.248
+
Memory Augmented Language Models through Mixture of Word Experts
@@ -3248,6 +3478,7 @@
2024.naacl-long.249
nogueira-dos-santos-etal-2024-memory
10.18653/v1/2024.naacl-long.249
+
Impossible Distillation for Paraphrasing and Summarization: How to Make High-quality Lemonade out of Small, Low-quality Model
@@ -3264,6 +3495,7 @@
2024.naacl-long.250
jung-etal-2024-impossible
10.18653/v1/2024.naacl-long.250
+
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
@@ -3286,6 +3518,7 @@
2024.naacl-long.251
tang-etal-2024-tofueval
10.18653/v1/2024.naacl-long.251
+
MOKA: Moral Knowledge Augmentation for Moral Event Extraction
@@ -3298,6 +3531,7 @@
2024.naacl-long.252
zhang-etal-2024-moka
10.18653/v1/2024.naacl-long.252
+
Fixing Rogue Memorization in Many-to-One Multilingual Translators of Extremely-Low-Resource Languages by Rephrasing Training Samples
@@ -3310,6 +3544,7 @@
2024.naacl-long.253
cavalin-etal-2024-fixing
10.18653/v1/2024.naacl-long.253
+
Backdoor Attacks on Multilingual Machine Translation
@@ -3323,6 +3558,7 @@
2024.naacl-long.254
wang-etal-2024-backdoor
10.18653/v1/2024.naacl-long.254
+
Personalized Jargon Identification for Enhanced Interdisciplinary Communication
@@ -3338,6 +3574,7 @@
2024.naacl-long.255
guo-etal-2024-personalized
10.18653/v1/2024.naacl-long.255
+
Flames: Benchmarking Value Alignment of LLMs in Chinese
@@ -3358,6 +3595,7 @@
2024.naacl-long.256
huang-etal-2024-flames
10.18653/v1/2024.naacl-long.256
+
Mitigating Bias for Question Answering Models by Tracking Bias Influence
@@ -3388,6 +3626,7 @@
2024.naacl-long.258
kim-etal-2024-extending
10.18653/v1/2024.naacl-long.258
+
Generating Attractive and Authentic Copywriting from Customer Reviews
@@ -3398,6 +3637,7 @@
2024.naacl-long.259
lin-ma-2024-generating
10.18653/v1/2024.naacl-long.259
+
Effective Long-Context Scaling of Foundation Models
@@ -3427,6 +3667,7 @@
2024.naacl-long.260
xiong-etal-2024-effective
10.18653/v1/2024.naacl-long.260
+
Empowering Diffusion Models on the Embedding Space for Text Generation
@@ -3456,6 +3697,7 @@
2024.naacl-long.262
xia-etal-2024-aligning
10.18653/v1/2024.naacl-long.262
+
Fake Alignment: Are LLMs Really Aligned Well?
@@ -3474,6 +3716,7 @@
2024.naacl-long.263
wang-etal-2024-fake
10.18653/v1/2024.naacl-long.263
+
Visually Guided Generative Text-Layout Pre-training for Document Intelligence
@@ -3489,6 +3732,7 @@
2024.naacl-long.264
mao-etal-2024-visually
10.18653/v1/2024.naacl-long.264
+
HILL: Hierarchy-aware Information Lossless Contrastive Learning for Hierarchical Text Classification
@@ -3505,6 +3749,7 @@
2024.naacl-long.265
zhu-etal-2024-hill
10.18653/v1/2024.naacl-long.265
+
Investigating the Emergent Audio Classification Ability of ASR Foundation Models
@@ -3517,6 +3762,7 @@
2024.naacl-long.266
ma-etal-2024-investigating
10.18653/v1/2024.naacl-long.266
+
In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax
@@ -3529,6 +3775,7 @@
2024.naacl-long.267
mueller-etal-2024-context
10.18653/v1/2024.naacl-long.267
+
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
@@ -3548,6 +3795,7 @@
10.18653/v1/2024.naacl-long.268
Minor updates.
+
Lost in Transcription: Identifying and Quantifying the Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech
@@ -3563,6 +3811,7 @@
2024.naacl-long.269
mujtaba-etal-2024-lost
10.18653/v1/2024.naacl-long.269
+
MAFALDA: A Benchmark and Comprehensive Study of Fallacy Detection and Classification
@@ -3576,6 +3825,7 @@
2024.naacl-long.270
helwe-etal-2024-mafalda
10.18653/v1/2024.naacl-long.270
+
Diffusion Glancing Transformer for Parallel Sequence-to-Sequence Learning
@@ -3588,6 +3838,7 @@
2024.naacl-long.271
qian-etal-2024-diffusion
10.18653/v1/2024.naacl-long.271
+
No Context Needed: Contextual Quandary In Idiomatic Reasoning With Pre-Trained Language Models
@@ -3598,6 +3849,7 @@
2024.naacl-long.272
cheng-bhat-2024-context
10.18653/v1/2024.naacl-long.272
+
Multi-stage Retrieve and Re-rank Model for Automatic Medical Coding Recommendation
@@ -3609,6 +3861,7 @@
2024.naacl-long.273
wang-etal-2024-multi
10.18653/v1/2024.naacl-long.273
+
Anisotropy is Not Inherent to Transformers
@@ -3619,6 +3872,7 @@
2024.naacl-long.274
machina-mercer-2024-anisotropy
10.18653/v1/2024.naacl-long.274
+
Finding Replicable Human Evaluations via Stable Ranking Probability
@@ -3633,6 +3887,7 @@
2024.naacl-long.275
riley-etal-2024-finding
10.18653/v1/2024.naacl-long.275
+
Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections
@@ -3644,6 +3899,7 @@
2024.naacl-long.276
cao-etal-2024-stealthy
10.18653/v1/2024.naacl-long.276
+
Generalizable and Stable Finetuning of Pretrained Language Models on Low-Resource Texts
@@ -3657,6 +3913,7 @@
2024.naacl-long.277
somayajula-etal-2024-generalizable
10.18653/v1/2024.naacl-long.277
+
Detecting Bipolar Disorder from Misdiagnosed Major Depressive Disorder with Mood-Aware Multi-Task Learning
@@ -3672,6 +3929,7 @@
2024.naacl-long.278
lee-etal-2024-detecting-bipolar
10.18653/v1/2024.naacl-long.278
+
Leveraging Code to Improve In-Context Learning for Semantic Parsing
@@ -3684,6 +3942,7 @@
2024.naacl-long.279
bogin-etal-2024-leveraging
10.18653/v1/2024.naacl-long.279
+
Improving Pre-trained Language Model Sensitivity via Mask Specific losses: A case study on Biomedical NER
@@ -3697,6 +3956,7 @@
2024.naacl-long.280
abaho-etal-2024-improving
10.18653/v1/2024.naacl-long.280
+
Language Models Implement Simple Word2Vec-style Vector Arithmetic
@@ -3708,6 +3968,7 @@
2024.naacl-long.281
merullo-etal-2024-language
10.18653/v1/2024.naacl-long.281
+
AutoLoRA: Automatically Tuning Matrix Ranks in Low-Rank Adaptation Based on Meta Learning
@@ -3720,6 +3981,7 @@
2024.naacl-long.282
zhang-etal-2024-autolora
10.18653/v1/2024.naacl-long.282
+
SportQA: A Benchmark for Sports Understanding in Large Language Models
@@ -3738,6 +4000,7 @@
2024.naacl-long.283
xia-etal-2024-sportqa
10.18653/v1/2024.naacl-long.283
+
Revisiting subword tokenization: A case study on affixal negation in large language models
@@ -3751,6 +4014,7 @@
2024.naacl-long.284
truong-etal-2024-revisiting
10.18653/v1/2024.naacl-long.284
+
Generating Mental Health Transcripts with SAPE (Spanish Adaptive Prompt Engineering)
@@ -3765,6 +4029,7 @@
2024.naacl-long.285
lozoya-etal-2024-generating
10.18653/v1/2024.naacl-long.285
+
Where are you from? Geolocating Speech and Applications to Language Identification
@@ -3779,6 +4044,7 @@
2024.naacl-long.286
foley-etal-2024-geolocating
10.18653/v1/2024.naacl-long.286
+
Teaching Language Models to Self-Improve through Interactive Demonstrations
@@ -3792,6 +4058,7 @@
2024.naacl-long.287
yu-etal-2024-teaching
10.18653/v1/2024.naacl-long.287
+
MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets
@@ -3810,6 +4077,7 @@
2024.naacl-long.288
aboutalebi-etal-2024-magid
10.18653/v1/2024.naacl-long.288
+
Zero-shot Generative Linguistic Steganography
@@ -3822,6 +4090,7 @@
2024.naacl-long.289
lin-etal-2024-zero
10.18653/v1/2024.naacl-long.289
+
Does GPT-4 pass the Turing test?
@@ -3832,6 +4101,7 @@
2024.naacl-long.290
jones-bergen-2024-gpt
10.18653/v1/2024.naacl-long.290
+
Polarity Calibration for Opinion Summarization
@@ -3846,6 +4116,7 @@
2024.naacl-long.291
lei-etal-2024-polarity
10.18653/v1/2024.naacl-long.291
+
Sentence-level Media Bias Analysis with Event Relation Graph
@@ -3856,6 +4127,7 @@
2024.naacl-long.292
lei-huang-2024-sentence
10.18653/v1/2024.naacl-long.292
+
EMONA: Event-level Moral Opinions in News Articles
@@ -3883,6 +4155,7 @@
2024.naacl-long.294
gao-etal-2024-dlm
10.18653/v1/2024.naacl-long.294
+
You don’t need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments
@@ -3899,6 +4172,7 @@
2024.naacl-long.295
shu-etal-2024-dont
10.18653/v1/2024.naacl-long.295
+
CASA: Causality-driven Argument Sufficiency Assessment
@@ -3910,6 +4184,7 @@
2024.naacl-long.296
liu-etal-2024-casa
10.18653/v1/2024.naacl-long.296
+
MacGyver: Are Large Language Models Creative Problem Solvers?
@@ -3927,6 +4202,7 @@
2024.naacl-long.297
tian-etal-2024-macgyver
10.18653/v1/2024.naacl-long.297
+
To Translate or Not to Translate: A Systematic Investigation of Translation-Based Cross-Lingual Transfer to Low-Resource Languages
@@ -3937,6 +4213,7 @@
2024.naacl-long.298
ebing-glavas-2024-translate
10.18653/v1/2024.naacl-long.298
+
Enhancing Large Language Models Against Inductive Instructions with Dual-critique Prompting
@@ -3952,6 +4229,7 @@
2024.naacl-long.299
wang-etal-2024-enhancing
10.18653/v1/2024.naacl-long.299
+
GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer
@@ -3964,6 +4242,7 @@
2024.naacl-long.300
zaratiana-etal-2024-gliner
10.18653/v1/2024.naacl-long.300
+
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
@@ -3978,6 +4257,7 @@
2024.naacl-long.301
rottger-etal-2024-xstest
10.18653/v1/2024.naacl-long.301
+
Carpe diem: On the Evaluation of World Knowledge in Lifelong Language Models
@@ -4022,6 +4302,7 @@
2024.naacl-long.304
cai-etal-2024-dialogvcs
10.18653/v1/2024.naacl-long.304
+
LLatrieval: LLM-Verified Retrieval for Verifiable Generation
@@ -4036,6 +4317,7 @@
2024.naacl-long.305
li-etal-2024-llatrieval
10.18653/v1/2024.naacl-long.305
+
Mapping Long-term Causalities in Psychiatric Symptomatology and Life Events from Social Media
@@ -4053,6 +4335,7 @@
2024.naacl-long.306
chen-etal-2024-mapping
10.18653/v1/2024.naacl-long.306
+
Multimodal Chart Retrieval: A Comparison of Text, Table and Image Based Approaches
@@ -4064,6 +4347,7 @@
2024.naacl-long.307
nowak-etal-2024-multimodal
10.18653/v1/2024.naacl-long.307
+
Retrieval Helps or Hurts? A Deeper Dive into the Efficacy of Retrieval Augmentation to Language Models
@@ -4076,6 +4360,7 @@
2024.naacl-long.308
maekawa-etal-2024-retrieval
10.18653/v1/2024.naacl-long.308
+
AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs
@@ -4094,6 +4379,7 @@
2024.naacl-long.309
fathullah-etal-2024-audiochatllama
10.18653/v1/2024.naacl-long.309
+
Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness
@@ -4107,6 +4393,7 @@
2024.naacl-long.310
gupta-etal-2024-whispers
10.18653/v1/2024.naacl-long.310
+
Sequential Compositional Generalization in Multimodal Models
@@ -4121,6 +4408,7 @@
2024.naacl-long.311
yagcioglu-etal-2024-sequential
10.18653/v1/2024.naacl-long.311
+
Generating Uncontextualized and Contextualized Questions for Document-Level Event Argument Extraction
@@ -4133,6 +4421,7 @@
2024.naacl-long.312
uddin-etal-2024-generating
10.18653/v1/2024.naacl-long.312
+
Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation
@@ -4147,6 +4436,7 @@
2024.naacl-long.313
yue-etal-2024-evidence
10.18653/v1/2024.naacl-long.313
+
Open-Vocabulary Federated Learning with Multimodal Prototyping
@@ -4158,6 +4448,7 @@
2024.naacl-long.314
zeng-etal-2024-open
10.18653/v1/2024.naacl-long.314
+
Exploring Key Point Analysis with Pairwise Generation and Graph Partitioning
@@ -4172,6 +4463,7 @@
2024.naacl-long.315
li-etal-2024-exploring
10.18653/v1/2024.naacl-long.315
+
Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense
@@ -4186,6 +4478,7 @@
2024.naacl-long.316
shen-etal-2024-understanding
10.18653/v1/2024.naacl-long.316
+
Code Models are Zero-shot Precondition Reasoners
@@ -4202,6 +4495,7 @@
2024.naacl-long.317
logeswaran-etal-2024-code
10.18653/v1/2024.naacl-long.317
+
Contrastive and Consistency Learning for Neural Noisy-Channel Model in Spoken Language Understanding
@@ -4213,6 +4507,7 @@
2024.naacl-long.318
kim-etal-2024-contrastive
10.18653/v1/2024.naacl-long.318
+
Do Large Language Models Rank Fairly? An Empirical Study on the Fairness of LLMs as Rankers
@@ -4226,6 +4521,7 @@
2024.naacl-long.319
wang-etal-2024-large
10.18653/v1/2024.naacl-long.319
+
TabSQLify: Enhancing Reasoning Capabilities of LLMs Through Table Decomposition
@@ -4236,6 +4532,7 @@
2024.naacl-long.320
nahid-rafiei-2024-tabsqlify
10.18653/v1/2024.naacl-long.320
+
Contextual Label Projection for Cross-Lingual Structured Prediction
@@ -4249,6 +4546,7 @@
2024.naacl-long.321
parekh-etal-2024-contextual
10.18653/v1/2024.naacl-long.321
+
Event Detection from Social Media for Epidemic Prediction
@@ -4268,6 +4566,7 @@
2024.naacl-long.322
parekh-etal-2024-event
10.18653/v1/2024.naacl-long.322
+
RESPROMPT: Residual Connection Prompting Advances Multi-Step Reasoning in Large Language Models
@@ -4287,6 +4586,7 @@
2024.naacl-long.323
jiang-etal-2024-resprompt
10.18653/v1/2024.naacl-long.323
+
BPE-knockout: Pruning Pre-existing BPE Tokenisers with Backwards-compatible Morphological Semi-supervision
@@ -4297,6 +4597,7 @@
2024.naacl-long.324
bauwens-delobelle-2024-bpe
10.18653/v1/2024.naacl-long.324
+
How are Prompts Different in Terms of Sensitivity?
@@ -4321,6 +4622,7 @@
2024.naacl-long.326
ye-etal-2024-lstdial
10.18653/v1/2024.naacl-long.326
+
The ART of LLM Refinement: Ask, Refine, and Trust
@@ -4338,6 +4640,7 @@
2024.naacl-long.327
shridhar-etal-2024-art
10.18653/v1/2024.naacl-long.327
+
Modularized Multilingual NMT with Fine-grained Interlingua
@@ -4349,6 +4652,7 @@
2024.naacl-long.328
lim-etal-2024-modularized
10.18653/v1/2024.naacl-long.328
+
ParallelPARC: A Scalable Pipeline for Generating Natural-Language Analogies
@@ -4361,6 +4665,7 @@
2024.naacl-long.329
sultan-etal-2024-parallelparc
10.18653/v1/2024.naacl-long.329
+
AWESOME: GPU Memory-constrained Long Document Summarization using Memory Mechanism and Global Salient Content
@@ -4371,6 +4676,7 @@
2024.naacl-long.330
cao-wang-2024-awesome
10.18653/v1/2024.naacl-long.330
+
NLP Systems That Can’t Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps
@@ -4384,6 +4690,7 @@
2024.naacl-long.331
gligoric-etal-2024-nlp
10.18653/v1/2024.naacl-long.331
+
Debiasing with Sufficient Projection: A General Theoretical Framework for Vector Representations
@@ -4396,6 +4703,7 @@
2024.naacl-long.332
shi-etal-2024-debiasing
10.18653/v1/2024.naacl-long.332
+
Semi-Supervised Dialogue Abstractive Summarization via High-Quality Pseudolabel Selection
@@ -4410,6 +4718,7 @@
2024.naacl-long.333
he-etal-2024-semi
10.18653/v1/2024.naacl-long.333
+
AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages
@@ -4476,6 +4785,7 @@
2024.naacl-long.334
wang-etal-2024-afrimte
10.18653/v1/2024.naacl-long.334
+
TableLlama: Towards Open Large Generalist Models for Tables
@@ -4488,6 +4798,7 @@
2024.naacl-long.335
zhang-etal-2024-tablellama
10.18653/v1/2024.naacl-long.335
+
PEMA: An Offsite-Tunable Plug-in External Memory Adaptation for Language Models
@@ -4499,6 +4810,7 @@
2024.naacl-long.336
kim-etal-2024-pema
10.18653/v1/2024.naacl-long.336
+
Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection
@@ -4516,6 +4828,7 @@
2024.naacl-long.337
yan-etal-2024-backdooring
10.18653/v1/2024.naacl-long.337
+
Exploring the Factual Consistency in Dialogue Comprehension of Large Language Models
@@ -4529,6 +4842,7 @@
2024.naacl-long.338
she-etal-2024-exploring
10.18653/v1/2024.naacl-long.338
+
Multilingual Pretraining and Instruction Tuning Improve Cross-Lingual Knowledge Alignment, But Only Shallowly
@@ -4543,6 +4857,7 @@
2024.naacl-long.339
gao-etal-2024-multilingual
10.18653/v1/2024.naacl-long.339
+
A Study on the Calibration of In-context Learning
@@ -4559,6 +4874,7 @@
2024.naacl-long.340
zhang-etal-2024-study
10.18653/v1/2024.naacl-long.340
+
DialogBench: Evaluating LLMs as Human-like Dialogue Systems
@@ -4574,6 +4890,7 @@
2024.naacl-long.341
ou-etal-2024-dialogbench
10.18653/v1/2024.naacl-long.341
+
GINopic: Topic Modeling with Graph Isomorphism Network
@@ -4584,6 +4901,7 @@
2024.naacl-long.342
adhya-sanyal-2024-ginopic
10.18653/v1/2024.naacl-long.342
+
CMB: A Comprehensive Medical Benchmark in Chinese
@@ -4604,6 +4922,7 @@
2024.naacl-long.343
wang-etal-2024-cmb
10.18653/v1/2024.naacl-long.343
+
Massive End-to-end Speech Recognition Models with Time Reduction
@@ -4627,6 +4946,7 @@
2024.naacl-long.344
wang-etal-2024-massive
10.18653/v1/2024.naacl-long.344
+
SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics
@@ -4642,6 +4962,7 @@
2024.naacl-long.345
ardakani-etal-2024-slimfit
10.18653/v1/2024.naacl-long.345
+
Effective Large Language Model Adaptation for Improved Grounding and Citation Generation
@@ -4668,6 +4989,7 @@
2024.naacl-long.347
shao-etal-2024-assisting
10.18653/v1/2024.naacl-long.347
+
Grounding Gaps in Language Model Generations
@@ -4682,6 +5004,7 @@
2024.naacl-long.348
shaikh-etal-2024-grounding
10.18653/v1/2024.naacl-long.348
+
When Does Monolingual Data Help Multilingual Translation: The Role of Domain and Model Scale
@@ -4694,6 +5017,7 @@
2024.naacl-long.349
baziotis-etal-2024-monolingual
10.18653/v1/2024.naacl-long.349
+
ContraSim – Analyzing Neural Representations Based on Contrastive Learning
@@ -4704,6 +5028,7 @@
2024.naacl-long.350
rahamim-belinkov-2024-contrasim
10.18653/v1/2024.naacl-long.350
+
Universal Prompt Optimizer for Safe Text-to-Image Generation
@@ -4721,6 +5046,7 @@
Minor updates.
10.18653/v1/2024.naacl-long.351
This revision corrects a typo in Eq 3 and 4, and the corresponding text and figure. Also, correct some typos in Section 4.
+
Language Model Based Unsupervised Dependency Parsing with Conditional Mutual Information and Grammatical Constraints
@@ -4732,6 +5058,7 @@
2024.naacl-long.352
chen-etal-2024-language
10.18653/v1/2024.naacl-long.352
+
The Bias Amplification Paradox in Text-to-Image Generation
@@ -4743,6 +5070,7 @@
2024.naacl-long.353
seshadri-etal-2024-bias
10.18653/v1/2024.naacl-long.353
+
Grammar-based Data Augmentation for Low-Resource Languages: The Case of Guarani-Spanish Neural Machine Translation
@@ -4757,6 +5085,7 @@
2024.naacl-long.354
lucas-etal-2024-grammar
10.18653/v1/2024.naacl-long.354
+
Global Gallery: The Fine Art of Painting Culture Portraits through Multilingual Instruction Tuning
@@ -4769,6 +5098,7 @@
2024.naacl-long.355
mukherjee-etal-2024-global
10.18653/v1/2024.naacl-long.355
+
Toward Interactive Regional Understanding in Vision-Large Language Models
@@ -4780,6 +5110,7 @@
2024.naacl-long.356
lee-etal-2024-toward
10.18653/v1/2024.naacl-long.356
+
ScriptMix: Mixing Scripts for Low-resource Language Parsing
@@ -4791,6 +5122,7 @@
2024.naacl-long.357
lee-etal-2024-scriptmix
10.18653/v1/2024.naacl-long.357
+
MT-PATCHER: Selective and Extendable Knowledge Distillation from Large Language Models for Machine Translation
@@ -4803,6 +5135,7 @@
2024.naacl-long.358
li-etal-2024-mt
10.18653/v1/2024.naacl-long.358
+
ToXCL: A Unified Framework for Toxic Speech Detection and Explanation
@@ -4816,6 +5149,7 @@
2024.naacl-long.359
hoang-etal-2024-toxcl
10.18653/v1/2024.naacl-long.359
+
LinkPrompt: Natural and Universal Adversarial Attacks on Prompt-based Language Models
@@ -4826,6 +5160,7 @@
2024.naacl-long.360
xu-wang-2024-linkprompt
10.18653/v1/2024.naacl-long.360
+
CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions
@@ -4839,6 +5174,7 @@
2024.naacl-long.361
zhang-etal-2024-coe
10.18653/v1/2024.naacl-long.361
+
ContraDoc: Understanding Self-Contradictions in Documents with Large Language Models
@@ -4850,6 +5186,7 @@
2024.naacl-long.362
li-etal-2024-contradoc
10.18653/v1/2024.naacl-long.362
+
Entity Disambiguation via Fusion Entity Decoding
@@ -4866,6 +5203,7 @@
2024.naacl-long.363
wang-etal-2024-entity
10.18653/v1/2024.naacl-long.363
+
PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers
@@ -4877,6 +5215,7 @@
2024.naacl-long.364
lee-etal-2024-planrag
10.18653/v1/2024.naacl-long.364
+
GPTScore: Evaluate as You Desire
@@ -4903,6 +5242,7 @@
2024.naacl-long.366
geng-etal-2024-survey
10.18653/v1/2024.naacl-long.366
+
Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References
@@ -4919,6 +5259,7 @@
2024.naacl-long.367
tang-etal-2024-metrics
10.18653/v1/2024.naacl-long.367
+
Separation and Fusion: A Novel Multiple Token Linking Model for Event Argument Extraction
@@ -4936,6 +5277,7 @@
2024.naacl-long.368
xu-etal-2024-separation
10.18653/v1/2024.naacl-long.368
+
The Integration of Semantic and Structural Knowledge in Knowledge Graph Entity Typing
@@ -4950,6 +5292,7 @@
2024.naacl-long.369
li-etal-2024-integration
10.18653/v1/2024.naacl-long.369
+
ComCLIP: Training-Free Compositional Image and Text Matching
@@ -4962,6 +5305,7 @@
2024.naacl-long.370
jiang-etal-2024-comclip
10.18653/v1/2024.naacl-long.370
+
ACLSum: A New Dataset for Aspect-based Summarization of Scientific Publications
@@ -4975,6 +5319,7 @@
2024.naacl-long.371
takeshita-etal-2024-aclsum
10.18653/v1/2024.naacl-long.371
+
XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners
@@ -4991,6 +5336,7 @@
2024.naacl-long.372
luo-etal-2024-xal
10.18653/v1/2024.naacl-long.372
+
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?
@@ -5007,6 +5353,7 @@
2024.naacl-long.373
wang-etal-2024-ladic
10.18653/v1/2024.naacl-long.373
+
Intent-conditioned and Non-toxic Counterspeech Generation using Multi-Task Instruction Tuning with RLAIF
@@ -5021,6 +5368,7 @@
2024.naacl-long.374
hengle-etal-2024-intent
10.18653/v1/2024.naacl-long.374
+
Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
@@ -5034,6 +5382,7 @@
2024.naacl-long.375
dong-etal-2024-attacks
10.18653/v1/2024.naacl-long.375
+
Mind’s Mirror: Distilling Self-Evaluation Capability and Comprehensive Thinking from Large Language Models
@@ -5051,6 +5400,7 @@
2024.naacl-long.376
liu-etal-2024-minds
10.18653/v1/2024.naacl-long.376
+
Divergent Token Metrics: Measuring degradation to prune away LLM components – and optimize quantization
@@ -5066,6 +5416,7 @@
2024.naacl-long.377
deiseroth-etal-2024-divergent
10.18653/v1/2024.naacl-long.377
+
Beyond Performance: Quantifying and Mitigating Label Bias in LLMs
@@ -5076,6 +5427,7 @@
2024.naacl-long.378
reif-schwartz-2024-beyond
10.18653/v1/2024.naacl-long.378
+
Instructing Large Language Models to Identify and Ignore Irrelevant Conditions
@@ -5087,6 +5439,7 @@
2024.naacl-long.379
wu-etal-2024-instructing
10.18653/v1/2024.naacl-long.379
+
Lower Bounds on the Expressivity of Recurrent Neural Language Models
@@ -5099,6 +5452,7 @@
2024.naacl-long.380
svete-etal-2024-lower
10.18653/v1/2024.naacl-long.380
+
Transformers Can Represent n-gram Language Models
@@ -5109,6 +5463,7 @@
2024.naacl-long.381
svete-cotterell-2024-transformers
10.18653/v1/2024.naacl-long.381
+
The Role of n-gram Smoothing in the Age of Neural Networks
@@ -5123,6 +5478,7 @@
2024.naacl-long.382
malagutti-etal-2024-role
10.18653/v1/2024.naacl-long.382
+
Reliability Estimation of News Media Sources: Birds of a Feather Flock Together
@@ -5135,6 +5491,7 @@
2024.naacl-long.383
burdisso-etal-2024-reliability
10.18653/v1/2024.naacl-long.383
+
On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons
@@ -5148,6 +5505,7 @@
2024.naacl-long.384
kojima-etal-2024-multilingual
10.18653/v1/2024.naacl-long.384
+
NLP Progress in Indigenous Latin American Languages
@@ -5163,6 +5521,7 @@
2024.naacl-long.385
tonja-etal-2024-nlp
10.18653/v1/2024.naacl-long.385
+
On the Effectiveness of Adversarial Robustness for Abuse Mitigation with Counterspeech
@@ -5173,6 +5532,7 @@
2024.naacl-long.386
chung-bright-2024-effectiveness
10.18653/v1/2024.naacl-long.386
+
Leveraging the Structure of Pre-trained Embeddings to Minimize Annotation Effort
@@ -5183,6 +5543,7 @@
2024.naacl-long.387
gonzalez-gutierrez-quattoni-2024-leveraging
10.18653/v1/2024.naacl-long.387
+
UniArk: Improving Generalisation and Consistency for Factual Knowledge Extraction through Debiasing
@@ -5196,6 +5557,7 @@
2024.naacl-long.388
yang-etal-2024-uniark
10.18653/v1/2024.naacl-long.388
+
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
@@ -5209,6 +5571,7 @@
2024.naacl-long.389
jeong-etal-2024-adaptive
10.18653/v1/2024.naacl-long.389
+
Knowing What LLMs DO NOT Know: A Simple Yet Effective Self-Detection Method
@@ -5226,6 +5589,7 @@
2024.naacl-long.390
zhao-etal-2024-knowing
10.18653/v1/2024.naacl-long.390
+
Are Large Language Model Temporally Grounded?
@@ -5240,6 +5604,7 @@
2024.naacl-long.391
qiu-etal-2024-large
10.18653/v1/2024.naacl-long.391
+
Document Image Machine Translation with Dynamic Multi-pre-trained Models Assembling
@@ -5256,6 +5621,7 @@
2024.naacl-long.392
liang-etal-2024-document
10.18653/v1/2024.naacl-long.392
+
Elastic Weight Removal for Faithful and Abstractive Dialogue Generation
@@ -5269,6 +5635,7 @@
2024.naacl-long.393
daheim-etal-2024-elastic
10.18653/v1/2024.naacl-long.393
+
R-Tuning: Instructing Large Language Models to Say ‘I Don’t Know’
@@ -5286,6 +5653,7 @@
2024.naacl-long.394
zhang-etal-2024-r
10.18653/v1/2024.naacl-long.394
+
Bridging the Gap between Different Vocabularies for LLM Ensemble
@@ -5297,6 +5665,7 @@
2024.naacl-long.395
xu-etal-2024-bridging
10.18653/v1/2024.naacl-long.395
+
KnowLA: Enhancing Parameter-efficient Finetuning with Knowledgeable Adaptation
@@ -5310,6 +5679,7 @@
2024.naacl-long.396
luo-etal-2024-knowla
10.18653/v1/2024.naacl-long.396
+
Extremely Weakly-supervised Text Classification with Wordsets Mining and Sync-Denoising
@@ -5319,6 +5689,7 @@
2024.naacl-long.397
xiao-2024-extremely
10.18653/v1/2024.naacl-long.397
+
F-MALLOC: Feed-forward Memory Allocation for Continual Learning in Neural Machine Translation
@@ -5330,6 +5701,7 @@
2024.naacl-long.398
wu-etal-2024-f
10.18653/v1/2024.naacl-long.398
+
Towards Reducing Diagnostic Errors with Interpretable Risk Prediction
@@ -5355,6 +5727,7 @@
2024.naacl-long.400
singh-thakur-2024-generalizable
10.18653/v1/2024.naacl-long.400
+
Key ingredients for effective zero-shot cross-lingual knowledge transfer in generative tasks
@@ -5365,6 +5738,7 @@
2024.naacl-long.401
chirkova-nikoulina-2024-key
10.18653/v1/2024.naacl-long.401
+
The Impact of Depth on Compositional Generalization in Transformer Language Models
@@ -5379,6 +5753,7 @@
2024.naacl-long.402
petty-etal-2024-impact
10.18653/v1/2024.naacl-long.402
+
Pregnant Questions: The Importance of Pragmatic Awareness in Maternal Health Question Answering
@@ -5394,6 +5769,7 @@
2024.naacl-long.403
srikanth-etal-2024-pregnant
10.18653/v1/2024.naacl-long.403
+
Towards Explainability in Legal Outcome Prediction Models
@@ -5420,6 +5796,7 @@
2024.naacl-long.405
li-etal-2024-steerability
10.18653/v1/2024.naacl-long.405
+
CCSum: A Large-Scale and High-Quality Dataset for Abstractive News Summarization
@@ -5430,6 +5807,7 @@
2024.naacl-long.406
jiang-dreyer-2024-ccsum
10.18653/v1/2024.naacl-long.406
+
Capturing Perspectives of Crowdsourced Annotators in Subjective Learning Tasks
@@ -5444,6 +5822,7 @@
2024.naacl-long.407
mokhberian-etal-2024-capturing
10.18653/v1/2024.naacl-long.407
+
Improving Factual Accuracy of Neural Table-to-Text Output by Addressing Input Problems in ToTTo
@@ -5455,6 +5834,7 @@
2024.naacl-long.408
sundararajan-etal-2024-improving
10.18653/v1/2024.naacl-long.408
+
CERET: Cost-Effective Extrinsic Refinement for Text Generation
@@ -5468,6 +5848,7 @@
2024.naacl-long.409
cai-etal-2024-ceret
10.18653/v1/2024.naacl-long.409
+
Parameter-Efficient Instruction Tuning of Large Language Models For Extreme Financial Numeral Labelling
@@ -5484,6 +5865,7 @@
2024.naacl-long.410
khatuya-etal-2024-parameter
10.18653/v1/2024.naacl-long.410
+
Analysis of State-Level Legislative Process in Enhanced Linguistic and Nationwide Network Contexts
@@ -5494,6 +5876,7 @@
2024.naacl-long.411
davoodi-goldwasser-2024-analysis
10.18653/v1/2024.naacl-long.411
+
DeMuX: Data-efficient Multilingual Learning
@@ -5506,6 +5889,7 @@
2024.naacl-long.412
khanuja-etal-2024-demux
10.18653/v1/2024.naacl-long.412
+
DUQGen: Effective Unsupervised Domain Adaptation of Neural Rankers by Diversifying Synthetic Query Generation
@@ -5517,6 +5901,7 @@
2024.naacl-long.413
chandradevan-etal-2024-duqgen
10.18653/v1/2024.naacl-long.413
+
How did we get here? Summarizing conversation dynamics
@@ -5531,6 +5916,7 @@
2024.naacl-long.414
hua-etal-2024-get
10.18653/v1/2024.naacl-long.414
+
Can Language Model Moderators Improve the Health of Online Discourse?
@@ -5550,6 +5936,7 @@
2024.naacl-long.415
cho-etal-2024-language
10.18653/v1/2024.naacl-long.415
+
LeanReasoner: Boosting Complex Logical Reasoning with Lean
@@ -5575,6 +5962,7 @@
2024.naacl-long.417
wu-2024-uicoder
10.18653/v1/2024.naacl-long.417
+
Measuring Cross-lingual Transfer in Bytes
@@ -5587,6 +5975,7 @@
2024.naacl-long.418
de-souza-etal-2024-measuring
10.18653/v1/2024.naacl-long.418
+
MisgenderMender: A Community-Informed Approach to Interventions for Misgendering
@@ -5598,6 +5987,7 @@
2024.naacl-long.419
hossain-etal-2024-misgendermender
10.18653/v1/2024.naacl-long.419
+
Interplay of Machine Translation, Diacritics, and Diacritization
@@ -5609,6 +5999,7 @@
2024.naacl-long.420
chen-etal-2024-interplay
10.18653/v1/2024.naacl-long.420
+
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning
@@ -5626,6 +6017,7 @@
2024.naacl-long.421
li-etal-2024-quantity
10.18653/v1/2024.naacl-long.421
+
Safer-Instruct: Aligning Language Models with Automated Preference Data
@@ -5637,6 +6029,7 @@
2024.naacl-long.422
shi-etal-2024-safer
10.18653/v1/2024.naacl-long.422
+
PELMS: Pre-training for Effective Low-Shot Multi-Document Summarization
@@ -5648,6 +6041,7 @@
2024.naacl-long.423
peper-etal-2024-pelms
10.18653/v1/2024.naacl-long.423
+
Deceptive Semantic Shortcuts on Reasoning Chains: How Far Can Models Go without Hallucination?
@@ -5662,6 +6056,7 @@
2024.naacl-long.424
li-etal-2024-deceptive
10.18653/v1/2024.naacl-long.424
+
IndiSentiment140: Sentiment Analysis Dataset for Indian Languages with Emphasis on Low-Resource Languages using Machine Translation
@@ -5673,6 +6068,7 @@
2024.naacl-long.425
kumar-etal-2024-indisentiment140
10.18653/v1/2024.naacl-long.425
+
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval
@@ -5687,6 +6083,7 @@
2024.naacl-long.426
thakur-etal-2024-leveraging
10.18653/v1/2024.naacl-long.426
+
SCANNER: Knowledge-Enhanced Approach for Robust Multi-modal Named Entity Recognition of Unseen Entities
@@ -5699,6 +6096,7 @@
2024.naacl-long.427
ok-etal-2024-scanner
10.18653/v1/2024.naacl-long.427
+
A Theory Guided Scaffolding Instruction Framework for LLM-Enabled Metaphor Reasoning
@@ -5710,6 +6108,7 @@
2024.naacl-long.428
tian-etal-2024-theory
10.18653/v1/2024.naacl-long.428
+
Learning to Compress Prompt in Natural Language Formats
@@ -5724,6 +6123,7 @@
2024.naacl-long.429
chuang-etal-2024-learning
10.18653/v1/2024.naacl-long.429
+
Automatic, Meta and Human Evaluation for Multimodal Summarization with Multimodal Output
@@ -5752,6 +6152,7 @@
2024.naacl-long.431
su-etal-2024-naive
10.18653/v1/2024.naacl-long.431
+
Leitner-Guided Memory Replay for Cross-lingual Continual Learning
@@ -5762,6 +6163,7 @@
2024.naacl-long.432
mhamdi-may-2024-leitner
10.18653/v1/2024.naacl-long.432
+
Multilingual Nonce Dependency Treebanks: Understanding how Language Models Represent and Process Syntactic Structure
@@ -5774,6 +6176,7 @@
2024.naacl-long.433
arps-etal-2024-multilingual
10.18653/v1/2024.naacl-long.433
+
Actively Learn from LLMs with Uncertainty Propagation for Generalized Category Discovery
@@ -5787,6 +6190,7 @@
2024.naacl-long.434
liang-etal-2024-actively
10.18653/v1/2024.naacl-long.434
+
Explaining Text Similarity in Transformer Models
@@ -5797,6 +6201,7 @@
2024.naacl-long.435
vasileiou-eberle-2024-explaining
10.18653/v1/2024.naacl-long.435
+
Large Language Models can Contrastively Refine their Generation for Better Sentence Representation Learning
@@ -5810,6 +6215,7 @@
2024.naacl-long.436
wang-etal-2024-large-language
10.18653/v1/2024.naacl-long.436
+
HIL: Hybrid Isotropy Learning for Zero-shot Performance in Dense retrieval
@@ -5821,6 +6227,7 @@
2024.naacl-long.437
kim-etal-2024-hil
10.18653/v1/2024.naacl-long.437
+
SuperGLEBer: German Language Understanding Evaluation Benchmark
@@ -5831,6 +6238,7 @@
2024.naacl-long.438
pfister-hotho-2024-supergleber
10.18653/v1/2024.naacl-long.438
+
“You are an expert annotator”: Automatic Best–Worst-Scaling Annotations for Emotion Intensity Modeling
@@ -5843,6 +6251,7 @@
2024.naacl-long.439
bagdon-etal-2024-expert
10.18653/v1/2024.naacl-long.439
+
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
@@ -5860,6 +6269,7 @@
2024.naacl-long.440
zeng-etal-2024-matters
10.18653/v1/2024.naacl-long.440
+
Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation
@@ -5871,6 +6281,7 @@
2024.naacl-long.441
ruan-etal-2024-defining
10.18653/v1/2024.naacl-long.441
+
MOSAICo: a Multilingual Open-text Semantically Annotated Interlinked Corpus
@@ -5886,6 +6297,7 @@
2024.naacl-long.442
conia-etal-2024-mosaico
10.18653/v1/2024.naacl-long.442
+
SemRoDe: Macro Adversarial Training to Learn Representations that are Robust to Word-Level Attacks
@@ -5899,6 +6311,7 @@
2024.naacl-long.443
formento-etal-2024-semrode
10.18653/v1/2024.naacl-long.443
+
BUST: Benchmark for the evaluation of detectors of LLM-Generated Text
@@ -5912,6 +6325,7 @@
2024.naacl-long.444
cornelius-etal-2024-bust
10.18653/v1/2024.naacl-long.444
+
Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual Alignment
@@ -5924,6 +6338,7 @@
2024.naacl-long.445
li-etal-2024-improving-context
10.18653/v1/2024.naacl-long.445
+
MaCSC: Towards Multimodal-augmented Pre-trained Language Models via Conceptual Prototypes and Self-balancing Calibration
@@ -5938,6 +6353,7 @@
2024.naacl-long.446
zhuang-etal-2024-macsc
10.18653/v1/2024.naacl-long.446
+
Does Pre-trained Language Model Actually Infer Unseen Links in Knowledge Graph Completion?
@@ -5950,6 +6366,7 @@
2024.naacl-long.447
sakai-etal-2024-pre
10.18653/v1/2024.naacl-long.447
+
Discovering Lobby-Parliamentarian Alignments through NLP
@@ -5964,6 +6381,7 @@
2024.naacl-long.448
suresh-etal-2024-discovering
10.18653/v1/2024.naacl-long.448
+
IterCQR: Iterative Conversational Query Reformulation with Retrieval Guidance
@@ -5977,6 +6395,7 @@
2024.naacl-long.449
jang-etal-2024-itercqr
10.18653/v1/2024.naacl-long.449
+
AceGPT, Localizing Large Language Models in Arabic
@@ -6004,6 +6423,7 @@
2024.naacl-long.450
huang-etal-2024-acegpt
10.18653/v1/2024.naacl-long.450
+
Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model
@@ -6019,6 +6439,7 @@
2024.naacl-long.451
he-etal-2024-improving
10.18653/v1/2024.naacl-long.451
+
Depression Detection in Clinical Interviews with LLM-Empowered Structural Element Graph
@@ -6033,6 +6454,7 @@
2024.naacl-long.452
chen-etal-2024-depression
10.18653/v1/2024.naacl-long.452
+
SQATIN: Supervised Instruction Tuning Meets Question Answering for Improved Dialogue NLU
@@ -6056,6 +6478,7 @@
2024.naacl-long.454
khosravani-etal-2024-enhancing
10.18653/v1/2024.naacl-long.454
+
ARM: Alignment with Residual Energy-Based Model
@@ -6067,6 +6490,7 @@
2024.naacl-long.455
pang-etal-2024-arm
10.18653/v1/2024.naacl-long.455
+
HumanRankEval: Automatic Evaluation of LMs as Conversational Assistants
@@ -6078,6 +6502,7 @@
2024.naacl-long.456
gritta-etal-2024-humanrankeval
10.18653/v1/2024.naacl-long.456
+
FAMuS: Frames Across Multiple Sources
@@ -6091,6 +6516,7 @@
2024.naacl-long.457
vashishtha-etal-2024-famus
10.18653/v1/2024.naacl-long.457
+
Rationale-based Opinion Summarization
@@ -6101,6 +6527,7 @@
2024.naacl-long.458
li-chaturvedi-2024-rationale
10.18653/v1/2024.naacl-long.458
+
Mustango: Toward Controllable Text-to-Music Generation
@@ -6115,6 +6542,7 @@
2024.naacl-long.459
melechovsky-etal-2024-mustango
10.18653/v1/2024.naacl-long.459
+
Adaptive Cross-lingual Text Classification through In-Context One-Shot Demonstrations
@@ -6127,6 +6555,7 @@
2024.naacl-long.460
cueva-etal-2024-adaptive
10.18653/v1/2024.naacl-long.460
+
CNER: Concept and Named Entity Recognition
@@ -6140,6 +6569,7 @@
2024.naacl-long.461
martinelli-etal-2024-cner
10.18653/v1/2024.naacl-long.461
+
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
@@ -6210,6 +6640,7 @@
2024.naacl-long.466
eisape-etal-2024-systematic
10.18653/v1/2024.naacl-long.466
+
AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets
@@ -6223,6 +6654,7 @@
Revised the related work in sec. 2.3.
10.18653/v1/2024.naacl-long.467
+
ICLE++: Modeling Fine-Grained Traits for Holistic Essay Scoring
@@ -6233,6 +6665,7 @@
2024.naacl-long.468
li-ng-2024-icle
10.18653/v1/2024.naacl-long.468
+
UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations
@@ -6260,6 +6693,7 @@
2024.naacl-long.470
hazra-majumder-2024-tell
10.18653/v1/2024.naacl-long.470
+
Multilingual Models for ASR in Chibchan Languages
@@ -6272,6 +6706,7 @@
2024.naacl-long.471
coto-solano-etal-2024-multilingual
10.18653/v1/2024.naacl-long.471
+
LegalDiscourse: Interpreting When Laws Apply and To Whom
@@ -6301,6 +6736,7 @@
2024.naacl-long.473
liu-etal-2024-x
10.18653/v1/2024.naacl-long.473
+
Is Reference Necessary in the Evaluation of NLG Systems? When and Where?
@@ -6316,6 +6752,7 @@
2024.naacl-long.474
sheng-etal-2024-reference
10.18653/v1/2024.naacl-long.474
+
Semi-Structured Chain-of-Thought: Integrating Multiple Sources of Knowledge for Improved Language Model Reasoning
@@ -6328,6 +6765,7 @@
2024.naacl-long.475
su-etal-2024-semi
10.18653/v1/2024.naacl-long.475
+
Evaluating the Deductive Competence of Large Language Models
@@ -6338,6 +6776,7 @@
2024.naacl-long.476
seals-shalin-2024-evaluating
10.18653/v1/2024.naacl-long.476
+
Large Human Language Models: A Need and the Challenges
@@ -6350,6 +6789,7 @@
2024.naacl-long.477
soni-etal-2024-large
10.18653/v1/2024.naacl-long.477
+
On Learning to Summarize with Large Language Models as References
@@ -6366,6 +6806,7 @@
2024.naacl-long.478
liu-etal-2024-learning
10.18653/v1/2024.naacl-long.478
+
Hallucination Diversity-Aware Active Learning for Text Summarization
@@ -6382,6 +6823,7 @@
2024.naacl-long.479
xia-etal-2024-hallucination
10.18653/v1/2024.naacl-long.479
+
Keep it Private: Unsupervised Privatization of Online Text
@@ -6392,6 +6834,7 @@
2024.naacl-long.480
bao-carpuat-2024-keep
10.18653/v1/2024.naacl-long.480
+
Tied-LoRA: Enhancing parameter efficiency of LoRA with Weight Tying
@@ -6416,6 +6859,7 @@
2024.naacl-long.482
deng-etal-2024-investigating
10.18653/v1/2024.naacl-long.482
+
Pre-trained Language Models for Entity Blocking: A Reproducibility Study
@@ -6426,6 +6870,7 @@
2024.naacl-long.483
wang-zhang-2024-pre
10.18653/v1/2024.naacl-long.483
+
RE^2: Region-Aware Relation Extraction from Visually Rich Documents
@@ -6439,6 +6884,7 @@
2024.naacl-long.484
ramu-etal-2024-re2
10.18653/v1/2024.naacl-long.484
+
Mix-Initiative Response Generation with Dynamic Prefix Tuning
@@ -6451,6 +6897,7 @@
2024.naacl-long.485
nie-etal-2024-mix
10.18653/v1/2024.naacl-long.485
+
Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Value
@@ -6464,6 +6911,7 @@
2024.naacl-long.486
yao-etal-2024-value
10.18653/v1/2024.naacl-long.486
+
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context
@@ -6479,6 +6927,7 @@
2024.naacl-long.487
sahoo-etal-2024-indibias
10.18653/v1/2024.naacl-long.487
+
@@ -6508,6 +6957,7 @@
2024.naacl-short.1
chhabra-etal-2024-revisiting
10.18653/v1/2024.naacl-short.1
+
Struc-Bench: Are Large Language Models Good at Generating Complex Structured Tabular Data?
@@ -6534,6 +6984,7 @@
2024.naacl-short.3
zhang-etal-2024-improving-toponym
10.18653/v1/2024.naacl-short.3
+
Advancing Regular Language Reasoning in Linear Recurrent Neural Networks
@@ -6545,6 +6996,7 @@
2024.naacl-short.4
fan-etal-2024-advancing
10.18653/v1/2024.naacl-short.4
+
Extracting Lexical Features from Dialects via Interpretable Dialect Classifiers
@@ -6557,6 +7009,7 @@
2024.naacl-short.5
xie-etal-2024-extracting
10.18653/v1/2024.naacl-short.5
+
Clear Up Confusion: Advancing Cross-Domain Few-Shot Relation Extraction through Relation-Aware Prompt Learning
@@ -6574,6 +7027,7 @@
2024.naacl-short.6
bai-etal-2024-clear
10.18653/v1/2024.naacl-short.6
+
Fusion Makes Perfection: An Efficient Multi-Grained Matching Approach for Zero-Shot Relation Extraction
@@ -6590,6 +7044,7 @@
2024.naacl-short.7
li-etal-2024-fusion
10.18653/v1/2024.naacl-short.7
+
Personalized Review Recommendation based on Implicit dimension mining
@@ -6600,6 +7055,7 @@
2024.naacl-short.8
xu-xu-2024-personalized
10.18653/v1/2024.naacl-short.8
+
Unlocking Structure Measuring: Introducing PDD, an Automatic Metric for Positional Discourse Coherence
@@ -6612,6 +7068,7 @@
2024.naacl-short.9
liu-etal-2024-unlocking
10.18653/v1/2024.naacl-short.9
+
Returning to the Start: Generating Narratives with Related Endpoints
@@ -6623,6 +7080,7 @@
2024.naacl-short.10
brei-etal-2024-returning
10.18653/v1/2024.naacl-short.10
+
Unified Examination of Entity Linking in Absence of Candidate Sets
@@ -6634,6 +7092,7 @@
2024.naacl-short.11
ong-etal-2024-unified
10.18653/v1/2024.naacl-short.11
+
MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages
@@ -6645,6 +7104,7 @@
2024.naacl-short.12
dementieva-etal-2024-multiparadetox
10.18653/v1/2024.naacl-short.12
+
SKICSE: Sentence Knowable Information Prompted by LLMs Improves Contrastive Sentence Embeddings
@@ -6655,6 +7115,7 @@
2024.naacl-short.13
ou-xu-2024-skicse
10.18653/v1/2024.naacl-short.13
+
A Multi-Aspect Framework for Counter Narrative Evaluation using Large Language Models
@@ -6667,6 +7128,7 @@
2024.naacl-short.14
jones-etal-2024-multi
10.18653/v1/2024.naacl-short.14
+
How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes
@@ -6679,6 +7141,7 @@
2024.naacl-short.15
bhasin-etal-2024-multi
10.18653/v1/2024.naacl-short.15
+
CELI: Simple yet Effective Approach to Enhance Out-of-Domain Generalization of Cross-Encoders.
@@ -6690,6 +7153,7 @@
2024.naacl-short.16
zhang-etal-2024-celi
10.18653/v1/2024.naacl-short.16
+
ContrastiveMix: Overcoming Code-Mixing Dilemma in Cross-Lingual Transfer for Information Retrieval
@@ -6701,6 +7165,7 @@
2024.naacl-short.17
do-etal-2024-contrastivemix
10.18653/v1/2024.naacl-short.17
+
SLIDE: Reference-free Evaluation for Machine Translation using a Sliding Document Window
@@ -6724,6 +7189,7 @@
2024.naacl-short.19
zou-etal-2024-separately
10.18653/v1/2024.naacl-short.19
+
Unveiling Divergent Inductive Biases of LLMs on Temporal Data
@@ -6734,6 +7200,7 @@
2024.naacl-short.20
kishore-he-2024-unveiling
10.18653/v1/2024.naacl-short.20
+
On Retrieval Augmentation and the Limitations of Language Model Training
@@ -6761,6 +7228,7 @@
2024.naacl-short.22
zhou-etal-2024-gendecider
10.18653/v1/2024.naacl-short.22
+
Advancing the Robustness of Large Language Models through Self-Denoised Smoothing
@@ -6790,6 +7258,7 @@
2024.naacl-short.24
dorbala-etal-2024-llms
10.18653/v1/2024.naacl-short.24
+
On the Role of Summary Content Units in Text Summarization Evaluation
@@ -6814,6 +7283,7 @@
2024.naacl-short.25
nawrath-etal-2024-role
10.18653/v1/2024.naacl-short.25
+
More room for language: Investigating the effect of retrieval on language models
@@ -6825,6 +7295,7 @@
2024.naacl-short.26
samuel-etal-2024-room
10.18653/v1/2024.naacl-short.26
+
Discourse-Aware In-Context Learning for Temporal Expression Normalization
@@ -6836,6 +7307,7 @@
2024.naacl-short.27
gautam-etal-2024-discourse
10.18653/v1/2024.naacl-short.27
+
Contextualizing Argument Quality Assessment with Relevant Knowledge
@@ -6848,6 +7320,7 @@
2024.naacl-short.28
deshpande-etal-2024-contextualizing
10.18653/v1/2024.naacl-short.28
+
Selective Perception: Learning Concise State Descriptions for Language Model Actors
@@ -6863,6 +7336,7 @@
2024.naacl-short.29
nottingham-etal-2024-selective
10.18653/v1/2024.naacl-short.29
+
ALOHa: A New Measure for Hallucination in Captioning Models
@@ -6878,6 +7352,7 @@
2024.naacl-short.30
petryk-etal-2024-aloha
10.18653/v1/2024.naacl-short.30
+
Beyond Yes and No: Improving Zero-Shot LLM Rankers via Scoring Fine-Grained Relevance Labels
@@ -6893,6 +7368,7 @@
2024.naacl-short.31
zhuang-etal-2024-beyond
10.18653/v1/2024.naacl-short.31
+
LLM-Driven Knowledge Injection Advances Zero-Shot and Cross-Target Stance Detection
@@ -6905,6 +7381,7 @@
2024.naacl-short.32
zhang-etal-2024-llm-driven
10.18653/v1/2024.naacl-short.32
+
Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information
@@ -6916,6 +7393,7 @@
2024.naacl-short.33
iskander-etal-2024-leveraging
10.18653/v1/2024.naacl-short.33
+
Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding
@@ -6928,6 +7406,7 @@
2024.naacl-short.34
yang-etal-2024-direct
10.18653/v1/2024.naacl-short.34
+
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning
@@ -6939,6 +7418,7 @@
2024.naacl-short.35
mekala-etal-2024-echoprompt
10.18653/v1/2024.naacl-short.35
+
LEAF: Language Learners’ English Essays and Feedback Corpus
@@ -6950,6 +7430,7 @@
2024.naacl-short.36
behzad-etal-2024-leaf
10.18653/v1/2024.naacl-short.36
+
Zero-Shot vs. Translation-Based Cross-Lingual Transfer: The Case of Lexical Gaps
@@ -6960,6 +7441,7 @@
2024.naacl-short.37
ebrahimi-wense-2024-zero
10.18653/v1/2024.naacl-short.37
+
On the True Distribution Approximation of Minimum Bayes-Risk Decoding
@@ -6972,6 +7454,7 @@
2024.naacl-short.38
ohashi-etal-2024-true
10.18653/v1/2024.naacl-short.38
+
Rehearsal-Free Modular and Compositional Continual Learning for Language Models
@@ -6985,6 +7468,7 @@
2024.naacl-short.39
wang-etal-2024-rehearsal
10.18653/v1/2024.naacl-short.39
+
Llama meets EU: Investigating the European political spectrum through the lens of LLMs
@@ -6995,6 +7479,7 @@
2024.naacl-short.40
chalkidis-brandl-2024-llama
10.18653/v1/2024.naacl-short.40
+
M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation
@@ -7012,6 +7497,7 @@
2024.naacl-short.41
hsu-etal-2024-m3t
10.18653/v1/2024.naacl-short.41
+
Control-DAG: Constrained Decoding for Non-Autoregressive Directed Acyclic T5 using Weighted Finite State Automata
@@ -7024,6 +7510,7 @@
2024.naacl-short.42
chen-etal-2024-control
10.18653/v1/2024.naacl-short.42
+
Do Vision-Language Models Understand Compound Nouns?
@@ -7037,6 +7524,7 @@
2024.naacl-short.43
kumar-etal-2024-vision
10.18653/v1/2024.naacl-short.43
+
Is Prompt Transfer Always Effective? An Empirical Study of Prompt Transfer for Question Answering
@@ -7049,6 +7537,7 @@
2024.naacl-short.44
jung-etal-2024-prompt
10.18653/v1/2024.naacl-short.44
+
Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers
@@ -7061,6 +7550,7 @@
2024.naacl-short.45
pantazopoulos-etal-2024-lost
10.18653/v1/2024.naacl-short.45
+
Do Multilingual Language Models Think Better in English?
@@ -7074,6 +7564,7 @@
2024.naacl-short.46
etxaniz-etal-2024-multilingual
10.18653/v1/2024.naacl-short.46
+
A Continued Pretrained LLM Approach for Automatic Medical Note Generation
@@ -7090,6 +7581,7 @@
2024.naacl-short.47
yuan-etal-2024-continued
10.18653/v1/2024.naacl-short.47
+
Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts
@@ -7104,6 +7596,7 @@
2024.naacl-short.48
saxon-etal-2024-lost
10.18653/v1/2024.naacl-short.48
+
Self-Improving for Zero-Shot Named Entity Recognition with Large Language Models
@@ -7117,6 +7610,7 @@
2024.naacl-short.49
xie-etal-2024-self
10.18653/v1/2024.naacl-short.49
+
Lifelong Event Detection with Embedding Space Separation and Compaction
@@ -7130,6 +7624,7 @@
2024.naacl-short.50
qin-etal-2024-lifelong
10.18653/v1/2024.naacl-short.50
+
Language Models (Mostly) Do Not Consider Emotion Triggers When Predicting Emotion
@@ -7141,6 +7636,7 @@
2024.naacl-short.51
singh-etal-2024-language
10.18653/v1/2024.naacl-short.51
+
CPopQA: Ranking Cultural Concept Popularity by LLMs
@@ -7151,6 +7647,7 @@
2024.naacl-short.52
jiang-joshi-2024-cpopqa
10.18653/v1/2024.naacl-short.52
+
The Impact of Language on Arithmetic Proficiency: A Multilingual Investigation with Cross-Agent Checking Computation
@@ -7163,6 +7660,7 @@
2024.naacl-short.53
chen-etal-2024-impact
10.18653/v1/2024.naacl-short.53
+
Efficient Information Extraction in Few-Shot Relation Classification through Contrastive Representation Learning
@@ -7174,6 +7672,7 @@
2024.naacl-short.54
borchert-etal-2024-efficient
10.18653/v1/2024.naacl-short.54
+
A diverse Multilingual News Headlines Dataset from around the World
@@ -7184,6 +7683,7 @@
2024.naacl-short.55
leeb-scholkopf-2024-diverse
10.18653/v1/2024.naacl-short.55
+
The Unreasonable Effectiveness of Random Target Embeddings for Continuous-Output Neural Machine Translation
@@ -7194,6 +7694,7 @@
2024.naacl-short.56
tokarchuk-niculae-2024-unreasonable
10.18653/v1/2024.naacl-short.56
+
Efficient Sample-Specific Encoder Perturbations
@@ -7204,6 +7705,7 @@
2024.naacl-short.57
fathullah-gales-2024-efficient
10.18653/v1/2024.naacl-short.57
+
Diverse Perspectives, Divergent Models: Cross-Cultural Evaluation of Depression Detection on Twitter
@@ -7218,6 +7720,7 @@
10.18653/v1/2024.naacl-short.58
Corrects two typos.
+
Removing RLHF Protections in GPT-4 via Fine-Tuning
@@ -7232,6 +7735,7 @@
2024.naacl-short.59
zhan-etal-2024-removing
10.18653/v1/2024.naacl-short.59
+
LifeTox: Unveiling Implicit Toxicity in Life Advice
@@ -7246,6 +7750,7 @@
2024.naacl-short.60
kim-etal-2024-lifetox
10.18653/v1/2024.naacl-short.60
+
Arithmetic Reasoning with LLM: Prolog Generation & Permutation
@@ -7257,6 +7762,7 @@
2024.naacl-short.61
yang-etal-2024-arithmetic
10.18653/v1/2024.naacl-short.61
+
Verifying Claims About Metaphors with Large-Scale Automatic Metaphor Identification
@@ -7268,6 +7774,7 @@
2024.naacl-short.62
aono-etal-2024-verifying
10.18653/v1/2024.naacl-short.62
+
InstructABSA: Instruction Learning for Aspect Based Sentiment Analysis
@@ -7282,6 +7789,7 @@
2024.naacl-short.63
scaria-etal-2024-instructabsa
10.18653/v1/2024.naacl-short.63
+
MEMORY-VQ: Compression for Tractable Internet-Scale Memory
@@ -7297,6 +7805,7 @@
2024.naacl-short.64
zemlyanskiy-etal-2024-memory
10.18653/v1/2024.naacl-short.64
+
Unveiling the Magic: Investigating Attention Distillation in Retrieval-Augmented Generation
@@ -7308,6 +7817,7 @@
2024.naacl-short.65
li-etal-2024-unveiling
10.18653/v1/2024.naacl-short.65
+
Improving Factuality in Clinical Abstractive Multi-Document Summarization by Guided Continued Pre-training
@@ -7320,6 +7830,7 @@
2024.naacl-short.66
elhady-etal-2024-improving
10.18653/v1/2024.naacl-short.66
+
MuLan: A Study of Fact Mutability in Language Models
@@ -7333,6 +7844,7 @@
2024.naacl-short.67
fierro-etal-2024-mulan
10.18653/v1/2024.naacl-short.67
+
Language-Independent Representations Improve Zero-Shot Summarization
@@ -7344,6 +7856,7 @@
2024.naacl-short.68
solovyev-etal-2024-language
10.18653/v1/2024.naacl-short.68
+
Trusting Your Evidence: Hallucinate Less with Context-aware Decoding
@@ -7371,6 +7884,7 @@
2024.naacl-short.70
clarke-etal-2024-guylingo
10.18653/v1/2024.naacl-short.70
+
DoubleLingo: Causal Estimation with Large Language Models
@@ -7381,6 +7895,7 @@
2024.naacl-short.71
veljanovski-wood-doughty-2024-doublelingo
10.18653/v1/2024.naacl-short.71
+
Improved Text Emotion Prediction Using Combined Valence and Arousal Ordinal Classification
@@ -7405,6 +7920,7 @@
2024.naacl-short.72
mitsios-etal-2024-improved
10.18653/v1/2024.naacl-short.72
+
On Narrative Question Answering Skills
@@ -7415,6 +7931,7 @@
2024.naacl-short.73
kalbaliyev-sirts-2024-narrative
10.18653/v1/2024.naacl-short.73
+
Order-Based Pre-training Strategies for Procedural Text Understanding
@@ -7427,6 +7944,7 @@
2024.naacl-short.74
nandy-etal-2024-order
10.18653/v1/2024.naacl-short.74
+
Breaking the Language Barrier: Can Direct Inference Outperform Pre-Translation in Multilingual LLM Applications?
@@ -7443,6 +7961,7 @@
2024.naacl-short.75
intrator-etal-2024-breaking
10.18653/v1/2024.naacl-short.75
+
@@ -7494,6 +8013,7 @@
2024.naacl-demo.2
cai-etal-2024-low-code
10.18653/v1/2024.naacl-demo.2
+
EdTec-QBuilder: A Semantic Retrieval Tool for Assembling Vocational Training Exams in German Language
@@ -7508,6 +8028,7 @@
2024.naacl-demo.3
palomino-etal-2024-edtec
10.18653/v1/2024.naacl-demo.3
+
DIALIGHT: Lightweight Multilingual Development and Evaluation of Task-Oriented Dialogue Systems with Large Language Models
@@ -7521,6 +8042,7 @@
2024.naacl-demo.4
hu-etal-2024-dialight
10.18653/v1/2024.naacl-demo.4
+
RTSUM: Relation Triple-based Interpretable Summarization with Multi-level Salience Visualization
@@ -7533,6 +8055,7 @@
2024.naacl-demo.5
cho-etal-2024-rtsum
10.18653/v1/2024.naacl-demo.5
+
Edu-ConvoKit: An Open-Source Library for Education Conversation Data
@@ -7543,6 +8066,7 @@
2024.naacl-demo.6
wang-demszky-2024-edu
10.18653/v1/2024.naacl-demo.6
+
jp-evalb: Robust Alignment-based PARSEVAL Measures
@@ -7566,6 +8090,7 @@
2024.naacl-demo.8
haller-etal-2024-opiniongpt
10.18653/v1/2024.naacl-demo.8
+
ATLAS: A System for PDF-centric Human Interaction Data Collection
@@ -7580,6 +8105,7 @@
2024.naacl-demo.9
siu-etal-2024-atlas
10.18653/v1/2024.naacl-demo.9
+
BeLeaf: Belief Prediction as Tree Generation
@@ -7590,6 +8116,7 @@
2024.naacl-demo.10
murzaku-rambow-2024-beleaf
10.18653/v1/2024.naacl-demo.10
+
QueryExplorer: An Interactive Query Generation Assistant for Search and Exploration
@@ -7602,6 +8129,7 @@
2024.naacl-demo.11
dhole-etal-2024-queryexplorer
10.18653/v1/2024.naacl-demo.11
+
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
@@ -7617,6 +8145,7 @@
2024.naacl-demo.12
diao-etal-2024-lmflow
10.18653/v1/2024.naacl-demo.12
+
DOCMASTER: A Unified Platform for Annotation, Training, & Inference in Document Question-Answering
@@ -7629,6 +8158,7 @@
2024.naacl-demo.13
nguyen-etal-2024-docmaster
10.18653/v1/2024.naacl-demo.13
+
RedCoast: A Lightweight Tool to Automate Distributed Training of LLMs on Any GPU/TPUs
@@ -7645,6 +8175,7 @@
2024.naacl-demo.14
tan-etal-2024-redcoast
10.18653/v1/2024.naacl-demo.14
+
Concept Over Time Analysis: Unveiling Temporal Patterns for Qualitative Data Analysis
@@ -7659,6 +8190,7 @@
2024.naacl-demo.15
fischer-etal-2024-concept
10.18653/v1/2024.naacl-demo.15
+
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
@@ -7686,6 +8218,7 @@
2024.naacl-demo.17
saxena-etal-2024-newspaper
10.18653/v1/2024.naacl-demo.17
+
FastFit: Fast and Effective Few-Shot Text Classification with a Multitude of Classes
@@ -7696,6 +8229,7 @@
2024.naacl-demo.18
yehudai-bandel-2024-fastfit
10.18653/v1/2024.naacl-demo.18
+
AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents
@@ -7711,6 +8245,7 @@
2024.naacl-demo.19
gioacchini-etal-2024-agentquest
10.18653/v1/2024.naacl-demo.19
+
ZhuJiu-Knowledge: A Fairer Platform for Evaluating Multiple Knowledge Types in Large Language Models
@@ -7726,6 +8261,7 @@
2024.naacl-demo.20
du-etal-2024-zhujiu
10.18653/v1/2024.naacl-demo.20
+
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI
@@ -7746,6 +8282,7 @@
2024.naacl-demo.21
bandel-etal-2024-unitxt
10.18653/v1/2024.naacl-demo.21
+
@@ -7780,6 +8317,7 @@
2024.naacl-srw.1
huang-etal-2024-systematic
10.18653/v1/2024.naacl-srw.1
+
Rephrasing Invokes Better Generations for Large Language Models
@@ -7834,6 +8372,7 @@
2024.naacl-srw.6
zhu-2024-fast
10.18653/v1/2024.naacl-srw.6
+
Start Simple: Progressive Difficulty Multitask Learning
@@ -7846,6 +8385,7 @@
2024.naacl-srw.7
luo-etal-2024-start
10.18653/v1/2024.naacl-srw.7
+
LUCID: LLM-Generated Utterances for Complex and Interesting Dialogues
@@ -7862,6 +8402,7 @@
2024.naacl-srw.8
stacey-etal-2024-lucid
10.18653/v1/2024.naacl-srw.8
+
Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages
@@ -7874,6 +8415,7 @@
2024.naacl-srw.9
bahad-etal-2024-fine
10.18653/v1/2024.naacl-srw.9
+
Knowledge-centered conversational agents with a drive to learn
@@ -7999,6 +8541,7 @@
2024.naacl-srw.20
rahaman-ive-2024-source
10.18653/v1/2024.naacl-srw.20
+
Distilling Text Style Transfer With Self-Explanation From LLMs
@@ -8013,6 +8556,7 @@
2024.naacl-srw.21
zhang-etal-2024-distilling
10.18653/v1/2024.naacl-srw.21
+
Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation
@@ -8025,6 +8569,7 @@
2024.naacl-srw.22
wang-etal-2024-reinforcement
10.18653/v1/2024.naacl-srw.22
+
Evaluation Dataset for Japanese Medical Text Simplification
@@ -8063,6 +8608,7 @@
2024.naacl-srw.25
toossi-etal-2024-reproducibility
10.18653/v1/2024.naacl-srw.25
+
Coding Open-Ended Responses using Pseudo Response Generation by Large Language Models
@@ -8076,6 +8622,7 @@
2024.naacl-srw.26
zenimoto-etal-2024-coding
10.18653/v1/2024.naacl-srw.26
+
Cross-Task Generalization Abilities of Large Language Models
@@ -8097,6 +8644,7 @@
10.18653/v1/2024.naacl-srw.28
Corrected a Grant Number in the Acknowledgments section.
+
Facilitating Opinion Diversity through Hybrid NLP Approaches
@@ -8251,6 +8799,7 @@
2024.naacl-industry.1
ma-etal-2024-hpipe
10.18653/v1/2024.naacl-industry.1
+
Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding
@@ -8262,6 +8811,7 @@
2024.naacl-industry.2
ou-etal-2024-lossless
10.18653/v1/2024.naacl-industry.2
+
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
@@ -8288,6 +8838,7 @@
2024.naacl-industry.3
kim-etal-2024-solar
10.18653/v1/2024.naacl-industry.3
+
UINav: A Practical Approach to Train On-Device Automation Agents
@@ -8302,6 +8853,7 @@
2024.naacl-industry.4
li-etal-2024-uinav
10.18653/v1/2024.naacl-industry.4
+
Efficiently Distilling LLMs for Edge Applications
@@ -8342,6 +8894,7 @@
2024.naacl-industry.7
tang-etal-2024-multiple
10.18653/v1/2024.naacl-industry.7
+
An NLP-Focused Pilot Training Agent for Safe and Efficient Aviation Communication
@@ -8353,6 +8906,7 @@
2024.naacl-industry.8
liu-etal-2024-nlp
10.18653/v1/2024.naacl-industry.8
+
Visual Grounding for User Interfaces
@@ -8377,6 +8931,7 @@
2024.naacl-industry.10
buchner-etal-2024-prompt
10.18653/v1/2024.naacl-industry.10
+
REXEL: An End-to-end Model for Document-Level Relation Extraction and Entity Linking
@@ -8409,6 +8964,7 @@
2024.naacl-industry.12
xu-etal-2024-conformer
10.18653/v1/2024.naacl-industry.12
+
Generating Signed Language Instructions in Large-Scale Dialogue Systems
@@ -8450,6 +9006,7 @@
2024.naacl-industry.15
he-etal-2024-annollm
10.18653/v1/2024.naacl-industry.15
+
An Automatic Prompt Generation System for Tabular Data Tasks
@@ -8488,6 +9045,7 @@
2024.naacl-industry.18
hu-etal-2024-language
10.18653/v1/2024.naacl-industry.18
+
Reducing hallucination in structured outputs via Retrieval-Augmented Generation
@@ -8498,6 +9056,7 @@
2024.naacl-industry.19
ayala-bechard-2024-reducing
10.18653/v1/2024.naacl-industry.19
+
Towards Translating Objective Product Attributes Into Customer Language
@@ -8510,6 +9069,7 @@
2024.naacl-industry.20
yazdi-etal-2024-towards
10.18653/v1/2024.naacl-industry.20
+
Automating the Generation of a Functional Semantic Types Ontology with Foundational Models
@@ -8533,6 +9093,7 @@
2024.naacl-industry.22
mukku-etal-2024-leveraging
10.18653/v1/2024.naacl-industry.22
+
Optimizing LLM Based Retrieval Augmented Generation Pipelines in the Financial Domain
@@ -8548,6 +9109,7 @@
2024.naacl-industry.23
zhao-etal-2024-optimizing
10.18653/v1/2024.naacl-industry.23
+
Scaling Up Authorship Attribution
@@ -8563,6 +9125,7 @@
2024.naacl-industry.24
striebel-etal-2024-scaling
10.18653/v1/2024.naacl-industry.24
+
Multimodal Contextual Dialogue Breakdown Detection for Conversational AI Models
@@ -8575,6 +9138,7 @@
2024.naacl-industry.25
miah-etal-2024-multimodal
10.18653/v1/2024.naacl-industry.25
+
Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR
@@ -8595,6 +9159,7 @@
2024.naacl-industry.26
wu-etal-2024-deferred
10.18653/v1/2024.naacl-industry.26
+
Less is More for Improving Automatic Evaluation of Factual Consistency
@@ -8606,6 +9171,7 @@
2024.naacl-industry.27
wang-etal-2024-less
10.18653/v1/2024.naacl-industry.27
+
DriftWatch: A Tool that Automatically Detects Data Drift and Extracts Representative Examples Affected by Drift
@@ -8632,6 +9198,7 @@
2024.naacl-industry.29
marani-etal-2024-graph
10.18653/v1/2024.naacl-industry.29
+
Leveraging LLMs for Dialogue Quality Measurement
@@ -8658,6 +9225,7 @@
2024.naacl-industry.31
mora-cross-calderon-ramirez-2024-uncertainty
10.18653/v1/2024.naacl-industry.31
+
AMA-LSTM: Pioneering Robust and Fair Financial Audio Analysis for Stock Volatility Prediction
@@ -8674,6 +9242,7 @@
2024.naacl-industry.32
wang-etal-2024-ama
10.18653/v1/2024.naacl-industry.32
+
Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?
@@ -8687,6 +9256,7 @@
2024.naacl-industry.33
fu-etal-2024-tiny
10.18653/v1/2024.naacl-industry.33
+
Shears: Unstructured Sparsity with Neural Low-rank Adapter Search
@@ -8716,6 +9286,7 @@
2024.naacl-industry.35
lee-etal-2024-tree
10.18653/v1/2024.naacl-industry.35
+
LLM-based Frameworks for API Argument Filling in Task-Oriented Conversational Systems
@@ -8730,6 +9301,7 @@
2024.naacl-industry.36
mok-etal-2024-llm
10.18653/v1/2024.naacl-industry.36
+
Large Language Models Encode the Practice of Medicine
@@ -8740,6 +9312,7 @@
2024.naacl-industry.37
kanchinadam-shaheen-2024-large
10.18653/v1/2024.naacl-industry.37
+
Leveraging Interesting Facts to Enhance User Engagement with Conversational Interfaces
@@ -8766,6 +9339,7 @@
2024.naacl-industry.39
nakayama-etal-2024-search
10.18653/v1/2024.naacl-industry.39
+
EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM
@@ -8782,6 +9356,7 @@
2024.naacl-industry.40
zou-etal-2024-eiven
10.18653/v1/2024.naacl-industry.40
+
Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data
@@ -8801,6 +9376,7 @@
2024.naacl-industry.41
min-etal-2024-exploring
10.18653/v1/2024.naacl-industry.41
+
Solving General Natural-Language-Description Optimization Problems with Large Language Models
@@ -8816,6 +9392,7 @@
2024.naacl-industry.42
zhang-etal-2024-solving
10.18653/v1/2024.naacl-industry.42
+
Self-Regulated Data-Free Knowledge Amalgamation for Text Classification
@@ -8833,6 +9410,65 @@
+
+ 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics
+ Mexico City, Mexico
+ June 16–21, 2024
+
+
+ https://2024.naacl.org
+
+
+ Keynote 1: Harnessing the Power of LLMs to Vitalize Indigenous Languages
+ Claudio Pinhanez
+ 2024.naacl.keynote1.mp4
+
+
+ Keynote 2: Distributional Semantics: What do large language models have to say?
+ Seana Coulson
+ 2024.naacl.keynote2.mp4
+
+
+ Opening Session
+ Katrin Erk
+ Kevin Duh
+ Steven Bethard
+ Helena Gomez
+ 2024.naacl.opening-session.mp4
+
+
+ Business Meeting
+ Graham Neubig
+ Luciana Benotti
+ Jon May
+ Kevin Duh
+ Jessy Li
+ 2024.naacl.business-meeting.mp4
+
+
+ Panel: LLMs and their Impact on Education
+ Karen Matías
+ Swapna Somasundaran
+ Victoria Yaneva
+ Ekaterina Kochmar
+ 2024.naacl.panel.mp4
+
+
+ Best Papers Awards Session
+ Steven Bethard
+ 2024.naacl.best-papers.mp4
+
+
+ Closing Session
+ Katrin Erk
+ Helena Gomez
+ Vivek Srikumar
+ Shruti Rijhwani
+ Owen Rambow
+ Roberto Navigli
+ Colin Cherry
+ 2024.naacl.closing-session.mp4
+
2024.findings-naacl
2024.americasnlp-1
diff --git a/data/xml/2024.semeval.xml b/data/xml/2024.semeval.xml
index beb7990232..a8c6d8d47a 100644
--- a/data/xml/2024.semeval.xml
+++ b/data/xml/2024.semeval.xml
@@ -236,6 +236,7 @@
2024.semeval-1.17.SupplementaryMaterial.txt
sarvazyan-etal-2024-genaios
10.18653/v1/2024.semeval-1.17
+
Self-StrAE at SemEval-2024 Task 1: Making Self-Structuring AutoEncoders Learn More With Less
@@ -432,6 +433,7 @@
2024.semeval-1.32.SupplementaryMaterial.txt
saravanan-wilson-2024-ounlp
10.18653/v1/2024.semeval-1.32
+
NLP-LISAC at SemEval-2024 Task 1: Transformer-based approaches for Determining Semantic Textual Relatedness
@@ -507,6 +509,7 @@
2024.semeval-1.38.SupplementaryMaterial.txt
ebrahim-joy-2024-warwicknlp
10.18653/v1/2024.semeval-1.38
+
NU-RU at SemEval-2024 Task 6: Hallucination and Related Observable Overgeneration Mistake Detection Using Hypothesis-Target Similarity and SelfCheckGPT
@@ -683,6 +686,7 @@
2024.semeval-1.52.SupplementaryMaterial.txt
mehta-etal-2024-halu
10.18653/v1/2024.semeval-1.52
+
QFNU_CS at SemEval-2024 Task 3: A Hybrid Pre-trained Model based Approach for Multimodal Emotion-Cause Pair Extraction Task
@@ -817,6 +821,7 @@
2024.semeval-1.63.SupplementaryMaterial.txt
marchitan-etal-2024-team
10.18653/v1/2024.semeval-1.63
+
LinguisTech at SemEval-2024 Task 10: Emotion Discovery and Reasoning its Flip in Conversation
@@ -894,6 +899,7 @@
2024.semeval-1.69.SupplementaryMaterial.txt
heydari-rad-etal-2024-rfbes
10.18653/v1/2024.semeval-1.69
+
BAMBAS at SemEval-2024 Task 4: How far can we get without looking at hierarchies?
@@ -910,6 +916,7 @@
2024.semeval-1.70.SupplementaryMaterial.zip
vasconcelos-etal-2024-bambas
10.18653/v1/2024.semeval-1.70
+
Team QUST at SemEval-2024 Task 8: A Comprehensive Study of Monolingual and Multilingual Approaches for Detecting AI-generated Text
@@ -1128,6 +1135,7 @@
2024.semeval-1.87.SupplementaryMaterial.rar
lau-wu-2024-cyut
10.18653/v1/2024.semeval-1.87
+
UniBuc at SemEval-2024 Task 2: Tailored Prompting with Solar for Clinical NLI
@@ -1154,6 +1162,7 @@
2024.semeval-1.89.SupplementaryMaterial.zip
laken-2024-fralak
10.18653/v1/2024.semeval-1.89
+
OtterlyObsessedWithSemantics at SemEval-2024 Task 4: Developing a Hierarchical Multi-Label Classification Head for Large Language Models
@@ -1170,6 +1179,7 @@
2024.semeval-1.90.SupplementaryMaterial.txt
wunderle-etal-2024-otterlyobsessedwithsemantics
10.18653/v1/2024.semeval-1.90
+
D-NLP at SemEval-2024 Task 2: Evaluating Clinical Inference Capabilities of Large Language Models
@@ -1240,6 +1250,7 @@
2024.semeval-1.95.SupplementaryMaterial.zip
creanga-dinu-2024-isds
10.18653/v1/2024.semeval-1.95
+
UMUTeam at SemEval-2024 Task 4: Multimodal Identification of Persuasive Techniques in Memes through Large Language Models
@@ -1294,6 +1305,7 @@
2024.semeval-1.99.SupplementaryMaterial.txt
verma-raithel-2024-dfki
10.18653/v1/2024.semeval-1.99
+
UMUTeam at SemEval-2024 Task 8: Combining Transformers and Syntax Features for Machine-Generated Text Detection
@@ -1774,6 +1786,7 @@
2024.semeval-1.135.SupplementaryMaterial.txt
eponon-ramos-perez-2024-pinealai
10.18653/v1/2024.semeval-1.135
+
Infrrd.ai at SemEval-2024 Task 7: RAG-based end-to-end training to generate headlines and numbers
@@ -1789,6 +1802,7 @@
2024.semeval-1.136.SupplementaryMaterial.txt
he-etal-2024-infrrd
10.18653/v1/2024.semeval-1.136
+
AlphaIntellect at SemEval-2024 Task 6: Detection of Hallucinations in Generated Text
@@ -1882,6 +1896,7 @@
2024.semeval-1.143.SupplementaryMaterial.zip
aguiar-etal-2024-seme
10.18653/v1/2024.semeval-1.143
+
MAINDZ at SemEval-2024 Task 5: CLUEDO - Choosing Legal oUtcome by Explaining Decision through Oversight
@@ -2233,6 +2248,7 @@
2024.semeval-1.170.SupplementaryMaterial.zip
ben-fares-etal-2024-fi
10.18653/v1/2024.semeval-1.170
+
Team Innovative at SemEval-2024 Task 8: Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection
@@ -2258,6 +2274,7 @@
2024.semeval-1.172.SupplementaryMaterial.zip
peskine-etal-2024-eurecom
10.18653/v1/2024.semeval-1.172
+
TU Wien at SemEval-2024 Task 6: Unifying Model-Agnostic and Model-Aware Techniques for Hallucination Detection
@@ -2429,6 +2446,7 @@
2024.semeval-1.185.SupplementaryMaterial.txt
guimaraes-etal-2024-lisbon
10.18653/v1/2024.semeval-1.185
+
GIL-IIMAS UNAM at SemEval-2024 Task 1: SAND: An In Depth Analysis of Semantic Relatedness Using Regression and Similarity Characteristics
@@ -2511,6 +2529,7 @@
2024.semeval-1.191.SupplementaryMaterial.txt
obiso-etal-2024-harmonee
10.18653/v1/2024.semeval-1.191
+
VerbaNexAI Lab at SemEval-2024 Task 10: Emotion recognition and reasoning in mixed-coded conversations based on an NRC VAD approach
@@ -2526,6 +2545,7 @@
2024.semeval-1.192.SupplementaryMaterial.txt
garcia-etal-2024-verbanexai
10.18653/v1/2024.semeval-1.192
+
VerbaNexAI Lab at SemEval-2024 Task 3: Deciphering emotional causality in conversations using multimodal analysis approach
@@ -2541,6 +2561,7 @@
2024.semeval-1.193.SupplementaryMaterial.txt
pacheco-etal-2024-verbanexai
10.18653/v1/2024.semeval-1.193
+
VerbaNexAI Lab at SemEval-2024 Task 1: A Multilayer Artificial Intelligence Model for Semantic Relationship Detection
@@ -2555,6 +2576,7 @@
2024.semeval-1.194.SupplementaryMaterial.txt
morillo-etal-2024-verbanexai
10.18653/v1/2024.semeval-1.194
+
UMBCLU at SemEval-2024 Task 1: Semantic Textual Relatedness with and without machine translation
@@ -2582,6 +2604,7 @@
2024.semeval-1.196.SupplementaryMaterial.txt
raihan-etal-2024-masontigers
10.18653/v1/2024.semeval-1.196
+
MasonTigers at SemEval-2024 Task 8: Performance Analysis of Transformer-based Models on Machine-Generated Text Detection
@@ -2597,6 +2620,7 @@
2024.semeval-1.197.SupplementaryMaterial.txt
puspo-etal-2024-masontigers
10.18653/v1/2024.semeval-1.197
+
UIC NLP GRADS at SemEval-2024 Task 3: Two-Step Disjoint Modeling for Emotion-Cause Pair Extraction
@@ -2625,6 +2649,7 @@
2024.semeval-1.199.SupplementaryMaterial.txt
goswami-etal-2024-masontigers-semeval
10.18653/v1/2024.semeval-1.199
+
RiddleMasters at SemEval-2024 Task 9: Comparing Instruction Fine-tuning with Zero-Shot Approaches
@@ -2672,6 +2697,7 @@
2024.semeval-1.203.SupplementaryMaterial.zip
abaskohi-etal-2024-bcamirs
10.18653/v1/2024.semeval-1.203
+
Pauk at SemEval-2024 Task 4: A Neuro-Symbolic Method for Consistent Classification of Propaganda Techniques in Memes
@@ -2696,6 +2722,7 @@
2024.semeval-1.205.SupplementaryMaterial.txt
kim-etal-2024-saama
10.18653/v1/2024.semeval-1.205
+
AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning
@@ -2801,6 +2828,7 @@
2024.semeval-1.213.SupplementaryMaterial.txt
chen-etal-2024-semeval
10.18653/v1/2024.semeval-1.213
+
UCSC NLP at SemEval-2024 Task 10: Emotion Discovery and Reasoning its Flip in Conversation (EDiReF)
@@ -2828,6 +2856,7 @@
2024.semeval-1.215.SupplementaryMaterial.txt
rezaei-etal-2024-clulab
10.18653/v1/2024.semeval-1.215
+
SINAI at SemEval-2024 Task 8: Fine-tuning on Words and Perplexity as Features for Detecting Machine Written Text
@@ -2840,6 +2869,7 @@
2024.semeval-1.216.SupplementaryMaterial.txt
gutierrez-megias-etal-2024-sinai
10.18653/v1/2024.semeval-1.216
+
USTC-BUPT at SemEval-2024 Task 8: Enhancing Machine-Generated Text Detection via Domain Adversarial Neural Networks and LLM Embeddings
@@ -2884,6 +2914,7 @@
2024.semeval-1.219.SupplementaryMaterial.zip
kobs-etal-2024-pollice
10.18653/v1/2024.semeval-1.219
+
whatdoyoumeme at SemEval-2024 Task 4: Hierarchical-Label-Aware Persuasion Detection using Translated Texts
@@ -2898,6 +2929,7 @@
2024.semeval-1.220.SupplementaryMaterial.txt
chatterjee-etal-2024-whatdoyoumeme
10.18653/v1/2024.semeval-1.220
+
LomonosovMSU at SemEval-2024 Task 4: Comparing LLMs and embedder models to identifying propaganda techniques in the content of memes in English for subtasks No1, No2a, and No2b
@@ -3118,6 +3150,7 @@
2024.semeval-1.235.SupplementaryMaterial.txt
krumov-etal-2024-su
10.18653/v1/2024.semeval-1.235
+
Challenges at SemEval 2024 Task 7: Contrastive Learning Approach on Numeral-Aware Language Generation
@@ -3157,6 +3190,7 @@
2024.semeval-1.238.SupplementaryMaterial.zip
shirnin-etal-2024-aipom
10.18653/v1/2024.semeval-1.238
+
CLaC at SemEval-2024 Task 2: Faithful Clinical Trial Inference
@@ -3273,6 +3307,7 @@
2024.semeval-1.246.SupplementaryMaterial.txt
singh-etal-2024-clustercore
10.18653/v1/2024.semeval-1.246
+
HierarchyEverywhere at SemEval-2024 Task 4: Detection of Persuasion Techniques in Memes Using Hierarchical Text Classifier
@@ -3385,6 +3420,7 @@
2024.semeval-1.254.SupplementaryMaterial.zip
shi-etal-2024-ualberta
10.18653/v1/2024.semeval-1.254
+
HW-TSC at SemEval-2024 Task 5: Self-Eval? A Confident LLM System for Auto Prediction and Evaluation for the Legal Argument Reasoning Task
@@ -3429,6 +3465,7 @@
2024.semeval-1.257.SupplementaryMaterial.zip
voznyuk-konovalov-2024-deeppavlov
10.18653/v1/2024.semeval-1.257
+
Bit_numeval at SemEval-2024 Task 7: Enhance Numerical Sensitivity and Reasoning Completeness for Quantitative Understanding
@@ -3442,6 +3479,7 @@
2024.semeval-1.258.SupplementaryMaterial.txt
liang-etal-2024-bit
10.18653/v1/2024.semeval-1.258
+
MaiNLP at SemEval-2024 Task 1: Analyzing Source Language Selection in Cross-Lingual Textual Relatedness
@@ -3493,6 +3531,7 @@
2024.semeval-1.262.SupplementaryMaterial.zip
guo-fan-2024-nlpnchu
10.18653/v1/2024.semeval-1.262
+
Mothman at SemEval-2024 Task 9: An Iterative System for Chain-of-Thought Prompt Optimization
@@ -3599,6 +3638,7 @@
2024.semeval-1.270.SupplementaryMaterial.txt
kumar-etal-2024-semeval
10.18653/v1/2024.semeval-1.270
+
SemEval-2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials
@@ -3611,6 +3651,7 @@
2024.semeval-1.271.SupplementaryMaterial.txt
jullien-etal-2024-semeval
10.18653/v1/2024.semeval-1.271
+
SemEval Task 1: Semantic Textual Relatedness for African and Asian Languages
@@ -3637,6 +3678,7 @@
2024.semeval-1.272.SupplementaryMaterial.txt
ousidhoum-etal-2024-semeval
10.18653/v1/2024.semeval-1.272
+
SemEval-2024 Task 6: SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes
@@ -3655,6 +3697,7 @@
2024.semeval-1.273.SupplementaryMaterial.txt
mickus-etal-2024-semeval
10.18653/v1/2024.semeval-1.273
+
SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense
@@ -3667,6 +3710,7 @@
2024.semeval-1.274.SupplementaryMaterial.txt
jiang-etal-2024-semeval
10.18653/v1/2024.semeval-1.274
+
SemEval-2024 Task 4: Multilingual Detection of Persuasion Techniques in Memes
@@ -3695,6 +3739,7 @@
2024.semeval-1.276.SupplementaryMaterial.zip
held-habernal-2024-semeval
10.18653/v1/2024.semeval-1.276
+
SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations
@@ -3710,6 +3755,7 @@
2024.semeval-1.277.SupplementaryMaterial.zip
wang-etal-2024-semeval
10.18653/v1/2024.semeval-1.277
+
SheffieldVeraAI at SemEval-2024 Task 4: Prompting and fine-tuning a Large Vision-Language Model for Binary Classification of Persuasion Techniques in Memes
@@ -3742,6 +3788,7 @@
2024.semeval-1.279.SupplementaryMaterial.txt
wang-etal-2024-semeval-2024
10.18653/v1/2024.semeval-1.279
+
diff --git a/data/xml/2024.sigmorphon.xml b/data/xml/2024.sigmorphon.xml
index 9c983ddaa8..4cdd8e4bcf 100644
--- a/data/xml/2024.sigmorphon.xml
+++ b/data/xml/2024.sigmorphon.xml
@@ -38,6 +38,7 @@
2024.sigmorphon-1.2
matsuzaki-etal-2024-j
10.18653/v1/2024.sigmorphon-1.2
+
More than Just Statistical Recurrence: Human and Machine Unsupervised Learning of Māori Word Segmentation across Morphological Processes
@@ -59,6 +60,7 @@
2024.sigmorphon-1.4
arnett-etal-2024-different
10.18653/v1/2024.sigmorphon-1.4
+
Ye Olde French: Effect of Old and Middle French on SIGMORPHON-UniMorph Shared Task Data
@@ -81,6 +83,7 @@
2024.sigmorphon-1.6
salehi-jacobs-2024-effect
10.18653/v1/2024.sigmorphon-1.6
+
Decomposing Fusional Morphemes with Vector Embeddings
@@ -113,6 +116,7 @@
2024.sigmorphon-1.9
matogawa-etal-2024-japanese
10.18653/v1/2024.sigmorphon-1.9
+
diff --git a/data/xml/2024.starsem.xml b/data/xml/2024.starsem.xml
index 12739b2bd9..2fe71fa2d6 100644
--- a/data/xml/2024.starsem.xml
+++ b/data/xml/2024.starsem.xml
@@ -38,6 +38,7 @@
2024.starsem-1.2
fraser-etal-2024-stereotype
10.18653/v1/2024.starsem-1.2
+
Polysemy through the lens of psycholinguistic variables: a dataset and an evaluation of static and contextualized language models
@@ -49,6 +50,7 @@
2024.starsem-1.3
bruera-etal-2024-polysemy
10.18653/v1/2024.starsem-1.3
+
Post-Hoc Answer Attribution for Grounded and Trustworthy Long Document Comprehension: Task, Insights, and Challenges
@@ -60,6 +62,7 @@
2024.starsem-1.4
sancheti-etal-2024-post
10.18653/v1/2024.starsem-1.4
+
A Benchmark Suite of Japanese Natural Questions
@@ -72,6 +75,7 @@
2024.starsem-1.5
uematsu-etal-2024-benchmark
10.18653/v1/2024.starsem-1.5
+
ROUGE-K: Do Your Summaries Have Keywords?
@@ -83,6 +87,7 @@
2024.starsem-1.6
takeshita-etal-2024-rouge
10.18653/v1/2024.starsem-1.6
+
Investigating Aspect Features in Contextualized Embeddings with Semantic Scales and Distributional Similarity
@@ -94,6 +99,7 @@
2024.starsem-1.7
li-etal-2024-investigating
10.18653/v1/2024.starsem-1.7
+
WikiScenes with Descriptions: Aligning Paragraphs and Sentences with Images in Wikipedia Articles
@@ -129,6 +135,7 @@
2024.starsem-1.10
shi-etal-2024-lexical
10.18653/v1/2024.starsem-1.10
+
Paraphrase Identification via Textual Inference
@@ -141,6 +148,7 @@
2024.starsem-1.11
shi-etal-2024-paraphrase
10.18653/v1/2024.starsem-1.11
+
Identifying Emotional and Polar Concepts via Synset Translation
@@ -156,6 +164,7 @@
2024.starsem-1.12
woudstra-etal-2024-identifying
10.18653/v1/2024.starsem-1.12
+
A Closer Look at Claim Decomposition
@@ -285,6 +294,7 @@
2024.starsem-1.23
buz-etal-2024-investigating
10.18653/v1/2024.starsem-1.23
+
Multilingual and Code-Switched Sentence Ordering
@@ -374,6 +384,7 @@
2024.starsem-1.31
fu-frank-2024-compositional
10.18653/v1/2024.starsem-1.31
+
Inspecting Soundness of AMR Similarity Metrics in terms of Equivalence and Inequivalence
diff --git a/data/xml/2024.tacl.xml b/data/xml/2024.tacl.xml
index 81d4686c70..7b85344d14 100644
--- a/data/xml/2024.tacl.xml
+++ b/data/xml/2024.tacl.xml
@@ -100,6 +100,7 @@
120–136
2024.tacl-1.7
jiang-etal-2024-addressing
+
An Energy-based Model for Word-level AutoCompletion in Computer-aided Translation
@@ -192,6 +193,7 @@
229–246
2024.tacl-1.13
he-etal-2024-exploring
+
Retrieve What You Need: A Mutual Learning Framework for Open-domain Question Answering
@@ -215,6 +217,7 @@
264–282
2024.tacl-1.15
nuyts-etal-2024-explicitly
+
Evaluating the Ripple Effects of Knowledge Editing in Language Models
@@ -228,6 +231,7 @@
283–298
2024.tacl-1.16
cohen-etal-2024-evaluating
+
The Impact of Word Splitting on the Semantic Content of Contextualized Word Representations
@@ -274,6 +278,7 @@
355–371
2024.tacl-1.20
luo-etal-2024-diverge
+
What Do Self-Supervised Speech Models Know About Words?
@@ -396,6 +401,7 @@
543–561
2024.tacl-1.30
strobl-etal-2024-formal
+
Text-to-OverpassQL: A Natural Language Interface for Complex Geodata Querying of OpenStreetMap
@@ -516,6 +522,7 @@
721–737
2024.tacl-1.40
yin-etal-2024-source-free
+
Scope Ambiguities in Large Language Models
@@ -528,6 +535,7 @@
738–754
2024.tacl-1.41
kamath-etal-2024-scope
+
Visually Grounded Speech Models Have a Mutual Exclusivity Bias
diff --git a/data/xml/2024.trustnlp.xml b/data/xml/2024.trustnlp.xml
index 38b04b6f60..441c12defb 100644
--- a/data/xml/2024.trustnlp.xml
+++ b/data/xml/2024.trustnlp.xml
@@ -216,6 +216,7 @@
2024.trustnlp-1.16.SupplementaryMaterial.zip
cao-etal-2024-introducing
10.18653/v1/2024.trustnlp-1.16
+
Semantic-Preserving Adversarial Example Attack against BERT
diff --git a/data/xml/2024.vardial.xml b/data/xml/2024.vardial.xml
index 53f820c8fc..9ec1c07010 100644
--- a/data/xml/2024.vardial.xml
+++ b/data/xml/2024.vardial.xml
@@ -85,6 +85,7 @@
2024.vardial-1.5.SupplementaryMaterial.txt
espana-bonet-etal-2024-elote
10.18653/v1/2024.vardial-1.5
+
Modeling Orthographic Variation in Occitan’s Dialects
@@ -226,6 +227,7 @@
2024.vardial-1.17.SupplementaryMaterial.txt
faisal-anastasopoulos-2024-data
10.18653/v1/2024.vardial-1.17
+
JSI and WüNLP at the DIALECT-COPA Shared Task: In-Context Learning From Just a Few Dialectal Examples Gets You Quite Far
diff --git a/data/xml/2024.woah.xml b/data/xml/2024.woah.xml
index f5026cdeec..473be7bb2a 100644
--- a/data/xml/2024.woah.xml
+++ b/data/xml/2024.woah.xml
@@ -256,6 +256,7 @@
2024.woah-1.19
dementieva-etal-2024-toxicity
10.18653/v1/2024.woah-1.19
+
A Strategy Labelled Dataset of Counterspeech