* Process metadata corrections for 2025.genaidetect-1.10 (closes #4579)
* Process metadata corrections for 2025.mcg-1.4 (closes #4578)
* Process metadata corrections for 2023.emnlp-main.212 (closes #4576)
* Process metadata corrections for 2024.findings-acl.220 (closes #4572)
* Process metadata corrections for 2024.findings-eacl.156 (closes #4571)
* Process metadata corrections for 2025.finnlp-1.30 (closes #4570)
* Process metadata corrections for 2022.findings-acl.21 (closes #4567)
* Process metadata corrections for 2025.comedi-1.6 (closes #4564)
* Process metadata corrections for 2025.coling-main.535 (closes #4563)
* Process metadata corrections for 2022.emnlp-main.788 (closes #4562)
* Process metadata corrections for 2024.acl-long.191 (closes #4556)
* Process metadata corrections for 2024.acl-srw.29 (closes #4555)
* Process metadata corrections for 2024.conll-1.17 (closes #4554)
* Process metadata corrections for 2024.emnlp-main.59 (closes #4551)
* Process metadata corrections for 2020.nlposs-1.2 (closes #4550)
* Process metadata corrections for 2022.naacl-main.13 (closes #4548)
* Process metadata corrections for 2025.genaidetect-1.31 (closes #4544)
* Process metadata corrections for 2024.ltedi-1.16 (closes #4543)
* Process metadata corrections for 2024.wnut-1.5 (closes #4542)
* Process metadata corrections for 2024.figlang-1.8 (closes #4521)
* Handle errors in script
- No title or abstract in frontmatter
- Print issue number when JSON fails
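A minimal Python sketch of what these two guards could look like in a correction-processing script; the function name, file layout, and field names are illustrative assumptions, not the actual Anthology tooling:

```python
import json
import sys

def load_correction(path, issue_number):
    """Parse one metadata-correction JSON payload, reporting which issue it came from."""
    try:
        with open(path, encoding="utf-8") as f:
            correction = json.load(f)
    except json.JSONDecodeError as exc:
        # Print the issue number so a malformed payload can be traced back to its GitHub issue.
        print(f"Issue #{issue_number}: could not parse JSON ({exc})", file=sys.stderr)
        return None

    frontmatter = correction.get("frontmatter", {})
    if not frontmatter.get("title") and not frontmatter.get("abstract"):
        # Frontmatter entries without a title or abstract are skipped rather than crashing the run.
        print(f"Issue #{issue_number}: no title or abstract in frontmatter, skipping", file=sys.stderr)
        return None
    return correction
```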
<abstract>In this paper, we seek to measure how much information a component in a neural network could extract from the representations fed into it. Our work stands in contrast to prior probing work, most of which investigates how much information a model's representations contain. This shift in perspective leads us to propose a new principle for probing, the architectural bottleneck principle: In order to estimate how much information a given component could extract, a probe should look exactly like the component. Relying on this principle, we estimate how much syntactic information is available to transformers through our attentional probe, a probe that exactly resembles a transformer's self-attention head. Experimentally, we find that, in three models (BERT, ALBERT, and RoBERTa), a sentence's syntax tree is mostly extractable by our probe, suggesting these models have access to syntactic information while composing their contextual representations. Whether this information is actually used by these models, however, remains an open question.</abstract>
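To make the "probe should look exactly like the component" idea concrete, here is a minimal sketch of a single-head attentional probe trained to pick out each word's syntactic head. The dimensions, the query-key scoring, and the head-selection loss are assumptions made for illustration, not the paper's exact setup:

```python
import torch
import torch.nn as nn

class AttentionalProbe(nn.Module):
    """A probe shaped like one self-attention head: scores which word each word attends to."""

    def __init__(self, hidden_dim=768, head_dim=64):
        super().__init__()
        self.query = nn.Linear(hidden_dim, head_dim, bias=False)
        self.key = nn.Linear(hidden_dim, head_dim, bias=False)

    def forward(self, reps):
        # reps: [batch, seq_len, hidden_dim] frozen representations from BERT/ALBERT/RoBERTa
        q, k = self.query(reps), self.key(reps)
        return q @ k.transpose(-1, -2) / (k.size(-1) ** 0.5)  # [batch, seq_len, seq_len]

# Toy training step: treat each row of attention scores as a prediction of that word's syntactic head.
probe = AttentionalProbe()
reps = torch.randn(2, 10, 768)               # stand-in for frozen contextual representations
head_index = torch.randint(0, 10, (2, 10))   # stand-in gold dependency heads
loss = nn.functional.cross_entropy(probe(reps).reshape(-1, 10), head_index.reshape(-1))
loss.backward()
```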
- <abstract>We present RuCCoN, a new dataset for clinical concept normalization in Russian manually annotated by medical professionals. It contains over 16,028 entity mentions manually linked to over 2,409 unique concepts from the Russian language part of the UMLS ontology. We provide train/test splits for different settings (stratified, zero-shot, and CUI-less) and present strong baselines obtained with state-of-the-art models such as SapBERT. At present, Russian medical NLP is lacking in both datasets and trained models, and we view this work as an important step towards filling this gap. Our dataset and annotation guidelines are available at <url>https://github.com/sberbank-ai-lab/RuCCoN</url>.</abstract>
+ <abstract>We present RuCCoN, a new dataset for clinical concept normalization in Russian manually annotated by medical professionals. It contains over 16,028 entity mentions manually linked to over 2,409 unique concepts from the Russian language part of the UMLS ontology. We provide train/test splits for different settings (stratified, zero-shot, and CUI-less) and present strong baselines obtained with state-of-the-art models such as SapBERT. At present, Russian medical NLP is lacking in both datasets and trained models, and we view this work as an important step towards filling this gap. Our dataset and annotation guidelines are available at <url>https://github.com/AIRI-Institute/RuCCoN</url>.</abstract>
<abstract>Dense retrieval calls for discriminative embeddings to represent the semantic relationship between query and document. It may benefit from the use of large language models (LLMs), given LLMs’ strong capability on semantic understanding. However, the LLMs are learned by auto-regression, whose working mechanism is completely different from representing whole text as one discriminative embedding. Thus, it is imperative to study how to adapt LLMs properly so that they can be effectively initialized as the backbone encoder for dense retrieval. In this paper, we propose a novel approach, called <b>Llama2Vec</b>, which performs unsupervised adaptation of LLM for its dense retrieval application. Llama2Vec consists of two pretext tasks: EBAE (Embedding-Based Auto-Encoding) and EBAR (Embedding-Based Auto-Regression), where the LLM is prompted to <i>reconstruct the input sentence</i> and <i>predict the next sentence</i> based on its text embeddings. Llama2Vec is simple, lightweight, but highly effective. It is used to adapt LLaMA-2-7B on the Wikipedia corpus. With a moderate number of adaptation steps, it substantially improves the model’s fine-tuned performances on a variety of dense retrieval benchmarks. Notably, it results in the new state-of-the-art performances on popular benchmarks, such as passage and document retrieval on MSMARCO, and zero-shot retrieval on BEIR. The model and source code will be made publicly available to facilitate future research. Our model is available at https://github.com/FlagOpen/FlagEmbedding.</abstract>
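As a loose sketch of the embedding side of this recipe (not the actual Llama2Vec code), the text embedding can be taken from the final hidden state of the last token of a causal LM; the two pretext tasks then train that embedding to reconstruct the input sentence (EBAE) and predict the next sentence (EBAR). The model name, pooling choice, and example sentence below are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in for the sketch; the paper adapts LLaMA-2-7B
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

@torch.no_grad()
def text_embedding(text: str) -> torch.Tensor:
    """Use the final hidden state of the last token as the sentence embedding."""
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs, output_hidden_states=True)
    return outputs.hidden_states[-1][0, -1]  # [hidden_dim]

emb = text_embedding("Dense retrieval needs discriminative embeddings.")
print(emb.shape)
# EBAE/EBAR (not shown) would prompt the LM to reconstruct this sentence and to
# predict the following sentence conditioned on this embedding, backpropagating the LM loss.
```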
<url hash="e0092648">2024.acl-long.191</url>
@@ -13986,8 +13986,8 @@
<paper id="29">
<title>Compromesso! <fixed-case>I</fixed-case>talian Many-Shot Jailbreaks undermine the safety of Large Language Models</title>
<abstract>As diverse linguistic communities and users adopt Large Language Models (LLMs), assessing their safety across languages becomes critical. Despite ongoing efforts to align these models with safe and ethical guidelines, they can still be induced into unsafe behavior with jailbreaking, a technique in which models are prompted to act outside their operational guidelines. What research has been conducted on these vulnerabilities was predominantly on English, limiting the understanding of LLM behavior in other languages. We address this gap by investigating Many-Shot Jailbreaking (MSJ) in Italian, underscoring the importance of understanding LLM behavior in different languages. We base our analysis on a newly created Italian dataset to identify unique safety vulnerabilities in 4 families of open-source LLMs. We find that the models exhibit unsafe behaviors even with minimal exposure to harmful prompts, and, more alarmingly, this tendency rapidly escalates with more demonstrations.</abstract>
<abstract>The effect of surprisal on processing difficulty has been a central topic of investigation in psycholinguistics. Here, we use eyetracking data to examine three language processing regimes that are common in daily life but have not been addressed with respect to this question: information seeking, repeated processing, and the combination of the two. Using standard regime-agnostic surprisal estimates, we find that the prediction of surprisal theory regarding the presence of a linear effect of surprisal on processing times extends to these regimes. However, when using surprisal estimates from regime-specific contexts that match the contexts and tasks given to humans, we find that in information seeking, such estimates do not improve the predictive power of processing times compared to standard surprisals. Further, regime-specific contexts yield near zero surprisal estimates with no predictive power for processing times in repeated reading. These findings point to misalignments of task and memory representations between humans and current language models, and question the extent to which such models can be used for estimating cognitively relevant quantities. We further discuss theoretical challenges posed by these results.</abstract>
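For readers unfamiliar with the quantity being tested here: a word's surprisal is its negative log-probability under a language model, and the linear-effect hypothesis regresses reading times on that value. A small illustrative sketch follows; the model choice and the synthetic reading times are assumptions, not the paper's data:

```python
import numpy as np
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def token_surprisals(text: str):
    """Surprisal of each token after the first: -log2 p(token | preceding context)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    log_probs = torch.log_softmax(model(ids).logits[0, :-1], dim=-1)
    target = ids[0, 1:]
    return (-log_probs[torch.arange(target.size(0)), target] / np.log(2)).tolist()

surprisals = np.array(token_surprisals("The cat sat on the mat near the window."))
# Stand-in reading times; in the paper these come from eyetracking measures.
reading_times = 200 + 15 * surprisals + np.random.normal(0, 10, size=surprisals.shape)
slope, intercept = np.polyfit(surprisals, reading_times, 1)  # linear effect of surprisal on RT
print(f"estimated slope: {slope:.1f} ms per bit")
```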
data/xml/2024.emnlp.xml
3 additions & 3 deletions
@@ -826,11 +826,11 @@
</paper>
<paper id="59">
<title><fixed-case>HEART</fixed-case>-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with <fixed-case>LLM</fixed-case>s</title>
<author><first>Jocelyn J</first><last>Shen</last><affiliation>Massachusetts Institute of Technology</affiliation></author>
<abstract>Empathy serves as a cornerstone in enabling prosocial behaviors, and can be evoked through sharing of personal experiences in stories. While empathy is influenced by narrative content, intuitively, people respond to the way a story is told as well, through narrative style. Yet the relationship between empathy and narrative style is not fully understood. In this work, we empirically examine and quantify this relationship between style and empathy using LLMs and large-scale crowdsourcing studies. We introduce a novel, theory-based taxonomy, HEART (Human Empathy and Narrative Taxonomy) that delineates elements of narrative style that can lead to empathy with the narrator of a story. We establish the performance of LLMs in extracting narrative elements from HEART, showing that prompting with our taxonomy leads to reasonable, human-level annotations beyond what prior lexicon-based methods can do. To show empirical use of our taxonomy, we collect a dataset of empathy judgments of stories via a large-scale crowdsourcing study with <tex-math>N=2,624</tex-math> participants. We show that narrative elements extracted via LLMs, in particular, vividness of emotions and plot volume, can elucidate the pathways by which narrative style cultivates empathy towards personal stories. Our work suggests that such models can be used for narrative analyses that lead to human-centered social and behavioral insights.</abstract>
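A small sketch of what "prompting with the taxonomy" could look like in practice; the element names and prompt wording are illustrative assumptions, not the HEART definitions:

```python
# Hypothetical prompt construction for taxonomy-guided annotation of one story.
HEART_ELEMENTS = ["vividness of emotions", "plot volume"]  # two elements mentioned in the abstract

def build_annotation_prompt(story: str) -> str:
    element_list = "\n".join(f"- {name}" for name in HEART_ELEMENTS)
    return (
        "You are annotating narrative style in a personal story.\n"
        "For each element below, rate its presence from 1 to 5 and quote supporting text:\n"
        f"{element_list}\n\nStory:\n{story}"
    )

print(build_annotation_prompt("Last winter I lost my dog, and writing about it still makes my hands shake."))
```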