From 81885945dc991d3960b4bec9e61ec6d75d39e124 Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Thu, 6 Nov 2025 22:35:34 +0100 Subject: [PATCH 01/19] Add explicit id for all Lei Li author mentions Catch-all id for now --- data/xml/2020.acl.xml | 4 ++-- data/xml/2020.emnlp.xml | 6 +++--- data/xml/2020.findings.xml | 4 ++-- data/xml/2020.fnp.xml | 2 +- data/xml/2020.sdp.xml | 2 +- data/xml/2020.wmt.xml | 4 ++-- data/xml/2021.acl.xml | 18 +++++++++--------- data/xml/2021.eacl.xml | 2 +- data/xml/2021.emnlp.xml | 10 +++++----- data/xml/2021.findings.xml | 20 ++++++++++---------- data/xml/2021.iwslt.xml | 2 +- data/xml/2021.naacl.xml | 12 ++++++------ data/xml/2021.wmt.xml | 2 +- data/xml/2022.aacl.xml | 2 +- data/xml/2022.acl.xml | 12 ++++++------ data/xml/2022.coling.xml | 4 ++-- data/xml/2022.emnlp.xml | 4 ++-- data/xml/2022.findings.xml | 24 ++++++++++++------------ data/xml/2022.iwslt.xml | 2 +- data/xml/2022.naacl.xml | 4 ++-- data/xml/2023.acl.xml | 10 +++++----- data/xml/2023.americasnlp.xml | 2 +- data/xml/2023.emnlp.xml | 10 +++++----- data/xml/2023.findings.xml | 16 ++++++++-------- data/xml/2023.ijcnlp.xml | 2 +- data/xml/2024.acl.xml | 12 ++++++------ data/xml/2024.ccl.xml | 2 +- data/xml/2024.emnlp.xml | 12 ++++++------ data/xml/2024.findings.xml | 20 ++++++++++---------- data/xml/2024.iwslt.xml | 4 ++-- data/xml/2024.lrec.xml | 4 ++-- data/xml/2024.naacl.xml | 2 +- data/xml/2025.acl.xml | 10 +++++----- data/xml/2025.coling.xml | 2 +- data/xml/2025.emnlp.xml | 6 +++--- data/xml/2025.findings.xml | 18 +++++++++--------- data/xml/2025.iwslt.xml | 2 +- data/xml/2025.naacl.xml | 10 +++++----- data/xml/D18.xml | 2 +- data/xml/D19.xml | 6 +++--- data/xml/K19.xml | 2 +- data/xml/N18.xml | 2 +- data/xml/P16.xml | 2 +- data/xml/P19.xml | 12 ++++++------ data/xml/W13.xml | 4 ++-- data/xml/W14.xml | 2 +- data/xml/W16.xml | 2 +- data/xml/W17.xml | 4 ++-- data/xml/W19.xml | 4 ++-- data/xml/Y06.xml | 2 +- data/yaml/name_variants.yaml | 3 +++ 51 files changed, 167 insertions(+), 164 deletions(-) diff --git a/data/xml/2020.acl.xml b/data/xml/2020.acl.xml index f7e4e32899..3918997eda 100644 --- a/data/xml/2020.acl.xml +++ b/data/xml/2020.acl.xml @@ -4234,7 +4234,7 @@ NingMiao YuxuanSong HaoZhou - LeiLi + LeiLi 3436–3441 It has been a common approach to pre-train a language model on a large corpus and fine-tune it on task-specific data. In practice, we observe that fine-tuning a pre-trained model on a small dataset may lead to over- and/or under-estimate problem. In this paper, we propose MC-Tailor, a novel method to alleviate the above issue in text generation tasks by truncating and transferring the probability mass from over-estimated regions to under-estimated ones. Experiments on a variety of text generation datasets show that MC-Tailor consistently and significantly outperforms the fine-tuning approach. 2020.acl-main.314 @@ -10481,7 +10481,7 @@ XijinZhang SongchengJiang YuxuanWang - LeiLi + LeiLi 1–8 This paper proposes the building of Xiaomingbot, an intelligent, multilingual and multimodal software robot equipped with four inte- gral capabilities: news generation, news translation, news reading and avatar animation. Its system summarizes Chinese news that it automatically generates from data tables. Next, it translates the summary or the full article into multiple languages, and reads the multi- lingual rendition through synthesized speech. 
Notably, Xiaomingbot utilizes a voice cloning technology to synthesize the speech trained from a real person’s voice data in one input language. The proposed system enjoys several merits: it has an animated avatar, and is able to generate and read multilingual news. Since it was put into practice, Xiaomingbot has written over 600,000 articles, and gained over 150,000 followers on social media platforms. 2020.acl-demos.1 diff --git a/data/xml/2020.emnlp.xml b/data/xml/2020.emnlp.xml index 1bf9ab29c4..74ace32d5f 100644 --- a/data/xml/2020.emnlp.xml +++ b/data/xml/2020.emnlp.xml @@ -1707,7 +1707,7 @@ ShuangZeng RunxinXu BaobaoChang - LeiLi + LeiLi 1630–1640 Document-level relation extraction aims to extract relations among entities within a document. Different from sentence-level relation extraction, it requires reasoning over multiple sentences across paragraphs. In this paper, we propose Graph Aggregation-and-Inference Network (GAIN), a method to recognize such relations for long paragraphs. GAIN constructs two graphs, a heterogeneous mention-level graph (MG) and an entity-level graph (EG). The former captures complex interaction among different mentions and the latter aggregates mentions underlying for the same entities. Based on the graphs we propose a novel path reasoning mechanism to infer relations between entities. Experiments on the public dataset, DocRED, show GAIN achieves a significant performance improvement (2.85 on F1) over the previous state-of-the-art. Our code is available at https://github.com/PKUnlp-icler/GAIN. 2020.emnlp-main.127 @@ -2836,7 +2836,7 @@ XipengQiu JiangtaoFeng HaoZhou - LeiLi + LeiLi 2649–2663 We investigate the following question for machine translation (MT): can we develop a single universal MT model to serve as the common seed and obtain derivative and improved models on arbitrary language pairs? We propose mRASP, an approach to pre-train a universal multilingual neural machine translation model. Our key idea in mRASP is its novel technique of random aligned substitution, which brings words and phrases with similar meanings across multiple languages closer in the representation space. We pre-train a mRASP model on 32 language pairs jointly with only public datasets. The model is then fine-tuned on downstream language pairs to obtain specialized MT models. We carry out extensive experiments on 42 translation directions across a diverse settings, including low, medium, rich resource, and as well as transferring to exotic language pairs. Experimental results demonstrate that mRASP achieves significant performance improvement compared to directly training on those target pairs. It is the first time to verify that multiple lowresource language pairs can be utilized to improve rich resource MT. Surprisingly, mRASP is even able to improve the translation quality on exotic languages that never occur in the pretraining corpus. Code, data, and pre-trained models are available at https://github.com/linzehui/mRASP. 2020.emnlp-main.210 @@ -9842,7 +9842,7 @@ JunxianHe MingxuanWang YimingYang - LeiLi + LeiLi 9119–9130 Pre-trained contextual representations like BERT have achieved great success in natural language processing. However, the sentence embeddings from the pre-trained language models without fine-tuning have been found to poorly capture semantic meaning of sentences. In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited. 
We first reveal the theoretical connection between the masked language model pre-training objective and the semantic similarity task theoretically, and then analyze the BERT sentence embeddings empirically. We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance of semantic similarity. To address this issue, we propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective. Experimental results show that our proposed BERT-flow method obtains significant performance gains over the state-of-the-art sentence embeddings on a variety of semantic textual similarity tasks. The code is available at https://github.com/bohanli/BERT-flow. 2020.emnlp-main.733 diff --git a/data/xml/2020.findings.xml b/data/xml/2020.findings.xml index 2382fc6635..28c4cdd206 100644 --- a/data/xml/2020.findings.xml +++ b/data/xml/2020.findings.xml @@ -1465,7 +1465,7 @@ Language Generation via Combinatorial Constraint Satisfaction: A Tree Search Enhanced <fixed-case>M</fixed-case>onte-<fixed-case>C</fixed-case>arlo Approach MaosenZhang NanJiang - LeiLi + LeiLi YexiangXue 1286–1298 Generating natural language under complex constraints is a principled formulation towards controllable text generation. We present a framework to allow specification of combinatorial constraints for sentence generation. We propose TSMC, an efficient method to generate high likelihood sentences with respect to a pre-trained language model while satisfying the constraints. Our approach is highly flexible, requires no task-specific train- ing, and leverages efficient constraint satisfaction solving techniques. To better handle the combinatorial constraints, a tree search algorithm is embedded into the proposal process of the Markov Chain Monte Carlo (MCMC) to explore candidates that satisfy more constraints. Compared to existing MCMC approaches, our sampling approach has a better mixing performance. Experiments show that TSMC achieves consistent and significant improvement on multiple language generation tasks. @@ -5726,7 +5726,7 @@ MingxuanWang WeinanZhang YongYu - LeiLi + LeiLi 4908–4917 Active learning for sentence understanding aims at discovering informative unlabeled data for annotation and therefore reducing the demand for labeled data. We argue that the typical uncertainty sampling method for active learning is time-consuming and can hardly work in real-time, which may lead to ineffective sample selection. We propose adversarial uncertainty sampling in discrete space (AUSDS) to retrieve informative unlabeled samples more efficiently. AUSDS maps sentences into latent space generated by the popular pre-trained language models, and discover informative unlabeled text samples for annotation via adversarial attack. The proposed approach is extremely efficient compared with traditional uncertainty sampling with more than 10x speedup. Experimental results on five datasets show that AUSDS outperforms strong baselines on effectiveness. 
2020.findings-emnlp.441 diff --git a/data/xml/2020.fnp.xml b/data/xml/2020.fnp.xml index 3752fcbd16..30cff5edd3 100644 --- a/data/xml/2020.fnp.xml +++ b/data/xml/2020.fnp.xml @@ -194,7 +194,7 @@ Extractive Financial Narrative Summarisation based on <fixed-case>DPP</fixed-case>s - LeiLi + LeiLi YafeiJiang YinanLiu 100–104 diff --git a/data/xml/2020.sdp.xml b/data/xml/2020.sdp.xml index db6adb6692..5d198dd20d 100644 --- a/data/xml/2020.sdp.xml +++ b/data/xml/2020.sdp.xml @@ -349,7 +349,7 @@ <fixed-case>CIST</fixed-case>@<fixed-case>CL</fixed-case>-<fixed-case>S</fixed-case>ci<fixed-case>S</fixed-case>umm 2020, <fixed-case>L</fixed-case>ong<fixed-case>S</fixed-case>umm 2020: Automatic Scientific Document Summarization - LeiLi + LeiLi YangXie WeiLiu YinanLiu diff --git a/data/xml/2020.wmt.xml b/data/xml/2020.wmt.xml index 9613f9566f..56f716fc66 100644 --- a/data/xml/2020.wmt.xml +++ b/data/xml/2020.wmt.xml @@ -471,7 +471,7 @@ ZehuiLin YaomingZhu MingxuanWang - LeiLi + LeiLi 305–312 This paper describes our submission systems for VolcTrans for WMT20 shared news translation task. We participated in 8 translation directions. Our basic systems are based on Transformer (CITATION), into which we also employed new architectures (bigger or deeper Transformers, dynamic convolution). The final systems include text pre-process, subword(a.k.a. BPE(CITATION)), baseline model training, iterative back-translation, model ensemble, knowledge distillation and multilingual pre-training. 2020.wmt-1.33 @@ -1443,7 +1443,7 @@ ZhuoZhi JunCao MingxuanWang - LeiLi + LeiLi 985–990 In this paper, we describe our submissions to the WMT20 shared task on parallel corpus filtering and alignment for low-resource conditions. The task requires the participants to align potential parallel sentence pairs out of the given document pairs, and score them so that low-quality pairs can be filtered. Our system, Volctrans, is made of two modules, i.e., a mining module and a scoring module. Based on the word alignment model, the mining mod- ule adopts an iterative mining strategy to extract latent parallel sentences. In the scoring module, an XLM-based scorer provides scores, followed by reranking mechanisms and ensemble. Our submissions outperform the baseline by 3.x/2.x and 2.x/2.x for km-en and ps-en on From Scratch/Fine-Tune conditions. 2020.wmt-1.112 diff --git a/data/xml/2021.acl.xml b/data/xml/2021.acl.xml index f6dfd817ee..c2467d6434 100644 --- a/data/xml/2021.acl.xml +++ b/data/xml/2021.acl.xml @@ -284,7 +284,7 @@ ChangzhiSun YuanbinWu HaoZhou - LeiLi + LeiLi JunchiYan 220–231 Many joint entity relation extraction models setup two separated label spaces for the two sub-tasks (i.e., entity detection and relation classification). We argue that this setting may hinder the information interaction between entities and relations. In this work, we propose to eliminate the different treatment on the two sub-tasks’ label spaces. The input of our model is a table containing all word pairs from a sentence. Entities and relations are represented by squares and rectangles in the table. We apply a unified classifier to predict each cell’s label, which unifies the learning of two sub-tasks. For testing, an effective (yet fast) approximate decoder is proposed for finding squares and rectangles from tables. Experiments on three benchmarks (ACE04, ACE05, SciERC) show that, using only half the number of parameters, our model achieves competitive accuracy with the best extractor, and is faster. 
@@ -315,7 +315,7 @@ XiaoPan MingxuanWang LiweiWu - LeiLi + LeiLi 244–258 Existing multilingual machine translation approaches mainly focus on English-centric directions, while the non-English directions still lag behind. In this work, we aim to build a many-to-many translation system with an emphasis on the quality of non-English language directions. Our intuition is based on the hypothesis that a universal cross-language representation leads to better multilingual translation performance. To this end, we propose mRASP2, a training method to obtain a single unified multilingual translation model. mRASP2 is empowered by two techniques: a) a contrastive learning scheme to close the gap among representations of different languages, and b) data augmentation on both multiple parallel and monolingual data to further align token representations. For English-centric directions, mRASP2 achieves competitive or even better performance than a strong pre-trained model mBART on tens of WMT benchmarks. For non-English directions, mRASP2 achieves an improvement of average 10+ BLEU compared with the multilingual baseline 2021.acl-long.21 @@ -364,7 +364,7 @@ ZehuiLin LiweiWu MingxuanWang - LeiLi + LeiLi 293–305 Multilingual neural machine translation aims at learning a single translation model for multiple languages. These jointly trained models often suffer from performance degradationon rich-resource language pairs. We attribute this degeneration to parameter interference. In this paper, we propose LaSS to jointly train a single unified multilingual MT model. LaSS learns Language Specific Sub-network (LaSS) for each language pair to counter parameter interference. Comprehensive experiments on IWSLT and WMT datasets with various Transformer architectures show that LaSS obtains gains on 36 language pairs by up to 1.2 BLEU. Besides, LaSS shows its strong generalization performance at easy adaptation to new language pairs and zero-shot translation. LaSS boosts zero-shot translation with an average of 8.3 BLEU on 30 language pairs. Codes and trained models are available at https://github.com/NLP-Playground/LaSS. 2021.acl-long.25 @@ -2163,7 +2163,7 @@ LinQiu WeinanZhang YongYu - LeiLi + LeiLi 1993–2003 Recent work on non-autoregressive neural machine translation (NAT) aims at improving the efficiency by parallel decoding without sacrificing the quality. However, existing NAT methods are either inferior to Transformer or require multiple decoding passes, leading to reduced speedup. We propose the Glancing Language Model (GLM) for single-pass parallel generation models. With GLM, we develop Glancing Transformer (GLAT) for machine translation. With only single-pass parallel decoding, GLAT is able to generate high-quality translation with 8×-15× speedup. Note that GLAT does not modify the network architecture, which is a training method to learn word interdependency. Experiments on multiple WMT language directions show that GLAT outperforms all previous single pass non-autoregressive methods, and is nearly comparable to Transformer, reducing the gap to 0.25-0.9 BLEU points. 2021.acl-long.155 @@ -3869,7 +3869,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker RunxinXu TianyuLiu - LeiLi + LeiLi BaobaoChang 3533–3546 Document-level event extraction aims to recognize event information from a whole piece of article. 
Existing methods are not effective due to two challenges of this task: a) the target event arguments are scattered across sentences; b) the correlation among events in a document is non-trivial to model. In this paper, we propose Heterogeneous Graph-based Interaction Model with a Tracker (GIT) to solve the aforementioned two challenges. For the first challenge, GIT constructs a heterogeneous graph interaction network to capture global interactions among different sentences and entity mentions. For the second, GIT introduces a Tracker module to track the extracted events and hence capture the interdependency among the events. Experiments on a large-scale dataset (Zheng et al, 2019) show GIT outperforms the previous methods by 2.8 F1. Further analysis reveals is effective in extracting multiple correlated events and event arguments that scatter across the document. @@ -5370,7 +5370,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO Personalized Transformer for Explainable Recommendation - LeiLi + LeiLi YongfengZhang LiChen 4947–4957 @@ -7997,7 +7997,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO HaoZhou ChunGan ZaixiangZheng - LeiLi + LeiLi 7361–7373 The choice of token vocabulary affects the performance of machine translation. This paper aims to figure out what is a good vocabulary and whether we can find the optimal vocabulary without trial training. To answer these questions, we first provide an alternative understanding of vocabulary from the perspective of information theory. It motivates us to formulate the quest of vocabularization – finding the best token dictionary with a proper size – as an optimal transport (OT) problem. We propose VOLT, a simple and efficient solution without trial training. Empirical results show that VOLT beats widely-used vocabularies in diverse scenarios, including WMT-14 English-German translation, TED bilingual translation, and TED multilingual translation. For example, VOLT achieves 70% vocabulary size reduction and 0.5 BLEU gain on English-German translation. Also, compared to BPE-search, VOLT reduces the search time from 384 GPU hours to 30 GPU hours on English-German translation. Codes are available at https://github.com/Jingjing-NLP/VOLT. 2021.acl-long.571 @@ -10453,7 +10453,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO MingxuanWang QianqianDong RongYe - LeiLi + LeiLi 55–62 NeurST is an open-source toolkit for neural speech translation. The toolkit mainly focuses on end-to-end speech translation, which is easy to use, modify, and extend to advanced speech translation research and products. NeurST aims at facilitating the speech translation research for NLP researchers and building reliable benchmarks for this field. It provides step-by-step recipes for feature extraction, data preprocessing, distributed training, and evaluation. In this paper, we will introduce the framework design of NeurST and show experimental results for different benchmark datasets, which can be regarded as reliable baselines for future research. The toolkit is publicly available at https://github.com/bytedance/neurst and we will continuously update the performance of with other counterparts and studies at https://st-benchmark.github.io/. 
2021.acl-demo.7 @@ -11081,7 +11081,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO Pre-training Methods for Neural Machine Translation MingxuanWang - LeiLi + LeiLi 21–25 This tutorial provides a comprehensive guide to make the most of pre-training for neural machine translation. Firstly, we will briefly introduce the background of NMT, pre-training methodology, and point out the main challenges when applying pre-training for NMT. Then we will focus on analysing the role of pre-training in enhancing the performance of NMT, how to design a better pre-training model for executing specific NMT tasks and how to better integrate the pre-trained model into NMT system. In each part, we will provide examples, discuss training techniques and analyse what is transferred when applying pre-training. 2021.acl-tutorials.4 diff --git a/data/xml/2021.eacl.xml b/data/xml/2021.eacl.xml index c48bdfa284..5cc30f41bc 100644 --- a/data/xml/2021.eacl.xml +++ b/data/xml/2021.eacl.xml @@ -3008,7 +3008,7 @@ ChangzhiSun YuanbinWu HaoZhou - LeiLi + LeiLi JunchiYan 2877–2887 Current state-of-the-art systems for joint entity relation extraction (Luan et al., 2019; Wad-den et al., 2019) usually adopt the multi-task learning framework. However, annotations for these additional tasks such as coreference resolution and event extraction are always equally hard (or even harder) to obtain. In this work, we propose a pre-training method ENPAR to improve the joint extraction performance. ENPAR requires only the additional entity annotations that are much easier to collect. Unlike most existing works that only consider incorporating entity information into the sentence encoder, we further utilize the entity pair information. Specifically, we devise four novel objectives,i.e., masked entity typing, masked entity prediction, adversarial context discrimination, and permutation prediction, to pre-train an entity encoder and an entity pair encoder. Comprehensive experiments show that the proposed pre-training method achieves significant improvement over BERT on ACE05, SciERC, and NYT, and outperforms current state-of-the-art on ACE05. diff --git a/data/xml/2021.emnlp.xml b/data/xml/2021.emnlp.xml index e7faac1620..c2f1fe5fb3 100644 --- a/data/xml/2021.emnlp.xml +++ b/data/xml/2021.emnlp.xml @@ -432,7 +432,7 @@ Dynamic Knowledge Distillation for Pre-trained Language Models - LeiLi + LeiLi YankaiLin ShuhuaiRen PengLi @@ -1301,7 +1301,7 @@ HaoZhou WeinanZhang YongYu - LeiLi + LeiLi 1239–1250 Document-level relation extraction aims to identify relations between entities in a whole document. Prior efforts to capture long-range dependencies have relied heavily on implicitly powerful representations learned through (graph) neural networks, which makes the model less transparent. To tackle this challenge, in this paper, we propose LogiRE, a novel probabilistic model for document-level relation extraction by learning logic rules. LogiRE treats logic rules as latent variables and consists of two modules: a rule generator and a relation extractor. The rule generator is to generate logic rules potentially contributing to final predictions, and the relation extractor outputs final predictions based on the generated logic rules. Those two modules can be efficiently optimized with the expectation-maximization (EM) algorithm. By introducing logic rules into neural networks, LogiRE can explicitly capture long-range dependencies as well as enjoy better interpretation. 
Empirical results show that significantly outperforms several strong baselines in terms of relation performance and logical consistency. Our code is available at https://github.com/rudongyu/LogiRE. 2021.emnlp-main.95 @@ -4705,7 +4705,7 @@ ZhiyuanZeng JiazeChen WeiranXu - LeiLi + LeiLi 4102–4108 Neural abstractive summarization systems have gained significant progress in recent years. However, abstractive summarization often produce inconsisitent statements or false facts. How to automatically generate highly abstract yet factually correct summaries? In this paper, we proposed an efficient weak-supervised adversarial data augmentation approach to form the factual consistency dataset. Based on the artificial dataset, we train an evaluation model that can not only make accurate and robust factual consistency discrimination but is also capable of making interpretable factual errors tracing by backpropagated gradient distribution on token embeddings. Experiments and analysis conduct on public annotated summarization and factual consistency datasets demonstrate our approach effective and reasonable. 2021.emnlp-main.337 @@ -7934,7 +7934,7 @@ JunCao ShanboCheng ShujianHuang - LeiLi + LeiLi 7280–7290 How to effectively adapt neural machine translation (NMT) models according to emerging cases without retraining? Despite the great success of neural machine translation, updating the deployed models online remains a challenge. Existing non-parametric approaches that retrieve similar examples from a database to guide the translation process are promising but are prone to overfit the retrieved examples. However, non-parametric methods are prone to overfit the retrieved examples. In this work, we propose to learn Kernel-Smoothed Translation with Example Retrieval (KSTER), an effective approach to adapt neural machine translation models online. Experiments on domain adaptation and multi-domain machine translation datasets show that even without expensive retraining, KSTER is able to achieve improvement of 1.1 to 1.5 BLEU scores over the best existing online adaptation methods. The code and trained models are released at https://github.com/jiangqn/KSTER. 
2021.emnlp-main.579 @@ -9717,7 +9717,7 @@ Text <fixed-case>A</fixed-case>uto<fixed-case>A</fixed-case>ugment: Learning Compositional Augmentation Policy for Text Classification ShuhuaiRen JinchaoZhang - LeiLi + LeiLi XuSun JieZhou 9029–9043 diff --git a/data/xml/2021.findings.xml b/data/xml/2021.findings.xml index 5b0b476fb0..29c781856a 100644 --- a/data/xml/2021.findings.xml +++ b/data/xml/2021.findings.xml @@ -925,7 +925,7 @@ <fixed-case>U</fixed-case>ni<fixed-case>K</fixed-case>eyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction HuanqinWu WeiLiu - LeiLi + LeiLi DanNie TaoChen FengZhang @@ -2444,7 +2444,7 @@ ChiHan MingxuanWang HengJi - LeiLi + LeiLi 2214–2225 2021.findings-acl.195 10.18653/v1/2021.findings-acl.195 @@ -3026,7 +3026,7 @@ JiazeChen HaoZhou XipengQiu - LeiLi + LeiLi 2739–2750 2021.findings-acl.242 10.18653/v1/2021.findings-acl.242 @@ -3300,7 +3300,7 @@ LiweiWu ShanboCheng MingxuanWang - LeiLi + LeiLi 3001–3007 2021.findings-acl.264 10.18653/v1/2021.findings-acl.264 @@ -3464,7 +3464,7 @@ YuanbinWu JiazeChen HaoZhou - LeiLi + LeiLi 3140–3151 2021.findings-acl.277 10.18653/v1/2021.findings-acl.277 @@ -6240,7 +6240,7 @@ <fixed-case>C</fixed-case>ascade<fixed-case>BERT</fixed-case>: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade - LeiLi + LeiLi YankaiLin DeliChen ShuhuaiRen @@ -6725,7 +6725,7 @@ Leveraging Word-Formation Knowledge for <fixed-case>C</fixed-case>hinese Word Sense Disambiguation HuaZheng - LeiLi + LeiLi DamaiDai DeliChen TianyuLiu @@ -8770,7 +8770,7 @@ Multilingual Translation via Grafting Pre-trained Language Models ZeweiSun MingxuanWang - LeiLi + LeiLi 2735–2747 Can pre-trained BERT for one language and GPT for another be glued together to translate texts? Self-supervised training using only monolingual data has led to the success of pre-trained (masked) language models in many NLP tasks. However, directly connecting BERT as an encoder and GPT as a decoder can be challenging in machine translation, for GPT-like models lack a cross-attention component that is needed in seq2seq decoders. In this paper, we propose Graformer to graft separately pre-trained (masked) language models for machine translation. With monolingual data for pre-training and parallel data for grafting training, we maximally take advantage of the usage of both types of data. Experiments on 60 directions show that our method achieves average improvements of 5.8 BLEU in x2en and 2.9 BLEU in en2x directions comparing with the multilingual Transformer of the same size. 2021.findings-emnlp.233 @@ -8864,7 +8864,7 @@ JiangtaoFeng ChengqiZhao MingxuanWang - LeiLi + LeiLi 2812–2823 Developing a unified multilingual model has been a long pursuing goal for machine translation. However, existing approaches suffer from performance degradation - a single multilingual model is inferior to separately trained bilingual ones on rich-resource languages. We conjecture that such a phenomenon is due to interference brought by joint training with multiple languages. To accommodate the issue, we propose CIAT, an adapted Transformer model with a small parameter overhead for multilingual machine translation. We evaluate CIAT on multiple benchmark datasets, including IWSLT, OPUS-100, and WMT. Experiments show that the CIAT consistently outperforms strong multilingual baselines on 64 of total 66 language directions, 42 of which have above 0.5 BLEU improvement. 
2021.findings-emnlp.240 @@ -10963,7 +10963,7 @@ TaoWang ChengqiZhao MingxuanWang - LeiLi + LeiLi HangLi DeyiXiong 4639–4644 diff --git a/data/xml/2021.iwslt.xml b/data/xml/2021.iwslt.xml index e4e7f0c7c5..6582fc3c5b 100644 --- a/data/xml/2021.iwslt.xml +++ b/data/xml/2021.iwslt.xml @@ -110,7 +110,7 @@ RongYe QianqianDong JunCao - LeiLi + LeiLi 64–74 This paper describes the systems submitted to IWSLT 2021 by the Volctrans team. We participate in the offline speech translation and text-to-text simultaneous translation tracks. For offline speech translation, our best end-to-end model achieves 7.9 BLEU improvements over the benchmark on the MuST-C test set and is even approaching the results of a strong cascade solution. For text-to-text simultaneous translation, we explore the best practice to optimize the wait-k model. As a result, our final submitted systems exceed the benchmark at around 7 BLEU on the same latency regime. We release our code and model to facilitate both future research works and industrial applications. 2021.iwslt-1.6 diff --git a/data/xml/2021.naacl.xml b/data/xml/2021.naacl.xml index e10381b920..d36faa17ca 100644 --- a/data/xml/2021.naacl.xml +++ b/data/xml/2021.naacl.xml @@ -2243,7 +2243,7 @@ Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in <fixed-case>NLP</fixed-case> Models WenkaiYang - LeiLi + LeiLi ZhiyuanZhang XuanchengRen XuSun @@ -5884,7 +5884,7 @@ Decompose, Fuse and Generate: A Formation-Informed Method for <fixed-case>C</fixed-case>hinese Definition Generation HuaZheng DamaiDai - LeiLi + LeiLi TianyuLiu ZhifangSui BaobaoChang @@ -6173,7 +6173,7 @@ Generative Imagination Elevates Machine Translation QuanyuLong MingxuanWang - LeiLi + LeiLi 5738–5748 There are common semantics shared across text and images. Given a sentence in a source language, whether depicting the visual scene helps translation into a target language? Existing multimodal neural machine translation methods (MNMT) require triplets of bilingual sentence - image for training and tuples of source sentence - image for inference. In this paper, we propose ImagiT, a novel machine translation method via visual imagination. ImagiT first learns to generate visual representation from the source sentence, and then utilizes both source sentence and the “imagined representation” to produce a target translation. Unlike previous methods, it only needs the source sentence at the inference time. Experiments demonstrate that ImagiT benefits from visual imagination and significantly outperforms the text-only neural machine translation baselines. Further analysis reveals that the imagination process in ImagiT helps fill in missing information when performing the degradation strategy. 2021.naacl-main.457 @@ -7335,7 +7335,7 @@ MingxuanWang HongxiaoBai HaiZhao - LeiLi + LeiLi 89–96 We propose to improve unsupervised neural machine translation with cross-lingual supervision (), which utilizes supervision signals from high resource language pairs to improve the translation of zero-source languages. Specifically, for training En-Ro system without parallel corpus, we can leverage the corpus from En-Fr and En-De to collectively train the translation from one language into many languages under one model. % is based on multilingual models which require no changes to the standard unsupervised NMT. 
Simple and effective, significantly improves the translation quality with a big margin in the benchmark unsupervised translation tasks, and even achieves comparable performance to supervised NMT. In particular, on WMT’14 -tasks achieves 37.6 and 35.18 BLEU score, which is very close to the large scale supervised setting and on WMT’16 -tasks achieves 35.09 BLEU score which is even better than the supervised Transformer baseline. 2021.naacl-industry.12 @@ -7361,7 +7361,7 @@ TaoWang ChengqiZhao MingxuanWang - LeiLi + LeiLi DeyiXiong 105–112 Automatic translation of dialogue texts is a much needed demand in many real life scenarios. However, the currently existing neural machine translation delivers unsatisfying results. In this paper, we conduct a deep analysis of a dialogue corpus and summarize three major issues on dialogue translation, including pronoun dropping (), punctuation dropping (), and typos (). In response to these challenges, we propose a joint learning method to identify omission and typo, and utilize context to translate dialogue utterances. To properly evaluate the performance, we propose a manually annotated dataset with 1,931 Chinese-English parallel utterances from 300 dialogues as a benchmark testbed for dialogue translation. Our experiments show that the proposed method improves translation quality by 3.2 BLEU over the baselines. It also elevates the recovery rate of omitted pronouns from 26.09% to 47.16%. We will publish the code and dataset publicly at https://xxx.xx. @@ -7376,7 +7376,7 @@ YingXiong YangWei MingxuanWang - LeiLi + LeiLi 113–120 Transformer and its variants have achieved great success in natural language processing. Since Transformer models are huge in size, serving these models is a challenge for real industrial applications. In this paper, we propose , a highly efficient inference library for models in the Transformer family. includes a series of GPU optimization techniques to both streamline the computation of Transformer layers and reduce memory footprint. supports models trained using PyTorch and Tensorflow. Experimental results on standard machine translation benchmarks show that achieves up to 14x speedup compared with TensorFlow and 1.4x speedup compared with , a concurrent CUDA implementation. The code will be released publicly after the review. 2021.naacl-industry.15 diff --git a/data/xml/2021.wmt.xml b/data/xml/2021.wmt.xml index 2df80ca98f..c9bed4b19a 100644 --- a/data/xml/2021.wmt.xml +++ b/data/xml/2021.wmt.xml @@ -259,7 +259,7 @@ ZehuiLin JiangtaoFeng ShanboCheng - LeiLi + LeiLi MingxuanWang HaoZhou 187–196 diff --git a/data/xml/2022.aacl.xml b/data/xml/2022.aacl.xml index 68f2393158..19f589335b 100644 --- a/data/xml/2022.aacl.xml +++ b/data/xml/2022.aacl.xml @@ -553,7 +553,7 @@ <fixed-case>SAPG</fixed-case>raph: Structure-aware Extractive Summarization for Scientific Papers with Heterogeneous Graph SiyaQi - LeiLi + LeiLi YiyangLi JinJiang DingxinHu diff --git a/data/xml/2022.acl.xml b/data/xml/2022.acl.xml index fd1624a921..4fab3c8f51 100644 --- a/data/xml/2022.acl.xml +++ b/data/xml/2022.acl.xml @@ -278,7 +278,7 @@ ShijieGeng ZuohuiFu YingqiangGe - LeiLi + LeiLi Gerardde Melo YongfengZhang 244-255 @@ -707,7 +707,7 @@ QianDong YaomingZhu MingxuanWang - LeiLi + LeiLi 680-694 How to find proper moments to generate partial sentence translation given a streaming speech input? 
Existing approaches waiting-and-translating for a fixed duration often break the acoustic units in speech, since the boundaries between acoustic units in speech are not even. In this paper, we propose MoSST, a simple yet effective method for translating streaming speech content. Given a usually long speech sequence, we develop an efficient monotonic segmentation module inside an encoder-decoder model to accumulate acoustic information incrementally and detect proper speech unit boundaries for the input in speech translation task. Experiments on multiple translation directions of the MuST-C dataset show that outperforms existing methods and achieves the best trade-off between translation quality (BLEU) and latency. Our code is available at https://github.com/dqqcasia/mosst. 2022.acl-long.50 @@ -2657,7 +2657,7 @@ WangchunshuZhou JingjingXu HaoZhou - LeiLi + LeiLi 2701-2714 Currently, masked language modeling (e.g., BERT) is the prime choice to learn contextualized representations. Due to the pervasiveness, it naturally raises an interesting question: how do masked language models (MLMs) learn contextual representations? In this work, we analyze the learning dynamics of MLMs and find that it adopts sampled embeddings as anchors to estimate and inject contextual semantics to representations, which limits the efficiency and effectiveness of MLMs. To address these problems, we propose TACO, a simple yet effective representation learning approach to directly model global semantics. To be specific, TACO extracts and aligns contextual semantics hidden in contextualized representations to encourage models to attend global semantics when generating contextualized representations. Experiments on the GLUE benchmark show that TACO achieves up to 5x speedup and up to 1.2 points average improvement over MLM. 2022.acl-long.193 @@ -6668,7 +6668,7 @@ in the Case of Unambiguous Gender <fixed-case>STEMM</fixed-case>: Self-learning with Speech-text Manifold Mixup for Speech Translation QingkaiFang RongYe - LeiLi + LeiLi YangFeng MingxuanWang 7050-7062 @@ -7423,7 +7423,7 @@ in the Case of Unambiguous Gender MoshaChen ZhenBi XiaozhuanLiang - LeiLi + LeiLi XinShang KangpingYin ChuanqiTan @@ -7867,7 +7867,7 @@ in the Case of Unambiguous Gender LihuaQian XinyuDai JiajunChen - LeiLi + LeiLi 8398-8409 Recently, parallel text generation has received widespread attention due to its success in generation efficiency. Although many advanced techniques are proposed to improve its generation quality, they still need the help of an autoregressive model for training to overcome the one-to-many multi-modal phenomenon in the dataset, limiting their applications. In this paper, we propose GLAT, which employs the discrete latent variables to capture word categorical information and invoke an advanced curriculum learning technique, alleviating the multi-modality problem. Experiment results show that our method outperforms strong baselines without the help of an autoregressive model, which further broadens the application scenarios of the parallel decoding paradigm. 
2022.acl-long.575 diff --git a/data/xml/2022.coling.xml b/data/xml/2022.coling.xml index 2dbc4374c6..bfdedc0130 100644 --- a/data/xml/2022.coling.xml +++ b/data/xml/2022.coling.xml @@ -2431,7 +2431,7 @@ <fixed-case>L</fixed-case>ight<fixed-case>NER</fixed-case>: A Lightweight Tuning Paradigm for Low-resource <fixed-case>NER</fixed-case> via Pluggable Prompting XiangChen - LeiLi + LeiLi ShuminDeng ChuanqiTan ChangliangXu @@ -2759,7 +2759,7 @@ Augmenting Legal Judgment Prediction with Contrastive Case Relations DugangLiu WeihaoDu - LeiLi + LeiLi WeikePan ZhongMing 2658–2667 diff --git a/data/xml/2022.emnlp.xml b/data/xml/2022.emnlp.xml index 1a6a3b5d0d..7517e92f00 100644 --- a/data/xml/2022.emnlp.xml +++ b/data/xml/2022.emnlp.xml @@ -11465,7 +11465,7 @@ MinghuiQiuAlibaba Group TaolinZhangEast China Normal University TingtingLiuEast China Normal University - LeiLiEast China Normal University + LeiLiEast China Normal University JianingWangEast China Normal University MingWangAlibaba Group JunHuangAlibaba Group @@ -11575,7 +11575,7 @@ XinXieZhejiang University XiangChenZhejiang University ZhouboLiZhejiang University - LeiLiZhejiang University + LeiLiZhejiang University 98-108 We present an open-source and extensible knowledge extraction toolkit DeepKE, supporting complicated low-resource, document-level and multimodal scenarios in the knowledge base population. DeepKE implements various information extraction tasks, including named entity recognition, relation extraction and attribute extraction. With a unified framework, DeepKE allows developers and researchers to customize datasets and models to extract information from unstructured data according to their requirements. Specifically, DeepKE not only provides various functional modules and model implementation for different tasks and scenarios but also organizes all components by consistent frameworks to maintain sufficient modularity and extensibility. We release the source code at GitHub in https://github.com/zjunlp/DeepKE with Google Colab tutorials and comprehensive documents for beginners. Besides, we present an online system in http://deepke.openkg.cn/EN/re_doc_show.html for real-time extraction of various tasks, and a demo video. 2022.emnlp-demos.10 diff --git a/data/xml/2022.findings.xml b/data/xml/2022.findings.xml index 9565c09e27..8678f386d4 100644 --- a/data/xml/2022.findings.xml +++ b/data/xml/2022.findings.xml @@ -880,7 +880,7 @@ XuandongZhao ZhiguoYu MingWu - LeiLi + LeiLi 774-781 How to learn highly compact yet effective sentence representation? Pre-trained language models have been effective in many NLP tasks. However, these models are often huge and produce large sentence embeddings. Moreover, there is a big performance gap between large and small models. In this paper, we propose Homomorphic Projective Distillation (HPD) to learn compressed sentence embeddings. Our method augments a small Transformer encoder model with learnable projection layers to produce compact representations while mimicking a large pre-trained language model to retain the sentence representation quality. We evaluate our method with different model sizes on both semantic textual similarity (STS) and semantic retrieval (SR) tasks. Experiments show that our method achieves 2.7-4.5 points performance gain on STS tasks compared with previous best representations of the same size. In SR tasks, our method improves retrieval speed (8.2×) and memory usage (8.0×) compared with state-of-the-art large models. 
Our implementation is available at https://github.com/XuandongZhao/HPD. 2022.findings-acl.64 @@ -3803,7 +3803,7 @@ ChengqiZhao ShujianHuang JiajunChen - LeiLi + LeiLi 3537-3548 This paper does not aim at introducing a novel model for document-level neural machine translation. Instead, we head back to the original Transformer model and hope to answer the following question: Is the capacity of current models strong enough for document-level translation? Interestingly, we observe that the original Transformer with appropriate training techniques can achieve strong results for document translation, even with a length of 2000 words. We evaluate this model and several recent approaches on nine document-level datasets and two sentence-level datasets across six languages. Experiments show that document-level Transformer models outperforms sentence-level ones and many previous methods in a comprehensive set of metrics, including BLEU, four lexical indices, three newly proposed assistant linguistic indicators, and human evaluation. 2022.findings-acl.279 @@ -4226,7 +4226,7 @@ ZhongqiaoLi XinboZhang ChangzhiSun - LeiLi + LeiLi YanghuaXiao HaoZhou 3941-3955 @@ -4371,7 +4371,7 @@ Structural Supervision for Word Alignment and Machine Translation - LeiLi + LeiLi KaiFan HongjiaLi ChunYuan @@ -6196,7 +6196,7 @@ Good Visual Guidance Make A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction XiangChen NingyuZhang - LeiLi + LeiLi YunzhiYao ShuminDeng ChuanqiTan @@ -7198,7 +7198,7 @@ JingjingXu JiazeChen HaoZhou - LeiLi + LeiLi 2508-2527 We introduce MTG, a new benchmark suite for training and evaluating multilingual text generation. It is the first-proposed multilingual multiway text generation dataset with the largest human-annotated data (400k). It includes four generation tasks (story generation, question generation, title generation and text summarization) across five languages (English, German, French, Spanish and Chinese). The multiway setup enables testing knowledge transfer capabilities for a model across languages and tasks. Using MTG, we train and analyze several popular multilingual generation models from different aspects. Our benchmark suite fosters model performance enhancement with more human-annotated parallel data. It provides comprehensive evaluations with diverse generation scenarios. Code and data are available at https://github.com/zide05/MTG. 
2022.findings-naacl.192 @@ -8908,7 +8908,7 @@ TingtingLiuEast China Normal University ChengyuWangAlibaba Group XiangruZhuFudan University - LeiLiEast China Normal University + LeiLiEast China Normal University MinghuiQiuAlibaba Group JunHuangalibaba group MingGaoEast China Normal University @@ -11942,7 +11942,7 @@ Faster and Smaller Speech Translation without Quality Compromise SiyiWangBeijing University of Posts and Telecommunications KaiWangBeijing University of Posts and Telecommunications YanquanZhouBeijing University of Posts and Telecommunications - LeiLiBeijing University of Posts and Telecommunications + LeiLiBeijing University of Posts and Telecommunications QingYangDu Xiaoman Technology(Beijing) DongliangXuDu Xiaoman Technology(Beijing) 3880-3886 @@ -13046,7 +13046,7 @@ Faster and Smaller Speech Translation without Quality Compromise Distillation-Resistant Watermarking for Model Protection in <fixed-case>NLP</fixed-case> XuandongZhaoUC Santa Barbara - LeiLiUniversity of California Santa Barbara + LeiLiUniversity of California Santa Barbara Yu-XiangWangUCSB 5044-5055 How can we protect the intellectual property of trained NLP models? Modern NLP models are prone to stealing by querying and distilling from their publicly exposed APIs. However, existing protection methods such as watermarking only work for images but are not applicable to text. We propose Distillation-Resistant Watermarking (DRW), a novel technique to protect NLP models from being stolen via distillation. DRW protects a model by injecting watermarks into the victim’s prediction probability corresponding to a secret key and is able to detect such a key by probing a suspect model. We prove that a protected model still retains the original accuracy within a certain bound. We evaluate DRW on a diverse set of NLP tasks including text classification, part-of-speech tagging, and named entity recognition. Experiments show that DRW protects the original model and detects stealing suspects at 100% mean average precision for all four tasks while the prior method fails on two. @@ -13946,7 +13946,7 @@ Faster and Smaller Speech Translation without Quality Compromise YifanSongPeking University JingjingXuShanghai AI Lab ZhifangSuiPeking University - LeiLiUniversity of California Santa Barbara + LeiLiUniversity of California Santa Barbara 5937-5947 Previous literature has proved that Pretrained Language Models (PLMs) can store factual knowledge. However, we find that facts stored in the PLMs are not always correct. It motivates us to explore a fundamental question: How do we calibrate factual knowledge in PLMs without re-training from scratch? In this work, we propose a simple and lightweight method CaliNet to achieve this goal. To be specific, we first detect whether PLMs can learn the right facts via a contrastive score between right and fake facts. If not, we then use a lightweight method to add and adapt new parameters to specific factual texts. Experiments on the knowledge probing task show the calibration effectiveness and efficiency. In addition, through closed-book question answering, we find that the calibrated PLM possesses knowledge generalization ability after finetuning.Beyond the calibration performance, we further investigate and visualize the knowledge calibration mechanism. 
2022.findings-emnlp.438 @@ -14453,7 +14453,7 @@ Faster and Smaller Speech Translation without Quality Compromise From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models - LeiLiPeking University + LeiLiPeking University YankaiLinGaoling School of Artificial Intelligence, Renmin University of China XuanchengRenPeking University GuangxiangZhaoPeking University @@ -14613,7 +14613,7 @@ Faster and Smaller Speech Translation without Quality Compromise Yi-LinTuanUniversity of California, Santa Barbara YujieLuUniversity of California, Santa Barbara MichaelSaxonUniversity of California, Santa Barbara - LeiLiUniversity of California Santa Barbara + LeiLiUniversity of California Santa Barbara William YangWangUnversity of California, Santa Barbara 6559-6574 Is it possible to build a general and automatic natural language generation (NLG) evaluation metric? Existing learned metrics either perform unsatisfactorily or are restricted to tasks where large human rating data is already available. We introduce SESCORE, a model-based metric that is highly correlated with human judgements without requiring human annotation, by utilizing a novel, iterative error synthesis and severity scoring pipeline. This pipeline applies a series of plausible errors to raw text and assigns severity labels by simulating human judgements with entailment. We evaluate SESCORE against existing metrics by comparing how their scores correlate with human ratings. SESCORE outperforms all prior unsupervised metrics on multiple diverse NLG tasks including machine translation, image captioning, and WebNLG text generation. For WMT 20/21En-De and Zh-En, SESCORE improve the average Kendall correlation with human judgement from 0.154 to 0.195. SESCORE even achieves comparable performance to the best supervised metric COMET, despite receiving no human annotated training data. diff --git a/data/xml/2022.iwslt.xml b/data/xml/2022.iwslt.xml index 07610a69ea..3525f423fa 100644 --- a/data/xml/2022.iwslt.xml +++ b/data/xml/2022.iwslt.xml @@ -112,7 +112,7 @@ On the Impact of Noises in Crowd-Sourced Data for Speech Translation SiqiOuyang RongYe - LeiLi + LeiLi 92-97 Training speech translation (ST) models requires large and high-quality datasets. MuST-C is one of the most widely used ST benchmark datasets. It contains around 400 hours of speech-transcript-translation data for each of the eight translation directions. This dataset passes several quality-control filters during creation. However, we find that MuST-C still suffers from three major quality issues: audiotext misalignment, inaccurate translation, and unnecessary speaker’s name. What are the impacts of these data quality issues for model development and evaluation? In this paper, we propose an automatic method to fix or filter the above quality issues, using English-German (En-De) translation as an example. Our experiments show that ST models perform better on clean test sets, and the rank of proposed models remains consistent across different test sets. Besides, simply removing misaligned data points from the training set does not lead to a better ST model. 2022.iwslt-1.9 diff --git a/data/xml/2022.naacl.xml b/data/xml/2022.naacl.xml index e51dd12cb9..6c56c243dc 100644 --- a/data/xml/2022.naacl.xml +++ b/data/xml/2022.naacl.xml @@ -973,7 +973,7 @@ Provably Confidential Language Modelling XuandongZhao - LeiLi + LeiLi Yu-XiangWang 943-955 Large language models are shown to memorize privacy information such as social security numbers in training data. 
Given the sheer scale of the training corpus, it is challenging to screen and filter these privacy data, either manually or automatically. In this paper, we propose Confidentially Redacted Training (CRT), a method to train language generation models while protecting the confidential segments. We borrow ideas from differential privacy (which solves a related but distinct problem) and show that our method is able to provably prevent unintended memorization by randomizing parts of the training process. Moreover, we show that redaction with an approximately correct screening policy amplifies the confidentiality guarantee. We implement the method for both LSTM and GPT language models. Our experimental results show that the models trained by CRT obtain almost the same perplexity while preserving strong confidentiality. @@ -5242,7 +5242,7 @@ Cross-modal Contrastive Learning for Speech Translation RongYe MingxuanWang - LeiLi + LeiLi 5099-5113 How can we learn unified representations for spoken utterances and their written text? Learning similar representations for semantically similar speech and text is important for speech translation. To this end, we propose ConST, a cross-modal contrastive learning method for end-to-end speech-to-text translation. We evaluate ConST and a variety of previous baselines on a popular benchmark MuST-C. Experiments show that the proposed ConST consistently outperforms the previous methods, and achieves an average BLEU of 29.4. The analysis further verifies that ConST indeed closes the representation gap of different modalities — its learned representation improves the accuracy of cross-modal speech-text retrieval from 4% to 88%. Code and models are available at https://github.com/ReneeYe/ConST. 2022.naacl-main.376 diff --git a/data/xml/2023.acl.xml b/data/xml/2023.acl.xml index 0137f0c46e..e0dcf75007 100644 --- a/data/xml/2023.acl.xml +++ b/data/xml/2023.acl.xml @@ -3036,7 +3036,7 @@ <fixed-case>WACO</fixed-case>: Word-Aligned Contrastive Learning for Speech Translation SiqiOuyangUniversity of California, Santa Barbara RongYeByteDance AI Lab - LeiLiUniversity of California Santa Barbara + LeiLiUniversity of California Santa Barbara 3891-3907 End-to-end Speech Translation (E2E ST) aims to directly translate source speech into target text. Existing ST methods perform poorly when only extremely small speech-text data are available for training. We observe that an ST model’s performance closely correlates with its embedding similarity between speech and source transcript. In this paper, we propose Word-Aligned COntrastive learning (WACO), a simple and effective method for extremely low-resource speech-to-text translation. Our key idea is bridging word-level representations for both speech and text modalities via contrastive learning. We evaluate WACO and other methods on the MuST-C dataset, a widely used ST benchmark, and on a low-resource direction Maltese-English from IWSLT 2023. Our experiments demonstrate that WACO outperforms the best baseline by 9+ BLEU points with only 1-hour parallel ST data. Code is available at https://github.com/owaski/WACO. 2023.acl-long.216 @@ -4007,7 +4007,7 @@ WendaXuUniversity of California at Santa Barbara XianQianByteDance AI LAB MingxuanWangBytedance AI Lab - LeiLiUniversity of California Santa Barbara + LeiLiUniversity of California Santa Barbara William YangWangUnversity of California, Santa Barbara 5166-5183 Is it possible to train a general metric for evaluating text generation quality without human-annotated ratings? 
Existing learned metrics either perform unsatisfactory across text generation tasks or require human ratings for training on specific tasks. In this paper, we propose SEScore2, a self-supervised approach for training a model-based metric for text generation evaluation. The key concept is to synthesize realistic model mistakes by perturbing sentences retrieved from a corpus. We evaluate SEScore2 and previous methods on four text generation tasks across three languages. SEScore2 outperforms all prior unsupervised metrics on four text generation evaluation benchmarks, with an average Kendall improvement of 0.158. Surprisingly, SEScore2 even outperforms the supervised BLEURT and COMET on multiple text generation tasks. @@ -7899,7 +7899,7 @@ WeiShiFudan University ZiquanFuSystem, Inc SijieChengFudan University - LeiLiUniversity of California Santa Barbara + LeiLiUniversity of California Santa Barbara YanghuaXiaoFudan University 9890-9908 Large language models (LLMs) have been widely studied for their ability to store and utilize positive knowledge. However, negative knowledge, such as “lions don’t live in the ocean”, is also ubiquitous in the world but rarely mentioned explicitly in text. What do LLMs know about negative knowledge?This work examines the ability of LLMs on negative commonsense knowledge. We design a constrained keywords-to-sentence generation task (CG) and a Boolean question answering task (QA) to probe LLMs.Our experiments reveal that LLMs frequently fail to generate valid sentences grounded in negative commonsense knowledge, yet they can correctly answer polar yes-or-no questions. We term this phenomenon the belief conflict of LLMs.Our further analysis shows that statistical shortcuts and negation reporting bias from language modeling pre-training cause this conflict. @@ -12505,7 +12505,7 @@ SiqiOuyangUniversity of California, Santa Barbara ZhiguoYuMicrosoft MingWuGitHub, Inc. - LeiLiUniversity of California Santa Barbara + LeiLiUniversity of California Santa Barbara 15590-15606 How can we extend a pre-trained model to many language understanding tasks, without labeled or additional unlabeled data? Pre-trained language models (PLMs) have been effective for a wide range of NLP tasks. However, existing approaches either require fine-tuning on downstream labeled datasets or manually constructing proper prompts. In this paper, we propose nonparametric prompting PLM (NPPrompt) for fully zero-shot language understanding. Unlike previous methods, NPPrompt uses only pre-trained language models and does not require any labeled data or additional raw corpus for further fine-tuning, nor does it rely on humans to construct a comprehensive set of prompt label words. We evaluate NPPrompt against previous major few-shot and zero-shot learning methods on diverse NLP tasks: including text classification, text entailment, similar text retrieval, paraphrasing, and multiple-choice question answering. Experimental results demonstrate that our NPPrompt outperforms the previous best fully zero-shot method by big margins, with absolute gains of 12.8% in accuracy on text classification and 15.6% on the GLUE benchmark. Our source code is available at https://anonymous.4open.science/r/NPPrompt. 
2023.acl-long.869 @@ -16901,7 +16901,7 @@ <fixed-case>F</fixed-case>ashion<fixed-case>KLIP</fixed-case>: Enhancing <fixed-case>E</fixed-case>-Commerce Image-Text Retrieval with Fashion Multi-Modal Conceptual Knowledge Graph XiaodanWangFudan University ChengyuWangAlibaba Group - LeiLiEast China Normal University + LeiLiEast China Normal University ZhixuLiFudan University BenChenAlibaba Group LinboJinAlibaba diff --git a/data/xml/2023.americasnlp.xml b/data/xml/2023.americasnlp.xml index 2a7fc31a22..5f13fb31f4 100644 --- a/data/xml/2023.americasnlp.xml +++ b/data/xml/2023.americasnlp.xml @@ -230,7 +230,7 @@ TianruiGuUniversity of California, Santa Barbara KaieChenUniversity of California, Santa Barbara SiqiOuyangUniversity of California, Santa Barbara - LeiLiUniversity of California Santa Barbara + LeiLiUniversity of California Santa Barbara 173-176 This paper presents PlayGround’s submission to the AmericasNLP 2023 shared task on machine translation (MT) into indigenous languages. We finetuned NLLB-600M, a multilingual MT model pre-trained on Flores-200, on 10 low-resource language directions and examined the effectiveness of weight averaging and back translation. Our experiments showed that weight averaging, on average, led to a 0.0169 improvement in the ChrF++ score. Additionally, we found that back translation resulted in a 0.008 improvement in the ChrF++ score. 2023.americasnlp-1.19 diff --git a/data/xml/2023.emnlp.xml b/data/xml/2023.emnlp.xml index b333e00563..ee7dd7e37b 100644 --- a/data/xml/2023.emnlp.xml +++ b/data/xml/2023.emnlp.xml @@ -4156,7 +4156,7 @@ Can We Edit Factual Knowledge by In-Context Learning? CeZheng - LeiLi + LeiLi QingxiuDong YuxuanFan ZhiyongWu @@ -5132,7 +5132,7 @@ ZhenqiaoSong MarkusFreitag WilliamWang - LeiLi + LeiLi 5967-5994 Automatically evaluating the quality of language generation is critical. Although recent learned metrics show high correlation with human judgement, these metrics do not provide explicit explanation of their verdict, nor associate the scores with defects in the generated text. To address this limitation, we present INSTRUCTSCORE, a fine-grained explainable evaluation metric for text generation. By harnessing both explicit human instruction and the implicit knowledge of GPT-4, we fine-tune a text evaluation metric based on LLaMA, producing both a score for generated text and a human readable diagnostic report. We evaluate INSTRUCTSCORE on a variety of generation tasks, including translation, captioning, data-to-text, and commonsense generation. Experiments show that our 7B model surpasses all other unsupervised metrics, including those based on 175B GPT-3 and GPT-4. Surprisingly, our INSTRUCTSCORE, even without direct supervision from human-rated data, achieves performance levels on par with state-of-the-art metrics like COMET22, which were fine-tuned on human ratings. 2023.emnlp-main.365 @@ -8511,7 +8511,7 @@ Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning LeanWang - LeiLi + LeiLi DamaiDai DeliChen HaoZhou @@ -9223,7 +9223,7 @@ Learning from Mistakes via Cooperative Study Assistant for Large Language Models DanqingWang - LeiLi + LeiLi 10667-10685 Large language models (LLMs) have demonstrated their potential to refine their generation based on their own feedback. However, the feedback from LLM itself is often inaccurate, thereby limiting its benefits. 
In this paper, we propose Study Assistant for Large LAnguage Model (SALAM), a novel framework with an auxiliary agent to assist the main LLM in learning from mistakes through interactive cooperation. In the gathering phase, the student assistant agent probes the main LLM, analyzes its errors, and collects the interaction in a mistake memory. During the examination phase, the study assistant provides guidelines by retrieving relevant cases to help the main LLM anticipate and avoid similar errors. We first investigate the effectiveness of a general study assistant and then customize it to provide LLM-specific guidance through imitation learning from successful guidance experiences. Our experiments on three LLMs using two challenging frameworks demonstrate that SALAM can significantly boost LLMs by an accuracy margin of up to 6.6 on BBH and 12.6 on BBQ. 2023.emnlp-main.659 @@ -10152,7 +10152,7 @@ Can Language Models Understand Physical Concepts? - LeiLi + LeiLi JingjingXu QingxiuDong CeZheng diff --git a/data/xml/2023.findings.xml b/data/xml/2023.findings.xml index c14c5e0829..b5af6bc1be 100644 --- a/data/xml/2023.findings.xml +++ b/data/xml/2023.findings.xml @@ -4664,7 +4664,7 @@ YongkangWuHuawei MengHanHuawei YutaoZhuUniversity of Montreal - LeiLiHuawei + LeiLiHuawei XinyuZhangHuawei Technologies Co., Ltd RuofeiLaiHuawei XiaoguangLiHuawei Noah’s Ark Lab @@ -7044,7 +7044,7 @@ Communication Efficient Federated Learning for Multilingual Neural Machine Translation with Adapter YiLiuSchool of Computer Science, Peking University XiaohanBiPeking University - LeiLiPeking University + LeiLiPeking University SishuoChenCenter for Data Science, Peking University WenkaiYangPeking University XuSunPeking University @@ -7617,7 +7617,7 @@ <fixed-case>LET</fixed-case>: Leveraging Error Type Information for Grammatical Error Correction LingyuYangTsinghua University HongjiaLiTsinghua University - LeiLiTsinghua University + LeiLiTsinghua University ChengyinXuTsinghua University ShutaoXiaTsinghua University ChunYuanTsinghua University @@ -10714,7 +10714,7 @@ Delving into the Openness of <fixed-case>CLIP</fixed-case> ShuhuaiRenPeking University - LeiLiPeking University + LeiLiPeking University XuanchengRenDAMO Academy, Alibaba Group GuangxiangZhaoShanghai AI lab XuSunPeking University @@ -12344,7 +12344,7 @@ YinquanLuShanghai AI Laboratory WenhaoZhuNational Key Laboratory for Novel Software Technology, Nanjing University LingpengKongThe University of Hong Kong - LeiLiUniversity of California Santa Barbara + LeiLiUniversity of California Santa Barbara YuQiaoShanghai AI Lab JingjingXuShanghai AI Lab 11518-11533 @@ -16398,7 +16398,7 @@ <fixed-case>I</fixed-case>mage<fixed-case>N</fixed-case>et<fixed-case>VC</fixed-case>: Zero- and Few-Shot Visual Commonsense Evaluation on 1000 <fixed-case>I</fixed-case>mage<fixed-case>N</fixed-case>et Categories HemingXia QingxiuDong - LeiLi + LeiLi JingjingXu TianyuLiu ZiweiQin @@ -17331,7 +17331,7 @@ <fixed-case>A</fixed-case>uto<fixed-case>P</fixed-case>lan: Automatic Planning of Interactive Decision-Making Tasks With Large Language Models SiqiOuyang - LeiLi + LeiLi 3114-3128 Recent large language models (LLMs) are promising for making decisions in grounded environments. However, LLMs frequently fail in complex decision-making tasks due to the misalignment between the pre-trained knowledge in LLMs and the actual rules in the environment. Existing methods require either costly gradient computation or lengthy in-context demonstrations. 
In this paper, we propose AutoPlan, an approach to guide LLM-based agents to accomplish interactive decision-making tasks. AutoPlan augments the LLM prompt with a task-solving plan and optimizes it through iterative experience collection and reflection. Our experiments show that AutoPlan, though using no in-context demonstrations, achieves success rates on par with the baselines using human-written demonstrations on ALFWorld and even outperforms them by 8% on HotpotQA. The code is available at https://github.com/owaski/AutoPlan. 2023.findings-emnlp.205 @@ -28056,7 +28056,7 @@ BohongWu FeiYuan HaiZhao - LeiLi + LeiLi JingjingXu 15432-15444 Multilingual understanding models (or encoder-based), pre-trained via masked language modeling, have achieved promising results on many language understanding tasks (e.g., mBERT). However, these models are not capable of generating high-quality text compared with decoder-based causal language models. Can we transform a pre-trained language understanding model into an effective language generation model? We propose a Semantic-Guided Alignment-then-Denoising (SGA) approach to adapt a multilingual encoder to a multilingual generator with a small number of additional parameters. Experiments show that the proposed approach is an effective adaption method, outperforming widely-used initialization-based methods with gains of 9.4 BLEU on machine translation, 8.1 Rouge-L on question generation, and 5.5 METEOR on story generation on XLM-R_{large}. On the other hand, we observe that XLM-R is still inferior to mBART in supervised settings despite better results on zero-shot settings, indicating that more exploration is required to make understanding models strong generators. Our code is available at https://github.com/chengzhipanpan/XLMR4MT. diff --git a/data/xml/2023.ijcnlp.xml b/data/xml/2023.ijcnlp.xml index 8a6f4f1f91..292a8d1274 100644 --- a/data/xml/2023.ijcnlp.xml +++ b/data/xml/2023.ijcnlp.xml @@ -1496,7 +1496,7 @@ PengfeiZhu ChaoPang YekunChai - LeiLi + LeiLi ShuohuanWang YuSun HaoTian diff --git a/data/xml/2024.acl.xml b/data/xml/2024.acl.xml index 6a03746558..0a00862971 100644 --- a/data/xml/2024.acl.xml +++ b/data/xml/2024.acl.xml @@ -7079,7 +7079,7 @@ Math-Shepherd: Verify and Reinforce <fixed-case>LLM</fixed-case>s Step-by-step without Human Annotations PeiyiWang - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong ZhihongShaoTsinghua University, Tsinghua University RunxinXu DamaiDai @@ -7096,7 +7096,7 @@ Large Language Models are not Fair Evaluators PeiyiWang - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong LiangChen ZefanCai DaweiZhu @@ -10832,7 +10832,7 @@ Multimodal <fixed-case>A</fixed-case>r<fixed-case>X</fixed-case>iv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong YuqiWangUniversity of Hong Kong RunxinXuPeking University PeiyiWangPeking University @@ -11575,7 +11575,7 @@ GuangleiZhuCarnegie Mellon University XuandongZhaoUniversity of California, Berkeley LiangmingPanUniversity of California, Santa Barbara - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University WilliamWangUC Santa Barbara 15474-15492 Recent studies show that large language models (LLMs) improve their performance through self-feedback on certain tasks while degrade on others. We discovered that such a contrary is due to LLM’s bias in evaluating their own output. 
In this paper, we formally define LLM’s self-bias – the tendency to favor its own generation – using two statistics. We analyze six LLMs (GPT-4, GPT-3.5, Gemini, LLaMA2, Mixtral and DeepSeek) on translation, constrained text generation, and mathematical reasoning tasks. We find that self-bias is prevalent in all examined LLMs across multiple languages and tasks. Our analysis reveals that while the self-refine pipeline improves the fluency and understandability of model outputs, it further amplifies self-bias. To mitigate such biases, we discover that larger model size and external feedback with accurate assessment can significantly reduce bias in the self-refine pipeline, leading to actual performance improvement in downstream tasks. The code and data are released at https://github.com/xu1998hz/llm_self_bias. @@ -13381,7 +13381,7 @@ ZiwenXuZhejiang University ShuofeiQiao RunnanFang - LeiLiTencent + LeiLiTencent ZhenBiZhejiang University GuozhouZheng HuajunChenZhejiang University @@ -14422,7 +14422,7 @@ Watermarking for Large Language Models XuandongZhao Yu-XiangWang - LeiLi + LeiLi 10-11 As AI-generated text increasingly resembles human-written content, the ability to detect machine-generated text becomes crucial in both the computational linguistics and machine learning communities. In this tutorial, we aim to provide an in-depth exploration of text watermarking, a subfield of linguistic steganography with the goal of embedding a hidden message (the watermark) within a text passage. We will introduce the fundamentals of text watermarking, discuss the main challenges in identifying AI-generated text, and delve into the current watermarking methods, assessing their strengths and weaknesses. Moreover, we will explore other possible applications of text watermarking and discuss future directions for this field. Each section will be supplemented with examples and key takeaways. 2024.acl-tutorials.6 diff --git a/data/xml/2024.ccl.xml b/data/xml/2024.ccl.xml index af96f82500..44aa040fb4 100644 --- a/data/xml/2024.ccl.xml +++ b/data/xml/2024.ccl.xml @@ -1070,7 +1070,7 @@ YuelouXu YanLu KaiWang - LeiLi + LeiLi YanquanZhou 1123–1135 “The zero-resource cross-domain named entity recognition (NER) task aims to perform NER in aspecific domain where labeled data is unavailable. Existing methods primarily focus on transfer-ring NER knowledge from high-resource to zero-resource domains. However, the challenge liesin effectively transferring NER knowledge between domains due to the inherent differences inentity structures across domains. To tackle this challenge, we propose an Unsupervised DomainAdaptation Adversarial (UDAA) framework, which combines the masked language model auxil-iary task with the domain adaptive adversarial network to mitigate inter-domain differences andefficiently facilitate knowledge transfer. Experimental results on CBS, Twitter, and WNUT2016three datasets demonstrate the effectiveness of our framework. Notably, we achieved new state-of-the-art performance on the three datasets. 
Our code will be released.Introduction” diff --git a/data/xml/2024.emnlp.xml b/data/xml/2024.emnlp.xml index b36fd0d2d0..b71bb6a0e9 100644 --- a/data/xml/2024.emnlp.xml +++ b/data/xml/2024.emnlp.xml @@ -902,7 +902,7 @@ A Survey on In-context Learning QingxiuDong - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong DamaiDai CeZhengPeking University JingyuanMa @@ -912,7 +912,7 @@ ZhiyongWuShanghai Artificial Intelligence Laboratory BaobaoChangPeking University XuSun - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University ZhifangSuiPeking University 1107-1128 With the increasing capabilities of large language models (LLMs), in-context learning (ICL) has emerged as a new paradigm for natural language processing (NLP), where LLMs make predictions based on contexts augmented with a few examples. It has been a significant trend to explore ICL to evaluate and extrapolate the ability of LLMs. In this paper, we aim to survey and summarize the progress and challenges of ICL. We first present a formal definition of ICL and clarify its correlation to related studies. Then, we organize and discuss advanced techniques, including training strategies, prompt designing strategies, and related analysis. Additionally, we explore various ICL application scenarios, such as data engineering and knowledge updating. Finally, we address the challenges of ICL and suggest potential directions for further research. We hope that our work can encourage more research on uncovering how ICL works and improving ICL. @@ -5036,7 +5036,7 @@ <fixed-case>VLF</fixed-case>eedback: A Large-Scale <fixed-case>AI</fixed-case> Feedback Dataset for Large Vision-Language Models Alignment - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong ZhihuiXieShanghai Jiao Tong University MukaiLi ShunianChenShenzhen Research Institute of Big Data @@ -8701,7 +8701,7 @@ WendaXu JiachenLiUniversity of California, Santa Barbara William YangWangUC Santa Barbara - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 11125-11139 Direct alignment from preferences (DAP) has emerged as a promising paradigm for aligning large language models (LLMs) to human desiderata from pre-collected, offline preference datasets. While recent studies indicate that existing offline DAP methods can directly benefit from online training samples, we highlight the need to develop specific online DAP algorithms to fully harness the power of online training. Specifically, we identify that the learned LLM should adhere to the proximity of the behavior LLM, which collects the training samples. To this end, we propose online Preference Optimization in proximity to the Behavior LLM (BPO), emphasizing the importance of constructing a proper trust region for LLM alignment.We conduct extensive experiments to validate the effectiveness and applicability of our approach by integrating it with various DAP methods, resulting in significant performance improvements across a wide range of tasks when training with the same amount of preference data. Even when only introducing one additional data collection phase, our online BPO improves its offline DAP baseline from 72.0% to 80.2% on TL;DR and from 82.2% to 89.1% on Anthropic Helpfulness in terms of win rate against human reference text. 
2024.emnlp-main.623 @@ -10250,7 +10250,7 @@ HanlinZhuElectrical Engineering & Computer Science Department, University of California Berkeley XiaomengYangGoogle DeepMind AndrewCohen - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University YuandongTianMeta AI (FAIR) 13274-13292 Recent research has increasingly focused on evaluating large language models’ (LLMs) alignment with diverse human values and preferences, particularly for open-ended tasks like story generation. Traditional evaluation metrics rely heavily on lexical similarity with human-written references, often showing poor correlation with human judgments and failing to account for alignment with the diversity of human preferences. To address these challenges, we introduce PerSE, an interpretable evaluation framework designed to assess alignment with specific human preferences. It is tuned to infer specific preferences from an in-context personal profile and evaluate the alignment between the generated content and personal preferences. PerSE enhances interpretability by providing detailed comments and fine-grained scoring, facilitating more personalized content generation. Our 13B LLaMA-2-based PerSE shows a 15.8% increase in Kendall correlation and a 13.7% rise in accuracy with zero-shot reviewers compared to GPT-4. It also outperforms GPT-4 by 46.01% in Kendall correlation on new domains, indicating its transferability @@ -18075,7 +18075,7 @@ WendaXu XiXu SiqiOuyangCMU, Carnegie Mellon University - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 344-350 With the rapid advancement of machine translation research, evaluation toolkits have become essential for benchmarking system progress. Tools like COMET and SacreBLEU offer single quality score assessments that are effective for pairwise system comparisons. However, these tools provide limited insights for fine-grained system-level comparisons and the analysis of instance-level defects. To address these limitations, we introduce Translation Canvas, an explainable interface designed to pinpoint and analyze translation systems’ performance: 1) Translation Canvas assists machine translation researchers in comprehending system-level model performance by identifying common errors (their frequency and severity) and analyzing relationships between different systems based on various evaluation metrics. 2) It supports fine-grained analysis by highlighting error spans with explanations and selectively displaying systems’ predictions. According to human evaluation, Translation Canvas demonstrates superior performance over COMET and SacreBLEU packages under enjoybility and understandbility criteria. 2024.emnlp-demo.36 diff --git a/data/xml/2024.findings.xml b/data/xml/2024.findings.xml index 3892c3c698..52ad9d01e8 100644 --- a/data/xml/2024.findings.xml +++ b/data/xml/2024.findings.xml @@ -3228,7 +3228,7 @@ BiaoZhangGoogle DeepMind ZhongtaoLiuGoogle William YangWangUC Santa Barbara - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University MarkusFreitagGoogle 1429-1445 Recent large language models (LLM) areleveraging human feedback to improve theirgeneration quality. However, human feedbackis costly to obtain, especially during inference.In this work, we propose LLMRefine, aninference time optimization method to refineLLM’s output. 
The core idea is to usea learned fine-grained feedback model topinpoint defects and guide LLM to refinethem iteratively. Using original LLM as aproposal of edits, LLMRefine searches fordefect-less text via simulated annealing, tradingoff the exploration and exploitation. Weconduct experiments on three text generationtasks, including machine translation, long-form question answering (QA), and topicalsummarization. LLMRefine consistentlyoutperforms all baseline approaches, achievingimprovements up to 1.7 MetricX points ontranslation tasks, 8.1 ROUGE-L on ASQA, 2.2ROUGE-L on topical summarization. @@ -4399,7 +4399,7 @@ ShujianHuangNanjing University LingpengKongDepartment of Computer Science, The University of Hong Kong JiajunChenNanjing University - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 2765-2781 Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT). In this paper, we systematically investigate the advantages and challenges of LLMs for MMT by answering two questions: 1) How well do LLMs perform in translating massive languages? 2) Which factors affect LLMs’ performance in translation? We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4. Our empirical results show that translation capabilities of LLMs are continually involving. GPT-4 has beat the strong supervised baseline NLLB in 40.91% of translation directions but still faces a large gap towards the commercial translation system like Google Translate, especially on low-resource languages. Through further analysis, we discover that LLMs exhibit new working patterns when used for MMT. First, LLM can acquire translation ability in a resource-efficient way and generate moderate translation even on zero-resource languages. Second, instruction semantics can surprisingly be ignored when given in-context exemplars. Third, cross-lingual exemplars can provide better task guidance for low-resource translation than exemplars in the same language pairs. Code will be released at: https://github.com/NJUNLP/MMT-LLM. 2024.findings-naacl.176 @@ -8815,7 +8815,7 @@ Red Teaming Visual Language Models MukaiLi - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong YuweiYin MasoodAhmed ZhenguangLiuZhejiang University @@ -13143,7 +13143,7 @@ YiLiuPeking University YuxiangWang ShuhuaiRen - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong SishuoChenAlibaba Group XuSun LuHouHuawei Technologies Ltd. @@ -15929,7 +15929,7 @@ FeiYuan ShuaiYuan ZhiyongWuShanghai Artificial Intelligence Laboratory - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 12111-12130 Large Language Models (LLMs), often show strong performance on English tasks, while exhibiting limitations on other languages. What is an LLM’s multilingual capability when it is trained only on certain languages? The underlying mechanism remains unclear. This study endeavors to examine the multilingual capability of LLMs from the vocabulary sharing perspective by conducting an exhaustive analysis across 101 languages. Through the investigation of the performance gap before and after embedding fine-tuning, we discovered four distinct quadrants. By delving into each quadrant we provide actionable and efficient guidelines for tuning these languages. 
Extensive experiments reveal that existing LLMs possess multilingual capabilities that surpass our expectations, and we can significantly improve the multilingual performance of LLMs based on these attributes of each quadrant . 2024.findings-acl.721 @@ -18707,7 +18707,7 @@ ZhenqiaoSong TaiqiHe William YangWangUC Santa Barbara - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 15654-15669 How can large language models (LLMs) process and translate endangered languages? Many languages lack a large corpus to train a decent LLM; therefore existing LLMs rarely perform well in unseen, endangered languages. On the contrary, we observe that 2000 endangered languages, though without a large corpus, have a grammar book or a dictionary. We propose LingoLLM, a training-free approach to enable an LLM to process unseen languages that hardly occur in its pre-training. Our key insight is to demonstrate linguistic knowledge of an unseen language in an LLM’s prompt, including a dictionary, a grammar book, and morphologically analyzed input text. We implement LingoLLM on top of two models, GPT-4 and Mixtral, and evaluate their performance on 5 tasks across 8 endangered or low-resource languages. Our results show that LingoLLM elevates translation capability from GPT-4’s 0 to 10.5 BLEU for 10 language directions. Our findings demonstrate the tremendous value of linguistic knowledge in the age of LLMs for endangered languages. Our data, code, and model generations will be released to the public. Our data, code, and model generations can be found at https://github.com/LLiLab/llm4endangeredlang. 2024.findings-acl.925 @@ -19577,7 +19577,7 @@ BabakDamavandi Xin LunaDongFacebook ChristosFaloutsosAmazon and Carnegie Mellon University - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University SeungwhanMoonFacebook 247-266 Vision-extended LLMs have made significant strides in Visual Question Answering (VQA). Despite these advancements, VLLMs still encounter substantial difficulties in handling queries involving long-tail entities, with a tendency to produce erroneous or hallucinated responses. In this work, we introduce a novel evaluative benchmark named SnapNTell, specifically tailored for entity-centric VQA. This task aims to test the models’ capabilities in identifying entities and providing detailed, entity-specific knowledge. We have developed the SnapNTell Dataset, distinct from traditional VQA datasets: (1) It encompasses a wide range of categorized entities, each represented by images and explicitly named in the answers; (2) It features QA pairs that require extensive knowledge for accurate responses. The dataset is organized into 22 major categories, containing 7,568 unique entities in total. For each entity, we curated 10 illustrative images and crafted 10 knowledge-intensive QA pairs. To address this novel task, we devised a scalable, efficient, and transparent retrieval-augmented multimodal LLM. Our approach markedly outperforms existing methods on the SnapNTell dataset, achieving a 66.5% improvement in the BELURT score. 
@@ -24567,7 +24567,7 @@ and high variation in performance on the subset, suggesting our plausibility cri João DSMarquesInstituto Superior Técnico and INESC-ID MiguelGraça MiguelFreire - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University Arlindo L.Oliveira 6473-6486 Modern NLP tasks increasingly rely on dense retrieval methods to access up-to-date and relevant contextual information. We are motivated by the premise that retrieval benefits from segments that can vary in size such that a content’s semantic independence is better captured. We propose LumberChunker, a method leveraging an LLM to dynamically segment documents, which iteratively prompts the LLM to identify the point within a group of sequential passages where the content begins to shift. To evaluate our method, we introduce GutenQA, a benchmark with 3000 “needle in a haystack” type of question-answer pairs derived from 100 public domain narrative books available on Project Gutenberg. Our experiments show that LumberChunker not only outperforms the most competitive baseline by 7.37% in retrieval performance (DCG@20) but also that, when integrated into a RAG pipeline, LumberChunker proves to be more effective than other chunking methods and competitive baselines, such as the Gemini 1.5M Pro. @@ -28060,7 +28060,7 @@ and high variation in performance on the subset, suggesting our plausibility cri <fixed-case>LL</fixed-case>a<fixed-case>MAX</fixed-case>: Scaling Linguistic Horizons of <fixed-case>LLM</fixed-case> by Enhancing Translation Capabilities Beyond 100 Languages YinquanLuShanghai AI Laboratory WenhaoZhuNanjing University - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University YuQiao FeiYuan 10748-10772 @@ -32433,7 +32433,7 @@ hai-coaching/ <fixed-case>H</fixed-case>yper<fixed-case>L</fixed-case>o<fixed-case>RA</fixed-case>: Efficient Cross-task Generalization via Constrained Low-Rank Adapters Generation ChuanchengLvTsinghua University, Tsinghua University - LeiLiTencent + LeiLiTencent ShitouZhang GangChen FanchaoQi diff --git a/data/xml/2024.iwslt.xml b/data/xml/2024.iwslt.xml index 4384df4c78..a824817398 100644 --- a/data/xml/2024.iwslt.xml +++ b/data/xml/2024.iwslt.xml @@ -328,7 +328,7 @@ BrianYanCarnegie Mellon University PatrickFernandesCarnegie Mellon University WilliamChenCarnegie Mellon University - LeiLiCarnegie Mellon University + LeiLiCarnegie Mellon University GrahamNeubigCarnegie Mellon University ShinjiWatanabeCarnegie Mellon University 154-159 @@ -366,7 +366,7 @@ SiqiOuyangCarnegie Mellon University WilliamChenCarnegie Mellon University KarenLivescuTTI-Chicago - LeiLiCarnegie Mellon University + LeiLiCarnegie Mellon University GrahamNeubigCarnegie Mellon University ShinjiWatanabeCarnegie Mellon University 164-169 diff --git a/data/xml/2024.lrec.xml b/data/xml/2024.lrec.xml index 94b70df131..2b6c32c612 100644 --- a/data/xml/2024.lrec.xml +++ b/data/xml/2024.lrec.xml @@ -9082,7 +9082,7 @@ QingYang DongliangXu YanquanZhou - LeiLi + LeiLi YuzeLi YingqiZhu 8792–8803 @@ -10424,7 +10424,7 @@ Large Language Models for Generative Recommendation: A Survey and Visionary Discussions - LeiLi + LeiLi YongfengZhang DugangLiu LiChen diff --git a/data/xml/2024.naacl.xml b/data/xml/2024.naacl.xml index 4f3494d646..134e936cfb 100644 --- a/data/xml/2024.naacl.xml +++ b/data/xml/2024.naacl.xml @@ -8700,7 +8700,7 @@ MuhaoChenUC Davis ChaoweiXiaoUW-Madison HuanSunOSU - LeiLiCMU + LeiLiCMU 
LeonDerczynskiUW Seattle AnimaAnandkumarCaltech, NVIDIA FeiWangUSC diff --git a/data/xml/2025.acl.xml b/data/xml/2025.acl.xml index 7e21d6cd67..785a06210b 100644 --- a/data/xml/2025.acl.xml +++ b/data/xml/2025.acl.xml @@ -4615,7 +4615,7 @@ XuandongZhaoUniversity of California, Berkeley ChenwenLiao Yu-XiangWangUniversity of California, San Diego - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 6304-6316 Text watermarks in large language models (LLMs) are increasingly used to detect synthetic text, mitigating misuse cases like fake news and academic dishonesty. While existing watermarking detection techniques primarily focus on classifying entire documents as watermarked or not, they often neglect the common scenario of identifying individual watermark segments within longer, mixed-source documents. Drawing inspiration from plagiarism detection systems, we propose two novel methods for partial watermark detection. First, we develop a geometry cover detection framework aimed at determining whether there is a watermark segment in long text. Second, we introduce an adaptive online learning algorithm to pinpoint the precise location of watermark segments within the text. Evaluated on three popular watermarking techniques (KGW-Watermark, Unigram-Watermark, and Gumbel-Watermark), our approach achieves high accuracy, significantly outperforming baseline methods. Moreover, our framework is adaptable to other watermarking techniques, offering new insights for precise watermark detection. Our code is publicly available at https://github.com/XuandongZhao/llm-watermark-location. 2025.acl-long.316 @@ -11968,7 +11968,7 @@ TianfangZhangTsinghua University ZongkaiWu Jenq-NengHwang - LeiLi + LeiLi 16780-16790 Large Language Models (LLMs) have demonstrated impressive capabilities in reasoning tasks, yet their reliance on static prompt structures and limited adaptability to complex scenarios remains a major challenge. In this paper, we propose the **Deductive and Inductive (DID)** method, a novel framework that enhances LLM reasoning by dynamically integrating both deductive and inductive reasoning approaches. Drawing from cognitive science principles, DID implements a dual-metric complexity evaluation system that combines Littlestone dimension and information entropy to precisely assess task difficulty and guide decomposition strategies. DID enables the model to progressively adapt its reasoning pathways based on problem complexity, mirroring human cognitive processes. We evaluate DID’s effectiveness across multiple benchmarks, including the AIW, MR-GSM8K, and our custom Holiday Puzzle dataset for temporal reasoning. Our results demonstrate great improvements in reasoning quality and solution accuracy - achieving 70.3% accuracy on AIW (compared to 62.2% for Tree of Thought), while maintaining lower computational costs. 
2025.acl-long.820 @@ -17060,7 +17060,7 @@ Uncertainty-Aware Iterative Preference Optimization for Enhanced <fixed-case>LLM</fixed-case> Reasoning - LeiLiTencent + LeiLiTencent HehuanLiu YaxinZhou ZhaoYangGuiTencent @@ -19239,7 +19239,7 @@ Benchmarking Long-Context Language Models on Long Code Understanding JiaLi XuyuanGuoPeking University - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong KechiZhangPeking University GeLiPeking University JiaLiTsinghua University @@ -23304,7 +23304,7 @@ Design Choices for Extending the Context Length of Visual Language Models MukaiLi - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong ShansanGong QiLiuUniversity of Hong Kong 33425-33438 diff --git a/data/xml/2025.coling.xml b/data/xml/2025.coling.xml index 5ff4b44d22..a7ac61845a 100644 --- a/data/xml/2025.coling.xml +++ b/data/xml/2025.coling.xml @@ -6219,7 +6219,7 @@ ZhaojiangLin YuningMao William YangWang - LeiLi + LeiLi Yi-ChiaWang 7819–7830 From ice cream flavors to climate change, people exhibit a wide array of opinions on various topics, and understanding the rationale for these opinions can promote healthy discussion and consensus among them. As such, it can be valuable for a large language model (LLM), particularly as an AI assistant, to be able to empathize with or even explain these various standpoints. In this work, we hypothesize that different topic stances often manifest correlations that can be used to extrapolate to topics with unknown opinions. We explore various prompting and fine-tuning methods to improve an LLM’s ability to (a) extrapolate from opinions on known topics to unknown ones and (b) support their extrapolation with reasoning. Our findings suggest that LLMs possess inherent knowledge from training data about these opinion correlations, and with minimal data, the similarities between human opinions and model-extrapolated opinions can be improved by more than 50%. Furthermore, LLM can generate the reasoning process behind their extrapolation of opinions. diff --git a/data/xml/2025.emnlp.xml b/data/xml/2025.emnlp.xml index 32787a68cc..d91a31816c 100644 --- a/data/xml/2025.emnlp.xml +++ b/data/xml/2025.emnlp.xml @@ -22082,7 +22082,7 @@ ShengWang JingweiDongthe University of Hong Kong KaiLiu - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong JiahuiGao JiyueJiang LingpengKongDepartment of Computer Science, The University of Hong Kong @@ -23892,7 +23892,7 @@ XiaonanLiFudan University MingZhongUniversity of Illinois Urbana Champaign ShansanGong - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong JunZhangByteDance JingjingXu LingpengKongDepartment of Computer Science, The University of Hong Kong @@ -25167,7 +25167,7 @@ AdamOfficerUniversity of Pittsburgh Medical Center AngelaChen YufeiHuangUniversity of Pittsburgh - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 480-486 Comprehensive pathway datasets are essential resources for advancing biological research, yet constructing these datasets is labor intensive. Recognizing the labor-intensive nature of constructing these critical resources, we present BioGraphia, a web-based annotation platform designed to facilitate collaborative pathway graph annotation. BioGraphia supports multi-user collaboration with real-time monitoring, curation, and interactive pathway graph visualization. It enables users to directly annotate the nodes and relations on the candidate graph, guided by detailed instructions. 
The platform is further enhanced with a large language model that automatically generates explainable and span-aligned pre-annotation to accelerate the annotation process. Its modular design allows flexible integration of external knowledge bases, and customization of the definition of annotation schema and, to support adaptation to other graph-based annotation tasks. Code is available at https://github.com/LeiLiLab/BioGraphia 2025.emnlp-demos.34 diff --git a/data/xml/2025.findings.xml b/data/xml/2025.findings.xml index 3c597b3104..4538d945e4 100644 --- a/data/xml/2025.findings.xml +++ b/data/xml/2025.findings.xml @@ -3679,7 +3679,7 @@ A Practical Examination of <fixed-case>AI</fixed-case>-Generated Text Detectors for Large Language Models BrianTufts XuandongZhaoUniversity of California, Berkeley - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 4824-4841 The proliferation of large language models has raised growing concerns about their misuse, particularly in cases where AI-generated text is falsely attributed to human authors. Machine-generated content detectors claim to effectively identify such text under various conditions and from any language model. This paper critically evaluates these claims by assessing several popular detectors (RADAR, Wild, T5Sentinel, Fast-DetectGPT, PHD, LogRank, Binoculars) on a range of domains, datasets, and models that these detectors have not previously encountered. We employ various prompting strategies to simulate practical adversarial attacks, demonstrating that even moderate efforts can significantly evade detection. We emphasize the importance of the true positive rate at a specific false positive rate (TPR@FPR) metric and demonstrate that these detectors perform poorly in certain settings, with TPR@.01 as low as 0%. Our findings suggest that both trained and zero-shot detectors struggle to maintain high sensitivity while achieving a reasonable true positive rate. 2025.findings-naacl.271 @@ -5290,7 +5290,7 @@ XiXu WendaXu SiqiOuyangCMU, Carnegie Mellon University - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 7062-7067 Simultaneous speech translation (SimulST) systems must balance translation quality with response time, making latency measurement crucial for evaluating their real-world performance. However, there has been a longstanding belief that current metrics yield unrealistically high latency measurements in unsegmented streaming settings. In this paper, we investigate this phenomenon, revealing its root cause in a fundamental misconception underlying existing latency evaluation approaches. We demonstrate that this issue affects not only streaming but also segment-level latency evaluation across different metrics. Furthermore, we propose a modification to correctly measure computation-aware latency for SimulST systems, addressing the limitations present in existing metrics. 
2025.findings-naacl.393 @@ -8591,7 +8591,7 @@ <fixed-case>I</fixed-case>nfini<fixed-case>SST</fixed-case>: Simultaneous Translation of Unbounded Speech with Large Language Model SiqiOuyangCMU, Carnegie Mellon University XiXu - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 3032-3046 Simultaneous translation of unbounded streaming speech remains a challenging problem due to the need for effectively processing the historical speech context and past translations so that quality and latency, including computation overhead, can be balanced. Most prior works assume pre-segmented speech, limiting their real-world applicability. In this paper, we propose InfiniSST, a novel approach that formulates SST as a multi-turn dialogue task, enabling seamless translation of unbounded speech. We construct translation trajectories and robust segments from MuST-C with multi-latency augmentation during training and develop a key-value (KV) cache management strategy to facilitate efficient inference. Experiments on MuST-C En-Es, En-De, and En-Zh demonstrate that InfiniSST reduces computation-aware latency by 0.5 to 1 second while maintaining the same translation quality compared to baselines. Ablation studies further validate the contributions of our data construction and cache management strategy. Code is released at https://github.com/LeiLiLab/InfiniSST. 2025.findings-acl.157 @@ -13785,7 +13785,7 @@ ZongkaiWu JohnLeeUniversity of Edinburgh, University of Edinburgh Jenq-NengHwang - LeiLi + LeiLi 10045-10056 In the rapidly evolving field of image generation, achieving precise control over generated content and maintaining semantic consistency remain significant limitations, particularly concerning grounding techniques and the necessity for model fine-tuning. To address these challenges, we propose BayesGenie, an off-the-shelf approach that integrates Large Language Models (LLMs) with Bayesian Optimization to facilitate precise and user-friendly image editing. Our method enables users to modify images through natural language descriptions without manual area marking, while preserving the original image’s semantic integrity. Unlike existing techniques that require extensive pre-training or fine-tuning, our approach demonstrates remarkable adaptability across various LLMs through its model-agnostic design. BayesGenie employs an adapted Bayesian optimization strategy to automatically refine the inference process parameters, achieving high-precision image editing with minimal user intervention. Through extensive experiments across diverse scenarios, we demonstrate that our framework outperforms existing methods in both editing accuracy and semantic preservation, as validated using different LLMs including Claude3 and GPT-4. 2025.findings-acl.523 @@ -23213,7 +23213,7 @@ <fixed-case>L</fixed-case>ego<fixed-case>MT</fixed-case>2: Selective Asynchronous Sharded Data Parallel Training for Massive Neural Machine Translation FeiYuan YinquanLuShanghai AI Laboratory - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University JingjingXu 23359-23376 It is a critical challenge to learn a single model for massive languages. Prior methods focus on increasing the model size and training data size. However, large models are difficult to optimize efficiently even with distributed parallel training and translation capacity can interfere among languages. 
To address the challenge, we propose LegoMT2, an efficient training approach with an asymmetric multi-way model architecture for massive multilingual neural machine translation. LegoMT2 shards 435 languages into 8 language-centric groups and attributes one local encoder for each group’s languages and a mix encoder-decoder for all languages. LegoMT2 trains the model through local data parallel and asynchronous distributed updating of parameters. LegoMT2 is 16.2\times faster than the distributed training method for M2M-100-12B (which only for 100 languages) while improving the translation performance by an average of 2.2 BLEU on Flores-101, especially performing better for low-resource languages . @@ -27104,7 +27104,7 @@ JinyuanXu XueHe Jenq-NengHwang - LeiLi + LeiLi 1736-1750 Understanding the decision-making processes of large language models (LLMs) is essential for their trustworthy development and deployment, however, current interpretability methods often face challenges such as low resolution and high computational cost. To address these limitations, we propose the Multi-Layer Attention Consistency Score (MACS), a novel, lightweight, and easily deployable heuristic for estimating the importance of input tokens in decoder-based models. MACS measures contributions of input tokens based on the consistency of maximal attention. Empirical evaluations demonstrate that MACS achieves a favorable trade-off between interpretability quality and computational efficiency, showing faithfulness comparable to complex techniques with a 22% decrease in VRAM usage and 30% reduction in latency. 2025.findings-emnlp.91 @@ -28380,7 +28380,7 @@ XinglinZhangMedical Image Insights TaoChenUniversity of Waterloo Jenq-NengHwang - LeiLi + LeiLi 3456-3467 Contrast-enhanced 3D Medical imaging (e.g., CT, MRI) leverages phase sequences to uncover temporal dynamics vital for diagnosing tumors, lesions, and vascular issues. However, current retrieval models primarily focus on spatial features, neglecting phase-specific progression detailed in clinical reports. We present the **Phase-aware Memory Network (PAMN)**, a novel framework enhancing 3D medical image retrieval by fusing imaging phases with diagnostic text. PAMN creates rich radiological representations that enhance diagnostic accuracy by combining image details with clinical report context, rigorously tested on a novel phase-series dataset of 12,230 hospital CT scans. PAMN achieves an effective balance of performance and scalability in 3D radiology retrieval, outperforming state-of-the-art baselines through the robust fusion of spatial, temporal, and textual information. 2025.findings-emnlp.184 @@ -38256,7 +38256,7 @@ WenhaoZhuByteDance Inc. 
HanxuHuMicrosoft Research ConghuiHeShanghai AI Lab - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University ShujianHuangNanjing University FeiYuan 16751-16774 @@ -43657,7 +43657,7 @@ <fixed-case>A</fixed-case>uto<fixed-case>MIR</fixed-case>: Effective Zero-Shot Medical Information Retrieval without Relevance Labels - LeiLi + LeiLi XiangxuZhangRenmin University of China XiaoZhou ZhengLiu diff --git a/data/xml/2025.iwslt.xml b/data/xml/2025.iwslt.xml index dafa1acfc8..1696407540 100644 --- a/data/xml/2025.iwslt.xml +++ b/data/xml/2025.iwslt.xml @@ -406,7 +406,7 @@ <fixed-case>CMU</fixed-case>’s <fixed-case>IWSLT</fixed-case> 2025 Simultaneous Speech Translation System SiqiOuyangCarnegie Mellon University XiXuCarnegie Mellon University - LeiLiCarnegie Mellon University + LeiLiCarnegie Mellon University 309-314 This paper presents CMU’s submission to the IWSLT 2025 Simultaneous Speech Translation (SST) task for translating unsegmented English speech into Chinese and German text in a streaming manner. Our end-to-end speech-to-text system integrates a chunkwise causal Wav2Vec 2.0 speech encoder, an adapter, and the Qwen2.5-7B-Instruct as the decoder. We use a two-stage simultaneous training procedure on robust speech segments synthesized from LibriSpeech, CommonVoice, and VoxPopuli datasets, utilizing standard cross-entropy loss. Our model supports adjustable latency through a configurable latency multiplier. Experimental results demonstrate that our system achieves 44.3 BLEU for English-to-Chinese and 25.1 BLEU for English-to-German translations on the ACL60/60 development set, with computation-aware latencies of 2.7 seconds and 2.3 seconds, and theoretical latencies of 2.2 and 1.7 seconds, respectively. 2025.iwslt-1.31 diff --git a/data/xml/2025.naacl.xml b/data/xml/2025.naacl.xml index b6e65a2d99..1a63346744 100644 --- a/data/xml/2025.naacl.xml +++ b/data/xml/2025.naacl.xml @@ -1282,7 +1282,7 @@ SiyuYuan KaiZhang YikaiZhang - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University YanghuaXiaoFudan University 1872-1888 Autonomous planning has been an ongoing pursuit since the inception of artificial intelligence. Based on curated problem solvers, early planning agents could deliver precise solutions for specific tasks but lacked generalization. The emergence of large language models (LLMs) and their powerful reasoning capabilities has reignited interest in autonomous planning by automatically generating reasonable solutions for given tasks. However, prior research and our experiments show that current language agents still lack human-level planning abilities. Even the state-of-the-art reasoning model, OpenAI o1, achieves only 15.6% on one of the complex real-world planning benchmarks. This highlights a critical question: What hinders language agents from achieving human-level planning? Although existing studies have highlighted weak performance in agent planning, the deeper underlying issues and the mechanisms and limitations of the strategies proposed to address them remain insufficiently understood. In this work, we apply the feature attribution study and identify two key factors that hinder agent planning: the limited role of constraints and the diminishing influence of questions. 
We also find that although current strategies help mitigate these challenges, they do not fully resolve them, indicating that agents still have a long way to go before reaching human-level intelligence. @@ -3938,7 +3938,7 @@ ZhehuaiChen VitalyLavrukhinNVIDIA JagadeeshBalamNVIDIA - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University BorisGinsburgNVIDIA 5547-5557 Simultaneous machine translation (SMT) takes streaming input utterances and incrementally produces target text. Existing SMT methods only use the partial utterance that has already arrived at the input and the generated hypothesis. Motivated by human interpreters’ technique to forecast future words before hearing them, we propose Translation by Anticipating Future (TAF), a method to improve translation quality while retaining low latency. Its core idea is to use a large language model (LLM) to predict future source words and opportunistically translate without introducing too much risk. We evaluate our TAF and multiple baselines of SMT on four language directions. Experiments show that TAF achieves the best translation quality-latency trade-off and outperforms the baselines by up to 5 BLEU points at the same latency (three words). @@ -4961,7 +4961,7 @@ <fixed-case>I</fixed-case>mg<fixed-case>T</fixed-case>rojan: Jailbreaking Vision-Language Models with <fixed-case>ONE</fixed-case> Image XijiaTao ShuaiZhong - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong QiLiuUniversity of Hong Kong LingpengKongDepartment of Computer Science, The University of Hong Kong 7048-7063 @@ -5567,7 +5567,7 @@ ShangZhou DanqingWangCMU, Carnegie Mellon University William YangWangUC Santa Barbara - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 7959-7973 Sampling is a basic operation for large language models (LLMs). In reinforcement learning rollouts and meta generation algorithms such as Best-of-N, it is essential to sample correct trajectories within a given compute budget. To find an optimal allocation for sample compute budgets, several choices need to be made:Which sampling configurations (model, temperature, language, etc.) to use?How many samples to generate in each configuration?We formulate these choices as a learning problem and propose OSCA, an algorithm that Optimizes Sample Compute Allocation by finding an optimal mix of different inference configurations.Our experiments show that with our learned mixed allocation, we can achieve accuracy better than the best single configuration with 128x less compute on code generation and 25x less compute on 4 reasoning tasks.is also shown to be effective in agentic workflows beyond single-turn tasks, achieving a better accuracy on SWE-Bench with 3x less compute than the default configuration.Our code and generations are released at https://github.com/LeiLiLab/OSCA. 2025.naacl-long.404 @@ -6287,7 +6287,7 @@ ChangMa ShuaiYuan QiushiSunUniversity of Hong Kong - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 9077-9090 The lottery ticket hypothesis posits the existence of “winning tickets” within a randomly initialized neural network. Do winning tickets exist for LLMs in fine-tuning scenarios? How can we find such winning tickets? In this paper, we propose KS-Lottery, a method to identify a small subset of LLM parameters highly effective in multilingual fine-tuning. 
Our key idea is to use Kolmogorov-Smirnov Test to analyze the distribution shift of parameters before and after fine-tuning. We further theoretically prove that KS-Lottery can find the certified winning tickets in the embedding layer, fine-tuning on the found parameters is guaranteed to perform as well as full fine-tuning. Comparing KS-Lottery with other tuning algorithms on translation tasks, the experimental results show that KS-Lottery finds a much smaller set of parameters for fine-tuning while achieving the comparable performance as full fine-tuning LLM. Surprisingly, we find that fine-tuning 18 tokens’ embedding of LLaMA suffices to reach the fine-tuning translation performance . 2025.naacl-long.458 diff --git a/data/xml/D18.xml b/data/xml/D18.xml index 8f6b6858aa..d995e57508 100644 --- a/data/xml/D18.xml +++ b/data/xml/D18.xml @@ -6212,7 +6212,7 @@ HaoyueShi HaoZhou JiazeChen - LeiLi + LeiLi 4631–4641 D18-1492 D18-1492.Attachment.zip diff --git a/data/xml/D19.xml b/data/xml/D19.xml index 5848dd445b..061437e9ca 100644 --- a/data/xml/D19.xml +++ b/data/xml/D19.xml @@ -953,7 +953,7 @@ ZhixingTan JinsongSu DeyiXiong - LeiLi + LeiLi 803–812 In this study, we first investigate a novel capsule network with dynamic routing for linear time Neural Machine Translation (NMT), referred as CapsNMT. CapsNMT uses an aggregation mechanism to map the source sentence into a matrix with pre-determined size, and then applys a deep LSTM network to decode the target sequence from the source representation. Unlike the previous work (CITATION) to store the source sentence with a passive and bottom-up way, the dynamic routing policy encodes the source sentence with an iterative process to decide the credit attribution between nodes from lower and higher layers. CapsNMT has two core properties: it runs in time that is linear in the length of the sequences and provides a more flexible way to aggregate the part-whole information of the source sentence. On WMT14 English-German task and a larger WMT14 English-French task, CapsNMT achieves comparable results with the Transformer system. To the best of our knowledge, this is the first work that capsule networks have been empirically investigated for sequence to sequence problems. D19-1074 @@ -4288,7 +4288,7 @@ FuliLuo ShunyaoLi PengchengYang - LeiLi + LeiLi BaobaoChang ZhifangSui XuSun @@ -8707,7 +8707,7 @@ The tutorial will bring researchers and practitioners to be aware of this issue, Discreteness in Neural Natural Language Processing LiliMou HaoZhou - LeiLi + LeiLi This tutorial provides a comprehensive guide to the process of discreteness in neural NLP. As a gentle start, we will briefly introduce the background of deep learning based NLP, where we point out the ubiquitous discreteness of natural language and its challenges in neural information processing. Particularly, we will focus on how such discreteness plays a role in the input space, the latent space, and the output space of a neural network. In each part, we will provide examples, discuss machine learning techniques, as well as demonstrate NLP applications. 
diff --git a/data/xml/K19.xml b/data/xml/K19.xml index bd138b9624..a909084b16 100644 --- a/data/xml/K19.xml +++ b/data/xml/K19.xml @@ -955,7 +955,7 @@ In Conclusion Not Repetition: Comprehensive Abstractive Summarization with Diversified Attention Based on Determinantal Point Processes - LeiLi + LeiLi WeiLiu MarinaLitvak NataliaVanetik diff --git a/data/xml/N18.xml b/data/xml/N18.xml index 9422c64272..1d475291bb 100644 --- a/data/xml/N18.xml +++ b/data/xml/N18.xml @@ -1409,7 +1409,7 @@ Reinforced Co-Training JiaweiWu - LeiLi + LeiLi William YangWang 1252–1262 Co-training is a popular semi-supervised learning framework to utilize a large amount of unlabeled data in addition to a small labeled set. Co-training methods exploit predicted labels on the unlabeled data and select samples based on prediction confidence to augment the training. However, the selection of samples in existing co-training methods is based on a predetermined policy, which ignores the sampling bias between the unlabeled and the labeled subsets, and fails to explore the data space. In this paper, we propose a novel method, Reinforced Co-Training, to select high-quality unlabeled samples to better co-train on. More specifically, our approach uses Q-learning to learn a data selection policy with a small labeled dataset, and then exploits this policy to train the co-training classifiers automatically. Experimental results on clickbait detection and generic text classification tasks demonstrate that our proposed method can obtain more accurate text classification results. diff --git a/data/xml/P16.xml b/data/xml/P16.xml index 5dcb56b7d4..4010ce6a2d 100644 --- a/data/xml/P16.xml +++ b/data/xml/P16.xml @@ -817,7 +817,7 @@ <fixed-case>CFO</fixed-case>: Conditional Focused Neural Question Answering with Large-scale Knowledge Bases ZihangDai - LeiLi + LeiLi WeiXu 800–810 P16-1076 diff --git a/data/xml/P19.xml b/data/xml/P19.xml index 0cdab3ee01..973cf97486 100644 --- a/data/xml/P19.xml +++ b/data/xml/P19.xml @@ -2488,7 +2488,7 @@ Enhancing Topic-to-Essay Generation with External Commonsense Knowledge PengchengYang - LeiLi + LeiLi FuliLuo TianyuLiu XuSun @@ -3286,7 +3286,7 @@ PengchengYang ZhihanZhang FuliLuo - LeiLi + LeiLi ChengyangHuang XuSun 2680–2686 @@ -7124,7 +7124,7 @@ HuangzhaoZhang HaoZhou NingMiao - LeiLi + LeiLi 5564–5569 Efficiently building an adversarial attacker for natural language processing (NLP) tasks is a real challenge. Firstly, as the sentence space is discrete, it is difficult to make small perturbations along the direction of gradients. Secondly, the fluency of the generated examples cannot be guaranteed. In this paper, we propose MHA, which addresses both problems by performing Metropolis-Hastings sampling, whose proposal is designed with the guidance of gradients. Experiments on IMDB and SNLI show that our proposed MHAoutperforms the baseline model on attacking capability. Adversarial training with MHA also leads to better robustness and performance. P19-1559 @@ -7669,7 +7669,7 @@ YuBao HaoZhou ShujianHuang - LeiLi + LeiLi LiliMou OlgaVechtomova Xin-yuDai @@ -7853,7 +7853,7 @@ YunxuanXiao YanruQu HaoZhou - LeiLi + LeiLi WeinanZhang YongYu 6140–6150 @@ -8732,7 +8732,7 @@ Automatic Generation of Personalized Comment Based on User Profile WenhuanZeng AbulikemuAbuduweili - LeiLi + LeiLi PengchengYang 229–235 Comments on social media are very diverse, in terms of content, style and vocabulary, which make generating comments much more challenging than other existing natural language generation (NLG) tasks. 
Besides, since different user has different expression habits, it is necessary to take the user’s profile into consideration when generating comments. In this paper, we introduce the task of automatic generation of personalized comment (AGPC) for social media. Based on tens of thousands of users’ real comments and corresponding user profiles on weibo, we propose Personalized Comment Generation Network (PCGN) for AGPC. The model utilizes user feature embedding with a gated memory and attends to user description to model personality of users. In addition, external user representation is taken into consideration during the decoding to enhance the comments generation. Experimental results show that our model can generate natural, human-like and personalized comments. diff --git a/data/xml/W13.xml b/data/xml/W13.xml index b3a11bbd51..351599ba29 100644 --- a/data/xml/W13.xml +++ b/data/xml/W13.xml @@ -5020,7 +5020,7 @@ Multi-document multilingual summarization corpus preparation, Part 1: <fixed-case>A</fixed-case>rabic, <fixed-case>E</fixed-case>nglish, <fixed-case>G</fixed-case>reek, <fixed-case>C</fixed-case>hinese, <fixed-case>R</fixed-case>omanian - LeiLi + LeiLi CorinaForascu MahmoudEl-Haj GeorgeGiannakopoulos @@ -5056,7 +5056,7 @@ <fixed-case>CIST</fixed-case> System Report for <fixed-case>ACL</fixed-case> <fixed-case>M</fixed-case>ulti<fixed-case>L</fixed-case>ing 2013 – Track 1: Multilingual Multi-document Summarization - LeiLi + LeiLi WeiHeng JiaYu YuLiu diff --git a/data/xml/W14.xml b/data/xml/W14.xml index 493fcd347b..9cdb41c81d 100644 --- a/data/xml/W14.xml +++ b/data/xml/W14.xml @@ -11786,7 +11786,7 @@ XiaoyueCong FangHuang HongfaXue - LeiLi + LeiLi ZhiqiaoGao 114–119 W14-6818 diff --git a/data/xml/W16.xml b/data/xml/W16.xml index 41fd606e0d..33d4e950d9 100644 --- a/data/xml/W16.xml +++ b/data/xml/W16.xml @@ -2289,7 +2289,7 @@ <fixed-case>CIST</fixed-case> System for <fixed-case>CL</fixed-case>-<fixed-case>S</fixed-case>ci<fixed-case>S</fixed-case>umm 2016 Shared Task - LeiLi + LeiLi LiyuanMao YazhaoZhang JunqiChi diff --git a/data/xml/W17.xml b/data/xml/W17.xml index 06300c5187..4e65bc6b2b 100644 --- a/data/xml/W17.xml +++ b/data/xml/W17.xml @@ -1679,7 +1679,7 @@ Word Embedding and Topic Modeling Enhanced Multiple Features for Content Linking and Argument / Sentiment Labeling in Online Forums - LeiLi + LeiLi LiyuanMao MoyeChen 32–36 @@ -4186,7 +4186,7 @@ is able to handle phenomena related to scope by means of an higher-order type th DanchenZhang DaqingHe SanqiangZhao - LeiLi + LeiLi 263–271 W17-2333 10.18653/v1/W17-2333 diff --git a/data/xml/W19.xml b/data/xml/W19.xml index 441404c593..51ef0d3e4e 100644 --- a/data/xml/W19.xml +++ b/data/xml/W19.xml @@ -17436,7 +17436,7 @@ In this tutorial on MT and post-editing we would like to continue sharing the la YaoFu HaoZhou JiazeChen - LeiLi + LeiLi 24–33 Text attribute transfer is modifying certain linguistic attributes (e.g. sentiment, style, author-ship, etc.) of a sentence and transforming them from one type to another. In this paper, we aim to analyze and interpret what is changed during the transfer process. We start from the observation that in many existing models and datasets, certain words within a sentence play important roles in determining the sentence attribute class. These words are referred as the Pivot Words. Based on these pivot words, we propose a lexical analysis framework, the Pivot Analysis, to quantitatively analyze the effects of these words in text attribute classification and transfer. 
We apply this framework to existing datasets and models and show that: (1) the pivot words are strong features for the classification of sentence attributes; (2) to change the attribute of a sentence, many datasets only requires to change certain pivot words; (3) consequently, many transfer models only perform the lexical-level modification,while leaving higher-level sentence structures unchanged. Our work provides an in-depth understanding of linguistic attribute transfer and further identifies the future requirements and challenges of this task W19-8604 @@ -18512,7 +18512,7 @@ In this tutorial on MT and post-editing we would like to continue sharing the la Multi-lingual <fixed-case>W</fixed-case>ikipedia Summarization and Title Generation On Low Resource Corpus WeiLiu - LeiLi + LeiLi ZuyingHuang YinanLiu 17–25 diff --git a/data/xml/Y06.xml b/data/xml/Y06.xml index 8d5cfaeb92..317f8c12a5 100644 --- a/data/xml/Y06.xml +++ b/data/xml/Y06.xml @@ -669,7 +669,7 @@ Research on <fixed-case>O</fixed-case>lympics-oriented Mobile Game News Ordering System YongguiYang - LeiLi + LeiLi 459–462 Y06-1069 http://hdl.handle.net/2065/29047 diff --git a/data/yaml/name_variants.yaml b/data/yaml/name_variants.yaml index 20ab849f87..07a967d937 100644 --- a/data/yaml/name_variants.yaml +++ b/data/yaml/name_variants.yaml @@ -5738,6 +5738,9 @@ - canonical: {first: Junhui, last: Li} variants: - {first: JunHui, last: Li} +- canonical: {first: Lei, last: Li} + id: lei-li + comment: May refer to several people - canonical: {first: Shih-Min, last: Li} variants: - {first: Shi-Min, last: Li} From b9db84b1b88c9aeb4129bc3f27f063ae7119f5f1 Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Thu, 6 Nov 2025 22:54:33 +0100 Subject: [PATCH 02/19] Add Lei Li (CMU) and edit id for all orcid-tagged papers - add Lei Li (Carnegie Mellon University) as a person including orcid (ending in `-9776`) and institution of degree - change id for all papers with this orcid away from the catch-all to the specific `lei-li-cmu` --- data/xml/2022.acl.xml | 8 ++++---- data/xml/2022.findings.xml | 8 ++++---- data/xml/2022.naacl.xml | 4 ++-- data/xml/2023.acl.xml | 8 ++++---- data/xml/2023.findings.xml | 2 +- data/xml/2024.emnlp.xml | 8 ++++---- data/xml/2024.findings.xml | 10 +++++----- data/xml/2025.acl.xml | 2 +- data/xml/2025.emnlp.xml | 2 +- data/xml/2025.findings.xml | 10 +++++----- data/xml/2025.naacl.xml | 8 ++++---- data/yaml/name_variants.yaml | 5 +++++ 12 files changed, 40 insertions(+), 35 deletions(-) diff --git a/data/xml/2022.acl.xml b/data/xml/2022.acl.xml index 4fab3c8f51..72e6ef489f 100644 --- a/data/xml/2022.acl.xml +++ b/data/xml/2022.acl.xml @@ -707,7 +707,7 @@ QianDong YaomingZhu MingxuanWang - LeiLi + LeiLi 680-694 How to find proper moments to generate partial sentence translation given a streaming speech input? Existing approaches waiting-and-translating for a fixed duration often break the acoustic units in speech, since the boundaries between acoustic units in speech are not even. In this paper, we propose MoSST, a simple yet effective method for translating streaming speech content. Given a usually long speech sequence, we develop an efficient monotonic segmentation module inside an encoder-decoder model to accumulate acoustic information incrementally and detect proper speech unit boundaries for the input in speech translation task. 
Experiments on multiple translation directions of the MuST-C dataset show that outperforms existing methods and achieves the best trade-off between translation quality (BLEU) and latency. Our code is available at https://github.com/dqqcasia/mosst. 2022.acl-long.50 @@ -2657,7 +2657,7 @@ WangchunshuZhou JingjingXu HaoZhou - LeiLi + LeiLi 2701-2714 Currently, masked language modeling (e.g., BERT) is the prime choice to learn contextualized representations. Due to the pervasiveness, it naturally raises an interesting question: how do masked language models (MLMs) learn contextual representations? In this work, we analyze the learning dynamics of MLMs and find that it adopts sampled embeddings as anchors to estimate and inject contextual semantics to representations, which limits the efficiency and effectiveness of MLMs. To address these problems, we propose TACO, a simple yet effective representation learning approach to directly model global semantics. To be specific, TACO extracts and aligns contextual semantics hidden in contextualized representations to encourage models to attend global semantics when generating contextualized representations. Experiments on the GLUE benchmark show that TACO achieves up to 5x speedup and up to 1.2 points average improvement over MLM. 2022.acl-long.193 @@ -6668,7 +6668,7 @@ in the Case of Unambiguous Gender <fixed-case>STEMM</fixed-case>: Self-learning with Speech-text Manifold Mixup for Speech Translation QingkaiFang RongYe - LeiLi + LeiLi YangFeng MingxuanWang 7050-7062 @@ -7867,7 +7867,7 @@ in the Case of Unambiguous Gender LihuaQian XinyuDai JiajunChen - LeiLi + LeiLi 8398-8409 Recently, parallel text generation has received widespread attention due to its success in generation efficiency. Although many advanced techniques are proposed to improve its generation quality, they still need the help of an autoregressive model for training to overcome the one-to-many multi-modal phenomenon in the dataset, limiting their applications. In this paper, we propose GLAT, which employs the discrete latent variables to capture word categorical information and invoke an advanced curriculum learning technique, alleviating the multi-modality problem. Experiment results show that our method outperforms strong baselines without the help of an autoregressive model, which further broadens the application scenarios of the parallel decoding paradigm. 2022.acl-long.575 diff --git a/data/xml/2022.findings.xml b/data/xml/2022.findings.xml index 8678f386d4..3cf3a0f0f9 100644 --- a/data/xml/2022.findings.xml +++ b/data/xml/2022.findings.xml @@ -880,7 +880,7 @@ XuandongZhao ZhiguoYu MingWu - LeiLi + LeiLi 774-781 How to learn highly compact yet effective sentence representation? Pre-trained language models have been effective in many NLP tasks. However, these models are often huge and produce large sentence embeddings. Moreover, there is a big performance gap between large and small models. In this paper, we propose Homomorphic Projective Distillation (HPD) to learn compressed sentence embeddings. Our method augments a small Transformer encoder model with learnable projection layers to produce compact representations while mimicking a large pre-trained language model to retain the sentence representation quality. We evaluate our method with different model sizes on both semantic textual similarity (STS) and semantic retrieval (SR) tasks. 
Experiments show that our method achieves 2.7-4.5 points performance gain on STS tasks compared with previous best representations of the same size. In SR tasks, our method improves retrieval speed (8.2×) and memory usage (8.0×) compared with state-of-the-art large models. Our implementation is available at https://github.com/XuandongZhao/HPD. 2022.findings-acl.64 @@ -3803,7 +3803,7 @@ ChengqiZhao ShujianHuang JiajunChen - LeiLi + LeiLi 3537-3548 This paper does not aim at introducing a novel model for document-level neural machine translation. Instead, we head back to the original Transformer model and hope to answer the following question: Is the capacity of current models strong enough for document-level translation? Interestingly, we observe that the original Transformer with appropriate training techniques can achieve strong results for document translation, even with a length of 2000 words. We evaluate this model and several recent approaches on nine document-level datasets and two sentence-level datasets across six languages. Experiments show that document-level Transformer models outperforms sentence-level ones and many previous methods in a comprehensive set of metrics, including BLEU, four lexical indices, three newly proposed assistant linguistic indicators, and human evaluation. 2022.findings-acl.279 @@ -4226,7 +4226,7 @@ ZhongqiaoLi XinboZhang ChangzhiSun - LeiLi + LeiLi YanghuaXiao HaoZhou 3941-3955 @@ -7198,7 +7198,7 @@ JingjingXu JiazeChen HaoZhou - LeiLi + LeiLi 2508-2527 We introduce MTG, a new benchmark suite for training and evaluating multilingual text generation. It is the first-proposed multilingual multiway text generation dataset with the largest human-annotated data (400k). It includes four generation tasks (story generation, question generation, title generation and text summarization) across five languages (English, German, French, Spanish and Chinese). The multiway setup enables testing knowledge transfer capabilities for a model across languages and tasks. Using MTG, we train and analyze several popular multilingual generation models from different aspects. Our benchmark suite fosters model performance enhancement with more human-annotated parallel data. It provides comprehensive evaluations with diverse generation scenarios. Code and data are available at https://github.com/zide05/MTG. 2022.findings-naacl.192 diff --git a/data/xml/2022.naacl.xml b/data/xml/2022.naacl.xml index 6c56c243dc..1be30a3583 100644 --- a/data/xml/2022.naacl.xml +++ b/data/xml/2022.naacl.xml @@ -973,7 +973,7 @@ Provably Confidential Language Modelling XuandongZhao - LeiLi + LeiLi Yu-XiangWang 943-955 Large language models are shown to memorize privacy information such as social security numbers in training data. Given the sheer scale of the training corpus, it is challenging to screen and filter these privacy data, either manually or automatically. In this paper, we propose Confidentially Redacted Training (CRT), a method to train language generation models while protecting the confidential segments. We borrow ideas from differential privacy (which solves a related but distinct problem) and show that our method is able to provably prevent unintended memorization by randomizing parts of the training process. Moreover, we show that redaction with an approximately correct screening policy amplifies the confidentiality guarantee. We implement the method for both LSTM and GPT language models. 
Our experimental results show that the models trained by CRT obtain almost the same perplexity while preserving strong confidentiality. @@ -5242,7 +5242,7 @@ Cross-modal Contrastive Learning for Speech Translation RongYe MingxuanWang - LeiLi + LeiLi 5099-5113 How can we learn unified representations for spoken utterances and their written text? Learning similar representations for semantically similar speech and text is important for speech translation. To this end, we propose ConST, a cross-modal contrastive learning method for end-to-end speech-to-text translation. We evaluate ConST and a variety of previous baselines on a popular benchmark MuST-C. Experiments show that the proposed ConST consistently outperforms the previous methods, and achieves an average BLEU of 29.4. The analysis further verifies that ConST indeed closes the representation gap of different modalities — its learned representation improves the accuracy of cross-modal speech-text retrieval from 4% to 88%. Code and models are available at https://github.com/ReneeYe/ConST. 2022.naacl-main.376 diff --git a/data/xml/2023.acl.xml b/data/xml/2023.acl.xml index e0dcf75007..22590a6adf 100644 --- a/data/xml/2023.acl.xml +++ b/data/xml/2023.acl.xml @@ -3036,7 +3036,7 @@ <fixed-case>WACO</fixed-case>: Word-Aligned Contrastive Learning for Speech Translation SiqiOuyangUniversity of California, Santa Barbara RongYeByteDance AI Lab - LeiLiUniversity of California Santa Barbara + LeiLiUniversity of California Santa Barbara 3891-3907 End-to-end Speech Translation (E2E ST) aims to directly translate source speech into target text. Existing ST methods perform poorly when only extremely small speech-text data are available for training. We observe that an ST model’s performance closely correlates with its embedding similarity between speech and source transcript. In this paper, we propose Word-Aligned COntrastive learning (WACO), a simple and effective method for extremely low-resource speech-to-text translation. Our key idea is bridging word-level representations for both speech and text modalities via contrastive learning. We evaluate WACO and other methods on the MuST-C dataset, a widely used ST benchmark, and on a low-resource direction Maltese-English from IWSLT 2023. Our experiments demonstrate that WACO outperforms the best baseline by 9+ BLEU points with only 1-hour parallel ST data. Code is available at https://github.com/owaski/WACO. 2023.acl-long.216 @@ -4007,7 +4007,7 @@ WendaXuUniversity of California at Santa Barbara XianQianByteDance AI LAB MingxuanWangBytedance AI Lab - LeiLiUniversity of California Santa Barbara + LeiLiUniversity of California Santa Barbara William YangWangUnversity of California, Santa Barbara 5166-5183 Is it possible to train a general metric for evaluating text generation quality without human-annotated ratings? Existing learned metrics either perform unsatisfactory across text generation tasks or require human ratings for training on specific tasks. In this paper, we propose SEScore2, a self-supervised approach for training a model-based metric for text generation evaluation. The key concept is to synthesize realistic model mistakes by perturbing sentences retrieved from a corpus. We evaluate SEScore2 and previous methods on four text generation tasks across three languages. SEScore2 outperforms all prior unsupervised metrics on four text generation evaluation benchmarks, with an average Kendall improvement of 0.158. 
Surprisingly, SEScore2 even outperforms the supervised BLEURT and COMET on multiple text generation tasks. @@ -7899,7 +7899,7 @@ WeiShiFudan University ZiquanFuSystem, Inc SijieChengFudan University - LeiLiUniversity of California Santa Barbara + LeiLiUniversity of California Santa Barbara YanghuaXiaoFudan University 9890-9908 Large language models (LLMs) have been widely studied for their ability to store and utilize positive knowledge. However, negative knowledge, such as “lions don’t live in the ocean”, is also ubiquitous in the world but rarely mentioned explicitly in text. What do LLMs know about negative knowledge?This work examines the ability of LLMs on negative commonsense knowledge. We design a constrained keywords-to-sentence generation task (CG) and a Boolean question answering task (QA) to probe LLMs.Our experiments reveal that LLMs frequently fail to generate valid sentences grounded in negative commonsense knowledge, yet they can correctly answer polar yes-or-no questions. We term this phenomenon the belief conflict of LLMs.Our further analysis shows that statistical shortcuts and negation reporting bias from language modeling pre-training cause this conflict. @@ -12505,7 +12505,7 @@ SiqiOuyangUniversity of California, Santa Barbara ZhiguoYuMicrosoft MingWuGitHub, Inc. - LeiLiUniversity of California Santa Barbara + LeiLiUniversity of California Santa Barbara 15590-15606 How can we extend a pre-trained model to many language understanding tasks, without labeled or additional unlabeled data? Pre-trained language models (PLMs) have been effective for a wide range of NLP tasks. However, existing approaches either require fine-tuning on downstream labeled datasets or manually constructing proper prompts. In this paper, we propose nonparametric prompting PLM (NPPrompt) for fully zero-shot language understanding. Unlike previous methods, NPPrompt uses only pre-trained language models and does not require any labeled data or additional raw corpus for further fine-tuning, nor does it rely on humans to construct a comprehensive set of prompt label words. We evaluate NPPrompt against previous major few-shot and zero-shot learning methods on diverse NLP tasks: including text classification, text entailment, similar text retrieval, paraphrasing, and multiple-choice question answering. Experimental results demonstrate that our NPPrompt outperforms the previous best fully zero-shot method by big margins, with absolute gains of 12.8% in accuracy on text classification and 15.6% on the GLUE benchmark. Our source code is available at https://anonymous.4open.science/r/NPPrompt. 
2023.acl-long.869 diff --git a/data/xml/2023.findings.xml b/data/xml/2023.findings.xml index b5af6bc1be..b67818d037 100644 --- a/data/xml/2023.findings.xml +++ b/data/xml/2023.findings.xml @@ -12344,7 +12344,7 @@ YinquanLuShanghai AI Laboratory WenhaoZhuNational Key Laboratory for Novel Software Technology, Nanjing University LingpengKongThe University of Hong Kong - LeiLiUniversity of California Santa Barbara + LeiLiUniversity of California Santa Barbara YuQiaoShanghai AI Lab JingjingXuShanghai AI Lab 11518-11533 diff --git a/data/xml/2024.emnlp.xml b/data/xml/2024.emnlp.xml index b71bb6a0e9..955132db1c 100644 --- a/data/xml/2024.emnlp.xml +++ b/data/xml/2024.emnlp.xml @@ -912,7 +912,7 @@ ZhiyongWuShanghai Artificial Intelligence Laboratory BaobaoChangPeking University XuSun - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University ZhifangSuiPeking University 1107-1128 With the increasing capabilities of large language models (LLMs), in-context learning (ICL) has emerged as a new paradigm for natural language processing (NLP), where LLMs make predictions based on contexts augmented with a few examples. It has been a significant trend to explore ICL to evaluate and extrapolate the ability of LLMs. In this paper, we aim to survey and summarize the progress and challenges of ICL. We first present a formal definition of ICL and clarify its correlation to related studies. Then, we organize and discuss advanced techniques, including training strategies, prompt designing strategies, and related analysis. Additionally, we explore various ICL application scenarios, such as data engineering and knowledge updating. Finally, we address the challenges of ICL and suggest potential directions for further research. We hope that our work can encourage more research on uncovering how ICL works and improving ICL. @@ -8701,7 +8701,7 @@ WendaXu JiachenLiUniversity of California, Santa Barbara William YangWangUC Santa Barbara - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 11125-11139 Direct alignment from preferences (DAP) has emerged as a promising paradigm for aligning large language models (LLMs) to human desiderata from pre-collected, offline preference datasets. While recent studies indicate that existing offline DAP methods can directly benefit from online training samples, we highlight the need to develop specific online DAP algorithms to fully harness the power of online training. Specifically, we identify that the learned LLM should adhere to the proximity of the behavior LLM, which collects the training samples. To this end, we propose online Preference Optimization in proximity to the Behavior LLM (BPO), emphasizing the importance of constructing a proper trust region for LLM alignment.We conduct extensive experiments to validate the effectiveness and applicability of our approach by integrating it with various DAP methods, resulting in significant performance improvements across a wide range of tasks when training with the same amount of preference data. Even when only introducing one additional data collection phase, our online BPO improves its offline DAP baseline from 72.0% to 80.2% on TL;DR and from 82.2% to 89.1% on Anthropic Helpfulness in terms of win rate against human reference text. 
2024.emnlp-main.623 @@ -10250,7 +10250,7 @@ HanlinZhuElectrical Engineering & Computer Science Department, University of California Berkeley XiaomengYangGoogle DeepMind AndrewCohen - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University YuandongTianMeta AI (FAIR) 13274-13292 Recent research has increasingly focused on evaluating large language models’ (LLMs) alignment with diverse human values and preferences, particularly for open-ended tasks like story generation. Traditional evaluation metrics rely heavily on lexical similarity with human-written references, often showing poor correlation with human judgments and failing to account for alignment with the diversity of human preferences. To address these challenges, we introduce PerSE, an interpretable evaluation framework designed to assess alignment with specific human preferences. It is tuned to infer specific preferences from an in-context personal profile and evaluate the alignment between the generated content and personal preferences. PerSE enhances interpretability by providing detailed comments and fine-grained scoring, facilitating more personalized content generation. Our 13B LLaMA-2-based PerSE shows a 15.8% increase in Kendall correlation and a 13.7% rise in accuracy with zero-shot reviewers compared to GPT-4. It also outperforms GPT-4 by 46.01% in Kendall correlation on new domains, indicating its transferability @@ -18075,7 +18075,7 @@ WendaXu XiXu SiqiOuyangCMU, Carnegie Mellon University - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 344-350 With the rapid advancement of machine translation research, evaluation toolkits have become essential for benchmarking system progress. Tools like COMET and SacreBLEU offer single quality score assessments that are effective for pairwise system comparisons. However, these tools provide limited insights for fine-grained system-level comparisons and the analysis of instance-level defects. To address these limitations, we introduce Translation Canvas, an explainable interface designed to pinpoint and analyze translation systems’ performance: 1) Translation Canvas assists machine translation researchers in comprehending system-level model performance by identifying common errors (their frequency and severity) and analyzing relationships between different systems based on various evaluation metrics. 2) It supports fine-grained analysis by highlighting error spans with explanations and selectively displaying systems’ predictions. According to human evaluation, Translation Canvas demonstrates superior performance over COMET and SacreBLEU packages under enjoybility and understandbility criteria. 2024.emnlp-demo.36 diff --git a/data/xml/2024.findings.xml b/data/xml/2024.findings.xml index 52ad9d01e8..9fddf83d10 100644 --- a/data/xml/2024.findings.xml +++ b/data/xml/2024.findings.xml @@ -15929,7 +15929,7 @@ FeiYuan ShuaiYuan ZhiyongWuShanghai Artificial Intelligence Laboratory - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 12111-12130 Large Language Models (LLMs), often show strong performance on English tasks, while exhibiting limitations on other languages. What is an LLM’s multilingual capability when it is trained only on certain languages? The underlying mechanism remains unclear. 
This study endeavors to examine the multilingual capability of LLMs from the vocabulary sharing perspective by conducting an exhaustive analysis across 101 languages. Through the investigation of the performance gap before and after embedding fine-tuning, we discovered four distinct quadrants. By delving into each quadrant we provide actionable and efficient guidelines for tuning these languages. Extensive experiments reveal that existing LLMs possess multilingual capabilities that surpass our expectations, and we can significantly improve the multilingual performance of LLMs based on these attributes of each quadrant . 2024.findings-acl.721 @@ -18707,7 +18707,7 @@ ZhenqiaoSong TaiqiHe William YangWangUC Santa Barbara - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 15654-15669 How can large language models (LLMs) process and translate endangered languages? Many languages lack a large corpus to train a decent LLM; therefore existing LLMs rarely perform well in unseen, endangered languages. On the contrary, we observe that 2000 endangered languages, though without a large corpus, have a grammar book or a dictionary. We propose LingoLLM, a training-free approach to enable an LLM to process unseen languages that hardly occur in its pre-training. Our key insight is to demonstrate linguistic knowledge of an unseen language in an LLM’s prompt, including a dictionary, a grammar book, and morphologically analyzed input text. We implement LingoLLM on top of two models, GPT-4 and Mixtral, and evaluate their performance on 5 tasks across 8 endangered or low-resource languages. Our results show that LingoLLM elevates translation capability from GPT-4’s 0 to 10.5 BLEU for 10 language directions. Our findings demonstrate the tremendous value of linguistic knowledge in the age of LLMs for endangered languages. Our data, code, and model generations will be released to the public. Our data, code, and model generations can be found at https://github.com/LLiLab/llm4endangeredlang. 2024.findings-acl.925 @@ -19577,7 +19577,7 @@ BabakDamavandi Xin LunaDongFacebook ChristosFaloutsosAmazon and Carnegie Mellon University - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University SeungwhanMoonFacebook 247-266 Vision-extended LLMs have made significant strides in Visual Question Answering (VQA). Despite these advancements, VLLMs still encounter substantial difficulties in handling queries involving long-tail entities, with a tendency to produce erroneous or hallucinated responses. In this work, we introduce a novel evaluative benchmark named SnapNTell, specifically tailored for entity-centric VQA. This task aims to test the models’ capabilities in identifying entities and providing detailed, entity-specific knowledge. We have developed the SnapNTell Dataset, distinct from traditional VQA datasets: (1) It encompasses a wide range of categorized entities, each represented by images and explicitly named in the answers; (2) It features QA pairs that require extensive knowledge for accurate responses. The dataset is organized into 22 major categories, containing 7,568 unique entities in total. For each entity, we curated 10 illustrative images and crafted 10 knowledge-intensive QA pairs. To address this novel task, we devised a scalable, efficient, and transparent retrieval-augmented multimodal LLM. 
Our approach markedly outperforms existing methods on the SnapNTell dataset, achieving a 66.5% improvement in the BELURT score. @@ -24567,7 +24567,7 @@ and high variation in performance on the subset, suggesting our plausibility cri João DSMarquesInstituto Superior Técnico and INESC-ID MiguelGraça MiguelFreire - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University Arlindo L.Oliveira 6473-6486 Modern NLP tasks increasingly rely on dense retrieval methods to access up-to-date and relevant contextual information. We are motivated by the premise that retrieval benefits from segments that can vary in size such that a content’s semantic independence is better captured. We propose LumberChunker, a method leveraging an LLM to dynamically segment documents, which iteratively prompts the LLM to identify the point within a group of sequential passages where the content begins to shift. To evaluate our method, we introduce GutenQA, a benchmark with 3000 “needle in a haystack” type of question-answer pairs derived from 100 public domain narrative books available on Project Gutenberg. Our experiments show that LumberChunker not only outperforms the most competitive baseline by 7.37% in retrieval performance (DCG@20) but also that, when integrated into a RAG pipeline, LumberChunker proves to be more effective than other chunking methods and competitive baselines, such as the Gemini 1.5M Pro. @@ -28060,7 +28060,7 @@ and high variation in performance on the subset, suggesting our plausibility cri <fixed-case>LL</fixed-case>a<fixed-case>MAX</fixed-case>: Scaling Linguistic Horizons of <fixed-case>LLM</fixed-case> by Enhancing Translation Capabilities Beyond 100 Languages YinquanLuShanghai AI Laboratory WenhaoZhuNanjing University - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University YuQiao FeiYuan 10748-10772 diff --git a/data/xml/2025.acl.xml b/data/xml/2025.acl.xml index 785a06210b..0e9d3ba73e 100644 --- a/data/xml/2025.acl.xml +++ b/data/xml/2025.acl.xml @@ -4615,7 +4615,7 @@ XuandongZhaoUniversity of California, Berkeley ChenwenLiao Yu-XiangWangUniversity of California, San Diego - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 6304-6316 Text watermarks in large language models (LLMs) are increasingly used to detect synthetic text, mitigating misuse cases like fake news and academic dishonesty. While existing watermarking detection techniques primarily focus on classifying entire documents as watermarked or not, they often neglect the common scenario of identifying individual watermark segments within longer, mixed-source documents. Drawing inspiration from plagiarism detection systems, we propose two novel methods for partial watermark detection. First, we develop a geometry cover detection framework aimed at determining whether there is a watermark segment in long text. Second, we introduce an adaptive online learning algorithm to pinpoint the precise location of watermark segments within the text. Evaluated on three popular watermarking techniques (KGW-Watermark, Unigram-Watermark, and Gumbel-Watermark), our approach achieves high accuracy, significantly outperforming baseline methods. Moreover, our framework is adaptable to other watermarking techniques, offering new insights for precise watermark detection. 
Our code is publicly available at https://github.com/XuandongZhao/llm-watermark-location. 2025.acl-long.316 diff --git a/data/xml/2025.emnlp.xml b/data/xml/2025.emnlp.xml index d91a31816c..bf6c5000fe 100644 --- a/data/xml/2025.emnlp.xml +++ b/data/xml/2025.emnlp.xml @@ -25167,7 +25167,7 @@ AdamOfficerUniversity of Pittsburgh Medical Center AngelaChen YufeiHuangUniversity of Pittsburgh - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 480-486 Comprehensive pathway datasets are essential resources for advancing biological research, yet constructing these datasets is labor intensive. Recognizing the labor-intensive nature of constructing these critical resources, we present BioGraphia, a web-based annotation platform designed to facilitate collaborative pathway graph annotation. BioGraphia supports multi-user collaboration with real-time monitoring, curation, and interactive pathway graph visualization. It enables users to directly annotate the nodes and relations on the candidate graph, guided by detailed instructions. The platform is further enhanced with a large language model that automatically generates explainable and span-aligned pre-annotation to accelerate the annotation process. Its modular design allows flexible integration of external knowledge bases, and customization of the definition of annotation schema and, to support adaptation to other graph-based annotation tasks. Code is available at https://github.com/LeiLiLab/BioGraphia 2025.emnlp-demos.34 diff --git a/data/xml/2025.findings.xml b/data/xml/2025.findings.xml index 4538d945e4..7d74c6c282 100644 --- a/data/xml/2025.findings.xml +++ b/data/xml/2025.findings.xml @@ -3679,7 +3679,7 @@ A Practical Examination of <fixed-case>AI</fixed-case>-Generated Text Detectors for Large Language Models BrianTufts XuandongZhaoUniversity of California, Berkeley - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 4824-4841 The proliferation of large language models has raised growing concerns about their misuse, particularly in cases where AI-generated text is falsely attributed to human authors. Machine-generated content detectors claim to effectively identify such text under various conditions and from any language model. This paper critically evaluates these claims by assessing several popular detectors (RADAR, Wild, T5Sentinel, Fast-DetectGPT, PHD, LogRank, Binoculars) on a range of domains, datasets, and models that these detectors have not previously encountered. We employ various prompting strategies to simulate practical adversarial attacks, demonstrating that even moderate efforts can significantly evade detection. We emphasize the importance of the true positive rate at a specific false positive rate (TPR@FPR) metric and demonstrate that these detectors perform poorly in certain settings, with TPR@.01 as low as 0%. Our findings suggest that both trained and zero-shot detectors struggle to maintain high sensitivity while achieving a reasonable true positive rate. 2025.findings-naacl.271 @@ -5290,7 +5290,7 @@ XiXu WendaXu SiqiOuyangCMU, Carnegie Mellon University - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 7062-7067 Simultaneous speech translation (SimulST) systems must balance translation quality with response time, making latency measurement crucial for evaluating their real-world performance. 
However, there has been a longstanding belief that current metrics yield unrealistically high latency measurements in unsegmented streaming settings. In this paper, we investigate this phenomenon, revealing its root cause in a fundamental misconception underlying existing latency evaluation approaches. We demonstrate that this issue affects not only streaming but also segment-level latency evaluation across different metrics. Furthermore, we propose a modification to correctly measure computation-aware latency for SimulST systems, addressing the limitations present in existing metrics. 2025.findings-naacl.393 @@ -8591,7 +8591,7 @@ <fixed-case>I</fixed-case>nfini<fixed-case>SST</fixed-case>: Simultaneous Translation of Unbounded Speech with Large Language Model SiqiOuyangCMU, Carnegie Mellon University XiXu - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 3032-3046 Simultaneous translation of unbounded streaming speech remains a challenging problem due to the need for effectively processing the historical speech context and past translations so that quality and latency, including computation overhead, can be balanced. Most prior works assume pre-segmented speech, limiting their real-world applicability. In this paper, we propose InfiniSST, a novel approach that formulates SST as a multi-turn dialogue task, enabling seamless translation of unbounded speech. We construct translation trajectories and robust segments from MuST-C with multi-latency augmentation during training and develop a key-value (KV) cache management strategy to facilitate efficient inference. Experiments on MuST-C En-Es, En-De, and En-Zh demonstrate that InfiniSST reduces computation-aware latency by 0.5 to 1 second while maintaining the same translation quality compared to baselines. Ablation studies further validate the contributions of our data construction and cache management strategy. Code is released at https://github.com/LeiLiLab/InfiniSST. 2025.findings-acl.157 @@ -23213,7 +23213,7 @@ <fixed-case>L</fixed-case>ego<fixed-case>MT</fixed-case>2: Selective Asynchronous Sharded Data Parallel Training for Massive Neural Machine Translation FeiYuan YinquanLuShanghai AI Laboratory - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University JingjingXu 23359-23376 It is a critical challenge to learn a single model for massive languages. Prior methods focus on increasing the model size and training data size. However, large models are difficult to optimize efficiently even with distributed parallel training and translation capacity can interfere among languages. To address the challenge, we propose LegoMT2, an efficient training approach with an asymmetric multi-way model architecture for massive multilingual neural machine translation. LegoMT2 shards 435 languages into 8 language-centric groups and attributes one local encoder for each group’s languages and a mix encoder-decoder for all languages. LegoMT2 trains the model through local data parallel and asynchronous distributed updating of parameters. LegoMT2 is 16.2\times faster than the distributed training method for M2M-100-12B (which only for 100 languages) while improving the translation performance by an average of 2.2 BLEU on Flores-101, especially performing better for low-resource languages . @@ -38256,7 +38256,7 @@ WenhaoZhuByteDance Inc. 
HanxuHuMicrosoft Research ConghuiHeShanghai AI Lab - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University ShujianHuangNanjing University FeiYuan 16751-16774 diff --git a/data/xml/2025.naacl.xml b/data/xml/2025.naacl.xml index 1a63346744..793f627f0c 100644 --- a/data/xml/2025.naacl.xml +++ b/data/xml/2025.naacl.xml @@ -1282,7 +1282,7 @@ SiyuYuan KaiZhang YikaiZhang - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University YanghuaXiaoFudan University 1872-1888 Autonomous planning has been an ongoing pursuit since the inception of artificial intelligence. Based on curated problem solvers, early planning agents could deliver precise solutions for specific tasks but lacked generalization. The emergence of large language models (LLMs) and their powerful reasoning capabilities has reignited interest in autonomous planning by automatically generating reasonable solutions for given tasks. However, prior research and our experiments show that current language agents still lack human-level planning abilities. Even the state-of-the-art reasoning model, OpenAI o1, achieves only 15.6% on one of the complex real-world planning benchmarks. This highlights a critical question: What hinders language agents from achieving human-level planning? Although existing studies have highlighted weak performance in agent planning, the deeper underlying issues and the mechanisms and limitations of the strategies proposed to address them remain insufficiently understood. In this work, we apply the feature attribution study and identify two key factors that hinder agent planning: the limited role of constraints and the diminishing influence of questions. We also find that although current strategies help mitigate these challenges, they do not fully resolve them, indicating that agents still have a long way to go before reaching human-level intelligence. @@ -3938,7 +3938,7 @@ ZhehuaiChen VitalyLavrukhinNVIDIA JagadeeshBalamNVIDIA - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University BorisGinsburgNVIDIA 5547-5557 Simultaneous machine translation (SMT) takes streaming input utterances and incrementally produces target text. Existing SMT methods only use the partial utterance that has already arrived at the input and the generated hypothesis. Motivated by human interpreters’ technique to forecast future words before hearing them, we propose Translation by Anticipating Future (TAF), a method to improve translation quality while retaining low latency. Its core idea is to use a large language model (LLM) to predict future source words and opportunistically translate without introducing too much risk. We evaluate our TAF and multiple baselines of SMT on four language directions. Experiments show that TAF achieves the best translation quality-latency trade-off and outperforms the baselines by up to 5 BLEU points at the same latency (three words). @@ -5567,7 +5567,7 @@ ShangZhou DanqingWangCMU, Carnegie Mellon University William YangWangUC Santa Barbara - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 7959-7973 Sampling is a basic operation for large language models (LLMs). In reinforcement learning rollouts and meta generation algorithms such as Best-of-N, it is essential to sample correct trajectories within a given compute budget. 
To find an optimal allocation for sample compute budgets, several choices need to be made:Which sampling configurations (model, temperature, language, etc.) to use?How many samples to generate in each configuration?We formulate these choices as a learning problem and propose OSCA, an algorithm that Optimizes Sample Compute Allocation by finding an optimal mix of different inference configurations.Our experiments show that with our learned mixed allocation, we can achieve accuracy better than the best single configuration with 128x less compute on code generation and 25x less compute on 4 reasoning tasks.is also shown to be effective in agentic workflows beyond single-turn tasks, achieving a better accuracy on SWE-Bench with 3x less compute than the default configuration.Our code and generations are released at https://github.com/LeiLiLab/OSCA. 2025.naacl-long.404 @@ -6287,7 +6287,7 @@ ChangMa ShuaiYuan QiushiSunUniversity of Hong Kong - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 9077-9090 The lottery ticket hypothesis posits the existence of “winning tickets” within a randomly initialized neural network. Do winning tickets exist for LLMs in fine-tuning scenarios? How can we find such winning tickets? In this paper, we propose KS-Lottery, a method to identify a small subset of LLM parameters highly effective in multilingual fine-tuning. Our key idea is to use Kolmogorov-Smirnov Test to analyze the distribution shift of parameters before and after fine-tuning. We further theoretically prove that KS-Lottery can find the certified winning tickets in the embedding layer, fine-tuning on the found parameters is guaranteed to perform as well as full fine-tuning. Comparing KS-Lottery with other tuning algorithms on translation tasks, the experimental results show that KS-Lottery finds a much smaller set of parameters for fine-tuning while achieving the comparable performance as full fine-tuning LLM. Surprisingly, we find that fine-tuning 18 tokens’ embedding of LLaMA suffices to reach the fine-tuning translation performance . 
2025.naacl-long.458 diff --git a/data/yaml/name_variants.yaml b/data/yaml/name_variants.yaml index 07a967d937..1b604107b4 100644 --- a/data/yaml/name_variants.yaml +++ b/data/yaml/name_variants.yaml @@ -5741,6 +5741,11 @@ - canonical: {first: Lei, last: Li} id: lei-li comment: May refer to several people +- canonical: {first: Lei, last: Li} + id: lei-li-cmu + orcid: 0000-0003-3095-9776 + comment: Carnegie Mellon University + institution: Carnegie Mellon University - canonical: {first: Shih-Min, last: Li} variants: - {first: Shi-Min, last: Li} From 7afd769fdc5a2d490a0d6ea0cdddf24afa1b414a Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Thu, 6 Nov 2025 22:59:27 +0100 Subject: [PATCH 03/19] Add Lei Li (HKU) and edit id for all orcid-tagged papers - add Lei Li (University of Hong Kong) as a person including orcid (ending in `-5104`) and institution of degree - change id for all papers with this orcid away from the catch-all to the specific `lei-li-hku` --- data/xml/2024.emnlp.xml | 4 ++-- data/xml/2024.findings.xml | 4 ++-- data/xml/2025.acl.xml | 4 ++-- data/xml/2025.emnlp.xml | 4 ++-- data/xml/2025.naacl.xml | 2 +- data/yaml/name_variants.yaml | 5 +++++ 6 files changed, 14 insertions(+), 9 deletions(-) diff --git a/data/xml/2024.emnlp.xml b/data/xml/2024.emnlp.xml index 955132db1c..92ce3743bb 100644 --- a/data/xml/2024.emnlp.xml +++ b/data/xml/2024.emnlp.xml @@ -902,7 +902,7 @@ A Survey on In-context Learning QingxiuDong - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong DamaiDai CeZhengPeking University JingyuanMa @@ -5036,7 +5036,7 @@ <fixed-case>VLF</fixed-case>eedback: A Large-Scale <fixed-case>AI</fixed-case> Feedback Dataset for Large Vision-Language Models Alignment - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong ZhihuiXieShanghai Jiao Tong University MukaiLi ShunianChenShenzhen Research Institute of Big Data diff --git a/data/xml/2024.findings.xml b/data/xml/2024.findings.xml index 9fddf83d10..2b12bb3577 100644 --- a/data/xml/2024.findings.xml +++ b/data/xml/2024.findings.xml @@ -8815,7 +8815,7 @@ Red Teaming Visual Language Models MukaiLi - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong YuweiYin MasoodAhmed ZhenguangLiuZhejiang University @@ -13143,7 +13143,7 @@ YiLiuPeking University YuxiangWang ShuhuaiRen - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong SishuoChenAlibaba Group XuSun LuHouHuawei Technologies Ltd. 
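The commit messages of the previous two patches describe the follow-up step: once an ORCID ties a paper to one particular Lei Li, its author id is moved from the catch-all `lei-li` to the person-specific `lei-li-cmu` or `lei-li-hku`. That step reduces to rewriting the id attribute inside an explicitly curated set of papers. The sketch below is a hypothetical rendering of it, not the process actually used here: the `REASSIGN` mapping is seeded with two paper IDs visible in earlier hunks, the `url` child is assumed to hold the Anthology identifier, and `reassign_catch_all` is an invented name.

```python
# Hypothetical sketch -- not the workflow used to produce these patches.
# Moves the catch-all id="lei-li" to a person-specific id inside a curated
# set of papers, as described in the commit messages above.
from lxml import etree

# Example mapping; the two IDs below appear in earlier hunks, and any further
# entries would have to be curated by hand (e.g. from an ORCID works list).
REASSIGN = {
    "2025.acl-long.316": "lei-li-cmu",
    "2025.naacl-long.458": "lei-li-cmu",
}


def reassign_catch_all(xml_path: str, catch_all: str = "lei-li") -> int:
    """Swap the catch-all author id for the curated person-specific id."""
    tree = etree.parse(xml_path)
    changed = 0
    for paper in tree.getroot().iter("paper"):
        anth_id = (paper.findtext("url") or "").strip()  # e.g. "2025.acl-long.316"
        new_id = REASSIGN.get(anth_id)
        if new_id is None:
            continue
        for person in paper.iter():
            if person.tag in ("author", "editor") and person.get("id") == catch_all:
                person.set("id", new_id)
                changed += 1
    if changed:
        tree.write(xml_path, encoding="UTF-8", xml_declaration=True)
    return changed
```

Keying on the `url` text is only to keep the example self-contained; the essential point is that the paper list is curated by hand rather than inferred from the XML itself.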
diff --git a/data/xml/2025.acl.xml b/data/xml/2025.acl.xml index 0e9d3ba73e..f8b0c3ebc7 100644 --- a/data/xml/2025.acl.xml +++ b/data/xml/2025.acl.xml @@ -19239,7 +19239,7 @@ Benchmarking Long-Context Language Models on Long Code Understanding JiaLi XuyuanGuoPeking University - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong KechiZhangPeking University GeLiPeking University JiaLiTsinghua University @@ -23304,7 +23304,7 @@ Design Choices for Extending the Context Length of Visual Language Models MukaiLi - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong ShansanGong QiLiuUniversity of Hong Kong 33425-33438 diff --git a/data/xml/2025.emnlp.xml b/data/xml/2025.emnlp.xml index bf6c5000fe..6a1a3cc5ae 100644 --- a/data/xml/2025.emnlp.xml +++ b/data/xml/2025.emnlp.xml @@ -22082,7 +22082,7 @@ ShengWang JingweiDongthe University of Hong Kong KaiLiu - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong JiahuiGao JiyueJiang LingpengKongDepartment of Computer Science, The University of Hong Kong @@ -23892,7 +23892,7 @@ XiaonanLiFudan University MingZhongUniversity of Illinois Urbana Champaign ShansanGong - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong JunZhangByteDance JingjingXu LingpengKongDepartment of Computer Science, The University of Hong Kong diff --git a/data/xml/2025.naacl.xml b/data/xml/2025.naacl.xml index 793f627f0c..62936b7d90 100644 --- a/data/xml/2025.naacl.xml +++ b/data/xml/2025.naacl.xml @@ -4961,7 +4961,7 @@ <fixed-case>I</fixed-case>mg<fixed-case>T</fixed-case>rojan: Jailbreaking Vision-Language Models with <fixed-case>ONE</fixed-case> Image XijiaTao ShuaiZhong - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong QiLiuUniversity of Hong Kong LingpengKongDepartment of Computer Science, The University of Hong Kong 7048-7063 diff --git a/data/yaml/name_variants.yaml b/data/yaml/name_variants.yaml index 1b604107b4..c70af33eef 100644 --- a/data/yaml/name_variants.yaml +++ b/data/yaml/name_variants.yaml @@ -5746,6 +5746,11 @@ orcid: 0000-0003-3095-9776 comment: Carnegie Mellon University institution: Carnegie Mellon University +- canonical: {first: Lei, last: Li} + id: lei-li-hku + orcid: 0009-0008-6984-5104 + comment: University of Hong Kong + institution: University of Hong Kong - canonical: {first: Shih-Min, last: Li} variants: - {first: Shi-Min, last: Li} From 4535c91d2a71a0fc040f07b5b1e1d2beaeed8149 Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Fri, 7 Nov 2025 01:47:01 +0100 Subject: [PATCH 04/19] bsed on works in orcid.org add more entries to `lei-li-hku/cmu` though one case strange: 2024 emnlp of hku was listed on cmu orcid site --- data/xml/2020.acl.xml | 4 ++-- data/xml/2020.emnlp.xml | 6 +++--- data/xml/2020.findings.xml | 4 ++-- data/xml/2020.wmt.xml | 4 ++-- data/xml/2021.acl.xml | 14 +++++++------- data/xml/2021.eacl.xml | 2 +- data/xml/2021.emnlp.xml | 6 +++--- data/xml/2021.findings.xml | 12 ++++++------ data/xml/2021.iwslt.xml | 2 +- data/xml/2021.naacl.xml | 8 ++++---- data/xml/2021.wmt.xml | 2 +- data/xml/2022.findings.xml | 6 +++--- data/xml/2022.iwslt.xml | 2 +- data/xml/2023.emnlp.xml | 4 ++-- data/xml/2023.findings.xml | 4 ++-- data/xml/2024.acl.xml | 2 +- data/xml/2024.findings.xml | 4 ++-- data/xml/2024.iwslt.xml | 4 ++-- data/xml/2025.coling.xml | 2 +- data/xml/2025.iwslt.xml | 2 +- data/xml/D18.xml | 2 +- data/xml/D19.xml | 4 ++-- data/xml/N18.xml | 2 +- data/xml/P16.xml | 2 +- data/xml/P19.xml | 12 ++++++------ data/xml/W19.xml | 2 +- 26 files changed, 59 
insertions(+), 59 deletions(-) diff --git a/data/xml/2020.acl.xml b/data/xml/2020.acl.xml index 3918997eda..7b0399e1a1 100644 --- a/data/xml/2020.acl.xml +++ b/data/xml/2020.acl.xml @@ -4234,7 +4234,7 @@ NingMiao YuxuanSong HaoZhou - LeiLi + LeiLi 3436–3441 It has been a common approach to pre-train a language model on a large corpus and fine-tune it on task-specific data. In practice, we observe that fine-tuning a pre-trained model on a small dataset may lead to over- and/or under-estimate problem. In this paper, we propose MC-Tailor, a novel method to alleviate the above issue in text generation tasks by truncating and transferring the probability mass from over-estimated regions to under-estimated ones. Experiments on a variety of text generation datasets show that MC-Tailor consistently and significantly outperforms the fine-tuning approach. 2020.acl-main.314 @@ -10481,7 +10481,7 @@ XijinZhang SongchengJiang YuxuanWang - LeiLi + LeiLi 1–8 This paper proposes the building of Xiaomingbot, an intelligent, multilingual and multimodal software robot equipped with four inte- gral capabilities: news generation, news translation, news reading and avatar animation. Its system summarizes Chinese news that it automatically generates from data tables. Next, it translates the summary or the full article into multiple languages, and reads the multi- lingual rendition through synthesized speech. Notably, Xiaomingbot utilizes a voice cloning technology to synthesize the speech trained from a real person’s voice data in one input language. The proposed system enjoys several merits: it has an animated avatar, and is able to generate and read multilingual news. Since it was put into practice, Xiaomingbot has written over 600,000 articles, and gained over 150,000 followers on social media platforms. 2020.acl-demos.1 diff --git a/data/xml/2020.emnlp.xml b/data/xml/2020.emnlp.xml index 74ace32d5f..55e6af0b21 100644 --- a/data/xml/2020.emnlp.xml +++ b/data/xml/2020.emnlp.xml @@ -1707,7 +1707,7 @@ ShuangZeng RunxinXu BaobaoChang - LeiLi + LeiLi 1630–1640 Document-level relation extraction aims to extract relations among entities within a document. Different from sentence-level relation extraction, it requires reasoning over multiple sentences across paragraphs. In this paper, we propose Graph Aggregation-and-Inference Network (GAIN), a method to recognize such relations for long paragraphs. GAIN constructs two graphs, a heterogeneous mention-level graph (MG) and an entity-level graph (EG). The former captures complex interaction among different mentions and the latter aggregates mentions underlying for the same entities. Based on the graphs we propose a novel path reasoning mechanism to infer relations between entities. Experiments on the public dataset, DocRED, show GAIN achieves a significant performance improvement (2.85 on F1) over the previous state-of-the-art. Our code is available at https://github.com/PKUnlp-icler/GAIN. 2020.emnlp-main.127 @@ -2836,7 +2836,7 @@ XipengQiu JiangtaoFeng HaoZhou - LeiLi + LeiLi 2649–2663 We investigate the following question for machine translation (MT): can we develop a single universal MT model to serve as the common seed and obtain derivative and improved models on arbitrary language pairs? We propose mRASP, an approach to pre-train a universal multilingual neural machine translation model. 
Our key idea in mRASP is its novel technique of random aligned substitution, which brings words and phrases with similar meanings across multiple languages closer in the representation space. We pre-train a mRASP model on 32 language pairs jointly with only public datasets. The model is then fine-tuned on downstream language pairs to obtain specialized MT models. We carry out extensive experiments on 42 translation directions across a diverse settings, including low, medium, rich resource, and as well as transferring to exotic language pairs. Experimental results demonstrate that mRASP achieves significant performance improvement compared to directly training on those target pairs. It is the first time to verify that multiple lowresource language pairs can be utilized to improve rich resource MT. Surprisingly, mRASP is even able to improve the translation quality on exotic languages that never occur in the pretraining corpus. Code, data, and pre-trained models are available at https://github.com/linzehui/mRASP. 2020.emnlp-main.210 @@ -9842,7 +9842,7 @@ JunxianHe MingxuanWang YimingYang - LeiLi + LeiLi 9119–9130 Pre-trained contextual representations like BERT have achieved great success in natural language processing. However, the sentence embeddings from the pre-trained language models without fine-tuning have been found to poorly capture semantic meaning of sentences. In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited. We first reveal the theoretical connection between the masked language model pre-training objective and the semantic similarity task theoretically, and then analyze the BERT sentence embeddings empirically. We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance of semantic similarity. To address this issue, we propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective. Experimental results show that our proposed BERT-flow method obtains significant performance gains over the state-of-the-art sentence embeddings on a variety of semantic textual similarity tasks. The code is available at https://github.com/bohanli/BERT-flow. 2020.emnlp-main.733 diff --git a/data/xml/2020.findings.xml b/data/xml/2020.findings.xml index 28c4cdd206..63b5c7021e 100644 --- a/data/xml/2020.findings.xml +++ b/data/xml/2020.findings.xml @@ -1465,7 +1465,7 @@ Language Generation via Combinatorial Constraint Satisfaction: A Tree Search Enhanced <fixed-case>M</fixed-case>onte-<fixed-case>C</fixed-case>arlo Approach MaosenZhang NanJiang - LeiLi + LeiLi YexiangXue 1286–1298 Generating natural language under complex constraints is a principled formulation towards controllable text generation. We present a framework to allow specification of combinatorial constraints for sentence generation. We propose TSMC, an efficient method to generate high likelihood sentences with respect to a pre-trained language model while satisfying the constraints. Our approach is highly flexible, requires no task-specific train- ing, and leverages efficient constraint satisfaction solving techniques. To better handle the combinatorial constraints, a tree search algorithm is embedded into the proposal process of the Markov Chain Monte Carlo (MCMC) to explore candidates that satisfy more constraints. Compared to existing MCMC approaches, our sampling approach has a better mixing performance. 
Experiments show that TSMC achieves consistent and significant improvement on multiple language generation tasks. @@ -5726,7 +5726,7 @@ MingxuanWang WeinanZhang YongYu - LeiLi + LeiLi 4908–4917 Active learning for sentence understanding aims at discovering informative unlabeled data for annotation and therefore reducing the demand for labeled data. We argue that the typical uncertainty sampling method for active learning is time-consuming and can hardly work in real-time, which may lead to ineffective sample selection. We propose adversarial uncertainty sampling in discrete space (AUSDS) to retrieve informative unlabeled samples more efficiently. AUSDS maps sentences into latent space generated by the popular pre-trained language models, and discover informative unlabeled text samples for annotation via adversarial attack. The proposed approach is extremely efficient compared with traditional uncertainty sampling with more than 10x speedup. Experimental results on five datasets show that AUSDS outperforms strong baselines on effectiveness. 2020.findings-emnlp.441 diff --git a/data/xml/2020.wmt.xml b/data/xml/2020.wmt.xml index 56f716fc66..c516c6e1fe 100644 --- a/data/xml/2020.wmt.xml +++ b/data/xml/2020.wmt.xml @@ -471,7 +471,7 @@ ZehuiLin YaomingZhu MingxuanWang - LeiLi + LeiLi 305–312 This paper describes our submission systems for VolcTrans for WMT20 shared news translation task. We participated in 8 translation directions. Our basic systems are based on Transformer (CITATION), into which we also employed new architectures (bigger or deeper Transformers, dynamic convolution). The final systems include text pre-process, subword(a.k.a. BPE(CITATION)), baseline model training, iterative back-translation, model ensemble, knowledge distillation and multilingual pre-training. 2020.wmt-1.33 @@ -1443,7 +1443,7 @@ ZhuoZhi JunCao MingxuanWang - LeiLi + LeiLi 985–990 In this paper, we describe our submissions to the WMT20 shared task on parallel corpus filtering and alignment for low-resource conditions. The task requires the participants to align potential parallel sentence pairs out of the given document pairs, and score them so that low-quality pairs can be filtered. Our system, Volctrans, is made of two modules, i.e., a mining module and a scoring module. Based on the word alignment model, the mining mod- ule adopts an iterative mining strategy to extract latent parallel sentences. In the scoring module, an XLM-based scorer provides scores, followed by reranking mechanisms and ensemble. Our submissions outperform the baseline by 3.x/2.x and 2.x/2.x for km-en and ps-en on From Scratch/Fine-Tune conditions. 2020.wmt-1.112 diff --git a/data/xml/2021.acl.xml b/data/xml/2021.acl.xml index c2467d6434..a0b841b9de 100644 --- a/data/xml/2021.acl.xml +++ b/data/xml/2021.acl.xml @@ -284,7 +284,7 @@ ChangzhiSun YuanbinWu HaoZhou - LeiLi + LeiLi JunchiYan 220–231 Many joint entity relation extraction models setup two separated label spaces for the two sub-tasks (i.e., entity detection and relation classification). We argue that this setting may hinder the information interaction between entities and relations. In this work, we propose to eliminate the different treatment on the two sub-tasks’ label spaces. The input of our model is a table containing all word pairs from a sentence. Entities and relations are represented by squares and rectangles in the table. We apply a unified classifier to predict each cell’s label, which unifies the learning of two sub-tasks. 
For testing, an effective (yet fast) approximate decoder is proposed for finding squares and rectangles from tables. Experiments on three benchmarks (ACE04, ACE05, SciERC) show that, using only half the number of parameters, our model achieves competitive accuracy with the best extractor, and is faster. @@ -315,7 +315,7 @@ XiaoPan MingxuanWang LiweiWu - LeiLi + LeiLi 244–258 Existing multilingual machine translation approaches mainly focus on English-centric directions, while the non-English directions still lag behind. In this work, we aim to build a many-to-many translation system with an emphasis on the quality of non-English language directions. Our intuition is based on the hypothesis that a universal cross-language representation leads to better multilingual translation performance. To this end, we propose mRASP2, a training method to obtain a single unified multilingual translation model. mRASP2 is empowered by two techniques: a) a contrastive learning scheme to close the gap among representations of different languages, and b) data augmentation on both multiple parallel and monolingual data to further align token representations. For English-centric directions, mRASP2 achieves competitive or even better performance than a strong pre-trained model mBART on tens of WMT benchmarks. For non-English directions, mRASP2 achieves an improvement of average 10+ BLEU compared with the multilingual baseline 2021.acl-long.21 @@ -364,7 +364,7 @@ ZehuiLin LiweiWu MingxuanWang - LeiLi + LeiLi 293–305 Multilingual neural machine translation aims at learning a single translation model for multiple languages. These jointly trained models often suffer from performance degradationon rich-resource language pairs. We attribute this degeneration to parameter interference. In this paper, we propose LaSS to jointly train a single unified multilingual MT model. LaSS learns Language Specific Sub-network (LaSS) for each language pair to counter parameter interference. Comprehensive experiments on IWSLT and WMT datasets with various Transformer architectures show that LaSS obtains gains on 36 language pairs by up to 1.2 BLEU. Besides, LaSS shows its strong generalization performance at easy adaptation to new language pairs and zero-shot translation. LaSS boosts zero-shot translation with an average of 8.3 BLEU on 30 language pairs. Codes and trained models are available at https://github.com/NLP-Playground/LaSS. 2021.acl-long.25 @@ -2163,7 +2163,7 @@ LinQiu WeinanZhang YongYu - LeiLi + LeiLi 1993–2003 Recent work on non-autoregressive neural machine translation (NAT) aims at improving the efficiency by parallel decoding without sacrificing the quality. However, existing NAT methods are either inferior to Transformer or require multiple decoding passes, leading to reduced speedup. We propose the Glancing Language Model (GLM) for single-pass parallel generation models. With GLM, we develop Glancing Transformer (GLAT) for machine translation. With only single-pass parallel decoding, GLAT is able to generate high-quality translation with 8×-15× speedup. Note that GLAT does not modify the network architecture, which is a training method to learn word interdependency. Experiments on multiple WMT language directions show that GLAT outperforms all previous single pass non-autoregressive methods, and is nearly comparable to Transformer, reducing the gap to 0.25-0.9 BLEU points. 
2021.acl-long.155 @@ -3869,7 +3869,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker RunxinXu TianyuLiu - LeiLi + LeiLi BaobaoChang 3533–3546 Document-level event extraction aims to recognize event information from a whole piece of article. Existing methods are not effective due to two challenges of this task: a) the target event arguments are scattered across sentences; b) the correlation among events in a document is non-trivial to model. In this paper, we propose Heterogeneous Graph-based Interaction Model with a Tracker (GIT) to solve the aforementioned two challenges. For the first challenge, GIT constructs a heterogeneous graph interaction network to capture global interactions among different sentences and entity mentions. For the second, GIT introduces a Tracker module to track the extracted events and hence capture the interdependency among the events. Experiments on a large-scale dataset (Zheng et al, 2019) show GIT outperforms the previous methods by 2.8 F1. Further analysis reveals is effective in extracting multiple correlated events and event arguments that scatter across the document. @@ -7997,7 +7997,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO HaoZhou ChunGan ZaixiangZheng - LeiLi + LeiLi 7361–7373 The choice of token vocabulary affects the performance of machine translation. This paper aims to figure out what is a good vocabulary and whether we can find the optimal vocabulary without trial training. To answer these questions, we first provide an alternative understanding of vocabulary from the perspective of information theory. It motivates us to formulate the quest of vocabularization – finding the best token dictionary with a proper size – as an optimal transport (OT) problem. We propose VOLT, a simple and efficient solution without trial training. Empirical results show that VOLT beats widely-used vocabularies in diverse scenarios, including WMT-14 English-German translation, TED bilingual translation, and TED multilingual translation. For example, VOLT achieves 70% vocabulary size reduction and 0.5 BLEU gain on English-German translation. Also, compared to BPE-search, VOLT reduces the search time from 384 GPU hours to 30 GPU hours on English-German translation. Codes are available at https://github.com/Jingjing-NLP/VOLT. 2021.acl-long.571 @@ -10453,7 +10453,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO MingxuanWang QianqianDong RongYe - LeiLi + LeiLi 55–62 NeurST is an open-source toolkit for neural speech translation. The toolkit mainly focuses on end-to-end speech translation, which is easy to use, modify, and extend to advanced speech translation research and products. NeurST aims at facilitating the speech translation research for NLP researchers and building reliable benchmarks for this field. It provides step-by-step recipes for feature extraction, data preprocessing, distributed training, and evaluation. In this paper, we will introduce the framework design of NeurST and show experimental results for different benchmark datasets, which can be regarded as reliable baselines for future research. The toolkit is publicly available at https://github.com/bytedance/neurst and we will continuously update the performance of with other counterparts and studies at https://st-benchmark.github.io/. 
2021.acl-demo.7 diff --git a/data/xml/2021.eacl.xml b/data/xml/2021.eacl.xml index 5cc30f41bc..c6ad527d06 100644 --- a/data/xml/2021.eacl.xml +++ b/data/xml/2021.eacl.xml @@ -3008,7 +3008,7 @@ ChangzhiSun YuanbinWu HaoZhou - LeiLi + LeiLi JunchiYan 2877–2887 Current state-of-the-art systems for joint entity relation extraction (Luan et al., 2019; Wad-den et al., 2019) usually adopt the multi-task learning framework. However, annotations for these additional tasks such as coreference resolution and event extraction are always equally hard (or even harder) to obtain. In this work, we propose a pre-training method ENPAR to improve the joint extraction performance. ENPAR requires only the additional entity annotations that are much easier to collect. Unlike most existing works that only consider incorporating entity information into the sentence encoder, we further utilize the entity pair information. Specifically, we devise four novel objectives,i.e., masked entity typing, masked entity prediction, adversarial context discrimination, and permutation prediction, to pre-train an entity encoder and an entity pair encoder. Comprehensive experiments show that the proposed pre-training method achieves significant improvement over BERT on ACE05, SciERC, and NYT, and outperforms current state-of-the-art on ACE05. diff --git a/data/xml/2021.emnlp.xml b/data/xml/2021.emnlp.xml index c2f1fe5fb3..6068a3f571 100644 --- a/data/xml/2021.emnlp.xml +++ b/data/xml/2021.emnlp.xml @@ -1301,7 +1301,7 @@ HaoZhou WeinanZhang YongYu - LeiLi + LeiLi 1239–1250 Document-level relation extraction aims to identify relations between entities in a whole document. Prior efforts to capture long-range dependencies have relied heavily on implicitly powerful representations learned through (graph) neural networks, which makes the model less transparent. To tackle this challenge, in this paper, we propose LogiRE, a novel probabilistic model for document-level relation extraction by learning logic rules. LogiRE treats logic rules as latent variables and consists of two modules: a rule generator and a relation extractor. The rule generator is to generate logic rules potentially contributing to final predictions, and the relation extractor outputs final predictions based on the generated logic rules. Those two modules can be efficiently optimized with the expectation-maximization (EM) algorithm. By introducing logic rules into neural networks, LogiRE can explicitly capture long-range dependencies as well as enjoy better interpretation. Empirical results show that significantly outperforms several strong baselines in terms of relation performance and logical consistency. Our code is available at https://github.com/rudongyu/LogiRE. 2021.emnlp-main.95 @@ -4705,7 +4705,7 @@ ZhiyuanZeng JiazeChen WeiranXu - LeiLi + LeiLi 4102–4108 Neural abstractive summarization systems have gained significant progress in recent years. However, abstractive summarization often produce inconsisitent statements or false facts. How to automatically generate highly abstract yet factually correct summaries? In this paper, we proposed an efficient weak-supervised adversarial data augmentation approach to form the factual consistency dataset. Based on the artificial dataset, we train an evaluation model that can not only make accurate and robust factual consistency discrimination but is also capable of making interpretable factual errors tracing by backpropagated gradient distribution on token embeddings. 
Experiments and analysis conduct on public annotated summarization and factual consistency datasets demonstrate our approach effective and reasonable. 2021.emnlp-main.337 @@ -7934,7 +7934,7 @@ JunCao ShanboCheng ShujianHuang - LeiLi + LeiLi 7280–7290 How to effectively adapt neural machine translation (NMT) models according to emerging cases without retraining? Despite the great success of neural machine translation, updating the deployed models online remains a challenge. Existing non-parametric approaches that retrieve similar examples from a database to guide the translation process are promising but are prone to overfit the retrieved examples. However, non-parametric methods are prone to overfit the retrieved examples. In this work, we propose to learn Kernel-Smoothed Translation with Example Retrieval (KSTER), an effective approach to adapt neural machine translation models online. Experiments on domain adaptation and multi-domain machine translation datasets show that even without expensive retraining, KSTER is able to achieve improvement of 1.1 to 1.5 BLEU scores over the best existing online adaptation methods. The code and trained models are released at https://github.com/jiangqn/KSTER. 2021.emnlp-main.579 diff --git a/data/xml/2021.findings.xml b/data/xml/2021.findings.xml index 29c781856a..54401ea44f 100644 --- a/data/xml/2021.findings.xml +++ b/data/xml/2021.findings.xml @@ -2444,7 +2444,7 @@ ChiHan MingxuanWang HengJi - LeiLi + LeiLi 2214–2225 2021.findings-acl.195 10.18653/v1/2021.findings-acl.195 @@ -3026,7 +3026,7 @@ JiazeChen HaoZhou XipengQiu - LeiLi + LeiLi 2739–2750 2021.findings-acl.242 10.18653/v1/2021.findings-acl.242 @@ -3300,7 +3300,7 @@ LiweiWu ShanboCheng MingxuanWang - LeiLi + LeiLi 3001–3007 2021.findings-acl.264 10.18653/v1/2021.findings-acl.264 @@ -8770,7 +8770,7 @@ Multilingual Translation via Grafting Pre-trained Language Models ZeweiSun MingxuanWang - LeiLi + LeiLi 2735–2747 Can pre-trained BERT for one language and GPT for another be glued together to translate texts? Self-supervised training using only monolingual data has led to the success of pre-trained (masked) language models in many NLP tasks. However, directly connecting BERT as an encoder and GPT as a decoder can be challenging in machine translation, for GPT-like models lack a cross-attention component that is needed in seq2seq decoders. In this paper, we propose Graformer to graft separately pre-trained (masked) language models for machine translation. With monolingual data for pre-training and parallel data for grafting training, we maximally take advantage of the usage of both types of data. Experiments on 60 directions show that our method achieves average improvements of 5.8 BLEU in x2en and 2.9 BLEU in en2x directions comparing with the multilingual Transformer of the same size. 2021.findings-emnlp.233 @@ -8864,7 +8864,7 @@ JiangtaoFeng ChengqiZhao MingxuanWang - LeiLi + LeiLi 2812–2823 Developing a unified multilingual model has been a long pursuing goal for machine translation. However, existing approaches suffer from performance degradation - a single multilingual model is inferior to separately trained bilingual ones on rich-resource languages. We conjecture that such a phenomenon is due to interference brought by joint training with multiple languages. To accommodate the issue, we propose CIAT, an adapted Transformer model with a small parameter overhead for multilingual machine translation. We evaluate CIAT on multiple benchmark datasets, including IWSLT, OPUS-100, and WMT. 
Experiments show that the CIAT consistently outperforms strong multilingual baselines on 64 of total 66 language directions, 42 of which have above 0.5 BLEU improvement. 2021.findings-emnlp.240 @@ -10963,7 +10963,7 @@ TaoWang ChengqiZhao MingxuanWang - LeiLi + LeiLi HangLi DeyiXiong 4639–4644 diff --git a/data/xml/2021.iwslt.xml b/data/xml/2021.iwslt.xml index 6582fc3c5b..894656ce75 100644 --- a/data/xml/2021.iwslt.xml +++ b/data/xml/2021.iwslt.xml @@ -110,7 +110,7 @@ RongYe QianqianDong JunCao - LeiLi + LeiLi 64–74 This paper describes the systems submitted to IWSLT 2021 by the Volctrans team. We participate in the offline speech translation and text-to-text simultaneous translation tracks. For offline speech translation, our best end-to-end model achieves 7.9 BLEU improvements over the benchmark on the MuST-C test set and is even approaching the results of a strong cascade solution. For text-to-text simultaneous translation, we explore the best practice to optimize the wait-k model. As a result, our final submitted systems exceed the benchmark at around 7 BLEU on the same latency regime. We release our code and model to facilitate both future research works and industrial applications. 2021.iwslt-1.6 diff --git a/data/xml/2021.naacl.xml b/data/xml/2021.naacl.xml index d36faa17ca..08ce6ff0e0 100644 --- a/data/xml/2021.naacl.xml +++ b/data/xml/2021.naacl.xml @@ -6173,7 +6173,7 @@ Generative Imagination Elevates Machine Translation QuanyuLong MingxuanWang - LeiLi + LeiLi 5738–5748 There are common semantics shared across text and images. Given a sentence in a source language, whether depicting the visual scene helps translation into a target language? Existing multimodal neural machine translation methods (MNMT) require triplets of bilingual sentence - image for training and tuples of source sentence - image for inference. In this paper, we propose ImagiT, a novel machine translation method via visual imagination. ImagiT first learns to generate visual representation from the source sentence, and then utilizes both source sentence and the “imagined representation” to produce a target translation. Unlike previous methods, it only needs the source sentence at the inference time. Experiments demonstrate that ImagiT benefits from visual imagination and significantly outperforms the text-only neural machine translation baselines. Further analysis reveals that the imagination process in ImagiT helps fill in missing information when performing the degradation strategy. 2021.naacl-main.457 @@ -7335,7 +7335,7 @@ MingxuanWang HongxiaoBai HaiZhao - LeiLi + LeiLi 89–96 We propose to improve unsupervised neural machine translation with cross-lingual supervision (), which utilizes supervision signals from high resource language pairs to improve the translation of zero-source languages. Specifically, for training En-Ro system without parallel corpus, we can leverage the corpus from En-Fr and En-De to collectively train the translation from one language into many languages under one model. % is based on multilingual models which require no changes to the standard unsupervised NMT. Simple and effective, significantly improves the translation quality with a big margin in the benchmark unsupervised translation tasks, and even achieves comparable performance to supervised NMT. In particular, on WMT’14 -tasks achieves 37.6 and 35.18 BLEU score, which is very close to the large scale supervised setting and on WMT’16 -tasks achieves 35.09 BLEU score which is even better than the supervised Transformer baseline. 
2021.naacl-industry.12 @@ -7361,7 +7361,7 @@ TaoWang ChengqiZhao MingxuanWang - LeiLi + LeiLi DeyiXiong 105–112 Automatic translation of dialogue texts is a much needed demand in many real life scenarios. However, the currently existing neural machine translation delivers unsatisfying results. In this paper, we conduct a deep analysis of a dialogue corpus and summarize three major issues on dialogue translation, including pronoun dropping (), punctuation dropping (), and typos (). In response to these challenges, we propose a joint learning method to identify omission and typo, and utilize context to translate dialogue utterances. To properly evaluate the performance, we propose a manually annotated dataset with 1,931 Chinese-English parallel utterances from 300 dialogues as a benchmark testbed for dialogue translation. Our experiments show that the proposed method improves translation quality by 3.2 BLEU over the baselines. It also elevates the recovery rate of omitted pronouns from 26.09% to 47.16%. We will publish the code and dataset publicly at https://xxx.xx. @@ -7376,7 +7376,7 @@ YingXiong YangWei MingxuanWang - LeiLi + LeiLi 113–120 Transformer and its variants have achieved great success in natural language processing. Since Transformer models are huge in size, serving these models is a challenge for real industrial applications. In this paper, we propose , a highly efficient inference library for models in the Transformer family. includes a series of GPU optimization techniques to both streamline the computation of Transformer layers and reduce memory footprint. supports models trained using PyTorch and Tensorflow. Experimental results on standard machine translation benchmarks show that achieves up to 14x speedup compared with TensorFlow and 1.4x speedup compared with , a concurrent CUDA implementation. The code will be released publicly after the review. 2021.naacl-industry.15 diff --git a/data/xml/2021.wmt.xml b/data/xml/2021.wmt.xml index c9bed4b19a..7633cb4f0d 100644 --- a/data/xml/2021.wmt.xml +++ b/data/xml/2021.wmt.xml @@ -259,7 +259,7 @@ ZehuiLin JiangtaoFeng ShanboCheng - LeiLi + LeiLi MingxuanWang HaoZhou 187–196 diff --git a/data/xml/2022.findings.xml b/data/xml/2022.findings.xml index 3cf3a0f0f9..1d66490920 100644 --- a/data/xml/2022.findings.xml +++ b/data/xml/2022.findings.xml @@ -13046,7 +13046,7 @@ Faster and Smaller Speech Translation without Quality Compromise Distillation-Resistant Watermarking for Model Protection in <fixed-case>NLP</fixed-case> XuandongZhaoUC Santa Barbara - LeiLiUniversity of California Santa Barbara + LeiLiUniversity of California Santa Barbara Yu-XiangWangUCSB 5044-5055 How can we protect the intellectual property of trained NLP models? Modern NLP models are prone to stealing by querying and distilling from their publicly exposed APIs. However, existing protection methods such as watermarking only work for images but are not applicable to text. We propose Distillation-Resistant Watermarking (DRW), a novel technique to protect NLP models from being stolen via distillation. DRW protects a model by injecting watermarks into the victim’s prediction probability corresponding to a secret key and is able to detect such a key by probing a suspect model. We prove that a protected model still retains the original accuracy within a certain bound. We evaluate DRW on a diverse set of NLP tasks including text classification, part-of-speech tagging, and named entity recognition. 
Experiments show that DRW protects the original model and detects stealing suspects at 100% mean average precision for all four tasks while the prior method fails on two. @@ -13946,7 +13946,7 @@ Faster and Smaller Speech Translation without Quality Compromise YifanSongPeking University JingjingXuShanghai AI Lab ZhifangSuiPeking University - LeiLiUniversity of California Santa Barbara + LeiLiUniversity of California Santa Barbara 5937-5947 Previous literature has proved that Pretrained Language Models (PLMs) can store factual knowledge. However, we find that facts stored in the PLMs are not always correct. It motivates us to explore a fundamental question: How do we calibrate factual knowledge in PLMs without re-training from scratch? In this work, we propose a simple and lightweight method CaliNet to achieve this goal. To be specific, we first detect whether PLMs can learn the right facts via a contrastive score between right and fake facts. If not, we then use a lightweight method to add and adapt new parameters to specific factual texts. Experiments on the knowledge probing task show the calibration effectiveness and efficiency. In addition, through closed-book question answering, we find that the calibrated PLM possesses knowledge generalization ability after finetuning.Beyond the calibration performance, we further investigate and visualize the knowledge calibration mechanism. 2022.findings-emnlp.438 @@ -14613,7 +14613,7 @@ Faster and Smaller Speech Translation without Quality Compromise Yi-LinTuanUniversity of California, Santa Barbara YujieLuUniversity of California, Santa Barbara MichaelSaxonUniversity of California, Santa Barbara - LeiLiUniversity of California Santa Barbara + LeiLiUniversity of California Santa Barbara William YangWangUnversity of California, Santa Barbara 6559-6574 Is it possible to build a general and automatic natural language generation (NLG) evaluation metric? Existing learned metrics either perform unsatisfactorily or are restricted to tasks where large human rating data is already available. We introduce SESCORE, a model-based metric that is highly correlated with human judgements without requiring human annotation, by utilizing a novel, iterative error synthesis and severity scoring pipeline. This pipeline applies a series of plausible errors to raw text and assigns severity labels by simulating human judgements with entailment. We evaluate SESCORE against existing metrics by comparing how their scores correlate with human ratings. SESCORE outperforms all prior unsupervised metrics on multiple diverse NLG tasks including machine translation, image captioning, and WebNLG text generation. For WMT 20/21En-De and Zh-En, SESCORE improve the average Kendall correlation with human judgement from 0.154 to 0.195. SESCORE even achieves comparable performance to the best supervised metric COMET, despite receiving no human annotated training data. diff --git a/data/xml/2022.iwslt.xml b/data/xml/2022.iwslt.xml index 3525f423fa..3a52524fee 100644 --- a/data/xml/2022.iwslt.xml +++ b/data/xml/2022.iwslt.xml @@ -112,7 +112,7 @@ On the Impact of Noises in Crowd-Sourced Data for Speech Translation SiqiOuyang RongYe - LeiLi + LeiLi 92-97 Training speech translation (ST) models requires large and high-quality datasets. MuST-C is one of the most widely used ST benchmark datasets. It contains around 400 hours of speech-transcript-translation data for each of the eight translation directions. This dataset passes several quality-control filters during creation. 
However, we find that MuST-C still suffers from three major quality issues: audiotext misalignment, inaccurate translation, and unnecessary speaker’s name. What are the impacts of these data quality issues for model development and evaluation? In this paper, we propose an automatic method to fix or filter the above quality issues, using English-German (En-De) translation as an example. Our experiments show that ST models perform better on clean test sets, and the rank of proposed models remains consistent across different test sets. Besides, simply removing misaligned data points from the training set does not lead to a better ST model. 2022.iwslt-1.9 diff --git a/data/xml/2023.emnlp.xml b/data/xml/2023.emnlp.xml index ee7dd7e37b..9c00b0fd08 100644 --- a/data/xml/2023.emnlp.xml +++ b/data/xml/2023.emnlp.xml @@ -5132,7 +5132,7 @@ ZhenqiaoSong MarkusFreitag WilliamWang - LeiLi + LeiLi 5967-5994 Automatically evaluating the quality of language generation is critical. Although recent learned metrics show high correlation with human judgement, these metrics do not provide explicit explanation of their verdict, nor associate the scores with defects in the generated text. To address this limitation, we present INSTRUCTSCORE, a fine-grained explainable evaluation metric for text generation. By harnessing both explicit human instruction and the implicit knowledge of GPT-4, we fine-tune a text evaluation metric based on LLaMA, producing both a score for generated text and a human readable diagnostic report. We evaluate INSTRUCTSCORE on a variety of generation tasks, including translation, captioning, data-to-text, and commonsense generation. Experiments show that our 7B model surpasses all other unsupervised metrics, including those based on 175B GPT-3 and GPT-4. Surprisingly, our INSTRUCTSCORE, even without direct supervision from human-rated data, achieves performance levels on par with state-of-the-art metrics like COMET22, which were fine-tuned on human ratings. 2023.emnlp-main.365 @@ -9223,7 +9223,7 @@ Learning from Mistakes via Cooperative Study Assistant for Large Language Models DanqingWang - LeiLi + LeiLi 10667-10685 Large language models (LLMs) have demonstrated their potential to refine their generation based on their own feedback. However, the feedback from LLM itself is often inaccurate, thereby limiting its benefits. In this paper, we propose Study Assistant for Large LAnguage Model (SALAM), a novel framework with an auxiliary agent to assist the main LLM in learning from mistakes through interactive cooperation. In the gathering phase, the student assistant agent probes the main LLM, analyzes its errors, and collects the interaction in a mistake memory. During the examination phase, the study assistant provides guidelines by retrieving relevant cases to help the main LLM anticipate and avoid similar errors. We first investigate the effectiveness of a general study assistant and then customize it to provide LLM-specific guidance through imitation learning from successful guidance experiences. Our experiments on three LLMs using two challenging frameworks demonstrate that SALAM can significantly boost LLMs by an accuracy margin of up to 6.6 on BBH and 12.6 on BBQ. 
2023.emnlp-main.659 diff --git a/data/xml/2023.findings.xml b/data/xml/2023.findings.xml index b67818d037..08375137ea 100644 --- a/data/xml/2023.findings.xml +++ b/data/xml/2023.findings.xml @@ -17331,7 +17331,7 @@ <fixed-case>A</fixed-case>uto<fixed-case>P</fixed-case>lan: Automatic Planning of Interactive Decision-Making Tasks With Large Language Models SiqiOuyang - LeiLi + LeiLi 3114-3128 Recent large language models (LLMs) are promising for making decisions in grounded environments. However, LLMs frequently fail in complex decision-making tasks due to the misalignment between the pre-trained knowledge in LLMs and the actual rules in the environment. Existing methods require either costly gradient computation or lengthy in-context demonstrations. In this paper, we propose AutoPlan, an approach to guide LLM-based agents to accomplish interactive decision-making tasks. AutoPlan augments the LLM prompt with a task-solving plan and optimizes it through iterative experience collection and reflection. Our experiments show that AutoPlan, though using no in-context demonstrations, achieves success rates on par with the baselines using human-written demonstrations on ALFWorld and even outperforms them by 8% on HotpotQA. The code is available at https://github.com/owaski/AutoPlan. 2023.findings-emnlp.205 @@ -28056,7 +28056,7 @@ BohongWu FeiYuan HaiZhao - LeiLi + LeiLi JingjingXu 15432-15444 Multilingual understanding models (or encoder-based), pre-trained via masked language modeling, have achieved promising results on many language understanding tasks (e.g., mBERT). However, these models are not capable of generating high-quality text compared with decoder-based causal language models. Can we transform a pre-trained language understanding model into an effective language generation model? We propose a Semantic-Guided Alignment-then-Denoising (SGA) approach to adapt a multilingual encoder to a multilingual generator with a small number of additional parameters. Experiments show that the proposed approach is an effective adaption method, outperforming widely-used initialization-based methods with gains of 9.4 BLEU on machine translation, 8.1 Rouge-L on question generation, and 5.5 METEOR on story generation on XLM-R_{large}. On the other hand, we observe that XLM-R is still inferior to mBART in supervised settings despite better results on zero-shot settings, indicating that more exploration is required to make understanding models strong generators. Our code is available at https://github.com/chengzhipanpan/XLMR4MT. diff --git a/data/xml/2024.acl.xml b/data/xml/2024.acl.xml index 0a00862971..68e647310f 100644 --- a/data/xml/2024.acl.xml +++ b/data/xml/2024.acl.xml @@ -11575,7 +11575,7 @@ GuangleiZhuCarnegie Mellon University XuandongZhaoUniversity of California, Berkeley LiangmingPanUniversity of California, Santa Barbara - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University WilliamWangUC Santa Barbara 15474-15492 Recent studies show that large language models (LLMs) improve their performance through self-feedback on certain tasks while degrade on others. We discovered that such a contrary is due to LLM’s bias in evaluating their own output. In this paper, we formally define LLM’s self-bias – the tendency to favor its own generation – using two statistics. We analyze six LLMs (GPT-4, GPT-3.5, Gemini, LLaMA2, Mixtral and DeepSeek) on translation, constrained text generation, and mathematical reasoning tasks. 
We find that self-bias is prevalent in all examined LLMs across multiple languages and tasks. Our analysis reveals that while the self-refine pipeline improves the fluency and understandability of model outputs, it further amplifies self-bias. To mitigate such biases, we discover that larger model size and external feedback with accurate assessment can significantly reduce bias in the self-refine pipeline, leading to actual performance improvement in downstream tasks. The code and data are released at https://github.com/xu1998hz/llm_self_bias. diff --git a/data/xml/2024.findings.xml b/data/xml/2024.findings.xml index 2b12bb3577..bd20b55e17 100644 --- a/data/xml/2024.findings.xml +++ b/data/xml/2024.findings.xml @@ -3228,7 +3228,7 @@ BiaoZhangGoogle DeepMind ZhongtaoLiuGoogle William YangWangUC Santa Barbara - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University MarkusFreitagGoogle 1429-1445 Recent large language models (LLM) areleveraging human feedback to improve theirgeneration quality. However, human feedbackis costly to obtain, especially during inference.In this work, we propose LLMRefine, aninference time optimization method to refineLLM’s output. The core idea is to usea learned fine-grained feedback model topinpoint defects and guide LLM to refinethem iteratively. Using original LLM as aproposal of edits, LLMRefine searches fordefect-less text via simulated annealing, tradingoff the exploration and exploitation. Weconduct experiments on three text generationtasks, including machine translation, long-form question answering (QA), and topicalsummarization. LLMRefine consistentlyoutperforms all baseline approaches, achievingimprovements up to 1.7 MetricX points ontranslation tasks, 8.1 ROUGE-L on ASQA, 2.2ROUGE-L on topical summarization. @@ -4399,7 +4399,7 @@ ShujianHuangNanjing University LingpengKongDepartment of Computer Science, The University of Hong Kong JiajunChenNanjing University - LeiLiSchool of Computer Science, Carnegie Mellon University + LeiLiSchool of Computer Science, Carnegie Mellon University 2765-2781 Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT). In this paper, we systematically investigate the advantages and challenges of LLMs for MMT by answering two questions: 1) How well do LLMs perform in translating massive languages? 2) Which factors affect LLMs’ performance in translation? We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4. Our empirical results show that translation capabilities of LLMs are continually involving. GPT-4 has beat the strong supervised baseline NLLB in 40.91% of translation directions but still faces a large gap towards the commercial translation system like Google Translate, especially on low-resource languages. Through further analysis, we discover that LLMs exhibit new working patterns when used for MMT. First, LLM can acquire translation ability in a resource-efficient way and generate moderate translation even on zero-resource languages. Second, instruction semantics can surprisingly be ignored when given in-context exemplars. Third, cross-lingual exemplars can provide better task guidance for low-resource translation than exemplars in the same language pairs. Code will be released at: https://github.com/NJUNLP/MMT-LLM. 
2024.findings-naacl.176 diff --git a/data/xml/2024.iwslt.xml b/data/xml/2024.iwslt.xml index a824817398..fba61187d5 100644 --- a/data/xml/2024.iwslt.xml +++ b/data/xml/2024.iwslt.xml @@ -328,7 +328,7 @@ BrianYanCarnegie Mellon University PatrickFernandesCarnegie Mellon University WilliamChenCarnegie Mellon University - LeiLiCarnegie Mellon University + LeiLiCarnegie Mellon University GrahamNeubigCarnegie Mellon University ShinjiWatanabeCarnegie Mellon University 154-159 @@ -366,7 +366,7 @@ SiqiOuyangCarnegie Mellon University WilliamChenCarnegie Mellon University KarenLivescuTTI-Chicago - LeiLiCarnegie Mellon University + LeiLiCarnegie Mellon University GrahamNeubigCarnegie Mellon University ShinjiWatanabeCarnegie Mellon University 164-169 diff --git a/data/xml/2025.coling.xml b/data/xml/2025.coling.xml index a7ac61845a..cd3265a9a1 100644 --- a/data/xml/2025.coling.xml +++ b/data/xml/2025.coling.xml @@ -6219,7 +6219,7 @@ ZhaojiangLin YuningMao William YangWang - LeiLi + LeiLi Yi-ChiaWang 7819–7830 From ice cream flavors to climate change, people exhibit a wide array of opinions on various topics, and understanding the rationale for these opinions can promote healthy discussion and consensus among them. As such, it can be valuable for a large language model (LLM), particularly as an AI assistant, to be able to empathize with or even explain these various standpoints. In this work, we hypothesize that different topic stances often manifest correlations that can be used to extrapolate to topics with unknown opinions. We explore various prompting and fine-tuning methods to improve an LLM’s ability to (a) extrapolate from opinions on known topics to unknown ones and (b) support their extrapolation with reasoning. Our findings suggest that LLMs possess inherent knowledge from training data about these opinion correlations, and with minimal data, the similarities between human opinions and model-extrapolated opinions can be improved by more than 50%. Furthermore, LLM can generate the reasoning process behind their extrapolation of opinions. diff --git a/data/xml/2025.iwslt.xml b/data/xml/2025.iwslt.xml index 1696407540..fe3e308388 100644 --- a/data/xml/2025.iwslt.xml +++ b/data/xml/2025.iwslt.xml @@ -406,7 +406,7 @@ <fixed-case>CMU</fixed-case>’s <fixed-case>IWSLT</fixed-case> 2025 Simultaneous Speech Translation System SiqiOuyangCarnegie Mellon University XiXuCarnegie Mellon University - LeiLiCarnegie Mellon University + LeiLiCarnegie Mellon University 309-314 This paper presents CMU’s submission to the IWSLT 2025 Simultaneous Speech Translation (SST) task for translating unsegmented English speech into Chinese and German text in a streaming manner. Our end-to-end speech-to-text system integrates a chunkwise causal Wav2Vec 2.0 speech encoder, an adapter, and the Qwen2.5-7B-Instruct as the decoder. We use a two-stage simultaneous training procedure on robust speech segments synthesized from LibriSpeech, CommonVoice, and VoxPopuli datasets, utilizing standard cross-entropy loss. Our model supports adjustable latency through a configurable latency multiplier. Experimental results demonstrate that our system achieves 44.3 BLEU for English-to-Chinese and 25.1 BLEU for English-to-German translations on the ACL60/60 development set, with computation-aware latencies of 2.7 seconds and 2.3 seconds, and theoretical latencies of 2.2 and 1.7 seconds, respectively. 
2025.iwslt-1.31 diff --git a/data/xml/D18.xml b/data/xml/D18.xml index d995e57508..38e37e4496 100644 --- a/data/xml/D18.xml +++ b/data/xml/D18.xml @@ -6212,7 +6212,7 @@ HaoyueShi HaoZhou JiazeChen - LeiLi + LeiLi 4631–4641 D18-1492 D18-1492.Attachment.zip diff --git a/data/xml/D19.xml b/data/xml/D19.xml index 061437e9ca..75635449b7 100644 --- a/data/xml/D19.xml +++ b/data/xml/D19.xml @@ -953,7 +953,7 @@ ZhixingTan JinsongSu DeyiXiong - LeiLi + LeiLi 803–812 In this study, we first investigate a novel capsule network with dynamic routing for linear time Neural Machine Translation (NMT), referred as CapsNMT. CapsNMT uses an aggregation mechanism to map the source sentence into a matrix with pre-determined size, and then applys a deep LSTM network to decode the target sequence from the source representation. Unlike the previous work (CITATION) to store the source sentence with a passive and bottom-up way, the dynamic routing policy encodes the source sentence with an iterative process to decide the credit attribution between nodes from lower and higher layers. CapsNMT has two core properties: it runs in time that is linear in the length of the sequences and provides a more flexible way to aggregate the part-whole information of the source sentence. On WMT14 English-German task and a larger WMT14 English-French task, CapsNMT achieves comparable results with the Transformer system. To the best of our knowledge, this is the first work that capsule networks have been empirically investigated for sequence to sequence problems. D19-1074 @@ -4288,7 +4288,7 @@ FuliLuo ShunyaoLi PengchengYang - LeiLi + LeiLi BaobaoChang ZhifangSui XuSun diff --git a/data/xml/N18.xml b/data/xml/N18.xml index 1d475291bb..8471d0c101 100644 --- a/data/xml/N18.xml +++ b/data/xml/N18.xml @@ -1409,7 +1409,7 @@ Reinforced Co-Training JiaweiWu - LeiLi + LeiLi William YangWang 1252–1262 Co-training is a popular semi-supervised learning framework to utilize a large amount of unlabeled data in addition to a small labeled set. Co-training methods exploit predicted labels on the unlabeled data and select samples based on prediction confidence to augment the training. However, the selection of samples in existing co-training methods is based on a predetermined policy, which ignores the sampling bias between the unlabeled and the labeled subsets, and fails to explore the data space. In this paper, we propose a novel method, Reinforced Co-Training, to select high-quality unlabeled samples to better co-train on. More specifically, our approach uses Q-learning to learn a data selection policy with a small labeled dataset, and then exploits this policy to train the co-training classifiers automatically. Experimental results on clickbait detection and generic text classification tasks demonstrate that our proposed method can obtain more accurate text classification results. 
diff --git a/data/xml/P16.xml b/data/xml/P16.xml index 4010ce6a2d..03b1a5bcd3 100644 --- a/data/xml/P16.xml +++ b/data/xml/P16.xml @@ -817,7 +817,7 @@ <fixed-case>CFO</fixed-case>: Conditional Focused Neural Question Answering with Large-scale Knowledge Bases ZihangDai - LeiLi + LeiLi WeiXu 800–810 P16-1076 diff --git a/data/xml/P19.xml b/data/xml/P19.xml index 973cf97486..0e563d8546 100644 --- a/data/xml/P19.xml +++ b/data/xml/P19.xml @@ -2488,7 +2488,7 @@ Enhancing Topic-to-Essay Generation with External Commonsense Knowledge PengchengYang - LeiLi + LeiLi FuliLuo TianyuLiu XuSun @@ -3286,7 +3286,7 @@ PengchengYang ZhihanZhang FuliLuo - LeiLi + LeiLi ChengyangHuang XuSun 2680–2686 @@ -7124,7 +7124,7 @@ HuangzhaoZhang HaoZhou NingMiao - LeiLi + LeiLi 5564–5569 Efficiently building an adversarial attacker for natural language processing (NLP) tasks is a real challenge. Firstly, as the sentence space is discrete, it is difficult to make small perturbations along the direction of gradients. Secondly, the fluency of the generated examples cannot be guaranteed. In this paper, we propose MHA, which addresses both problems by performing Metropolis-Hastings sampling, whose proposal is designed with the guidance of gradients. Experiments on IMDB and SNLI show that our proposed MHAoutperforms the baseline model on attacking capability. Adversarial training with MHA also leads to better robustness and performance. P19-1559 @@ -7669,7 +7669,7 @@ YuBao HaoZhou ShujianHuang - LeiLi + LeiLi LiliMou OlgaVechtomova Xin-yuDai @@ -7853,7 +7853,7 @@ YunxuanXiao YanruQu HaoZhou - LeiLi + LeiLi WeinanZhang YongYu 6140–6150 @@ -8732,7 +8732,7 @@ Automatic Generation of Personalized Comment Based on User Profile WenhuanZeng AbulikemuAbuduweili - LeiLi + LeiLi PengchengYang 229–235 Comments on social media are very diverse, in terms of content, style and vocabulary, which make generating comments much more challenging than other existing natural language generation (NLG) tasks. Besides, since different user has different expression habits, it is necessary to take the user’s profile into consideration when generating comments. In this paper, we introduce the task of automatic generation of personalized comment (AGPC) for social media. Based on tens of thousands of users’ real comments and corresponding user profiles on weibo, we propose Personalized Comment Generation Network (PCGN) for AGPC. The model utilizes user feature embedding with a gated memory and attends to user description to model personality of users. In addition, external user representation is taken into consideration during the decoding to enhance the comments generation. Experimental results show that our model can generate natural, human-like and personalized comments. diff --git a/data/xml/W19.xml b/data/xml/W19.xml index 51ef0d3e4e..0df0123831 100644 --- a/data/xml/W19.xml +++ b/data/xml/W19.xml @@ -17436,7 +17436,7 @@ In this tutorial on MT and post-editing we would like to continue sharing the la YaoFu HaoZhou JiazeChen - LeiLi + LeiLi 24–33 Text attribute transfer is modifying certain linguistic attributes (e.g. sentiment, style, author-ship, etc.) of a sentence and transforming them from one type to another. In this paper, we aim to analyze and interpret what is changed during the transfer process. We start from the observation that in many existing models and datasets, certain words within a sentence play important roles in determining the sentence attribute class. These words are referred as the Pivot Words. 
Based on these pivot words, we propose a lexical analysis framework, the Pivot Analysis, to quantitatively analyze the effects of these words in text attribute classification and transfer. We apply this framework to existing datasets and models and show that: (1) the pivot words are strong features for the classification of sentence attributes; (2) to change the attribute of a sentence, many datasets only requires to change certain pivot words; (3) consequently, many transfer models only perform the lexical-level modification,while leaving higher-level sentence structures unchanged. Our work provides an in-depth understanding of linguistic attribute transfer and further identifies the future requirements and challenges of this task W19-8604 From e3c16980e79f9afee4fe3cd2fc40277ca9e83d11 Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Fri, 7 Nov 2025 01:57:26 +0100 Subject: [PATCH 05/19] add more entries to `lei-li-hku` based on personal cv/website `cv-LILEI-2501.pdf` --- data/xml/2021.emnlp.xml | 2 +- data/xml/2021.findings.xml | 2 +- data/xml/2022.findings.xml | 2 +- data/xml/2023.emnlp.xml | 4 ++-- data/xml/2024.acl.xml | 4 ++-- 5 files changed, 7 insertions(+), 7 deletions(-) diff --git a/data/xml/2021.emnlp.xml b/data/xml/2021.emnlp.xml index 6068a3f571..587d87fdda 100644 --- a/data/xml/2021.emnlp.xml +++ b/data/xml/2021.emnlp.xml @@ -432,7 +432,7 @@ Dynamic Knowledge Distillation for Pre-trained Language Models - LeiLi + LeiLi YankaiLin ShuhuaiRen PengLi diff --git a/data/xml/2021.findings.xml b/data/xml/2021.findings.xml index 54401ea44f..d1de8d38bb 100644 --- a/data/xml/2021.findings.xml +++ b/data/xml/2021.findings.xml @@ -6240,7 +6240,7 @@ <fixed-case>C</fixed-case>ascade<fixed-case>BERT</fixed-case>: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade - LeiLi + LeiLi YankaiLin DeliChen ShuhuaiRen diff --git a/data/xml/2022.findings.xml b/data/xml/2022.findings.xml index 1d66490920..94322d177d 100644 --- a/data/xml/2022.findings.xml +++ b/data/xml/2022.findings.xml @@ -14453,7 +14453,7 @@ Faster and Smaller Speech Translation without Quality Compromise From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models - LeiLiPeking University + LeiLiPeking University YankaiLinGaoling School of Artificial Intelligence, Renmin University of China XuanchengRenPeking University GuangxiangZhaoPeking University diff --git a/data/xml/2023.emnlp.xml b/data/xml/2023.emnlp.xml index 9c00b0fd08..1455cdd5b6 100644 --- a/data/xml/2023.emnlp.xml +++ b/data/xml/2023.emnlp.xml @@ -8511,7 +8511,7 @@ Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning LeanWang - LeiLi + LeiLi DamaiDai DeliChen HaoZhou @@ -10152,7 +10152,7 @@ Can Language Models Understand Physical Concepts? 
- LeiLi + LeiLi JingjingXu QingxiuDong CeZheng diff --git a/data/xml/2024.acl.xml b/data/xml/2024.acl.xml index 68e647310f..3f1bd5a505 100644 --- a/data/xml/2024.acl.xml +++ b/data/xml/2024.acl.xml @@ -7096,7 +7096,7 @@ Large Language Models are not Fair Evaluators PeiyiWang - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong LiangChen ZefanCai DaweiZhu @@ -10832,7 +10832,7 @@ Multimodal <fixed-case>A</fixed-case>r<fixed-case>X</fixed-case>iv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong YuqiWangUniversity of Hong Kong RunxinXuPeking University PeiyiWangPeking University From e74e1c255ece5adf6d7d6e0c4585aa4b3ce4e34b Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Fri, 7 Nov 2025 02:28:25 +0100 Subject: [PATCH 06/19] add more entries to `lei-li-hku/cmu` - based on cmu personal website of publications - hku google scholar --- data/xml/2021.findings.xml | 2 +- data/xml/2023.americasnlp.xml | 2 +- data/xml/2024.acl.xml | 4 ++-- data/xml/2024.naacl.xml | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/data/xml/2021.findings.xml b/data/xml/2021.findings.xml index d1de8d38bb..f8b3ca72d9 100644 --- a/data/xml/2021.findings.xml +++ b/data/xml/2021.findings.xml @@ -3464,7 +3464,7 @@ YuanbinWu JiazeChen HaoZhou - LeiLi + LeiLi 3140–3151 2021.findings-acl.277 10.18653/v1/2021.findings-acl.277 diff --git a/data/xml/2023.americasnlp.xml b/data/xml/2023.americasnlp.xml index 5f13fb31f4..77e2132a93 100644 --- a/data/xml/2023.americasnlp.xml +++ b/data/xml/2023.americasnlp.xml @@ -230,7 +230,7 @@ TianruiGuUniversity of California, Santa Barbara KaieChenUniversity of California, Santa Barbara SiqiOuyangUniversity of California, Santa Barbara - LeiLiUniversity of California Santa Barbara + LeiLiUniversity of California Santa Barbara 173-176 This paper presents PlayGround’s submission to the AmericasNLP 2023 shared task on machine translation (MT) into indigenous languages. We finetuned NLLB-600M, a multilingual MT model pre-trained on Flores-200, on 10 low-resource language directions and examined the effectiveness of weight averaging and back translation. Our experiments showed that weight averaging, on average, led to a 0.0169 improvement in the ChrF++ score. Additionally, we found that back translation resulted in a 0.008 improvement in the ChrF++ score. 2023.americasnlp-1.19 diff --git a/data/xml/2024.acl.xml b/data/xml/2024.acl.xml index 3f1bd5a505..66a5ff717f 100644 --- a/data/xml/2024.acl.xml +++ b/data/xml/2024.acl.xml @@ -7079,7 +7079,7 @@ Math-Shepherd: Verify and Reinforce <fixed-case>LLM</fixed-case>s Step-by-step without Human Annotations PeiyiWang - LeiLiUniversity of Hong Kong + LeiLiUniversity of Hong Kong ZhihongShaoTsinghua University, Tsinghua University RunxinXu DamaiDai @@ -14422,7 +14422,7 @@ Watermarking for Large Language Models XuandongZhao Yu-XiangWang - LeiLi + LeiLi 10-11 As AI-generated text increasingly resembles human-written content, the ability to detect machine-generated text becomes crucial in both the computational linguistics and machine learning communities. In this tutorial, we aim to provide an in-depth exploration of text watermarking, a subfield of linguistic steganography with the goal of embedding a hidden message (the watermark) within a text passage. 
We will introduce the fundamentals of text watermarking, discuss the main challenges in identifying AI-generated text, and delve into the current watermarking methods, assessing their strengths and weaknesses. Moreover, we will explore other possible applications of text watermarking and discuss future directions for this field. Each section will be supplemented with examples and key takeaways. 2024.acl-tutorials.6 diff --git a/data/xml/2024.naacl.xml b/data/xml/2024.naacl.xml index 134e936cfb..c291b40970 100644 --- a/data/xml/2024.naacl.xml +++ b/data/xml/2024.naacl.xml @@ -8700,7 +8700,7 @@ MuhaoChenUC Davis ChaoweiXiaoUW-Madison HuanSunOSU - LeiLiCMU + LeiLiCMU LeonDerczynskiUW Seattle AnimaAnandkumarCaltech, NVIDIA FeiWangUSC From a17357d78de3573a4e844c7634510023719733f1 Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Fri, 7 Nov 2025 02:50:30 +0100 Subject: [PATCH 07/19] introducing `lei-li-bupt` --- data/yaml/name_variants.yaml | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/data/yaml/name_variants.yaml b/data/yaml/name_variants.yaml index c70af33eef..7ac2003d36 100644 --- a/data/yaml/name_variants.yaml +++ b/data/yaml/name_variants.yaml @@ -5751,6 +5751,11 @@ orcid: 0009-0008-6984-5104 comment: University of Hong Kong institution: University of Hong Kong +- canonical: {first: Lei, last: Li} + id: lei-li-bupt + orcid: 0000-0002-3204-6527 + comment: Beijing University of Posts and Telecommunications + institution: Beijing University of Posts and Telecommunications - canonical: {first: Shih-Min, last: Li} variants: - {first: Shi-Min, last: Li} From 39f974fe848346edcfe2a11781a5502a166a4857 Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Fri, 7 Nov 2025 02:51:50 +0100 Subject: [PATCH 08/19] add papers to `lei-li-bupt` based on openreview/affiliation --- data/xml/2020.sdp.xml | 2 +- data/xml/2021.findings.xml | 2 +- data/xml/2022.aacl.xml | 2 +- data/xml/2022.findings.xml | 2 +- data/xml/2024.lrec.xml | 2 +- data/xml/K19.xml | 2 +- data/xml/W13.xml | 2 +- data/xml/W16.xml | 2 +- 8 files changed, 8 insertions(+), 8 deletions(-) diff --git a/data/xml/2020.sdp.xml b/data/xml/2020.sdp.xml index 5d198dd20d..2e3e41d958 100644 --- a/data/xml/2020.sdp.xml +++ b/data/xml/2020.sdp.xml @@ -349,7 +349,7 @@ <fixed-case>CIST</fixed-case>@<fixed-case>CL</fixed-case>-<fixed-case>S</fixed-case>ci<fixed-case>S</fixed-case>umm 2020, <fixed-case>L</fixed-case>ong<fixed-case>S</fixed-case>umm 2020: Automatic Scientific Document Summarization - LeiLi + LeiLi YangXie WeiLiu YinanLiu diff --git a/data/xml/2021.findings.xml b/data/xml/2021.findings.xml index f8b3ca72d9..6dc9353b6c 100644 --- a/data/xml/2021.findings.xml +++ b/data/xml/2021.findings.xml @@ -925,7 +925,7 @@ <fixed-case>U</fixed-case>ni<fixed-case>K</fixed-case>eyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction HuanqinWu WeiLiu - LeiLi + LeiLi DanNie TaoChen FengZhang diff --git a/data/xml/2022.aacl.xml b/data/xml/2022.aacl.xml index 19f589335b..367920f172 100644 --- a/data/xml/2022.aacl.xml +++ b/data/xml/2022.aacl.xml @@ -553,7 +553,7 @@ <fixed-case>SAPG</fixed-case>raph: Structure-aware Extractive Summarization for Scientific Papers with Heterogeneous Graph SiyaQi - LeiLi + LeiLi YiyangLi JinJiang DingxinHu diff --git a/data/xml/2022.findings.xml b/data/xml/2022.findings.xml index 94322d177d..25af1088c6 100644 --- a/data/xml/2022.findings.xml +++ b/data/xml/2022.findings.xml @@ -11942,7 +11942,7 @@ Faster and Smaller Speech 
Translation without Quality Compromise SiyiWangBeijing University of Posts and Telecommunications KaiWangBeijing University of Posts and Telecommunications YanquanZhouBeijing University of Posts and Telecommunications - LeiLiBeijing University of Posts and Telecommunications + LeiLiBeijing University of Posts and Telecommunications QingYangDu Xiaoman Technology(Beijing) DongliangXuDu Xiaoman Technology(Beijing) 3880-3886 diff --git a/data/xml/2024.lrec.xml b/data/xml/2024.lrec.xml index 2b6c32c612..c116dfc772 100644 --- a/data/xml/2024.lrec.xml +++ b/data/xml/2024.lrec.xml @@ -9082,7 +9082,7 @@ QingYang DongliangXu YanquanZhou - LeiLi + LeiLi YuzeLi YingqiZhu 8792–8803 diff --git a/data/xml/K19.xml b/data/xml/K19.xml index a909084b16..dc6a3cf72c 100644 --- a/data/xml/K19.xml +++ b/data/xml/K19.xml @@ -955,7 +955,7 @@ In Conclusion Not Repetition: Comprehensive Abstractive Summarization with Diversified Attention Based on Determinantal Point Processes - LeiLi + LeiLi WeiLiu MarinaLitvak NataliaVanetik diff --git a/data/xml/W13.xml b/data/xml/W13.xml index 351599ba29..910d016091 100644 --- a/data/xml/W13.xml +++ b/data/xml/W13.xml @@ -5056,7 +5056,7 @@ <fixed-case>CIST</fixed-case> System Report for <fixed-case>ACL</fixed-case> <fixed-case>M</fixed-case>ulti<fixed-case>L</fixed-case>ing 2013 – Track 1: Multilingual Multi-document Summarization - LeiLi + LeiLi WeiHeng JiaYu YuLiu diff --git a/data/xml/W16.xml b/data/xml/W16.xml index 33d4e950d9..944a6fe4c3 100644 --- a/data/xml/W16.xml +++ b/data/xml/W16.xml @@ -2289,7 +2289,7 @@ <fixed-case>CIST</fixed-case> System for <fixed-case>CL</fixed-case>-<fixed-case>S</fixed-case>ci<fixed-case>S</fixed-case>umm 2016 Shared Task - LeiLi + LeiLi LiyuanMao YazhaoZhang JunqiChi From 2a8b49d27f45ee5e726f496319a03f232979726b Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Fri, 7 Nov 2025 03:01:41 +0100 Subject: [PATCH 09/19] typo fix --- data/xml/P19.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/data/xml/P19.xml b/data/xml/P19.xml index 0e563d8546..f954e88bac 100644 --- a/data/xml/P19.xml +++ b/data/xml/P19.xml @@ -7853,7 +7853,7 @@ YunxuanXiao YanruQu HaoZhou - LeiLi + LeiLi WeinanZhang YongYu 6140–6150 From e00b1c71592bb2d420fcf62c107453e396766a02 Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Fri, 7 Nov 2025 04:33:08 +0100 Subject: [PATCH 10/19] add papers to more specific lei li based on affiliation and CVs/GS matching --- data/xml/2020.fnp.xml | 2 +- data/xml/2021.acl.xml | 2 +- data/xml/2021.emnlp.xml | 2 +- data/xml/2021.findings.xml | 2 +- data/xml/2021.naacl.xml | 4 ++-- data/xml/D19.xml | 2 +- data/xml/W13.xml | 2 +- data/xml/W14.xml | 2 +- data/xml/W17.xml | 2 +- data/xml/W19.xml | 2 +- 10 files changed, 11 insertions(+), 11 deletions(-) diff --git a/data/xml/2020.fnp.xml b/data/xml/2020.fnp.xml index 30cff5edd3..78b5a1ee7f 100644 --- a/data/xml/2020.fnp.xml +++ b/data/xml/2020.fnp.xml @@ -194,7 +194,7 @@ Extractive Financial Narrative Summarisation based on <fixed-case>DPP</fixed-case>s - LeiLi + LeiLi YafeiJiang YinanLiu 100–104 diff --git a/data/xml/2021.acl.xml b/data/xml/2021.acl.xml index a0b841b9de..5e40a51caf 100644 --- a/data/xml/2021.acl.xml +++ b/data/xml/2021.acl.xml @@ -11081,7 +11081,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO Pre-training Methods for Neural Machine Translation MingxuanWang - LeiLi + LeiLi 21–25 This tutorial provides a comprehensive guide to make the most of 
pre-training for neural machine translation. Firstly, we will briefly introduce the background of NMT, pre-training methodology, and point out the main challenges when applying pre-training for NMT. Then we will focus on analysing the role of pre-training in enhancing the performance of NMT, how to design a better pre-training model for executing specific NMT tasks and how to better integrate the pre-trained model into NMT system. In each part, we will provide examples, discuss training techniques and analyse what is transferred when applying pre-training. 2021.acl-tutorials.4 diff --git a/data/xml/2021.emnlp.xml b/data/xml/2021.emnlp.xml index 587d87fdda..8165009dff 100644 --- a/data/xml/2021.emnlp.xml +++ b/data/xml/2021.emnlp.xml @@ -9717,7 +9717,7 @@ Text <fixed-case>A</fixed-case>uto<fixed-case>A</fixed-case>ugment: Learning Compositional Augmentation Policy for Text Classification ShuhuaiRen JinchaoZhang - LeiLi + LeiLi XuSun JieZhou 9029–9043 diff --git a/data/xml/2021.findings.xml b/data/xml/2021.findings.xml index 6dc9353b6c..809a2ccb92 100644 --- a/data/xml/2021.findings.xml +++ b/data/xml/2021.findings.xml @@ -6725,7 +6725,7 @@ Leveraging Word-Formation Knowledge for <fixed-case>C</fixed-case>hinese Word Sense Disambiguation HuaZheng - LeiLi + LeiLi DamaiDai DeliChen TianyuLiu diff --git a/data/xml/2021.naacl.xml b/data/xml/2021.naacl.xml index 08ce6ff0e0..c896712ba5 100644 --- a/data/xml/2021.naacl.xml +++ b/data/xml/2021.naacl.xml @@ -2243,7 +2243,7 @@ Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in <fixed-case>NLP</fixed-case> Models WenkaiYang - LeiLi + LeiLi ZhiyuanZhang XuanchengRen XuSun @@ -5884,7 +5884,7 @@ Decompose, Fuse and Generate: A Formation-Informed Method for <fixed-case>C</fixed-case>hinese Definition Generation HuaZheng DamaiDai - LeiLi + LeiLi TianyuLiu ZhifangSui BaobaoChang diff --git a/data/xml/D19.xml b/data/xml/D19.xml index 75635449b7..4b3cf71bd4 100644 --- a/data/xml/D19.xml +++ b/data/xml/D19.xml @@ -8707,7 +8707,7 @@ The tutorial will bring researchers and practitioners to be aware of this issue, Discreteness in Neural Natural Language Processing LiliMou HaoZhou - LeiLi + LeiLi This tutorial provides a comprehensive guide to the process of discreteness in neural NLP. As a gentle start, we will briefly introduce the background of deep learning based NLP, where we point out the ubiquitous discreteness of natural language and its challenges in neural information processing. Particularly, we will focus on how such discreteness plays a role in the input space, the latent space, and the output space of a neural network. In each part, we will provide examples, discuss machine learning techniques, as well as demonstrate NLP applications. 
diff --git a/data/xml/W13.xml b/data/xml/W13.xml index 910d016091..227482dfaf 100644 --- a/data/xml/W13.xml +++ b/data/xml/W13.xml @@ -5020,7 +5020,7 @@ Multi-document multilingual summarization corpus preparation, Part 1: <fixed-case>A</fixed-case>rabic, <fixed-case>E</fixed-case>nglish, <fixed-case>G</fixed-case>reek, <fixed-case>C</fixed-case>hinese, <fixed-case>R</fixed-case>omanian - LeiLi + LeiLi CorinaForascu MahmoudEl-Haj GeorgeGiannakopoulos diff --git a/data/xml/W14.xml b/data/xml/W14.xml index 9cdb41c81d..3fe049893c 100644 --- a/data/xml/W14.xml +++ b/data/xml/W14.xml @@ -11786,7 +11786,7 @@ XiaoyueCong FangHuang HongfaXue - LeiLi + LeiLi ZhiqiaoGao 114–119 W14-6818 diff --git a/data/xml/W17.xml b/data/xml/W17.xml index 4e65bc6b2b..20942f770b 100644 --- a/data/xml/W17.xml +++ b/data/xml/W17.xml @@ -1679,7 +1679,7 @@ Word Embedding and Topic Modeling Enhanced Multiple Features for Content Linking and Argument / Sentiment Labeling in Online Forums - LeiLi + LeiLi LiyuanMao MoyeChen 32–36 diff --git a/data/xml/W19.xml b/data/xml/W19.xml index 0df0123831..518a273da1 100644 --- a/data/xml/W19.xml +++ b/data/xml/W19.xml @@ -18512,7 +18512,7 @@ In this tutorial on MT and post-editing we would like to continue sharing the la Multi-lingual <fixed-case>W</fixed-case>ikipedia Summarization and Title Generation On Low Resource Corpus WeiLiu - LeiLi + LeiLi ZuyingHuang YinanLiu 17–25 From d3b6b0b69b420cb2bece3c800afdb23094cf9cc6 Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Fri, 7 Nov 2025 04:34:12 +0100 Subject: [PATCH 11/19] adding new `lei-li-zju` --- data/yaml/name_variants.yaml | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/data/yaml/name_variants.yaml b/data/yaml/name_variants.yaml index 7ac2003d36..9c1838889b 100644 --- a/data/yaml/name_variants.yaml +++ b/data/yaml/name_variants.yaml @@ -5756,6 +5756,11 @@ orcid: 0000-0002-3204-6527 comment: Beijing University of Posts and Telecommunications institution: Beijing University of Posts and Telecommunications +- canonical: {first: Lei, last: Li} + id: lei-li-zju + orcid: 0000-0002-7456-2204 + comment: Zhejiang University + institution: Zhejiang University - canonical: {first: Shih-Min, last: Li} variants: - {first: Shi-Min, last: Li} From 9d573559c409b334076ecd584d2c290fbc93d981 Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Fri, 7 Nov 2025 04:48:26 +0100 Subject: [PATCH 12/19] adding papers to new `lei-li-zju` based on affiliation displayed, comparison with openreview and g scholar --- data/xml/2022.acl.xml | 2 +- data/xml/2022.coling.xml | 2 +- data/xml/2022.emnlp.xml | 2 +- data/xml/2022.findings.xml | 2 +- data/xml/2023.ijcnlp.xml | 2 +- data/xml/2024.acl.xml | 2 +- data/xml/2024.findings.xml | 2 +- data/xml/2025.acl.xml | 2 +- 8 files changed, 8 insertions(+), 8 deletions(-) diff --git a/data/xml/2022.acl.xml b/data/xml/2022.acl.xml index 72e6ef489f..aa6e7d6894 100644 --- a/data/xml/2022.acl.xml +++ b/data/xml/2022.acl.xml @@ -7423,7 +7423,7 @@ in the Case of Unambiguous Gender MoshaChen ZhenBi XiaozhuanLiang - LeiLi + LeiLi XinShang KangpingYin ChuanqiTan diff --git a/data/xml/2022.coling.xml b/data/xml/2022.coling.xml index bfdedc0130..7c277651a0 100644 --- a/data/xml/2022.coling.xml +++ b/data/xml/2022.coling.xml @@ -2431,7 +2431,7 @@ <fixed-case>L</fixed-case>ight<fixed-case>NER</fixed-case>: A Lightweight Tuning Paradigm for Low-resource <fixed-case>NER</fixed-case> via Pluggable Prompting XiangChen - LeiLi + LeiLi ShuminDeng 
ChuanqiTan ChangliangXu diff --git a/data/xml/2022.emnlp.xml b/data/xml/2022.emnlp.xml index 7517e92f00..903a6ec7d6 100644 --- a/data/xml/2022.emnlp.xml +++ b/data/xml/2022.emnlp.xml @@ -11575,7 +11575,7 @@ XinXieZhejiang University XiangChenZhejiang University ZhouboLiZhejiang University - LeiLiZhejiang University + LeiLiZhejiang University 98-108 We present an open-source and extensible knowledge extraction toolkit DeepKE, supporting complicated low-resource, document-level and multimodal scenarios in the knowledge base population. DeepKE implements various information extraction tasks, including named entity recognition, relation extraction and attribute extraction. With a unified framework, DeepKE allows developers and researchers to customize datasets and models to extract information from unstructured data according to their requirements. Specifically, DeepKE not only provides various functional modules and model implementation for different tasks and scenarios but also organizes all components by consistent frameworks to maintain sufficient modularity and extensibility. We release the source code at GitHub in https://github.com/zjunlp/DeepKE with Google Colab tutorials and comprehensive documents for beginners. Besides, we present an online system in http://deepke.openkg.cn/EN/re_doc_show.html for real-time extraction of various tasks, and a demo video. 2022.emnlp-demos.10 diff --git a/data/xml/2022.findings.xml b/data/xml/2022.findings.xml index 25af1088c6..18f3de968a 100644 --- a/data/xml/2022.findings.xml +++ b/data/xml/2022.findings.xml @@ -6196,7 +6196,7 @@ Good Visual Guidance Make A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction XiangChen NingyuZhang - LeiLi + LeiLi YunzhiYao ShuminDeng ChuanqiTan diff --git a/data/xml/2023.ijcnlp.xml b/data/xml/2023.ijcnlp.xml index 292a8d1274..a9c9103d62 100644 --- a/data/xml/2023.ijcnlp.xml +++ b/data/xml/2023.ijcnlp.xml @@ -1496,7 +1496,7 @@ PengfeiZhu ChaoPang YekunChai - LeiLi + LeiLi ShuohuanWang YuSun HaoTian diff --git a/data/xml/2024.acl.xml b/data/xml/2024.acl.xml index 66a5ff717f..5961854d68 100644 --- a/data/xml/2024.acl.xml +++ b/data/xml/2024.acl.xml @@ -13381,7 +13381,7 @@ ZiwenXuZhejiang University ShuofeiQiao RunnanFang - LeiLiTencent + LeiLiTencent ZhenBiZhejiang University GuozhouZheng HuajunChenZhejiang University diff --git a/data/xml/2024.findings.xml b/data/xml/2024.findings.xml index bd20b55e17..a46865543d 100644 --- a/data/xml/2024.findings.xml +++ b/data/xml/2024.findings.xml @@ -32433,7 +32433,7 @@ hai-coaching/ <fixed-case>H</fixed-case>yper<fixed-case>L</fixed-case>o<fixed-case>RA</fixed-case>: Efficient Cross-task Generalization via Constrained Low-Rank Adapters Generation ChuanchengLvTsinghua University, Tsinghua University - LeiLiTencent + LeiLiTencent ShitouZhang GangChen FanchaoQi diff --git a/data/xml/2025.acl.xml b/data/xml/2025.acl.xml index f8b0c3ebc7..996fc761d3 100644 --- a/data/xml/2025.acl.xml +++ b/data/xml/2025.acl.xml @@ -17060,7 +17060,7 @@ Uncertainty-Aware Iterative Preference Optimization for Enhanced <fixed-case>LLM</fixed-case> Reasoning - LeiLiTencent + LeiLiTencent HehuanLiu YaxinZhou ZhaoYangGuiTencent From e42139e0c5606b07392af4a8fee21dc4f98b2305 Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Fri, 7 Nov 2025 04:58:07 +0100 Subject: [PATCH 13/19] add new `lei-li-hkbu` Lei_Li19 on OR --- data/yaml/name_variants.yaml | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/data/yaml/name_variants.yaml 
b/data/yaml/name_variants.yaml index 9c1838889b..2cf5889c04 100644 --- a/data/yaml/name_variants.yaml +++ b/data/yaml/name_variants.yaml @@ -5751,6 +5751,11 @@ orcid: 0009-0008-6984-5104 comment: University of Hong Kong institution: University of Hong Kong +- canonical: {first: Lei, last: Li} + id: lei-li-hkbu + orcid: 0000-0002-5631-2519 + comment: Hong Kong Baptist University + institution: Hong Kong Baptist University - canonical: {first: Lei, last: Li} id: lei-li-bupt orcid: 0000-0002-3204-6527 From 0fb1bced1c584ef5dba5faf2de6e3fabf37273be Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Fri, 7 Nov 2025 04:58:41 +0100 Subject: [PATCH 14/19] add papers to new `lei-li-hkbu` using OpenReview and OpenReview --- data/xml/2021.acl.xml | 2 +- data/xml/2022.acl.xml | 2 +- data/xml/2022.coling.xml | 2 +- data/xml/2024.ccl.xml | 2 +- data/xml/2024.lrec.xml | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/data/xml/2021.acl.xml b/data/xml/2021.acl.xml index 5e40a51caf..5d70b7cf8c 100644 --- a/data/xml/2021.acl.xml +++ b/data/xml/2021.acl.xml @@ -5370,7 +5370,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO Personalized Transformer for Explainable Recommendation - LeiLi + LeiLi YongfengZhang LiChen 4947–4957 diff --git a/data/xml/2022.acl.xml b/data/xml/2022.acl.xml index aa6e7d6894..985472c1c4 100644 --- a/data/xml/2022.acl.xml +++ b/data/xml/2022.acl.xml @@ -278,7 +278,7 @@ ShijieGeng ZuohuiFu YingqiangGe - LeiLi + LeiLi Gerardde Melo YongfengZhang 244-255 diff --git a/data/xml/2022.coling.xml b/data/xml/2022.coling.xml index 7c277651a0..83cca481e5 100644 --- a/data/xml/2022.coling.xml +++ b/data/xml/2022.coling.xml @@ -2759,7 +2759,7 @@ Augmenting Legal Judgment Prediction with Contrastive Case Relations DugangLiu WeihaoDu - LeiLi + LeiLi WeikePan ZhongMing 2658–2667 diff --git a/data/xml/2024.ccl.xml b/data/xml/2024.ccl.xml index 44aa040fb4..c505a9ce20 100644 --- a/data/xml/2024.ccl.xml +++ b/data/xml/2024.ccl.xml @@ -1070,7 +1070,7 @@ YuelouXu YanLu KaiWang - LeiLi + LeiLi YanquanZhou 1123–1135 “The zero-resource cross-domain named entity recognition (NER) task aims to perform NER in aspecific domain where labeled data is unavailable. Existing methods primarily focus on transfer-ring NER knowledge from high-resource to zero-resource domains. However, the challenge liesin effectively transferring NER knowledge between domains due to the inherent differences inentity structures across domains. To tackle this challenge, we propose an Unsupervised DomainAdaptation Adversarial (UDAA) framework, which combines the masked language model auxil-iary task with the domain adaptive adversarial network to mitigate inter-domain differences andefficiently facilitate knowledge transfer. Experimental results on CBS, Twitter, and WNUT2016three datasets demonstrate the effectiveness of our framework. Notably, we achieved new state-of-the-art performance on the three datasets. 
Our code will be released.Introduction” diff --git a/data/xml/2024.lrec.xml b/data/xml/2024.lrec.xml index c116dfc772..6a0ace7ce1 100644 --- a/data/xml/2024.lrec.xml +++ b/data/xml/2024.lrec.xml @@ -10424,7 +10424,7 @@ Large Language Models for Generative Recommendation: A Survey and Visionary Discussions - LeiLi + LeiLi YongfengZhang DugangLiu LiChen From 90492a02b05100e114fc8a4f22136a71d5304773 Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Fri, 7 Nov 2025 05:04:57 +0100 Subject: [PATCH 15/19] add new `lei-li-renmin` plus only paper OR Lei_Li42 --- data/xml/2025.findings.xml | 2 +- data/yaml/name_variants.yaml | 5 +++++ 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/data/xml/2025.findings.xml b/data/xml/2025.findings.xml index 7d74c6c282..49010d3c3c 100644 --- a/data/xml/2025.findings.xml +++ b/data/xml/2025.findings.xml @@ -43657,7 +43657,7 @@ <fixed-case>A</fixed-case>uto<fixed-case>MIR</fixed-case>: Effective Zero-Shot Medical Information Retrieval without Relevance Labels - LeiLi + LeiLi XiangxuZhangRenmin University of China XiaoZhou ZhengLiu diff --git a/data/yaml/name_variants.yaml b/data/yaml/name_variants.yaml index 2cf5889c04..2ff2f26a76 100644 --- a/data/yaml/name_variants.yaml +++ b/data/yaml/name_variants.yaml @@ -5766,6 +5766,11 @@ orcid: 0000-0002-7456-2204 comment: Zhejiang University institution: Zhejiang University +- canonical: {first: Lei, last: Li} + id: lei-li-renmin + orcid: 0000-0001-5660-0409 + comment: Renmin University + institution: Renmin University - canonical: {first: Shih-Min, last: Li} variants: - {first: Shi-Min, last: Li} From 78a5c6ec089f6ce61695e80c11d9d50a575903bb Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Fri, 7 Nov 2025 05:19:24 +0100 Subject: [PATCH 16/19] add new `lei-li-ecnu` plus papers OR Lei_Li29 , dblp.org/pid/13/7007-43 --- data/xml/2022.emnlp.xml | 2 +- data/xml/2022.findings.xml | 2 +- data/xml/2023.acl.xml | 2 +- data/yaml/name_variants.yaml | 5 +++++ 4 files changed, 8 insertions(+), 3 deletions(-) diff --git a/data/xml/2022.emnlp.xml b/data/xml/2022.emnlp.xml index 903a6ec7d6..5f6ea5d0bd 100644 --- a/data/xml/2022.emnlp.xml +++ b/data/xml/2022.emnlp.xml @@ -11465,7 +11465,7 @@ MinghuiQiuAlibaba Group TaolinZhangEast China Normal University TingtingLiuEast China Normal University - LeiLiEast China Normal University + LeiLiEast China Normal University JianingWangEast China Normal University MingWangAlibaba Group JunHuangAlibaba Group diff --git a/data/xml/2022.findings.xml b/data/xml/2022.findings.xml index 18f3de968a..8a4431f5dc 100644 --- a/data/xml/2022.findings.xml +++ b/data/xml/2022.findings.xml @@ -8908,7 +8908,7 @@ TingtingLiuEast China Normal University ChengyuWangAlibaba Group XiangruZhuFudan University - LeiLiEast China Normal University + LeiLiEast China Normal University MinghuiQiuAlibaba Group JunHuangalibaba group MingGaoEast China Normal University diff --git a/data/xml/2023.acl.xml b/data/xml/2023.acl.xml index 22590a6adf..2f960a6715 100644 --- a/data/xml/2023.acl.xml +++ b/data/xml/2023.acl.xml @@ -16901,7 +16901,7 @@ <fixed-case>F</fixed-case>ashion<fixed-case>KLIP</fixed-case>: Enhancing <fixed-case>E</fixed-case>-Commerce Image-Text Retrieval with Fashion Multi-Modal Conceptual Knowledge Graph XiaodanWangFudan University ChengyuWangAlibaba Group - LeiLiEast China Normal University + LeiLiEast China Normal University ZhixuLiFudan University BenChenAlibaba Group LinboJinAlibaba diff --git 
a/data/yaml/name_variants.yaml b/data/yaml/name_variants.yaml index 2ff2f26a76..83b0351cb3 100644 --- a/data/yaml/name_variants.yaml +++ b/data/yaml/name_variants.yaml @@ -5771,6 +5771,11 @@ orcid: 0000-0001-5660-0409 comment: Renmin University institution: Renmin University +- canonical: {first: Lei, last: Li} + id: lei-li-ecnu + orcid: 0000-0002-8891-1786 + comment: ECNU + institution: East China Normal University - canonical: {first: Shih-Min, last: Li} variants: - {first: Shi-Min, last: Li} From 573b6132b9715ae06e0714757481876acfac62e9 Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Fri, 7 Nov 2025 05:47:04 +0100 Subject: [PATCH 17/19] add new `lei-li-ucph` plus papers --- data/xml/2023.emnlp.xml | 2 +- data/xml/2023.findings.xml | 6 +++--- data/xml/2025.acl.xml | 2 +- data/xml/2025.findings.xml | 6 +++--- data/yaml/name_variants.yaml | 5 +++++ 5 files changed, 13 insertions(+), 8 deletions(-) diff --git a/data/xml/2023.emnlp.xml b/data/xml/2023.emnlp.xml index 1455cdd5b6..056292e4c9 100644 --- a/data/xml/2023.emnlp.xml +++ b/data/xml/2023.emnlp.xml @@ -4156,7 +4156,7 @@ Can We Edit Factual Knowledge by In-Context Learning? CeZheng - LeiLi + LeiLi QingxiuDong YuxuanFan ZhiyongWu diff --git a/data/xml/2023.findings.xml b/data/xml/2023.findings.xml index 08375137ea..99b00022b8 100644 --- a/data/xml/2023.findings.xml +++ b/data/xml/2023.findings.xml @@ -7044,7 +7044,7 @@ Communication Efficient Federated Learning for Multilingual Neural Machine Translation with Adapter YiLiuSchool of Computer Science, Peking University XiaohanBiPeking University - LeiLiPeking University + LeiLiPeking University SishuoChenCenter for Data Science, Peking University WenkaiYangPeking University XuSunPeking University @@ -10714,7 +10714,7 @@ Delving into the Openness of <fixed-case>CLIP</fixed-case> ShuhuaiRenPeking University - LeiLiPeking University + LeiLiPeking University XuanchengRenDAMO Academy, Alibaba Group GuangxiangZhaoShanghai AI lab XuSunPeking University @@ -16398,7 +16398,7 @@ <fixed-case>I</fixed-case>mage<fixed-case>N</fixed-case>et<fixed-case>VC</fixed-case>: Zero- and Few-Shot Visual Commonsense Evaluation on 1000 <fixed-case>I</fixed-case>mage<fixed-case>N</fixed-case>et Categories HemingXia QingxiuDong - LeiLi + LeiLi JingjingXu TianyuLiu ZiweiQin diff --git a/data/xml/2025.acl.xml b/data/xml/2025.acl.xml index 996fc761d3..f0faf01d79 100644 --- a/data/xml/2025.acl.xml +++ b/data/xml/2025.acl.xml @@ -11968,7 +11968,7 @@ TianfangZhangTsinghua University ZongkaiWu Jenq-NengHwang - LeiLi + LeiLi 16780-16790 Large Language Models (LLMs) have demonstrated impressive capabilities in reasoning tasks, yet their reliance on static prompt structures and limited adaptability to complex scenarios remains a major challenge. In this paper, we propose the **Deductive and Inductive (DID)** method, a novel framework that enhances LLM reasoning by dynamically integrating both deductive and inductive reasoning approaches. Drawing from cognitive science principles, DID implements a dual-metric complexity evaluation system that combines Littlestone dimension and information entropy to precisely assess task difficulty and guide decomposition strategies. DID enables the model to progressively adapt its reasoning pathways based on problem complexity, mirroring human cognitive processes. We evaluate DID’s effectiveness across multiple benchmarks, including the AIW, MR-GSM8K, and our custom Holiday Puzzle dataset for temporal reasoning. 
Our results demonstrate great improvements in reasoning quality and solution accuracy - achieving 70.3% accuracy on AIW (compared to 62.2% for Tree of Thought), while maintaining lower computational costs. 2025.acl-long.820 diff --git a/data/xml/2025.findings.xml b/data/xml/2025.findings.xml index 49010d3c3c..d6309fe740 100644 --- a/data/xml/2025.findings.xml +++ b/data/xml/2025.findings.xml @@ -13785,7 +13785,7 @@ ZongkaiWu JohnLeeUniversity of Edinburgh, University of Edinburgh Jenq-NengHwang - LeiLi + LeiLi 10045-10056 In the rapidly evolving field of image generation, achieving precise control over generated content and maintaining semantic consistency remain significant limitations, particularly concerning grounding techniques and the necessity for model fine-tuning. To address these challenges, we propose BayesGenie, an off-the-shelf approach that integrates Large Language Models (LLMs) with Bayesian Optimization to facilitate precise and user-friendly image editing. Our method enables users to modify images through natural language descriptions without manual area marking, while preserving the original image’s semantic integrity. Unlike existing techniques that require extensive pre-training or fine-tuning, our approach demonstrates remarkable adaptability across various LLMs through its model-agnostic design. BayesGenie employs an adapted Bayesian optimization strategy to automatically refine the inference process parameters, achieving high-precision image editing with minimal user intervention. Through extensive experiments across diverse scenarios, we demonstrate that our framework outperforms existing methods in both editing accuracy and semantic preservation, as validated using different LLMs including Claude3 and GPT-4. 2025.findings-acl.523 @@ -27104,7 +27104,7 @@ JinyuanXu XueHe Jenq-NengHwang - LeiLi + LeiLi 1736-1750 Understanding the decision-making processes of large language models (LLMs) is essential for their trustworthy development and deployment, however, current interpretability methods often face challenges such as low resolution and high computational cost. To address these limitations, we propose the Multi-Layer Attention Consistency Score (MACS), a novel, lightweight, and easily deployable heuristic for estimating the importance of input tokens in decoder-based models. MACS measures contributions of input tokens based on the consistency of maximal attention. Empirical evaluations demonstrate that MACS achieves a favorable trade-off between interpretability quality and computational efficiency, showing faithfulness comparable to complex techniques with a 22% decrease in VRAM usage and 30% reduction in latency. 2025.findings-emnlp.91 @@ -28380,7 +28380,7 @@ XinglinZhangMedical Image Insights TaoChenUniversity of Waterloo Jenq-NengHwang - LeiLi + LeiLi 3456-3467 Contrast-enhanced 3D Medical imaging (e.g., CT, MRI) leverages phase sequences to uncover temporal dynamics vital for diagnosing tumors, lesions, and vascular issues. However, current retrieval models primarily focus on spatial features, neglecting phase-specific progression detailed in clinical reports. We present the **Phase-aware Memory Network (PAMN)**, a novel framework enhancing 3D medical image retrieval by fusing imaging phases with diagnostic text. PAMN creates rich radiological representations that enhance diagnostic accuracy by combining image details with clinical report context, rigorously tested on a novel phase-series dataset of 12,230 hospital CT scans. 
PAMN achieves an effective balance of performance and scalability in 3D radiology retrieval, outperforming state-of-the-art baselines through the robust fusion of spatial, temporal, and textual information. 2025.findings-emnlp.184 diff --git a/data/yaml/name_variants.yaml b/data/yaml/name_variants.yaml index 83b0351cb3..6ca575802e 100644 --- a/data/yaml/name_variants.yaml +++ b/data/yaml/name_variants.yaml @@ -5776,6 +5776,11 @@ orcid: 0000-0002-8891-1786 comment: ECNU institution: East China Normal University +- canonical: {first: Lei, last: Li} + id: lei-li-ucph + orcid: 0000-0002-2929-0828 + comment: University of Copenhagen + institution: University of Copenhagen - canonical: {first: Shih-Min, last: Li} variants: - {first: Shi-Min, last: Li} From 5f19668fbc1e4db547aff1aa360120849b2ea677 Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Fri, 7 Nov 2025 06:00:26 +0100 Subject: [PATCH 18/19] forgot 1 letter --- data/xml/2025.acl.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/data/xml/2025.acl.xml b/data/xml/2025.acl.xml index f0faf01d79..b6c5557d7a 100644 --- a/data/xml/2025.acl.xml +++ b/data/xml/2025.acl.xml @@ -11968,7 +11968,7 @@ TianfangZhangTsinghua University ZongkaiWu Jenq-NengHwang - LeiLi + LeiLi 16780-16790 Large Language Models (LLMs) have demonstrated impressive capabilities in reasoning tasks, yet their reliance on static prompt structures and limited adaptability to complex scenarios remains a major challenge. In this paper, we propose the **Deductive and Inductive (DID)** method, a novel framework that enhances LLM reasoning by dynamically integrating both deductive and inductive reasoning approaches. Drawing from cognitive science principles, DID implements a dual-metric complexity evaluation system that combines Littlestone dimension and information entropy to precisely assess task difficulty and guide decomposition strategies. DID enables the model to progressively adapt its reasoning pathways based on problem complexity, mirroring human cognitive processes. We evaluate DID’s effectiveness across multiple benchmarks, including the AIW, MR-GSM8K, and our custom Holiday Puzzle dataset for temporal reasoning. Our results demonstrate great improvements in reasoning quality and solution accuracy - achieving 70.3% accuracy on AIW (compared to 62.2% for Tree of Thought), while maintaining lower computational costs. 2025.acl-long.820 From edb7a0155ce94b1b28f9bf02e6fb0d3555dec9cb Mon Sep 17 00:00:00 2001 From: weissenh <50957092+weissenh@users.noreply.github.com> Date: Fri, 7 Nov 2025 06:00:26 +0100 Subject: [PATCH 19/19] forgot 1 letter --- data/xml/2025.acl.xml | 2 +- data/xml/2025.findings.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/data/xml/2025.acl.xml b/data/xml/2025.acl.xml index f0faf01d79..b6c5557d7a 100644 --- a/data/xml/2025.acl.xml +++ b/data/xml/2025.acl.xml @@ -11968,7 +11968,7 @@ TianfangZhangTsinghua University ZongkaiWu Jenq-NengHwang - LeiLi + LeiLi 16780-16790 Large Language Models (LLMs) have demonstrated impressive capabilities in reasoning tasks, yet their reliance on static prompt structures and limited adaptability to complex scenarios remains a major challenge. In this paper, we propose the **Deductive and Inductive (DID)** method, a novel framework that enhances LLM reasoning by dynamically integrating both deductive and inductive reasoning approaches. 
Drawing from cognitive science principles, DID implements a dual-metric complexity evaluation system that combines Littlestone dimension and information entropy to precisely assess task difficulty and guide decomposition strategies. DID enables the model to progressively adapt its reasoning pathways based on problem complexity, mirroring human cognitive processes. We evaluate DID’s effectiveness across multiple benchmarks, including the AIW, MR-GSM8K, and our custom Holiday Puzzle dataset for temporal reasoning. Our results demonstrate great improvements in reasoning quality and solution accuracy - achieving 70.3% accuracy on AIW (compared to 62.2% for Tree of Thought), while maintaining lower computational costs. 2025.acl-long.820 diff --git a/data/xml/2025.findings.xml b/data/xml/2025.findings.xml index d6309fe740..0ba8e26440 100644 --- a/data/xml/2025.findings.xml +++ b/data/xml/2025.findings.xml @@ -27104,7 +27104,7 @@ JinyuanXu XueHe Jenq-NengHwang - LeiLi + LeiLi 1736-1750 Understanding the decision-making processes of large language models (LLMs) is essential for their trustworthy development and deployment, however, current interpretability methods often face challenges such as low resolution and high computational cost. To address these limitations, we propose the Multi-Layer Attention Consistency Score (MACS), a novel, lightweight, and easily deployable heuristic for estimating the importance of input tokens in decoder-based models. MACS measures contributions of input tokens based on the consistency of maximal attention. Empirical evaluations demonstrate that MACS achieves a favorable trade-off between interpretability quality and computational efficiency, showing faithfulness comparable to complex techniques with a 22% decrease in VRAM usage and 30% reduction in latency. 2025.findings-emnlp.91