diff --git a/data/xml/2020.acl.xml b/data/xml/2020.acl.xml
index f7e4e32899..7b0399e1a1 100644
--- a/data/xml/2020.acl.xml
+++ b/data/xml/2020.acl.xml
@@ -4234,7 +4234,7 @@
NingMiao
YuxuanSong
HaoZhou
- LeiLi
+ LeiLi
3436–3441
It has been a common approach to pre-train a language model on a large corpus and fine-tune it on task-specific data. In practice, we observe that fine-tuning a pre-trained model on a small dataset may lead to over- and/or under-estimation problems. In this paper, we propose MC-Tailor, a novel method to alleviate the above issue in text generation tasks by truncating and transferring the probability mass from over-estimated regions to under-estimated ones. Experiments on a variety of text generation datasets show that MC-Tailor consistently and significantly outperforms the fine-tuning approach.
2020.acl-main.314
@@ -10481,7 +10481,7 @@
XijinZhang
SongchengJiang
YuxuanWang
- LeiLi
+ LeiLi
1–8
This paper proposes the building of Xiaomingbot, an intelligent, multilingual and multimodal software robot equipped with four integral capabilities: news generation, news translation, news reading and avatar animation. Its system summarizes Chinese news that it automatically generates from data tables. Next, it translates the summary or the full article into multiple languages, and reads the multilingual rendition through synthesized speech. Notably, Xiaomingbot utilizes a voice cloning technology to synthesize the speech trained from a real person’s voice data in one input language. The proposed system enjoys several merits: it has an animated avatar, and is able to generate and read multilingual news. Since it was put into practice, Xiaomingbot has written over 600,000 articles, and gained over 150,000 followers on social media platforms.
2020.acl-demos.1
diff --git a/data/xml/2020.emnlp.xml b/data/xml/2020.emnlp.xml
index 1bf9ab29c4..55e6af0b21 100644
--- a/data/xml/2020.emnlp.xml
+++ b/data/xml/2020.emnlp.xml
@@ -1707,7 +1707,7 @@
ShuangZeng
RunxinXu
BaobaoChang
- LeiLi
+ LeiLi
1630–1640
Document-level relation extraction aims to extract relations among entities within a document. Different from sentence-level relation extraction, it requires reasoning over multiple sentences across paragraphs. In this paper, we propose Graph Aggregation-and-Inference Network (GAIN), a method to recognize such relations for long paragraphs. GAIN constructs two graphs, a heterogeneous mention-level graph (MG) and an entity-level graph (EG). The former captures complex interaction among different mentions and the latter aggregates mentions underlying the same entities. Based on the two graphs, we propose a novel path reasoning mechanism to infer relations between entities. Experiments on the public dataset, DocRED, show GAIN achieves a significant performance improvement (2.85 on F1) over the previous state-of-the-art. Our code is available at https://github.com/PKUnlp-icler/GAIN.
2020.emnlp-main.127
@@ -2836,7 +2836,7 @@
XipengQiu
JiangtaoFeng
HaoZhou
- LeiLi
+ LeiLi
2649–2663
We investigate the following question for machine translation (MT): can we develop a single universal MT model to serve as the common seed and obtain derivative and improved models on arbitrary language pairs? We propose mRASP, an approach to pre-train a universal multilingual neural machine translation model. Our key idea in mRASP is its novel technique of random aligned substitution, which brings words and phrases with similar meanings across multiple languages closer in the representation space. We pre-train a mRASP model on 32 language pairs jointly with only public datasets. The model is then fine-tuned on downstream language pairs to obtain specialized MT models. We carry out extensive experiments on 42 translation directions across diverse settings, including low-, medium-, and rich-resource pairs, as well as transferring to exotic language pairs. Experimental results demonstrate that mRASP achieves significant performance improvement compared to directly training on those target pairs. It is the first time to verify that multiple low-resource language pairs can be utilized to improve rich-resource MT. Surprisingly, mRASP is even able to improve the translation quality on exotic languages that never occur in the pre-training corpus. Code, data, and pre-trained models are available at https://github.com/linzehui/mRASP.
2020.emnlp-main.210
@@ -9842,7 +9842,7 @@
JunxianHe
MingxuanWang
YimingYang
- LeiLi
+ LeiLi
9119–9130
Pre-trained contextual representations like BERT have achieved great success in natural language processing. However, the sentence embeddings from the pre-trained language models without fine-tuning have been found to poorly capture semantic meaning of sentences. In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited. We first reveal the connection between the masked language model pre-training objective and the semantic similarity task theoretically, and then analyze the BERT sentence embeddings empirically. We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance on semantic similarity tasks. To address this issue, we propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective. Experimental results show that our proposed BERT-flow method obtains significant performance gains over the state-of-the-art sentence embeddings on a variety of semantic textual similarity tasks. The code is available at https://github.com/bohanli/BERT-flow.
2020.emnlp-main.733
diff --git a/data/xml/2020.findings.xml b/data/xml/2020.findings.xml
index 2382fc6635..63b5c7021e 100644
--- a/data/xml/2020.findings.xml
+++ b/data/xml/2020.findings.xml
@@ -1465,7 +1465,7 @@
Language Generation via Combinatorial Constraint Satisfaction: A Tree Search Enhanced Monte-Carlo Approach
MaosenZhang
NanJiang
- LeiLi
+ LeiLi
YexiangXue
1286–1298
Generating natural language under complex constraints is a principled formulation towards controllable text generation. We present a framework to allow specification of combinatorial constraints for sentence generation. We propose TSMC, an efficient method to generate high likelihood sentences with respect to a pre-trained language model while satisfying the constraints. Our approach is highly flexible, requires no task-specific training, and leverages efficient constraint satisfaction solving techniques. To better handle the combinatorial constraints, a tree search algorithm is embedded into the proposal process of the Markov Chain Monte Carlo (MCMC) to explore candidates that satisfy more constraints. Compared to existing MCMC approaches, our sampling approach has a better mixing performance. Experiments show that TSMC achieves consistent and significant improvement on multiple language generation tasks.
@@ -5726,7 +5726,7 @@
MingxuanWang
WeinanZhang
YongYu
- LeiLi
+ LeiLi
4908–4917
Active learning for sentence understanding aims at discovering informative unlabeled data for annotation and therefore reducing the demand for labeled data. We argue that the typical uncertainty sampling method for active learning is time-consuming and can hardly work in real-time, which may lead to ineffective sample selection. We propose adversarial uncertainty sampling in discrete space (AUSDS) to retrieve informative unlabeled samples more efficiently. AUSDS maps sentences into a latent space generated by the popular pre-trained language models, and discovers informative unlabeled text samples for annotation via adversarial attack. The proposed approach is extremely efficient compared with traditional uncertainty sampling, with more than 10x speedup. Experimental results on five datasets show that AUSDS outperforms strong baselines on effectiveness.
2020.findings-emnlp.441
diff --git a/data/xml/2020.fnp.xml b/data/xml/2020.fnp.xml
index 3752fcbd16..78b5a1ee7f 100644
--- a/data/xml/2020.fnp.xml
+++ b/data/xml/2020.fnp.xml
@@ -194,7 +194,7 @@
Extractive Financial Narrative Summarisation based on DPPs
- LeiLi
+ LeiLi
YafeiJiang
YinanLiu
100–104
diff --git a/data/xml/2020.sdp.xml b/data/xml/2020.sdp.xml
index db6adb6692..2e3e41d958 100644
--- a/data/xml/2020.sdp.xml
+++ b/data/xml/2020.sdp.xml
@@ -349,7 +349,7 @@
CIST@CL-SciSumm 2020, LongSumm 2020: Automatic Scientific Document Summarization
- LeiLi
+ LeiLi
YangXie
WeiLiu
YinanLiu
diff --git a/data/xml/2020.wmt.xml b/data/xml/2020.wmt.xml
index 9613f9566f..c516c6e1fe 100644
--- a/data/xml/2020.wmt.xml
+++ b/data/xml/2020.wmt.xml
@@ -471,7 +471,7 @@
ZehuiLin
YaomingZhu
MingxuanWang
- LeiLi
+ LeiLi
305–312
This paper describes our VolcTrans submission systems for the WMT20 shared news translation task. We participated in 8 translation directions. Our basic systems are based on Transformer (CITATION), on top of which we also employed new architectures (bigger or deeper Transformers, dynamic convolution). The final systems include text pre-processing, subword segmentation (a.k.a. BPE (CITATION)), baseline model training, iterative back-translation, model ensemble, knowledge distillation and multilingual pre-training.
2020.wmt-1.33
@@ -1443,7 +1443,7 @@
ZhuoZhi
JunCao
MingxuanWang
- LeiLi
+ LeiLi
985–990
In this paper, we describe our submissions to the WMT20 shared task on parallel corpus filtering and alignment for low-resource conditions. The task requires the participants to align potential parallel sentence pairs out of the given document pairs, and score them so that low-quality pairs can be filtered. Our system, Volctrans, is made of two modules, i.e., a mining module and a scoring module. Based on the word alignment model, the mining module adopts an iterative mining strategy to extract latent parallel sentences. In the scoring module, an XLM-based scorer provides scores, followed by reranking mechanisms and ensemble. Our submissions outperform the baseline by 3.x/2.x and 2.x/2.x for km-en and ps-en on From Scratch/Fine-Tune conditions.
2020.wmt-1.112
diff --git a/data/xml/2021.acl.xml b/data/xml/2021.acl.xml
index f6dfd817ee..5d70b7cf8c 100644
--- a/data/xml/2021.acl.xml
+++ b/data/xml/2021.acl.xml
@@ -284,7 +284,7 @@
ChangzhiSun
YuanbinWu
HaoZhou
- LeiLi
+ LeiLi
JunchiYan
220–231
Many joint entity relation extraction models set up two separate label spaces for the two sub-tasks (i.e., entity detection and relation classification). We argue that this setting may hinder the information interaction between entities and relations. In this work, we propose to eliminate the different treatment on the two sub-tasks’ label spaces. The input of our model is a table containing all word pairs from a sentence. Entities and relations are represented by squares and rectangles in the table. We apply a unified classifier to predict each cell’s label, which unifies the learning of two sub-tasks. For testing, an effective (yet fast) approximate decoder is proposed for finding squares and rectangles from tables. Experiments on three benchmarks (ACE04, ACE05, SciERC) show that, using only half the number of parameters, our model achieves competitive accuracy with the best extractor, and is faster.
@@ -315,7 +315,7 @@
XiaoPan
MingxuanWang
LiweiWu
- LeiLi
+ LeiLi
244–258
Existing multilingual machine translation approaches mainly focus on English-centric directions, while the non-English directions still lag behind. In this work, we aim to build a many-to-many translation system with an emphasis on the quality of non-English language directions. Our intuition is based on the hypothesis that a universal cross-language representation leads to better multilingual translation performance. To this end, we propose mRASP2, a training method to obtain a single unified multilingual translation model. mRASP2 is empowered by two techniques: a) a contrastive learning scheme to close the gap among representations of different languages, and b) data augmentation on both multiple parallel and monolingual data to further align token representations. For English-centric directions, mRASP2 achieves competitive or even better performance than a strong pre-trained model mBART on tens of WMT benchmarks. For non-English directions, mRASP2 achieves an average improvement of 10+ BLEU compared with the multilingual baseline.
2021.acl-long.21
@@ -364,7 +364,7 @@
ZehuiLin
LiweiWu
MingxuanWang
- LeiLi
+ LeiLi
293–305
Multilingual neural machine translation aims at learning a single translation model for multiple languages. These jointly trained models often suffer from performance degradation on rich-resource language pairs. We attribute this degeneration to parameter interference. In this paper, we propose LaSS to jointly train a single unified multilingual MT model. LaSS learns a Language Specific Sub-network (LaSS) for each language pair to counter parameter interference. Comprehensive experiments on IWSLT and WMT datasets with various Transformer architectures show that LaSS obtains gains on 36 language pairs by up to 1.2 BLEU. Besides, LaSS shows strong generalization performance via easy adaptation to new language pairs and zero-shot translation. LaSS boosts zero-shot translation with an average of 8.3 BLEU on 30 language pairs. Codes and trained models are available at https://github.com/NLP-Playground/LaSS.
2021.acl-long.25
@@ -2163,7 +2163,7 @@
LinQiu
WeinanZhang
YongYu
- LeiLi
+ LeiLi
1993–2003
Recent work on non-autoregressive neural machine translation (NAT) aims at improving the efficiency by parallel decoding without sacrificing the quality. However, existing NAT methods are either inferior to Transformer or require multiple decoding passes, leading to reduced speedup. We propose the Glancing Language Model (GLM) for single-pass parallel generation models. With GLM, we develop Glancing Transformer (GLAT) for machine translation. With only single-pass parallel decoding, GLAT is able to generate high-quality translation with 8×-15× speedup. Note that GLAT does not modify the network architecture; it is a training method to learn word interdependency. Experiments on multiple WMT language directions show that GLAT outperforms all previous single-pass non-autoregressive methods, and is nearly comparable to Transformer, reducing the gap to 0.25-0.9 BLEU points.
2021.acl-long.155
@@ -3869,7 +3869,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO
Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker
RunxinXu
TianyuLiu
- LeiLi
+ LeiLi
BaobaoChang
3533–3546
Document-level event extraction aims to recognize event information from a whole piece of article. Existing methods are not effective due to two challenges of this task: a) the target event arguments are scattered across sentences; b) the correlation among events in a document is non-trivial to model. In this paper, we propose Heterogeneous Graph-based Interaction Model with a Tracker (GIT) to solve the aforementioned two challenges. For the first challenge, GIT constructs a heterogeneous graph interaction network to capture global interactions among different sentences and entity mentions. For the second, GIT introduces a Tracker module to track the extracted events and hence capture the interdependency among the events. Experiments on a large-scale dataset (Zheng et al., 2019) show GIT outperforms the previous methods by 2.8 F1. Further analysis reveals GIT is effective in extracting multiple correlated events and event arguments that scatter across the document.
@@ -5370,7 +5370,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO
Personalized Transformer for Explainable Recommendation
- LeiLi
+ LeiLi
YongfengZhang
LiChen
4947–4957
@@ -7997,7 +7997,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO
HaoZhou
ChunGan
ZaixiangZheng
- LeiLi
+ LeiLi
7361–7373
The choice of token vocabulary affects the performance of machine translation. This paper aims to figure out what is a good vocabulary and whether we can find the optimal vocabulary without trial training. To answer these questions, we first provide an alternative understanding of vocabulary from the perspective of information theory. It motivates us to formulate the quest of vocabularization – finding the best token dictionary with a proper size – as an optimal transport (OT) problem. We propose VOLT, a simple and efficient solution without trial training. Empirical results show that VOLT beats widely-used vocabularies in diverse scenarios, including WMT-14 English-German translation, TED bilingual translation, and TED multilingual translation. For example, VOLT achieves 70% vocabulary size reduction and 0.5 BLEU gain on English-German translation. Also, compared to BPE-search, VOLT reduces the search time from 384 GPU hours to 30 GPU hours on English-German translation. Codes are available at https://github.com/Jingjing-NLP/VOLT.
2021.acl-long.571
@@ -10453,7 +10453,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO
MingxuanWang
QianqianDong
RongYe
- LeiLi
+ LeiLi
55–62
NeurST is an open-source toolkit for neural speech translation. The toolkit mainly focuses on end-to-end speech translation, which is easy to use, modify, and extend to advanced speech translation research and products. NeurST aims at facilitating the speech translation research for NLP researchers and building reliable benchmarks for this field. It provides step-by-step recipes for feature extraction, data preprocessing, distributed training, and evaluation. In this paper, we will introduce the framework design of NeurST and show experimental results for different benchmark datasets, which can be regarded as reliable baselines for future research. The toolkit is publicly available at https://github.com/bytedance/neurst and we will continuously update the performance of NeurST with other counterparts and studies at https://st-benchmark.github.io/.
2021.acl-demo.7
@@ -11081,7 +11081,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO
Pre-training Methods for Neural Machine Translation
MingxuanWang
- LeiLi
+ LeiLi
21–25
This tutorial provides a comprehensive guide to make the most of pre-training for neural machine translation. Firstly, we will briefly introduce the background of NMT, pre-training methodology, and point out the main challenges when applying pre-training for NMT. Then we will focus on analysing the role of pre-training in enhancing the performance of NMT, how to design a better pre-training model for executing specific NMT tasks and how to better integrate the pre-trained model into the NMT system. In each part, we will provide examples, discuss training techniques and analyse what is transferred when applying pre-training.
2021.acl-tutorials.4
diff --git a/data/xml/2021.eacl.xml b/data/xml/2021.eacl.xml
index c48bdfa284..c6ad527d06 100644
--- a/data/xml/2021.eacl.xml
+++ b/data/xml/2021.eacl.xml
@@ -3008,7 +3008,7 @@
ChangzhiSun
YuanbinWu
HaoZhou
- LeiLi
+ LeiLi
JunchiYan
2877–2887
Current state-of-the-art systems for joint entity relation extraction (Luan et al., 2019; Wadden et al., 2019) usually adopt the multi-task learning framework. However, annotations for these additional tasks such as coreference resolution and event extraction are always equally hard (or even harder) to obtain. In this work, we propose a pre-training method ENPAR to improve the joint extraction performance. ENPAR requires only the additional entity annotations that are much easier to collect. Unlike most existing works that only consider incorporating entity information into the sentence encoder, we further utilize the entity pair information. Specifically, we devise four novel objectives, i.e., masked entity typing, masked entity prediction, adversarial context discrimination, and permutation prediction, to pre-train an entity encoder and an entity pair encoder. Comprehensive experiments show that the proposed pre-training method achieves significant improvement over BERT on ACE05, SciERC, and NYT, and outperforms current state-of-the-art on ACE05.
diff --git a/data/xml/2021.emnlp.xml b/data/xml/2021.emnlp.xml
index e7faac1620..8165009dff 100644
--- a/data/xml/2021.emnlp.xml
+++ b/data/xml/2021.emnlp.xml
@@ -432,7 +432,7 @@
Dynamic Knowledge Distillation for Pre-trained Language Models
- LeiLi
+ LeiLi
YankaiLin
ShuhuaiRen
PengLi
@@ -1301,7 +1301,7 @@
HaoZhou
WeinanZhang
YongYu
- LeiLi
+ LeiLi
1239–1250
Document-level relation extraction aims to identify relations between entities in a whole document. Prior efforts to capture long-range dependencies have relied heavily on implicitly powerful representations learned through (graph) neural networks, which makes the model less transparent. To tackle this challenge, in this paper, we propose LogiRE, a novel probabilistic model for document-level relation extraction by learning logic rules. LogiRE treats logic rules as latent variables and consists of two modules: a rule generator and a relation extractor. The rule generator is to generate logic rules potentially contributing to final predictions, and the relation extractor outputs final predictions based on the generated logic rules. Those two modules can be efficiently optimized with the expectation-maximization (EM) algorithm. By introducing logic rules into neural networks, LogiRE can explicitly capture long-range dependencies as well as enjoy better interpretation. Empirical results show that LogiRE significantly outperforms several strong baselines in terms of relation performance and logical consistency. Our code is available at https://github.com/rudongyu/LogiRE.
2021.emnlp-main.95
@@ -4705,7 +4705,7 @@
ZhiyuanZeng
JiazeChen
WeiranXu
- LeiLi
+ LeiLi
4102–4108
Neural abstractive summarization systems have gained significant progress in recent years. However, abstractive summarization often produces inconsistent statements or false facts. How can we automatically generate highly abstract yet factually correct summaries? In this paper, we propose an efficient weakly-supervised adversarial data augmentation approach to form the factual consistency dataset. Based on the artificial dataset, we train an evaluation model that can not only make accurate and robust factual consistency discrimination but is also capable of interpretable factual error tracing via backpropagated gradient distribution on token embeddings. Experiments and analysis conducted on public annotated summarization and factual consistency datasets demonstrate that our approach is effective and reasonable.
2021.emnlp-main.337
@@ -7934,7 +7934,7 @@
JunCao
ShanboCheng
ShujianHuang
- LeiLi
+ LeiLi
7280–7290
How to effectively adapt neural machine translation (NMT) models according to emerging cases without retraining? Despite the great success of neural machine translation, updating the deployed models online remains a challenge. Existing non-parametric approaches that retrieve similar examples from a database to guide the translation process are promising but are prone to overfit the retrieved examples. In this work, we propose to learn Kernel-Smoothed Translation with Example Retrieval (KSTER), an effective approach to adapt neural machine translation models online. Experiments on domain adaptation and multi-domain machine translation datasets show that even without expensive retraining, KSTER is able to achieve improvement of 1.1 to 1.5 BLEU scores over the best existing online adaptation methods. The code and trained models are released at https://github.com/jiangqn/KSTER.
2021.emnlp-main.579
@@ -9717,7 +9717,7 @@
Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification
ShuhuaiRen
JinchaoZhang
- LeiLi
+ LeiLi
XuSun
JieZhou
9029–9043
diff --git a/data/xml/2021.findings.xml b/data/xml/2021.findings.xml
index 5b0b476fb0..809a2ccb92 100644
--- a/data/xml/2021.findings.xml
+++ b/data/xml/2021.findings.xml
@@ -925,7 +925,7 @@
UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction
HuanqinWu
WeiLiu
- LeiLi
+ LeiLi
DanNie
TaoChen
FengZhang
@@ -2444,7 +2444,7 @@
ChiHan
MingxuanWang
HengJi
- LeiLi
+ LeiLi
2214–2225
2021.findings-acl.195
10.18653/v1/2021.findings-acl.195
@@ -3026,7 +3026,7 @@
JiazeChen
HaoZhou
XipengQiu
- LeiLi
+ LeiLi
2739–2750
2021.findings-acl.242
10.18653/v1/2021.findings-acl.242
@@ -3300,7 +3300,7 @@
LiweiWu
ShanboCheng
MingxuanWang
- LeiLi
+ LeiLi
3001–3007
2021.findings-acl.264
10.18653/v1/2021.findings-acl.264
@@ -3464,7 +3464,7 @@
YuanbinWu
JiazeChen
HaoZhou
- LeiLi
+ LeiLi
3140–3151
2021.findings-acl.277
10.18653/v1/2021.findings-acl.277
@@ -6240,7 +6240,7 @@
CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade
- LeiLi
+ LeiLi
YankaiLin
DeliChen
ShuhuaiRen
@@ -6725,7 +6725,7 @@
Leveraging Word-Formation Knowledge for Chinese Word Sense Disambiguation
HuaZheng
- LeiLi
+ LeiLi
DamaiDai
DeliChen
TianyuLiu
@@ -8770,7 +8770,7 @@
Multilingual Translation via Grafting Pre-trained Language Models
ZeweiSun
MingxuanWang
- LeiLi
+ LeiLi
2735–2747
Can pre-trained BERT for one language and GPT for another be glued together to translate texts? Self-supervised training using only monolingual data has led to the success of pre-trained (masked) language models in many NLP tasks. However, directly connecting BERT as an encoder and GPT as a decoder can be challenging in machine translation, for GPT-like models lack a cross-attention component that is needed in seq2seq decoders. In this paper, we propose Graformer to graft separately pre-trained (masked) language models for machine translation. With monolingual data for pre-training and parallel data for grafting training, we take maximal advantage of both types of data. Experiments on 60 directions show that our method achieves average improvements of 5.8 BLEU in x2en and 2.9 BLEU in en2x directions compared with the multilingual Transformer of the same size.
2021.findings-emnlp.233
@@ -8864,7 +8864,7 @@
JiangtaoFeng
ChengqiZhao
MingxuanWang
- LeiLi
+ LeiLi
2812–2823
Developing a unified multilingual model has been a long-pursued goal for machine translation. However, existing approaches suffer from performance degradation - a single multilingual model is inferior to separately trained bilingual ones on rich-resource languages. We conjecture that such a phenomenon is due to interference brought by joint training with multiple languages. To accommodate the issue, we propose CIAT, an adapted Transformer model with a small parameter overhead for multilingual machine translation. We evaluate CIAT on multiple benchmark datasets, including IWSLT, OPUS-100, and WMT. Experiments show that CIAT consistently outperforms strong multilingual baselines on 64 of the 66 language directions, 42 of which see above 0.5 BLEU improvement.
2021.findings-emnlp.240
@@ -10963,7 +10963,7 @@
TaoWang
ChengqiZhao
MingxuanWang
- LeiLi
+ LeiLi
HangLi
DeyiXiong
4639–4644
diff --git a/data/xml/2021.iwslt.xml b/data/xml/2021.iwslt.xml
index e4e7f0c7c5..894656ce75 100644
--- a/data/xml/2021.iwslt.xml
+++ b/data/xml/2021.iwslt.xml
@@ -110,7 +110,7 @@
RongYe
QianqianDong
JunCao
- LeiLi
+ LeiLi
64–74
This paper describes the systems submitted to IWSLT 2021 by the Volctrans team. We participate in the offline speech translation and text-to-text simultaneous translation tracks. For offline speech translation, our best end-to-end model achieves 7.9 BLEU improvements over the benchmark on the MuST-C test set and is even approaching the results of a strong cascade solution. For text-to-text simultaneous translation, we explore the best practice to optimize the wait-k model. As a result, our final submitted systems exceed the benchmark by around 7 BLEU under the same latency regime. We release our code and model to facilitate both future research works and industrial applications.
2021.iwslt-1.6
diff --git a/data/xml/2021.naacl.xml b/data/xml/2021.naacl.xml
index e10381b920..c896712ba5 100644
--- a/data/xml/2021.naacl.xml
+++ b/data/xml/2021.naacl.xml
@@ -2243,7 +2243,7 @@
Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models
WenkaiYang
- LeiLi
+ LeiLi
ZhiyuanZhang
XuanchengRen
XuSun
@@ -5884,7 +5884,7 @@
Decompose, Fuse and Generate: A Formation-Informed Method for Chinese Definition Generation
HuaZheng
DamaiDai
- LeiLi
+ LeiLi
TianyuLiu
ZhifangSui
BaobaoChang
@@ -6173,7 +6173,7 @@
Generative Imagination Elevates Machine Translation
QuanyuLong
MingxuanWang
- LeiLi
+ LeiLi
5738–5748
There are common semantics shared across text and images. Given a sentence in a source language, does depicting the visual scene help translation into a target language? Existing multimodal neural machine translation methods (MNMT) require triplets of bilingual sentence - image for training and tuples of source sentence - image for inference. In this paper, we propose ImagiT, a novel machine translation method via visual imagination. ImagiT first learns to generate visual representation from the source sentence, and then utilizes both the source sentence and the “imagined representation” to produce a target translation. Unlike previous methods, it only needs the source sentence at inference time. Experiments demonstrate that ImagiT benefits from visual imagination and significantly outperforms the text-only neural machine translation baselines. Further analysis reveals that the imagination process in ImagiT helps fill in missing information when performing the degradation strategy.
2021.naacl-main.457
@@ -7335,7 +7335,7 @@
MingxuanWang
HongxiaoBai
HaiZhao
- LeiLi
+ LeiLi
89–96
We propose to improve unsupervised neural machine translation with cross-lingual supervision (), which utilizes supervision signals from high resource language pairs to improve the translation of zero-source languages. Specifically, for training En-Ro system without parallel corpus, we can leverage the corpus from En-Fr and En-De to collectively train the translation from one language into many languages under one model. It is based on multilingual models which require no changes to the standard unsupervised NMT. Simple and effective, significantly improves the translation quality with a big margin in the benchmark unsupervised translation tasks, and even achieves comparable performance to supervised NMT. In particular, on WMT’14 -tasks achieves 37.6 and 35.18 BLEU score, which is very close to the large scale supervised setting and on WMT’16 -tasks achieves 35.09 BLEU score which is even better than the supervised Transformer baseline.
2021.naacl-industry.12
@@ -7361,7 +7361,7 @@
TaoWang
ChengqiZhao
MingxuanWang
- LeiLi
+ LeiLi
DeyiXiong
105–112
Automatic translation of dialogue texts is a much-needed capability in many real-life scenarios. However, currently existing neural machine translation systems deliver unsatisfying results. In this paper, we conduct a deep analysis of a dialogue corpus and summarize three major issues on dialogue translation, including pronoun dropping (), punctuation dropping (), and typos (). In response to these challenges, we propose a joint learning method to identify omission and typo, and utilize context to translate dialogue utterances. To properly evaluate the performance, we propose a manually annotated dataset with 1,931 Chinese-English parallel utterances from 300 dialogues as a benchmark testbed for dialogue translation. Our experiments show that the proposed method improves translation quality by 3.2 BLEU over the baselines. It also elevates the recovery rate of omitted pronouns from 26.09% to 47.16%. We will publish the code and dataset publicly at https://xxx.xx.
@@ -7376,7 +7376,7 @@
YingXiong
YangWei
MingxuanWang
- LeiLi
+ LeiLi
113–120
Transformer and its variants have achieved great success in natural language processing. Since Transformer models are huge in size, serving these models is a challenge for real industrial applications. In this paper, we propose , a highly efficient inference library for models in the Transformer family. includes a series of GPU optimization techniques to both streamline the computation of Transformer layers and reduce memory footprint. supports models trained using PyTorch and Tensorflow. Experimental results on standard machine translation benchmarks show that achieves up to 14x speedup compared with TensorFlow and 1.4x speedup compared with , a concurrent CUDA implementation. The code will be released publicly after the review.
2021.naacl-industry.15
diff --git a/data/xml/2021.wmt.xml b/data/xml/2021.wmt.xml
index 2df80ca98f..7633cb4f0d 100644
--- a/data/xml/2021.wmt.xml
+++ b/data/xml/2021.wmt.xml
@@ -259,7 +259,7 @@
ZehuiLin
JiangtaoFeng
ShanboCheng
- LeiLi
+ LeiLi
MingxuanWang
HaoZhou
187–196
diff --git a/data/xml/2022.aacl.xml b/data/xml/2022.aacl.xml
index 68f2393158..367920f172 100644
--- a/data/xml/2022.aacl.xml
+++ b/data/xml/2022.aacl.xml
@@ -553,7 +553,7 @@
SAPGraph: Structure-aware Extractive Summarization for Scientific Papers with Heterogeneous Graph
SiyaQi
- LeiLi
+ LeiLi
YiyangLi
JinJiang
DingxinHu
diff --git a/data/xml/2022.acl.xml b/data/xml/2022.acl.xml
index fd1624a921..985472c1c4 100644
--- a/data/xml/2022.acl.xml
+++ b/data/xml/2022.acl.xml
@@ -278,7 +278,7 @@
ShijieGeng
ZuohuiFu
YingqiangGe
- LeiLi
+ LeiLi
Gerardde Melo
YongfengZhang
244-255
@@ -707,7 +707,7 @@
QianDong
YaomingZhu
MingxuanWang
- LeiLi
+ LeiLi
680-694
How to find proper moments to generate partial sentence translation given a streaming speech input? Existing approaches waiting-and-translating for a fixed duration often break the acoustic units in speech, since the boundaries between acoustic units in speech are not even. In this paper, we propose MoSST, a simple yet effective method for translating streaming speech content. Given a usually long speech sequence, we develop an efficient monotonic segmentation module inside an encoder-decoder model to accumulate acoustic information incrementally and detect proper speech unit boundaries for the input in the speech translation task. Experiments on multiple translation directions of the MuST-C dataset show that MoSST outperforms existing methods and achieves the best trade-off between translation quality (BLEU) and latency. Our code is available at https://github.com/dqqcasia/mosst.
2022.acl-long.50
@@ -2657,7 +2657,7 @@
WangchunshuZhou
JingjingXu
HaoZhou
- LeiLi
+ LeiLi
2701-2714
Currently, masked language modeling (e.g., BERT) is the prime choice to learn contextualized representations. Due to its pervasiveness, it naturally raises an interesting question: how do masked language models (MLMs) learn contextual representations? In this work, we analyze the learning dynamics of MLMs and find that they adopt sampled embeddings as anchors to estimate and inject contextual semantics into representations, which limits the efficiency and effectiveness of MLMs. To address these problems, we propose TACO, a simple yet effective representation learning approach to directly model global semantics. To be specific, TACO extracts and aligns contextual semantics hidden in contextualized representations to encourage models to attend to global semantics when generating contextualized representations. Experiments on the GLUE benchmark show that TACO achieves up to 5x speedup and up to 1.2 points average improvement over MLM.
2022.acl-long.193
@@ -6668,7 +6668,7 @@ in the Case of Unambiguous Gender
STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation
QingkaiFang
RongYe
- LeiLi
+ LeiLi
YangFeng
MingxuanWang
7050-7062
@@ -7423,7 +7423,7 @@ in the Case of Unambiguous Gender
MoshaChen
ZhenBi
XiaozhuanLiang
- LeiLi
+ LeiLi
XinShang
KangpingYin
ChuanqiTan
@@ -7867,7 +7867,7 @@ in the Case of Unambiguous Gender
LihuaQian
XinyuDai
JiajunChen
- LeiLi
+ LeiLi
8398-8409
Recently, parallel text generation has received widespread attention due to its success in generation efficiency. Although many advanced techniques are proposed to improve its generation quality, they still need the help of an autoregressive model for training to overcome the one-to-many multi-modal phenomenon in the dataset, limiting their applications. In this paper, we propose GLAT, which employs the discrete latent variables to capture word categorical information and invoke an advanced curriculum learning technique, alleviating the multi-modality problem. Experiment results show that our method outperforms strong baselines without the help of an autoregressive model, which further broadens the application scenarios of the parallel decoding paradigm.
2022.acl-long.575
diff --git a/data/xml/2022.coling.xml b/data/xml/2022.coling.xml
index 2dbc4374c6..83cca481e5 100644
--- a/data/xml/2022.coling.xml
+++ b/data/xml/2022.coling.xml
@@ -2431,7 +2431,7 @@
LightNER: A Lightweight Tuning Paradigm for Low-resource NER via Pluggable Prompting
XiangChen
- LeiLi
+ LeiLi
ShuminDeng
ChuanqiTan
ChangliangXu
@@ -2759,7 +2759,7 @@
Augmenting Legal Judgment Prediction with Contrastive Case Relations
DugangLiu
WeihaoDu
- LeiLi
+ LeiLi
WeikePan
ZhongMing
2658–2667
diff --git a/data/xml/2022.emnlp.xml b/data/xml/2022.emnlp.xml
index 1a6a3b5d0d..5f6ea5d0bd 100644
--- a/data/xml/2022.emnlp.xml
+++ b/data/xml/2022.emnlp.xml
@@ -11465,7 +11465,7 @@
MinghuiQiuAlibaba Group
TaolinZhangEast China Normal University
TingtingLiuEast China Normal University
- LeiLiEast China Normal University
+ LeiLiEast China Normal University
JianingWangEast China Normal University
MingWangAlibaba Group
JunHuangAlibaba Group
@@ -11575,7 +11575,7 @@
XinXieZhejiang University
XiangChenZhejiang University
ZhouboLiZhejiang University
- LeiLiZhejiang University
+ LeiLiZhejiang University
98-108
We present an open-source and extensible knowledge extraction toolkit DeepKE, supporting complicated low-resource, document-level and multimodal scenarios in knowledge base population. DeepKE implements various information extraction tasks, including named entity recognition, relation extraction and attribute extraction. With a unified framework, DeepKE allows developers and researchers to customize datasets and models to extract information from unstructured data according to their requirements. Specifically, DeepKE not only provides various functional modules and model implementations for different tasks and scenarios but also organizes all components by consistent frameworks to maintain sufficient modularity and extensibility. We release the source code on GitHub at https://github.com/zjunlp/DeepKE with Google Colab tutorials and comprehensive documents for beginners. Besides, we present an online system at http://deepke.openkg.cn/EN/re_doc_show.html for real-time extraction of various tasks, and a demo video.
2022.emnlp-demos.10
diff --git a/data/xml/2022.findings.xml b/data/xml/2022.findings.xml
index 9565c09e27..8a4431f5dc 100644
--- a/data/xml/2022.findings.xml
+++ b/data/xml/2022.findings.xml
@@ -880,7 +880,7 @@
XuandongZhao
ZhiguoYu
MingWu
- LeiLi
+ LeiLi
774-781
How to learn highly compact yet effective sentence representation? Pre-trained language models have been effective in many NLP tasks. However, these models are often huge and produce large sentence embeddings. Moreover, there is a big performance gap between large and small models. In this paper, we propose Homomorphic Projective Distillation (HPD) to learn compressed sentence embeddings. Our method augments a small Transformer encoder model with learnable projection layers to produce compact representations while mimicking a large pre-trained language model to retain the sentence representation quality. We evaluate our method with different model sizes on both semantic textual similarity (STS) and semantic retrieval (SR) tasks. Experiments show that our method achieves 2.7-4.5 points performance gain on STS tasks compared with previous best representations of the same size. In SR tasks, our method improves retrieval speed (8.2×) and memory usage (8.0×) compared with state-of-the-art large models. Our implementation is available at https://github.com/XuandongZhao/HPD.
2022.findings-acl.64
@@ -3803,7 +3803,7 @@
ChengqiZhao
ShujianHuang
JiajunChen
- LeiLi
+ LeiLi
3537-3548
This paper does not aim at introducing a novel model for document-level neural machine translation. Instead, we head back to the original Transformer model and hope to answer the following question: Is the capacity of current models strong enough for document-level translation? Interestingly, we observe that the original Transformer with appropriate training techniques can achieve strong results for document translation, even with a length of 2000 words. We evaluate this model and several recent approaches on nine document-level datasets and two sentence-level datasets across six languages. Experiments show that document-level Transformer models outperform sentence-level ones and many previous methods in a comprehensive set of metrics, including BLEU, four lexical indices, three newly proposed assistant linguistic indicators, and human evaluation.
2022.findings-acl.279
@@ -4226,7 +4226,7 @@
ZhongqiaoLi
XinboZhang
ChangzhiSun
- LeiLi
+ LeiLi
YanghuaXiao
HaoZhou
3941-3955
@@ -4371,7 +4371,7 @@
Structural Supervision for Word Alignment and Machine Translation
- LeiLi
+ LeiLi
KaiFan
HongjiaLi
ChunYuan
@@ -6196,7 +6196,7 @@
Good Visual Guidance Make A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction
XiangChen
NingyuZhang
- LeiLi
+ LeiLi
YunzhiYao
ShuminDeng
ChuanqiTan
@@ -7198,7 +7198,7 @@
JingjingXu
JiazeChen
HaoZhou
- LeiLi
+ LeiLi
2508-2527
We introduce MTG, a new benchmark suite for training and evaluating multilingual text generation. It is the first-proposed multilingual multiway text generation dataset with the largest human-annotated data (400k). It includes four generation tasks (story generation, question generation, title generation and text summarization) across five languages (English, German, French, Spanish and Chinese). The multiway setup enables testing knowledge transfer capabilities for a model across languages and tasks. Using MTG, we train and analyze several popular multilingual generation models from different aspects. Our benchmark suite fosters model performance enhancement with more human-annotated parallel data. It provides comprehensive evaluations with diverse generation scenarios. Code and data are available at https://github.com/zide05/MTG.
2022.findings-naacl.192
@@ -8908,7 +8908,7 @@
TingtingLiuEast China Normal University
ChengyuWangAlibaba Group
XiangruZhuFudan University
- LeiLiEast China Normal University
+ LeiLiEast China Normal University
MinghuiQiuAlibaba Group
JunHuangalibaba group
MingGaoEast China Normal University
@@ -11942,7 +11942,7 @@ Faster and Smaller Speech Translation without Quality Compromise
SiyiWangBeijing University of Posts and Telecommunications
KaiWangBeijing University of Posts and Telecommunications
YanquanZhouBeijing University of Posts and Telecommunications
- LeiLiBeijing University of Posts and Telecommunications
+ LeiLiBeijing University of Posts and Telecommunications
QingYangDu Xiaoman Technology(Beijing)
DongliangXuDu Xiaoman Technology(Beijing)
3880-3886
@@ -13046,7 +13046,7 @@ Faster and Smaller Speech Translation without Quality Compromise
Distillation-Resistant Watermarking for Model Protection in NLP
XuandongZhaoUC Santa Barbara
- LeiLiUniversity of California Santa Barbara
+ LeiLiUniversity of California Santa Barbara
Yu-XiangWangUCSB
5044-5055
How can we protect the intellectual property of trained NLP models? Modern NLP models are prone to being stolen via querying and distilling from their publicly exposed APIs. However, existing protection methods such as watermarking only work for images but are not applicable to text. We propose Distillation-Resistant Watermarking (DRW), a novel technique to protect NLP models from being stolen via distillation. DRW protects a model by injecting watermarks into the victim’s prediction probability corresponding to a secret key and is able to detect such a key by probing a suspect model. We prove that a protected model still retains the original accuracy within a certain bound. We evaluate DRW on a diverse set of NLP tasks including text classification, part-of-speech tagging, and named entity recognition. Experiments show that DRW protects the original model and detects stealing suspects at 100% mean average precision for all four tasks while the prior method fails on two.
@@ -13946,7 +13946,7 @@ Faster and Smaller Speech Translation without Quality Compromise
YifanSongPeking University
JingjingXuShanghai AI Lab
ZhifangSuiPeking University
- LeiLiUniversity of California Santa Barbara
+ LeiLiUniversity of California Santa Barbara
5937-5947
Previous literature has proved that Pretrained Language Models (PLMs) can store factual knowledge. However, we find that facts stored in the PLMs are not always correct. It motivates us to explore a fundamental question: How do we calibrate factual knowledge in PLMs without re-training from scratch? In this work, we propose a simple and lightweight method CaliNet to achieve this goal. To be specific, we first detect whether PLMs can learn the right facts via a contrastive score between right and fake facts. If not, we then use a lightweight method to add and adapt new parameters to specific factual texts. Experiments on the knowledge probing task show the calibration effectiveness and efficiency. In addition, through closed-book question answering, we find that the calibrated PLM possesses knowledge generalization ability after finetuning. Beyond the calibration performance, we further investigate and visualize the knowledge calibration mechanism.
2022.findings-emnlp.438
@@ -14453,7 +14453,7 @@ Faster and Smaller Speech Translation without Quality Compromise
From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models
- LeiLiPeking University
+ LeiLiPeking University
YankaiLinGaoling School of Artificial Intelligence, Renmin University of China
XuanchengRenPeking University
GuangxiangZhaoPeking University
@@ -14613,7 +14613,7 @@ Faster and Smaller Speech Translation without Quality Compromise
Yi-LinTuanUniversity of California, Santa Barbara
YujieLuUniversity of California, Santa Barbara
MichaelSaxonUniversity of California, Santa Barbara
- LeiLiUniversity of California Santa Barbara
+ LeiLiUniversity of California Santa Barbara
William YangWangUnversity of California, Santa Barbara
6559-6574
Is it possible to build a general and automatic natural language generation (NLG) evaluation metric? Existing learned metrics either perform unsatisfactorily or are restricted to tasks where large human rating data is already available. We introduce SESCORE, a model-based metric that is highly correlated with human judgements without requiring human annotation, by utilizing a novel, iterative error synthesis and severity scoring pipeline. This pipeline applies a series of plausible errors to raw text and assigns severity labels by simulating human judgements with entailment. We evaluate SESCORE against existing metrics by comparing how their scores correlate with human ratings. SESCORE outperforms all prior unsupervised metrics on multiple diverse NLG tasks including machine translation, image captioning, and WebNLG text generation. For WMT 20/21 En-De and Zh-En, SESCORE improves the average Kendall correlation with human judgement from 0.154 to 0.195. SESCORE even achieves comparable performance to the best supervised metric COMET, despite receiving no human annotated training data.
diff --git a/data/xml/2022.iwslt.xml b/data/xml/2022.iwslt.xml
index 07610a69ea..3a52524fee 100644
--- a/data/xml/2022.iwslt.xml
+++ b/data/xml/2022.iwslt.xml
@@ -112,7 +112,7 @@
On the Impact of Noises in Crowd-Sourced Data for Speech Translation
SiqiOuyang
RongYe
- LeiLi
+ LeiLi
92-97
Training speech translation (ST) models requires large and high-quality datasets. MuST-C is one of the most widely used ST benchmark datasets. It contains around 400 hours of speech-transcript-translation data for each of the eight translation directions. This dataset passes several quality-control filters during creation. However, we find that MuST-C still suffers from three major quality issues: audio-text misalignment, inaccurate translation, and unnecessary speaker’s name. What are the impacts of these data quality issues for model development and evaluation? In this paper, we propose an automatic method to fix or filter the above quality issues, using English-German (En-De) translation as an example. Our experiments show that ST models perform better on clean test sets, and the rank of proposed models remains consistent across different test sets. Besides, simply removing misaligned data points from the training set does not lead to a better ST model.
2022.iwslt-1.9
diff --git a/data/xml/2022.naacl.xml b/data/xml/2022.naacl.xml
index e51dd12cb9..1be30a3583 100644
--- a/data/xml/2022.naacl.xml
+++ b/data/xml/2022.naacl.xml
@@ -973,7 +973,7 @@
Provably Confidential Language Modelling
XuandongZhao
- LeiLi
+ LeiLi
Yu-XiangWang
943-955
Large language models are shown to memorize privacy information such as social security numbers in training data. Given the sheer scale of the training corpus, it is challenging to screen and filter these privacy data, either manually or automatically. In this paper, we propose Confidentially Redacted Training (CRT), a method to train language generation models while protecting the confidential segments. We borrow ideas from differential privacy (which solves a related but distinct problem) and show that our method is able to provably prevent unintended memorization by randomizing parts of the training process. Moreover, we show that redaction with an approximately correct screening policy amplifies the confidentiality guarantee. We implement the method for both LSTM and GPT language models. Our experimental results show that the models trained by CRT obtain almost the same perplexity while preserving strong confidentiality.
@@ -5242,7 +5242,7 @@
Cross-modal Contrastive Learning for Speech Translation
RongYe
MingxuanWang
- LeiLi
+ LeiLi
5099-5113
How can we learn unified representations for spoken utterances and their written text? Learning similar representations for semantically similar speech and text is important for speech translation. To this end, we propose ConST, a cross-modal contrastive learning method for end-to-end speech-to-text translation. We evaluate ConST and a variety of previous baselines on a popular benchmark MuST-C. Experiments show that the proposed ConST consistently outperforms the previous methods, and achieves an average BLEU of 29.4. The analysis further verifies that ConST indeed closes the representation gap of different modalities — its learned representation improves the accuracy of cross-modal speech-text retrieval from 4% to 88%. Code and models are available at https://github.com/ReneeYe/ConST.
2022.naacl-main.376
diff --git a/data/xml/2023.acl.xml b/data/xml/2023.acl.xml
index 0137f0c46e..2f960a6715 100644
--- a/data/xml/2023.acl.xml
+++ b/data/xml/2023.acl.xml
@@ -3036,7 +3036,7 @@
WACO: Word-Aligned Contrastive Learning for Speech Translation
SiqiOuyangUniversity of California, Santa Barbara
RongYeByteDance AI Lab
- LeiLiUniversity of California Santa Barbara
+ LeiLiUniversity of California Santa Barbara
3891-3907
End-to-end Speech Translation (E2E ST) aims to directly translate source speech into target text. Existing ST methods perform poorly when only extremely small speech-text data are available for training. We observe that an ST model’s performance closely correlates with its embedding similarity between speech and source transcript. In this paper, we propose Word-Aligned COntrastive learning (WACO), a simple and effective method for extremely low-resource speech-to-text translation. Our key idea is bridging word-level representations for both speech and text modalities via contrastive learning. We evaluate WACO and other methods on the MuST-C dataset, a widely used ST benchmark, and on a low-resource direction Maltese-English from IWSLT 2023. Our experiments demonstrate that WACO outperforms the best baseline by 9+ BLEU points with only 1-hour parallel ST data. Code is available at https://github.com/owaski/WACO.
2023.acl-long.216
@@ -4007,7 +4007,7 @@
WendaXuUniversity of California at Santa Barbara
XianQianByteDance AI LAB
MingxuanWangBytedance AI Lab
- LeiLiUniversity of California Santa Barbara
+ LeiLiUniversity of California Santa Barbara
William YangWangUnversity of California, Santa Barbara
5166-5183
Is it possible to train a general metric for evaluating text generation quality without human-annotated ratings? Existing learned metrics either perform unsatisfactorily across text generation tasks or require human ratings for training on specific tasks. In this paper, we propose SEScore2, a self-supervised approach for training a model-based metric for text generation evaluation. The key concept is to synthesize realistic model mistakes by perturbing sentences retrieved from a corpus. We evaluate SEScore2 and previous methods on four text generation tasks across three languages. SEScore2 outperforms all prior unsupervised metrics on four text generation evaluation benchmarks, with an average Kendall improvement of 0.158. Surprisingly, SEScore2 even outperforms the supervised BLEURT and COMET on multiple text generation tasks.
@@ -7899,7 +7899,7 @@
WeiShiFudan University
ZiquanFuSystem, Inc
SijieChengFudan University
- LeiLiUniversity of California Santa Barbara
+ LeiLiUniversity of California Santa Barbara
YanghuaXiaoFudan University
9890-9908
Large language models (LLMs) have been widely studied for their ability to store and utilize positive knowledge. However, negative knowledge, such as “lions don’t live in the ocean”, is also ubiquitous in the world but rarely mentioned explicitly in text. What do LLMs know about negative knowledge? This work examines the ability of LLMs on negative commonsense knowledge. We design a constrained keywords-to-sentence generation task (CG) and a Boolean question answering task (QA) to probe LLMs. Our experiments reveal that LLMs frequently fail to generate valid sentences grounded in negative commonsense knowledge, yet they can correctly answer polar yes-or-no questions. We term this phenomenon the belief conflict of LLMs. Our further analysis shows that statistical shortcuts and negation reporting bias from language modeling pre-training cause this conflict.
@@ -12505,7 +12505,7 @@
SiqiOuyangUniversity of California, Santa Barbara
ZhiguoYuMicrosoft
MingWuGitHub, Inc.
- LeiLiUniversity of California Santa Barbara
+ LeiLiUniversity of California Santa Barbara
15590-15606
How can we extend a pre-trained model to many language understanding tasks, without labeled or additional unlabeled data? Pre-trained language models (PLMs) have been effective for a wide range of NLP tasks. However, existing approaches either require fine-tuning on downstream labeled datasets or manually constructing proper prompts. In this paper, we propose nonparametric prompting PLM (NPPrompt) for fully zero-shot language understanding. Unlike previous methods, NPPrompt uses only pre-trained language models and does not require any labeled data or additional raw corpus for further fine-tuning, nor does it rely on humans to construct a comprehensive set of prompt label words. We evaluate NPPrompt against previous major few-shot and zero-shot learning methods on diverse NLP tasks, including text classification, text entailment, similar text retrieval, paraphrasing, and multiple-choice question answering. Experimental results demonstrate that our NPPrompt outperforms the previous best fully zero-shot method by large margins, with absolute gains of 12.8% in accuracy on text classification and 15.6% on the GLUE benchmark. Our source code is available at https://anonymous.4open.science/r/NPPrompt.
2023.acl-long.869
@@ -16901,7 +16901,7 @@
FashionKLIP: Enhancing E-Commerce Image-Text Retrieval with Fashion Multi-Modal Conceptual Knowledge Graph
XiaodanWangFudan University
ChengyuWangAlibaba Group
- LeiLiEast China Normal University
+ LeiLiEast China Normal University
ZhixuLiFudan University
BenChenAlibaba Group
LinboJinAlibaba
diff --git a/data/xml/2023.americasnlp.xml b/data/xml/2023.americasnlp.xml
index 2a7fc31a22..77e2132a93 100644
--- a/data/xml/2023.americasnlp.xml
+++ b/data/xml/2023.americasnlp.xml
@@ -230,7 +230,7 @@
TianruiGuUniversity of California, Santa Barbara
KaieChenUniversity of California, Santa Barbara
SiqiOuyangUniversity of California, Santa Barbara
- LeiLiUniversity of California Santa Barbara
+ LeiLiUniversity of California Santa Barbara
173-176
This paper presents PlayGround’s submission to the AmericasNLP 2023 shared task on machine translation (MT) into indigenous languages. We finetuned NLLB-600M, a multilingual MT model pre-trained on Flores-200, on 10 low-resource language directions and examined the effectiveness of weight averaging and back translation. Our experiments showed that weight averaging, on average, led to a 0.0169 improvement in the ChrF++ score. Additionally, we found that back translation resulted in a 0.008 improvement in the ChrF++ score.
2023.americasnlp-1.19
diff --git a/data/xml/2023.emnlp.xml b/data/xml/2023.emnlp.xml
index b333e00563..056292e4c9 100644
--- a/data/xml/2023.emnlp.xml
+++ b/data/xml/2023.emnlp.xml
@@ -4156,7 +4156,7 @@
Can We Edit Factual Knowledge by In-Context Learning?
CeZheng
- LeiLi
+ LeiLi
QingxiuDong
YuxuanFan
ZhiyongWu
@@ -5132,7 +5132,7 @@
ZhenqiaoSong
MarkusFreitag
WilliamWang
- LeiLi
+ LeiLi
5967-5994
Automatically evaluating the quality of language generation is critical. Although recent learned metrics show high correlation with human judgement, these metrics do not provide explicit explanation of their verdict, nor associate the scores with defects in the generated text. To address this limitation, we present INSTRUCTSCORE, a fine-grained explainable evaluation metric for text generation. By harnessing both explicit human instruction and the implicit knowledge of GPT-4, we fine-tune a text evaluation metric based on LLaMA, producing both a score for generated text and a human-readable diagnostic report. We evaluate INSTRUCTSCORE on a variety of generation tasks, including translation, captioning, data-to-text, and commonsense generation. Experiments show that our 7B model surpasses all other unsupervised metrics, including those based on 175B GPT-3 and GPT-4. Surprisingly, our INSTRUCTSCORE, even without direct supervision from human-rated data, achieves performance levels on par with state-of-the-art metrics like COMET22, which were fine-tuned on human ratings.
2023.emnlp-main.365
@@ -8511,7 +8511,7 @@
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
LeanWang
- LeiLi
+ LeiLi
DamaiDai
DeliChen
HaoZhou
@@ -9223,7 +9223,7 @@
Learning from Mistakes via Cooperative Study Assistant for Large Language Models
DanqingWang
- LeiLi
+ LeiLi
10667-10685
Large language models (LLMs) have demonstrated their potential to refine their generation based on their own feedback. However, the feedback from the LLM itself is often inaccurate, thereby limiting its benefits. In this paper, we propose Study Assistant for Large LAnguage Model (SALAM), a novel framework with an auxiliary agent to assist the main LLM in learning from mistakes through interactive cooperation. In the gathering phase, the study assistant agent probes the main LLM, analyzes its errors, and collects the interaction in a mistake memory. During the examination phase, the study assistant provides guidelines by retrieving relevant cases to help the main LLM anticipate and avoid similar errors. We first investigate the effectiveness of a general study assistant and then customize it to provide LLM-specific guidance through imitation learning from successful guidance experiences. Our experiments on three LLMs using two challenging frameworks demonstrate that SALAM can significantly boost LLMs by an accuracy margin of up to 6.6 on BBH and 12.6 on BBQ.
2023.emnlp-main.659
@@ -10152,7 +10152,7 @@
Can Language Models Understand Physical Concepts?
- LeiLi
+ LeiLi
JingjingXu
QingxiuDong
CeZheng
diff --git a/data/xml/2023.findings.xml b/data/xml/2023.findings.xml
index c14c5e0829..99b00022b8 100644
--- a/data/xml/2023.findings.xml
+++ b/data/xml/2023.findings.xml
@@ -4664,7 +4664,7 @@
YongkangWuHuawei
MengHanHuawei
YutaoZhuUniversity of Montreal
- LeiLiHuawei
+ LeiLiHuawei
XinyuZhangHuawei Technologies Co., Ltd
RuofeiLaiHuawei
XiaoguangLiHuawei Noah’s Ark Lab
@@ -7044,7 +7044,7 @@
Communication Efficient Federated Learning for Multilingual Neural Machine Translation with Adapter
YiLiuSchool of Computer Science, Peking University
XiaohanBiPeking University
- LeiLiPeking University
+ LeiLiPeking University
SishuoChenCenter for Data Science, Peking University
WenkaiYangPeking University
XuSunPeking University
@@ -7617,7 +7617,7 @@
LET: Leveraging Error Type Information for Grammatical Error Correction
LingyuYangTsinghua University
HongjiaLiTsinghua University
- LeiLiTsinghua University
+ LeiLiTsinghua University
ChengyinXuTsinghua University
ShutaoXiaTsinghua University
ChunYuanTsinghua University
@@ -10714,7 +10714,7 @@
Delving into the Openness of CLIP
ShuhuaiRenPeking University
- LeiLiPeking University
+ LeiLiPeking University
XuanchengRenDAMO Academy, Alibaba Group
GuangxiangZhaoShanghai AI lab
XuSunPeking University
@@ -12344,7 +12344,7 @@
YinquanLuShanghai AI Laboratory
WenhaoZhuNational Key Laboratory for Novel Software Technology, Nanjing University
LingpengKongThe University of Hong Kong
- LeiLiUniversity of California Santa Barbara
+ LeiLiUniversity of California Santa Barbara
YuQiaoShanghai AI Lab
JingjingXuShanghai AI Lab
11518-11533
@@ -16398,7 +16398,7 @@
ImageNetVC: Zero- and Few-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories
HemingXia
QingxiuDong
- LeiLi
+ LeiLi
JingjingXu
TianyuLiu
ZiweiQin
@@ -17331,7 +17331,7 @@
AutoPlan: Automatic Planning of Interactive Decision-Making Tasks With Large Language Models
SiqiOuyang
- LeiLi
+ LeiLi
3114-3128
Recent large language models (LLMs) are promising for making decisions in grounded environments. However, LLMs frequently fail in complex decision-making tasks due to the misalignment between the pre-trained knowledge in LLMs and the actual rules in the environment. Existing methods require either costly gradient computation or lengthy in-context demonstrations. In this paper, we propose AutoPlan, an approach to guide LLM-based agents to accomplish interactive decision-making tasks. AutoPlan augments the LLM prompt with a task-solving plan and optimizes it through iterative experience collection and reflection. Our experiments show that AutoPlan, though using no in-context demonstrations, achieves success rates on par with the baselines using human-written demonstrations on ALFWorld and even outperforms them by 8% on HotpotQA. The code is available at https://github.com/owaski/AutoPlan.
2023.findings-emnlp.205
@@ -28056,7 +28056,7 @@
BohongWu
FeiYuan
HaiZhao
- LeiLi
+ LeiLi
JingjingXu
15432-15444
Multilingual understanding models (i.e., encoder-based models), pre-trained via masked language modeling, have achieved promising results on many language understanding tasks (e.g., mBERT). However, these models are not capable of generating high-quality text compared with decoder-based causal language models. Can we transform a pre-trained language understanding model into an effective language generation model? We propose a Semantic-Guided Alignment-then-Denoising (SGA) approach to adapt a multilingual encoder to a multilingual generator with a small number of additional parameters. Experiments show that the proposed approach is an effective adaption method, outperforming widely-used initialization-based methods with gains of 9.4 BLEU on machine translation, 8.1 Rouge-L on question generation, and 5.5 METEOR on story generation on XLM-R_{large}. On the other hand, we observe that XLM-R is still inferior to mBART in supervised settings despite better results in zero-shot settings, indicating that more exploration is required to make understanding models strong generators. Our code is available at https://github.com/chengzhipanpan/XLMR4MT.
diff --git a/data/xml/2023.ijcnlp.xml b/data/xml/2023.ijcnlp.xml
index 8a6f4f1f91..a9c9103d62 100644
--- a/data/xml/2023.ijcnlp.xml
+++ b/data/xml/2023.ijcnlp.xml
@@ -1496,7 +1496,7 @@
PengfeiZhu
ChaoPang
YekunChai
- LeiLi
+ LeiLi
ShuohuanWang
YuSun
HaoTian
diff --git a/data/xml/2024.acl.xml b/data/xml/2024.acl.xml
index 6a03746558..5961854d68 100644
--- a/data/xml/2024.acl.xml
+++ b/data/xml/2024.acl.xml
@@ -7079,7 +7079,7 @@
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
PeiyiWang
- LeiLiUniversity of Hong Kong
+ LeiLiUniversity of Hong Kong
ZhihongShaoTsinghua University, Tsinghua University
RunxinXu
DamaiDai
@@ -7096,7 +7096,7 @@
Large Language Models are not Fair Evaluators
PeiyiWang
- LeiLiUniversity of Hong Kong
+ LeiLiUniversity of Hong Kong
LiangChen
ZefanCai
DaweiZhu
@@ -10832,7 +10832,7 @@
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models
- LeiLiUniversity of Hong Kong
+ LeiLiUniversity of Hong Kong
YuqiWangUniversity of Hong Kong
RunxinXuPeking University
PeiyiWangPeking University
@@ -11575,7 +11575,7 @@
GuangleiZhuCarnegie Mellon University
XuandongZhaoUniversity of California, Berkeley
LiangmingPanUniversity of California, Santa Barbara
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
WilliamWangUC Santa Barbara
15474-15492
Recent studies show that large language models (LLMs) improve their performance through self-feedback on certain tasks while degrading on others. We discovered that this discrepancy is due to the LLM’s bias in evaluating its own output. In this paper, we formally define LLM’s self-bias – the tendency to favor its own generation – using two statistics. We analyze six LLMs (GPT-4, GPT-3.5, Gemini, LLaMA2, Mixtral and DeepSeek) on translation, constrained text generation, and mathematical reasoning tasks. We find that self-bias is prevalent in all examined LLMs across multiple languages and tasks. Our analysis reveals that while the self-refine pipeline improves the fluency and understandability of model outputs, it further amplifies self-bias. To mitigate such biases, we discover that larger model size and external feedback with accurate assessment can significantly reduce bias in the self-refine pipeline, leading to actual performance improvement in downstream tasks. The code and data are released at https://github.com/xu1998hz/llm_self_bias.
@@ -13381,7 +13381,7 @@
ZiwenXuZhejiang University
ShuofeiQiao
RunnanFang
- LeiLiTencent
+ LeiLiTencent
ZhenBiZhejiang University
GuozhouZheng
HuajunChenZhejiang University
@@ -14422,7 +14422,7 @@
Watermarking for Large Language Models
XuandongZhao
Yu-XiangWang
- LeiLi
+ LeiLi
10-11
As AI-generated text increasingly resembles human-written content, the ability to detect machine-generated text becomes crucial in both the computational linguistics and machine learning communities. In this tutorial, we aim to provide an in-depth exploration of text watermarking, a subfield of linguistic steganography with the goal of embedding a hidden message (the watermark) within a text passage. We will introduce the fundamentals of text watermarking, discuss the main challenges in identifying AI-generated text, and delve into the current watermarking methods, assessing their strengths and weaknesses. Moreover, we will explore other possible applications of text watermarking and discuss future directions for this field. Each section will be supplemented with examples and key takeaways.
2024.acl-tutorials.6
diff --git a/data/xml/2024.ccl.xml b/data/xml/2024.ccl.xml
index af96f82500..c505a9ce20 100644
--- a/data/xml/2024.ccl.xml
+++ b/data/xml/2024.ccl.xml
@@ -1070,7 +1070,7 @@
YuelouXu
YanLu
KaiWang
- LeiLi
+ LeiLi
YanquanZhou
1123–1135
“The zero-resource cross-domain named entity recognition (NER) task aims to perform NER in a specific domain where labeled data is unavailable. Existing methods primarily focus on transferring NER knowledge from high-resource to zero-resource domains. However, the challenge lies in effectively transferring NER knowledge between domains due to the inherent differences in entity structures across domains. To tackle this challenge, we propose an Unsupervised Domain Adaptation Adversarial (UDAA) framework, which combines the masked language model auxiliary task with the domain adaptive adversarial network to mitigate inter-domain differences and efficiently facilitate knowledge transfer. Experimental results on three datasets, CBS, Twitter, and WNUT2016, demonstrate the effectiveness of our framework. Notably, we achieved new state-of-the-art performance on the three datasets. Our code will be released.”
diff --git a/data/xml/2024.emnlp.xml b/data/xml/2024.emnlp.xml
index b36fd0d2d0..92ce3743bb 100644
--- a/data/xml/2024.emnlp.xml
+++ b/data/xml/2024.emnlp.xml
@@ -902,7 +902,7 @@
A Survey on In-context Learning
QingxiuDong
- LeiLiUniversity of Hong Kong
+ LeiLiUniversity of Hong Kong
DamaiDai
CeZhengPeking University
JingyuanMa
@@ -912,7 +912,7 @@
ZhiyongWuShanghai Artificial Intelligence Laboratory
BaobaoChangPeking University
XuSun
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
ZhifangSuiPeking University
1107-1128
With the increasing capabilities of large language models (LLMs), in-context learning (ICL) has emerged as a new paradigm for natural language processing (NLP), where LLMs make predictions based on contexts augmented with a few examples. It has been a significant trend to explore ICL to evaluate and extrapolate the ability of LLMs. In this paper, we aim to survey and summarize the progress and challenges of ICL. We first present a formal definition of ICL and clarify its correlation to related studies. Then, we organize and discuss advanced techniques, including training strategies, prompt designing strategies, and related analysis. Additionally, we explore various ICL application scenarios, such as data engineering and knowledge updating. Finally, we address the challenges of ICL and suggest potential directions for further research. We hope that our work can encourage more research on uncovering how ICL works and improving ICL.
@@ -5036,7 +5036,7 @@
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment
- LeiLiUniversity of Hong Kong
+ LeiLiUniversity of Hong Kong
ZhihuiXieShanghai Jiao Tong University
MukaiLi
ShunianChenShenzhen Research Institute of Big Data
@@ -8701,7 +8701,7 @@
WendaXu
JiachenLiUniversity of California, Santa Barbara
William YangWangUC Santa Barbara
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
11125-11139
Direct alignment from preferences (DAP) has emerged as a promising paradigm for aligning large language models (LLMs) to human desiderata from pre-collected, offline preference datasets. While recent studies indicate that existing offline DAP methods can directly benefit from online training samples, we highlight the need to develop specific online DAP algorithms to fully harness the power of online training. Specifically, we identify that the learned LLM should adhere to the proximity of the behavior LLM, which collects the training samples. To this end, we propose online Preference Optimization in proximity to the Behavior LLM (BPO), emphasizing the importance of constructing a proper trust region for LLM alignment. We conduct extensive experiments to validate the effectiveness and applicability of our approach by integrating it with various DAP methods, resulting in significant performance improvements across a wide range of tasks when training with the same amount of preference data. Even when only introducing one additional data collection phase, our online BPO improves its offline DAP baseline from 72.0% to 80.2% on TL;DR and from 82.2% to 89.1% on Anthropic Helpfulness in terms of win rate against human reference text.
2024.emnlp-main.623
@@ -10250,7 +10250,7 @@
HanlinZhuElectrical Engineering & Computer Science Department, University of California Berkeley
XiaomengYangGoogle DeepMind
AndrewCohen
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
YuandongTianMeta AI (FAIR)
13274-13292
Recent research has increasingly focused on evaluating large language models’ (LLMs) alignment with diverse human values and preferences, particularly for open-ended tasks like story generation. Traditional evaluation metrics rely heavily on lexical similarity with human-written references, often showing poor correlation with human judgments and failing to account for alignment with the diversity of human preferences. To address these challenges, we introduce PerSE, an interpretable evaluation framework designed to assess alignment with specific human preferences. It is tuned to infer specific preferences from an in-context personal profile and evaluate the alignment between the generated content and personal preferences. PerSE enhances interpretability by providing detailed comments and fine-grained scoring, facilitating more personalized content generation. Our 13B LLaMA-2-based PerSE shows a 15.8% increase in Kendall correlation and a 13.7% rise in accuracy with zero-shot reviewers compared to GPT-4. It also outperforms GPT-4 by 46.01% in Kendall correlation on new domains, indicating its transferability.
@@ -18075,7 +18075,7 @@
WendaXu
XiXu
SiqiOuyangCMU, Carnegie Mellon University
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
344-350
With the rapid advancement of machine translation research, evaluation toolkits have become essential for benchmarking system progress. Tools like COMET and SacreBLEU offer single quality score assessments that are effective for pairwise system comparisons. However, these tools provide limited insights for fine-grained system-level comparisons and the analysis of instance-level defects. To address these limitations, we introduce Translation Canvas, an explainable interface designed to pinpoint and analyze translation systems’ performance: 1) Translation Canvas assists machine translation researchers in comprehending system-level model performance by identifying common errors (their frequency and severity) and analyzing relationships between different systems based on various evaluation metrics. 2) It supports fine-grained analysis by highlighting error spans with explanations and selectively displaying systems’ predictions. According to human evaluation, Translation Canvas demonstrates superior performance over the COMET and SacreBLEU packages under enjoyability and understandability criteria.
2024.emnlp-demo.36
diff --git a/data/xml/2024.findings.xml b/data/xml/2024.findings.xml
index 3892c3c698..a46865543d 100644
--- a/data/xml/2024.findings.xml
+++ b/data/xml/2024.findings.xml
@@ -3228,7 +3228,7 @@
BiaoZhangGoogle DeepMind
ZhongtaoLiuGoogle
William YangWangUC Santa Barbara
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
MarkusFreitagGoogle
1429-1445
Recent large language models (LLMs) are leveraging human feedback to improve their generation quality. However, human feedback is costly to obtain, especially during inference. In this work, we propose LLMRefine, an inference-time optimization method to refine LLM’s output. The core idea is to use a learned fine-grained feedback model to pinpoint defects and guide the LLM to refine them iteratively. Using the original LLM as a proposal of edits, LLMRefine searches for defect-less text via simulated annealing, trading off exploration and exploitation. We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization. LLMRefine consistently outperforms all baseline approaches, achieving improvements of up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, and 2.2 ROUGE-L on topical summarization.
@@ -4399,7 +4399,7 @@
ShujianHuangNanjing University
LingpengKongDepartment of Computer Science, The University of Hong Kong
JiajunChenNanjing University
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
2765-2781
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT). In this paper, we systematically investigate the advantages and challenges of LLMs for MMT by answering two questions: 1) How well do LLMs perform in translating massive languages? 2) Which factors affect LLMs’ performance in translation? We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4. Our empirical results show that the translation capabilities of LLMs are continually evolving. GPT-4 has beaten the strong supervised baseline NLLB in 40.91% of translation directions but still faces a large gap behind commercial translation systems like Google Translate, especially on low-resource languages. Through further analysis, we discover that LLMs exhibit new working patterns when used for MMT. First, LLMs can acquire translation ability in a resource-efficient way and generate moderate translations even for zero-resource languages. Second, instruction semantics can surprisingly be ignored when given in-context exemplars. Third, cross-lingual exemplars can provide better task guidance for low-resource translation than exemplars in the same language pairs. Code will be released at: https://github.com/NJUNLP/MMT-LLM.
2024.findings-naacl.176
@@ -8815,7 +8815,7 @@
Red Teaming Visual Language Models
MukaiLi
- LeiLiUniversity of Hong Kong
+ LeiLiUniversity of Hong Kong
YuweiYin
MasoodAhmed
ZhenguangLiuZhejiang University
@@ -13143,7 +13143,7 @@
YiLiuPeking University
YuxiangWang
ShuhuaiRen
- LeiLiUniversity of Hong Kong
+ LeiLiUniversity of Hong Kong
SishuoChenAlibaba Group
XuSun
LuHouHuawei Technologies Ltd.
@@ -15929,7 +15929,7 @@
FeiYuan
ShuaiYuan
ZhiyongWuShanghai Artificial Intelligence Laboratory
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
12111-12130
Large Language Models (LLMs) often show strong performance on English tasks while exhibiting limitations on other languages. What is an LLM’s multilingual capability when it is trained only on certain languages? The underlying mechanism remains unclear. This study endeavors to examine the multilingual capability of LLMs from the vocabulary sharing perspective by conducting an exhaustive analysis across 101 languages. Through the investigation of the performance gap before and after embedding fine-tuning, we discovered four distinct quadrants. By delving into each quadrant, we provide actionable and efficient guidelines for tuning these languages. Extensive experiments reveal that existing LLMs possess multilingual capabilities that surpass our expectations, and we can significantly improve the multilingual performance of LLMs based on these attributes of each quadrant.
2024.findings-acl.721
@@ -18707,7 +18707,7 @@
ZhenqiaoSong
TaiqiHe
William YangWangUC Santa Barbara
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
15654-15669
How can large language models (LLMs) process and translate endangered languages? Many languages lack a large corpus to train a decent LLM; therefore existing LLMs rarely perform well in unseen, endangered languages. On the contrary, we observe that 2000 endangered languages, though without a large corpus, have a grammar book or a dictionary. We propose LingoLLM, a training-free approach to enable an LLM to process unseen languages that hardly occur in its pre-training. Our key insight is to demonstrate linguistic knowledge of an unseen language in an LLM’s prompt, including a dictionary, a grammar book, and morphologically analyzed input text. We implement LingoLLM on top of two models, GPT-4 and Mixtral, and evaluate their performance on 5 tasks across 8 endangered or low-resource languages. Our results show that LingoLLM elevates translation capability from GPT-4’s 0 to 10.5 BLEU for 10 language directions. Our findings demonstrate the tremendous value of linguistic knowledge in the age of LLMs for endangered languages. Our data, code, and model generations can be found at https://github.com/LLiLab/llm4endangeredlang.
2024.findings-acl.925
@@ -19577,7 +19577,7 @@
BabakDamavandi
Xin LunaDongFacebook
ChristosFaloutsosAmazon and Carnegie Mellon University
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
SeungwhanMoonFacebook
247-266
Vision-extended LLMs have made significant strides in Visual Question Answering (VQA). Despite these advancements, VLLMs still encounter substantial difficulties in handling queries involving long-tail entities, with a tendency to produce erroneous or hallucinated responses. In this work, we introduce a novel evaluative benchmark named SnapNTell, specifically tailored for entity-centric VQA. This task aims to test the models’ capabilities in identifying entities and providing detailed, entity-specific knowledge. We have developed the SnapNTell Dataset, distinct from traditional VQA datasets: (1) It encompasses a wide range of categorized entities, each represented by images and explicitly named in the answers; (2) It features QA pairs that require extensive knowledge for accurate responses. The dataset is organized into 22 major categories, containing 7,568 unique entities in total. For each entity, we curated 10 illustrative images and crafted 10 knowledge-intensive QA pairs. To address this novel task, we devised a scalable, efficient, and transparent retrieval-augmented multimodal LLM. Our approach markedly outperforms existing methods on the SnapNTell dataset, achieving a 66.5% improvement in the BLEURT score.
@@ -24567,7 +24567,7 @@ and high variation in performance on the subset, suggesting our plausibility cri
João DSMarquesInstituto Superior Técnico and INESC-ID
MiguelGraça
MiguelFreire
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
Arlindo L.Oliveira
6473-6486
Modern NLP tasks increasingly rely on dense retrieval methods to access up-to-date and relevant contextual information. We are motivated by the premise that retrieval benefits from segments that can vary in size such that a content’s semantic independence is better captured. We propose LumberChunker, a method leveraging an LLM to dynamically segment documents, which iteratively prompts the LLM to identify the point within a group of sequential passages where the content begins to shift. To evaluate our method, we introduce GutenQA, a benchmark with 3000 “needle in a haystack” type of question-answer pairs derived from 100 public domain narrative books available on Project Gutenberg. Our experiments show that LumberChunker not only outperforms the most competitive baseline by 7.37% in retrieval performance (DCG@20) but also that, when integrated into a RAG pipeline, LumberChunker proves to be more effective than other chunking methods and competitive baselines, such as the Gemini 1.5 Pro.
@@ -28060,7 +28060,7 @@ and high variation in performance on the subset, suggesting our plausibility cri
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
YinquanLuShanghai AI Laboratory
WenhaoZhuNanjing University
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
YuQiao
FeiYuan
10748-10772
@@ -32433,7 +32433,7 @@ hai-coaching/
HyperLoRA: Efficient Cross-task Generalization via Constrained Low-Rank Adapters Generation
ChuanchengLvTsinghua University, Tsinghua University
- LeiLiTencent
+ LeiLiTencent
ShitouZhang
GangChen
FanchaoQi
diff --git a/data/xml/2024.iwslt.xml b/data/xml/2024.iwslt.xml
index 4384df4c78..fba61187d5 100644
--- a/data/xml/2024.iwslt.xml
+++ b/data/xml/2024.iwslt.xml
@@ -328,7 +328,7 @@
BrianYanCarnegie Mellon University
PatrickFernandesCarnegie Mellon University
WilliamChenCarnegie Mellon University
- LeiLiCarnegie Mellon University
+ LeiLiCarnegie Mellon University
GrahamNeubigCarnegie Mellon University
ShinjiWatanabeCarnegie Mellon University
154-159
@@ -366,7 +366,7 @@
SiqiOuyangCarnegie Mellon University
WilliamChenCarnegie Mellon University
KarenLivescuTTI-Chicago
- LeiLiCarnegie Mellon University
+ LeiLiCarnegie Mellon University
GrahamNeubigCarnegie Mellon University
ShinjiWatanabeCarnegie Mellon University
164-169
diff --git a/data/xml/2024.lrec.xml b/data/xml/2024.lrec.xml
index 94b70df131..6a0ace7ce1 100644
--- a/data/xml/2024.lrec.xml
+++ b/data/xml/2024.lrec.xml
@@ -9082,7 +9082,7 @@
QingYang
DongliangXu
YanquanZhou
- LeiLi
+ LeiLi
YuzeLi
YingqiZhu
8792–8803
@@ -10424,7 +10424,7 @@
Large Language Models for Generative Recommendation: A Survey and Visionary Discussions
- LeiLi
+ LeiLi
YongfengZhang
DugangLiu
LiChen
diff --git a/data/xml/2024.naacl.xml b/data/xml/2024.naacl.xml
index 4f3494d646..c291b40970 100644
--- a/data/xml/2024.naacl.xml
+++ b/data/xml/2024.naacl.xml
@@ -8700,7 +8700,7 @@
MuhaoChenUC Davis
ChaoweiXiaoUW-Madison
HuanSunOSU
- LeiLiCMU
+ LeiLiCMU
LeonDerczynskiUW Seattle
AnimaAnandkumarCaltech, NVIDIA
FeiWangUSC
diff --git a/data/xml/2025.acl.xml b/data/xml/2025.acl.xml
index 7e21d6cd67..b6c5557d7a 100644
--- a/data/xml/2025.acl.xml
+++ b/data/xml/2025.acl.xml
@@ -4615,7 +4615,7 @@
XuandongZhaoUniversity of California, Berkeley
ChenwenLiao
Yu-XiangWangUniversity of California, San Diego
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
6304-6316
Text watermarks in large language models (LLMs) are increasingly used to detect synthetic text, mitigating misuse cases like fake news and academic dishonesty. While existing watermarking detection techniques primarily focus on classifying entire documents as watermarked or not, they often neglect the common scenario of identifying individual watermark segments within longer, mixed-source documents. Drawing inspiration from plagiarism detection systems, we propose two novel methods for partial watermark detection. First, we develop a geometry cover detection framework aimed at determining whether there is a watermark segment in long text. Second, we introduce an adaptive online learning algorithm to pinpoint the precise location of watermark segments within the text. Evaluated on three popular watermarking techniques (KGW-Watermark, Unigram-Watermark, and Gumbel-Watermark), our approach achieves high accuracy, significantly outperforming baseline methods. Moreover, our framework is adaptable to other watermarking techniques, offering new insights for precise watermark detection. Our code is publicly available at https://github.com/XuandongZhao/llm-watermark-location.
2025.acl-long.316
@@ -11968,7 +11968,7 @@
TianfangZhangTsinghua University
ZongkaiWu
Jenq-NengHwang
- LeiLi
+ LeiLi
16780-16790
Large Language Models (LLMs) have demonstrated impressive capabilities in reasoning tasks, yet their reliance on static prompt structures and limited adaptability to complex scenarios remains a major challenge. In this paper, we propose the **Deductive and Inductive (DID)** method, a novel framework that enhances LLM reasoning by dynamically integrating both deductive and inductive reasoning approaches. Drawing from cognitive science principles, DID implements a dual-metric complexity evaluation system that combines Littlestone dimension and information entropy to precisely assess task difficulty and guide decomposition strategies. DID enables the model to progressively adapt its reasoning pathways based on problem complexity, mirroring human cognitive processes. We evaluate DID’s effectiveness across multiple benchmarks, including the AIW, MR-GSM8K, and our custom Holiday Puzzle dataset for temporal reasoning. Our results demonstrate substantial improvements in reasoning quality and solution accuracy, achieving 70.3% accuracy on AIW (compared to 62.2% for Tree of Thought) while maintaining lower computational costs.
2025.acl-long.820
@@ -17060,7 +17060,7 @@
Uncertainty-Aware Iterative Preference Optimization for Enhanced LLM Reasoning
- LeiLiTencent
+ LeiLiTencent
HehuanLiu
YaxinZhou
ZhaoYangGuiTencent
@@ -19239,7 +19239,7 @@
Benchmarking Long-Context Language Models on Long Code Understanding
JiaLi
XuyuanGuoPeking University
- LeiLiUniversity of Hong Kong
+ LeiLiUniversity of Hong Kong
KechiZhangPeking University
GeLiPeking University
JiaLiTsinghua University
@@ -23304,7 +23304,7 @@
Design Choices for Extending the Context Length of Visual Language Models
MukaiLi
- LeiLiUniversity of Hong Kong
+ LeiLiUniversity of Hong Kong
ShansanGong
QiLiuUniversity of Hong Kong
33425-33438
diff --git a/data/xml/2025.coling.xml b/data/xml/2025.coling.xml
index 5ff4b44d22..cd3265a9a1 100644
--- a/data/xml/2025.coling.xml
+++ b/data/xml/2025.coling.xml
@@ -6219,7 +6219,7 @@
ZhaojiangLin
YuningMao
William YangWang
- LeiLi
+ LeiLi
Yi-ChiaWang
7819–7830
From ice cream flavors to climate change, people exhibit a wide array of opinions on various topics, and understanding the rationale for these opinions can promote healthy discussion and consensus among them. As such, it can be valuable for a large language model (LLM), particularly as an AI assistant, to be able to empathize with or even explain these various standpoints. In this work, we hypothesize that different topic stances often manifest correlations that can be used to extrapolate to topics with unknown opinions. We explore various prompting and fine-tuning methods to improve an LLM’s ability to (a) extrapolate from opinions on known topics to unknown ones and (b) support its extrapolation with reasoning. Our findings suggest that LLMs possess inherent knowledge from training data about these opinion correlations, and with minimal data, the similarities between human opinions and model-extrapolated opinions can be improved by more than 50%. Furthermore, LLMs can generate the reasoning process behind their extrapolation of opinions.
diff --git a/data/xml/2025.emnlp.xml b/data/xml/2025.emnlp.xml
index 32787a68cc..6a1a3cc5ae 100644
--- a/data/xml/2025.emnlp.xml
+++ b/data/xml/2025.emnlp.xml
@@ -22082,7 +22082,7 @@
ShengWang
JingweiDongthe University of Hong Kong
KaiLiu
- LeiLiUniversity of Hong Kong
+ LeiLiUniversity of Hong Kong
JiahuiGao
JiyueJiang
LingpengKongDepartment of Computer Science, The University of Hong Kong
@@ -23892,7 +23892,7 @@
XiaonanLiFudan University
MingZhongUniversity of Illinois Urbana Champaign
ShansanGong
- LeiLiUniversity of Hong Kong
+ LeiLiUniversity of Hong Kong
JunZhangByteDance
JingjingXu
LingpengKongDepartment of Computer Science, The University of Hong Kong
@@ -25167,7 +25167,7 @@
AdamOfficerUniversity of Pittsburgh Medical Center
AngelaChen
YufeiHuangUniversity of Pittsburgh
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
480-486
Comprehensive pathway datasets are essential resources for advancing biological research, yet constructing these datasets is labor intensive. Recognizing this, we present BioGraphia, a web-based annotation platform designed to facilitate collaborative pathway graph annotation. BioGraphia supports multi-user collaboration with real-time monitoring, curation, and interactive pathway graph visualization. It enables users to directly annotate the nodes and relations on the candidate graph, guided by detailed instructions. The platform is further enhanced with a large language model that automatically generates explainable and span-aligned pre-annotations to accelerate the annotation process. Its modular design allows flexible integration of external knowledge bases and customization of the annotation schema definition to support adaptation to other graph-based annotation tasks. Code is available at https://github.com/LeiLiLab/BioGraphia
2025.emnlp-demos.34
diff --git a/data/xml/2025.findings.xml b/data/xml/2025.findings.xml
index 3c597b3104..0ba8e26440 100644
--- a/data/xml/2025.findings.xml
+++ b/data/xml/2025.findings.xml
@@ -3679,7 +3679,7 @@
A Practical Examination of AI-Generated Text Detectors for Large Language Models
BrianTufts
XuandongZhaoUniversity of California, Berkeley
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
4824-4841
The proliferation of large language models has raised growing concerns about their misuse, particularly in cases where AI-generated text is falsely attributed to human authors. Machine-generated content detectors claim to effectively identify such text under various conditions and from any language model. This paper critically evaluates these claims by assessing several popular detectors (RADAR, Wild, T5Sentinel, Fast-DetectGPT, PHD, LogRank, Binoculars) on a range of domains, datasets, and models that these detectors have not previously encountered. We employ various prompting strategies to simulate practical adversarial attacks, demonstrating that even moderate efforts can significantly evade detection. We emphasize the importance of the true positive rate at a specific false positive rate (TPR@FPR) metric and demonstrate that these detectors perform poorly in certain settings, with TPR@.01 as low as 0%. Our findings suggest that both trained and zero-shot detectors struggle to maintain high sensitivity while achieving a reasonable true positive rate.
2025.findings-naacl.271
@@ -5290,7 +5290,7 @@
XiXu
WendaXu
SiqiOuyangCMU, Carnegie Mellon University
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
7062-7067
Simultaneous speech translation (SimulST) systems must balance translation quality with response time, making latency measurement crucial for evaluating their real-world performance. However, there has been a longstanding belief that current metrics yield unrealistically high latency measurements in unsegmented streaming settings. In this paper, we investigate this phenomenon, revealing its root cause in a fundamental misconception underlying existing latency evaluation approaches. We demonstrate that this issue affects not only streaming but also segment-level latency evaluation across different metrics. Furthermore, we propose a modification to correctly measure computation-aware latency for SimulST systems, addressing the limitations present in existing metrics.
2025.findings-naacl.393
@@ -8591,7 +8591,7 @@
InfiniSST: Simultaneous Translation of Unbounded Speech with Large Language Model
SiqiOuyangCMU, Carnegie Mellon University
XiXu
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
3032-3046
Simultaneous translation of unbounded streaming speech remains a challenging problem due to the need for effectively processing the historical speech context and past translations so that quality and latency, including computation overhead, can be balanced. Most prior works assume pre-segmented speech, limiting their real-world applicability. In this paper, we propose InfiniSST, a novel approach that formulates SST as a multi-turn dialogue task, enabling seamless translation of unbounded speech. We construct translation trajectories and robust segments from MuST-C with multi-latency augmentation during training and develop a key-value (KV) cache management strategy to facilitate efficient inference. Experiments on MuST-C En-Es, En-De, and En-Zh demonstrate that InfiniSST reduces computation-aware latency by 0.5 to 1 second while maintaining the same translation quality compared to baselines. Ablation studies further validate the contributions of our data construction and cache management strategy. Code is released at https://github.com/LeiLiLab/InfiniSST.
2025.findings-acl.157
@@ -13785,7 +13785,7 @@
ZongkaiWu
JohnLeeUniversity of Edinburgh, University of Edinburgh
Jenq-NengHwang
- LeiLi
+ LeiLi
10045-10056
In the rapidly evolving field of image generation, achieving precise control over generated content and maintaining semantic consistency remain significant limitations, particularly concerning grounding techniques and the necessity for model fine-tuning. To address these challenges, we propose BayesGenie, an off-the-shelf approach that integrates Large Language Models (LLMs) with Bayesian Optimization to facilitate precise and user-friendly image editing. Our method enables users to modify images through natural language descriptions without manual area marking, while preserving the original image’s semantic integrity. Unlike existing techniques that require extensive pre-training or fine-tuning, our approach demonstrates remarkable adaptability across various LLMs through its model-agnostic design. BayesGenie employs an adapted Bayesian optimization strategy to automatically refine the inference process parameters, achieving high-precision image editing with minimal user intervention. Through extensive experiments across diverse scenarios, we demonstrate that our framework outperforms existing methods in both editing accuracy and semantic preservation, as validated using different LLMs including Claude3 and GPT-4.
2025.findings-acl.523
@@ -23213,7 +23213,7 @@
LegoMT2: Selective Asynchronous Sharded Data Parallel Training for Massive Neural Machine Translation
FeiYuan
YinquanLuShanghai AI Laboratory
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
JingjingXu
23359-23376
It is a critical challenge to learn a single model for massive languages. Prior methods focus on increasing the model size and training data size. However, large models are difficult to optimize efficiently even with distributed parallel training, and translation capacity can interfere among languages. To address the challenge, we propose LegoMT2, an efficient training approach with an asymmetric multi-way model architecture for massive multilingual neural machine translation. LegoMT2 shards 435 languages into 8 language-centric groups and attributes one local encoder for each group’s languages and a mix encoder-decoder for all languages. LegoMT2 trains the model through local data parallelism and asynchronous distributed updating of parameters. LegoMT2 is 16.2 times faster than the distributed training method for M2M-100-12B (which covers only 100 languages) while improving the translation performance by an average of 2.2 BLEU on Flores-101, especially performing better for low-resource languages.
@@ -27104,7 +27104,7 @@
JinyuanXu
XueHe
Jenq-NengHwang
- LeiLi
+ LeiLi
1736-1750
Understanding the decision-making processes of large language models (LLMs) is essential for their trustworthy development and deployment; however, current interpretability methods often face challenges such as low resolution and high computational cost. To address these limitations, we propose the Multi-Layer Attention Consistency Score (MACS), a novel, lightweight, and easily deployable heuristic for estimating the importance of input tokens in decoder-based models. MACS measures contributions of input tokens based on the consistency of maximal attention. Empirical evaluations demonstrate that MACS achieves a favorable trade-off between interpretability quality and computational efficiency, showing faithfulness comparable to complex techniques with a 22% decrease in VRAM usage and 30% reduction in latency.
2025.findings-emnlp.91
@@ -28380,7 +28380,7 @@
XinglinZhangMedical Image Insights
TaoChenUniversity of Waterloo
Jenq-NengHwang
- LeiLi
+ LeiLi
3456-3467
Contrast-enhanced 3D Medical imaging (e.g., CT, MRI) leverages phase sequences to uncover temporal dynamics vital for diagnosing tumors, lesions, and vascular issues. However, current retrieval models primarily focus on spatial features, neglecting phase-specific progression detailed in clinical reports. We present the **Phase-aware Memory Network (PAMN)**, a novel framework enhancing 3D medical image retrieval by fusing imaging phases with diagnostic text. PAMN creates rich radiological representations that enhance diagnostic accuracy by combining image details with clinical report context, rigorously tested on a novel phase-series dataset of 12,230 hospital CT scans. PAMN achieves an effective balance of performance and scalability in 3D radiology retrieval, outperforming state-of-the-art baselines through the robust fusion of spatial, temporal, and textual information.
2025.findings-emnlp.184
@@ -38256,7 +38256,7 @@
WenhaoZhuByteDance Inc.
HanxuHuMicrosoft Research
ConghuiHeShanghai AI Lab
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
ShujianHuangNanjing University
FeiYuan
16751-16774
@@ -43657,7 +43657,7 @@
AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels
- LeiLi
+ LeiLi
XiangxuZhangRenmin University of China
XiaoZhou
ZhengLiu
diff --git a/data/xml/2025.iwslt.xml b/data/xml/2025.iwslt.xml
index dafa1acfc8..fe3e308388 100644
--- a/data/xml/2025.iwslt.xml
+++ b/data/xml/2025.iwslt.xml
@@ -406,7 +406,7 @@
CMU’s IWSLT 2025 Simultaneous Speech Translation System
SiqiOuyangCarnegie Mellon University
XiXuCarnegie Mellon University
- LeiLiCarnegie Mellon University
+ LeiLiCarnegie Mellon University
309-314
This paper presents CMU’s submission to the IWSLT 2025 Simultaneous Speech Translation (SST) task for translating unsegmented English speech into Chinese and German text in a streaming manner. Our end-to-end speech-to-text system integrates a chunkwise causal Wav2Vec 2.0 speech encoder, an adapter, and the Qwen2.5-7B-Instruct as the decoder. We use a two-stage simultaneous training procedure on robust speech segments synthesized from LibriSpeech, CommonVoice, and VoxPopuli datasets, utilizing standard cross-entropy loss. Our model supports adjustable latency through a configurable latency multiplier. Experimental results demonstrate that our system achieves 44.3 BLEU for English-to-Chinese and 25.1 BLEU for English-to-German translations on the ACL60/60 development set, with computation-aware latencies of 2.7 seconds and 2.3 seconds, and theoretical latencies of 2.2 and 1.7 seconds, respectively.
2025.iwslt-1.31
diff --git a/data/xml/2025.naacl.xml b/data/xml/2025.naacl.xml
index b6e65a2d99..62936b7d90 100644
--- a/data/xml/2025.naacl.xml
+++ b/data/xml/2025.naacl.xml
@@ -1282,7 +1282,7 @@
SiyuYuan
KaiZhang
YikaiZhang
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
YanghuaXiaoFudan University
1872-1888
Autonomous planning has been an ongoing pursuit since the inception of artificial intelligence. Based on curated problem solvers, early planning agents could deliver precise solutions for specific tasks but lacked generalization. The emergence of large language models (LLMs) and their powerful reasoning capabilities has reignited interest in autonomous planning by automatically generating reasonable solutions for given tasks. However, prior research and our experiments show that current language agents still lack human-level planning abilities. Even the state-of-the-art reasoning model, OpenAI o1, achieves only 15.6% on one of the complex real-world planning benchmarks. This highlights a critical question: What hinders language agents from achieving human-level planning? Although existing studies have highlighted weak performance in agent planning, the deeper underlying issues and the mechanisms and limitations of the strategies proposed to address them remain insufficiently understood. In this work, we apply the feature attribution study and identify two key factors that hinder agent planning: the limited role of constraints and the diminishing influence of questions. We also find that although current strategies help mitigate these challenges, they do not fully resolve them, indicating that agents still have a long way to go before reaching human-level intelligence.
@@ -3938,7 +3938,7 @@
ZhehuaiChen
VitalyLavrukhinNVIDIA
JagadeeshBalamNVIDIA
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
BorisGinsburgNVIDIA
5547-5557
Simultaneous machine translation (SMT) takes streaming input utterances and incrementally produces target text. Existing SMT methods only use the partial utterance that has already arrived at the input and the generated hypothesis. Motivated by human interpreters’ technique to forecast future words before hearing them, we propose Translation by Anticipating Future (TAF), a method to improve translation quality while retaining low latency. Its core idea is to use a large language model (LLM) to predict future source words and opportunistically translate without introducing too much risk. We evaluate our TAF and multiple baselines of SMT on four language directions. Experiments show that TAF achieves the best translation quality-latency trade-off and outperforms the baselines by up to 5 BLEU points at the same latency (three words).
@@ -4961,7 +4961,7 @@
ImgTrojan: Jailbreaking Vision-Language Models with ONE Image
XijiaTao
ShuaiZhong
- LeiLiUniversity of Hong Kong
+ LeiLiUniversity of Hong Kong
QiLiuUniversity of Hong Kong
LingpengKongDepartment of Computer Science, The University of Hong Kong
7048-7063
@@ -5567,7 +5567,7 @@
ShangZhou
DanqingWangCMU, Carnegie Mellon University
William YangWangUC Santa Barbara
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
7959-7973
Sampling is a basic operation for large language models (LLMs). In reinforcement learning rollouts and meta generation algorithms such as Best-of-N, it is essential to sample correct trajectories within a given compute budget. To find an optimal allocation for sample compute budgets, several choices need to be made: Which sampling configurations (model, temperature, language, etc.) to use? How many samples to generate in each configuration? We formulate these choices as a learning problem and propose OSCA, an algorithm that Optimizes Sample Compute Allocation by finding an optimal mix of different inference configurations. Our experiments show that with our learned mixed allocation, we can achieve accuracy better than the best single configuration with 128x less compute on code generation and 25x less compute on 4 reasoning tasks. OSCA is also shown to be effective in agentic workflows beyond single-turn tasks, achieving better accuracy on SWE-Bench with 3x less compute than the default configuration. Our code and generations are released at https://github.com/LeiLiLab/OSCA.
2025.naacl-long.404
@@ -6287,7 +6287,7 @@
ChangMa
ShuaiYuan
QiushiSunUniversity of Hong Kong
- LeiLiSchool of Computer Science, Carnegie Mellon University
+ LeiLiSchool of Computer Science, Carnegie Mellon University
9077-9090
The lottery ticket hypothesis posits the existence of “winning tickets” within a randomly initialized neural network. Do winning tickets exist for LLMs in fine-tuning scenarios? How can we find such winning tickets? In this paper, we propose KS-Lottery, a method to identify a small subset of LLM parameters highly effective in multilingual fine-tuning. Our key idea is to use the Kolmogorov-Smirnov Test to analyze the distribution shift of parameters before and after fine-tuning. We further theoretically prove that KS-Lottery can find the certified winning tickets in the embedding layer; fine-tuning on the found parameters is guaranteed to perform as well as full fine-tuning. Comparing KS-Lottery with other tuning algorithms on translation tasks, the experimental results show that KS-Lottery finds a much smaller set of parameters for fine-tuning while achieving performance comparable to full fine-tuning of the LLM. Surprisingly, we find that fine-tuning 18 tokens’ embedding of LLaMA suffices to reach the fine-tuning translation performance.
2025.naacl-long.458
diff --git a/data/xml/D18.xml b/data/xml/D18.xml
index 8f6b6858aa..38e37e4496 100644
--- a/data/xml/D18.xml
+++ b/data/xml/D18.xml
@@ -6212,7 +6212,7 @@
HaoyueShi
HaoZhou
JiazeChen
- LeiLi
+ LeiLi
4631–4641
D18-1492
D18-1492.Attachment.zip
diff --git a/data/xml/D19.xml b/data/xml/D19.xml
index 5848dd445b..4b3cf71bd4 100644
--- a/data/xml/D19.xml
+++ b/data/xml/D19.xml
@@ -953,7 +953,7 @@
ZhixingTan
JinsongSu
DeyiXiong
- LeiLi
+ LeiLi
803–812
In this study, we first investigate a novel capsule network with dynamic routing for linear-time Neural Machine Translation (NMT), referred to as CapsNMT. CapsNMT uses an aggregation mechanism to map the source sentence into a matrix with pre-determined size, and then applies a deep LSTM network to decode the target sequence from the source representation. Unlike the previous work (CITATION), which stores the source sentence in a passive and bottom-up way, the dynamic routing policy encodes the source sentence with an iterative process to decide the credit attribution between nodes from lower and higher layers. CapsNMT has two core properties: it runs in time that is linear in the length of the sequences and provides a more flexible way to aggregate the part-whole information of the source sentence. On the WMT14 English-German task and the larger WMT14 English-French task, CapsNMT achieves comparable results with the Transformer system. To the best of our knowledge, this is the first work in which capsule networks have been empirically investigated for sequence-to-sequence problems.
D19-1074
@@ -4288,7 +4288,7 @@
FuliLuo
ShunyaoLi
PengchengYang
- LeiLi
+ LeiLi
BaobaoChang
ZhifangSui
XuSun
@@ -8707,7 +8707,7 @@ The tutorial will bring researchers and practitioners to be aware of this issue,
Discreteness in Neural Natural Language Processing
LiliMou
HaoZhou
- LeiLi
+ LeiLi
This tutorial provides a comprehensive guide to the process of discreteness in neural NLP.
As a gentle start, we will briefly introduce the background of deep learning based NLP, where we point out the ubiquitous discreteness of natural language and its challenges in neural information processing. Particularly, we will focus on how such discreteness plays a role in the input space, the latent space, and the output space of a neural network. In each part, we will provide examples, discuss machine learning techniques, as well as demonstrate NLP applications.
diff --git a/data/xml/K19.xml b/data/xml/K19.xml
index bd138b9624..dc6a3cf72c 100644
--- a/data/xml/K19.xml
+++ b/data/xml/K19.xml
@@ -955,7 +955,7 @@
In Conclusion Not Repetition: Comprehensive Abstractive Summarization with Diversified Attention Based on Determinantal Point Processes
- LeiLi
+ LeiLi
WeiLiu
MarinaLitvak
NataliaVanetik
diff --git a/data/xml/N18.xml b/data/xml/N18.xml
index 9422c64272..8471d0c101 100644
--- a/data/xml/N18.xml
+++ b/data/xml/N18.xml
@@ -1409,7 +1409,7 @@
Reinforced Co-Training
JiaweiWu
- LeiLi
+ LeiLi
William YangWang
1252–1262
Co-training is a popular semi-supervised learning framework to utilize a large amount of unlabeled data in addition to a small labeled set. Co-training methods exploit predicted labels on the unlabeled data and select samples based on prediction confidence to augment the training. However, the selection of samples in existing co-training methods is based on a predetermined policy, which ignores the sampling bias between the unlabeled and the labeled subsets, and fails to explore the data space. In this paper, we propose a novel method, Reinforced Co-Training, to select high-quality unlabeled samples to better co-train on. More specifically, our approach uses Q-learning to learn a data selection policy with a small labeled dataset, and then exploits this policy to train the co-training classifiers automatically. Experimental results on clickbait detection and generic text classification tasks demonstrate that our proposed method can obtain more accurate text classification results.
diff --git a/data/xml/P16.xml b/data/xml/P16.xml
index 5dcb56b7d4..03b1a5bcd3 100644
--- a/data/xml/P16.xml
+++ b/data/xml/P16.xml
@@ -817,7 +817,7 @@
CFO: Conditional Focused Neural Question Answering with Large-scale Knowledge Bases
ZihangDai
- LeiLi
+ LeiLi
WeiXu
800–810
P16-1076
diff --git a/data/xml/P19.xml b/data/xml/P19.xml
index 0cdab3ee01..f954e88bac 100644
--- a/data/xml/P19.xml
+++ b/data/xml/P19.xml
@@ -2488,7 +2488,7 @@
Enhancing Topic-to-Essay Generation with External Commonsense Knowledge
PengchengYang
- LeiLi
+ LeiLi
FuliLuo
TianyuLiu
XuSun
@@ -3286,7 +3286,7 @@
PengchengYang
ZhihanZhang
FuliLuo
- LeiLi
+ LeiLi
ChengyangHuang
XuSun
2680–2686
@@ -7124,7 +7124,7 @@
HuangzhaoZhang
HaoZhou
NingMiao
- LeiLi
+ LeiLi
5564–5569
Efficiently building an adversarial attacker for natural language processing (NLP) tasks is a real challenge. Firstly, as the sentence space is discrete, it is difficult to make small perturbations along the direction of gradients. Secondly, the fluency of the generated examples cannot be guaranteed. In this paper, we propose MHA, which addresses both problems by performing Metropolis-Hastings sampling, whose proposal is designed with the guidance of gradients. Experiments on IMDB and SNLI show that our proposed MHA outperforms the baseline model in attacking capability. Adversarial training with MHA also leads to better robustness and performance.
P19-1559
@@ -7669,7 +7669,7 @@
YuBao
HaoZhou
ShujianHuang
- LeiLi
+ LeiLi
LiliMou
OlgaVechtomova
Xin-yuDai
@@ -7853,7 +7853,7 @@
YunxuanXiao
YanruQu
HaoZhou
- LeiLi
+ LeiLi
WeinanZhang
YongYu
6140–6150
@@ -8732,7 +8732,7 @@
Automatic Generation of Personalized Comment Based on User Profile
WenhuanZeng
AbulikemuAbuduweili
- LeiLi
+ LeiLi
PengchengYang
229–235
Comments on social media are very diverse in terms of content, style and vocabulary, which makes generating comments much more challenging than other existing natural language generation (NLG) tasks. Moreover, since different users have different expression habits, it is necessary to take the user’s profile into consideration when generating comments. In this paper, we introduce the task of automatic generation of personalized comment (AGPC) for social media. Based on tens of thousands of users’ real comments and corresponding user profiles on Weibo, we propose the Personalized Comment Generation Network (PCGN) for AGPC. The model utilizes user feature embedding with a gated memory and attends to the user description to model the personality of users. In addition, an external user representation is taken into consideration during decoding to enhance comment generation. Experimental results show that our model can generate natural, human-like and personalized comments.
diff --git a/data/xml/W13.xml b/data/xml/W13.xml
index b3a11bbd51..227482dfaf 100644
--- a/data/xml/W13.xml
+++ b/data/xml/W13.xml
@@ -5020,7 +5020,7 @@
Multi-document multilingual summarization corpus preparation, Part 1: Arabic, English, Greek, Chinese, Romanian
- LeiLi
+ LeiLi
CorinaForascu
MahmoudEl-Haj
GeorgeGiannakopoulos
@@ -5056,7 +5056,7 @@
CIST System Report for ACL MultiLing 2013 – Track 1: Multilingual Multi-document Summarization
- LeiLi
+ LeiLi
WeiHeng
JiaYu
YuLiu
diff --git a/data/xml/W14.xml b/data/xml/W14.xml
index 493fcd347b..3fe049893c 100644
--- a/data/xml/W14.xml
+++ b/data/xml/W14.xml
@@ -11786,7 +11786,7 @@
XiaoyueCong
FangHuang
HongfaXue
- LeiLi
+ LeiLi
ZhiqiaoGao
114–119
W14-6818
diff --git a/data/xml/W16.xml b/data/xml/W16.xml
index 41fd606e0d..944a6fe4c3 100644
--- a/data/xml/W16.xml
+++ b/data/xml/W16.xml
@@ -2289,7 +2289,7 @@
CIST System for CL-SciSumm 2016 Shared Task
- LeiLi
+ LeiLi
LiyuanMao
YazhaoZhang
JunqiChi
diff --git a/data/xml/W17.xml b/data/xml/W17.xml
index 06300c5187..20942f770b 100644
--- a/data/xml/W17.xml
+++ b/data/xml/W17.xml
@@ -1679,7 +1679,7 @@
Word Embedding and Topic Modeling Enhanced Multiple Features for Content Linking and Argument / Sentiment Labeling in Online Forums
- LeiLi
+ LeiLi
LiyuanMao
MoyeChen
32–36
@@ -4186,7 +4186,7 @@ is able to handle phenomena related to scope by means of an higher-order type th
DanchenZhang
DaqingHe
SanqiangZhao
- LeiLi
+ LeiLi
263–271
W17-2333
10.18653/v1/W17-2333
diff --git a/data/xml/W19.xml b/data/xml/W19.xml
index 441404c593..518a273da1 100644
--- a/data/xml/W19.xml
+++ b/data/xml/W19.xml
@@ -17436,7 +17436,7 @@ In this tutorial on MT and post-editing we would like to continue sharing the la
YaoFu
HaoZhou
JiazeChen
- LeiLi
+ LeiLi
24–33
Text attribute transfer is the task of modifying certain linguistic attributes (e.g. sentiment, style, authorship, etc.) of a sentence and transforming them from one type to another. In this paper, we aim to analyze and interpret what is changed during the transfer process. We start from the observation that in many existing models and datasets, certain words within a sentence play important roles in determining the sentence attribute class. These words are referred to as the Pivot Words. Based on these pivot words, we propose a lexical analysis framework, the Pivot Analysis, to quantitatively analyze the effects of these words in text attribute classification and transfer. We apply this framework to existing datasets and models and show that: (1) the pivot words are strong features for the classification of sentence attributes; (2) in many datasets, changing the attribute of a sentence only requires changing certain pivot words; (3) consequently, many transfer models only perform lexical-level modification, while leaving higher-level sentence structures unchanged. Our work provides an in-depth understanding of linguistic attribute transfer and further identifies the future requirements and challenges of this task.
W19-8604
@@ -18512,7 +18512,7 @@ In this tutorial on MT and post-editing we would like to continue sharing the la
Multi-lingual Wikipedia Summarization and Title Generation On Low Resource Corpus
WeiLiu
- LeiLi
+ LeiLi
ZuyingHuang
YinanLiu
17–25
diff --git a/data/xml/Y06.xml b/data/xml/Y06.xml
index 8d5cfaeb92..317f8c12a5 100644
--- a/data/xml/Y06.xml
+++ b/data/xml/Y06.xml
@@ -669,7 +669,7 @@
Research on Olympics-oriented Mobile Game News Ordering System
YongguiYang
- LeiLi
+ LeiLi
459–462
Y06-1069
http://hdl.handle.net/2065/29047
diff --git a/data/yaml/name_variants.yaml b/data/yaml/name_variants.yaml
index 20ab849f87..6ca575802e 100644
--- a/data/yaml/name_variants.yaml
+++ b/data/yaml/name_variants.yaml
@@ -5738,6 +5738,49 @@
- canonical: {first: Junhui, last: Li}
variants:
- {first: JunHui, last: Li}
+- canonical: {first: Lei, last: Li}
+ id: lei-li
+ comment: May refer to several people
+- canonical: {first: Lei, last: Li}
+ id: lei-li-cmu
+ orcid: 0000-0003-3095-9776
+ comment: Carnegie Mellon University
+ institution: Carnegie Mellon University
+- canonical: {first: Lei, last: Li}
+ id: lei-li-hku
+ orcid: 0009-0008-6984-5104
+ comment: University of Hong Kong
+ institution: University of Hong Kong
+- canonical: {first: Lei, last: Li}
+ id: lei-li-hkbu
+ orcid: 0000-0002-5631-2519
+ comment: Hong Kong Baptist University
+ institution: Hong Kong Baptist University
+- canonical: {first: Lei, last: Li}
+ id: lei-li-bupt
+ orcid: 0000-0002-3204-6527
+ comment: Beijing University of Posts and Telecommunications
+ institution: Beijing University of Posts and Telecommunications
+- canonical: {first: Lei, last: Li}
+ id: lei-li-zju
+ orcid: 0000-0002-7456-2204
+ comment: Zhejiang University
+ institution: Zhejiang University
+- canonical: {first: Lei, last: Li}
+ id: lei-li-renmin
+ orcid: 0000-0001-5660-0409
+ comment: Renmin University
+ institution: Renmin University
+- canonical: {first: Lei, last: Li}
+ id: lei-li-ecnu
+ orcid: 0000-0002-8891-1786
+ comment: ECNU
+ institution: East China Normal University
+- canonical: {first: Lei, last: Li}
+ id: lei-li-ucph
+ orcid: 0000-0002-2929-0828
+ comment: University of Copenhagen
+ institution: University of Copenhagen
- canonical: {first: Shih-Min, last: Li}
variants:
- {first: Shi-Min, last: Li}
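The YAML entries added above give each distinct "Lei Li" a stable id (e.g. lei-li-cmu) together with an ORCID, a comment, and an institution, so the homonymous authors can be told apart. As a minimal illustrative sketch only (not part of the Anthology's own tooling), the records could be read back like this, assuming PyYAML is installed and the repository layout matches the paths shown in this diff:

# Sketch: list the disambiguated "Lei Li" records from name_variants.yaml.
# Field names ("canonical", "id", "orcid", "institution") follow the
# entries added in this diff; the file path is assumed from the diff header.
import yaml

with open("data/yaml/name_variants.yaml", encoding="utf-8") as f:
    records = yaml.safe_load(f)  # top-level YAML is a list of mappings

for rec in records:
    canonical = rec.get("canonical", {})
    if canonical.get("first") == "Lei" and canonical.get("last") == "Li":
        # Entries without an explicit id fall back to a placeholder.
        print(rec.get("id", "<no id>"),
              rec.get("institution", rec.get("comment", "")))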