data/xml/2020.acl.xml (4 changes: 2 additions & 2 deletions)
@@ -4234,7 +4234,7 @@
<author><first>Ning</first><last>Miao</last></author>
<author><first>Yuxuan</first><last>Song</last></author>
<author><first>Hao</first><last>Zhou</last></author>
-<author><first>Lei</first><last>Li</last></author>
+<author id="lei-li-cmu"><first>Lei</first><last>Li</last></author>
<pages>3436–3441</pages>
<abstract>It has been a common approach to pre-train a language model on a large corpus and fine-tune it on task-specific data. In practice, we observe that fine-tuning a pre-trained model on a small dataset may lead to over- and/or under-estimate problem. In this paper, we propose MC-Tailor, a novel method to alleviate the above issue in text generation tasks by truncating and transferring the probability mass from over-estimated regions to under-estimated ones. Experiments on a variety of text generation datasets show that MC-Tailor consistently and significantly outperforms the fine-tuning approach.</abstract>
<url hash="5c7e1235">2020.acl-main.314</url>
@@ -10481,7 +10481,7 @@
<author><first>Xijin</first><last>Zhang</last></author>
<author><first>Songcheng</first><last>Jiang</last></author>
<author><first>Yuxuan</first><last>Wang</last></author>
-<author><first>Lei</first><last>Li</last></author>
+<author id="lei-li-cmu"><first>Lei</first><last>Li</last></author>
<pages>1–8</pages>
<abstract>This paper proposes the building of Xiaomingbot, an intelligent, multilingual and multimodal software robot equipped with four inte- gral capabilities: news generation, news translation, news reading and avatar animation. Its system summarizes Chinese news that it automatically generates from data tables. Next, it translates the summary or the full article into multiple languages, and reads the multi- lingual rendition through synthesized speech. Notably, Xiaomingbot utilizes a voice cloning technology to synthesize the speech trained from a real person’s voice data in one input language. The proposed system enjoys several merits: it has an animated avatar, and is able to generate and read multilingual news. Since it was put into practice, Xiaomingbot has written over 600,000 articles, and gained over 150,000 followers on social media platforms.</abstract>
<url hash="a9a9e7e8">2020.acl-demos.1</url>
data/xml/2020.emnlp.xml (6 changes: 3 additions & 3 deletions)
@@ -1707,7 +1707,7 @@
<author><first>Shuang</first><last>Zeng</last></author>
<author><first>Runxin</first><last>Xu</last></author>
<author><first>Baobao</first><last>Chang</last></author>
-<author><first>Lei</first><last>Li</last></author>
+<author id="lei-li-cmu"><first>Lei</first><last>Li</last></author>
<pages>1630–1640</pages>
<abstract>Document-level relation extraction aims to extract relations among entities within a document. Different from sentence-level relation extraction, it requires reasoning over multiple sentences across paragraphs. In this paper, we propose Graph Aggregation-and-Inference Network (GAIN), a method to recognize such relations for long paragraphs. GAIN constructs two graphs, a heterogeneous mention-level graph (MG) and an entity-level graph (EG). The former captures complex interaction among different mentions and the latter aggregates mentions underlying for the same entities. Based on the graphs we propose a novel path reasoning mechanism to infer relations between entities. Experiments on the public dataset, DocRED, show GAIN achieves a significant performance improvement (2.85 on F1) over the previous state-of-the-art. Our code is available at <url>https://github.com/PKUnlp-icler/GAIN</url>.</abstract>
<url hash="f205ef83">2020.emnlp-main.127</url>
@@ -2836,7 +2836,7 @@
<author><first>Xipeng</first><last>Qiu</last></author>
<author><first>Jiangtao</first><last>Feng</last></author>
<author><first>Hao</first><last>Zhou</last></author>
-<author><first>Lei</first><last>Li</last></author>
+<author id="lei-li-cmu"><first>Lei</first><last>Li</last></author>
<pages>2649–2663</pages>
<abstract>We investigate the following question for machine translation (MT): can we develop a single universal MT model to serve as the common seed and obtain derivative and improved models on arbitrary language pairs? We propose mRASP, an approach to pre-train a universal multilingual neural machine translation model. Our key idea in mRASP is its novel technique of random aligned substitution, which brings words and phrases with similar meanings across multiple languages closer in the representation space. We pre-train a mRASP model on 32 language pairs jointly with only public datasets. The model is then fine-tuned on downstream language pairs to obtain specialized MT models. We carry out extensive experiments on 42 translation directions across a diverse settings, including low, medium, rich resource, and as well as transferring to exotic language pairs. Experimental results demonstrate that mRASP achieves significant performance improvement compared to directly training on those target pairs. It is the first time to verify that multiple lowresource language pairs can be utilized to improve rich resource MT. Surprisingly, mRASP is even able to improve the translation quality on exotic languages that never occur in the pretraining corpus. Code, data, and pre-trained models are available at <url>https://github.com/linzehui/mRASP</url>.</abstract>
<url hash="a0f25581">2020.emnlp-main.210</url>
@@ -9842,7 +9842,7 @@
<author><first>Junxian</first><last>He</last></author>
<author><first>Mingxuan</first><last>Wang</last></author>
<author><first>Yiming</first><last>Yang</last></author>
-<author><first>Lei</first><last>Li</last></author>
+<author id="lei-li-cmu"><first>Lei</first><last>Li</last></author>
<pages>9119–9130</pages>
<abstract>Pre-trained contextual representations like BERT have achieved great success in natural language processing. However, the sentence embeddings from the pre-trained language models without fine-tuning have been found to poorly capture semantic meaning of sentences. In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited. We first reveal the theoretical connection between the masked language model pre-training objective and the semantic similarity task theoretically, and then analyze the BERT sentence embeddings empirically. We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance of semantic similarity. To address this issue, we propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective. Experimental results show that our proposed BERT-flow method obtains significant performance gains over the state-of-the-art sentence embeddings on a variety of semantic textual similarity tasks. The code is available at <url>https://github.com/bohanli/BERT-flow</url>.</abstract>
<url hash="b156fa71">2020.emnlp-main.733</url>
data/xml/2020.findings.xml (4 changes: 2 additions & 2 deletions)
@@ -1465,7 +1465,7 @@
<title>Language Generation via Combinatorial Constraint Satisfaction: A Tree Search Enhanced <fixed-case>M</fixed-case>onte-<fixed-case>C</fixed-case>arlo Approach</title>
<author><first>Maosen</first><last>Zhang</last></author>
<author><first>Nan</first><last>Jiang</last></author>
-<author><first>Lei</first><last>Li</last></author>
+<author id="lei-li-cmu"><first>Lei</first><last>Li</last></author>
<author><first>Yexiang</first><last>Xue</last></author>
<pages>1286–1298</pages>
<abstract>Generating natural language under complex constraints is a principled formulation towards controllable text generation. We present a framework to allow specification of combinatorial constraints for sentence generation. We propose TSMC, an efficient method to generate high likelihood sentences with respect to a pre-trained language model while satisfying the constraints. Our approach is highly flexible, requires no task-specific train- ing, and leverages efficient constraint satisfaction solving techniques. To better handle the combinatorial constraints, a tree search algorithm is embedded into the proposal process of the Markov Chain Monte Carlo (MCMC) to explore candidates that satisfy more constraints. Compared to existing MCMC approaches, our sampling approach has a better mixing performance. Experiments show that TSMC achieves consistent and significant improvement on multiple language generation tasks.</abstract>
@@ -5726,7 +5726,7 @@
<author><first>Mingxuan</first><last>Wang</last></author>
<author><first>Weinan</first><last>Zhang</last></author>
<author><first>Yong</first><last>Yu</last></author>
-<author><first>Lei</first><last>Li</last></author>
+<author id="lei-li-cmu"><first>Lei</first><last>Li</last></author>
<pages>4908–4917</pages>
<abstract>Active learning for sentence understanding aims at discovering informative unlabeled data for annotation and therefore reducing the demand for labeled data. We argue that the typical uncertainty sampling method for active learning is time-consuming and can hardly work in real-time, which may lead to ineffective sample selection. We propose adversarial uncertainty sampling in discrete space (AUSDS) to retrieve informative unlabeled samples more efficiently. AUSDS maps sentences into latent space generated by the popular pre-trained language models, and discover informative unlabeled text samples for annotation via adversarial attack. The proposed approach is extremely efficient compared with traditional uncertainty sampling with more than 10x speedup. Experimental results on five datasets show that AUSDS outperforms strong baselines on effectiveness.</abstract>
<url hash="a49de01f">2020.findings-emnlp.441</url>
data/xml/2020.fnp.xml (2 changes: 1 addition & 1 deletion)
@@ -194,7 +194,7 @@
</paper>
<paper id="17">
<title>Extractive Financial Narrative Summarisation based on <fixed-case>DPP</fixed-case>s</title>
-<author><first>Lei</first><last>Li</last></author>
+<author id="lei-li-bupt"><first>Lei</first><last>Li</last></author>
<author><first>Yafei</first><last>Jiang</last></author>
<author><first>Yinan</first><last>Liu</last></author>
<pages>100–104</pages>
data/xml/2020.sdp.xml (2 changes: 1 addition & 1 deletion)
@@ -349,7 +349,7 @@
</paper>
<paper id="25">
<title><fixed-case>CIST</fixed-case>@<fixed-case>CL</fixed-case>-<fixed-case>S</fixed-case>ci<fixed-case>S</fixed-case>umm 2020, <fixed-case>L</fixed-case>ong<fixed-case>S</fixed-case>umm 2020: Automatic Scientific Document Summarization</title>
-<author><first>Lei</first><last>Li</last></author>
+<author id="lei-li-bupt"><first>Lei</first><last>Li</last></author>
<author><first>Yang</first><last>Xie</last></author>
<author id="wei-liu-kcl"><first>Wei</first><last>Liu</last></author>
<author><first>Yinan</first><last>Liu</last></author>
data/xml/2020.wmt.xml (4 changes: 2 additions & 2 deletions)
@@ -471,7 +471,7 @@
<author><first>Zehui</first><last>Lin</last></author>
<author><first>Yaoming</first><last>Zhu</last></author>
<author><first>Mingxuan</first><last>Wang</last></author>
-<author><first>Lei</first><last>Li</last></author>
+<author id="lei-li-cmu"><first>Lei</first><last>Li</last></author>
<pages>305–312</pages>
<abstract>This paper describes our submission systems for VolcTrans for WMT20 shared news translation task. We participated in 8 translation directions. Our basic systems are based on Transformer (CITATION), into which we also employed new architectures (bigger or deeper Transformers, dynamic convolution). The final systems include text pre-process, subword(a.k.a. BPE(CITATION)), baseline model training, iterative back-translation, model ensemble, knowledge distillation and multilingual pre-training.</abstract>
<url hash="58264a1d">2020.wmt-1.33</url>
@@ -1443,7 +1443,7 @@
<author><first>Zhuo</first><last>Zhi</last></author>
<author><first>Jun</first><last>Cao</last></author>
<author><first>Mingxuan</first><last>Wang</last></author>
-<author><first>Lei</first><last>Li</last></author>
+<author id="lei-li-cmu"><first>Lei</first><last>Li</last></author>
<pages>985–990</pages>
<abstract>In this paper, we describe our submissions to the WMT20 shared task on parallel corpus filtering and alignment for low-resource conditions. The task requires the participants to align potential parallel sentence pairs out of the given document pairs, and score them so that low-quality pairs can be filtered. Our system, Volctrans, is made of two modules, i.e., a mining module and a scoring module. Based on the word alignment model, the mining mod- ule adopts an iterative mining strategy to extract latent parallel sentences. In the scoring module, an XLM-based scorer provides scores, followed by reranking mechanisms and ensemble. Our submissions outperform the baseline by 3.x/2.x and 2.x/2.x for km-en and ps-en on From Scratch/Fine-Tune conditions.</abstract>
<url hash="98a59e41">2020.wmt-1.112</url>
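Every hunk above applies the same edit — attaching an explicit id (lei-li-cmu or lei-li-bupt) to an <author> record for "Lei Li" — so a one-pass consistency check is handy before merging. The following is a minimal sketch, not part of this change: it assumes the checkout layout shown in the diff headers (data/xml/2020.*.xml) and the <author><first>/<last> structure visible in the hunks, and it simply prints each Lei Li author together with whatever id attribute it carries.

# Hypothetical verification helper (not part of this PR): list every
# <author> named "Lei Li" in the touched 2020 volumes along with its
# id attribute, so the cmu/bupt assignment can be reviewed in one pass.
# Assumes the data/xml/2020.*.xml layout shown in the diff headers above.
import glob
import xml.etree.ElementTree as ET

for path in sorted(glob.glob("data/xml/2020.*.xml")):
    for author in ET.parse(path).iter("author"):
        first = author.findtext("first") or ""
        last = author.findtext("last") or ""
        if (first.strip(), last.strip()) == ("Lei", "Li"):
            print(f"{path}: id={author.get('id', '<none>')}")

Under those assumptions the script emits one line per occurrence, e.g. a data/xml/2020.fnp.xml entry reporting id=lei-li-bupt, making any author left without an id easy to spot.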