
Commit 346affb

Ingestion: MTSUMMIT (#5534)
1 parent 55ffb4b commit 346affb

File tree

11 files changed: 1293 additions & 0 deletions


data/xml/2013.mtsummit.xml

Lines changed: 1 addition & 0 deletions
@@ -714,6 +714,7 @@
       <year>2013</year>
       <editor><first>Shoichi</first><last>Yokoyama</last></editor>
       <venue>mtsummit</venue>
+      <venue>pslt</venue>
     </meta>
     <frontmatter>
       <url hash="2147e2c6">2013.mtsummit-wpt.0</url>

data/xml/2015.mtsummit.xml

Lines changed: 1 addition & 0 deletions
@@ -445,6 +445,7 @@
       <month>October 30 – November 3</month>
       <year>2015</year>
       <venue>mtsummit</venue>
+      <venue>pslt</venue>
     </meta>
     <paper id="1">
       <title>Full-text patent translation at <fixed-case>WIPO</fixed-case>; scalability, quality and usability</title>

data/xml/2025.aielpl.xml

Lines changed: 127 additions & 0 deletions
Large diffs are not rendered by default.

data/xml/2025.at4ssl.xml

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
<?xml version='1.0' encoding='UTF-8'?>
<collection id="2025.at4ssl">
  <volume id="1" ingest-date="2025-08-07" type="proceedings">
    <meta>
      <booktitle>Proceedings of the Third International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL)</booktitle>
      <editor><first>Dimitar</first><last>Shterionov</last></editor>
      <editor><first>Mirella De</first><last>Sisto</last></editor>
      <editor><first>Bram</first><last>Vanroy</last></editor>
      <editor><first>Vincent</first><last>Vandeghinste</last></editor>
      <editor><first>Victoria</first><last>Nyst</last></editor>
      <editor><first>Myriam</first><last>Vermeerbergen</last></editor>
      <editor><first>Floris</first><last>Roelofsen</last></editor>
      <editor><first>Lisa</first><last>Lepp</last></editor>
      <editor><first>Irene</first><last>Strasly</last></editor>
      <publisher>European Association for Machine Translation</publisher>
      <address>Geneva, Switzerland</address>
      <month>June</month>
      <year>2025</year>
      <url hash="ad6c6129">2025.at4ssl-1</url>
      <venue>at4ssl</venue>
      <isbn>978-2-9701897-3-2</isbn>
    </meta>
    <frontmatter>
      <url hash="68b3c4fa">2025.at4ssl-1.0</url>
      <bibkey>at4ssl-2025-1</bibkey>
    </frontmatter>
    <paper id="1">
      <title>Pose-Based Sign Language Appearance Transfer</title>
      <author><first>Amit</first><last>Moryossef</last></author>
      <author><first>Gerard</first><last>Sant</last></author>
      <author><first>Zifan</first><last>Jiang</last></author>
      <pages>1–6</pages>
      <abstract>We introduce a method for transferring the signer’s appearance in sign language skeletal poses while preserving the sign content. Using estimated poses, we transfer the appearance of one signer to another, maintaining natural movements and transitions. This approach improves pose-based rendering and sign stitching while obfuscating identity. Our experiments show that while the method reduces signer identification accuracy, it slightly harms sign recognition performance, highlighting a tradeoff between privacy and utility.</abstract>
      <url hash="daaabcf5">2025.at4ssl-1.1</url>
      <bibkey>moryossef-etal-2025-pose</bibkey>
    </paper>
    <paper id="2">
      <title>Spontaneous <fixed-case>C</fixed-case>atalan <fixed-case>S</fixed-case>ign <fixed-case>L</fixed-case>anguage Recognition: Data Acquisition and Classification</title>
      <author><first>Naiara</first><last>Garmendia</last></author>
      <author><first>Horacio</first><last>Saggion</last></author>
      <author><first>Euan</first><last>McGill</last></author>
      <pages>7–15</pages>
      <abstract>This work presents the first investigation into Spontaneous Isolated Sign Language Recognition for Catalan Sign Language (LSC). Our work is grounded on the derivation of a dataset of signs and their glosses from a corpus of spontaneous dialogues and monologues. The recognition model is based on a Multi-Scale Graph Convolutional network fitted to our data. Results are promising since several signs are recognized with a high level of accuracy, and an average accuracy of 71% on the top 5 predicted classes from a total of 105 available. An interactive interface with experimental results is also presented. The data and software are made available to the research community.</abstract>
      <url hash="1dfd5e1e">2025.at4ssl-1.2</url>
      <bibkey>garmendia-etal-2025-spontaneous</bibkey>
    </paper>
    <paper id="3">
      <title>User Involvement in the Research and Development Life Cycle of Sign Language Machine Translation Systems</title>
      <author><first>Lisa</first><last>Lepp</last></author>
      <author><first>Dimitar</first><last>Shterionov</last></author>
      <author><first>Mirella De</first><last>Sisto</last></author>
      <pages>16–36</pages>
<abstract>Machine translation (MT) has evolved rapidly over the last 70 years thanks to the advances in processing technology, methodologies as well as the ever-increasing volumes of data. This trend is observed in the context of MT for spoken languages. However, when it comes to sign languages (SL) translation technologies, the progress is much slower; SLMT is still in its infancy with limited applications. One of the main factors for this set back is the lack of effective, respectful and fair user involvement across the different phases of the research and development of SLMT. We present a meta-review of 111 articles on SLMT from the perspective of user involvement. Our analysis investigates what users are involved and what tasks they assume in the first four phrases of MT research: (i) Problem and definition, (ii) Dataset construction, (iii) Model Design and Training, (iv) Model Validation and Evaluation. We find out that users have primarily been involved as data creators and monitors as well as evaluators. We assess that effective co-creation, as defined in (Lepp et al., 2025), has not been performed and conclude with recommendations for improving the MT research and development landscape from a co-creative perspective.</abstract>
      <url hash="8ed9c1d5">2025.at4ssl-1.3</url>
      <bibkey>lepp-etal-2025-user</bibkey>
    </paper>
    <paper id="4">
      <title><fixed-case>P</fixed-case>a<fixed-case>SC</fixed-case>o1: A Parallel Video-<fixed-case>S</fixed-case>i<fixed-case>GML</fixed-case> <fixed-case>S</fixed-case>wiss <fixed-case>F</fixed-case>rench <fixed-case>S</fixed-case>ign <fixed-case>L</fixed-case>anguage Corpus in Medical Domain</title>
      <author><first>Bastien</first><last>David</last></author>
      <author><first>Pierrette</first><last>Bouillon</last></author>
      <author><first>Jonathan</first><last>Mutal</last></author>
      <author><first>Irene</first><last>Strasly</last></author>
      <author><first>Johanna</first><last>Gerlach</last></author>
      <author><first>Hervé</first><last>Spechbach</last></author>
      <pages>37–43</pages>
      <abstract>This article introduces the parallel sign language translation corpus, PaSCo1, developed as part of the BabelDr project, an automatic speech translation system for medical triage. PaSCo1 aims to make a set of medical data available in Swiss French Sign Language (LSF-CH) in the form of both videos signed by a human and their description in G-SiGML mark-up language. We describe the beginnings of the corpus as part of the BabelDr project, as well as the methodology used to create the videos and generate the G-SiGML language using the SiGLA platform. The resulting FAIR corpus comprises 2 031 medical questions and instructions in the form of videos and G-SiGML code.</abstract>
      <url hash="e9744d8d">2025.at4ssl-1.4</url>
      <bibkey>david-etal-2025-pasco1</bibkey>
    </paper>
  </volume>
</collection>

data/xml/2025.ctt.xml

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
<?xml version='1.0' encoding='UTF-8'?>
<collection id="2025.ctt">
  <volume id="1" ingest-date="2025-08-07" type="proceedings">
    <meta>
      <booktitle>Proceedings of the Second Workshop on Creative-text Translation and Technology (CTT)</booktitle>
      <editor><first>Bram</first><last>Vanroy</last></editor>
      <editor><first>Marie-Aude</first><last>Lefer</last></editor>
      <editor><first>Lieve</first><last>Macken</last></editor>
      <editor><first>Paola</first><last>Ruffo</last></editor>
      <editor><first>Ana Guerberof</first><last>Arenas</last></editor>
      <editor><first>Damien</first><last>Hansen</last></editor>
      <publisher>European Association for Machine Translation</publisher>
      <address>Geneva, Switzerland</address>
      <month>June</month>
      <year>2025</year>
      <url hash="e62d4cdf">2025.ctt-1</url>
      <venue>ctt</venue>
      <isbn>978-2-9701897-6-3</isbn>
    </meta>
    <frontmatter>
      <url hash="77b7cbe9">2025.ctt-1.0</url>
      <bibkey>ctt-2025-1</bibkey>
    </frontmatter>
    <paper id="1">
      <title>The Role of Translation Workflows in Overcoming Translation Difficulties: A Comparative Analysis of Human and Machine Translation (Post-Editing) Approaches</title>
      <author><first>Lieve</first><last>Macken</last></author>
      <author><first>Paola</first><last>Ruffo</last></author>
      <author><first>Joke</first><last>Daems</last></author>
      <pages>1–13</pages>
      <abstract>This study investigates the impact of different translation workflows and underlying machine translation technologies on the translation strategies used in literary translations. We compare human translation, translation within a computer-assisted translation (CAT) tool, and machine translation post-editing (MTPE), alongside neural machine translation (NMT) and large language models (LLMs). Using three short stories translated from English into Dutch, we annotated translation difficulties and strategies employed to overcome them. Our analysis reveals differences in translation solutions across modalities, highlighting the influence of technology on the final translation. The findings suggest that while MTPE tends to produce more literal translations, human translators and CAT tools exhibit greater creativity and employ more non-literal translation strategies. Additionally, LLMs reduced the number of literal translation solutions compared to traditional NMT systems. While our study provides valuable insights, it is limited by the use of only three texts and a single language pair. Further research is needed to explore these dynamics across a broader range of texts and languages, to better understand the full impact of translation workflows and technologies on literary translation.</abstract>
      <url hash="ccb910a1">2025.ctt-1.1</url>
      <bibkey>macken-etal-2025-role</bibkey>
    </paper>
    <paper id="2">
      <title>Does the perceived source of a translation (<fixed-case>NMT</fixed-case> vs. <fixed-case>HT</fixed-case>) impact student revision quality for news and literary texts?</title>
      <author><first>Xiaoye</first><last>Li</last></author>
      <author><first>Joke</first><last>Daems</last></author>
      <pages>14–26</pages>
      <abstract>With quality improvements in neural machine translation (NMT), scholars have argued that human translation revision and MT post-editing are becoming more alike, which would have implications for translator training. This study contributes to this growing body of work by exploring the ability of student translators (ZH-EN) to distinguish between NMT and human translation (HT) for news text and literary text and analyses how text type and student perceptions influence their subsequent revision process. We found that participants were reasonably adept at distinguishing between NMT and HT, particularly for literary texts. Participants’ revision quality was dependent on the text type as well as the perceived source of translation. The findings also highlight student translators’ limited competence in revision and post-editing, emphasizing the need to integrate NMT, revision, and post-editing into translation training programmes.</abstract>
      <url hash="b38bf2af">2025.ctt-1.2</url>
      <bibkey>li-daems-2025-perceived</bibkey>
    </paper>
    <paper id="3">
      <title>Effects of Domain-adapted Machine Translation on the Machine Translation User Experience of Video Game Translators</title>
      <author><first>Judith</first><last>Brenner</last></author>
      <author><first>Julia</first><last>Othlinghaus-Wulhorst</last></author>
      <pages>27–43</pages>
      <abstract>In this empirical study we examine three different translation modes with varying involvement of machine translation (MT) post-editing (PE) when translating video game texts. The three translation modes are translation from scratch without MT, full PE of MT output in a static way, and flexible PE as a combination of translation from scratch and post-editing of only those machine-translated sentences deemed useful by the translator. Data generation took place at the home offices of freelance game translators. In a mixed-methods approach, quantitative data was generated through keylogging, eye tracking, error annotation, and user experience questionnaires as well as qualitative data through interviews. Results show a negative perception of PE and suggest that translators’ user experience is positive when translating from scratch, neutral with a positive tendency when doing flexible PE of domain-adapted MT output and negative with static PE of generic MT output.</abstract>
      <url hash="9d60e590">2025.ctt-1.3</url>
      <bibkey>brenner-othlinghaus-wulhorst-2025-effects</bibkey>
    </paper>
    <paper id="4">
      <title>Fine-tuning and evaluation of <fixed-case>NMT</fixed-case> models for literary texts using <fixed-case>R</fixed-case>om<fixed-case>C</fixed-case>ro v.2.0</title>
      <author><first>Bojana</first><last>Mikelenić</last></author>
      <author><first>Antoni</first><last>Oliver</last></author>
      <author><first>Sergi Àlvarez</first><last>Vidal</last></author>
      <pages>44–51</pages>
<abstract>This paper explores the fine-tuning and evaluation of neural machine translation (NMT) models for literary texts using RomCro v.2.0, an expanded multilingual and multidirectional parallel corpus. RomCro v.2.0 is based on RomCro v.1.0, but includes additional literary works, as well as texts in Catalan, making it a valuable resource for improving MT in underrepresented language pairs. Given the challenges of literary translation, where style, narrative voice, and cultural nuances must be preserved, fine-tuning on high-quality domain-specific data is essential for enhancing MT performance. We fine-tune existing NMT models with RomCro v.2.0 and evaluate their performance for six different language combinations using automatic metrics and for Spanish-Croatian and French-Catalan using manual evaluation. Results indicate that fine-tuned models outperform general-purpose systems, achieving greater fluency and stylistic coherence. These findings support the effectiveness of corpus-driven fine-tuning for literary translation and highlight the importance of curated high-quality corpus.</abstract>
      <url hash="823c5046">2025.ctt-1.4</url>
      <bibkey>mikelenic-etal-2025-fine</bibkey>
    </paper>
    <paper id="5">
      <title>Can Peter Pan Survive <fixed-case>MT</fixed-case>? A Stylometric Study of <fixed-case>LLM</fixed-case>s, <fixed-case>NMT</fixed-case>s, and <fixed-case>HT</fixed-case>s in Children’s Literature Translation</title>
      <author><first>Delu</first><last>Kong</last></author>
      <author><first>Lieve</first><last>Macken</last></author>
      <pages>52–70</pages>
<abstract>This study focuses on evaluating the performance of machine translations (MTs) compared to human translations (HTs) in children’s literature translation (CLT) from a stylometric perspective. The research constructs a extitPeter Pan corpus, comprising 21 translations: 7 human translations (HTs), 7 large language model translations (LLMs), and 7 neural machine translation outputs (NMTs). The analysis employs a generic feature set (including lexical, syntactic, readability, and n-gram features) and a creative text translation (CTT-specific) feature set, which captures repetition, rhyme, translatability, and miscellaneous levels, yielding 447 linguistic features in total. Using classification and clustering techniques in machine learning, we conduct a stylometric analysis of these translations. Results reveal that in generic features, HTs and MTs exhibit significant differences in conjunction word distributions and the ratio of 1-word-gram-一样, while NMTs and LLMs show significant variation in descriptive words usage and adverb ratios. Regarding CTT-specific features, LLMs outperform NMTs in distribution, aligning more closely with HTs in stylistic characteristics, demonstrating the potential of LLMs in CLT.</abstract>
      <url hash="32c4e38d">2025.ctt-1.5</url>
      <bibkey>kong-macken-2025-peter</bibkey>
    </paper>
  </volume>
</collection>
