Machine translation is a central task in natural language processing. It aims to translate text between languages to overcome communication barriers. Traditional machine translation methods, such as phrase-based or rule-based models, often fall short in handling linguistic complexities ranging from idiomatic expressions to syntactic nuances. Machine translation with deep learning, especially with sequence-to-sequence (seq2seq) models, has achieved substantial improvements in translation quality due to their ability to learn complex mappings between languages [1].
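At its core, a seq2seq model pairs an encoder, which reads the source sentence into a hidden representation, with a decoder, which generates the target sentence one token at a time. The following is a minimal, illustrative sketch of such a model in PyTorch; the class names, layer sizes, and teacher-forcing loop are assumptions for exposition, not the exact architecture used in this project.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Encodes a source (e.g. French) token sequence into a hidden state."""
    def __init__(self, src_vocab_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(src_vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):                      # src: (batch, src_len)
        embedded = self.embedding(src)           # (batch, src_len, emb_dim)
        _, hidden = self.rnn(embedded)           # hidden: (1, batch, hidden_dim)
        return hidden

class Decoder(nn.Module):
    """Generates target (e.g. English) tokens one step at a time."""
    def __init__(self, tgt_vocab_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(tgt_vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab_size)

    def forward(self, tgt_token, hidden):        # tgt_token: (batch, 1)
        embedded = self.embedding(tgt_token)     # (batch, 1, emb_dim)
        output, hidden = self.rnn(embedded, hidden)
        return self.out(output.squeeze(1)), hidden   # logits: (batch, tgt_vocab)

class Seq2Seq(nn.Module):
    """Encoder-decoder wrapper using teacher forcing during training."""
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, src, tgt):                 # tgt: (batch, tgt_len)
        hidden = self.encoder(src)
        logits = []
        for t in range(tgt.size(1) - 1):
            # Feed the ground-truth previous token (teacher forcing).
            step_logits, hidden = self.decoder(tgt[:, t:t + 1], hidden)
            logits.append(step_logits)
        return torch.stack(logits, dim=1)        # (batch, tgt_len - 1, vocab)
```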
This project focuses on understanding how dataset characteristics affect machine translation performance. Specifically, we will compare the impact of formal, structured datasets versus informal, conversational ones. We will use French-to-English translation as our test case, given the availability of large, freely accessible datasets that fit these criteria. Our goal is to analyze how current machine translation models respond to diverse linguistic styles.
To achieve this, we will train the same model architecture on each dataset separately and then evaluate each trained model on held-out examples from both datasets. This cross-evaluation will show which dataset type better supports generalization in machine translation.
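Concretely, the comparison can be organized as a grid: each model (trained on one corpus) is scored on a test set from both corpora. The sketch below assumes a `translate(model, src_tokens)` helper and tokenized reference translations, and uses torchtext's corpus-level BLEU; the function and variable names are illustrative, not taken from the project code.

```python
from torchtext.data.metrics import bleu_score

def evaluate_cross_dataset(models, test_sets, translate):
    """Score every (training corpus, test corpus) pairing with corpus BLEU.

    models:    dict mapping a training-corpus name to a trained model,
               e.g. {"tatoeba": mt_tatoeba, "wiki": mt_wiki}.
    test_sets: dict mapping a corpus name to a list of
               (src_tokens, tgt_tokens) pairs.
    translate: function (model, src_tokens) -> predicted target tokens
               (greedy decoding, beam search, etc.).
    """
    scores = {}
    for trained_on, model in models.items():
        for tested_on, pairs in test_sets.items():
            candidates = [translate(model, src) for src, _ in pairs]
            references = [[tgt] for _, tgt in pairs]  # one reference per example
            scores[(trained_on, tested_on)] = bleu_score(candidates, references)
    return scores
```

The resulting four scores (two in-domain, two cross-domain) make the generalization gap of each training corpus directly comparable.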
Understanding which types of data best support generalization will inform the development of future machine translation models. Rather than focusing solely on model architecture, we can improve translation performance by curating training datasets that yield more effective and adaptable translation systems.
machine_translation
|- models
| |- tatoeba
| | |- mt_tatoeba.pt
| |- wiki
| | |- mt_wiki.pt
|- notebooks
| |- mt_tatoeba-torchtext.ipynb
| |- mt_wiki-torchtext.ipynb
|- data
| |- wiki
| | |- eng-fra.txt
| |- tatoeba
| | |- eng-fra.txt
|- results
| |- tatoeba_results.txt
| |- wiki_results.txt
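For reference, a minimal loader for the `eng-fra.txt` files might look like the sketch below. It assumes each line holds a sentence pair separated by a tab (the usual Tatoeba export format) and should be checked against the actual files; the `load_pairs` helper is hypothetical.

```python
from pathlib import Path

def load_pairs(path):
    """Read tab-separated sentence pairs from an eng-fra.txt file.

    Assumes each line contains a sentence and its translation separated
    by a tab; any extra tab-separated columns (e.g. attribution
    metadata) are ignored.
    """
    pairs = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        fields = line.strip().split("\t")
        if len(fields) >= 2:
            pairs.append((fields[0], fields[1]))
    return pairs

tatoeba_pairs = load_pairs("data/tatoeba/eng-fra.txt")
wiki_pairs = load_pairs("data/wiki/eng-fra.txt")
```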