Machine translation is the automated translation of source material into another language without human intervention. The dataset comes from the ACL 2014 Ninth Workshop on Statistical Machine Translation, which focuses mainly on translation between European language pairs. The idea behind the workshop is to enable parties from different countries to communicate and exchange ideas.
The data consists of German/English sentences covering various events. Three datasets are obtained from the Statistical Machine Translation workshop; they can either be downloaded from the link below or used from the shared files. The three datasets are:
- Europarl v7
- Common Crawl corpus
- News Commentary
Link to download the dataset: https://statmt.org/wmt14/translation-task.html
Design a machine translation model that translates sentences from German to English or vice versa.
Milestone-1:
‣ Input: Context and Dataset
‣ Process:
‣ Step 1: Import and merge all three datasets
‣ Step 2: Data cleansing
‣ Step 3: NLP pre-processing, so that the dataset is suitable for AI/ML model training
‣ Step 4: Design, train and test simple RNN & LSTM model
‣ Step 5: Interim report
‣ Submission: Interim report, Jupyter Notebook with all the steps in Milestone-1
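Steps 1 to 3 above can be sketched roughly as follows. This is a minimal, standard-library-only illustration rather than a prescribed pipeline: the function names (`read_parallel`, `clean_sentence`, `merge_and_clean`, `build_vocab`, `encode`), the cleaning rules, and the toy sentence pairs are all assumptions made for the example. It also assumes each corpus is available as two line-aligned text files, one German and one English.

```python
import re
import unicodedata
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens for padding / unknown words


def read_parallel(de_path, en_path):
    """Read two line-aligned files into a list of (German, English) pairs."""
    with open(de_path, encoding="utf-8") as f_de, open(en_path, encoding="utf-8") as f_en:
        return [(de.rstrip("\n"), en.rstrip("\n")) for de, en in zip(f_de, f_en)]


def clean_sentence(text):
    """Normalize Unicode, lowercase, and split punctuation off from words."""
    text = unicodedata.normalize("NFC", text).lower().strip()
    text = re.sub(r"([.,!?;:])", r" \1 ", text)      # put spaces around punctuation
    text = re.sub(r"[^\w\s.,!?;:'-]", " ", text)     # drop other symbols (umlauts are kept)
    return re.sub(r"\s+", " ", text).strip()         # collapse repeated whitespace


def merge_and_clean(*corpora):
    """Merge corpora, drop pairs with an empty side, and remove exact duplicates."""
    seen, merged = set(), []
    for corpus in corpora:
        for de, en in corpus:
            pair = (clean_sentence(de), clean_sentence(en))
            if all(pair) and pair not in seen:
                seen.add(pair)
                merged.append(pair)
    return merged


def build_vocab(sentences, max_size=None):
    """Map tokens to integer ids, most frequent first; ids 0 and 1 are reserved."""
    counts = Counter(tok for s in sentences for tok in s.split())
    vocab = {PAD: 0, UNK: 1}
    for tok, _ in counts.most_common(max_size):
        vocab[tok] = len(vocab)
    return vocab


def encode(sentence, vocab, max_len):
    """Convert a cleaned sentence to a fixed-length list of ids (truncate or pad)."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in sentence.split()][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Toy stand-ins for the three corpora (illustrative data, not taken from the datasets).
europarl = [("Das Haus ist klein.", "The house is small.")]
common_crawl = [("Das Haus ist klein.", "The house is small.")]  # duplicate pair
news = [("", "Empty source side")]                                # dropped by cleaning

pairs = merge_and_clean(europarl, common_crawl, news)
de_vocab = build_vocab(de for de, _ in pairs)
encoded = encode(clean_sentence("Das Haus ist groß."), de_vocab, max_len=8)
```

With real data, the three in-memory lists would be replaced by the output of `read_parallel` for each corpus. The resulting fixed-length id sequences are the kind of input an RNN or LSTM model in Step 4 typically expects.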
Milestone-2:
‣ Input: Preprocessed output from Milestone-1
‣ Process:
‣ Step 1: Design, train and test RNN & LSTM model with embeddings
‣ Step 2: Design, train and test the bidirectional RNN & LSTM model
‣ Step 3: Design, train and test Encoder-Decoder RNN & LSTM model (optional; no marks are deducted if this step is not attempted)
‣ Step 4: Choose the best-performing model and pickle it.
‣ Step 5: Final Report
‣ Submission: Final report, Jupyter Notebook with all the steps in Milestone-1 and Milestone-2
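Step 4 above (choosing the best-performing model and pickling it) might look something like the sketch below. The `results` dictionary, the helper names, the file name, and the scores are hypothetical placeholders, not real results; plain dictionaries stand in for trained models so that the example runs without any deep-learning library. Some frameworks recommend their own save format over `pickle`, so it is worth checking the documentation of the framework in use.

```python
import pickle
import tempfile
from pathlib import Path


def select_best(results):
    """Return (name, model) for the entry with the highest validation score."""
    name = max(results, key=lambda n: results[n]["score"])
    return name, results[name]["model"]


def save_model(model, path):
    """Serialize the model object to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a previously pickled model object."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Placeholder models and made-up validation scores, for illustration only.
results = {
    "rnn_with_embeddings": {"model": {"kind": "rnn"}, "score": 0.1},
    "lstm_with_embeddings": {"model": {"kind": "lstm"}, "score": 0.2},
    "bidirectional_lstm": {"model": {"kind": "bi-lstm"}, "score": 0.3},
}

best_name, best_model = select_best(results)

# Hypothetical output location; any writable path works.
model_path = Path(tempfile.gettempdir()) / "best_translation_model.pkl"
save_model(best_model, model_path)
restored = load_model(model_path)
```

Reloading the file immediately after saving is a cheap check that the pickled model can actually be restored before it is handed to the UI.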
Milestone-3:
‣ Process:
‣ Step 1: Design a clickable UI-based Translation interface.
Hint: Input - sentence in one language (German/English); Output - translated sentence in the other language (English/German)
‣ Submission: Final report, Jupyter Notebook with the addition of clickable UI-based interface
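One possible shape for the clickable interface is sketched below with `tkinter` from the Python standard library. This is only an illustration under stated assumptions: `translate` is a hypothetical placeholder that echoes its input and would need to be replaced by a call to the pickled model from Milestone-2, and the widget layout and the `launch_ui` name are arbitrary choices.

```python
def translate(sentence, direction="de-en"):
    """Placeholder translator: swap in the trained model's prediction here."""
    if direction not in ("de-en", "en-de"):
        raise ValueError(f"unsupported direction: {direction}")
    sentence = sentence.strip()
    if not sentence:
        return ""
    # Echo the input with a tag so the UI wiring can be checked without a model.
    return f"[{direction}] {sentence}"


def launch_ui():
    """Open a small window with a text box, a direction switch, and a button."""
    # Imported here so translate() stays usable on machines without a display.
    import tkinter as tk
    from tkinter import ttk

    root = tk.Tk()
    root.title("German/English Translator")

    # Radio buttons select the translation direction.
    direction = tk.StringVar(value="de-en")
    ttk.Radiobutton(root, text="German to English", variable=direction,
                    value="de-en").grid(row=0, column=0, sticky="w", padx=8)
    ttk.Radiobutton(root, text="English to German", variable=direction,
                    value="en-de").grid(row=0, column=1, sticky="w", padx=8)

    # Multi-line input box for the source sentence.
    source = tk.Text(root, height=4, width=60)
    source.grid(row=1, column=0, columnspan=2, padx=8, pady=4)

    # Label that displays the translated sentence.
    output = tk.StringVar()
    ttk.Label(root, textvariable=output, wraplength=480).grid(
        row=3, column=0, columnspan=2, padx=8, pady=4)

    def on_click():
        text = source.get("1.0", "end")
        output.set(translate(text, direction.get()))

    ttk.Button(root, text="Translate", command=on_click).grid(
        row=2, column=0, columnspan=2, pady=4)

    root.mainloop()
```

Calling `launch_ui()` opens the window. Keeping `translate` separate from the widget code means the same function can be tested in the notebook without starting the interface.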