ishanbakshi91/PGP-AIML-CapstoneProject

Machine Translation Capstone Project

PROBLEM STATEMENT

• DOMAIN: MACHINE TRANSLATION

• CONTEXT:

Machine Translation is the automated translation of source material into another language without human intervention. The data comes from the ACL 2014 Ninth Workshop on Statistical Machine Translation, which focuses mainly on translation between European language pairs. The idea behind the workshop is to enable parties from different countries to communicate and exchange ideas.

• DATA DESCRIPTION:

The data consists of German/English sentences covering various events. Three datasets are obtained from the Statistical Machine Translation workshop; they can either be downloaded from the link below or used from the shared files. The three datasets are:

  1. Europarl v7
  2. Common Crawl corpus
  3. News Commentary

Link to download the dataset: https://statmt.org/wmt14/translation-task.html
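
Parallel corpora of this kind are typically distributed as line-aligned plain-text files, one file per language, where line N of the German file corresponds to line N of the English file. The following is a minimal sketch of importing and merging such files under that assumption. The helper names `read_parallel` and `merge_corpora`, the file names, and the tiny demo files are all hypothetical and only illustrate the idea.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (German sentence, English sentence)


def read_parallel(de_path: Path, en_path: Path) -> List[Pair]:
    """Read two line-aligned files into (German, English) pairs."""
    with open(de_path, encoding="utf-8") as de_file:
        de_lines = [line.rstrip("\n") for line in de_file]
    with open(en_path, encoding="utf-8") as en_file:
        en_lines = [line.rstrip("\n") for line in en_file]
    # A length mismatch means the files are not aligned line by line.
    if len(de_lines) != len(en_lines):
        raise ValueError(f"{de_path} and {en_path} differ in line count")
    return list(zip(de_lines, en_lines))


def merge_corpora(file_pairs: Iterable[Tuple[Path, Path]]) -> List[Pair]:
    """Concatenate the sentence pairs of several corpora into one list."""
    merged: List[Pair] = []
    for de_path, en_path in file_pairs:
        merged.extend(read_parallel(de_path, en_path))
    return merged


# Demo with two tiny stand-in corpora (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.de").write_text("Guten Morgen\nDanke\n", encoding="utf-8")
    (root / "a.en").write_text("Good morning\nThank you\n", encoding="utf-8")
    (root / "b.de").write_text("Hallo\n", encoding="utf-8")
    (root / "b.en").write_text("Hello\n", encoding="utf-8")
    pairs = merge_corpora([
        (root / "a.de", root / "a.en"),
        (root / "b.de", root / "b.en"),
    ])

print(len(pairs), "sentence pairs loaded")
```

Reading the files line by line, rather than splitting the whole text on every kind of line break, keeps the two sides aligned even when a sentence contains unusual Unicode separator characters.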

• PROJECT OBJECTIVE:

Design a Machine Translation model that can translate sentences from German to English or vice versa.


• PROJECT TASK:

1. Milestone 1:

‣ Input: Context and Dataset
‣ Process:

‣ Step 1: Import and merge all three datasets
‣ Step 2: Data cleansing
‣ Step 3: NLP pre-processing - make the dataset suitable for AI/ML model training
‣ Step 4: Design, train and test simple RNN & LSTM model
‣ Step 5: Interim report

‣ Submission: Interim report, Jupyter Notebook with all the steps in Milestone-1
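
Steps 2 and 3 above can be sketched as follows. This is one possible approach, not the project's prescribed pipeline: the cleaning rules (lowercasing, stripping punctuation, dropping empty, overlong, and duplicate pairs), the `<pad>` and `<unk>` tokens, and all function names are assumptions chosen for illustration.

```python
import re
import unicodedata
from collections import Counter
from typing import Dict, Iterable, List, Tuple

PAD, UNK = "<pad>", "<unk>"  # assumed special tokens


def clean_sentence(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text).lower()
    # \w is Unicode-aware for str patterns, so ä, ö, ü and ß are kept.
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_pairs(pairs: Iterable[Tuple[str, str]], max_len: int = 50) -> List[Tuple[str, str]]:
    """Remove empty, overlong, and duplicate sentence pairs."""
    seen = set()
    cleaned = []
    for de, en in pairs:
        de, en = clean_sentence(de), clean_sentence(en)
        if not de or not en:
            continue
        if len(de.split()) > max_len or len(en.split()) > max_len:
            continue
        if (de, en) in seen:
            continue
        seen.add((de, en))
        cleaned.append((de, en))
    return cleaned


def build_vocab(sentences: Iterable[str], min_count: int = 1) -> Dict[str, int]:
    """Map each sufficiently frequent token to an integer id."""
    counts = Counter(tok for sent in sentences for tok in sent.split())
    vocab = {PAD: 0, UNK: 1}
    for tok, n in counts.most_common():
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(sentence: str, vocab: Dict[str, int], length: int) -> List[int]:
    """Convert a sentence to a fixed-length list of ids (truncate or pad)."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in sentence.split()][:length]
    return ids + [vocab[PAD]] * (length - len(ids))


raw = [
    ("Guten Morgen!", "Good morning!"),
    ("Guten Morgen!", "Good morning!"),   # duplicate
    ("", "Empty source"),                 # missing German side
    ("Wie geht es dir?", "How are you?"),
]
cleaned = clean_pairs(raw)
en_vocab = build_vocab(en for _, en in cleaned)
encoded = encode("good morning", en_vocab, length=4)
print(cleaned)
print(encoded)
```

Lowercasing is a trade-off for German in particular, since capitalization marks nouns. Whether to keep case is a design decision for the project, and the sketch only shows the simpler option.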

2. Milestone 2:

‣ Input: Preprocessed output from Milestone-1
‣ Process:

‣ Step 1: Design, train and test RNN & LSTM model with embeddings
‣ Step 2: Design, train and test the bidirectional RNN & LSTM model
‣ Step 3: Design, train and test Encoder-Decoder RNN & LSTM model (optional; no marks are deducted if this step is not attempted)
‣ Step 4: Choose the best-performing model and pickle it.
‣ Step 5: Final Report

‣ Submission: Final report, Jupyter Notebook with all the steps in Milestone-1 and Milestone-2
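
Step 4 above, choosing the best-performing model and pickling it, might look roughly like this. The sketch assumes each candidate has already been evaluated to a single validation score where higher is better. The plain dictionaries stand in for trained models, the scores are placeholder numbers rather than real results, and `pick_best`, `save_model`, and `load_model` are hypothetical helper names.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict, Tuple


def pick_best(results: Dict[str, Tuple[Any, float]]) -> Tuple[str, Any]:
    """Return the name and model with the highest validation score."""
    best_name = max(results, key=lambda name: results[name][1])
    return best_name, results[best_name][0]


def save_model(model: Any, path: Path) -> None:
    """Serialize the model object to disk with pickle."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path: Path) -> Any:
    """Load a pickled model. Only unpickle files from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in "models" and placeholder scores, for illustration only.
results = {
    "rnn_embedding": ({"kind": "rnn"}, 0.1),
    "lstm_embedding": ({"kind": "lstm"}, 0.2),
    "bidirectional_lstm": ({"kind": "bi-lstm"}, 0.3),
}

best_name, best_model = pick_best(results)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "best_model.pkl"
    save_model(best_model, model_path)
    restored = load_model(model_path)

print(best_name, restored)
```

Not every deep learning model object pickles cleanly. If `pickle.dump` fails for the chosen model, saving it with the framework's own save function and pickling only the vocabularies and preprocessing settings is a common alternative.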

3. Milestone 3: [Optional]

‣ Process:

‣ Step 1: Design a clickable UI-based Translation interface.
Hint: Input - Sentence in one language (German/English), Output - Translated sentence in the other language (English/German)

‣ Submission: Final report, Jupyter Notebook with the addition of clickable UI-based interface
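
One way to approach the clickable interface in Milestone 3 is a small desktop window built with Python's standard `tkinter` module. The sketch below is only an illustration under that assumption: `translate` is a hypothetical placeholder that tags the input instead of calling a trained model, and the widget layout is arbitrary.

```python
def translate(sentence: str, direction: str) -> str:
    """Hypothetical stand-in for the trained model's inference call."""
    # Placeholder behaviour: label the text with the chosen direction.
    return f"[{direction}] {sentence.strip()}"


def launch_ui() -> None:
    """Open a window with an input box, a direction selector, and a button."""
    # Imported here so the module can be loaded on machines without a display.
    import tkinter as tk
    from tkinter import ttk

    root = tk.Tk()
    root.title("German/English Translator")

    direction = tk.StringVar(value="de->en")
    source = tk.Text(root, height=4, width=60)
    target = tk.Text(root, height=4, width=60, state="disabled")

    def on_click() -> None:
        # Read the input, translate it, and show the result read-only.
        result = translate(source.get("1.0", "end"), direction.get())
        target.configure(state="normal")
        target.delete("1.0", "end")
        target.insert("1.0", result)
        target.configure(state="disabled")

    ttk.Combobox(
        root,
        textvariable=direction,
        values=["de->en", "en->de"],
        state="readonly",
    ).pack(padx=8, pady=4)
    source.pack(padx=8, pady=4)
    ttk.Button(root, text="Translate", command=on_click).pack(pady=4)
    target.pack(padx=8, pady=4)

    root.mainloop()
```

Calling `launch_ui()` from a script opens the window. Replacing the body of `translate` with a call to the pickled model is the only change needed to connect the interface to a real translator.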

About

UT Austin-Great Lakes-PGP AIML Capstone Project for the Post Graduate Program in Artificial Intelligence and Machine Learning designed by leading academic and industry experts and faculty recognised by The University of Texas at Austin and Great Lakes University. Facilitated by Great Learning.
