minseoc03/ko-en-translation

🌏 Korean-English Translation Model

This repository contains a Transformer-based model for Korean-to-English translation. The model was trained on conversational data so that its English output reads naturally.

📌 Overview

This project implements a Neural Machine Translation (NMT) system using the Transformer architecture. It was trained on a dataset of 100,000 conversational Korean-English sentence pairs from AI-HUB, a free community resource created to promote AI development in Korea.

✨ Features

✅ Transformer-based architecture for high-quality translations
✅ Trained on conversational data for natural translations
✅ Easy configuration through YAML files using Hydra
✅ Simple inference interface
✅ Achieves a BLEU score of 29.9 🔥

📂 Repository Structure

ko-en-translation/
│── main.py                   # Entry point
│── dataset.py                # Data preprocessing
│── transformer.py            # Transformer model implementation
│── trainer.py                # Training script
│── translation.py            # Inference script
│── requirements.txt          # Dependencies
│── LICENSE                   # License file
│── README.md                 # Documentation
│── conf/                     # Configuration files
│   ├── config.yaml
│   ├── model/
│   │   ├── transformer.yaml
│   ├── dataset/
│   │   ├── ai_hub_conversation_100K.yaml
│   ├── inference/
│   │   ├── translation.yaml
│   ├── trainer/
│   │   ├── default.yaml
│── data/
│   ├── 대화체.xlsx             # Dataset file

⚡ Installation

Clone the repository:

git clone https://github.com/minseoc03/ko-en-translation.git
cd ko-en-translation

Install the required dependencies:

pip install -r requirements.txt

🚀 Usage

🔄 Translation

To translate Korean text to English, run main.py:

python main.py

⚙️ Configuration with Hydra

This project uses Hydra for flexible and easy configuration management. You can modify configuration parameters either by editing the YAML files or by overriding them directly from the command line.

🔹 Method 1: Edit the configuration file

Modify the src_text parameter in:

conf/inference/translation.yaml

🔹 Method 2: Override from the command line

python main.py inference.translation.src_text="안녕하세요. 어떻게 지내세요?"
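As a rough illustration of what a dotted override does, the sketch below applies a `key.path=value` string to a nested dictionary. It is a simplified, standard-library-only stand-in and not Hydra's actual implementation: the helper name `apply_override` and the nested config layout are assumptions made for this example, and unlike Hydra it keeps every value as a plain string.

```python
import copy


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of `config` with one 'dotted.key=value' override applied.

    Hypothetical helper for illustration; Hydra itself also parses value
    types and validates keys, which this sketch does not do.
    """
    # Split only on the first '=' so the value may itself contain '='.
    dotted_key, _, value = override.partition("=")
    keys = dotted_key.split(".")

    result = copy.deepcopy(config)  # leave the caller's config untouched
    node = result
    for key in keys[:-1]:
        # Walk down the nesting, creating intermediate levels if missing.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return result


# Assumed layout mirroring the override path used in the command above.
config = {"inference": {"translation": {"src_text": "안녕하세요. 만나서 반갑습니다."}}}
updated = apply_override(
    config, "inference.translation.src_text=안녕하세요. 어떻게 지내세요?"
)
print(updated["inference"]["translation"]["src_text"])
```

The override only affects the run it is passed to; the YAML file on disk keeps its original value.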

📦 Pre-trained Model

To use a pre-trained model, create a pretrained/ folder and place the following files inside:

Transformer.pt
Transformer_history.pt
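Before running inference, it can help to confirm that both files are in place. The following is a minimal sketch that only checks for their presence; the helper name `missing_pretrained_files` is hypothetical, and how the project actually loads the checkpoints is not shown here.

```python
from pathlib import Path

# File names taken from the list above; the folder name follows the README.
REQUIRED_FILES = ("Transformer.pt", "Transformer_history.pt")


def missing_pretrained_files(folder: str = "pretrained") -> list[str]:
    """Return the required file names that are not present in `folder`."""
    base = Path(folder)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


missing = missing_pretrained_files()
if missing:
    print("Missing pre-trained files:", ", ".join(missing))
else:
    print("All pre-trained files found.")
```

Run it from the repository root so that the relative `pretrained/` path resolves correctly.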

📖 Examples

✅ Basic example

1️⃣ Open the translation configuration file:

nano conf/inference/translation.yaml

2️⃣ Change the src_text field:

# Original
src_text: "안녕하세요. 만나서 반갑습니다."

# Modified
src_text: "오늘 날씨가 정말 좋네요. 산책하러 갈까요?"

3️⃣ Run the translation:

python main.py

4️⃣ The output will show:

입력 : 오늘 날씨가 정말 좋네요. 산책하러 갈까요?
번역 : The weather is really nice today. Shall we go for a walk?

🔹 Command line override examples

Translate a simple greeting:

python main.py inference.translation.src_text="안녕하세요. 반갑습니다."

Translate a longer sentence:

python main.py inference.translation.src_text="저는 한국어를 공부하고 있습니다. 이 번역기가 도움이 될 것 같아요."

Override model and trainer parameters:

python main.py inference.translation.src_text="안녕하세요" model.transformer.n_layers=6 trainer.default.epoch=100

🏗️ Model Details

📜 License

This project is licensed under the MIT License.

🙌 Acknowledgements
