🌏 Korean-English Translation Model

This repository contains a Transformer-based model for Korean-to-English translation. The model was trained on conversational data so that its English output reads naturally.

📌 Overview

This project implements a Neural Machine Translation (NMT) system using the Transformer architecture. It was trained on a dataset of 100,000 conversational Korean-English sentence pairs from AI-HUB, a free community resource created to promote AI development in Korea.

✨ Features

✅ Transformer-based architecture for high-quality translations
✅ Trained on conversational data for natural translations
✅ Easy configuration through YAML files using Hydra
✅ Simple inference interface
✅ Achieves a 🔥 29.9 BLEU score
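
For readers unfamiliar with the metric, here is a minimal sketch of how a BLEU-style score is computed: clipped n-gram precision for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. The evaluation code behind the reported score is not shown in this README, so `sentence_bleu` and `ngrams` are hypothetical helpers, and this single-reference, unsmoothed, whitespace-tokenized version will not reproduce a corpus-level figure.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU on whitespace tokens, scaled to 0-100."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: one empty precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # The brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "The weather is really nice today."
    print(sentence_bleu("The weather is really nice today.", reference))
    print(sentence_bleu("The weather is really nice", reference))
```

The second call scores lower than the first only because of the brevity penalty: every n-gram in the shorter candidate also appears in the reference.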

📂 Repository Structure

ko-en-translation/
│── main.py                   # Entry point
│── dataset.py                # Data preprocessing
│── transformer.py            # Transformer model implementation
│── trainer.py                # Training script
│── translation.py            # Inference script
│── requirements.txt          # Dependencies
│── LICENSE                   # License file
│── README.md                 # Documentation
│── conf/                     # Configuration files
│   ├── config.yaml
│   ├── model/
│   │   ├── transformer.yaml
│   ├── dataset/
│   │   ├── ai_hub_conversation_100K.yaml
│   ├── inference/
│   │   ├── translation.yaml
│   ├── trainer/
│   │   ├── default.yaml
│── data/
│   ├── 대화체.xlsx             # Dataset file

⚡ Installation

Clone the repository:

git clone https://github.com/minseoc03/ko-en-translation.git
cd ko-en-translation

Install the required dependencies:

pip install -r requirements.txt

🚀 Usage

🔄 Translation

To translate Korean text to English, run main.py:

python main.py

⚙️ Configuration with Hydra

This project uses Hydra for flexible and easy configuration management. You can modify configuration parameters either by editing the YAML files or by overriding them directly from the command line.

🔹 Method 1: Edit the configuration file

Modify the src_text parameter in:

conf/inference/translation.yaml

🔹 Method 2: Override from the command line

python main.py inference.translation.src_text="안녕하세요. 어떻게 지내세요?"
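
To make the dotted override syntax concrete, the sketch below shows how a `key.path=value` string maps onto a nested configuration. It is a simplified illustration, not Hydra's implementation: `apply_override` is a hypothetical helper, the nested layout is assumed from the override path used in this README, and values stay plain strings (Hydra also parses types and, by default, rejects keys that do not already exist).

```python
def apply_override(config, override):
    """Apply one 'a.b.c=value' override to a nested dict, in place."""
    dotted_key, sep, value = override.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {override!r}")
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate sections on demand (Hydra itself is stricter).
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Hypothetical nested layout mirroring the override path used in this README.
config = {"inference": {"translation": {"src_text": "안녕하세요. 만나서 반갑습니다."}}}
apply_override(config, "inference.translation.src_text=안녕하세요. 어떻게 지내세요?")
print(config["inference"]["translation"]["src_text"])
```

Only the first `=` splits the key from the value, so a sentence that itself contains `=` is passed through intact.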

📦 Pre-trained Model

To use a pre-trained model, create a pretrained/ folder and place the following files inside:

Transformer.pt
Transformer_history.pt
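
Before running inference, it can help to confirm that both files are in place. The check below is a small sketch that assumes the `pretrained/` folder sits in the directory you run it from; `missing_checkpoints` is a hypothetical helper and not part of this repository, and how `main.py` actually loads the files is not shown here.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = ("Transformer.pt", "Transformer_history.pt")


def missing_checkpoints(pretrained_dir="pretrained"):
    """Return the required checkpoint files that are absent from pretrained_dir."""
    root = Path(pretrained_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing from pretrained/:", ", ".join(missing))
    else:
        print("All pre-trained files found.")
```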

📖 Examples

✅ Basic example

1️⃣ Open the translation configuration file:

nano conf/inference/translation.yaml

2️⃣ Change the src_text field:

# Original
src_text: "안녕하세요. 만나서 반갑습니다."

# Modified
src_text: "오늘 날씨가 정말 좋네요. 산책하러 갈까요?"

3️⃣ Run the translation:

python main.py

4️⃣ The output will show the input (입력) followed by the translation (번역):

입력 : 오늘 날씨가 정말 좋네요. 산책하러 갈까요?
번역 : The weather is really nice today. Shall we go for a walk?

🔹 Command line override examples

Translate a simple greeting:

python main.py inference.translation.src_text="안녕하세요. 반갑습니다."

Translate a longer sentence:

python main.py inference.translation.src_text="저는 한국어를 공부하고 있습니다. 이 번역기가 도움이 될 것 같아요."

Override model and trainer parameters in the same command:

python main.py inference.translation.src_text="안녕하세요" model.transformer.n_layers=6 trainer.default.epoch=100

🏗️ Model Details

Architecture: Transformer
Training Data: 100,000 Korean-English conversational sentence pairs from AI-HUB
Tokenizer: HuggingFace Helsinki-NLP/opus-mt-ko-en
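
As background on the architecture, the core operation of a Transformer is scaled dot-product attention, softmax(QKᵀ / √d_k)V. The sketch below is a plain-Python illustration of that formula using lists instead of tensors; it is not taken from `transformer.py`, and the function names are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors."""
    d_k = len(keys[0])
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)])
    return outputs


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[10.0, 0.0], [0.0, 10.0]]
    print(scaled_dot_product_attention(q, k, v))
```

In the usage example, the query matches the first key more closely than the second, so the output leans toward the first value vector.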

📜 License

This project is licensed under the MIT License.

🙌 Acknowledgements

Dataset: AI-HUB Korean-English Conversation Dataset
Tokenizer: HuggingFace Helsinki-NLP/opus-mt-ko-en
