Commit eeae2b9

feat: added initial files and readme file with:
- Added Object Storage namespace to the prerequisites and setup steps
- Updated the configuration section to include the new Object Storage settings
- Added a clear Configuration Priority section explaining how values are resolved
- Made environment variables optional, clarifying that all values can be set in config.yaml
- Updated the config.yaml example to show both Language Translation and Object Storage sections
- Improved the formatting and organization of the configuration documentation
- Added more detailed explanations about where to find the namespace value
- Clarified that environment variables override config.yaml values
1 parent 84658e3 commit eeae2b9

6 files changed

+613
-0
lines changed

oci-language-translation/README.md

Lines changed: 171 additions & 0 deletions
@@ -0,0 +1,171 @@
# OCI Language Translation Tools

## Introduction

This repository contains two tools that use the OCI Language translation service:

1. **Bulk Document Translation**: Automatically translate multiple documents stored in an OCI Object Storage bucket. This tool supports various document formats and maintains the original file structure in the target bucket.

2. **CSV/JSON Field Translation**: Selectively translate specific columns in CSV files or keys in JSON documents while preserving the original structure. This is particularly useful for localizing data files while maintaining their format and untranslated fields.

## Prerequisites

- Python 3.8 or higher
- OCI Account with Language Translation service enabled
- Required IAM Policies and Permissions
- Object Storage buckets (for document translation)
- OCI CLI configured with proper credentials

### OCI Setup Requirements

1. Create an OCI account if you don't have one
2. Enable Language Translation service in your tenancy
3. Set up OCI CLI and create API keys:
   ```bash
   # Install OCI CLI
   bash -c "$(curl -L https://raw.githubusercontent.com/oracle/oci-cli/master/scripts/install/install.sh)"

   # Configure OCI CLI (this will create ~/.oci/config)
   oci setup config
   ```
4. Set up appropriate IAM policies
5. Create source and target buckets in Object Storage (for document translation)
6. Note your Object Storage namespace (visible in the OCI Console under Object Storage)
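
The namespace can also be read programmatically instead of being copied from the Console. The sketch below assumes the OCI Python SDK is installed and that `~/.oci/config` already exists; `fetch_namespace` and `build_client` are illustrative helper names, not part of this repository.

```python
def fetch_namespace(client):
    """Return the tenancy's Object Storage namespace.

    `client` is expected to be an oci.object_storage.ObjectStorageClient;
    its get_namespace() response carries the namespace string in `.data`.
    """
    return client.get_namespace().data


def build_client():
    """Create an Object Storage client from the default OCI CLI profile."""
    import oci  # imported lazily so fetch_namespace has no hard dependency

    oci_config = oci.config.from_file()  # reads ~/.oci/config, profile DEFAULT
    return oci.object_storage.ObjectStorageClient(oci_config)
```

With valid credentials, `fetch_namespace(build_client())` should return the value to put under `object_storage.namespace` in `config.yaml`.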

## Getting Started

1. Clone this repository:
   ```bash
   git clone <repository-url>
   cd oci-language-translation
   ```

2. Install required dependencies:
   ```bash
   pip install -r requirements.txt
   ```

3. Optionally, set environment variables (every value can be set in `config.yaml` instead):
   ```bash
   # Optional - all these values can be set in config.yaml
   export OCI_COMPARTMENT_ID="ocid1.compartment.oc1..your-compartment-id"
   export OCI_SOURCE_LANG="en"
   export OCI_TARGET_LANG="es"
   ```

4. Update `config.yaml` with your translation and storage settings:
   ```yaml
   # Language Translation Service Configuration
   language_translation:
     compartment_id: "ocid1.compartment.oc1..your-compartment-id"
     source_bucket: "source-bucket-name"
     target_bucket: "target-bucket-name"
     source_language: "en"  # ISO language code
     target_language: "es"  # ISO language code

   # Object Storage Configuration
   object_storage:
     namespace: "your-namespace"  # Your tenancy's Object Storage namespace
     bucket_name: "your-bucket-name"  # Bucket for CSV/JSON translations
   ```

5. For bulk document translation:
   ```bash
   python bucket_translation.py
   ```

6. For CSV/JSON translation:
   ```bash
   # For CSV files (column numbers start from 1)
   python csv_json_translation.py csv input.csv output.csv 1 2 3

   # For JSON files
   python csv_json_translation.py json input.json output.json key1 key2
   ```

## Usage Examples

### Bulk Document Translation
```bash
# Translate all documents from source bucket to target bucket
python bucket_translation.py
```

### CSV Translation
```bash
# Translate columns 1, 3, and 5 from English to Spanish
python csv_json_translation.py csv products.csv products_es.csv 1 3 5
```
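
To make the 1-based column convention concrete, here is a minimal sketch of how selected columns could be rewritten while the rest pass through unchanged. It is not the code in `csv_json_translation.py`; the function name, the `translate` callable (a stand-in for the call to the translation service), and the choice to leave the header row untranslated are all assumptions.

```python
import csv
import io


def translate_csv_columns(reader, writer, columns, translate, has_header=True):
    """Copy rows from reader to writer, translating the given 1-based columns."""
    indexes = {c - 1 for c in columns}  # convert 1-based columns to 0-based indexes
    for row_number, row in enumerate(reader):
        if has_header and row_number == 0:
            writer.writerow(row)  # assumption: the header row is left as-is
            continue
        writer.writerow([
            translate(cell) if i in indexes and cell else cell
            for i, cell in enumerate(row)
        ])


# Usage with a dummy "translator" that upper-cases text
source = io.StringIO("name,sku,description\nchair,A1,wooden chair\n")
target = io.StringIO()
translate_csv_columns(
    csv.reader(source), csv.writer(target, lineterminator="\n"), [1, 3], str.upper
)
```

Passing the translator in as a callable keeps the column logic testable without network access.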

### JSON Translation
```bash
# Translate 'name' and 'details' fields in a JSON file
python csv_json_translation.py json catalog.json catalog_es.json name details
```
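
The key-based selection for JSON can be sketched in a similar way. This is an illustration only: the function name, the recursive handling of nested objects and arrays, and the `translate` callable are assumptions rather than the behavior of `csv_json_translation.py`.

```python
def translate_json_keys(node, keys, translate):
    """Return a copy of node with string values under the given keys translated."""
    if isinstance(node, dict):
        return {
            k: (translate(v) if k in keys and isinstance(v, str)
                else translate_json_keys(v, keys, translate))
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [translate_json_keys(item, keys, translate) for item in node]
    return node  # numbers, booleans, null, and unselected strings pass through


# Usage with a dummy "translator" that upper-cases text
catalog = {"id": 7, "name": "chair", "items": [{"details": "wooden", "sku": "A1"}]}
translated = translate_json_keys(catalog, {"name", "details"}, str.upper)
```

The input document is not modified, so the original and translated versions can be written to separate files.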

## Configuration

The project uses three types of configuration:

1. **OCI Configuration** (`~/.oci/config`):
   - Created automatically by `oci setup config`
   - Contains your OCI authentication details
   - Used for API authentication

2. **Translation Configuration** (`config.yaml`):
   ```yaml
   # Language Translation Service Configuration
   language_translation:
     compartment_id: "ocid1.compartment.oc1..your-compartment-id"
     source_bucket: "source-bucket-name"
     target_bucket: "target-bucket-name"
     source_language: "en"
     target_language: "es"

   # Object Storage Configuration
   object_storage:
     namespace: "your-namespace"  # Your tenancy's Object Storage namespace
     bucket_name: "your-bucket-name"  # Bucket for CSV/JSON translations
   ```

3. **Environment Variables** (optional, override config.yaml):
   - `OCI_COMPARTMENT_ID`: Your OCI compartment OCID
   - `OCI_SOURCE_LANG`: Source language code
   - `OCI_TARGET_LANG`: Target language code

### Configuration Priority

Configuration values are resolved in the following order, highest priority first:
1. Environment variables (if set)
2. Values from config.yaml
3. Default values (language codes only: source `en`, target `es`)
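
The priority order above can be sketched as a small resolver. This is a hedged illustration, not the repository's implementation: `resolve_settings` and `ENV_OVERRIDES` are hypothetical names, and the mapping from each environment variable to a `config.yaml` key is inferred from the lists in this section.

```python
import os

# Environment variable -> (config.yaml section, key, default); mapping is assumed
ENV_OVERRIDES = {
    "OCI_COMPARTMENT_ID": ("language_translation", "compartment_id", None),
    "OCI_SOURCE_LANG": ("language_translation", "source_language", "en"),
    "OCI_TARGET_LANG": ("language_translation", "target_language", "es"),
}


def resolve_settings(file_config, environ=None):
    """Merge config.yaml values with environment overrides and defaults."""
    environ = os.environ if environ is None else environ
    # Copy each section so the caller's dictionary is left untouched
    resolved = {
        section: dict(values or {})
        for section, values in (file_config or {}).items()
    }
    for env_name, (section, key, default) in ENV_OVERRIDES.items():
        values = resolved.setdefault(section, {})
        # 1. environment variable, 2. config.yaml value, 3. default
        value = environ.get(env_name) or values.get(key) or default
        if value is not None:
            values[key] = value
    return resolved
```

The function takes the already-parsed `config.yaml` dictionary, so it can be called right after the YAML file is loaded.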

## Supported Languages

The service supports a wide range of languages. Common language codes include:
- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Portuguese (pt)
- Chinese Simplified (zh-CN)
- Japanese (ja)

For a complete list of supported languages, refer to the OCI Documentation.

## Error Handling

Both tools include comprehensive error handling:
- Configuration validation
- Service availability checks
- File format validation
- Translation status monitoring
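
As an illustration of the first item, configuration validation could look roughly like the sketch below. The set of required keys and the name `find_config_problems` are assumptions based on the `config.yaml` example in this README, not the checks the tools actually perform.

```python
# Keys assumed to be mandatory, taken from the config.yaml example
REQUIRED_KEYS = {
    "language_translation": ["compartment_id", "source_bucket", "target_bucket"],
    "object_storage": ["namespace"],
}


def find_config_problems(config):
    """Return a list of problems; an empty list means all required keys are set."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        values = (config or {}).get(section) or {}
        for key in keys:
            if not values.get(key):
                problems.append(f"{section}.{key} is missing")
    return problems
```

Collecting every problem before reporting lets a user fix the whole file in one pass rather than one key at a time.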

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the LICENSE file for details.
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
#!/usr/bin/env python3

import oci
import yaml
import sys
import time


def load_config():
    """Load configuration from config.yaml file"""
    with open("config.yaml", "r") as file:
        config = yaml.safe_load(file)
    return config

def init_clients(config):
    """Initialize OCI clients"""
    # Authentication details come from the OCI CLI config file (~/.oci/config),
    # not from config.yaml, which only holds the translation settings.
    oci_config = oci.config.from_file(profile_name="DEFAULT")

    # Initialize the AI Language client
    ai_client = oci.ai_language.AIServiceLanguageClient(oci_config)

    # Initialize Object Storage client
    object_storage = oci.object_storage.ObjectStorageClient(oci_config)

    return ai_client, object_storage

def list_objects_in_bucket(object_storage, namespace, bucket_name):
    """List all objects in a bucket"""
    list_objects_response = object_storage.list_objects(
        namespace_name=namespace,
        bucket_name=bucket_name
    )
    return [obj.name for obj in list_objects_response.data.objects]

def translate_documents(ai_client, config):
    """Translate all documents in the source bucket"""
    try:
        # Get configuration values
        compartment_id = config["language_translation"]["compartment_id"]
        source_bucket = config["language_translation"]["source_bucket"]
        target_bucket = config["language_translation"]["target_bucket"]
        source_language = config["language_translation"]["source_language"]
        target_language = config["language_translation"]["target_language"]
        # Both buckets are addressed through the tenancy's Object Storage namespace
        namespace = config["object_storage"]["namespace"]

        # Create batch document translation job
        create_batch_job_response = ai_client.create_batch_document_translation_job(
            create_batch_document_translation_job_details=oci.ai_language.models.CreateBatchDocumentTranslationJobDetails(
                compartment_id=compartment_id,
                display_name=f"Batch_Translation_{time.strftime('%Y%m%d_%H%M%S')}",
                source_language_code=source_language,
                target_language_code=target_language,
                input_location=oci.ai_language.models.ObjectStorageLocation(
                    bucket_name=source_bucket,
                    namespace_name=namespace
                ),
                output_location=oci.ai_language.models.ObjectStorageLocation(
                    bucket_name=target_bucket,
                    namespace_name=namespace
                )
            )
        )

        job_id = create_batch_job_response.data.id
        print(f"Translation job created with ID: {job_id}")

        # Monitor job status, polling every 30 seconds until a final state
        while True:
            job_status = ai_client.get_batch_document_translation_job(
                batch_document_translation_job_id=job_id
            ).data.lifecycle_state

            print(f"Job status: {job_status}")
            if job_status in ["SUCCEEDED", "FAILED"]:
                break
            time.sleep(30)

        return job_status == "SUCCEEDED"

    except Exception as e:
        print(f"Error during translation: {str(e)}")
        return False

def main():
    try:
        # Load configuration
        config = load_config()

        # Initialize clients
        ai_client, object_storage = init_clients(config)

        # Start translation
        success = translate_documents(ai_client, config)

        if success:
            print("Translation completed successfully!")
        else:
            print("Translation failed.")

    except Exception as e:
        print(f"Error: {str(e)}")
        sys.exit(1)


if __name__ == "__main__":
    main()
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
# Language Translation Service Configuration
language_translation:
  compartment_id: "ocid1.compartment.oc1..your-compartment-id"
  source_bucket: "source-bucket-name"
  target_bucket: "target-bucket-name"
  source_language: "en"  # ISO language code
  target_language: "es"  # ISO language code

# Object Storage Configuration
object_storage:
  namespace: "your-namespace"  # Your tenancy's Object Storage namespace
  bucket_name: "your-bucket-name"  # Bucket for CSV/JSON translations
