# Text_Augmentation_via_Translation

A simple method to augment text by using NMT (neural machine translation).

!!Note!! If you have any issues using this library, or feel that it is lacking something, don't hesitate to message me.

How to use (Linux distributions):

1. Install using pip: pip install translation-augmentation

2. Import the library:

from translation_augmentation.translation_augmentation import augment_data_simple

from translation_augmentation.translation_augmentation import augment_data

Example:

1.

from translation_augmentation.translation_augmentation import augment_data_simple

import numpy as np

s1 = "The food was awful"

s2 = "That was the best meal I have had in ages."

# the sentences we want to augment

X_train = np.array([s1, s2])

# how many times we want to augment each sentence

times = 2

X_train_new = augment_data_simple(X_train, times)

print("Original: ", X_train)

print("After augmentation: ", X_train_new)

>>Original:

['The food was awful'

'That was the best meal I have had in ages.']

>>After augmentation:

['The food was awful'

'The food was terrible'

'The food was horrible'

'That was the best meal I have had in ages.'

'This was the best meal I have had in years.'

'It was the best meal I had in a long time.']

2.

from translation_augmentation.translation_augmentation import augment_data

import numpy as np

s1 = "The food was awful"

s2 = "That was the best meal I have had in ages."

# the sentences we want to augment

X_train = np.array([s1, s2])

# the class present in each sentence, as a binary vector (here each sentence has a sentiment: positive, negative or neutral)

y_train = np.array([[0, 1, 0], [1, 0, 0]])

# how many times we want to augment each class (dictionary)

classes_x_times = {}

classes_x_times[0] = 1 # double the first class

classes_x_times[1] = 2 # triple the second class

classes_x_times[2] = 0 # leave the third class as it is

X_train_new, y_train_new = augment_data(X_train, y_train, classes_x_times)

print("Original: ", X_train, "\n", y_train)

print("After augmentation: ", X_train_new, "\n", y_train_new)

>>Original:

['The food was awful'

'That was the best meal I have had in ages.']

[[0 1 0]

[1 0 0]]

>>After augmentation:

['The food was awful'

'The food was terrible'

'The food was horrible'

'That was the best meal I have had in ages.'

'This was the best meal I have had in years.']

[[0, 1, 0],

[0, 1, 0],

[0, 1, 0],

[1, 0, 0],

[1, 0, 0]]
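
To sanity-check the shapes in the example above, the following is a minimal sketch of how many rows each sentence should contribute after augmentation. It is not part of the library: `expected_copies` is a hypothetical helper, and it assumes each label row contains exactly one 1 (the README does not say how rows with several classes are handled).

```python
def expected_copies(labels, classes_x_times):
    """Return how many rows (original + augmented) each sentence should yield.

    Hypothetical helper for illustration only; assumes exactly one 1 per row.
    """
    counts = []
    for row in labels:
        class_index = list(row).index(1)  # position of the active class
        # One original sentence plus the requested number of augmentations.
        counts.append(1 + classes_x_times.get(class_index, 0))
    return counts


y_train = [[0, 1, 0], [1, 0, 0]]
classes_x_times = {0: 1, 1: 2, 2: 0}

counts = expected_copies(y_train, classes_x_times)
print(counts, sum(counts))  # [3, 2] 5
```

The totals match the output shown above: three rows labelled [0, 1, 0] and two rows labelled [1, 0, 0].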

Method Documentation:

1.

### augment_data_simple ###

This method augments text by translating it into another language and then translating it back into the original one. The resulting text will differ slightly from the original but should stay similar in meaning. Given X_train, it returns a new X_train that includes the augmented data.

Parameters:

text --> (X_train original) the text we want to augment (an array of sentences)

times --> how many times we want to augment each given sentence (up to 3, because we get good translations only between the languages English, Spanish, German and French)

strategy --> single: translate from the original language to another and back to the original, e.g. EN to DE to EN

double: translate from the original language to 2 other languages and back to the original, e.g. EN to DE to ES to EN

src --> the language of the initial text. Possible options: 'en', 'de', 'es', 'fr'

return --> X_train augmented
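
The two strategies and the limit of 3 on times can be pictured as chains of language hops. The sketch below is only an illustration under stated assumptions, not the library's implementation: `translation_chains` is a hypothetical helper, the order in which pivot languages are picked is assumed, and for the double strategy the second pivot is assumed to be the next language in the list.

```python
SUPPORTED = ["en", "de", "es", "fr"]  # the four languages listed in this README


def translation_chains(src, times, strategy="single"):
    """Hypothetical helper: list the language hops for each augmented copy."""
    if src not in SUPPORTED:
        raise ValueError(f"unsupported source language: {src!r}")
    # Every supported language except the source can act as a pivot,
    # which is why at most 3 augmentations are available.
    pivots = [lang for lang in SUPPORTED if lang != src]
    if not 0 <= times <= len(pivots):
        raise ValueError(f"times must be between 0 and {len(pivots)}")
    chains = []
    for i in range(times):
        if strategy == "single":
            chains.append([src, pivots[i], src])
        elif strategy == "double":
            # Assumption: the second pivot is the next language in the list.
            second = pivots[(i + 1) % len(pivots)]
            chains.append([src, pivots[i], second, src])
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return chains


for chain in translation_chains("en", 2, "single"):
    print(" -> ".join(chain))
```

With src='en' and times=2 this prints the hops en -> de -> en and en -> es -> en; each chain corresponds to one augmented copy of a sentence.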

!!Note!!

A folder named 'Translation' is created in the current working directory and the translation is saved there in a file named 'translation_simple.p'. If you re-run this method the translation will be loaded from that file in order to save time. If you want to make a new translation each time simply delete the file 'translation_simple.p'.

2.

### augment_data ###

This method augments text by translating it into another language and then translating it back into the original one. The resulting text will differ slightly from the original but should stay similar in meaning. Given X_train and y_train, it returns a new X_train that includes the augmented data, together with the matching new y_train.

Parameters:

text --> (X_train original) the text we want to augment (an array of sentences)

all_classes --> (y_train) the classes that are present in each sentence

classes_x_times --> dictionary containing the classes we want to augment and how many times (up to 3, because we get good translations only between the languages English, Spanish, German and French)

strategy --> single: translate from the original language to another and back to the original, e.g. EN to DE to EN

double: translate from the original language to 2 other languages and back to the original, e.g. EN to DE to ES to EN

src --> the language of the initial text. Possible options: 'en', 'de', 'es', 'fr'

return --> return_sentences (X_train augmented), return_all_classes (y_train)

!!Note!!

A folder named 'Translation' is created in the current working directory and the translation is saved there in a file named 'translation.p'. If you re-run this method the translation will be loaded from that file in order to save time. If you want to make a new translation each time simply delete the file 'translation.p'.
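
Deleting the cached file can also be scripted. The following is a minimal sketch that assumes only the folder and file names given in the notes above ('Translation', 'translation.p' and 'translation_simple.p'); `clear_translation_cache` is a hypothetical helper, not a function provided by the library.

```python
from pathlib import Path


def clear_translation_cache(filename="translation.p", base_dir="."):
    """Delete a cached translation file; return True if a file was removed.

    Hypothetical helper: only the 'Translation' folder name and the file
    names come from the README.
    """
    cache_file = Path(base_dir) / "Translation" / filename
    if cache_file.is_file():
        cache_file.unlink()
        return True
    return False


if __name__ == "__main__":
    # Force a fresh translation on the next call to augment_data ...
    clear_translation_cache("translation.p")
    # ... and on the next call to augment_data_simple.
    clear_translation_cache("translation_simple.p")
```

Calling it before re-running either method means the next run translates from scratch instead of loading the saved result.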
