diff --git a/data_samples/imdb/sample.imdb b/data_samples/imdb/sample.imdb
new file mode 100644
index 000000000..74d17a873
--- /dev/null
+++ b/data_samples/imdb/sample.imdb
@@ -0,0 +1,6 @@
+"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as this is exactly what happened with me.
The first thing that struck me about Oz was its brutality and unflinching scenes of violence, which set in right from the word GO. Trust me, this is not a show for the faint hearted or timid. This show pulls no punches with regards to drugs, sex or violence. Its is hardcore, in the classic use of the word.
It is called OZ as that is the nickname given to the Oswald Maximum Security State Penitentary. It focuses mainly on Emerald City, an experimental section of the prison where all the cells have glass fronts and face inwards, so privacy is not high on the agenda. Em City is home to many..Aryans, Muslims, gangstas, Latinos, Christians, Italians, Irish and more....so scuffles, death stares, dodgy dealings and shady agreements are never far away.
I would say the main appeal of the show is due to the fact that it goes where other shows wouldn't dare. Forget pretty pictures painted for mainstream audiences, forget charm, forget romance...OZ doesn't mess around. The first episode I ever saw struck me as so nasty it was surreal, I couldn't say I was ready for it, but as I watched more, I developed a taste for Oz, and got accustomed to the high levels of graphic violence. Not just violence, but injustice (crooked guards who'll be sold out for a nickel, inmates who'll kill on order and get away with it, well mannered, middle class inmates being turned into prison bitches due to their lack of street skills or prison experience) Watching Oz, you may become comfortable with what is uncomfortable viewing....thats if you can get in touch with your darker side.",negative
+"A wonderful little production.
The filming technique is very unassuming- very old-time-BBC fashion and gives a comforting, and sometimes discomforting, sense of realism to the entire piece.
The actors are extremely well chosen- Michael Sheen not only ""has got all the polari"" but he has all the voices down pat too! You can truly see the seamless editing guided by the references to Williams' diary entries, not only is it well worth the watching but it is a terrificly written and performed piece. A masterful production about one of the great master's of comedy and his life.
The realism really comes home with the little things: the fantasy of the guard which, rather than use the traditional 'dream' techniques remains solid then disappears. It plays on our knowledge and our senses, particularly with the scenes concerning Orton and Halliwell and the sets (particularly of their flat with Halliwell's murals decorating every surface) are terribly well done.",positive
+"I thought this was a wonderful way to spend time on a too hot summer weekend, sitting in the air conditioned theater and watching a light-hearted comedy. The plot is simplistic, but the dialogue is witty and the characters are likable (even the well bread suspected serial killer). While some may be disappointed when they realize this is not Match Point 2: Risk Addiction, I thought it was proof that Woody Allen is still fully in control of the style many of us have grown to love.
This was the most I'd laughed at one of Woody's comedies in years (dare I say a decade?). While I've never been impressed with Scarlet Johanson, in this she managed to tone down her ""sexy"" image and jumped right into a average, but spirited young woman.
This may not be the crown jewel of his career, but it was wittier than ""Devil Wears Prada"" and more interesting than ""Superman"" a great comedy to go see with friends.",positive
+"Basically there's a family where a little boy (Jake) thinks there's a zombie in his closet & his parents are fighting all the time.
This movie is slower than a soap opera... and suddenly, Jake decides to become Rambo and kill the zombie.
OK, first of all when you're going to make a film you must Decide if its a thriller or a drama! As a drama the movie is watchable. Parents are divorcing & arguing like in real life. And then we have Jake with his closet which totally ruins all the film! I expected to see a BOOGEYMAN similar movie, and instead i watched a drama with some meaningless thriller spots.
3 out of 10 just for the well playing parents & descent dialogs. As for the shots with Jake: just ignore them.",positive
+"Petter Mattei's ""Love in the Time of Money"" is a visually stunning film to watch. Mr. Mattei offers us a vivid portrait about human relations. This is a movie that seems to be telling us what money, power and success do to people in the different situations we encounter.
This being a variation on the Arthur Schnitzler's play about the same theme, the director transfers the action to the present time New York where all these different characters meet and connect. Each one is connected in one way, or another to the next person, but no one seems to know the previous point of contact. Stylishly, the film has a sophisticated luxurious look. We are taken to see how these people live and the world they live in their own habitat.
The only thing one gets out of all these souls in the picture is the different stages of loneliness each one inhabits. A big city is not exactly the best place in which human relations find sincere fulfillment, as one discerns is the case with most of the people we encounter.
The acting is good under Mr. Mattei's direction. Steve Buscemi, Rosario Dawson, Carol Kane, Michael Imperioli, Adrian Grenier, and the rest of the talented cast, make these characters come alive.
We wish Mr. Mattei good luck and await anxiously for his next work.",positive
+"Probably my all-time favorite movie, a story of selflessness, sacrifice and dedication to a noble cause, but it's not preachy or boring. It just never gets old, despite my having seen it some 15 or more times in the last 25 years. Paul Lukas' performance brings tears to my eyes, and Bette Davis, in one of her very few truly sympathetic roles, is a delight. The kids are, as grandma says, more like ""dressed-up midgets"" than children, but that only makes them more fun to watch. And the mother's slow awakening to what's happening in the world and under her own roof is believable and startling. If I had a dozen thumbs, they'd all be ""up"" for this movie.",positive
diff --git a/examples/classification_example/README.md b/examples/classification_example/README.md
new file mode 100644
index 000000000..53e22a723
--- /dev/null
+++ b/examples/classification_example/README.md
@@ -0,0 +1,28 @@
+# Sentence Sentiment Classifier
+
+This is a sentence sentiment classifier, provided as an example of a classification task.
+
+The example contains two classifiers: a CNN classifier and a BERT classifier.
+
+The example shows:
+ * The training and prediction pipeline in Forte
+ * How to write a reader for the [IMDB dataset](https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews)
+ * How to configure the Train Preprocessor
+ * How to switch the classifier from CNN to BERT
+ * How to add data augmentation (see the snippet at the end of this README)
+
+## Usage
+Use the following to train the network:
+```
+python main_train.py
+```
+Use the following to predict with the trained model:
+```
+python main_predict.py
+```
+
+To switch the model from CNN to BERT, set the "model" field in [config_model.yml](./config_model.yml):
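+
+```
+# config_model.yml
+model: "bert"   # default: "cnn"
+```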
+
+You can also define your own classifier or network in another Python file, following [cnn.py](./cnn.py) as a template.
+
+Define your utility functions in [util.py](./util.py).
+
+To turn on data augmentation during training, set the "data_aug" field in [config_data.yml](./config_data.yml), for example:
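+
+```
+# config_data.yml
+data_aug: True   # default: False; augmentation uses the ReplacementDataAugmentProcessor in main_train.py
+```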
diff --git a/examples/classification_example/classification_data/dev/sample.imdb b/examples/classification_example/classification_data/dev/sample.imdb
new file mode 100644
index 000000000..74d17a873
--- /dev/null
+++ b/examples/classification_example/classification_data/dev/sample.imdb
@@ -0,0 +1,6 @@
+"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as this is exactly what happened with me.
The first thing that struck me about Oz was its brutality and unflinching scenes of violence, which set in right from the word GO. Trust me, this is not a show for the faint hearted or timid. This show pulls no punches with regards to drugs, sex or violence. Its is hardcore, in the classic use of the word.
It is called OZ as that is the nickname given to the Oswald Maximum Security State Penitentary. It focuses mainly on Emerald City, an experimental section of the prison where all the cells have glass fronts and face inwards, so privacy is not high on the agenda. Em City is home to many..Aryans, Muslims, gangstas, Latinos, Christians, Italians, Irish and more....so scuffles, death stares, dodgy dealings and shady agreements are never far away.
I would say the main appeal of the show is due to the fact that it goes where other shows wouldn't dare. Forget pretty pictures painted for mainstream audiences, forget charm, forget romance...OZ doesn't mess around. The first episode I ever saw struck me as so nasty it was surreal, I couldn't say I was ready for it, but as I watched more, I developed a taste for Oz, and got accustomed to the high levels of graphic violence. Not just violence, but injustice (crooked guards who'll be sold out for a nickel, inmates who'll kill on order and get away with it, well mannered, middle class inmates being turned into prison bitches due to their lack of street skills or prison experience) Watching Oz, you may become comfortable with what is uncomfortable viewing....thats if you can get in touch with your darker side.",negative
+"A wonderful little production.
The filming technique is very unassuming- very old-time-BBC fashion and gives a comforting, and sometimes discomforting, sense of realism to the entire piece.
The actors are extremely well chosen- Michael Sheen not only ""has got all the polari"" but he has all the voices down pat too! You can truly see the seamless editing guided by the references to Williams' diary entries, not only is it well worth the watching but it is a terrificly written and performed piece. A masterful production about one of the great master's of comedy and his life.
The realism really comes home with the little things: the fantasy of the guard which, rather than use the traditional 'dream' techniques remains solid then disappears. It plays on our knowledge and our senses, particularly with the scenes concerning Orton and Halliwell and the sets (particularly of their flat with Halliwell's murals decorating every surface) are terribly well done.",positive
+"I thought this was a wonderful way to spend time on a too hot summer weekend, sitting in the air conditioned theater and watching a light-hearted comedy. The plot is simplistic, but the dialogue is witty and the characters are likable (even the well bread suspected serial killer). While some may be disappointed when they realize this is not Match Point 2: Risk Addiction, I thought it was proof that Woody Allen is still fully in control of the style many of us have grown to love.
This was the most I'd laughed at one of Woody's comedies in years (dare I say a decade?). While I've never been impressed with Scarlet Johanson, in this she managed to tone down her ""sexy"" image and jumped right into a average, but spirited young woman.
This may not be the crown jewel of his career, but it was wittier than ""Devil Wears Prada"" and more interesting than ""Superman"" a great comedy to go see with friends.",positive
+"Basically there's a family where a little boy (Jake) thinks there's a zombie in his closet & his parents are fighting all the time.
This movie is slower than a soap opera... and suddenly, Jake decides to become Rambo and kill the zombie.
OK, first of all when you're going to make a film you must Decide if its a thriller or a drama! As a drama the movie is watchable. Parents are divorcing & arguing like in real life. And then we have Jake with his closet which totally ruins all the film! I expected to see a BOOGEYMAN similar movie, and instead i watched a drama with some meaningless thriller spots.
3 out of 10 just for the well playing parents & descent dialogs. As for the shots with Jake: just ignore them.",positive
+"Petter Mattei's ""Love in the Time of Money"" is a visually stunning film to watch. Mr. Mattei offers us a vivid portrait about human relations. This is a movie that seems to be telling us what money, power and success do to people in the different situations we encounter.
This being a variation on the Arthur Schnitzler's play about the same theme, the director transfers the action to the present time New York where all these different characters meet and connect. Each one is connected in one way, or another to the next person, but no one seems to know the previous point of contact. Stylishly, the film has a sophisticated luxurious look. We are taken to see how these people live and the world they live in their own habitat.
The only thing one gets out of all these souls in the picture is the different stages of loneliness each one inhabits. A big city is not exactly the best place in which human relations find sincere fulfillment, as one discerns is the case with most of the people we encounter.
The acting is good under Mr. Mattei's direction. Steve Buscemi, Rosario Dawson, Carol Kane, Michael Imperioli, Adrian Grenier, and the rest of the talented cast, make these characters come alive.
We wish Mr. Mattei good luck and await anxiously for his next work.",positive
+"Probably my all-time favorite movie, a story of selflessness, sacrifice and dedication to a noble cause, but it's not preachy or boring. It just never gets old, despite my having seen it some 15 or more times in the last 25 years. Paul Lukas' performance brings tears to my eyes, and Bette Davis, in one of her very few truly sympathetic roles, is a delight. The kids are, as grandma says, more like ""dressed-up midgets"" than children, but that only makes them more fun to watch. And the mother's slow awakening to what's happening in the world and under her own roof is believable and startling. If I had a dozen thumbs, they'd all be ""up"" for this movie.",positive
diff --git a/examples/classification_example/classification_data/test/sample.imdb b/examples/classification_example/classification_data/test/sample.imdb
new file mode 100644
index 000000000..74d17a873
--- /dev/null
+++ b/examples/classification_example/classification_data/test/sample.imdb
@@ -0,0 +1,6 @@
+"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as this is exactly what happened with me.
The first thing that struck me about Oz was its brutality and unflinching scenes of violence, which set in right from the word GO. Trust me, this is not a show for the faint hearted or timid. This show pulls no punches with regards to drugs, sex or violence. Its is hardcore, in the classic use of the word.
It is called OZ as that is the nickname given to the Oswald Maximum Security State Penitentary. It focuses mainly on Emerald City, an experimental section of the prison where all the cells have glass fronts and face inwards, so privacy is not high on the agenda. Em City is home to many..Aryans, Muslims, gangstas, Latinos, Christians, Italians, Irish and more....so scuffles, death stares, dodgy dealings and shady agreements are never far away.
I would say the main appeal of the show is due to the fact that it goes where other shows wouldn't dare. Forget pretty pictures painted for mainstream audiences, forget charm, forget romance...OZ doesn't mess around. The first episode I ever saw struck me as so nasty it was surreal, I couldn't say I was ready for it, but as I watched more, I developed a taste for Oz, and got accustomed to the high levels of graphic violence. Not just violence, but injustice (crooked guards who'll be sold out for a nickel, inmates who'll kill on order and get away with it, well mannered, middle class inmates being turned into prison bitches due to their lack of street skills or prison experience) Watching Oz, you may become comfortable with what is uncomfortable viewing....thats if you can get in touch with your darker side.",negative
+"A wonderful little production.
The filming technique is very unassuming- very old-time-BBC fashion and gives a comforting, and sometimes discomforting, sense of realism to the entire piece.
The actors are extremely well chosen- Michael Sheen not only ""has got all the polari"" but he has all the voices down pat too! You can truly see the seamless editing guided by the references to Williams' diary entries, not only is it well worth the watching but it is a terrificly written and performed piece. A masterful production about one of the great master's of comedy and his life.
The realism really comes home with the little things: the fantasy of the guard which, rather than use the traditional 'dream' techniques remains solid then disappears. It plays on our knowledge and our senses, particularly with the scenes concerning Orton and Halliwell and the sets (particularly of their flat with Halliwell's murals decorating every surface) are terribly well done.",positive
+"I thought this was a wonderful way to spend time on a too hot summer weekend, sitting in the air conditioned theater and watching a light-hearted comedy. The plot is simplistic, but the dialogue is witty and the characters are likable (even the well bread suspected serial killer). While some may be disappointed when they realize this is not Match Point 2: Risk Addiction, I thought it was proof that Woody Allen is still fully in control of the style many of us have grown to love.
This was the most I'd laughed at one of Woody's comedies in years (dare I say a decade?). While I've never been impressed with Scarlet Johanson, in this she managed to tone down her ""sexy"" image and jumped right into a average, but spirited young woman.
This may not be the crown jewel of his career, but it was wittier than ""Devil Wears Prada"" and more interesting than ""Superman"" a great comedy to go see with friends.",positive
+"Basically there's a family where a little boy (Jake) thinks there's a zombie in his closet & his parents are fighting all the time.
This movie is slower than a soap opera... and suddenly, Jake decides to become Rambo and kill the zombie.
OK, first of all when you're going to make a film you must Decide if its a thriller or a drama! As a drama the movie is watchable. Parents are divorcing & arguing like in real life. And then we have Jake with his closet which totally ruins all the film! I expected to see a BOOGEYMAN similar movie, and instead i watched a drama with some meaningless thriller spots.
3 out of 10 just for the well playing parents & descent dialogs. As for the shots with Jake: just ignore them.",positive
+"Petter Mattei's ""Love in the Time of Money"" is a visually stunning film to watch. Mr. Mattei offers us a vivid portrait about human relations. This is a movie that seems to be telling us what money, power and success do to people in the different situations we encounter.
This being a variation on the Arthur Schnitzler's play about the same theme, the director transfers the action to the present time New York where all these different characters meet and connect. Each one is connected in one way, or another to the next person, but no one seems to know the previous point of contact. Stylishly, the film has a sophisticated luxurious look. We are taken to see how these people live and the world they live in their own habitat.
The only thing one gets out of all these souls in the picture is the different stages of loneliness each one inhabits. A big city is not exactly the best place in which human relations find sincere fulfillment, as one discerns is the case with most of the people we encounter.
The acting is good under Mr. Mattei's direction. Steve Buscemi, Rosario Dawson, Carol Kane, Michael Imperioli, Adrian Grenier, and the rest of the talented cast, make these characters come alive.
We wish Mr. Mattei good luck and await anxiously for his next work.",positive
+"Probably my all-time favorite movie, a story of selflessness, sacrifice and dedication to a noble cause, but it's not preachy or boring. It just never gets old, despite my having seen it some 15 or more times in the last 25 years. Paul Lukas' performance brings tears to my eyes, and Bette Davis, in one of her very few truly sympathetic roles, is a delight. The kids are, as grandma says, more like ""dressed-up midgets"" than children, but that only makes them more fun to watch. And the mother's slow awakening to what's happening in the world and under her own roof is believable and startling. If I had a dozen thumbs, they'd all be ""up"" for this movie.",positive
diff --git a/examples/classification_example/classification_data/train/sample.imdb b/examples/classification_example/classification_data/train/sample.imdb
new file mode 100644
index 000000000..74d17a873
--- /dev/null
+++ b/examples/classification_example/classification_data/train/sample.imdb
@@ -0,0 +1,6 @@
+"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as this is exactly what happened with me.
The first thing that struck me about Oz was its brutality and unflinching scenes of violence, which set in right from the word GO. Trust me, this is not a show for the faint hearted or timid. This show pulls no punches with regards to drugs, sex or violence. Its is hardcore, in the classic use of the word.
It is called OZ as that is the nickname given to the Oswald Maximum Security State Penitentary. It focuses mainly on Emerald City, an experimental section of the prison where all the cells have glass fronts and face inwards, so privacy is not high on the agenda. Em City is home to many..Aryans, Muslims, gangstas, Latinos, Christians, Italians, Irish and more....so scuffles, death stares, dodgy dealings and shady agreements are never far away.
I would say the main appeal of the show is due to the fact that it goes where other shows wouldn't dare. Forget pretty pictures painted for mainstream audiences, forget charm, forget romance...OZ doesn't mess around. The first episode I ever saw struck me as so nasty it was surreal, I couldn't say I was ready for it, but as I watched more, I developed a taste for Oz, and got accustomed to the high levels of graphic violence. Not just violence, but injustice (crooked guards who'll be sold out for a nickel, inmates who'll kill on order and get away with it, well mannered, middle class inmates being turned into prison bitches due to their lack of street skills or prison experience) Watching Oz, you may become comfortable with what is uncomfortable viewing....thats if you can get in touch with your darker side.",negative
+"A wonderful little production.
The filming technique is very unassuming- very old-time-BBC fashion and gives a comforting, and sometimes discomforting, sense of realism to the entire piece.
The actors are extremely well chosen- Michael Sheen not only ""has got all the polari"" but he has all the voices down pat too! You can truly see the seamless editing guided by the references to Williams' diary entries, not only is it well worth the watching but it is a terrificly written and performed piece. A masterful production about one of the great master's of comedy and his life.
The realism really comes home with the little things: the fantasy of the guard which, rather than use the traditional 'dream' techniques remains solid then disappears. It plays on our knowledge and our senses, particularly with the scenes concerning Orton and Halliwell and the sets (particularly of their flat with Halliwell's murals decorating every surface) are terribly well done.",positive
+"I thought this was a wonderful way to spend time on a too hot summer weekend, sitting in the air conditioned theater and watching a light-hearted comedy. The plot is simplistic, but the dialogue is witty and the characters are likable (even the well bread suspected serial killer). While some may be disappointed when they realize this is not Match Point 2: Risk Addiction, I thought it was proof that Woody Allen is still fully in control of the style many of us have grown to love.
This was the most I'd laughed at one of Woody's comedies in years (dare I say a decade?). While I've never been impressed with Scarlet Johanson, in this she managed to tone down her ""sexy"" image and jumped right into a average, but spirited young woman.
This may not be the crown jewel of his career, but it was wittier than ""Devil Wears Prada"" and more interesting than ""Superman"" a great comedy to go see with friends.",positive
+"Basically there's a family where a little boy (Jake) thinks there's a zombie in his closet & his parents are fighting all the time.
This movie is slower than a soap opera... and suddenly, Jake decides to become Rambo and kill the zombie.
OK, first of all when you're going to make a film you must Decide if its a thriller or a drama! As a drama the movie is watchable. Parents are divorcing & arguing like in real life. And then we have Jake with his closet which totally ruins all the film! I expected to see a BOOGEYMAN similar movie, and instead i watched a drama with some meaningless thriller spots.
3 out of 10 just for the well playing parents & descent dialogs. As for the shots with Jake: just ignore them.",positive
+"Petter Mattei's ""Love in the Time of Money"" is a visually stunning film to watch. Mr. Mattei offers us a vivid portrait about human relations. This is a movie that seems to be telling us what money, power and success do to people in the different situations we encounter.
This being a variation on the Arthur Schnitzler's play about the same theme, the director transfers the action to the present time New York where all these different characters meet and connect. Each one is connected in one way, or another to the next person, but no one seems to know the previous point of contact. Stylishly, the film has a sophisticated luxurious look. We are taken to see how these people live and the world they live in their own habitat.
The only thing one gets out of all these souls in the picture is the different stages of loneliness each one inhabits. A big city is not exactly the best place in which human relations find sincere fulfillment, as one discerns is the case with most of the people we encounter.
The acting is good under Mr. Mattei's direction. Steve Buscemi, Rosario Dawson, Carol Kane, Michael Imperioli, Adrian Grenier, and the rest of the talented cast, make these characters come alive.
We wish Mr. Mattei good luck and await anxiously for his next work.",positive
+"Probably my all-time favorite movie, a story of selflessness, sacrifice and dedication to a noble cause, but it's not preachy or boring. It just never gets old, despite my having seen it some 15 or more times in the last 25 years. Paul Lukas' performance brings tears to my eyes, and Bette Davis, in one of her very few truly sympathetic roles, is a delight. The kids are, as grandma says, more like ""dressed-up midgets"" than children, but that only makes them more fun to watch. And the mother's slow awakening to what's happening in the world and under her own roof is believable and startling. If I had a dozen thumbs, they'd all be ""up"" for this movie.",positive
diff --git a/examples/classification_example/cnn.py b/examples/classification_example/cnn.py
new file mode 100644
index 000000000..a359ce7ec
--- /dev/null
+++ b/examples/classification_example/cnn.py
@@ -0,0 +1,42 @@
+# Copyright 2020 The Forte Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from texar.torch.modules.embedders import WordEmbedder
+from texar.torch.modules.classifiers.conv_classifiers import Conv1DClassifier
+from torch import nn
+from texar.torch.data import Batch
+from examples.classification_example.util import pad_each_bach
+
+
+class CNN_Classifier(nn.Module):
+ def __init__(self, in_channels, word_embedding_table):
+ super().__init__()
+ self.embedder = WordEmbedder(init_value=word_embedding_table)
+
+ self.classifier = \
+ Conv1DClassifier(in_channels=in_channels,
+ in_features=word_embedding_table.size()[1])
+
+ self.max_sen_len = in_channels
+
+ def forward(self, batch: Batch):
+ word = batch["text_tag"]["data"]
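+        # word: [batch_size, seq_len] tensor of token ids from the "text_tag" feature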
+
+ word_pad = pad_each_bach(word, self.max_sen_len)
+
+ word_pad_embed = self.embedder(word_pad)
+
+ logits, pred = self.classifier(word_pad_embed)
+
+ return logits, pred
diff --git a/examples/classification_example/config_data.yml b/examples/classification_example/config_data.yml
new file mode 100644
index 000000000..0c046d427
--- /dev/null
+++ b/examples/classification_example/config_data.yml
@@ -0,0 +1,13 @@
+train_path: "classification_data/train/"
+val_path: "classification_data/dev/"
+test_path: "classification_data/test/"
+train_state_path: "train_state.pkl"
+
+num_epochs: 3
+batch_size_tokens: 2
+test_batch_size: 2
+
+max_char_length: 45
+num_char_pad: 2
+
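+# set to True to enable data augmentation in main_train.py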
+data_aug: False
diff --git a/examples/classification_example/config_model.yml b/examples/classification_example/config_model.yml
new file mode 100644
index 000000000..93b2f8cc8
--- /dev/null
+++ b/examples/classification_example/config_model.yml
@@ -0,0 +1,20 @@
+word_emb:
+ dim: 100
+
+learning_rate: 0.01
+momentum: 0.9
+decay_interval: 1
+decay_rate: 0.05
+
+random_seed: 1234
+
+initializer:
+ "type": "xavier_uniform_"
+
+# path to save model
+model_path: "best_classification_model.ckpt"
+
+# path to save resources
+resource_dir: "resources/"
+
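+# which classifier to use: "cnn" or "bert"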
+model: "cnn"
diff --git a/examples/classification_example/config_predict.yml b/examples/classification_example/config_predict.yml
new file mode 100644
index 000000000..75831d2e8
--- /dev/null
+++ b/examples/classification_example/config_predict.yml
@@ -0,0 +1,4 @@
+test_path: "classification_data/test/"
+model_path: "best_classification_model.ckpt"
+train_state_path: "train_state.pkl"
+batch_size: 2
\ No newline at end of file
diff --git a/examples/classification_example/main_predict.py b/examples/classification_example/main_predict.py
new file mode 100644
index 000000000..42e23440f
--- /dev/null
+++ b/examples/classification_example/main_predict.py
@@ -0,0 +1,86 @@
+# Copyright 2020 The Forte Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""This file predict the sentiment label for IMDB dataset."""
+
+import yaml
+import torch
+from forte.pipeline import Pipeline
+from forte.predictor import Predictor
+from ft.onto.base_ontology import Sentence
+from forte.data.readers.imdb_reader import IMDBReader
+from examples.classification_example.util import pad_each_bach
+
+
+def predict_forward_fn(model, batch):
+    """Use the model and the batch data to predict labels."""
+ word = batch["text_tag"]["data"]
+ logits, pred = None, None
+ if config_model["model"] == "cnn":
+ logits, pred = model(batch)
+
+ if config_model["model"] == "bert":
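+        # BERT takes the token ids padded to a fixed length of 500 plus the true lengths from the mask.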
+ mask = batch["text_tag"]["masks"][0]
+ logits, pred = model(pad_each_bach(word, 500),
+ torch.sum(mask, dim=1))
+ pred = pred.numpy()
+ return {"label_tag": pred}
+
+
+config_model = yaml.safe_load(open("config_model.yml", "r"))
+config_predict = yaml.safe_load(open("config_predict.yml", "r"))
+saved_model = torch.load(config_predict['model_path'])
+train_state = torch.load(config_predict['train_state_path'])
+
+reader = IMDBReader()
+predictor = Predictor(batch_size=config_predict['batch_size'],
+ model=saved_model,
+ predict_forward_fn=predict_forward_fn,
+ feature_resource=train_state['feature_resource'])
+
+pl = Pipeline()
+pl.set_reader(reader)
+pl.add(predictor)
+pl.initialize()
+
+predict_sentiment_list = []
+for pack in pl.process_dataset(config_predict['test_path']):
+ print("---- pack ----")
+ for instance in pack.get(Sentence):
+ sentence = instance.text
+ predicts = []
+ for entry in pack.get(Sentence, instance):
+ predicts.append(entry.speaker)
+ predict_sentiment_list.append(entry.speaker)
+ print('---- example -----')
+ print("sentence: ", sentence)
+ print("predict sentiment: ", predicts)
+
+# evaluate on the test set
+gold_sentiment_list = []
+with open(config_predict['test_path'] + "sample.imdb", "r", encoding="utf8") as f:
+ for line in f:
+ line = line.strip()
+ if line != "":
+ line_list = line.split("\",")
+ gold_sentiment = line_list[1]
+ gold_sentiment_list.append(gold_sentiment)
+
+print("gold_sentiment_list: ", gold_sentiment_list)
+print("predict_sentiment_list: ", predict_sentiment_list)
+right_predict = 0
+for i in range(len(gold_sentiment_list)):
+ if gold_sentiment_list[i] == predict_sentiment_list[i]:
+ right_predict += 1
+
+print("Testing Accuracy: ", right_predict / len(predict_sentiment_list))
diff --git a/examples/classification_example/main_train.py b/examples/classification_example/main_train.py
new file mode 100644
index 000000000..3adfa36a6
--- /dev/null
+++ b/examples/classification_example/main_train.py
@@ -0,0 +1,271 @@
+# Copyright 2020 The Forte Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import logging
+import numpy as np
+import torch
+import yaml
+
+from torch import nn
+from torch.optim import SGD
+from torch.optim.optimizer import Optimizer
+from texar.torch.data import Batch
+from tqdm import tqdm
+from typing import Iterator, Dict, Any
+from forte.common.configuration import Config
+from forte.data.extractor.attribute_extractor \
+ import AttributeExtractor
+from forte.data.extractor.base_extractor \
+ import BaseExtractor
+from forte.train_preprocessor import TrainPreprocessor
+from forte.data.readers.imdb_reader import IMDBReader
+from forte.pipeline import Pipeline
+from ft.onto.base_ontology import Sentence, Token, Entry
+from texar.torch.modules.embedders import WordEmbedder
+from examples.classification_example.cnn import CNN_Classifier
+from forte.processors.base.data_augment_processor import ReplacementDataAugmentProcessor
+from texar.torch.modules.classifiers.bert_classifier import BERTClassifier
+from examples.classification_example.util import pad_each_bach
+
+logger = logging.getLogger(__name__)
+
+logging.basicConfig(level=logging.INFO)
+
+device = torch.device("cuda") if torch.cuda.is_available() \
+ else torch.device("cpu")
+
+
+def construct_word_embedding_table(embed_dict, extractor: BaseExtractor):
+ embedding_dim = list(embed_dict.values())[0].shape[-1]
+
+ scale = np.sqrt(3.0 / embedding_dim)
+ table = np.empty(
+ [extractor.size(), embedding_dim], dtype=np.float32
+ )
+ oov = 0
+ for word, index in extractor.items():
+ if word in embed_dict:
+ embedding = embed_dict[word]
+ elif word.lower() in embed_dict:
+ embedding = embed_dict[word.lower()]
+ else:
+ embedding = np.random.uniform(
+ -scale, scale, [1, embedding_dim]
+ ).astype(np.float32)
+ oov += 1
+ table[index, :] = embedding
+ return torch.from_numpy(table)
+
+
+def create_model(text_extractor: AttributeExtractor,
+ config: Config, in_channels: int):
+ embedding_dict = {}
+ for word, index in text_extractor.items():
+ embedding_dict[word] = torch.tensor([0.0 for i in range(100)])
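+    # NOTE: all embeddings start as zero vectors here; substitute pretrained vectors if available.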
+
+ word_embedding_table = \
+ construct_word_embedding_table(embedding_dict, text_extractor)
+
+ model: nn.Module = \
+ CNN_Classifier(in_channels=in_channels,
+ word_embedding_table=word_embedding_table)
+
+ if config.config_model.model == "bert":
+ model: nn.Module = BERTClassifier()
+
+ return model, word_embedding_table
+
+
+def train(model: nn.Module, optim: Optimizer, batch: Batch, max_sen_length: int):
+ word = batch["text_tag"]["data"]
+ labels = batch["label_tag"]["data"]
+ optim.zero_grad()
+
+ logits, pred = None, None
+
+ if config.config_model.model == "cnn":
+ logits, pred = model(batch)
+
+ if config.config_model.model == "bert":
+ mask = batch["text_tag"]["masks"][0]
+ logits, pred = model(pad_each_bach(word, max_sen_length),
+ torch.sum(mask, dim=1))
+
+ labels_1D = torch.squeeze(labels)
+ true_one_batch = (labels_1D == pred).sum().item()
+ loss = criterion(logits, labels_1D)
+
+ loss.backward()
+ optim.step()
+
+ batch_train_err = loss.item() * batch.batch_size
+
+ return batch_train_err, true_one_batch
+
+
+# All the configs
+config_data = yaml.safe_load(open("config_data.yml", "r"))
+config_model = yaml.safe_load(open("config_model.yml", "r"))
+
+config = Config({}, default_hparams=None)
+config.add_hparam('config_data', config_data)
+config.add_hparam('config_model', config_model)
+
+# Generate request
+text_extractor: AttributeExtractor = \
+ AttributeExtractor(config={"entry_type": Token,
+ "vocab_method": "indexing",
+ "attribute": "text"})
+
+
+class SentimentExtractor(AttributeExtractor):
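+    """Map Sentence.sentiment scores to a "positive"/"negative" label and back."""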
+ def get_attribute(self, entry: Entry, attr: str):
+ if entry.sentiment["positive"] == 1.0:
+ return "positive"
+ else:
+ return "negative"
+
+ def set_attribute(self, entry: Entry, attr: str, value: Any):
+ if value == "positive":
+ entry.sentiment = {
+ "positive": 1.0,
+ "negative": 0.0,
+ }
+ else:
+ entry.sentiment = {
+ "positive": 0.0,
+ "negative": 1.0,
+ }
+
+
+label_extractor: AttributeExtractor = \
+ SentimentExtractor(config={"entry_type": Sentence,
+ "vocab_method": "indexing",
+ "need_pad": False,
+ "vocab_use_unk": False,
+ "attribute": "sentiment"})
+tp_request: Dict = {
+ "scope": Sentence,
+ "schemes": {
+ "text_tag": {
+ "type": TrainPreprocessor.DATA_INPUT,
+ "extractor": text_extractor
+ },
+ "label_tag": {
+ "type": TrainPreprocessor.DATA_OUTPUT,
+ "extractor": label_extractor
+ }
+ }
+}
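+# tp_request above wires "text_tag" (token-id inputs from text_extractor) and
+# "label_tag" (sentiment targets from label_extractor) into the train preprocessor.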
+
+
+# Default settings can be found here:
+# https://texar-pytorch.readthedocs.io/en/latest/code/data.html#texar.torch.data.DatasetBase.default_hparams
+tp_config = {
+ "preprocess": {
+ "device": device.type
+ },
+ "dataset": {
+ "batch_size": config.config_data.batch_size_tokens
+ }
+}
+
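+# Configuration for the data augmentation processor (used only when config_data.data_aug is True).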
+processor_config = {
+ 'augment_entry': "ft.onto.base_ontology.Token",
+ 'other_entry_policy': {
+ "kwargs": {
+ "ft.onto.base_ontology.Sentence": "auto_align"
+ }
+ },
+ 'type': 'data_augmentation_op',
+ 'data_aug_op': 'tests.forte.processors.base.data_augment_replacement_processor_test.TmpReplacer',
+ "data_aug_op_config": {
+ 'kwargs': {}
+ },
+ 'augment_pack_names': {
+ 'kwargs': {}
+ }
+}
+
+imdb_train_reader = IMDBReader()
+
+pl = Pipeline()
+pl.set_reader(imdb_train_reader)
+if config.config_data.data_aug:
+ pl.add(ReplacementDataAugmentProcessor(), processor_config)
+pl.initialize()
+
+datapack_generator = pl.process_dataset(config.config_data.train_path)
+
+
+train_preprocessor = TrainPreprocessor(pack_generator=datapack_generator,
+ request=tp_request,
+ config=tp_config)
+
+
+max_sen_length = 0
+train_batch_iter: Iterator[Batch] = \
+ train_preprocessor.get_train_batch_iterator()
+
+# Find the max sentence length in the whole dataset
+for batch in tqdm(train_batch_iter):
+ max_sen_length = max(batch["text_tag"]["data"].size()[1],
+ max_sen_length)
+
+model, word_embedding_table = \
+ create_model(text_extractor=text_extractor,
+ config=config, in_channels=max_sen_length)
+
+word_embedder = WordEmbedder(init_value=word_embedding_table)
+
+model.to(device)
+
+criterion = nn.CrossEntropyLoss()
+
+optim: Optimizer = SGD(model.parameters(),
+ lr=config.config_model.learning_rate,
+ momentum=config.config_model.momentum,
+ nesterov=True)
+
+epoch = 0
+train_err: float = 0.0
+train_total: float = 0.0
+train_sentence_len_sum: int = 0
+
+logger.info("Start training.")
+
+while epoch < config.config_data.num_epochs:
+ epoch += 1
+
+ train_batch_iter: Iterator[Batch] = \
+ train_preprocessor.get_train_batch_iterator()
+
+ true_total = 0
+ train_total_One_Epoch = 0
+
+ for batch in tqdm(train_batch_iter):
+ batch_train_err, true_one_batch = train(model, optim, batch, max_sen_length)
+
+ train_err += batch_train_err
+ train_total += batch.batch_size
+ true_total += true_one_batch
+ train_total_One_Epoch += batch.batch_size
+
+ logger.info(f"{epoch}th Epoch training, "
+ f"total number of examples: {train_total}, "
+ f"Train Accuracy: {(true_total / train_total_One_Epoch):0.3f}, "
+ f"loss: {(train_err / train_total):0.3f}")
+
+# Save training results to disk so that main_predict.py can load them
+train_preprocessor.save_state(config.config_data.train_state_path)
+torch.save(model, config.config_model.model_path)
diff --git a/examples/classification_example/util.py b/examples/classification_example/util.py
new file mode 100644
index 000000000..fad967f26
--- /dev/null
+++ b/examples/classification_example/util.py
@@ -0,0 +1,29 @@
+# Copyright 2020 The Forte Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""A utility function for padding over dataser"""
+
+import torch
+
+
+def pad_each_bach(word, max_sen_len):
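+    """Pad a [batch_size, seq_len] tensor of token ids with zeros up to max_sen_len."""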
+ batch_size = word.shape[0]
+ curr_len = word.shape[1]
+ word_list = word.tolist()
+
+    # Index 0 in the word embedding table is the padding vector
+ for i in range(batch_size):
+ for j in range(max_sen_len - curr_len):
+ word_list[i].append(0)
+
+ return torch.LongTensor(word_list)
diff --git a/forte/data/readers/imdb_reader.py b/forte/data/readers/imdb_reader.py
new file mode 100644
index 000000000..769e2244f
--- /dev/null
+++ b/forte/data/readers/imdb_reader.py
@@ -0,0 +1,113 @@
+# Copyright 2019 The Forte Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+The reader that reads IMDB data into data packs.
+Dataset overview and format:
+https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
+"""
+import logging
+import os
+from typing import Iterator
+from forte.data.data_utils_io import dataset_path_iterator
+from forte.data.data_pack import DataPack
+from forte.data.readers.base_reader import PackReader
+from ft.onto.base_ontology import Sentence, Token, Document
+
+
+__all__ = [
+ "IMDBReader"
+]
+
+
+class IMDBReader(PackReader):
+ r""":class:`IMDBReader` is designed to read
+ in the imdb review dataset used
+ by sentiment classification task.
+ The Original data format:
+ "movie comment, positive"
+ "movie comment, negative"
+ """
+
+ def _collect(self, *args, **kwargs) -> Iterator[str]:
+ r"""Iterator over text files in the data_source
+
+ Args:
+ args: args[0] is the directory to the .imdb files.
+ kwargs:
+
+ Returns: Iterator over files in the path with imdb extensions.
+ """
+
+ imdb_directory: str = args[0]
+
+ imdb_file_extension = "imdb"
+
+ logging.info("Reading dataset from %s with extension %s",
+ imdb_directory, imdb_file_extension)
+ return dataset_path_iterator(imdb_directory, imdb_file_extension)
+
+ def _cache_key_function(self, imdb_file: str) -> str:
+ return os.path.basename(imdb_file)
+
+ def _parse_pack(self, file_path: str) -> Iterator[DataPack]:
+ pack: DataPack = DataPack()
+ text: str = ""
+ offset: int = 0
+
+ with open(file_path, "r", encoding="utf8") as f:
+ for line in f:
+ line = line.strip()
+ if line != "":
+ line_list = line.split("\",")
+ sentence = line_list[0].strip("\"")
+ sentiment = line_list[1]
+
+ # Add sentence.
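+                # The +1 skips the opening quote, so the span covers only the review text.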
+ senobj = Sentence(pack, offset + 1,
+ offset + len(sentence) + 1)
+ if sentiment == "positive":
+ senobj.sentiment["positive"] = 1.0
+ senobj.sentiment["negative"] = 0.0
+ else:
+ senobj.sentiment["positive"] = 0.0
+ senobj.sentiment["negative"] = 1.0
+
+ # Add token
+ wordoffset = offset + 1
+ words = sentence.split(" ")
+ for word in words:
+ lastch = word[len(word) - 1]
+ new_word = word
+ if lastch in (',', '.'):
+ new_word = word[:len(word) - 1]
+ Token(pack, wordoffset, wordoffset + len(new_word))
+ wordoffset += len(word)
+ # For space between words
+ wordoffset += 1
+
+ # For \n
+ offset += len(line) + 1
+ text += line + " "
+
+ pack.set_text(text, replace_func=self.text_replace_operation)
+
+ Document(pack, 0, len(text))
+
+ pack.pack_name = file_path
+
+ yield pack
diff --git a/tests/forte/data/readers/imdb_reader_test.py b/tests/forte/data/readers/imdb_reader_test.py
new file mode 100644
index 000000000..07f7faa6f
--- /dev/null
+++ b/tests/forte/data/readers/imdb_reader_test.py
@@ -0,0 +1,108 @@
+# Copyright 2019 The Forte Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+Unit tests for IMDBReader.
+"""
+import os
+import unittest
+from typing import Iterator, Iterable, List
+from forte.data.readers.imdb_reader import IMDBReader
+from forte.data.data_pack import DataPack
+from forte.pipeline import Pipeline
+from ft.onto.base_ontology import Sentence, Token, Document
+
+
+class IMDBReaderTest(unittest.TestCase):
+
+ def setUp(self):
+ # Define and config the pipeline.
+ self.dataset_path: str = os.path.abspath(os.path.join(
+ os.path.dirname(os.path.realpath(__file__)),
+ *([os.path.pardir] * 4),
+ 'data_samples/imdb'))
+
+ self.pipeline: Pipeline = Pipeline[DataPack]()
+ self.reader: IMDBReader = IMDBReader()
+ self.pipeline.set_reader(self.reader)
+ self.pipeline.initialize()
+
+ def test_process_next(self):
+ data_packs: Iterable[DataPack] = \
+ self.pipeline.process_dataset(self.dataset_path)
+ file_paths: Iterator[str] = \
+ self.reader._collect(self.dataset_path)
+
+ count_packs: int = 0
+
+        # Each .imdb file corresponds to one DataPack.
+ for pack, file_path in zip(data_packs, file_paths):
+
+ count_packs += 1
+ expected_doc: str = ""
+
+ # Read all lines in .imdb file
+ with open(file_path, "r", encoding="utf8", errors='ignore') as file:
+ expected_doc = file.read()
+
+ # Test document.
+ actual_docs: List[Document] = list(pack.get(Document))
+ self.assertEqual(len(actual_docs), 1)
+
+ lines: List[str] = expected_doc.split('\n')
+ comment_lines = []
+ sentiment_labels = []
+ wordlist = []
+ for line in lines:
+ # For empty or invalid line
+ if len(line) < 5:
+ continue
+ comment = line.split("\",")[0].strip("\"")
+ sentiment_label = line.split("\",")[1]
+ comment_lines.append(comment)
+ sentiment_labels.append(sentiment_label)
+
+ tempwordlist = comment.split(" ")
+ for w in tempwordlist:
+ wordlist.append(w)
+
+ actual_sentences: Iterator[Sentence] = pack.get(Sentence)
+ actual_word: Iterator[Token] = pack.get(Token)
+        # Check each review's text and sentiment label against the raw file.
+ for line, label, actual_sentence in \
+ zip(comment_lines, sentiment_labels, actual_sentences):
+ line = line.strip()
+ comment = actual_sentence.text
+ # Test comment.
+ if actual_sentence.sentiment["positive"] == 1.0:
+ read_label = "positive"
+ else:
+ read_label = "negative"
+
+ self.assertEqual(comment, line)
+ self.assertEqual(read_label, label)
+
+ for word_read, word_in_pack in zip(wordlist, actual_word):
+ new_word_read = word_read
+ lastch = word_read[len(word_read) - 1]
+ if lastch == "," or lastch == '.':
+ new_word_read = word_read[:len(word_read) - 1]
+ self.assertEqual(new_word_read, word_in_pack.text)
+
+ self.assertEqual(count_packs, 1)
+
+
+if __name__ == '__main__':
+ unittest.main()