"``sentiment`` :- This is modeled as a single-sentence classification task to determine whether a piece of text conveys a positive or negative sentiment.\n",
"\n",
"**Conversational Utility** :- To determine whether a review is positive or negative.\n",
"\n",
"**Data** :- In this example, we use the <a href=\"https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews/data\">IMDB</a> data, which can be downloaded after accepting the terms and saved under the `imdb_data` directory. The data has a total of 50k samples labeled as positive or negative.\n"
"The data file `imdb_dataset` has 50k samples with two columns: review and sentiment. Sentiment is the label, which can be positive or negative.\n",
"We already provide a sample transformation function ``imdb_sentiment_data_to_tsv`` to convert this data to the required tsv format.\n",
"Running data transformations will save the required train and test tsv data files under the ``data`` directory in the root of the library. For more details on the data transformation process, refer to <a href=\"https://multi-task-nlp.readthedocs.io/en/latest/data_transformations.html\">data transformations</a> in the documentation.\n",
"\n",
"The transformation file should have the following details, which are already provided in ``transform_file_imdb.yml``.\n",
"\n",
"```\n",
"transform1:\n",
"  transform_func: imdb_sentiment_data_to_tsv\n",
"  read_file_names:\n",
"    - imdb_sentiment_data.csv\n",
"  read_dir: imdb_data\n",
"  save_dir: ../../data\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!python ../../data_transformations.py \\\n",
"    --transform_file 'transform_file_imdb.yml'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Step - 2 Data Preparation\n",
"\n",
"For more details on the data preparation process, refer to <a href=\"https://multi-task-nlp.readthedocs.io/en/latest/training.html#running-data-preparation\">data preparation</a> in the documentation.\n",
"\n",
"Define the tasks file for training a single model for the sentiment task. The file is already created at ``tasks_file_imdb.yml``.\n",
"\n",
"```\n",
"sentiment:\n",
"  model_type: BERT\n",
"  config_name: bert-base-uncased\n",
"  dropout_prob: 0.2\n",
"  label_map_or_file:\n",
"    - negative\n",
"    - positive\n",
"  class_num: 2\n",
"  metrics:\n",
"    - classification_accuracy\n",
"  loss_type: CrossEntropyLoss\n",
"  task_type: SingleSenClassification\n",
"  file_names:\n",
"    - imdb_sentiment_train.tsv\n",
"    - imdb_sentiment_test.tsv\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!python ../../data_preparation.py \\\n",
"    --task_file 'tasks_file_imdb.yml' \\\n",
"    --data_dir '../../data' \\\n",
"    --max_seq_len 200"
]
},
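The tasks file from Step 2 ties together the label list, ``class_num``, and the tsv file names, so a quick consistency check before running data preparation can catch simple mismatches. The sketch below is illustrative only: ``check_task`` is a hypothetical helper that is not part of the library, and the task entry is written as a plain Python dict mirroring ``tasks_file_imdb.yml`` rather than parsed from the file.

```python
def check_task(name, cfg):
    """Return a list of human-readable problems found in one task entry.

    Hypothetical helper for illustration; the library may perform its own
    (different) validation when it reads the tasks file.
    """
    problems = []
    labels = cfg.get("label_map_or_file")
    # label_map_or_file may also be a file path; only inline lists are checked.
    if isinstance(labels, list):
        if len(labels) != cfg.get("class_num"):
            problems.append(
                f"{name}: class_num={cfg.get('class_num')} "
                f"but {len(labels)} labels are listed"
            )
        if len(set(labels)) != len(labels):
            problems.append(f"{name}: duplicate labels in label map")
    if not cfg.get("file_names"):
        problems.append(f"{name}: no file_names given")
    return problems


# Dict mirroring the sentiment entry of tasks_file_imdb.yml shown in Step 2.
tasks = {
    "sentiment": {
        "model_type": "BERT",
        "config_name": "bert-base-uncased",
        "dropout_prob": 0.2,
        "label_map_or_file": ["negative", "positive"],
        "class_num": 2,
        "metrics": ["classification_accuracy"],
        "loss_type": "CrossEntropyLoss",
        "task_type": "SingleSenClassification",
        "file_names": ["imdb_sentiment_train.tsv", "imdb_sentiment_test.tsv"],
    }
}

for task_name, task_cfg in tasks.items():
    for problem in check_task(task_name, task_cfg):
        print(problem)
```

For the entry above the loop prints nothing, since two labels are listed and ``class_num`` is 2.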
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Step - 3 Running train\n",
"\n",
"The following command starts training for the tasks. The log file reporting the loss and metrics, along with the TensorBoard logs, will be saved in a time-stamped directory.\n",
"\n",
"For more details about the training process, refer to <a href=\"https://multi-task-nlp.readthedocs.io/en/latest/training.html#running-train\">running training</a> in the documentation."
"You can import and use the ``inferPipeline`` to get predictions for the required tasks.\n",
"The trained model and the maximum sequence length to be used need to be specified.\n",
"\n",
"For more details about inference, refer to <a href=\"https://multi-task-nlp.readthedocs.io/en/latest/infering.html\">infer pipeline</a> in the documentation."
-This function transforms the IMDb movie review data available at `IMDb <http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz>`_
+This function transforms the IMDb movie review data available at `IMDb <https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews/data>`_ after accepting the terms.

-For sentiment analysis task, positive sentiment has label -> 1 and negative -> 0.
-First 25k samples are positive and next 25k samples are negative as combined by the script
-``combine_imdb_data.sh``. Following transformed files are written at wrtDir
+The data has a total of 50k samples labeled as `positive` or `negative`. The reviews contain some HTML tags, which are cleaned
+by this function. The following transformed files are written at wrtDir
+

 - IMDb train transformed tsv file for sentiment analysis task
-- IMDb dev transformed tsv file for sentiment analysis task
 - IMDb test transformed tsv file for sentiment analysis task

 To use this transform function, set ``transform_func`` : **imdb_sentiment_data_to_tsv** in the transform file.
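As a rough illustration of what the docstring describes (strip the HTML tags, keep the `positive`/`negative` label, write train and test tsv files), here is a minimal standalone sketch. It is not the library's ``imdb_sentiment_data_to_tsv``. The function names, the uid/label/text column layout, the unshuffled split, and the ``test_fraction`` parameter are all assumptions made for the example; only the ``review`` and ``sentiment`` column names and the output file names come from the notebook.

```python
import csv
import html
import re
from pathlib import Path

# Matches any HTML tag, e.g. the "<br />" breaks found in the reviews.
TAG_RE = re.compile(r"<[^>]+>")


def clean_review(text):
    """Replace HTML tags with spaces, unescape entities, collapse whitespace."""
    no_tags = TAG_RE.sub(" ", text)
    return " ".join(html.unescape(no_tags).split())


def csv_to_tsv(csv_path, out_dir, test_fraction=0.2):
    """Write train/test tsv files from a CSV with 'review' and 'sentiment' columns.

    Assumed output layout per row: uid <TAB> label <TAB> cleaned text.
    Returns the number of train and test rows written.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = [
            (row["sentiment"].strip().lower(), clean_review(row["review"]))
            for row in csv.DictReader(f)
        ]

    # Simple unshuffled split; a real pipeline would likely shuffle first.
    n_test = int(len(rows) * test_fraction)
    split = len(rows) - n_test

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, part in (("train", rows[:split]), ("test", rows[split:])):
        out_path = out_dir / f"imdb_sentiment_{name}.tsv"
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f, delimiter="\t", lineterminator="\n")
            for uid, (label, text) in enumerate(part):
                writer.writerow([uid, label, text])
    return split, n_test
```

Under these assumptions, calling ``csv_to_tsv("imdb_data/imdb_sentiment_data.csv", "../../data")`` would produce the two tsv files named in the tasks file.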