Document Classification CNN

Preprocessed ~19K text files and extracted Mail IDs, Subjects, Content, and Labels from each. Used Regex and NLTK extensively for the text preprocessing tasks.
Transformed Texts to Sequences using Keras and analyzed the sequence lengths to choose an appropriate Padding length, so that all sequences have the same length.
Created a Word Embedding Matrix for the unique words in the data using GloVe, to be used later as weights in the Neural Network.
Created a Char Embedding Matrix for the unique chars in the data using GloVe, to be used later as weights in the Neural Network.
Analyzed the Class Distribution and computed Class Weights, to be used later when training the Neural Network.
Built Word Embedded and Char Embedded Convolutional Neural Networks (CNNs) using a combination of Embedding, Conv1D, MaxPooling, Flatten, Dropout, and Dense layers, and implemented the necessary callbacks (EarlyStopping, ModelCheckpoint, and TensorBoard).
Achieved an Accuracy Score of 0.81 with the Word CNN.
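
The padding and class-weight steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the 95% coverage threshold, the post-padding choice, and the "balanced" weighting heuristic (n_samples / (n_classes * class_count)) are all assumptions made for the example.

```python
from collections import Counter
import math


def choose_max_len(lengths, coverage=0.95):
    """Return the smallest length that covers `coverage` of the sequences.

    Hypothetical helper: the coverage threshold is an assumption.
    """
    ordered = sorted(lengths)
    # Index of the sequence sitting at the requested coverage quantile.
    idx = math.ceil(coverage * len(ordered)) - 1
    idx = max(0, min(len(ordered) - 1, idx))
    return ordered[idx]


def pad_post(sequences, max_len, pad_value=0):
    """Truncate or right-pad each list of token ids to exactly max_len."""
    return [
        seq[:max_len] + [pad_value] * (max_len - len(seq))
        for seq in sequences
    ]


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (assumed heuristic)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in counts.items()
    }


if __name__ == "__main__":
    # Toy token-id sequences and labels, invented for illustration only.
    sequences = [[4, 9, 2], [7], [3, 3, 8, 1, 5], [6, 2]]
    labels = ["sci.space", "sci.space", "sci.space", "rec.autos"]

    max_len = choose_max_len([len(s) for s in sequences])
    print(pad_post(sequences, max_len))
    print(balanced_class_weights(labels))
```

A dictionary like the one returned by `balanced_class_weights` can be passed to Keras `model.fit` through its `class_weight` argument once the labels have been mapped to integer class indices.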
