The framework appears relatively slow compared with Dean's own framework, even when running his smaller transformer model. Consider investigating this and, if the slowdown is confirmed, speeding the framework up.
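
One way to confirm the slowdown before optimizing anything is to time the same step in both frameworks under identical conditions. The following is a minimal sketch, assuming each framework's forward pass or training step can be wrapped in a zero-argument callable. The names `benchmark`, `slowdown`, `ours`, and `theirs` are hypothetical and not part of either framework.

```python
import statistics
import time


def benchmark(step, warmup=3, repeats=10):
    """Return the median wall-clock seconds taken by one call to `step()`.

    `step` is a hypothetical zero-argument callable that wraps one
    forward pass or training step of the framework under test.
    """
    # Warm-up calls are not timed, so one-time costs (compilation,
    # caching, memory allocation) do not skew the comparison.
    for _ in range(warmup):
        step()

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        step()
        timings.append(time.perf_counter() - start)

    # The median is less sensitive to occasional outliers than the mean.
    return statistics.median(timings)


def slowdown(ours, theirs, **kwargs):
    """Return the timing ratio; a value above 1 means `ours` is slower."""
    return benchmark(ours, **kwargs) / benchmark(theirs, **kwargs)
```

Both callables should use the same model, batch size, and hardware. If either framework dispatches work asynchronously (for example, on a GPU), the callable needs to block until the work has finished; otherwise the timing only measures dispatch.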