When scanning a phone screen that contains multiple lines of text, this model produces very strange results: While in the original EAST model, 