Text Recognition in Documents or Invoice Receipts?

Can we feed a dataset of document images to this architecture instead of a small word dataset?

What is the maximum sequence length that can be used?

Can you please suggest a good text detection architecture other than EAST?