Can we feed a dataset of document images, instead of a small word dataset, to this architecture? What is the maximum sequence length that can be used? Also, can you please suggest a good text detection architecture other than EAST?