When we extract a PDF to JSON, we save the JSON file in the backend and then use a library to render it as HTML. Currently we save the full HTML text into documents.json as the document's content.
I think we should store only the file path in documents.json, and have the frontend load that JSON file and render it every time the document is displayed.
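A minimal sketch of the proposed layout, assuming a Python backend. The `content_path` field, the one-file-per-document naming, and the `blocks`/`text` shape of the extracted JSON are all hypothetical. `render_json_to_html` is a made-up stand-in for whatever rendering library we actually use. In the proposal the render step runs in the frontend; `load_and_render` here only mimics that so the example runs on its own.

```python
import html
import json
import tempfile
from pathlib import Path


def render_json_to_html(doc: dict) -> str:
    # Hypothetical stand-in for the real JSON-to-HTML library.
    # Assumes the extracted JSON has a "blocks" list of {"text": ...} items.
    return "\n".join(f"<p>{html.escape(b['text'])}</p>" for b in doc.get("blocks", []))


def save_document(doc_id: str, extracted: dict, data_dir: Path) -> None:
    # Write the extracted PDF JSON to its own file...
    content_file = data_dir / f"{doc_id}.json"
    content_file.write_text(json.dumps(extracted), encoding="utf-8")

    # ...and store only its path in documents.json, not the rendered HTML.
    index_file = data_dir / "documents.json"
    index = json.loads(index_file.read_text(encoding="utf-8")) if index_file.exists() else {}
    index[doc_id] = {"content_path": content_file.name}  # path relative to data_dir
    index_file.write_text(json.dumps(index, indent=2), encoding="utf-8")


def load_and_render(doc_id: str, data_dir: Path) -> str:
    # Mimics the frontend on each view: look up the path, load the JSON, render it.
    index = json.loads((data_dir / "documents.json").read_text(encoding="utf-8"))
    content_file = data_dir / index[doc_id]["content_path"]
    extracted = json.loads(content_file.read_text(encoding="utf-8"))
    return render_json_to_html(extracted)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        save_document("report-1", {"blocks": [{"text": "Hello"}, {"text": "a < b"}]}, data_dir)
        print(load_and_render("report-1", data_dir))
        # <p>Hello</p>
        # <p>a &lt; b</p>
```

With this layout documents.json stays small (one path per document), and the HTML is always produced from the current JSON, so a change to the renderer shows up without rewriting stored content.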