https://github.com/Team-Taz-FTRI-54/AI-ML-Projec
https://www.npmjs.com/package/pdf-parse
1/ Parse the PDF and extract all of its text
2/ Store the text (locally or in a database? Depends on the size)
- Where to store the files?
- Should we count the number of words or the number of tokens, and decide where to store the text based on the token count?
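
To make the word/token question concrete, here is a minimal sketch that measures the extracted text and picks a storage target from the result. Everything in it is an assumption for illustration: the helper names (`countWords`, `estimateTokens`, `textStats`, `chooseStorage`), the rule of thumb of roughly 4 characters per token for English text, and the 50,000-token cutoff. Exact token counts would need the tokenizer of whichever embedding model ends up being used.

```typescript
// Rough size metrics for extracted PDF text.
// Assumption: ~4 characters per token, a common rule of thumb for English;
// the real count depends on the tokenizer of the model that is chosen.
const CHARS_PER_TOKEN = 4;

// Hypothetical cutoff: texts above this many tokens go to a database.
const LOCAL_TOKEN_LIMIT = 50000;

interface TextStats {
  words: number;
  estimatedTokens: number;
}

// Count whitespace-separated words; an empty or blank string has zero words.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Estimate tokens from character length (heuristic, not an exact count).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function textStats(text: string): TextStats {
  return { words: countWords(text), estimatedTokens: estimateTokens(text) };
}

// Decide where to store the text based on the estimated token count.
function chooseStorage(
  stats: TextStats,
  limit: number = LOCAL_TOKEN_LIMIT
): "local" | "database" {
  return stats.estimatedTokens <= limit ? "local" : "database";
}
```

Deciding on tokens rather than words keeps the threshold in the same unit that embedding models use for their input limits, which is why the sketch only uses the word count as extra information.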
3/ Figure out how much context to embed
- Is it a summary of each chapter?
- What's the best way to embed the entire text content of a PDF? The PDF could be anywhere from 5 to 300 pages.
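
One common option for documents of widely varying length is to skip whole-document embedding and instead split the text into fixed-size, overlapping chunks that are embedded one by one. The sketch below shows that idea only; it is not a decision for this project. The function name `chunkText` and the default sizes (200 words per chunk, 20 words of overlap) are assumptions, and splitting per chapter, as considered above, would need heading detection that is not shown here.

```typescript
interface Chunk {
  index: number; // position of the chunk in the document
  text: string;  // words of the chunk joined by single spaces
}

// Split text into windows of at most `maxWords` words, where each window
// repeats the last `overlapWords` words of the previous one.
// Default sizes are illustrative assumptions, not tuned values.
function chunkText(text: string, maxWords = 200, overlapWords = 20): Chunk[] {
  if (maxWords <= 0 || overlapWords < 0 || overlapWords >= maxWords) {
    throw new Error("need maxWords > 0 and 0 <= overlapWords < maxWords");
  }
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  const chunks: Chunk[] = [];
  const step = maxWords - overlapWords; // how far each window advances

  for (let start = 0; start < words.length; start += step) {
    const slice = words.slice(start, start + maxWords);
    chunks.push({ index: chunks.length, text: slice.join(" ") });
    // Stop once a window has reached the end, to avoid a trailing
    // chunk that contains only overlap.
    if (start + maxWords >= words.length) break;
  }
  return chunks;
}
```

With this approach a 5-page PDF and a 300-page PDF go through the same code path; only the number of chunks changes. The overlap is there so that a sentence cut at a chunk boundary still appears intact in at least one chunk.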