A script that pulls every line containing any of a list of specified strings out of a set of PDFs. The initial use case was quickly extracting the rows of tables that contain a match. See `example/`.
Note: These instructions are written for macOS (>= Sierra) and assume basic command-line familiarity.
Assuming you have Homebrew installed, install pdfgrep:

    brew install pdfgrep

This may take a minute or two.
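
To confirm the install worked before continuing, you can check that `pdfgrep` is on your `PATH`. This is only a small sketch; the helper name `have_cmd` is made up for illustration.

```shell
# Hypothetical helper: succeeds if the named command can be found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd pdfgrep; then
  echo "pdfgrep found: $(command -v pdfgrep)"
else
  echo "pdfgrep not found; try 'brew install pdfgrep' again"
fi
```
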
- Download `search-pdfs.sh` and put it in the same directory as the PDFs you want to search. `cd` into that directory and run `chmod u+x search-pdfs.sh` to make it executable.
- Create a text file in that directory called `search_strings.txt`. It should contain all the strings you want to search for, one string per line. See the example `search_strings.txt`.
- Run `./search-pdfs.sh`.
- Once it completes, there should be a new file called `results.txt` in the directory containing all the matched lines. See the example `results.txt`.
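
For readers curious how this kind of search can work, the steps above amount to a loop over the search strings. The following is a minimal sketch of that idea, not the contents of the actual script, which may work differently. The function name `search_pdfs` and its arguments are hypothetical, and it assumes a pdfgrep version that supports `-F` (fixed strings) and `-H` (print file names).

```shell
# Hypothetical sketch; the real script may be implemented differently.
# Usage: search_pdfs STRINGS_FILE OUTPUT_FILE PDF...
# The body runs in a subshell ( ... ) so its variables stay local.
search_pdfs() (
  strings_file=$1
  output_file=$2
  shift 2

  : > "$output_file"    # start with an empty results file

  # Read one search string per line. The "|| [ -n ... ]" part also
  # handles a last line that has no trailing newline.
  while IFS= read -r pattern || [ -n "$pattern" ]; do
    [ -z "$pattern" ] && continue    # skip blank lines
    # -F: treat the string literally, -H: prefix matches with the file name.
    # pdfgrep exits non-zero when nothing matches, so ignore its status.
    pdfgrep -H -F -- "$pattern" "$@" < /dev/null >> "$output_file" || true
  done < "$strings_file"
)
```

With this sketch, `search_pdfs search_strings.txt results.txt ./*.pdf` would write every matching line, prefixed with the name of the PDF it came from, to `results.txt`. A line that matches several search strings would appear once per matching string.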