extractor.py should ignore already-extracted PDFs


As of now, the extractor runs on [all qualified PDFs](https://github.com/opencleveland/drocer-webapp/blob/9c71aabe74a0c157205b7d4383a8205431d00829/tools/extractor.py#L113), even ones that have already been extracted.

As we incrementally add newly released files, re-extracting the PDFs that were already processed is a waste of time.
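
One possible approach, as a rough sketch only: filter the PDF list down to files that have no matching output yet. This assumes each PDF produces one output file with the same base name in a separate output directory; the function name `pending_pdfs`, the `out_dir` parameter, and the `.json` suffix are all hypothetical and would need to be adapted to whatever `extractor.py` actually writes.

```python
from pathlib import Path


def pending_pdfs(pdf_dir, out_dir, out_suffix=".json"):
    """Return PDFs in pdf_dir that have no matching output file in out_dir.

    Hypothetical helper: assumes one output file per PDF, sharing the
    PDF's base name (e.g. "foo.pdf" -> "foo.json").
    """
    pdf_dir, out_dir = Path(pdf_dir), Path(out_dir)
    done = set()
    if out_dir.is_dir():
        # Base names of outputs that already exist.
        done = {p.stem for p in out_dir.glob(f"*{out_suffix}")}
    # Keep only PDFs whose base name has no output yet.
    return sorted(p for p in pdf_dir.glob("*.pdf") if p.stem not in done)
```

The extractor loop could then iterate over `pending_pdfs(...)` instead of every qualified PDF. A check based only on file names will not notice a PDF that was replaced upstream or an output that was only partially written, so a flag to force re-extraction may still be worth having.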