This project extracts relevant information from resumes in PDF format, including skills, experience, education, and more. The extracted data is output as CSV and JSON files.
- Extracts technical and non-technical skills.
- Categorizes internships and jobs into technical and non-technical roles.
- Extracts educational details such as degree, course, CGPA, HSC, and SSC.
- Supports OCR fallback for non-standard PDFs.

Requirements:

- Python 3.x
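The technical/non-technical split in the feature list can be approximated with keyword matching. The following is a minimal sketch, not the project's actual implementation. The keyword sets and the `find_skills` and `categorize_role` functions are hypothetical names chosen for illustration.

```python
from __future__ import annotations

import re

# Hypothetical vocabularies for illustration; the project's real keyword lists may differ.
TECHNICAL_SKILLS = {"python", "java", "sql", "machine learning", "docker", "git"}
NON_TECHNICAL_SKILLS = {"communication", "leadership", "teamwork", "public speaking"}
TECHNICAL_ROLE_WORDS = {"developer", "engineer", "software", "data", "ml", "devops"}


def _mentions(text: str, phrase: str) -> bool:
    """True if phrase occurs in text as whole words (case-insensitive)."""
    # \b boundaries stop "java" from matching inside "javascript".
    return re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE) is not None


def find_skills(text: str) -> dict[str, list[str]]:
    """Split the known skills mentioned in the text into technical and non-technical."""
    return {
        "technical": sorted(s for s in TECHNICAL_SKILLS if _mentions(text, s)),
        "non_technical": sorted(s for s in NON_TECHNICAL_SKILLS if _mentions(text, s)),
    }


def categorize_role(title: str) -> str:
    """Label an internship or job title as 'technical' or 'non-technical'."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    return "technical" if words & TECHNICAL_ROLE_WORDS else "non-technical"


if __name__ == "__main__":
    resume_text = "Skills: Python, SQL, Leadership. Software Engineer Intern at Acme."
    print(find_skills(resume_text))  # {'technical': ['python', 'sql'], 'non_technical': ['leadership']}
    print(categorize_role("Software Engineer Intern"))  # technical
    print(categorize_role("Marketing Intern"))  # non-technical
```

Keyword lists are easy to audit and extend, but they miss synonyms and context, so a fuller parser would likely pair them with detection of resume sections.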
Clone the repository:

    git clone https://github.com/AdiD-code/ResumeParser.git
    cd ResumeParser

Install the dependencies:

The requirements.txt file lists the Python libraries the project needs. Install them with pip:

    pip install -r requirements.txt

Install Poppler:
For Windows: download the binaries from Poppler for Windows and add the path to the poppler/bin directory to your system's PATH environment variable so the program can find the Poppler tools.
For macOS, use Homebrew:

    brew install poppler

For Linux, install it with your package manager (Debian/Ubuntu shown):

    sudo apt-get install poppler-utils

Install Tesseract OCR (for OCR capabilities):
For Windows: download and install it from Tesseract at UB Mannheim, then add the Tesseract installation directory (the folder containing tesseract.exe) to your system's PATH environment variable so the program can find Tesseract.
For macOS, use Homebrew:

    brew install tesseract

For Linux, install it with your package manager (Debian/Ubuntu shown):

    sudo apt-get install tesseract-ocr

Poppler and Tesseract are system programs that pip install -r requirements.txt does not install, so make sure both are set up as described above.
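Before running the parser, it can help to confirm that the executables installed above are reachable on PATH. This is a minimal standard-library sketch. `missing_tools` is a hypothetical helper, and the executable names are assumptions: `pdftoppm` and `pdfinfo` are Poppler command-line tools (libraries such as pdf2image invoke them), and `tesseract` is the OCR binary that wrappers such as pytesseract call.

```python
from __future__ import annotations

import shutil

# Executables the installation steps are expected to put on PATH (assumed names).
REQUIRED_TOOLS = {
    "Poppler": ["pdftoppm", "pdfinfo"],
    "Tesseract OCR": ["tesseract"],
}


def missing_tools(required: dict[str, list[str]] | None = None) -> dict[str, list[str]]:
    """Map each dependency to the executables that shutil.which cannot find on PATH."""
    if required is None:
        required = REQUIRED_TOOLS
    missing: dict[str, list[str]] = {}
    for name, executables in required.items():
        # shutil.which honours PATHEXT on Windows, so "tesseract" also finds tesseract.exe.
        absent = [exe for exe in executables if shutil.which(exe) is None]
        if absent:
            missing[name] = absent
    return missing


if __name__ == "__main__":
    problems = missing_tools()
    if not problems:
        print("Poppler and Tesseract executables were found on PATH.")
    for name, executables in problems.items():
        print(f"{name}: not found on PATH ({', '.join(executables)})")
```

An empty result means every listed executable was found; anything reported usually points to a missing PATH entry from the Windows steps above.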
The extracted data will be saved in both CSV and JSON formats.
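Writing the same results to both formats could look like the sketch below. It assumes a hypothetical record layout: the field names in `CSV_FIELDS`, the output file names, and the `save_results` function are illustrative, not the project's actual schema. JSON keeps list values as arrays, while CSV joins them into a single `; `-separated cell.

```python
from __future__ import annotations

import csv
import json
from pathlib import Path

# Hypothetical column layout; the project's real output fields are not documented here.
CSV_FIELDS = [
    "file", "technical_skills", "non_technical_skills",
    "technical_roles", "non_technical_roles",
    "degree", "course", "cgpa", "hsc", "ssc",
]


def _cell(value: object) -> str:
    """Render one value as a CSV cell: lists become '; '-joined text, None becomes ''."""
    if isinstance(value, list):
        return "; ".join(str(item) for item in value)
    return "" if value is None else str(value)


def save_results(records: list[dict], out_dir: str | Path) -> tuple[Path, Path]:
    """Write the same records to results.json (nested) and results.csv (one row each)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    json_path = out / "results.json"
    csv_path = out / "results.csv"

    # JSON keeps lists as arrays, so no structure is lost.
    with json_path.open("w", encoding="utf-8") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)

    # newline="" is what the csv module expects for files it writes.
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=CSV_FIELDS)
        writer.writeheader()
        for record in records:
            writer.writerow({field: _cell(record.get(field)) for field in CSV_FIELDS})

    return json_path, csv_path
```

For example, `save_results(records, "output")` would create `output/results.json` and `output/results.csv`. Flattening only the CSV copy gives spreadsheet users one row per resume while the JSON copy keeps the full structure.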
The output includes:

- Technical and non-technical skills.
- Details about internships and jobs, categorized into technical and non-technical roles.
- Educational details such as degree, course, CGPA, HSC, and SSC.

Feel free to customize the file paths and commands as needed for your environment.
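Some of the educational details mentioned above, specifically CGPA and HSC/SSC percentages, lend themselves to regular expressions. This is a minimal sketch under the assumption that scores appear as "CGPA: 8.52" and "HSC ... 85.4%"; the patterns and the `extract_scores` function are hypothetical, and real resumes format these values in many more ways.

```python
from __future__ import annotations

import re

# Hypothetical patterns: "CGPA: 8.52/10" or "GPA 3.7", and "HSC (2019): 85.4%".
CGPA_RE = re.compile(r"\bC?GPA\b\s*[:\-]?\s*(\d{1,2}(?:\.\d{1,2})?)", re.IGNORECASE)
# [^%\n] keeps each match on one line; the lazy gap skips text such as years.
BOARD_RE = re.compile(r"\b(HSC|SSC)\b[^%\n]{0,60}?(\d{1,3}(?:\.\d{1,2})?)\s*%", re.IGNORECASE)


def extract_scores(text: str) -> dict[str, str | None]:
    """Pull CGPA and HSC/SSC percentages out of resume text, or None if absent."""
    scores: dict[str, str | None] = {"cgpa": None, "hsc": None, "ssc": None}
    match = CGPA_RE.search(text)
    if match:
        scores["cgpa"] = match.group(1)
    for m in BOARD_RE.finditer(text):
        key = m.group(1).lower()
        if scores[key] is None:  # keep the first value found for each board exam
            scores[key] = m.group(2)
    return scores


if __name__ == "__main__":
    sample = "B.Tech in Computer Engineering, CGPA: 8.52/10\nHSC (2019): 85.4%\nSSC - 92%"
    print(extract_scores(sample))  # {'cgpa': '8.52', 'hsc': '85.4', 'ssc': '92'}
```

Values are kept as strings so that the CSV and JSON outputs show exactly what the resume contained; converting to numbers is left to whoever consumes the output.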