The Better Resume Extractor using LLMs

Setup

Run these commands in a terminal:

git clone https://github.com/HemanthVikash/llm-resume-extractor.git
cd llm-resume-extractor
conda env create -f environment.yml

The main code for the extractor is in the llm-based-api.ipynb file.

Extract text from pdf with OCR

Package used: PyPDF2

Use the extract_text_from_pdf function from the ocr_extractor.py file to extract text from any PDF files stored in the data/ folder.
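
As a rough illustration of this step, the sketch below reads every PDF in a folder with PyPDF2. The body of extract_text_from_pdf here is an assumption about what ocr_extractor.py does, not a copy of it, and list_pdfs is a hypothetical helper added for the example. Note that PyPDF2 reads a PDF's embedded text layer, so image-only (scanned) pages may come back empty.

```python
from pathlib import Path


def list_pdfs(data_dir="data"):
    """Return the PDF files in data_dir, sorted by path (hypothetical helper)."""
    return sorted(p for p in Path(data_dir).iterdir() if p.suffix.lower() == ".pdf")


def extract_text_from_pdf(pdf_path):
    """Concatenate the text of every page in one PDF.

    Assumption: the real function in ocr_extractor.py may have a
    different signature or do additional cleanup.
    """
    # Imported lazily so list_pdfs works even when PyPDF2 is not installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(str(pdf_path))
    # extract_text() may yield nothing for image-only pages, hence the `or ""`.
    return "\n".join((page.extract_text() or "") for page in reader.pages)
```

A typical call would loop over list_pdfs("data") and pass each path to extract_text_from_pdf.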

Extract vital information using RAG

The need for LLMs here is two-fold:

Proper Formatting:

If you have ever used OCR to convert a PDF to text, you have probably run into garbled characters.

For example, when extracting
```
Built a prediction algorithm to classify hate speech on a few social media platforms using the results from the Twitter web survey that had been conducted.
```
We often get something like this:
```
Built a predicPon algorithm to classify hate speech on a few social media plamorms using the results from the Twiner web survey that had been conducted.
```
This is common on most of the job boards through which candidates apply.

Using LLMs, we can usually recover the original wording of the PDF by simply spell-checking the OCR-extracted text.
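
One way to frame the spell-checking step is as a narrowly scoped prompt. The sketch below is a minimal example under assumptions: the prompt wording is illustrative, and llm is a hypothetical callable that takes a prompt string and returns the model's reply, standing in for whichever client the notebook uses.

```python
def build_correction_prompt(ocr_text):
    """Build a prompt that asks the model to fix extraction errors only."""
    return (
        "The text below was extracted from a resume PDF and contains "
        "character-level extraction errors (for example 'plamorms' "
        "instead of 'platforms').\n"
        "Correct the misspelled words. Do not add, remove, or rephrase "
        "anything else.\n\n"
        f"Text:\n{ocr_text}\n\nCorrected text:"
    )


def correct_ocr_text(ocr_text, llm):
    """Return the corrected text; llm is a hypothetical callable str -> str."""
    return llm(build_correction_prompt(ocr_text)).strip()
```

Keeping the instruction limited to misspellings is a design choice intended to discourage the model from rewriting the candidate's own wording.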

Structured Extraction:

LLMs give us the freedom to extract anything from the text while giving us clues about the context.

For example, we can reduce the following errors, which commonly occur in resume parsers on job boards:
- Confusing the employer with the project name or vice versa
- Mistakenly adding two jobs in a single job field

We can reduce such mistakes by defining a structured response model and supplying additional context for each field.
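
The idea of a response model with per-field context could look roughly like this. This is a standard-library sketch using dataclasses; the notebook may well use a different mechanism (a schema library, for instance), and the field names and context strings below are assumptions made for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Experience:
    # Each field carries a short hint that is passed to the model as context.
    employer: str = field(metadata={
        "context": "Company or organization that employed the candidate, not a project name."})
    job_title: str = field(metadata={
        "context": "Exactly one role; put a second job in a separate entry."})
    project_name: str = field(metadata={
        "context": "Name of a project within the role, if any; never the employer."})


def describe_fields(model):
    """Render one '- name: context' line per field for inclusion in a prompt."""
    return "\n".join(f"- {f.name}: {f.metadata['context']}" for f in fields(model))
```

The output of describe_fields(Experience) can be appended to the extraction prompt so the model sees what each field means before filling it in.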

Ask for unwritten information (additional advantage):

For example, if the resume bullet points are as follows:
```
* Designing and training custom ML/DL algorithms for a climber weight prediction system for a large corporate client.
* Trained an ML algorithm with 97% test accuracy using Keras and Scipy for integration into an embedded system, using physics-based feature extraction
```
We can define a response model that also returns information such as:
- Summary of the experience
- Technical Skills used in the experience

This valuable information, while not written by the candidates themselves, can be generated by the LLM.
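
To make the generated fields concrete, here is a small sketch of parsing a model reply that contains a summary and a skills list. It assumes the model was asked to answer in JSON with the keys summary and technical_skills; both key names and the ExperienceInsights class are hypothetical, chosen for this example.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ExperienceInsights:
    summary: str
    technical_skills: List[str]


def parse_insights(reply):
    """Parse a JSON reply from the model into ExperienceInsights."""
    data = json.loads(reply)
    # Normalize skills: coerce to str, trim whitespace, drop empty entries.
    skills = [str(s).strip() for s in data.get("technical_skills", []) if str(s).strip()]
    return ExperienceInsights(
        summary=str(data.get("summary", "")).strip(),
        technical_skills=skills,
    )
```

Because these fields are inferred rather than quoted from the resume, it may be worth storing them separately from the extracted fields so reviewers can tell the two apart.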
