Extracting job responsibilities from job ads #12133
-
Hello, for my job I have to extract job responsibilities from job ads. I'm thinking of approaching it as a span extraction problem, where I'd manually label the job responsibility spans for around 1000 samples and then use supervised learning. Is there a better way to approach this problem? Is there a pretrained model I can fine-tune?
Replies: 1 comment
-
Approaching this task as a span extraction problem seems like a sensible thing to do. It is hard to say ahead of time how many annotated examples you'd need to reach an acceptable precision/recall, since it depends very much on the variability in the job responsibility descriptions and whether they are embedded in predictable contexts.

In general, we'd recommend iterating on your data and setting things up to make that easy. E.g. using Prodigy can speed up annotation by pre-annotating examples with a model trained on the annotations you've made so far, which also gives you an idea of how well a model does at that point.

With ~1000 examples, it's at least possible to make a reasonably sized 80/20 train/dev split to gauge the precision/recall of the model. I am not aware of models that are pretrained for this task specifically, but if you happen to have a large corpus of in-domain text, you could use
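If you record annotations as character offsets, a minimal sketch of the data format and the 80/20 train/dev split mentioned above could look like the following. The tuple layout, label name, and example texts here are assumptions for illustration, not a fixed Prodigy or spaCy format:

```python
import random

# Hypothetical annotation format: (text, [(start_char, end_char, label), ...])
EXAMPLES = [
    ("Develop and maintain internal web tools.",
     [(0, 40, "RESPONSIBILITY")]),
    ("Reports to the Head of Engineering.", []),  # no responsibility span
]

def train_dev_split(examples, dev_fraction=0.2, seed=0):
    """Shuffle annotated examples and split them into train/dev sets."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    examples = list(examples)
    rng.shuffle(examples)
    n_dev = max(1, int(len(examples) * dev_fraction))
    return examples[n_dev:], examples[:n_dev]

# With ~1000 annotated samples this yields ~800 train / ~200 dev examples.
train, dev = train_dev_split(EXAMPLES * 500)
```

Keeping the split deterministic (fixed seed) matters when you re-train repeatedly while iterating on the annotations, so dev examples never leak into training between rounds.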
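To gauge precision/recall on the dev split, an exact-match span scorer is enough to start with. The strict criterion below (a predicted span counts only if start, end, and label all match a gold span) is one common choice; partial-overlap scoring is another, so treat this as a sketch rather than a standard metric definition:

```python
def span_prf(gold_spans, pred_spans):
    """Exact-match precision/recall/F1 over (start, end, label) tuples."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)  # spans predicted exactly as annotated
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, predicting one of two gold spans exactly gives precision 1.0 and recall 0.5.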