Extracting job responsibilities from job ads #12133
-
Hello, for my job I have to extract job responsibilities from job ads. I'm thinking of approaching it as a span extraction problem, where I'd manually label the job responsibility spans for around 1000 samples and then use supervised learning. Is there a better way to approach this problem? Is there a pretrained model I can fine-tune?
Replies: 1 comment
-
Approaching this task as a span extraction problem seems like a sensible thing to do. It is hard to say ahead of time how many annotated examples you'd need to reach an acceptable precision/recall, since it depends very much on the variability in the job responsibility descriptions and whether they are embedded in predictable contexts.

In general, we'd recommend iterating on your data and setting things up to make that easy. E.g. using Prodigy can speed up annotation by pre-annotating examples with a model trained on the annotations you've made so far, which also gives you an idea of how well a model does at that point.

With ~1000 examples, it's at least possible to make a reasonably sized 80/20 train/dev split to gauge the precision/recall of the model. I am not aware of models that are pretrained for this task specifically, but if you happen to have a large corpus of in-domain text, you could use
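If you record annotations as character offsets, a minimal sketch of the data format and the 80/20 train/dev split mentioned above could look like the following. The tuple layout, label name, and example texts here are assumptions for illustration, not a fixed Prodigy or spaCy format:

```python
import random

# Hypothetical annotation format: (text, [(start_char, end_char, label), ...])
EXAMPLES = [
    ("Develop and maintain internal web tools.",
     [(0, 40, "RESPONSIBILITY")]),
    ("Reports to the Head of Engineering.", []),  # no responsibility span
]

def train_dev_split(examples, dev_fraction=0.2, seed=0):
    """Shuffle annotated examples and split them into train/dev sets."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    examples = list(examples)
    rng.shuffle(examples)
    n_dev = max(1, int(len(examples) * dev_fraction))
    return examples[n_dev:], examples[:n_dev]

# With ~1000 annotated samples this yields ~800 train / ~200 dev examples.
train, dev = train_dev_split(EXAMPLES * 500)
```

Keeping the split deterministic (fixed seed) matters when you re-train repeatedly while iterating on the annotations, so dev examples never leak into training between rounds.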
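To gauge precision/recall on the dev split, an exact-match span scorer is enough to start with. The strict criterion below (a predicted span counts only if start, end, and label all match a gold span) is one common choice; partial-overlap scoring is another, so treat this as a sketch rather than a standard metric definition:

```python
def span_prf(gold_spans, pred_spans):
    """Exact-match precision/recall/F1 over (start, end, label) tuples."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)  # spans predicted exactly as annotated
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, predicting one of two gold spans exactly gives precision 1.0 and recall 0.5.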