spaCy doc creation and entity recognition #11719
I have created a pandas DataFrame and am having trouble extracting specific strings from it. Given that the DataFrame below is my first row (index 0), how do I convert the DataFrame to a spaCy Doc so that I can extract "PERSON" entities for each column?
Replies: 1 comment
Hi @Frazer-Nyambe,
You need to iterate over your DataFrame first, for example with df.iterrows (row by row) or df.items (column by column). Also, if you already have the raw text, you don't have to put it into a DataFrame at all. You can work with the texts as strings right away.
To convert a text into a spaCy Doc, pass it to a pipeline. You also need to load a model (e.g. en_core_web_lg), like so:

```python
import spacy

nlp = spacy.load("en_core_web_lg")
for text in texts:  # texts: your own iterable of raw strings
    doc = nlp(text)  # calling the pipeline on a single string returns a Doc
```

I highly recommend looking at the getting started and API documentation for more info.
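For the per-column "PERSON" extraction asked about in the question, something along these lines might work. This is only a sketch: the helper name person_entities is made up for illustration, and it assumes every cell holds free text (or is empty) and that the pipeline you pass in has an entity recognizer, as en_core_web_lg does.

```python
import pandas as pd
import spacy
from spacy.language import Language


def person_entities(df: pd.DataFrame, nlp: Language) -> dict:
    """Map each column name to the PERSON entity texts found in that column.

    Hypothetical helper for illustration; not part of spaCy or pandas.
    """
    found = {}
    for column, values in df.items():  # yields (column name, Series) pairs
        # Drop missing cells and coerce the rest to str before handing them to spaCy.
        texts = values.dropna().astype(str).tolist()
        people = []
        # nlp.pipe takes an iterable of texts and yields one Doc per text.
        for doc in nlp.pipe(texts):
            people.extend(ent.text for ent in doc.ents if ent.label_ == "PERSON")
        found[column] = people
    return found


# Example call (assumes the model has been downloaded beforehand):
#   nlp = spacy.load("en_core_web_lg")
#   print(person_entities(df, nlp))
```

Note that nlp.pipe is meant for a batch of texts, while nlp(text) is the call for a single string. Which names actually come back depends on the model's predictions, so it is worth spot-checking the output on a few rows.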