No noun_chunks for Portuguese? If not, how can I adapt Spanish's? #9532
-
Hi everyone! I'm new to spaCy and I would like to use the noun_chunks functionality. Unfortunately I got the following error:

Is the error correct that this functionality doesn't exist for Portuguese yet? If so, how could I go about adapting the Spanish one? What I need is to feed in sentences like the following (except in Portuguese):

and run them through some function that will spit out something like this, with commas added:

What is the best way to accomplish this? Thanks a lot!
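For reference, the failing call looks roughly like this (just a sketch; the model name is simply whichever Portuguese pipeline is installed, and the error text itself isn't pasted here):

```python
import spacy

# Any Portuguese pipeline with a dependency parser; the name is only an example.
nlp = spacy.load("pt_core_news_sm")
doc = nlp("Uma frase qualquer em português.")

# This is the call that raises the error:
print(list(doc.noun_chunks))
```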
-
If it's a useful starting point, you can copy the code from the Spanish syntax_iterators.py and test it on Portuguese. If it works you're done, but you'll likely have to make some adjustments. What you should do is come up with example sentences and expected output and adjust the code until you get the output you want. It's hard to be more specific than that.

About your example sentences though - those won't work. If you feed them to the English models you'll see that the noun chunks aren't right. Noun chunks isn't designed to deal with unpunctuated, arbitrary sequences of words; it's designed to pull out noun phrases from normal sentences. So noun chunks may not be what you want for your problem at all.

If we ignore noun chunks and focus on your segmentation problem, I'm not really sure how you'd model it. Maybe you could use the trainable SentenceRecognizer, but I'm not sure how reliably you could train a model with this kind of data.
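Going back to the syntax-iterator point: below is a heavily simplified sketch of the general shape of such a function. It is not the actual Spanish code - the dependency labels, the `has_annotation` check, and the `nlp.vocab.get_noun_chunks` override in the comments are assumptions based on how spaCy v3's built-in iterators are structured, and you'd tune all of it against real Portuguese parses.

```python
# A simplified sketch of a noun_chunks syntax iterator, loosely modeled on
# the shape of spaCy v3's built-in iterators. The dependency labels below
# are guesses (UD-style) and would need tuning for Portuguese.
from spacy.symbols import NOUN, PROPN, PRON


def noun_chunks(doclike):
    """Yield (start, end, label) index triples for base noun phrases."""
    doc = doclike.doc
    if not doc.has_annotation("DEP"):
        raise ValueError("noun_chunks requires a dependency parse")
    # Dependency relations whose head noun should anchor a chunk (assumed set)
    np_deps = {doc.vocab.strings.add(d) for d in
               ("nsubj", "nsubj:pass", "obj", "iobj", "obl", "nmod", "appos", "ROOT")}
    np_label = doc.vocab.strings.add("NP")
    prev_end = -1
    for word in doclike:
        if word.pos not in (NOUN, PROPN, PRON):
            continue
        # Skip tokens already covered by the previous chunk to avoid nesting
        if word.left_edge.i <= prev_end:
            continue
        if word.dep in np_deps:
            prev_end = word.i
            yield word.left_edge.i, word.i + 1, np_label


# To experiment without editing spaCy's source, one option in spaCy v3 is to
# override the hook on a loaded pipeline's vocab -- treat this as an
# assumption and check it against your spaCy version:
#
#   nlp = spacy.load("pt_core_news_sm")
#   nlp.vocab.get_noun_chunks = noun_chunks
#   for chunk in nlp("Uma frase em português.").noun_chunks:
#       print(chunk.text)
```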
-
Thank you so much for your reply! I'll look for an alternative. While I have you, please allow me to ask another newb question: which spaCy (or other library's) feature would you recommend if I had, say, 20,000 titles between 90 and 200 characters long that have been shortened to 60 characters or fewer by humans, and I wanted to train a machine learning model to do the same automatically for new titles? Does spaCy have such ML capabilities? If not, are you aware of something that does? Thanks again!
-
Thank you so much for all your help! I'll look into that as well.
-
I made this PR on request 🙂: #9559. I built the Portuguese noun chunker on top of the Spanish noun chunker; here is the PR I made for Spanish: #9537.
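For anyone landing here later: once a Portuguese iterator ships as in the PR above, usage should look the same as for other languages. A minimal sketch, assuming the pt_core_news_sm pipeline is installed:

```python
import spacy

# Assumes a Portuguese pipeline with a dependency parser is installed,
# e.g. via: python -m spacy download pt_core_news_sm
nlp = spacy.load("pt_core_news_sm")
doc = nlp("O gato preto dorme no sofá da sala.")

# noun_chunks relies on the parser plus the language's syntax iterator
for chunk in doc.noun_chunks:
    print(chunk.text, "->", chunk.root.dep_)
```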