No noun_chunks for Portuguese? If not, how can I adapt Spanish's? #9532
-
Hi everyone! I'm new to spaCy and I would like to use the noun_chunks functionality. Unfortunately I got the following error:

Is the error correct that this functionality doesn't exist for Portuguese yet? If so, how could I go about adapting the Spanish one? What I need is to feed in sentences like the following (except in Portuguese):

and run them through some function that will spit out something like this, with commas added:

What is the best way to accomplish this? Thanks a lot!
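For reference, the failing call looks roughly like this (just a sketch; the model name is simply whichever Portuguese pipeline is installed, and the error text itself isn't pasted here):

```python
import spacy

# Any Portuguese pipeline with a dependency parser; the name is only an example.
nlp = spacy.load("pt_core_news_sm")
doc = nlp("Uma frase qualquer em português.")

# This is the call that raises the error:
print(list(doc.noun_chunks))
```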
-
If it's a useful starting point, you can copy the code from the Spanish syntax_iterators.py and test it on Portuguese. If it works you're done, but you'll likely have to make some adjustments. What you should do is come up with example sentences and expected output and adjust the code until you get the output you want. It's hard to be more specific than that.

About your example sentences though - those won't work. If you feed them to the English models you'll see that the noun chunks aren't right. Noun chunks isn't designed to deal with unpunctuated, arbitrary sequences of words; it's designed to pull out noun phrases from normal sentences. So noun chunks may not be what you want for your problem at all.

If we ignore noun chunks and focus on your segmentation problem, I'm not really sure how you'd model it. Maybe you could use the trainable SentenceRecognizer, but I'm not sure how reliably you could train a model with this kind of data.
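Going back to the syntax-iterator point: below is a heavily simplified sketch of the general shape of such a function. It is not the actual Spanish code - the dependency labels, the `has_annotation` check, and the `nlp.vocab.get_noun_chunks` override in the comments are assumptions based on how spaCy v3's built-in iterators are structured, and you'd tune all of it against real Portuguese parses.

```python
# A simplified sketch of a noun_chunks syntax iterator, loosely modeled on
# the shape of spaCy v3's built-in iterators. The dependency labels below
# are guesses (UD-style) and would need tuning for Portuguese.
from spacy.symbols import NOUN, PROPN, PRON


def noun_chunks(doclike):
    """Yield (start, end, label) index triples for base noun phrases."""
    doc = doclike.doc
    if not doc.has_annotation("DEP"):
        raise ValueError("noun_chunks requires a dependency parse")
    # Dependency relations whose head noun should anchor a chunk (assumed set)
    np_deps = {doc.vocab.strings.add(d) for d in
               ("nsubj", "nsubj:pass", "obj", "iobj", "obl", "nmod", "appos", "ROOT")}
    np_label = doc.vocab.strings.add("NP")
    prev_end = -1
    for word in doclike:
        if word.pos not in (NOUN, PROPN, PRON):
            continue
        # Skip tokens already covered by the previous chunk to avoid nesting
        if word.left_edge.i <= prev_end:
            continue
        if word.dep in np_deps:
            prev_end = word.i
            yield word.left_edge.i, word.i + 1, np_label


# To experiment without editing spaCy's source, one option in spaCy v3 is to
# override the hook on a loaded pipeline's vocab -- treat this as an
# assumption and check it against your spaCy version:
#
#   nlp = spacy.load("pt_core_news_sm")
#   nlp.vocab.get_noun_chunks = noun_chunks
#   for chunk in nlp("Uma frase em português.").noun_chunks:
#       print(chunk.text)
```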
-
Thank you so much for your reply! I'll look for an alternative. While I have you, please allow me to ask another newb question: which spaCy (or other library's) feature would you recommend if I had, say, 20,000 titles between 90 and 200 characters long that have been shortened to 60 characters or fewer by humans, and I wanted to train a machine learning model to do the same automatically for new titles? Does spaCy have such ML capabilities? If not, are you aware of something that does? Thanks again!
-
Thank you so much for all your help! I'll look into that as well.
-
I made this PR on request 🙂: #9559. I built the Portuguese noun chunker on top of the Spanish noun chunker; here is the PR I made for Spanish: #9537.
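For anyone landing here later: once a Portuguese iterator ships as in the PR above, usage should look the same as for other languages. A minimal sketch, assuming the pt_core_news_sm pipeline is installed:

```python
import spacy

# Assumes a Portuguese pipeline with a dependency parser is installed,
# e.g. via: python -m spacy download pt_core_news_sm
nlp = spacy.load("pt_core_news_sm")
doc = nlp("O gato preto dorme no sofá da sala.")

# noun_chunks relies on the parser plus the language's syntax iterator
for chunk in doc.noun_chunks:
    print(chunk.text, "->", chunk.root.dep_)
```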