How to improve dependency parses for text without punctuation #12563
-
Hi, when using the dependency parser (German large model), the correctness of the results seems to depend strongly on punctuation. When applied to sentences or texts without punctuation, such as ASR output from spoken dialogue, subordinate clauses (e.g. relative clauses) are not correctly attached to the noun but to the verb. I am aware that this may be an effect of the models being trained on written text that includes punctuation. I can alleviate the issue by first running a model that restores missing punctuation, but of course this costs additional time, which is undesirable for conversational systems. Is there any way to make the original model robust against missing punctuation without this kind of preprocessing (some model configuration, retraining, etc.)? Thanks in advance for any advice.
Replies: 1 comment
-
Hi @TimoSowa!
Yeah, punctuation likely has a significant impact on the performance of the dependency parser.
Other than what you are already doing (restoring the punctuation in a preprocessing step), you could train with custom data or custom data augmentation. There's nothing in the configuration that would make up for the lack of punctuation in the training data, I'm afraid.
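For the data-augmentation route, the idea is to feed the parser both the original training sentences and punctuation-free copies of them. Here is a minimal stdlib sketch of that augmentation logic; the function names are illustrative, not spaCy API:

```python
import random
import string

# Characters treated as punctuation tokens (assumption: punctuation
# appears as separate tokens in the tokenized training data).
PUNCT = set(string.punctuation)

def strip_punct(tokens):
    """Return the token list with punctuation tokens removed."""
    return [t for t in tokens if t not in PUNCT]

def augment(sentences, level=0.5, seed=0):
    """Yield each sentence, plus a punctuation-free copy for a fraction
    (`level`) of them -- mirroring what an augmenter callback does."""
    rng = random.Random(seed)
    for tokens in sentences:
        yield tokens
        if rng.random() < level:
            stripped = strip_punct(tokens)
            if stripped != tokens:
                yield stripped

sents = [["Das", "ist", "der", "Mann", ",", "der", "lacht", "."]]
out = list(augment(sents, level=1.0))
# `out` now holds the original sentence plus a copy without "," and "."
```

In an actual spaCy training setup you would wrap logic like this in a callback registered via `@spacy.registry.augmenters` and reference it under `[corpora.train.augmenter]` in the training config, so the dependency annotations of the original example are carried over to the punctuation-free copy.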