How to improve dependency parses for text without punctuation #12563
-
Hi, when using the dependency parser (German large model), the correctness of the results seems to depend strongly on punctuation. When applied to sentences or texts without punctuation, such as ASR output from spoken dialogue, subordinate clauses (e.g. relative clauses) are not correctly attached to the noun but to the verb. I am aware that this may be an effect of the models being trained on written text that includes punctuation. I can alleviate the issue by first running a model that restores missing punctuation, but of course this costs additional time, which is undesirable for conversational systems. Is there any way to make the original model robust against missing punctuation without this kind of preprocessing (some model configuration, retraining, etc.)? Thanks in advance for any advice.
Replies: 1 comment
-
Hi @TimoSowa!
Yeah, punctuation likely has a significant impact on the performance of the dependency parser.
Other than what you are already doing (restoring the punctuation in a preprocessing step), you could train with custom data or custom data augmentation. There's nothing in the configuration that would make up for the lack of punctuation in the training data, I'm afraid.
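For the data-augmentation route, the idea is to feed the parser both the original training sentences and punctuation-free copies of them. Here is a minimal stdlib sketch of that augmentation logic; the function names are illustrative, not spaCy API:

```python
import random
import string

# Characters treated as punctuation tokens (assumption: punctuation
# appears as separate tokens in the tokenized training data).
PUNCT = set(string.punctuation)

def strip_punct(tokens):
    """Return the token list with punctuation tokens removed."""
    return [t for t in tokens if t not in PUNCT]

def augment(sentences, level=0.5, seed=0):
    """Yield each sentence, plus a punctuation-free copy for a fraction
    (`level`) of them -- mirroring what an augmenter callback does."""
    rng = random.Random(seed)
    for tokens in sentences:
        yield tokens
        if rng.random() < level:
            stripped = strip_punct(tokens)
            if stripped != tokens:
                yield stripped

sents = [["Das", "ist", "der", "Mann", ",", "der", "lacht", "."]]
out = list(augment(sents, level=1.0))
# `out` now holds the original sentence plus a copy without "," and "."
```

In an actual spaCy training setup you would wrap logic like this in a callback registered via `@spacy.registry.augmenters` and reference it under `[corpora.train.augmenter]` in the training config, so the dependency annotations of the original example are carried over to the punctuation-free copy.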