Sentences within a dataset

I am working on a dataset quite "noisy", so it's very difficult to exactly detect a sentence (for example, I have a lot of abbreviation with points, so these points are detected as the end of phrases).

Do you think that having many short sentences (often with just 3 words) could compromise the algorithm performances? Is it important to preserve the information about words belonging to a sentence?

PS: Furthermore, if the punctuation is filtered, the information about a "phrase" is completely lost, as documents became a bag of words, could it work also in this case?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sentences within a dataset #6

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Sentences within a dataset #6

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions