Spaces are skipped when `\n` directly follows a word without a space, so pipelines fail to detect some features. #422



## Description
When the newline character `\n` immediately follows a word, with no space in between, the `\n` and the space after it (`"\n "`) are detected as a single token, tagged as SPACE, and skipped because `ignore_space_tokens=True` in some pipelines. As a result, after normalization the word before and the word after `\n` are concatenated, and pipelines can no longer detect the second word. In the reproduction code, the `eds.diabetes()` pipeline does not detect the word "Diabète", and the `get_text()` call shows why.
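To make the mechanism concrete, here is a minimal pure-Python sketch of the behavior described above. It is a simplified model, not EDS-NLP's actual implementation: the `Tok` tuple and the `join_tokens` helper are hypothetical names introduced for illustration, and the token boundaries are assumed to match what the issue reports (a single `"\n "` space token with no trailing whitespace on the preceding word).

```python
from typing import List, NamedTuple


class Tok(NamedTuple):
    """Hypothetical stand-in for a token (not an EDS-NLP or spaCy class)."""
    text: str        # normalized token text
    whitespace: str  # trailing whitespace attached to the token ("" or " ")
    is_space: bool   # True for whitespace-only tokens such as "\n "


def join_tokens(tokens: List[Tok], ignore_space_tokens: bool) -> str:
    """Join token text and trailing whitespace, optionally dropping space tokens."""
    kept = [t for t in tokens if not (ignore_space_tokens and t.is_space)]
    return "".join(t.text + t.whitespace for t in kept)


# Assumed tokenization of "problématique\n Diabète de type 1" after normalization:
# the word before "\n" carries no trailing space, and "\n " is one space token.
tokens = [
    Tok("problematique", "", False),
    Tok("\n ", "", True),
    Tok("diabete", " ", False),
    Tok("de", " ", False),
    Tok("type", " ", False),
    Tok("1", "", False),
]

with_spaces = join_tokens(tokens, ignore_space_tokens=False)
without_spaces = join_tokens(tokens, ignore_space_tokens=True)

print(repr(with_spaces))
print(repr(without_spaces))
```

In this model, dropping the space token removes the only separator between the two words, because the preceding word has no trailing whitespace of its own. That is consistent with the concatenated `'problematiquediabete ...'` string reported in this issue.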

## How to reproduce the bug



```python
import edsnlp
import edsnlp.pipes as eds
from edsnlp.utils.doc_to_text import get_text

# "\n" directly follows "problématique"; the space comes only after the newline
txt = "problématique\n Diabète de type 1 depuis 5 ans chez une enfant de 7 ans"

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(eds.normalizer())
nlp.add_pipe(eds.diabetes())
doc = nlp(txt)

for ent in doc.ents:
    print(ent.text, ent.label_)
# --> Nothing in the terminal

# Normalized text with space tokens ignored: the two words are concatenated
get_text(doc, attr="NORM", ignore_excluded=True, ignore_space_tokens=True)
# --> 'problematiquediabete de type 1 depuis 5 ans chez une enfant de 7 ans'
```

## Your Environment

- Operating System: Windows 11
- Python Version Used: 3.10.16
- EDS-NLP Version Used: 0.17.1


