
Once tokenization is done, the text passes through a number of sequential steps in a pipeline, as shown in the image below.

![[NLP_Pipeline_flowdiagram.png]]

An NLP pipeline is a series of steps used to transform raw, unstructured text into a structured format that a machine can understand and analyze. Think of it like a factory assembly line: raw material goes in one end, and refined data comes out the other.
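The assembly-line idea can be sketched in plain Python, without any NLP library. The step functions below (`tokenize`, `lowercase`, `drop_punctuation`) and the `run_pipeline` helper are hypothetical names chosen for illustration; they are not part of spaCy and are far simpler than real pipeline components.

```python
import string
from typing import Callable, List

# A step takes a list of tokens and returns a new list of tokens.
Step = Callable[[List[str]], List[str]]


def tokenize(text: str) -> List[str]:
    # Naive whitespace tokenizer; real tokenizers also split off punctuation.
    return text.split()


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def drop_punctuation(tokens: List[str]) -> List[str]:
    # Strip leading/trailing punctuation and drop tokens that become empty.
    cleaned = [t.strip(string.punctuation) for t in tokens]
    return [t for t in cleaned if t]


def run_pipeline(text: str, steps: List[Step]) -> List[str]:
    # Raw text goes in, each step refines the output of the previous one.
    tokens = tokenize(text)
    for step in steps:
        tokens = step(tokens)
    return tokens


result = run_pipeline(
    "Hey everyone, welcome to the NLP pipeline.",
    [lowercase, drop_punctuation],
)
print(result)
```

The order of the steps matters: each one only sees what the previous step produced, which is the same contract spaCy components follow when they are run one after another on a `Doc`.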

A blank spaCy pipeline is initially empty: it contains only the tokenizer and no other components.

import spacy

nlp = spacy.blank("en")
doc = nlp("Hey everyone welcome to the NLP pipeline.")

for token in doc:
    print(token)

# Output:
Hey
everyone
welcome
to
the
NLP
pipeline
.

nlp.pipe_names
# Output: []
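Because the blank pipeline has no components, tokens come back without linguistic annotations. A minimal sketch, assuming spaCy v3 is installed, that checks two of them (part-of-speech tags and named entities):

```python
import spacy

nlp = spacy.blank("en")  # tokenizer only, no components
doc = nlp("Hey everyone welcome to the NLP pipeline.")

# With no tagger or NER in the pipeline, these annotations stay unset.
pos_tags = [token.pos_ for token in doc]  # one empty string per token
entities = list(doc.ents)                 # no entities recognized

print(len(doc))
print(pos_tags)
print(entities)
```

The tokenizer still does its job, so the `Doc` has one entry per token; everything beyond tokenization has to come from components added to the pipeline.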

We can either add components to the pipeline manually or load a pretrained pipeline from spaCy.

nlp = spacy.load("en_core_web_sm")
# A pretrained pipeline

nlp.pipe_names
# Output: ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']
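For the manual route, a component can be added by its registered name with `nlp.add_pipe`. A minimal sketch, assuming spaCy v3 and using the built-in rule-based `sentencizer` (chosen here only because it needs no trained model; the example text is made up):

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based sentence splitter, no model required

doc = nlp("Hey everyone. Welcome to the NLP pipeline.")
sentences = [sent.text for sent in doc.sents]

print(nlp.pipe_names)
print(sentences)
```

Components that rely on trained weights, such as `tagger` or `ner`, can also be added by name, but they need to be trained or sourced from an existing pipeline before they produce useful output.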

Loading a specific component into the pipeline from another pipeline:

source_nlp = spacy.load("en_core_web_sm")

nlp = spacy.blank("en")
nlp.add_pipe("ner", source=source_nlp)  # source the component from another pipeline
nlp.pipe_names
# Output: ['ner']
