This repository provides hands-on implementations of essential Natural Language Processing (NLP) techniques using Machine Learning. It covers fundamental preprocessing steps, feature extraction methods, and advanced word embedding techniques to help build robust NLP models.
- Tokenization – Splitting text into meaningful units like words and sentences.
- Text Preprocessing (Stemming, Lemmatization, and Stopwords) – Cleaning text data by normalizing words and removing irrelevant tokens.
- Part-of-Speech Tagging – Identifying the grammatical category of each word in a sentence.
- Named Entity Recognition (NER) – Extracting key entities such as names, locations, and dates/times from text.
- Bag of Words – Representing text numerically in vector form using word frequencies.
- Word2Vec Implementation – Learning word embeddings for capturing semantic relationships between words.
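
To make the first few topics concrete, here is a minimal, dependency-free sketch of tokenization, stopword removal, and a Bag of Words representation. It is not the code from the notebooks, which rely on libraries such as nltk and scikit-learn. The `STOPWORDS` set, the helper names (`tokenize`, `remove_stopwords`, `bag_of_words`), and the sample sentences are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "is", "on", "and", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def bag_of_words(documents):
    """Return a sorted vocabulary and one word-count vector per document."""
    token_lists = [remove_stopwords(tokenize(d)) for d in documents]
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = ["The cat sat on the mat.", "The dog chased the cat."]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)  # ['cat', 'chased', 'dog', 'mat', 'sat']
print(vectors)     # [[1, 0, 0, 1, 1], [1, 1, 1, 0, 0]]
```

Each vector has one position per vocabulary word, so two documents can be compared position by position. Stemming or lemmatization would slot in between `remove_stopwords` and the counting step.
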
Ensure you have the following Python libraries installed before running the notebooks:
pip install nltk pandas scikit-learn numpy gensim
Clone this repository and explore the notebooks to learn key NLP techniques through practical examples.