The NLP Toolkit is a Streamlit-based web application for performing basic text processing and analysis using Python. It leverages NLTK, SpaCy, and other NLP libraries to provide features such as tokenization, stemming, lemmatization, POS tagging, n-grams, Bag-of-Words, TF-IDF, parsing, Named Entity Recognition (NER), sentiment analysis, and visualization.
## Features

- Tokenization (words and sentences)
- Stopwords removal
- Stemming (Porter, Lancaster, Snowball)
- Lemmatization
- POS tagging
- N-grams generation
- Bag-of-Words representation
- TF-IDF representation
- Dependency parsing
- Named Entity Recognition (NER)
- Sentiment analysis
- Word frequency plots and WordClouds
- Download original and processed tokens
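Most of these features correspond to standard NLTK and SpaCy calls. For orientation, here is a minimal sketch of the core token pipeline, assuming NLTK's default English models; the sample text and variable names are illustrative, not the app's actual internals:

```python
from nltk import pos_tag
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize
from nltk.util import ngrams

text = "The striped bats were hanging on their feet."

tokens = word_tokenize(text)                               # word tokenization
stops = set(stopwords.words("english"))
filtered = [t for t in tokens if t.lower() not in stops]   # stopword removal

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
stems = [stemmer.stem(t) for t in filtered]                # Porter stemming
lemmas = [lemmatizer.lemmatize(t) for t in filtered]       # WordNet lemmatization
tags = pos_tag(filtered)                                   # POS tagging
bigrams = list(ngrams(filtered, 2))                        # n-gram generation
```

Running this sketch requires the NLTK datasets listed under Dependencies below.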
## Installation

- Clone the repository:

  ```bash
  git clone <repository_url>
  cd <repository_folder>
  ```

- Create a virtual environment (optional but recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate   # Linux/Mac
  venv\Scripts\activate      # Windows
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Download the SpaCy English model:

  ```bash
  python -m spacy download en_core_web_sm
  ```

- Run the Streamlit app:

  ```bash
  streamlit run app.py
  ```
## Usage

- In the sidebar, choose your input source (a sketch of this input handling follows the list):
  - Paste text
  - Upload a `.txt` or `.csv` file (text should be in the first column)
- Select the desired NLP options and parameters:
  - Tokenization
  - Stopwords removal
  - Stemming / Lemmatization
  - POS tagging
  - N-grams
  - Bag-of-Words / TF-IDF
  - Parsing / NER
  - Sentiment analysis
- Click **Run NLP** to process the text.
- Visualizations and downloadable token files are available after processing.
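The sidebar input handling described above typically looks something like the following in Streamlit. This is a hypothetical sketch with assumed widget labels and variable names, not necessarily the app's actual code:

```python
import pandas as pd
import streamlit as st

source = st.sidebar.radio("Input source", ["Paste text", "Upload file"])

text = ""
if source == "Paste text":
    text = st.sidebar.text_area("Enter text")
else:
    uploaded = st.sidebar.file_uploader("Upload a file", type=["txt", "csv"])
    if uploaded is not None:
        if uploaded.name.endswith(".csv"):
            # Text is expected in the first column of the CSV.
            df = pd.read_csv(uploaded)
            text = " ".join(df.iloc[:, 0].astype(str))
        else:
            text = uploaded.read().decode("utf-8")
```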
## Dependencies

See `requirements.txt` for the full list of dependencies.

The app automatically downloads the required NLTK datasets if they are not already present:

- `punkt`
- `averaged_perceptron_tagger`
- `wordnet`
- `omw-1.4`
- `stopwords`
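A common way to implement this kind of check, shown here as a sketch rather than the app's verbatim code, is to probe the local `nltk.data` index and download only on a miss:

```python
import nltk

# Map each dataset name to the path nltk.data uses to locate it locally.
RESOURCES = {
    "punkt": "tokenizers/punkt",
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
    "wordnet": "corpora/wordnet",
    "omw-1.4": "corpora/omw-1.4",
    "stopwords": "corpora/stopwords",
}

for name, path in RESOURCES.items():
    try:
        nltk.data.find(path)              # raises LookupError if missing
    except LookupError:
        nltk.download(name, quiet=True)   # fetch only what is absent
```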
## License

MIT License