Next Word Prediction using LSTM

A Natural Language Processing (NLP) project that predicts the next word in a sentence using a Long Short-Term Memory (LSTM) neural network.

The application is deployed using Streamlit, allowing users to input a sequence of words and receive the predicted next word based on the trained model.

Live demo: https://alok-nextwordpredictor-goku.streamlit.app/

Application Preview

[Screenshot of the application: screenshots/image.png]

Project Overview

Next-word prediction is a core task in language modeling. It underlies applications such as:

Google keyboard suggestions
ChatGPT-style text generation
Email autocomplete
Speech recognition systems

This project trains an LSTM-based neural network on a text dataset to learn word sequences and predict the most probable next word.

How It Works

The system follows the standard NLP deep learning pipeline:

Text Dataset
↓
Text Cleaning & Tokenization
↓
Word Indexing
↓
Sequence Generation
↓
Padding Sequences
↓
Embedding Layer
↓
LSTM Layer
↓
Dense + Softmax
↓
Next Word Prediction
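The preprocessing stages of this pipeline (tokenization, word indexing, sequence generation, padding) can be sketched in plain Python. This is an illustrative sketch, not the project's actual code: the function names `tokenize`, `build_word_index`, `make_ngram_sequences`, and `pad_pre` are hypothetical, and the two sample sentences are made up. The conventions used here (indices start at 1, 0 is reserved for padding, zeros are added on the left) are assumptions chosen because they match the defaults of the Keras `Tokenizer` and `pad_sequences` utilities.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_word_index(lines):
    """Map each word to an integer, most frequent first; 0 is reserved for padding."""
    counts = Counter(word for line in lines for word in tokenize(line))
    # Sort by descending count, then alphabetically so the result is deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i for i, word in enumerate(ordered, start=1)}

def make_ngram_sequences(lines, word_index):
    """Turn each line into every prefix of length >= 2 (context words + target word)."""
    sequences = []
    for line in lines:
        ids = [word_index[w] for w in tokenize(line)]
        for end in range(2, len(ids) + 1):
            sequences.append(ids[:end])
    return sequences

def pad_pre(sequences, max_len):
    """Left-pad each sequence with zeros to a fixed length."""
    return [[0] * (max_len - len(s)) + s for s in sequences]

lines = ["the game is afoot", "the game is on"]
word_index = build_word_index(lines)
sequences = make_ngram_sequences(lines, word_index)
max_len = max(len(s) for s in sequences)
padded = pad_pre(sequences, max_len)

# The last column is the target word; everything before it is the model input.
X = [row[:-1] for row in padded]
y = [row[-1] for row in padded]
```

Each row of `X` is a fixed-length context and the matching entry of `y` is the word that follows it, which is the supervised form a next-word model is trained on.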

Model Architecture

Input Layer
↓
Embedding Layer
↓
LSTM Layer
↓
Dense Layer (Softmax)

Embedding Layer

Converts each word index into a dense vector. The vectors are learned during training, so words that appear in similar contexts tend to end up with similar representations.

Example representation:

hello → [0.21, -0.4, 0.87, ...]
world → [0.12, 0.56, -0.19, ...]
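Conceptually, an embedding layer is a lookup table with one row per vocabulary entry. The sketch below illustrates that idea in plain Python. It is not the project's code: `make_embedding_table` and `embed` are hypothetical helpers, the two-word vocabulary is made up, and random values stand in for the weights a trained model would have learned.

```python
import random

def make_embedding_table(vocab_size, embed_dim, seed=0):
    """Create a (vocab_size x embed_dim) table of small random values.

    In a trained model these values are learned; random values stand in here.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(embed_dim)]
            for _ in range(vocab_size)]

def embed(token_ids, table):
    """Look up one vector per token id (the forward pass of an embedding layer)."""
    return [table[i] for i in token_ids]

word_index = {"hello": 1, "world": 2}  # hypothetical vocabulary; index 0 is padding
table = make_embedding_table(vocab_size=len(word_index) + 1, embed_dim=4)

vectors = embed([word_index["hello"], word_index["world"]], table)
```

Here `vectors` holds one 4-dimensional vector per input word; the LSTM layer consumes these vectors in order.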

LSTM Layer

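An LSTM processes a sequence one token at a time and can carry information across many time steps, which lets it capture long-term dependencies.

It does this with three gates and a cell state:

Forget Gate (f_t) – decides what information to discard from the cell state
Input Gate (i_t) – decides what new information to store
Cell State (C_t) – long-term memory of the network, updated by the forget and input gates
Output Gate (o_t) – controls how much of the cell state is exposed as the output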
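To make the roles of these components concrete, here is a minimal sketch of a single LSTM time step using the standard LSTM update equations, reduced to scalar input and scalar hidden state for readability. It is an illustration only: the function `lstm_step`, the `params` layout, and the weight values are assumptions, and a real layer uses learned weight matrices over vectors.

```python
import math

def sigmoid(z):
    """Squash a value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step for scalar input and scalar hidden state.

    params maps each of "f", "i", "c", "o" to a (w_x, w_h, b) tuple.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x_t + w_h * h_prev + b

    f_t = sigmoid(pre_activation("f"))      # forget gate: how much old memory to keep
    i_t = sigmoid(pre_activation("i"))      # input gate: how much new information to write
    c_hat = math.tanh(pre_activation("c"))  # candidate values for the cell state
    c_t = f_t * c_prev + i_t * c_hat        # updated cell state (long-term memory)
    o_t = sigmoid(pre_activation("o"))      # output gate: how much memory to expose
    h_t = o_t * math.tanh(c_t)              # new hidden state (the layer's output)
    return h_t, c_t

# Hypothetical weights chosen only for illustration; real values are learned.
params = {"f": (0.5, 0.1, 0.0), "i": (0.6, 0.2, 0.0),
          "c": (0.9, 0.3, 0.0), "o": (0.4, 0.1, 0.0)}

h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.3]:
    h, c = lstm_step(x, h, c, params)
```

The hidden state `h` after the last step is what a next-word model would pass on to the dense layer.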

Dense + Softmax

The final dense layer outputs probabilities for every word in the vocabulary.

Example:

Input: hello world

Predicted probabilities:

to → 0.41
is → 0.22
and → 0.17
the → 0.12

The word with the highest probability is selected as the next word.
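The selection step can be sketched as a softmax followed by a ranking. This is a minimal illustration, not the project's code: the helper names `softmax` and `top_k`, the four-word vocabulary, and the raw scores are all hypothetical, so the resulting probabilities will not match the example figures exactly.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, index_word, k=1):
    """Return the k most probable (word, probability) pairs, best first."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(index_word[i], probs[i]) for i in ranked[:k]]

# Hypothetical four-word vocabulary and raw scores from the dense layer.
index_word = ["to", "is", "and", "the"]
logits = [2.0, 1.4, 1.1, 0.8]

probs = softmax(logits)
next_word, _ = top_k(probs, index_word, k=1)[0]
```

Calling `top_k` with a larger `k` returns several ranked candidates instead of only the single most probable word.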

Tech Stack

Python
TensorFlow / Keras
LSTM Neural Networks
Streamlit
NumPy
Pickle

Project Structure

next_word_predictor/
│
├── app.py
├── model.h5
├── tokenizer.pickle
├── requirements.txt
│
├── screenshots/
│   └── image.png
│
└── README.md

Installation

1. Clone the repository

git clone https://github.com/Alok-kumar-priyadarshi/next-word-predictor.git
cd next-word-predictor

2. Create a virtual environment

python -m venv venv

Activate it:

Windows

venv\Scripts\activate

Mac / Linux

source venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

4. Run the application

streamlit run app.py

Example

Input:

hello world

Output:

Next word: to

Learning Outcomes

This project demonstrates:

Sequence modeling using LSTM
NLP preprocessing
Tokenization and padding
Language modeling
Deploying ML models using Streamlit

Future Improvements

Possible improvements include:

Using Bidirectional LSTM
Training on a larger dataset
Showing Top-5 predictions
Deploying with Docker
Replacing LSTM with Transformer models

Ethical Considerations

When developing NLP systems:

Ensure training data does not contain harmful bias
Avoid generating misleading or harmful content
Make clear that predictions are statistical probabilities, not factual statements

Author

Alok Kumar Priyadarshi

Computer Science Student interested in Artificial Intelligence, Machine Learning, and Generative AI
