This project implements a Next Word Prediction model using LSTM (Long Short-Term Memory) networks trained on a sample text corpus. Users enter a sequence of words, and the trained model predicts the most likely next word. The application is deployed with Streamlit.
- Trained on Shakespeare's Hamlet (sample dataset: `hamlet.txt`)
- Utilizes LSTM architecture for sequential text prediction
- Includes tokenizer for text preprocessing
- Deployed using Streamlit with real-time predictions
- Robust prediction pipeline using `tokenizer.pickle` and `next_word_lstm.h5`
```
.
├── app.py               # Streamlit app for predicting next word
├── experiments.ipynb    # Training, tuning, and evaluation notebook
├── hamlet.txt           # Training corpus (Shakespeare's Hamlet)
├── next_word_lstm.h5    # Trained LSTM model
├── tokenizer.pickle     # Tokenizer used for text preprocessing
└── README.md            # Project documentation
```
```bash
git clone https://github.com/yourusername/Predicting_Next_Word.git
cd Predicting_Next_Word
pip install -r requirements.txt
```

(If `requirements.txt` is not available, install the dependencies manually: `tensorflow`, `streamlit`, `numpy`, `pandas`.)
```bash
streamlit run app.py
```

- User enters a sequence of words.
- The input is tokenized and padded.
- The LSTM model (`next_word_lstm.h5`) predicts the token of the next word.
- The token is converted back to a word using `tokenizer.pickle`.
- The predicted word is shown on the interface.
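The steps above can be sketched without the real artifacts. The following is a minimal, framework-free illustration using a toy vocabulary and a stub in place of the actual `model.predict()` call; all names, sizes, and probabilities here are illustrative, not the project's real values:

```python
import numpy as np

# Toy vocabulary standing in for the fitted Keras Tokenizer (index 0 is reserved for padding)
word_index = {"to": 1, "be": 2, "or": 3, "not": 4}
index_word = {i: w for w, i in word_index.items()}
MAX_LEN = 5  # the model's expected input length (illustrative)

def texts_to_sequence(text):
    """Map known words to their integer indices; unknown words are dropped."""
    return [word_index[w] for w in text.lower().split() if w in word_index]

def pad_sequence(seq, maxlen):
    """Left-pad with zeros and truncate from the front, like Keras pad_sequences(padding='pre')."""
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq

def fake_model_predict(padded):
    """Stub for model.predict(): returns a probability distribution over the vocabulary."""
    probs = np.zeros(len(word_index) + 1)
    probs[word_index["be"]] = 0.9  # pretend "be" is most likely
    probs[word_index["not"]] = 0.1
    return probs

seq = texts_to_sequence("To be or not to")
padded = pad_sequence(seq, MAX_LEN)
next_id = int(np.argmax(fake_model_predict(padded)))
predicted_word = index_word[next_id]
```

The real app follows the same shape, with the stub replaced by the trained LSTM's softmax output.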
Training and experiments are documented in `experiments.ipynb`. The training flow involves:

- Loading the text corpus (`hamlet.txt`)
- Tokenizing sequences
- Building padded sequences
- Training an LSTM model with early stopping
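As a sketch of the sequence-building steps, here is a pure-Python version run on a tiny stand-in corpus rather than the actual `hamlet.txt` (the n-gram scheme is the standard one for next-word training: every prefix of the text predicts its following word):

```python
corpus = "to be or not to be that is the question"  # stand-in for hamlet.txt
words = corpus.lower().split()

# Assign each word an integer index, starting at 1 (0 is reserved for padding)
word_index = {}
for w in words:
    word_index.setdefault(w, len(word_index) + 1)

# n-gram input sequences: each prefix of the token stream is one training example
tokens = [word_index[w] for w in words]
sequences = [tokens[: i + 1] for i in range(1, len(tokens))]

# Pre-pad with zeros so every sequence has the same length
max_len = max(len(s) for s in sequences)
padded = [[0] * (max_len - len(s)) + s for s in sequences]

# X = all tokens but the last, y = the last token (the word to predict)
X = [row[:-1] for row in padded]
y = [row[-1] for row in padded]
```

The resulting `X`/`y` pairs are what the LSTM is fit on.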
- File: `hamlet.txt`
- Source: public-domain literary text
- Preprocessed and tokenized for LSTM input
- Python 3.8+
- TensorFlow 2.x
- Streamlit
- NumPy, Pandas
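For reference, an LSTM of the kind described above could be defined along these lines with TensorFlow/Keras. The vocabulary size, sequence length, and layer widths below are illustrative assumptions, not the project's actual hyperparameters:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

VOCAB_SIZE = 5000  # illustrative; in practice taken from the fitted tokenizer
MAX_LEN = 14       # illustrative; in practice the length of the padded sequences

model = Sequential([
    Embedding(VOCAB_SIZE, 100),               # word indices -> dense vectors
    LSTM(150),                                # sequential context
    Dense(VOCAB_SIZE, activation="softmax"),  # distribution over the vocabulary
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

# A dummy forward pass confirms the output is one probability per vocabulary word
preds = model(np.zeros((1, MAX_LEN - 1), dtype="int32"))
```

Training would then call `model.fit(X, y, ...)` with an `EarlyStopping` callback, as noted in the training flow above.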
Note: The app disables GPU/Metal acceleration by design for compatibility.
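One common way to achieve this (an assumption here, not necessarily how `app.py` does it) is to hide accelerator devices via an environment variable before TensorFlow is imported:

```python
import os

# Must be set before `import tensorflow` for TensorFlow to see no GPUs
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
```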
This project is fully compatible with:
- Streamlit Cloud
- Heroku
- Local deployment
Just make sure to include all required files and environment specs (`requirements.txt`, optionally `runtime.txt`).