This project uses LSTM (Long Short-Term Memory) networks to predict sales from time-series data, and PySpark Streaming to process incoming sales data in real time.
The data comes from the Store Item dataset on Kaggle.
- Deep Learning: TensorFlow/Keras
- Big Data & Streaming: Apache Spark (PySpark)
- Data Processing & Analysis: Pandas, NumPy, Scikit-learn
- Visualization: Matplotlib, Plotly
- Data Preprocessing:
  - Load and clean the dataset using PySpark.
  - Split data into training and testing sets.
- Model Development:
  - Build an LSTM model using Keras.
  - Train the model on historical sales data.
- Evaluation & Visualization:
  - Evaluate model performance using MSE (Mean Squared Error).
  - Visualize predictions with Matplotlib & Plotly.
- Real-Time Streaming & Prediction:
  - Implement PySpark Streaming to process incoming sales data.
  - Dynamically update predictions based on new inputs.
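
The preprocessing and evaluation steps listed above can be illustrated with a minimal NumPy sketch. It is not the project's actual code: the helper names (`make_windows`, `chronological_split`, `mse`), the window length of 5, the 80/20 split, and the synthetic `sales` series are all assumptions chosen for illustration. The sketch shapes a sales series into the `(samples, timesteps, features)` layout that a Keras LSTM layer expects, splits it in time order, and computes MSE. A naive "repeat the last value" forecast stands in for the trained LSTM so the example runs without TensorFlow.

```python
import numpy as np


def make_windows(series, window):
    """Turn a 1-D series into (samples, window, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=np.float32)
    # Each sample holds `window` consecutive values; its target is the next value.
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., np.newaxis], y


def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling so the test set lies strictly after the training set."""
    cut = int(len(X) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]


def mse(y_true, y_pred):
    """Mean Squared Error between targets and predictions."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(diff ** 2))


# Hypothetical stand-in for one store/item daily sales series.
sales = np.arange(20, dtype=np.float32)

X, y = make_windows(sales, window=5)
X_train, X_test, y_train, y_test = chronological_split(X, y)

# Naive baseline (assumption, not the LSTM): predict the last value of each window.
naive_pred = X_test[:, -1, 0]
baseline_mse = mse(y_test, naive_pred)
print(X.shape, len(X_train), len(X_test), baseline_mse)
```

Splitting chronologically rather than at random keeps future values out of the training set, which matters for time-series forecasting. In the real pipeline, the model's predictions on `X_test` would replace `naive_pred`, and the baseline MSE gives a reference point the LSTM should beat.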