This project implements an improved Long Short-Term Memory (LSTM) model to predict the next day's closing price of the S&P 500 index. The notebook LSTM_Stock_Prediction2.ipynb fetches historical S&P 500 data, preprocesses it, trains an LSTM model, evaluates its performance, and makes a prediction for the next trading day. Key improvements include reduced model complexity, adjusted sequence length, custom learning rate, increased regularization, and early stopping to address high validation loss.
## Features

- Data Source: Historical S&P 500 data fetched using the `yfinance` library.
- Preprocessing: Data scaling with `MinMaxScaler` and sequence creation for LSTM input (sequence length: 30 days).
- Model: Simplified LSTM architecture with two layers (32 units each), increased dropout (0.3), and a custom learning rate (0.0005) for the Adam optimizer.
- Training: Includes early stopping to prevent overfitting, with a validation split of 0.1.
- Evaluation: Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) metrics for model performance.
- Prediction: Predicts the next trading day's closing price and visualizes actual vs. predicted prices.
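The sequence-creation step can be sketched roughly as follows; the function name `make_sequences` and the synthetic price series are illustrative, not taken from the notebook.

```python
import numpy as np

def make_sequences(series, seq_len=30):
    """Build (samples, seq_len, 1) windows and next-day targets
    from a 1-D array of already-scaled closing prices."""
    X, y = [], []
    for i in range(len(series) - seq_len):
        X.append(series[i:i + seq_len])   # 30 consecutive days as input
        y.append(series[i + seq_len])     # the following day as target
    X = np.array(X).reshape(-1, seq_len, 1)
    y = np.array(y)
    return X, y

# Example: 100 days of scaled prices -> 70 training windows
prices = np.linspace(0.0, 1.0, 100)
X, y = make_sequences(prices, seq_len=30)
print(X.shape, y.shape)  # (70, 30, 1) (70,)
```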
## Prerequisites

- Python 3.10 or higher
- Jupyter Notebook or Google Colab environment
- GPU support (optional, but recommended for faster training)
## Installation

Install the necessary Python libraries using pip:

    pip install yfinance pandas numpy matplotlib scikit-learn tensorflow
## Usage

Clone or Download the Notebook:
- Download `LSTM_Stock_Prediction2.ipynb` to your local machine or open it in Google Colab.
Install Dependencies:
- Ensure all required libraries are installed (see Prerequisites).
Run the Notebook:
- Open the notebook in Jupyter Notebook or Google Colab.
- Execute each cell sequentially to:
  - Fetch S&P 500 historical data.
  - Preprocess the data.
  - Train the LSTM model.
  - Evaluate the model with MAE and RMSE.
  - Visualize actual vs. predicted prices.
  - Predict the next trading day's closing price.
Check Outputs:
- The notebook will display:
  - A training and validation loss plot.
  - An actual vs. predicted prices plot.
  - MAE and RMSE values.
  - The predicted closing price for the next trading day (e.g., 5699.39 for the day after June 6, 2025).
## Dataset

- Source: Yahoo Finance (via the `yfinance` library).
- Time Range: From December 30, 1927, to June 6, 2025.
- Features Used: Closing price (`Close`) for prediction, with `Open`, `High`, `Low`, and `Volume` available in the dataset.
- Note: The notebook uses the closing price for prediction, scaled to the range [0, 1].
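A minimal sketch of the [0, 1] scaling step, using an illustrative four-day price array rather than the real `Close` series:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative closing prices; the notebook fits the scaler on the real
# `Close` column fetched via yfinance.
close = np.array([[4000.0], [4100.0], [3950.0], [4200.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(close)

# inverse_transform maps scaled model outputs back to index points later on.
restored = scaler.inverse_transform(scaled)
print(scaled.ravel())  # all values in [0, 1]; min -> 0.0, max -> 1.0
```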
## Model Details

- Architecture:
  - Two LSTM layers with 32 units each.
  - Dropout layers (0.3) after each LSTM layer for regularization.
  - Two dense layers (16 units and 1 unit) for the output.
- Training:
  - Sequence length: 30 days.
  - Batch size: 64.
  - Epochs: 50 (with early stopping, patience=10).
  - Optimizer: Adam with a learning rate of 0.0005.
  - Loss function: Mean Squared Error (MSE).
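Under the hyperparameters listed above, the model definition can be sketched like this (the `relu` activation on the 16-unit dense layer and `restore_best_weights=True` are assumptions, not confirmed by the notebook):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dropout, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

model = Sequential([
    Input(shape=(30, 1)),             # 30-day windows of scaled closes
    LSTM(32, return_sequences=True),  # first LSTM layer, 32 units
    Dropout(0.3),
    LSTM(32),                         # second LSTM layer, 32 units
    Dropout(0.3),
    Dense(16, activation="relu"),     # activation is an assumption
    Dense(1),                         # next-day scaled close
])
model.compile(optimizer=Adam(learning_rate=0.0005), loss="mse")

early_stop = EarlyStopping(monitor="val_loss", patience=10,
                           restore_best_weights=True)
# model.fit(X_train, y_train, epochs=50, batch_size=64,
#           validation_split=0.1, callbacks=[early_stop])
```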
## Results

- Performance:
  - MAE: 53.37 (on the test set).
  - RMSE: 86.70 (on the test set).
- Evaluation:
  - Both metrics are measured in S&P 500 points, so predictions are off by roughly 53 points on average.
  - Predictions for recent dates (e.g., June 2–6, 2025) underestimate the actual closing prices by about 250–300 points.
- Future Prediction:
  - Predicted closing price for the next trading day (after June 6, 2025): 5699.39.
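The two metrics are straightforward to compute; here is a small self-contained example with hypothetical actual/predicted values (chosen to mirror the ~270-point underestimate, not the notebook's real outputs):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual vs. predicted closes, in S&P 500 points.
y_true = np.array([5950.0, 5970.0, 6000.0])
y_pred = np.array([5700.0, 5690.0, 5720.0])

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}")  # MAE: 270.00, RMSE: 270.37
```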
## Limitations

- Rate Limiting: Yahoo Finance may impose rate limits on API requests, which can cause `YFRateLimitError`. Consider caching data locally or using an alternative data source like Alpha Vantage.
- Model Performance: The model's predictions lag behind actual prices, suggesting potential improvements in feature engineering (e.g., adding technical indicators) or model architecture (e.g., using attention mechanisms).
- Data Noise: Stock price data is noisy, and the model may struggle with sudden market shifts.
## Future Improvements

- Alternative Data Sources: Use Alpha Vantage or Financial Modeling Prep to avoid Yahoo Finance rate limits.
- Feature Engineering: Incorporate additional features like `Open`, `High`, `Low`, `Volume`, or technical indicators (e.g., RSI, MACD).
- Model Enhancements: Add attention mechanisms, use a GRU instead of an LSTM, or experiment with different sequence lengths.
- Hyperparameter Tuning: Perform grid search to optimize learning rate, batch size, and model architecture.
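As an example of the technical-indicator idea, here is one common way to compute RSI with pandas (this simple-rolling-mean variant is an assumption; the classic definition uses Wilder's smoothing):

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index from a closing-price series."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(period).mean()   # average up-moves
    loss = (-delta.clip(upper=0)).rolling(period).mean()  # average down-moves
    rs = gain / loss
    return 100 - 100 / (1 + rs)

# A steadily rising series has no losses, so RSI saturates at 100.
close = pd.Series(range(1, 31), dtype=float)
print(rsi(close).iloc[-1])  # 100.0
```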
## Troubleshooting

- Rate Limit Errors:
  - Add caching to save data locally after the first fetch.
  - Switch to Alpha Vantage (requires an API key).
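A minimal caching sketch along these lines (the cache filename `sp500_cache.csv` and the function name are illustrative):

```python
import os
import pandas as pd

CACHE = "sp500_cache.csv"  # illustrative local cache path

def load_sp500() -> pd.DataFrame:
    """Fetch S&P 500 history once, then reuse the local CSV copy."""
    if os.path.exists(CACHE):
        return pd.read_csv(CACHE, index_col=0, parse_dates=True)
    import yfinance as yf  # only needed on the first, uncached fetch
    data = yf.download("^GSPC", period="max")
    data.to_csv(CACHE)
    return data
```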
- High Loss:
  - Increase regularization (e.g., dropout or L2).
  - Reduce model complexity (e.g., fewer LSTM units).
  - Add data smoothing (e.g., moving averages).
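The smoothing suggestion can be as simple as a short moving average applied to the closing series before scaling (the 5-day window and sample prices here are arbitrary illustrative choices):

```python
import pandas as pd

close = pd.Series([100.0, 102.0, 98.0, 103.0, 105.0, 101.0, 104.0])

# 5-day simple moving average; the first window-1 values are NaN and dropped.
smoothed = close.rolling(window=5).mean().dropna()
print(smoothed.tolist())  # [101.6, 101.8, 102.2]
```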
- TensorFlow GPU Issues:
  - Ensure your GPU drivers and CUDA toolkit are correctly installed.
  - Fall back to the CPU if GPU setup fails (handled automatically in the notebook).
## License

This project is licensed under the MIT License. Feel free to use, modify, and distribute the code as needed.
## Contact

For questions or suggestions, please open an issue in the repository.