This project leverages Natural Language Processing (NLP) and Machine Learning to analyze the language of S&P 500 earnings call transcripts and predict future stock volatility. It features an end-to-end pipeline that automates data collection, feature engineering, model training, and performance evaluation.
The key finding of this project is the successful creation of a Volatility Prediction Model that demonstrated a strong, verifiable ability to predict future risk based on the linguistic patterns in corporate earnings calls.
- Automated Data Pipeline: Downloads and processes years of earnings call transcripts from the `kurry/sp500_earnings_transcripts` dataset on Hugging Face.
- Advanced Feature Engineering: Calculates dozens of linguistic features, including sentiment, complexity, and risk keyword density, and normalizes them using Z-scores to measure deviation from a company's historical average.
- Dual Predictive Models: Trains two separate ensemble machine learning models:
- Volatility Prediction Model (Successful): Predicts whether the next quarter will be a high or low volatility period with high accuracy.
- Return Direction Model (Experimental): Predicts whether the next quarter's stock return will be positive or negative.
- Rigorous Backtesting: Utilizes a strict temporal hold-out set (2024 data) to evaluate model performance on completely unseen data.
- Interactive Dashboard: A multi-page Streamlit application to visualize the linguistic features and the final model performance metrics.
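The risk keyword density feature mentioned above can be sketched as a simple token-level ratio. The keyword list and function name below are illustrative placeholders, not the project's actual lexicon or implementation:

```python
import re

# Illustrative risk lexicon -- the project's real keyword list is not shown here.
RISK_KEYWORDS = {"uncertainty", "risk", "litigation", "headwind", "volatile", "impairment"}

def risk_keyword_density(transcript: str) -> float:
    """Fraction of transcript tokens that are risk-related keywords."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in RISK_KEYWORDS)
    return hits / len(tokens)
```

In practice the project computes dozens of such features per transcript; this shows only the general shape of one of them.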
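The Z-score normalization step, measuring how far each call deviates from a company's own historical average, can be sketched with a pandas group-by. Column names (`ticker`) are assumptions, and for simplicity this uses each company's full history rather than an expanding window:

```python
import pandas as pd

def zscore_vs_history(df: pd.DataFrame, feature_cols: list[str]) -> pd.DataFrame:
    """Normalize each feature against the same company's historical mean and std.

    Assumes one row per earnings call with a 'ticker' column identifying the company.
    """
    out = df.copy()
    grouped = out.groupby("ticker")[feature_cols]
    # Subtract the per-company mean and divide by the per-company std (sample std).
    out[feature_cols] = (out[feature_cols] - grouped.transform("mean")) / grouped.transform("std")
    return out
```

A value of +2 for, say, risk keyword density then means the call used unusually risk-heavy language relative to that company's own baseline, which is the signal the models consume.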
```
linguistic-alpha/
├── analysis/
│   ├── data_loader.py                       # Downloads and processes transcript data
│   ├── transcript_feature_engineering.py    # Calculates linguistic features
│   ├── model_training.py                    # Trains and saves the ML models
│   └── backtest.py                          # Evaluates models on hold-out data
├── dashboard/
│   ├── app.py                               # Main Streamlit app
│   └── pages/                               # Dashboard pages for analysis and model performance
├── output/                                  # Stores generated data and trained models
├── run_pipeline.py                          # Main script to run the entire pipeline
├── requirements.txt
└── README.md
```
1. Clone the Repository:

   ```bash
   git clone <your-repo-url>
   cd linguistic-alpha
   ```

2. Create and Activate a Virtual Environment:

   ```bash
   python -m venv venv
   source venv/bin/activate
   ```

3. Install Dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Run the Full Pipeline: This command will download the data, perform feature engineering, train the models, and run the backtest. It will take some time, as it processes over a decade of data.

   ```bash
   python run_pipeline.py
   ```

5. Launch the Dashboard: After the pipeline completes, you can view the results.

   ```bash
   streamlit run dashboard/app.py
   ```
The primary success of this project is the Volatility Prediction Model, which achieved an accuracy of 81.25% and an AUC of 0.85 on the 2024 hold-out set. This demonstrates a strong predictive signal in linguistic data for forecasting future market risk. The Return Direction Model did not show significant predictive power, confirming the difficulty of predicting stock direction.
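Metrics like these can be computed from a model's hold-out predictions with scikit-learn; the function and variable names below are illustrative, not the project's actual backtest code:

```python
from sklearn.metrics import accuracy_score, roc_auc_score

def evaluate_holdout(y_true, y_prob, threshold=0.5):
    """Score predicted high/low-volatility probabilities on a temporal hold-out set.

    y_true: actual labels (1 = high volatility) for the hold-out quarters.
    y_prob: model-predicted probabilities of the positive class.
    """
    y_pred = [int(p >= threshold) for p in y_prob]
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_prob),  # threshold-free ranking quality
    }
```

Because the hold-out set is strictly later in time (2024) than all training data, these numbers estimate genuine out-of-sample performance rather than in-sample fit.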