Welcome to the MLH Stock Sentiment Analyzer, a Python-based project developed by Aditya for a Major League Hacking (MLH) submission. This tool scrapes news headlines for a curated list of stock tickers from the Tech, Real Estate, and Energy sectors, performs sentiment analysis using the FinBERT model, and outputs the results to a CSV file. The project aims to provide insights into market sentiment based on recent news articles, making it valuable for financial analysis and research.
- Web Scraping: Collects news articles for 60 stock tickers from Business Insider's markets section.
- Sentiment Analysis: Uses the FinBERT model to classify headlines as positive, negative, or neutral with confidence scores.
- Comprehensive Data: Extracts ticker, datetime, title, source, and link for each article, alongside sentiment results.
- Output: Saves results to articles_with_sentiment.csv for easy analysis.
- GPU Support: Leverages GPU acceleration for faster sentiment analysis if available.
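As a rough illustration of the sentiment step, the sketch below assumes the Hugging Face transformers text-classification pipeline with the ProsusAI/finbert checkpoint. The helper names load_finbert and pick_top_sentiment are hypothetical and are not taken from the project's script. It shows one way a label and confidence score could be derived per headline, and how a GPU could be selected when one is present.

```python
from typing import List, Tuple


def pick_top_sentiment(scores: List[dict]) -> Tuple[str, float]:
    """Return the (label, score) pair with the highest score.

    `scores` looks like [{"label": "positive", "score": 0.87}, ...],
    the shape a text-classification pipeline yields when asked for all labels.
    """
    best = max(scores, key=lambda item: item["score"])
    return best["label"], float(best["score"])


def load_finbert():
    """Build a FinBERT classifier (assumption: transformers and torch are installed).

    Imports are local so pick_top_sentiment stays usable without them.
    """
    import torch
    from transformers import pipeline

    device = 0 if torch.cuda.is_available() else -1  # 0 = first GPU, -1 = CPU
    return pipeline(
        "text-classification",
        model="ProsusAI/finbert",
        top_k=None,  # return scores for every label, not only the best one
        device=device,
    )


# Hand-written scores standing in for real model output (illustrative values).
example = [
    {"label": "positive", "score": 0.87},
    {"label": "neutral", "score": 0.10},
    {"label": "negative", "score": 0.03},
]
label, score = pick_top_sentiment(example)
print(label, score)
```

With a real model, calling load_finbert() once and passing a list of headlines to the returned classifier should give one list of label scores per headline, each of which can be handed to pick_top_sentiment.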
- Python 3.8 or higher
- Internet connection (for web scraping and downloading the FinBERT model)
- Optional: GPU for faster sentiment analysis with PyTorch
Follow these steps to set up the project locally:
- Clone the Repository:
  git clone https://github.com/Aditya04-code/mlh-stock-sentiment-analyzer.git
  cd mlh-stock-sentiment-analyzer
- Create a Virtual Environment (recommended):
  python -m venv venv
  source venv/bin/activate # On Windows: venv\Scripts\activate
- Install Dependencies: Install the required Python packages listed in requirements.txt:
  pip install -r requirements.txt
  The requirements.txt contains:
  requests
  beautifulsoup4
  pandas
  transformers
  torch
- Run the Script: Execute the main script to scrape news and perform sentiment analysis:
  python news_scraper_with_sentiment.py
- What It Does:
  - Scrapes up to 60 pages of news articles per ticker from Business Insider.
  - Analyzes the sentiment of each headline using FinBERT.
  - Saves the results to articles_with_sentiment.csv in the project directory.
- Output Format: The articles_with_sentiment.csv file includes:
  - ticker: Stock ticker symbol (e.g., AAPL, XOM)
  - datetime: Article publication date and time
  - title: News headline
  - source: News source (e.g., Reuters, Bloomberg)
  - link: URL to the full article
  - top_sentiment: Sentiment label (positive, negative, neutral)
  - sentiment_score: Confidence score (0 to 1)
- Example Output:
  ticker,datetime,title,source,link,top_sentiment,sentiment_score
  AAPL,2025-05-10T12:00:00,Apple Announces New iPhone,Reuters,https://example.com,positive,0.87
  XOM,2025-05-10T13:00:00,Oil Prices Fall Amid Oversupply,Bloomberg,https://example.com,negative,0.76
- Testing Tip: To reduce runtime during testing, edit news_scraper_with_sentiment.py and set max_pages=5 in the scrape_news function.
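Once articles_with_sentiment.csv exists, its columns can be summarized with the standard library alone. The following is a minimal sketch, not part of the project: the summarize helper is a hypothetical name, and the sample rows are the two from the example output, held in memory so the snippet runs without the real file.

```python
import csv
import io
from collections import Counter, defaultdict

# The two example rows from this README, used in place of the real file.
SAMPLE_CSV = """ticker,datetime,title,source,link,top_sentiment,sentiment_score
AAPL,2025-05-10T12:00:00,Apple Announces New iPhone,Reuters,https://example.com,positive,0.87
XOM,2025-05-10T13:00:00,Oil Prices Fall Amid Oversupply,Bloomberg,https://example.com,negative,0.76
"""


def summarize(csv_file):
    """Count sentiment labels per ticker from an open CSV file object."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(csv_file):
        counts[row["ticker"]][row["top_sentiment"]] += 1
    # Convert to plain dicts so the result prints and compares cleanly.
    return {ticker: dict(labels) for ticker, labels in counts.items()}


summary = summarize(io.StringIO(SAMPLE_CSV))
print(summary)  # {'AAPL': {'positive': 1}, 'XOM': {'negative': 1}}
```

To run it against real results, open the generated file with open("articles_with_sentiment.csv", newline="", encoding="utf-8") and pass the file object to summarize.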
mlh-stock-sentiment-analyzer/
├── news_scraper_with_sentiment.py # Main script for scraping and sentiment analysis
├── requirements.txt # Python dependencies
├── articles_with_sentiment.csv # Output CSV (generated after running)
├── README.md # This documentation
├── LICENSE # MIT License file
└── .gitignore # Ignores virtual env, CSV, and pycache
When you run the script, you’ll see progress updates like:
Scraping news for ticker: AAPL
Page 1 for ticker AAPL processed. Articles scraped: 10
...
No more articles found for ticker AAPL on page 20.
...
Processed row 1: Apple Announces New iPhone
...
Sentiment analysis completed for 600 rows.
Data with sentiment saved to: articles_with_sentiment.csv
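The log above suggests a page-by-page loop that stops once a page yields no articles. A minimal sketch of such a loop follows, under stated assumptions: scrape_ticker, fetch_page, and delay_seconds are hypothetical names, fetch_page stands in for the real request-and-parse step, and the printed count is per page, which may differ from how the actual script counts.

```python
import time
from typing import Callable, Dict, List


def scrape_ticker(
    ticker: str,
    fetch_page: Callable[[str, int], List[Dict[str, str]]],
    max_pages: int = 60,
    delay_seconds: float = 1.0,
) -> List[Dict[str, str]]:
    """Collect articles page by page until a page comes back empty."""
    articles: List[Dict[str, str]] = []
    for page in range(1, max_pages + 1):
        page_articles = fetch_page(ticker, page)
        if not page_articles:
            print(f"No more articles found for ticker {ticker} on page {page}.")
            break
        articles.extend(page_articles)
        print(f"Page {page} for ticker {ticker} processed. "
              f"Articles scraped: {len(page_articles)}")
        time.sleep(delay_seconds)  # pause between requests to ease rate limits
    return articles


def fake_fetch(ticker: str, page: int) -> List[Dict[str, str]]:
    """Stand-in data source: two pages of one article each, then nothing."""
    if page > 2:
        return []
    return [{"ticker": ticker, "title": f"Headline {page}"}]


# delay_seconds=0 keeps the demonstration instant.
result = scrape_ticker("AAPL", fake_fetch, max_pages=5, delay_seconds=0)
```

Passing the fetch step in as a callable keeps the loop testable without network access; a real fetch function would request and parse one page of results for the ticker.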
- Runtime: Scraping 60 pages per ticker and analyzing sentiments can take significant time. A GPU can speed up the FinBERT model.
- Web Scraping Limits: Excessive scraping may trigger rate limits on Business Insider. If issues arise, add a delay (e.g., time.sleep(1)) in the scrape_news function.
- Model Download: The FinBERT model is downloaded automatically on the first run, requiring an internet connection.
- Output File: The articles_with_sentiment.csv file is overwritten each time the script runs. Back up previous results if needed.
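Because the output file is overwritten on every run, one option is to copy it to a timestamped name before starting the script. The sketch below is an assumption about how that could be done, not something the project ships; backup_if_exists is a hypothetical helper, and the demonstration runs in a temporary directory so no real results are touched.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup_if_exists(path: Path) -> Optional[Path]:
    """Copy `path` to a timestamped sibling file.

    Returns the backup's path, or None when there is nothing to back up.
    """
    if not path.exists():
        return None
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    backup = path.with_name(f"{path.stem}_{stamp}{path.suffix}")
    shutil.copy2(path, backup)  # copy2 also preserves file timestamps
    return backup


with tempfile.TemporaryDirectory() as tmp:
    output = Path(tmp) / "articles_with_sentiment.csv"
    missing_result = backup_if_exists(output)  # file not created yet
    output.write_text("ticker,title\nAAPL,Example\n", encoding="utf-8")
    backup = backup_if_exists(output)
    backup_name = backup.name
    backup_matches = (
        backup.read_text(encoding="utf-8") == output.read_text(encoding="utf-8")
    )
```

Calling backup_if_exists(Path("articles_with_sentiment.csv")) from the project directory before each run would keep earlier results alongside the new ones.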
We welcome contributions to improve this project! To contribute:
- Fork the repository.
- Create a feature branch (git checkout -b feature/your-feature).
- Commit your changes (git commit -m "Add your feature").
- Push to the branch (git push origin feature/your-feature).
- Open a pull request.
Please adhere to PEP 8 style guidelines and update documentation as needed.
This project is licensed under the MIT License. See the LICENSE file for details.
- Created for a Major League Hacking (MLH) hackathon submission.
- Powered by the FinBERT model from ProsusAI for sentiment analysis.
- Data sourced from Business Insider’s markets news section.
For questions, bug reports, or suggestions, please open an issue on this GitHub repository or contact Aditya via GitHub.
Built with ❤️ for MLH