📚 View Full Documentation | 🚀 Quick Start | 🐳 Docker Guide | 📋 Project Structure
A web scraper built with Scrapy that extracts live stock prices from the Nairobi Stock Exchange (NSE). The scraped prices are stored in a MongoDB Atlas database using PyMongo, and Atlas Charts is used to visualize the data.
The accompanying article can be found here.
The platform being scraped is the AFX website.
- Python 3.11 or higher (tested with 3.11, 3.12, 3.13) and pip
- An Africa's Talking account
  - An API key and username from your account: create an app and note the API key
- A MongoDB Atlas account (free tier available)
  - Create a cluster and note the connection string
Clone the repository:

```shell
git clone https://github.com/KenMwaura1/nse-stock-scraper
cd nse-stock-scraper
```

Create and activate a virtual environment:

```shell
python -m venv env
source env/bin/activate
```

Alternatively, if you're using pyenv:

```shell
pyenv virtualenv nse_scraper
pyenv activate nse_scraper
```

Install the required dependencies:

```shell
pip install -r requirements.txt
```

Create an environment file for your credentials:

```shell
cd nse_scraper
touch .env
```

Add your credentials (API keys, MongoDB connection string, etc.) to the `.env` file. You can reference the example file for the required variables.

Alternatively, copy the example environment file:

```shell
cp .env.example .env
```

Then edit `.env` with your credentials.
- `MONGODB_URI` - MongoDB Atlas connection string (with credentials)
- `MONGODB_DATABASE` - MongoDB database name (default: `nse_data`)
- `at_username` - Africa's Talking account username
- `at_api_key` - Africa's Talking API key
- `mobile_number` - Phone number for notifications (format: `+254XXXXXXXXX` for Kenya)
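The variables above can be read once at startup. A minimal sketch using only the standard library (the project may well use `python-dotenv` instead; the helper name `load_config` is hypothetical):

```python
import os

def load_config():
    """Read scraper settings from environment variables.

    MONGODB_DATABASE falls back to the documented default
    ("nse_data") when unset; the other variables are required
    and raise KeyError if missing.
    """
    return {
        "mongodb_uri": os.environ["MONGODB_URI"],
        "mongodb_database": os.getenv("MONGODB_DATABASE", "nse_data"),
        "at_username": os.environ["at_username"],
        "at_api_key": os.environ["at_api_key"],
        "mobile_number": os.environ["mobile_number"],
    }
```

Failing fast on missing credentials here is friendlier than a cryptic connection error deep inside the pipeline.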
From the project root directory, run the scraper:

```shell
scrapy crawl afx_scraper
```

To output results to a JSON file for preview:

```shell
scrapy crawl afx_scraper -o test.json
```

The scraper will:
- Fetch stock data from the AFX website
- Parse stock prices, symbols, and changes
- Store data in MongoDB Atlas with a timestamp
- Prevent duplicate entries using unique indexes
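Duplicate prevention with a unique index comes down to upserting on a stable key. A sketch of how the pipeline might build the filter and update documents (the field names `ticker`, `price`, `change`, and `scraped_at` are illustrative assumptions, not the project's actual schema):

```python
def make_upsert(item, day):
    """Build a MongoDB filter/update pair keyed on (ticker, day).

    With a unique index on ("ticker", "day"), running the scraper
    twice on the same day updates the existing document instead of
    inserting a duplicate.
    """
    filt = {"ticker": item["ticker"], "day": day}
    update = {"$set": {
        "price": item["price"],
        "change": item["change"],
        "scraped_at": item["scraped_at"],
    }}
    return filt, update
```

The pipeline would then call `collection.update_one(filt, update, upsert=True)` so each (ticker, day) pair exists at most once.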
Prerequisites: Docker and docker-compose installed
Setup:
```shell
# Create environment file
cp .env.docker .env

# Edit .env with your credentials
nano .env
```

Run the scraper:

```shell
# Start MongoDB and run scraper
docker-compose up --build

# Run in background
docker-compose up -d

# View logs
docker-compose logs -f scraper

# Stop services
docker-compose down
```

Run individual commands:
```shell
# Run scraper with debug logging
docker-compose run --rm scraper crawl afx_scraper --loglevel=DEBUG

# Run notifications
docker-compose run --rm scraper python nse_scraper/stock_notification.py

# Access MongoDB shell
docker-compose exec mongodb mongosh
```

Docker Tips:
- MongoDB data persists in Docker volumes (`mongodb_data`)
- First run downloads ~500MB of images (Python, MongoDB)
- Subsequent runs are much faster
- For production, use environment variables instead of .env file
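For orientation, the compose file behind these commands might look roughly like this. The service and volume names (`scraper`, `mongodb`, `mongodb_data`) match the commands and tips above, but the image tag and entrypoint are assumptions, not the project's actual file:

```yaml
services:
  mongodb:
    image: mongo:7.0
    volumes:
      - mongodb_data:/data/db

  scraper:
    build: .
    env_file: .env
    depends_on:
      - mongodb
    entrypoint: ["scrapy"]
    command: ["crawl", "afx_scraper"]

volumes:
  mongodb_data:
```

With `entrypoint: ["scrapy"]`, a command like `docker-compose run --rm scraper crawl afx_scraper` works as shown above.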
To automate text notifications when stock prices change, use the stock_notification.py script with a scheduler.
```shell
python nse_scraper/stock_notification.py
```

This will:
- Query the latest stock data from MongoDB
- Check if the stock price meets the configured threshold (default: ≥ 38 KES)
- Send an SMS notification via Africa's Talking if the threshold is met
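The threshold check in these steps is simple to sketch. A hedged illustration (the function name and message format are assumptions, not taken from `stock_notification.py`):

```python
def build_alert(ticker, price, threshold=38.0):
    """Return an SMS message if price meets the threshold, else None.

    Mirrors the documented default: notify when price >= 38 KES.
    """
    if price < threshold:
        return None
    return f"{ticker} is trading at {price:.2f} KES (threshold {threshold:.2f})"
```

The script would pass a non-None message on to Africa's Talking's SMS API along with the configured `mobile_number`.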
For Heroku deployments, use Advanced Scheduler:
- Install the Advanced Scheduler add-on from your Heroku dashboard
- Click "Create trigger" and configure:
  - Schedule: Daily at 11:00 AM
  - Command: `python nse_scraper/stock_notification.py`
  - Timezone: Africa/Nairobi (or your timezone)
- Test by clicking "Execute trigger" to verify notifications work
This project uses GitHub Actions to automatically check code quality, security, and functionality on every push and pull request.
Jobs:
- Lint - Checks code style with flake8, Black, and isort (Python 3.11)
- Security - Scans for vulnerabilities with Bandit and Safety (Python 3.11)
- Test - Tests across Python 3.11, 3.12, 3.13 with MongoDB 6.0, 7.0, 8.0 (9 test combinations)
- Build - Builds Docker image to ensure Dockerfile is valid
View Results:
- Go to Actions tab in GitHub to see workflow runs
- Click on a workflow run to see detailed job logs
- Pull requests show workflow status as checks
Dependabot automatically checks for outdated dependencies and creates pull requests with updates:
What it monitors:
- Python packages (weekly updates)
- GitHub Actions (weekly updates)
- Docker base images (weekly updates)
How it works:
- Dependabot creates a PR with dependency updates
- Automated tests run on the PR
- Review the changes and merge if tests pass
- Dependencies stay current and secure
Configuration: .github/dependabot.yml
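The three monitored ecosystems above map to three update blocks. A rough sketch of what `.github/dependabot.yml` might contain (the `directory` values are assumptions):

```yaml
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
  - package-ecosystem: "docker"
    directory: "/"
    schedule:
      interval: "weekly"
```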
For local development or other platforms, use tools like:
- Linux/Mac: cron jobs
- Windows: Task Scheduler
- Docker: Scheduled containers
- APScheduler: Python-based scheduling library
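Whichever scheduler you pick, the underlying job is just "run the script daily at a fixed local time." A standard-library sketch of computing the next run (11:00 Africa/Nairobi matches the Heroku example above; `zoneinfo` requires Python 3.9+, which the prerequisites already satisfy):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def next_run(now, hour=11, tz="Africa/Nairobi"):
    """Return the next daily run time at `hour`:00 in `tz`.

    `now` must be timezone-aware; the result is expressed in the
    target zone (Nairobi has a fixed UTC+3 offset, no DST).
    """
    local_now = now.astimezone(ZoneInfo(tz))
    run = local_now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if run <= local_now:
        run += timedelta(days=1)  # today's slot already passed
    return run
```

A minimal scheduler loop would sleep for `(next_run(now) - now).total_seconds()` and then invoke `stock_notification.py`.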
- Verify your `MONGODB_URI` is correct and includes credentials
- Check that your IP address is whitelisted in MongoDB Atlas (Network Access)
- Ensure the database name in `MONGODB_DATABASE` exists
- Check that the target website structure hasn't changed
- Run with `LOG_LEVEL = "DEBUG"` in settings to see detailed parsing info
- Verify the CSS/XPath selectors match current HTML
- Verify Africa's Talking credentials are correct
- Check that your account has sufficient balance
- Ensure phone number format includes country code (e.g., +254XXXXXXXXX)
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/improvement`)
- Make your changes and test
- Submit a pull request
- Add multiple stock watchlists
- Implement price change threshold notifications
- Add data visualization dashboard
- Support multiple exchanges (ASE, BOURSE, etc.)
- Add unit tests for spider and pipeline
- Implement retry logic with exponential backoff
- Add Telegram/Slack notification options


