A Python tool for retrieving and storing current NASDAQ-100 constituents from Wikipedia.
This project scrapes the list of NASDAQ-100 companies from the Wikipedia page and saves the data in CSV and JSON formats. The tool uses multiple fallback strategies to ensure reliable data extraction.
This tool was created because I needed the NASDAQ-100 index composition for another project and thought it would be valuable to share this data source with the community. Rather than keeping it private, I decided to make it publicly available so others can benefit from automated access to current NASDAQ-100 constituent data.
The goal is to provide a reliable, automated way to access this financial data that updates regularly and can be easily integrated into other projects, research, or analysis workflows.
- Multiple extraction methods: Uses `pandas.read_html()` with BeautifulSoup as a fallback
- Robust error handling: Automatic retry attempts on failures
- Data validation: Checks completeness and correctness of extracted data
- Multiple output formats: Saves data as both CSV and JSON
- Logging: Detailed logging of all operations
- Data cleaning: Automatic cleaning of whitespace and formatting
- Clone the repository:

```bash
git clone https://github.com/Gary-Strauss/NASDAQ100_Constituents
cd NASDAQ100_Constituents
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

Run the script directly:

```bash
python nasdaq100_scraper.py
```

The tool will automatically:
- Retrieve NASDAQ-100 data from Wikipedia
- Validate and clean the data
- Save results to `data/nasdaq100_constituents.csv` and `data/nasdaq100_constituents.json`
- Display a summary of the first 5 entries
This repository automatically updates the NASDAQ-100 data monthly using GitHub Actions:
- Schedule: 1st of every month at 10:00 UTC
- Manual trigger: Available via GitHub Actions tab
- Automatic releases: Creates tagged releases when data changes
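The monthly schedule described above corresponds to a cron trigger along these lines (an illustrative sketch, not necessarily the exact contents of `.github/workflows/update-nasdaq100.yml`):

```yaml
# Hypothetical excerpt of the scheduled-update workflow
on:
  schedule:
    - cron: "0 10 1 * *"   # 10:00 UTC on the 1st of every month
  workflow_dispatch:        # enables the manual trigger in the Actions tab
```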
You can directly access the latest data from GitHub:
CSV Format:
https://raw.githubusercontent.com/Gary-Strauss/nasdaq100-scraper/main/data/nasdaq100_constituents.csv
JSON Format:
https://raw.githubusercontent.com/Gary-Strauss/nasdaq100-scraper/main/data/nasdaq100_constituents.json
```python
import pandas as pd
import requests

# Load latest CSV data directly from GitHub
csv_url = "https://raw.githubusercontent.com/Gary-Strauss/nasdaq100-scraper/main/data/nasdaq100_constituents.csv"
df = pd.read_csv(csv_url)

# Or load JSON data
json_url = "https://raw.githubusercontent.com/Gary-Strauss/nasdaq100-scraper/main/data/nasdaq100_constituents.json"
response = requests.get(json_url)
data = response.json()
```

- CSV format (`data/nasdaq100_constituents.csv`): Tabular representation for Excel/spreadsheet programs
- JSON format (`data/nasdaq100_constituents.json`): Structured data for programmatic use
The extracted data contains the following columns:
- Ticker: Company stock symbol
- Company: Full company name
- GICS_Sector: Global Industry Classification Standard sector
- GICS_Sub_Industry: GICS sub-industry
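With those four columns, the data can be grouped or filtered directly once loaded. A small sketch (the rows below are illustrative values, not the real export):

```python
import pandas as pd

# A few rows shaped like the exported CSV (illustrative values only)
df = pd.DataFrame({
    "Ticker": ["AAPL", "MSFT", "AMZN"],
    "Company": ["Apple Inc.", "Microsoft", "Amazon"],
    "GICS_Sector": ["Information Technology", "Information Technology",
                    "Consumer Discretionary"],
    "GICS_Sub_Industry": ["Hardware", "Software", "Retail"],
})

# Count constituents per GICS sector
sector_counts = df["GICS_Sector"].value_counts()

# Filter for a single sector
tech = df[df["GICS_Sector"] == "Information Technology"]
```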
The tool currently extracts 101 companies, including:
- Apple Inc. (AAPL) - Information Technology
- Microsoft (MSFT) - Information Technology
- Amazon (AMZN) - Consumer Discretionary
- Nvidia (NVDA) - Information Technology
- Meta Platforms (META) - Communication Services
- Pandas method: First attempts `pandas.read_html()` for fast table extraction
- BeautifulSoup fallback: Uses BeautifulSoup when the pandas method fails
- Intelligent column detection: Automatic identification of relevant table columns
- Retry mechanism: Up to 3 retry attempts on network errors
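The steps above can be sketched as follows. This is not the project's actual implementation, just a minimal illustration of the pandas-first, BeautifulSoup-fallback strategy with retries; the function names and column heuristic are assumptions:

```python
from io import StringIO

import pandas as pd
import requests

WIKI_URL = "https://en.wikipedia.org/wiki/Nasdaq-100"

def parse_constituents(html: str) -> pd.DataFrame:
    """Try pandas.read_html first; fall back to BeautifulSoup."""
    try:
        # Fast path: parse all tables and keep the one with a Ticker column.
        for table in pd.read_html(StringIO(html)):
            if "Ticker" in table.columns:
                return table
        raise ValueError("No suitable Components table found with pandas")
    except Exception:
        # Fallback: extract table rows manually with BeautifulSoup.
        from bs4 import BeautifulSoup  # imported lazily; only needed on fallback
        soup = BeautifulSoup(html, "html.parser")
        rows = [[td.get_text(strip=True) for td in tr.find_all("td")]
                for tr in soup.select("table tr")]
        return pd.DataFrame([r for r in rows if r])

def fetch_constituents(url: str = WIKI_URL, retries: int = 3) -> pd.DataFrame:
    """Retry the download a few times on network errors."""
    for attempt in range(1, retries + 1):
        try:
            resp = requests.get(url, timeout=30)
            resp.raise_for_status()
            return parse_constituents(resp.text)
        except requests.RequestException:
            if attempt == retries:
                raise
```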
- Checks for at least 90 companies (typically ~100-101)
- Validates all required columns
- Cleans whitespace and formatting errors
- Ticker validation (1-5 uppercase letters)
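The validation rules above can be expressed roughly like this (a sketch under the stated rules, not the tool's actual code; `validate_and_clean` and its thresholds are hypothetical names):

```python
import pandas as pd

REQUIRED_COLUMNS = ["Ticker", "Company", "GICS_Sector", "GICS_Sub_Industry"]
TICKER_PATTERN = r"^[A-Z]{1,5}$"  # 1-5 uppercase letters

def validate_and_clean(df: pd.DataFrame, min_rows: int = 90) -> pd.DataFrame:
    """Check required columns, strip whitespace, and enforce ticker format."""
    # All required columns must be present
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    # Strip stray whitespace from every string cell
    cleaned = df.copy()
    for col in REQUIRED_COLUMNS:
        cleaned[col] = cleaned[col].astype(str).str.strip()
    # Drop rows whose ticker doesn't look like a valid symbol
    cleaned = cleaned[cleaned["Ticker"].str.match(TICKER_PATTERN)]
    # The index should contain roughly 100 companies
    if len(cleaned) < min_rows:
        raise ValueError(f"Only {len(cleaned)} rows; expected >= {min_rows}")
    return cleaned
```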
- `pandas>=1.3.0`: Data manipulation and CSV export
- `requests>=2.25.0`: HTTP requests
- `beautifulsoup4>=4.9.0`: HTML parsing as fallback
- `lxml>=4.6.0`: XML/HTML parser for pandas
- `html5lib>=1.1`: Additional HTML parser
The data is retrieved from the Wikipedia "NASDAQ-100" page:
- Primary Source: Wikipedia - NASDAQ-100
- Original Data Source: Wikipedia references the official NASDAQ composition from NASDAQ NDX Index (as of 2025-06-22)
- License: Wikipedia content is available under the Creative Commons Attribution-ShareAlike License 3.0 (CC BY-SA 3.0)
- Data originates from Wikipedia and is subject to CC BY-SA 3.0 license
- When redistributing, Wikipedia must be credited as the source
- Derivative works must be published under the same license
- Data is provided "as is" without warranty for completeness or accuracy
- For financial decisions, please consult official sources
The data flow is: NASDAQ Official → Wikipedia → This Tool
- NASDAQ maintains the official index composition at nasdaq.com
- Wikipedia editors update their page based on official NASDAQ data
- This tool extracts the data from Wikipedia for programmatic use
- Network errors: The tool automatically retries on temporary connection problems
- Table structure changed: If Wikipedia page changes, column detection logic may need adjustment
- Missing dependencies: Ensure all packages from `requirements.txt` are installed
The tool logs all steps in detail. For issues, check console output for specific error messages.
```
2025-06-22 08:27:58,468 - INFO - Attempt 1 of 3
2025-06-22 08:27:58,468 - INFO - Trying to retrieve data with pandas.read_html()...
2025-06-22 08:27:58,750 - WARNING - No suitable Components table found with pandas
2025-06-22 08:27:58,750 - INFO - Falling back to BeautifulSoup...
2025-06-22 08:27:59,078 - INFO - DataFrame validation successful
2025-06-22 08:27:59,078 - INFO - Successfully retrieved 101 components with BeautifulSoup
```
```
nasdaq100-scraper/
├── .github/
│   └── workflows/
│       └── update-nasdaq100.yml   # GitHub Actions workflow
├── nasdaq100_scraper.py           # Main script
├── requirements.txt               # Python dependencies
├── README.md                      # This file
└── data/                          # Output directory
    ├── nasdaq100_constituents.csv
    └── nasdaq100_constituents.json
```
Improvements and bug fixes are welcome! Please create a pull request or open an issue.
This tool is for informational purposes only. The data comes from Wikipedia and may be incomplete or outdated. For investment decisions, please consult official financial sources such as NASDAQ or Bloomberg.