This project consists of two main components:
- A GitHub scraper that collects repository and contributor data
- A web dashboard to visualize the collected data
## Prerequisites

- Python 3.8+
- Node.js 14+
- GitHub Personal Access Token
## Setup

1. Create a `.env` file in the root directory based on the `.env.example` template:

   ```
   GITHUB_TOKEN=your_github_personal_access_token_here
   ```

2. Install Python dependencies:

   ```
   pip install -r requirements.txt
   ```
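The `.env` file is a plain `KEY=VALUE` text file. As an illustration only (the project itself may load it with a library such as `python-dotenv`), such a file can be parsed with the standard library alone:

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Existing environment variables are not overwritten.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# Only attempt to load if the file exists (e.g. when run from the repo root)
if os.path.exists(".env"):
    load_env()
```

After loading, the token is available as `os.environ.get("GITHUB_TOKEN")`.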
## Usage

1. Run the GitHub scraper to collect data:

   ```
   python github_scraper.py
   ```

   This will:

   - Search GitHub for repositories matching the specified keywords
   - Collect repository details
   - Gather information about contributors to these repositories
   - Save all data to CSV files in the `github_data` directory
2. Start the API server:

   ```
   python dashboard_api.py
   ```

3. In a separate terminal, navigate to the dashboard directory and install dependencies:

   ```
   cd dashboard
   npm install
   ```

4. Start the Next.js development server:

   ```
   npm run dev
   ```

5. Open your browser and navigate to `http://localhost:3000` to view the dashboard.
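The scraper step above writes timestamped CSV files into `github_data`. A simplified sketch of that saving pattern (not the scraper's actual code; the field names are illustrative):

```python
import csv
import os
from datetime import datetime

def save_repositories(repos, out_dir="github_data"):
    """Write repository records to a timestamped CSV file, mirroring the
    repositories_[timestamp].csv naming convention used by the scraper."""
    os.makedirs(out_dir, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = os.path.join(out_dir, f"repositories_{timestamp}.csv")
    with open(path, "w", newline="") as f:
        # Column names here are a guess; the real scraper may save more fields.
        writer = csv.DictWriter(f, fieldnames=["full_name", "stars", "language"])
        writer.writeheader()
        writer.writerows(repos)
    return path
```

Embedding the timestamp in the filename means each run produces a new file instead of overwriting the previous one.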
## Dashboard Features

- Overview statistics of repositories and contributors
- Language distribution visualization
- Topic cloud showing popular topics
- Top repositories by star count
- Top contributors by follower count and contributions
## Customization

You can customize the keywords used for repository search by modifying the `keywords` list in the `main()` function of `github_scraper.py`.
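For example, the keyword list might be combined into a search query along these lines (a hypothetical sketch; `build_search_query` is not necessarily a function in `github_scraper.py` — check that file for the real structure):

```python
def build_search_query(keywords):
    """Combine keywords into a single search query string.

    Quoting each keyword keeps multi-word keywords together as phrases.
    """
    return " OR ".join(f'"{kw}"' for kw in keywords)

# Edit this list to change which repositories the scraper searches for.
keywords = ["machine learning", "data visualization"]
query = build_search_query(keywords)
```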
## Output Files

The scraper generates the following CSV files in the `github_data` directory:

- `repositories_[timestamp].csv`: Basic repository information
- `repositories_detailed_[timestamp].csv`: Detailed repository information
- `contributors_[timestamp].csv`: Contributor information
These files are automatically used by the dashboard to visualize the data.
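Because each scraper run creates a new timestamped file, a consumer of this data (such as the dashboard API) typically wants the most recent one. A minimal sketch, assuming the filename convention above:

```python
import glob
import os

def latest_csv(prefix, data_dir="github_data"):
    """Return the newest CSV matching prefix_*.csv, or None if none exist.

    Timestamps like 20240101_120000 sort lexicographically, so the maximum
    filename is the most recent run. Note: a prefix of "repositories" would
    also match repositories_detailed_* files, so use the full prefix.
    """
    matches = glob.glob(os.path.join(data_dir, f"{prefix}_*.csv"))
    return max(matches) if matches else None
```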