This project consists of two main components:
- A GitHub scraper that collects repository and contributor data
- A web dashboard to visualize the collected data
## Prerequisites

- Python 3.8+
- Node.js 14+
- GitHub Personal Access Token
## Setup

1. Create a `.env` file in the root directory based on the `.env.example` template:

   ```
   GITHUB_TOKEN=your_github_personal_access_token_here
   ```

2. Install Python dependencies:

   ```
   pip install -r requirements.txt
   ```
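The `.env` file is a plain `KEY=VALUE` text file. As an illustration only (the project itself may load it with a library such as `python-dotenv`), such a file can be parsed with the standard library alone:

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Existing environment variables are not overwritten.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# Only attempt to load if the file exists (e.g. when run from the repo root)
if os.path.exists(".env"):
    load_env()
```

After loading, the token is available as `os.environ.get("GITHUB_TOKEN")`.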
## Usage

1. Run the GitHub scraper to collect data:

   ```
   python github_scraper.py
   ```

   This will:

   - Search GitHub for repositories matching the specified keywords
   - Collect repository details
   - Gather information about contributors to these repositories
   - Save all data to CSV files in the `github_data` directory
2. Start the API server:

   ```
   python dashboard_api.py
   ```

3. In a separate terminal, navigate to the dashboard directory and install dependencies:

   ```
   cd dashboard
   npm install
   ```

4. Start the Next.js development server:

   ```
   npm run dev
   ```

5. Open your browser and navigate to `http://localhost:3000` to view the dashboard.
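The scraper step above writes timestamped CSV files into `github_data`. A simplified sketch of that saving pattern (not the scraper's actual code; the field names are illustrative):

```python
import csv
import os
from datetime import datetime

def save_repositories(repos, out_dir="github_data"):
    """Write repository records to a timestamped CSV file, mirroring the
    repositories_[timestamp].csv naming convention used by the scraper."""
    os.makedirs(out_dir, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = os.path.join(out_dir, f"repositories_{timestamp}.csv")
    with open(path, "w", newline="") as f:
        # Column names here are a guess; the real scraper may save more fields.
        writer = csv.DictWriter(f, fieldnames=["full_name", "stars", "language"])
        writer.writeheader()
        writer.writerows(repos)
    return path
```

Embedding the timestamp in the filename means each run produces a new file instead of overwriting the previous one.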
## Dashboard Features

- Overview statistics of repositories and contributors
- Language distribution visualization
- Topic cloud showing popular topics
- Top repositories by star count
- Top contributors by follower count and contributions
## Customization

You can customize the keywords used for repository search by modifying the `keywords` list in the `main()` function of `github_scraper.py`.
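For example, the keyword list might be combined into a search query along these lines (a hypothetical sketch; `build_search_query` is not necessarily a function in `github_scraper.py` — check that file for the real structure):

```python
def build_search_query(keywords):
    """Combine keywords into a single search query string.

    Quoting each keyword keeps multi-word keywords together as phrases.
    """
    return " OR ".join(f'"{kw}"' for kw in keywords)

# Edit this list to change which repositories the scraper searches for.
keywords = ["machine learning", "data visualization"]
query = build_search_query(keywords)
```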
## Output Files

The scraper generates the following CSV files in the `github_data` directory:

- `repositories_[timestamp].csv`: Basic repository information
- `repositories_detailed_[timestamp].csv`: Detailed repository information
- `contributors_[timestamp].csv`: Contributor information
These files are automatically used by the dashboard to visualize the data.
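Because each scraper run creates a new timestamped file, a consumer of this data (such as the dashboard API) typically wants the most recent one. A minimal sketch, assuming the filename convention above:

```python
import glob
import os

def latest_csv(prefix, data_dir="github_data"):
    """Return the newest CSV matching prefix_*.csv, or None if none exist.

    Timestamps like 20240101_120000 sort lexicographically, so the maximum
    filename is the most recent run. Note: a prefix of "repositories" would
    also match repositories_detailed_* files, so use the full prefix.
    """
    matches = glob.glob(os.path.join(data_dir, f"{prefix}_*.csv"))
    return max(matches) if matches else None
```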