This project provides an end-to-end solution for analyzing and forecasting trends in GitHub repositories. Using historical data, the tool generates insights into repository activity, such as issue creation patterns, resolutions, and trends, and applies forecasting models to predict future repository dynamics.
- Fetch and preprocess data from GitHub repositories (e.g., issues, pull requests).
- Analyze key metrics like the number of open/closed issues, average resolution times, etc.
- Visualize trends using time series plots and interactive dashboards.
- Apply machine learning models (e.g., ARIMA, LSTM) to forecast future repository activity.
- Generate actionable insights to help maintainers and contributors manage repositories effectively.
- Run semantic search over issues: the Analytics notebook uses OpenAI's text-embedding-ada-002 model to generate embeddings for search queries, and Elasticsearch performs the semantic search.
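The metrics mentioned above (open/closed counts and average resolution time) can be sketched as follows. This is a minimal illustration using only the standard library, not the notebook's actual code: the function name `summarize_issues` and the sample records are hypothetical, while the `state`, `created_at`, and `closed_at` fields follow the shape of the GitHub issues API response.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse a UTC timestamp in the format the GitHub API returns."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def summarize_issues(issues):
    """Count open and closed issues and average the time to close, in days."""
    open_count = sum(1 for issue in issues if issue["state"] == "open")
    closed = [i for i in issues if i["state"] == "closed" and i.get("closed_at")]
    durations = [
        (parse_ts(i["closed_at"]) - parse_ts(i["created_at"])).total_seconds() / 86400
        for i in closed
    ]
    average = sum(durations) / len(durations) if durations else None
    return {"open": open_count, "closed": len(closed), "avg_resolution_days": average}


# Hypothetical sample records, for illustration only.
sample = [
    {"state": "closed", "created_at": "2024-01-01T00:00:00Z", "closed_at": "2024-01-03T00:00:00Z"},
    {"state": "closed", "created_at": "2024-01-02T00:00:00Z", "closed_at": "2024-01-06T00:00:00Z"},
    {"state": "open", "created_at": "2024-01-05T00:00:00Z", "closed_at": None},
]
summary = summarize_issues(sample)
print(summary)
```

With real data the same calculation would more likely be done with pandas, which is among the project's requirements.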
- Data Retrieval: Uses the GitHub API to retrieve issues data for the past two months (for semantic search) and the past two years (for visualization).
- Visualization: Creates charts to visualize historical trends in issues.
- Forecasting: Implements time-series forecasting models to predict future issue trends.
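As a rough sketch of the data retrieval step, the snippet below builds a request against the GitHub REST API endpoint for repository issues. It is an assumption about how retrieval could be done, not the notebook's code: the helper names are hypothetical, `octocat/Hello-World` is a placeholder repository, and the standard library is used in place of `requests` to keep the example dependency-free.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone

API_ROOT = "https://api.github.com"


def issues_url(owner, repo, since, page=1, per_page=100):
    """Build the URL for one page of a repository's issues.

    Note: the API's `since` parameter filters on the last-updated time,
    so creation dates may still need filtering after the fetch.
    """
    params = urllib.parse.urlencode({
        "state": "all",  # include both open and closed issues
        "since": since.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "per_page": per_page,
        "page": page,
    })
    return f"{API_ROOT}/repos/{owner}/{repo}/issues?{params}"


def drop_pull_requests(items):
    """The issues endpoint also returns pull requests; keep only true issues."""
    return [item for item in items if "pull_request" not in item]


def fetch_issues_page(owner, repo, since, token, page=1):
    """Fetch one page of issues (performs a network call; not run here)."""
    request = urllib.request.Request(
        issues_url(owner, repo, since, page),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return drop_pull_requests(json.load(response))


# A fixed reference date keeps the example reproducible; a real run
# would start from datetime.now(timezone.utc).
two_months_ago = datetime(2024, 3, 1, tzinfo=timezone.utc) - timedelta(days=60)
url = issues_url("octocat", "Hello-World", two_months_ago)
print(url)
```

For the two-year window, the same function would be called with `timedelta(days=730)` and the `page` argument incremented until an empty page comes back.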
- Repositories Tracked:
- Retrieve Issues Data: Fetch issue information for the above repositories over the last two months (semantic search) and two years (visualization).
- Generate Visualizations: Plot issue trends using Python libraries.
- Forecast Future Issues: Apply forecasting techniques to predict future issue counts.
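To make the plotting and forecasting objectives concrete, here is a small sketch that turns issue creation dates into a daily count series and extends it with a moving-average baseline. The moving average is a deliberately simple stand-in, not one of the project's models (ARIMA, LSTM, Prophet), and the function names and sample dates are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(created_dates, start, end):
    """Return issues created per day from start to end inclusive (0 for quiet days)."""
    counts = Counter(created_dates)
    days = (end - start).days + 1
    return [counts.get(start + timedelta(days=n), 0) for n in range(days)]


def moving_average_forecast(series, window, horizon):
    """Forecast `horizon` steps ahead, each as the mean of the last `window` values.

    Forecast values are fed back into the history, so later steps
    depend on earlier forecasts.
    """
    history = list(series)
    forecast = []
    for _ in range(horizon):
        recent = history[-window:]
        next_value = sum(recent) / len(recent)
        forecast.append(next_value)
        history.append(next_value)
    return forecast


# Hypothetical creation dates, for illustration only.
created = [
    date(2024, 1, 1), date(2024, 1, 1),
    date(2024, 1, 2),
    date(2024, 1, 4), date(2024, 1, 4), date(2024, 1, 4),
]
series = daily_counts(created, date(2024, 1, 1), date(2024, 1, 4))
forecast = moving_average_forecast(series, window=2, horizon=2)
print(series)
print(forecast)
```

A baseline like this is mainly useful as a reference point: a trained model should produce lower error than it on held-out data.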
- Python 3.8+
- GitHub personal access token (required for API authentication)
- Python libraries: pandas, matplotlib, seaborn, statsmodels, prophet, requests, tensorflow, scikit-learn
- Ensure that all required libraries are installed before executing this notebook.
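One way to confirm that the libraries are available before running the notebook is a quick check like the one below. It is a convenience sketch rather than part of the project; the helper name `missing_packages` is hypothetical. Note that scikit-learn is imported under the module name `sklearn`.

```python
import importlib.util

# Maps each package name (as installed) to its importable module name.
REQUIRED_MODULES = {
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "statsmodels": "statsmodels",
    "prophet": "prophet",
    "requests": "requests",
    "tensorflow": "tensorflow",
    "scikit-learn": "sklearn",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the package names whose module cannot be found, without importing them."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages: " + ", ".join(missing))
else:
    print("All required libraries are installed.")
```

Any package reported as missing can then be installed with `pip install` followed by its name.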
- Plots and forecast results are included within the Jupyter notebook.
- Backend - Flask
To run the backend, navigate to the Flask directory and execute the following command in a separate terminal:
python App.py
- Frontend - React
Install the necessary node_modules packages by executing:
npm install
Start the frontend server locally by running:
npm start
(This should be executed in a separate terminal)
This starts a local web server that displays all the forecasting results.
Part 2: File - Analytics.ipynb Updates in the Directory - 'GitHub_Issues_ES_Docker_OpenAI'
These changes were uploaded to Elasticsearch using Docker. The Analytics.ipynb notebook then retrieves the top 5 most similar issues from Elasticsearch.
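The idea behind retrieving the most similar issues can be illustrated without Elasticsearch or OpenAI. The sketch below ranks stored vectors by cosine similarity to a query vector, assuming cosine similarity as the measure (a common choice for embeddings; the notebook's actual Elasticsearch query may differ). The issue titles and three-dimensional vectors are made up for illustration; real text-embedding-ada-002 vectors have 1536 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_vector, indexed, k=5):
    """Return the k (score, title) pairs with the highest similarity to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), title)
        for title, vector in indexed.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "embeddings" standing in for vectors stored in the search index.
issues = {
    "Crash on startup": [1.0, 0.0, 0.0],
    "Startup fails with error": [0.8, 0.6, 0.0],
    "Add dark mode": [0.0, 1.0, 0.0],
    "Docs typo": [0.0, 0.0, 1.0],
}
results = top_k_similar([1.0, 0.05, 0.0], issues, k=2)
for score, title in results:
    print(f"{score:.3f}  {title}")
```

In the project itself this ranking happens inside Elasticsearch, which avoids comparing the query against every stored vector in Python.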
- Clone the repository:
git clone https://github.com/sohamvsonar/github-issues-forecasting.git
cd github-issues-forecasting
- Add your GitHub personal access token:
Update the GITHUB_TOKEN variable in the notebook or in your environment variables.
- Open the Jupyter Notebook:
jupyter notebook GitHub_Repos_Issues_Forecasting.ipynb
- Run the notebook to analyze and forecast GitHub repository issues.
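If the token is supplied through an environment variable, it could be read and turned into request headers roughly as shown below. This is a sketch under that assumption; the helper name `github_headers` and the example token value are hypothetical, and the notebook may instead assign GITHUB_TOKEN directly in a cell.

```python
import os


def github_headers(env=os.environ):
    """Build GitHub API request headers from the GITHUB_TOKEN variable."""
    token = env.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError(
            "Set the GITHUB_TOKEN environment variable before running the notebook."
        )
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }


# A placeholder mapping is passed here so the example runs without a real token.
headers = github_headers({"GITHUB_TOKEN": "ghp_example_token"})
print(sorted(headers))
```

Reading the token from the environment keeps it out of the notebook file, which matters if the notebook is committed to a public repository.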
- Jupyter Notebook: GitHub_Repos_Issues_Forecasting.ipynb, containing the source code and outputs.
- Analysis Report: A detailed PDF or HTML document summarizing the findings and results.
- Clear visualizations of issue trends over the past two years.
- Accurate forecasts of future issue trends for better repository management.
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! Feel free to submit a pull request or open an issue for suggestions.
- GitHub API Documentation
- Python libraries and models: matplotlib, pandas, statsmodels, prophet, LSTM
Developed by Soham Sonar.