GitHub Repository Crawler

This utility, inspired by GPT-Crawler, is designed primarily for training custom OpenAI GPTs. It automates fetching relevant files from specified GitHub repositories. The fetched files can then serve as a dataset for custom GPT models, allowing for more focused, domain-specific language understanding.

Features

Fetches files from a GitHub repository based on user-defined patterns.
Supports regular expressions for precise file matching.
Retrieves files from specific branches.
Saves the content in a local JSON file for easy access and training usage.
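
As a rough illustration of the pattern-matching feature, the sketch below filters a list of repository paths with a regular expression. The function name `filterPaths` and the sample paths are hypothetical and are not taken from the script itself; the actual matching logic in main.ts may differ.

```typescript
// Hypothetical helper: keep only the paths that match a user-supplied pattern.
// The pattern is compiled without the global flag, so each test() call is independent.
function filterPaths(paths: string[], pattern: string): string[] {
  const matcher = new RegExp(pattern);
  return paths.filter((path) => matcher.test(path));
}

// Sample paths are made up for illustration.
const examplePaths: string[] = [
  "README.md",
  "src/main.ts",
  "src/util/helpers.ts",
  "docs/guide.md",
];

// Select only Markdown files.
const markdownOnly = filterPaths(examplePaths, "\\.md$");
console.log(markdownOnly);
```

Anchoring the pattern with `$` (or `^`) is usually worth the extra character, since an unanchored pattern such as `md` would also match a path like `src/cmd/run.ts`.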

Prerequisites

Node.js and npm installed.
A GitHub personal access token.

Installation

Clone this repository or download the script. Then, run the following command to install dependencies:

npm install

Configuration

The script uses environment variables for configuration. Create a .env file in the root directory with the following content:

GITHUB_TOKEN=your_github_token

Replace your_github_token with your personal GitHub token.
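
A minimal sketch of how the token might be turned into request headers, assuming the variables from .env have already been loaded into the process environment. The helper name `buildGitHubHeaders` is hypothetical; the `Authorization: Bearer` scheme and the `application/vnd.github+json` media type are the ones documented for the GitHub REST API.

```typescript
// Hypothetical helper: build GitHub REST API headers from an environment-like object.
// Passing the environment in as a parameter keeps the function easy to test.
function buildGitHubHeaders(
  env: Record<string, string | undefined>,
): Record<string, string> {
  const token = env.GITHUB_TOKEN;
  if (!token) {
    // Fail early with a clear message rather than sending unauthenticated requests.
    throw new Error("GITHUB_TOKEN is not set; add it to your .env file");
  }
  return {
    Authorization: `Bearer ${token}`,
    Accept: "application/vnd.github+json",
  };
}

// In a Node.js script this would typically be called as:
//   const headers = buildGitHubHeaders(process.env);
```

Failing early on a missing token avoids confusing rate-limit or 404 errors later, since unauthenticated requests are heavily rate limited and cannot see private repositories.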

Usage

Run the script in one of two ways.

With npx:
```
npx -y tsx main.ts
```
With the start script defined in package.json:
```
npm run start
```

When prompted, enter the repository details (owner, repo, branch) and the pattern for the files you wish to fetch.

Alternatively, you can set these details in the .env and config.json files.
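
To show how the repository details (owner, repo, branch) could map onto a GitHub API request, here is a sketch that builds the URL for the recursive Git trees endpoint, which lists every file path on a branch. The `RepoTarget` type and `treeUrl` function are hypothetical names, and the script may well use a different endpoint.

```typescript
// Hypothetical shape for the details entered at the prompt.
interface RepoTarget {
  owner: string;
  repo: string;
  branch: string;
}

// Build the URL for GitHub's "get a tree" endpoint with recursive listing.
// The endpoint accepts a branch name in place of a tree SHA.
function treeUrl(target: RepoTarget): string {
  const owner = encodeURIComponent(target.owner);
  const repo = encodeURIComponent(target.repo);
  const branch = encodeURIComponent(target.branch);
  return `https://api.github.com/repos/${owner}/${repo}/git/trees/${branch}?recursive=1`;
}

// The branch name "main" is an assumption for illustration.
console.log(treeUrl({ owner: "ExaDev", repo: "gpt-git-knowledge", branch: "main" }));
```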

Output

The script will output a JSON file (output.json) containing the URL and content of each fetched file. This file can then be used as input for training custom GPT models.
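
The exact layout of output.json is not documented here. As a sketch, assuming it is an array of objects with `url` and `content` fields (both field names are assumptions), the data could be typed and serialized like this:

```typescript
// Assumed shape of one entry in output.json; the field names are illustrative.
interface FetchedFile {
  url: string;
  content: string;
}

// Serialize the fetched files as pretty-printed JSON.
function serializeOutput(files: FetchedFile[]): string {
  return JSON.stringify(files, null, 2);
}

// Sample entry with made-up content.
const sampleFiles: FetchedFile[] = [
  {
    url: "https://github.com/ExaDev/gpt-git-knowledge/blob/main/README.md",
    content: "GitHub Repository Crawler\n",
  },
];

console.log(serializeOutput(sampleFiles));
// In a Node.js script the string could then be written to disk, for example with
// fs.writeFileSync("output.json", serializeOutput(sampleFiles)).
```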

Repository: ExaDev/gpt-git-knowledge

Folders and files

Latest commit

History

Repository files navigation

GitHub Repository Crawler

Features

Prerequisites

Installation

Configuration

Usage

Output

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages