🤖 🔥 GitHub Repo GPT Scraper 🔥 🤖

Welcome to the GitHub Repo GPT Scraper! This tool scrapes a GitHub repository so you can create an OpenAI GPT based on your code. It works with either a public GitHub repository URL or a local directory (defaulting to the current working directory, or cwd, if no URL is passed).

Getting Started

Prerequisites

Node.js installed.

Usage

Scrape a GitHub Repository:
```
npx github-repo-gpt-scraper --url=https://github.com/user/repo --out=repo.json
```
Replace https://github.com/user/repo with the URL of the repository you wish to scrape.

Scrape the Current Working Directory:
```
npx github-repo-gpt-scraper --out=repo.json
```
This scrapes every file in your current directory except files ignored by the .gitignore in the cwd, common lockfiles, and binary files.

Filter Files with Include and Exclude Options:

Use the --include option to specify a glob pattern for files you want to include. Use the --exclude option to specify a glob pattern for files you want to exclude.

Example:

```
npx github-repo-gpt-scraper --include="src/**/*.ts" --out=repo.json
```

Or:

```
npx github-repo-gpt-scraper --exclude="tests/**" --out=repo.json
```
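
To illustrate which paths patterns like the ones above would select, here is a rough TypeScript sketch of include/exclude filtering. It is not the scraper's actual implementation: `globToRegExp` and `filterPaths` are hypothetical helpers that support only `*` and `**`, and applying the include pattern before the exclude pattern is an assumption. The tool itself may well use a full glob library with richer semantics.

```typescript
// Convert a simplified glob into a RegExp.
// Only `**/`, `**`, and `*` are treated specially; everything else is literal.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.slice(i, i + 3) === "**/") {
      pattern += "(?:.*/)?"; // zero or more whole directories
      i += 3;
    } else if (glob.slice(i, i + 2) === "**") {
      pattern += ".*"; // anything, including slashes
      i += 2;
    } else if (glob.charAt(i) === "*") {
      pattern += "[^/]*"; // anything within a single path segment
      i += 1;
    } else {
      // Escape characters that are special in regular expressions.
      pattern += glob.charAt(i).replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp("^" + pattern + "$");
}

// Keep paths that match `include` (if given) and do not match `exclude` (if given).
function filterPaths(paths: string[], include?: string, exclude?: string): string[] {
  const inc = include ? globToRegExp(include) : null;
  const exc = exclude ? globToRegExp(exclude) : null;
  return paths.filter((p) => (!inc || inc.test(p)) && !(exc && exc.test(p)));
}

// Made-up paths for illustration only.
const samplePaths = ["src/index.ts", "src/util/a.ts", "tests/a.test.ts", "README.md"];

console.log(filterPaths(samplePaths, "src/**/*.ts"));
console.log(filterPaths(samplePaths, undefined, "tests/**"));
```

With the sample paths, the include pattern keeps only the two TypeScript files under src, and the exclude pattern drops only the file under tests.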

Create a GPT Using the Scraped Data:

Visit https://chat.openai.com/create and click the "Configure" tab.
Under "Knowledge," click "Upload files" and select the JSON file output by the scraper.

Add the following basic instructions to the "Instructions" field:

You are the creator of the codebase documented in the attached file and an expert in all of its code and the dependencies it uses. All of the user's questions will relate to this code, so reference it heavily. Give factual, detailed answers and help the user make updates to the code in as efficient a manner as possible while explaining more complex points to them along the way.

The simple instructions above cover the essentials and seem to work pretty well, but feel free to experiment with your own!

Output

The tool outputs a JSON file (repo.json in the above examples) containing the path, URL, and content of each file scraped. I haven't yet experimented with different ways of formatting the file data (or adding supplemental info) and their impact on GPTs, but I'd be eager to hear about anyone's findings if they do so!
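
If you want to inspect the output before uploading it, a small TypeScript sketch like the following may help. It assumes the file is a JSON array of objects with string fields named `path`, `url`, and `content`. Those field names are a guess based on the description above, so check your own repo.json first. `ScrapedFile`, `parseScrapedFiles`, and `summarize` are hypothetical names, and the sample data is made up.

```typescript
// Assumed shape of one entry in repo.json (field names are not confirmed).
interface ScrapedFile {
  path: string;
  url: string;
  content: string;
}

// Parse the scraper output and verify each entry has the assumed fields.
function parseScrapedFiles(json: string): ScrapedFile[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error("Expected a JSON array of file entries");
  }
  return data.map((entry, i) => {
    const { path, url, content } = (entry ?? {}) as Record<string, unknown>;
    if (typeof path !== "string" || typeof url !== "string" || typeof content !== "string") {
      throw new Error("Entry " + i + " is missing path, url, or content");
    }
    return { path, url, content };
  });
}

// Report how many files were scraped and how much text they contain.
function summarize(files: ScrapedFile[]): { fileCount: number; totalChars: number } {
  return {
    fileCount: files.length,
    totalChars: files.reduce((sum, f) => sum + f.content.length, 0),
  };
}

// Made-up sample standing in for the contents of repo.json.
const sampleJson = JSON.stringify([
  { path: "src/index.ts", url: "https://github.com/user/repo/blob/main/src/index.ts", content: "const x = 1;" },
  { path: "README.md", url: "https://github.com/user/repo/blob/main/README.md", content: "# Demo" },
]);

console.log(summarize(parseScrapedFiles(sampleJson)));
```

To run it against a real file, replace `sampleJson` with the text of your repo.json, for example read via Node's `fs.readFileSync("repo.json", "utf8")`.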

Contribute

Contributions are welcome! Open a PR 😎

License

This project is licensed under the MIT License.

Happy Scraping and GPTs'ing! 🚀🤖
