Skip to content

sandy-sp/gittxt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2,068 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🚀 AI-Ready Text Extractor for Git Repos | CLI tool for dataset prep, summaries, reverse engineering & bundling

🚀 Gittxt: Get Text from Git — Optimized for AI

Docs Python Version PyPI version Release Tested with Pytest PyPI Downloads GitHub repo size GitHub top language Build Status Made for LLMs License

Gittxt is an open-source tool that transforms GitHub repositories into LLM-compatible datasets.

Perfect for developers, data scientists, and AI engineers, Gittxt helps you extract and structure .txt, .json, .md content into clean, analyzable formats for use in:

  • Prompt engineering
  • Fine-tuning & retrieval
  • Codebase summarization
  • Open-source LLM workflows

💡 Why Gittxt?

Large Language Models often expect input in very specific formats. Many tools (e.g., ChatGPT, Gemini, Ollama) struggle with arbitrary GitHub URLs, complex folders, or non-text assets.

Gittxt bridges this gap by:

  • Extracting all usable text from a repo
  • Organizing it for easy ingestion by LLMs
  • Offering structured .txt, .json, .md, .zip outputs
  • Giving you full control with filtering, formatting, and plugin support

✨ Features at a Glance

  • ✅ Text extractor for code, docs, config files
  • ✅ Output: .txt, .json, .md, .zip
  • ✅ CLI and plugin system (FastAPI, Streamlit)
  • ✅ AI-ready summaries (OpenAI / Ollama)
  • ✅ Reverse engineer .txt/.json reports back into repo structure
  • .gittxtignore support
  • ✅ Async scanning for large projects
  • ✅ Works offline and in constrained compute environments

📁 Output Types

outputs/
├── txt/         # Plain text report
├── json/        # Structured metadata
├── md/          # Markdown-formatted summary
└── zip/         # Bundled results + manifest

🚀 Quickstart

Install

pip install gittxt

Run your first scan

gittxt scan https://github.com/sandy-sp/gittxt --output-format txt,json --lite --zip

Reverse engineer a summary

gittxt re outputs/project.md -o ./restored

🌐 Explore the Visual Web App

Try the hosted version (no install required!)

👉 Launch Streamlit App


📈 Gittxt for AI Workflows

  • Use it to build structured input for LLMs
  • Ideal for prompt chaining, document agents, code summarization
  • Helps transform messy repos into single-file, AI-consumable reports

📖 Full Documentation

All CLI flags, plugins, formats, and filters are documented here:

📚 Explore Gittxt Docs


🔧 Plugin Support

Gittxt supports modular plugins:

  • gittxt-api: Run via FastAPI backend
  • gittxt-streamlit: Interactive dashboard

Install & run with:

gittxt plugin install gittxt-streamlit
gittxt plugin run gittxt-streamlit

🧠 Built for Developers & AI Engineers

Created by Sandeep Paidipati, Gittxt was born out of a need to:

  • Quickly preview and summarize GitHub repos with LLMs
  • Avoid manual copying, filtering, and converting files
  • Create AI-ready datasets for learning and experimentation

🙏 Support the Project

  • ⭐️ Star this repo if it helped you
  • 🧵 Share it with your dev/AI community
  • 🤝 Contact me for collaboration or sponsorship

🔒 License

MIT License © Sandeep Paidipati


GittxtGet Text from Git — Optimized for AI

About

Gittxt is an AI-focused CLI and plugin tool for extracting, filtering, and packaging text from GitHub repos. Build LLM-compatible datasets, prep code for prompt engineering, and power AI workflows with structured .txt, .json, .md, or .zip outputs.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages