Flash 🕸️

Flash is a fast, lightweight web crawler written in Go. It recursively visits web pages and extracts links, making it useful for SEO auditing, content discovery, and data collection.

🚀 Features

  • ⚡ Fast Concurrent Crawling: Fetches and processes multiple pages at once
  • 🌐 Recursive URL Discovery: Automatically follows links to explore websites
  • 🧩 Simple & Modular: Easy to extend with your own functionality
  • 📊 URL Normalization: Handles various URL formats consistently
  • 🐳 Docker Support: Ready for containerized deployments

📋 Requirements

  • Go 1.24 or later
  • Docker (optional, for containerized deployment)

🛠️ Installation

From Source

# Clone the repository
git clone https://github.com/sudonitj/Flash.git
cd Flash

# Install dependencies
go mod tidy

# Build the application
go build -o flash

With Docker

# Build the Docker image
docker build -t flash-crawler .

💻 Usage

Basic Usage

# Run from compiled binary
./flash https://example.com

# Or using go run
go run main.go https://example.com

Docker Usage

# Run with Docker
docker run --rm flash-crawler https://example.com

📂 Project Structure

flash/
├── main.go                  # Entry point
├── crawler/                 # Core crawler package
│   ├── crawler.go           # Crawling logic
│   ├── get_urls.go          # HTML parsing for URLs
│   ├── normalize_url.go     # URL normalization
│   ├── get_url_test.go      # Tests for URL extraction
│   └── normalize_url_test.go # Tests for normalization
├── go.mod                   # Module definition
├── go.sum                   # Dependency checksums
├── Dockerfile               # Docker configuration
└── README.md                # This file

⚙️ How It Works

  1. The crawler starts with a given URL (the seed)
  2. It visits the page, extracts all links from the HTML
  3. Links are normalized to prevent duplicates
  4. Each discovered URL is added to the queue if not already visited
  5. The process repeats until no more URLs are left to visit

Flash relies on Go's standard library for HTTP requests and keeps its HTML-parsing dependencies minimal, which keeps the binary small and fast. The sketch below illustrates the crawl loop.
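
Here is a minimal sketch of that loop, written sequentially for clarity (the real crawler adds concurrency). The function names (crawl, normalizeURL, getURLs) and the golang.org/x/net/html dependency are illustrative assumptions, not the repository's actual API.

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"strings"

	"golang.org/x/net/html" // assumed HTML parser; the repo may use another
)

// normalizeURL maps equivalent URLs to a single key: lower-cased host,
// trailing slash stripped, scheme dropped.
func normalizeURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return strings.ToLower(u.Host) + strings.TrimSuffix(u.Path, "/"), nil
}

// getURLs walks the parsed HTML tree and collects every href on an <a>
// tag, resolved against the page's base URL.
func getURLs(body string, base *url.URL) []string {
	doc, err := html.Parse(strings.NewReader(body))
	if err != nil {
		return nil
	}
	var urls []string
	var visit func(*html.Node)
	visit = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "a" {
			for _, attr := range n.Attr {
				if attr.Key == "href" {
					if u, err := base.Parse(attr.Val); err == nil {
						urls = append(urls, u.String())
					}
				}
			}
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			visit(c)
		}
	}
	visit(doc)
	return urls
}

// crawl runs steps 1-5: pop a URL from the queue, skip it if already
// visited, fetch the page, and enqueue every link found on it.
func crawl(seed string) {
	visited := map[string]bool{}
	queue := []string{seed}
	for len(queue) > 0 {
		raw := queue[0]
		queue = queue[1:]
		key, err := normalizeURL(raw)
		if err != nil || visited[key] {
			continue
		}
		visited[key] = true

		resp, err := http.Get(raw)
		if err != nil {
			continue
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Println(raw)
		// resp.Request.URL is the final URL after redirects; use it as base.
		queue = append(queue, getURLs(string(body), resp.Request.URL)...)
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: flash <url>")
		os.Exit(1)
	}
	crawl(os.Args[1])
}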

🧪 Running Tests

# Run all tests
go test ./...

# Run tests for a specific package
go test ./crawler
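
If you add normalization rules, a table-driven test in the usual Go style is a natural starting point. The sketch below pairs with the hypothetical normalizeURL shown earlier (lower-cased host, trailing slash stripped); the repository's actual rules may differ.

package crawler

import "testing"

func TestNormalizeURL(t *testing.T) {
	// Each case pairs an input URL with its expected normalized form.
	cases := []struct {
		name  string
		input string
		want  string
	}{
		{"lowercases host", "https://Example.com/path", "example.com/path"},
		{"strips trailing slash", "https://example.com/path/", "example.com/path"},
	}
	for _, c := range cases {
		got, err := normalizeURL(c.input)
		if err != nil {
			t.Fatalf("%s: unexpected error: %v", c.name, err)
		}
		if got != c.want {
			t.Errorf("%s: got %q, want %q", c.name, got, c.want)
		}
	}
}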

🔄 Extending Flash

Flash is designed to be modular. You can extend its functionality by:

  1. Adding more analysis during crawling
  2. Implementing depth limits
  3. Adding domain filtering
  4. Implementing rate limiting (see the sketch after this list)
  5. Adding data extraction capabilities
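
As one example, rate limiting (item 4) can be retrofitted by gating each request on a time.Ticker. rateLimitedGet is an illustrative name, not an existing Flash function.

package main

import (
	"fmt"
	"net/http"
	"time"
)

// rateLimitedGet blocks until the ticker fires, capping the crawler at
// one request per tick interval.
func rateLimitedGet(ticker *time.Ticker, url string) (*http.Response, error) {
	<-ticker.C
	return http.Get(url)
}

func main() {
	ticker := time.NewTicker(500 * time.Millisecond) // ~2 requests/second
	defer ticker.Stop()

	resp, err := rateLimitedGet(ticker, "https://example.com")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	resp.Body.Close()
	fmt.Println("status:", resp.Status)
}

Because the ticker is shared, the same cap holds even when many goroutines fetch concurrently.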

📊 Example Use Cases

  • SEO Auditing: Find broken links and analyze site structure
  • Content Discovery: Map out all accessible pages on a website
  • Data Collection: Extract specific types of content from pages
  • Site Monitoring: Track changes to pages over time

🛡️ Responsible Usage

When using Flash, please:

  • Respect website terms of service
  • Consider adding rate limiting to avoid overloading servers
  • Add support for robots.txt to respect crawling directives
  • Only crawl sites you have permission to access

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Go community for excellent HTTP and HTML parsing libraries
  • All contributors who help improve Flash
