Sitemap Crawler is a simple web-based tool built with Node.js, Express, and SimpleCrawler. It allows users to input a URL, crawl the website, and generate lists of successful and failed URLs. The results can be downloaded for further analysis.
- Web-based UI for entering a URL and starting a crawl.
- Uses SimpleCrawler to fetch and analyze pages.
- Lists successful and failed URLs separately.
- Allows downloading results for further analysis.
- Built with Node.js, Express, and Bootstrap for easy use and customization.
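The split between successful and failed URLs can be sketched as a simple partition over crawl results. This is an illustrative sketch, not the project's actual code: the function names and the "below 400 counts as success" threshold are assumptions.

```javascript
// Illustrative classifier: treat 2xx and 3xx responses as successful,
// everything else as failed. The threshold is an assumption, not
// necessarily what the project itself uses.
function isSuccessful(statusCode) {
  return statusCode >= 200 && statusCode < 400;
}

// Partition a list of { url, statusCode } entries into the two lists
// shown in the UI (hypothetical result shape).
function partitionResults(entries) {
  const successful = [];
  const failed = [];
  for (const { url, statusCode } of entries) {
    (isSuccessful(statusCode) ? successful : failed).push(url);
  }
  return { successful, failed };
}

const { successful, failed } = partitionResults([
  { url: "https://example.com/", statusCode: 200 },
  { url: "https://example.com/old", statusCode: 301 },
  { url: "https://example.com/missing", statusCode: 404 },
]);
console.log(successful); // ["https://example.com/", "https://example.com/old"]
console.log(failed);     // ["https://example.com/missing"]
```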
Ensure you have Node.js installed on your system.
```bash
git clone https://github.com/Nuraj250/sitemap-crawler.git
cd sitemap-crawler
npm install
node app.js
```

By default, the server runs on http://localhost:3000.
- Open `index.html` in your browser.
- Enter a URL in the input field.
- Click "Start Crawl" to begin.
- The progress will be displayed in a modal.
- Once completed, view the results and download the successful or failed URLs.
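The download step can be done entirely in the browser by turning a URL list into a text file body. A minimal sketch, assuming one URL per line (the function name and file name are illustrative, not the project's actual API):

```javascript
// Build the downloadable file body from a list of URLs, one per line.
function buildDownloadText(urls) {
  return urls.join("\n") + "\n";
}

// In the browser, this body would be wrapped in a Blob and downloaded
// via a temporary <a download> element, for example:
//   const blob = new Blob([buildDownloadText(urls)], { type: "text/plain" });
//   const link = document.createElement("a");
//   link.href = URL.createObjectURL(blob);
//   link.download = "successful-urls.txt";
//   link.click();

const body = buildDownloadText(["https://example.com/", "https://example.com/about"]);
console.log(body); // "https://example.com/\nhttps://example.com/about\n"
```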
- Node.js - Backend runtime
- Express.js - Server setup
- SimpleCrawler - Web crawling library
- Bootstrap - UI framework
- JavaScript - Frontend scripting
Feel free to fork this repository and submit pull requests with improvements or additional features.
This project is licensed under the MIT License.