"Just a simple tool that finds links so you don't have to."
This web crawler scans websites and maps out all their links. It sticks to the domain you give it (because jumping between sites would be rude). I made this to learn about web scraping and to save myself some time when exploring site structures.
- Node.js - For handling asynchronous operations and running the crawler
- JSDOM - Parses HTML without needing a browser
- Jest - Makes sure my code actually works before I break the internet
- URL API - Way better than trying to write regex for URLs (trust me, I tried)
```bash
# Get the code
git clone https://github.com/JohnRaivenOlazo/web-crawler.git

# Install what it needs
npm install

# Run it on any website
npm start https://example.com
```
- Visits the website you specify
- Finds all links that stay on the same domain
- Keeps track of how many times each link appears
- Shows you a list sorted by popularity
- Saves you hours of manual clicking and tracking
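The "sorted by popularity" part boils down to something like this sketch (the function name is assumed, not necessarily what the project uses): turn the count map into pairs and sort descending by count.

```javascript
// Sketch of the report step: sort [url, count] pairs so the
// most-linked pages print first.
function sortPages(counts) {
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// sortPages(new Map([["raiven.com/about", 1], ["raiven.com", 5]]))
// → [["raiven.com", 5], ["raiven.com/about", 1]]
```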
```bash
npm test
```

The tests make sure, for example, that `https://raiven.com/path/` and `https://raiven.com/path` are treated as the same URL (because trailing slashes are annoying).
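A minimal version of that normalization could look like the sketch below (the project's actual implementation may differ):

```javascript
// Normalize a URL so equivalent links count as the same page.
function normalizeURL(urlString) {
  const url = new URL(urlString);
  // hostname + pathname drops the protocol, query, and fragment;
  // stripping the trailing slash makes /path/ and /path compare equal.
  return (url.hostname + url.pathname).replace(/\/$/, "");
}

// normalizeURL("https://raiven.com/path/") → "raiven.com/path"
// normalizeURL("https://raiven.com/path")  → "raiven.com/path"
```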
1. Clean up URLs so they're consistent
2. Visit pages and collect their links
3. Sort everything and show results
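Those three steps fit in one small loop. This is a hedged sketch with assumed names (`crawl`, `getLinks`, `normalize`), not the project's exact code; the link fetching is injected as a function so the loop itself has no network dependency, where the real crawler would `await fetch` and parse the HTML with JSDOM.

```javascript
// Sketch of the crawl pipeline: normalize → visit same-domain pages → sort.
function crawl(baseURL, getLinks) {
  const base = new URL(baseURL);
  const counts = new Map(); // normalized URL -> times seen
  const queue = [base.href];

  while (queue.length > 0) {
    const current = new URL(queue.shift());
    if (current.hostname !== base.hostname) continue; // stay on one domain
    const key = normalize(current); // step 1: consistent URLs
    counts.set(key, (counts.get(key) ?? 0) + 1);
    if (counts.get(key) > 1) continue; // already crawled this page
    for (const link of getLinks(current.href)) queue.push(link); // step 2
  }

  // step 3: most-linked pages first
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

function normalize(url) {
  return (url.hostname + url.pathname).replace(/\/$/, "");
}
```

Injecting `getLinks` keeps the loop testable with a hard-coded page graph, which is roughly what the Jest tests exploit.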