Navi

Navigate the cutest search engine ever!

What is Navi?

Navi is a simple, modular search engine that demonstrates the core components of modern search systems. It features a web crawler, indexer, ranking engine, and query processor, all integrated through a fast and responsive web interface.

Features

Crawler

Begins from a seed set of URLs and recursively fetches HTML pages.
Parses and follows hyperlinks while respecting robots.txt.
Supports multi-threaded crawling with configurable thread count.
Avoids duplicate pages using URL normalization and compact string matching.
Saves crawl progress so that a run can resume after a failure.
Gathers approximately 6000 HTML pages for indexing.
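
The duplicate-avoidance step above can be sketched roughly as follows. This is a minimal illustration, not Navi's actual code: the class `UrlDeduplicator`, its method names, and the specific normalization rules (lower-casing scheme and host, dropping default ports and fragments, trimming a trailing slash) are all assumptions chosen for the example.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper; names and rules are illustrative, not taken from the Navi code base.
final class UrlDeduplicator {
    // Concurrent set so that several crawler threads can share one instance.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    // Reduce a URL to a canonical string: lower-case scheme and host,
    // resolve "." and ".." segments, drop default ports and fragments,
    // and strip a trailing slash from non-root paths.
    static String normalize(String rawUrl) {
        URI uri = URI.create(rawUrl.trim()).normalize();
        String scheme = uri.getScheme() == null ? "http" : uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase(Locale.ROOT);
        int port = uri.getPort();
        boolean defaultPort = port == -1
                || (scheme.equals("http") && port == 80)
                || (scheme.equals("https") && port == 443);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (path.length() > 1 && path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        String query = uri.getRawQuery() == null ? "" : "?" + uri.getRawQuery();
        return scheme + "://" + host + (defaultPort ? "" : ":" + port) + path + query;
    }

    // Returns true only the first time a given (normalized) URL is offered.
    boolean markIfNew(String rawUrl) {
        try {
            return seen.add(normalize(rawUrl));
        } catch (IllegalArgumentException e) {
            return false; // malformed URLs are skipped
        }
    }
}
```

Under these rules, `UrlDeduplicator.normalize("HTTP://Example.com:80/a/../b/#frag")` yields `http://example.com/b`, so both spellings map to the same entry in the visited set.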

Indexer

Extracts and stores terms from HTML documents, distinguishing between:
- Title
- Headings
- Body
Indexed data is stored persistently in MongoDB for efficient access.
Supports incremental updates for newly crawled pages.
Designed for fast retrieval of matching documents and field-based term weighting.
Indexing time: ~10 minutes for 6000 documents.
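
To make the field distinction concrete, here is a small in-memory sketch of a field-aware inverted index. It stands in for the MongoDB-backed index only for illustration; the class `FieldedIndex`, the `Field` enum, and the simple tokenizer are assumptions, and no stemming or stop-word removal is applied.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// In-memory stand-in for the persistent index; all names here are illustrative.
final class FieldedIndex {
    enum Field { TITLE, HEADING, BODY }

    // term -> document id -> term frequency per field
    private final Map<String, Map<String, EnumMap<Field, Integer>>> postings = new HashMap<>();

    // Tokenize one field of a document and record per-field term frequencies.
    void addField(String docId, Field field, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) {
                continue; // split() can produce an empty leading token
            }
            postings.computeIfAbsent(token, t -> new HashMap<>())
                    .computeIfAbsent(docId, d -> new EnumMap<>(Field.class))
                    .merge(field, 1, Integer::sum);
        }
    }

    // How often a term occurs in one field of one document (0 if absent).
    int termFrequency(String term, String docId, Field field) {
        Map<String, EnumMap<Field, Integer>> byDoc = postings.get(term);
        if (byDoc == null) {
            return 0;
        }
        EnumMap<Field, Integer> counts = byDoc.get(docId);
        return counts == null ? 0 : counts.getOrDefault(field, 0);
    }
}
```

Keeping a separate count per field is what later allows the ranker to weight a title hit differently from a body hit.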

Ranker

Combines two scoring mechanisms for robust ranking:

BM25F (Fielded BM25) – Relevance-based scoring
- Field Weighting: Assigns different importance (weights) to each field
  - Title: 2.0
  - Heading: 1.5
  - Body: 1.0
- Term Frequency Normalization: Adjusts term frequency (TF) per field to account for varying field lengths, preventing bias toward longer fields.
- Field Length Normalization: Uses field-specific length normalization parameters to adjust for differences in field verbosity.
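
The BM25F steps above can be sketched for a single query term as shown below. The field weights come from the list above; the length-normalization parameters `B`, the saturation constant `K1`, and the IDF variant are common defaults assumed for the example, since the README does not state them. The class and method names are hypothetical, and average field lengths are assumed to be greater than zero.

```java
// Illustrative BM25F scorer for one query term; not Navi's actual implementation.
final class Bm25f {
    // Field order: title, heading, body. Weights as listed in the README.
    static final double[] WEIGHTS = {2.0, 1.5, 1.0};
    // Assumed values: per-field length normalization and saturation constant.
    static final double[] B = {0.75, 0.75, 0.75};
    static final double K1 = 1.2;

    // A common smoothed IDF form that never goes negative.
    static double idf(long totalDocs, long docsWithTerm) {
        return Math.log(1.0 + (totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
    }

    // tf[f]          term frequency in field f of the document
    // fieldLen[f]    length of field f in this document (in tokens)
    // avgFieldLen[f] average length of field f across the collection (> 0)
    static double termScore(int[] tf, int[] fieldLen, double[] avgFieldLen,
                            long totalDocs, long docsWithTerm) {
        double pseudoTf = 0.0;
        for (int f = 0; f < WEIGHTS.length; f++) {
            // Length-normalize the frequency per field, then apply the field weight.
            double norm = 1.0 - B[f] + B[f] * (fieldLen[f] / avgFieldLen[f]);
            pseudoTf += WEIGHTS[f] * tf[f] / norm;
        }
        // Saturate the combined frequency once, after summing across fields.
        return idf(totalDocs, docsWithTerm) * pseudoTf / (K1 + pseudoTf);
    }
}
```

A document's score for a multi-word query would be the sum of `termScore` over the query terms. With all fields at average length, a single occurrence in the title scores higher than a single occurrence in the body because of the 2.0 versus 1.0 weight.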

PageRank – Popularity-based scoring
- Computes global importance of pages based on link structure.
- Used to boost commonly cited or authoritative sources.
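
As a rough illustration of the link-based computation, the sketch below runs PageRank by power iteration over an adjacency list. The damping factor, the fixed iteration count, and the way dangling pages (no outgoing links) are handled are assumptions made for the example; the README does not describe Navi's choices. Names are hypothetical.

```java
import java.util.Arrays;

// Illustrative power-iteration PageRank; not Navi's actual implementation.
final class PageRank {
    // outLinks[i] lists the indices of the pages that page i links to.
    static double[] compute(int[][] outLinks, double damping, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0;
            for (int i = 0; i < n; i++) {
                if (outLinks[i].length == 0) {
                    dangling += rank[i]; // rank of pages without outgoing links
                } else {
                    double share = rank[i] / outLinks[i].length;
                    for (int target : outLinks[i]) {
                        next[target] += share;
                    }
                }
            }
            for (int i = 0; i < n; i++) {
                // Dangling rank is spread evenly so the ranks keep summing to 1.
                next[i] = (1.0 - damping) / n + damping * (next[i] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }
}
```

For a three-page graph where pages 0 and 1 both link to page 2 and page 2 links back to page 0, `PageRank.compute(new int[][]{{2}, {2}, {0}}, 0.85, 50)` ranks page 2 highest and page 1, which has no incoming links, lowest.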

The final score is a hybrid of relevance and popularity, improving both precision and trustworthiness of results.
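
The README does not say how the two scores are combined. One simple possibility, shown here purely as an assumption, is a weighted sum of the two scores after scaling each to the range 0 to 1. The class name, the scaling by the maximum score in the result set, and the `alpha` parameter are all hypothetical.

```java
// One possible way to blend relevance and popularity; an assumption, not Navi's formula.
final class HybridScore {
    // bm25f / maxBm25f:       relevance of this page and the best relevance in the result set
    // pageRank / maxPageRank: popularity of this page and the highest PageRank in the result set
    // alpha:                  share of the final score given to relevance, between 0 and 1
    static double combine(double bm25f, double maxBm25f,
                          double pageRank, double maxPageRank, double alpha) {
        double relevance = maxBm25f > 0 ? bm25f / maxBm25f : 0.0;
        double popularity = maxPageRank > 0 ? pageRank / maxPageRank : 0.0;
        return alpha * relevance + (1.0 - alpha) * popularity;
    }
}
```

Scaling both inputs first keeps either score from dominating just because of its numeric range; with equal relevance, the page with the higher PageRank ends up first.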

Query Engine

Supports single-word, multi-word, and phrase searches (with quotation marks).
Applies stemming for better query matching (e.g., “travel” matches “traveler”).
Displays:
- Page title
- URL
- Snippet with query terms in bold
Efficient performance:
- Search time: ~0.5 seconds per query.
Includes:
- Pagination
- Popular query suggestions (autocomplete)
- Boolean search: AND, OR, NOT operators (max 2 per query)
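
The three query forms above could be told apart along the following lines. This is a sketch under assumptions: the class `QueryParser` and its types are hypothetical, operators are recognized only in upper case, straight double quotes mark a phrase, and stemming is left out. The limit of two operators mirrors the feature list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative query classifier; names and rules are assumptions, not Navi's code.
final class QueryParser {
    enum Kind { PHRASE, BOOLEAN, KEYWORDS }

    static final class Parsed {
        final Kind kind;
        final List<String> terms;      // lower-cased words, operands, or one phrase
        final List<String> operators;  // AND / OR / NOT in order of appearance

        Parsed(Kind kind, List<String> terms, List<String> operators) {
            this.kind = kind;
            this.terms = terms;
            this.operators = operators;
        }
    }

    private static final List<String> OPERATORS = Arrays.asList("AND", "OR", "NOT");
    private static final int MAX_OPERATORS = 2; // limit stated in the feature list

    static Parsed parse(String raw) {
        String q = raw.trim();
        // A query wrapped in exactly one pair of double quotes is a phrase search.
        if (q.length() >= 2 && q.charAt(0) == '"' && q.indexOf('"', 1) == q.length() - 1) {
            String phrase = q.substring(1, q.length() - 1).toLowerCase(Locale.ROOT);
            return new Parsed(Kind.PHRASE, Arrays.asList(phrase), new ArrayList<String>());
        }
        List<String> operands = new ArrayList<String>();
        List<String> ops = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (String token : q.split("\\s+")) {
            if (OPERATORS.contains(token)) {
                ops.add(token);
                if (current.length() > 0) {
                    operands.add(current.toString());
                    current.setLength(0);
                }
            } else if (!token.isEmpty()) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(token.toLowerCase(Locale.ROOT));
            }
        }
        if (current.length() > 0) {
            operands.add(current.toString());
        }
        if (ops.size() > MAX_OPERATORS) {
            throw new IllegalArgumentException(
                    "at most " + MAX_OPERATORS + " Boolean operators per query");
        }
        if (ops.isEmpty()) {
            // Plain keyword search: one term per word.
            List<String> words = operands.isEmpty()
                    ? new ArrayList<String>()
                    : Arrays.asList(operands.get(0).split(" "));
            return new Parsed(Kind.KEYWORDS, words, ops);
        }
        return new Parsed(Kind.BOOLEAN, operands, ops);
    }
}
```

For example, `QueryParser.parse("cats AND dogs NOT fish")` is classified as `BOOLEAN` with operands `cats`, `dogs`, `fish` and operators `AND`, `NOT`, while a query with three operators is rejected.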

Tech Stack

Backend: Spring Boot
Frontend: React.js, Figma
Database: MongoDB

How to Run

Clone the repository

    git clone https://github.com/AmiraKhalid04/Navi.git
    cd Navi

Run the Backend

    cd backend/navi
    ./mvnw spring-boot:run

Run the Frontend

    cd frontend
    npm install
    npm start

Start Crawling & Indexing
- Run the crawler to fetch pages.
- Run the indexer to store parsed terms and metadata.

Search
- Access the web interface at http://localhost:3000
- Enter search queries and explore ranked results.

Contributors


- Amira Khalid
- Esraa Hassan
- Alyaa Ali
- Hagar Abdelsalam

