Navi is a simple, modular search engine that demonstrates the core components of modern search systems. It features a web crawler, indexer, ranking engine, and query processor, all integrated through a fast and responsive web interface.

Web crawler:

- Begins from a seed set of URLs and recursively fetches HTML pages.
- Parses and follows hyperlinks while respecting `robots.txt`.
- Supports multi-threaded crawling with a configurable thread count.
- Avoids duplicate pages using URL normalization and compact string matching.
- Saves crawling progress so it can resume after a failure.
- Gathers approximately 6,000 HTML pages for indexing.
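
The crawler's duplicate avoidance and `robots.txt` handling can be illustrated with a minimal Python sketch. It is not taken from the Navi repository (whose backend is a Spring Boot project); the names `normalize_url`, `Frontier`, and `robots_from_lines` are hypothetical, and the normalization rules (lowercased host, default port removed, fragment and trailing slash dropped) are assumptions about what "URL normalization" covers.

```python
import posixpath
import threading
from collections import deque
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Return a canonical spelling so trivially different URLs compare equal."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is not the default for the scheme.
    port = parts.port
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    # Collapse "." and ".." segments; an empty path becomes "/".
    path = posixpath.normpath(parts.path) if parts.path else "/"
    # The fragment never reaches the server, so it is dropped.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


class Frontier:
    """Queue of URLs to crawl that silently rejects already-seen URLs."""

    def __init__(self, seeds):
        self._lock = threading.Lock()  # several crawler threads may call add()
        self.seen = set()
        self.queue = deque()
        for seed in seeds:
            self.add(seed)

    def add(self, url: str) -> bool:
        key = normalize_url(url)
        with self._lock:
            if key in self.seen:
                return False
            self.seen.add(key)
            self.queue.append(key)
            return True


def robots_from_lines(lines):
    """Build a robots.txt parser from already-downloaded lines."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser
```

A crawler thread would call `parser.can_fetch(user_agent, url)` before fetching and `frontier.add(link)` for every extracted link. Persisting `seen` and `queue` to disk at intervals is one way to make the crawl resumable.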

Indexer:

- Extracts and stores terms from HTML documents, distinguishing between:
  - Title
  - Headings
  - Body
- Stores indexed data persistently in MongoDB for efficient access.
- Supports incremental updates for newly crawled pages.
- Designed for fast retrieval of matching documents and field-based term weighting.
- Indexing time: ~10 minutes for 6,000 documents.
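
As a rough illustration of field-aware indexing, the Python sketch below builds an in-memory inverted index that records term counts separately for title, heading, and body text. It is an assumption-laden stand-in, not Navi's code: the names `FieldExtractor`, `new_index`, and `index_document` are hypothetical, the tokenizer is a simple regular expression, and a plain dictionary takes the place of MongoDB.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIPPED_TAGS = {"script", "style"}
TOKEN_RE = re.compile(r"[a-z0-9]+")


class FieldExtractor(HTMLParser):
    """Collects lowercase tokens into title, heading, and body buckets."""

    def __init__(self):
        super().__init__()
        self.fields = {"title": [], "heading": [], "body": []}
        self._open = []  # stack of currently open tags that change the field

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in HEADING_TAGS or tag in SKIPPED_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        current = self._open[-1] if self._open else None
        if current in SKIPPED_TAGS:
            return  # ignore JavaScript and CSS
        if current == "title":
            field = "title"
        elif current in HEADING_TAGS:
            field = "heading"
        else:
            field = "body"
        self.fields[field].extend(TOKEN_RE.findall(data.lower()))


def new_index():
    """index[term][doc_id][field] -> term frequency in that field."""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(int)))


def index_document(index, doc_id, html):
    """Add one document to the index and return its per-field lengths."""
    extractor = FieldExtractor()
    extractor.feed(html)
    extractor.close()
    for field, terms in extractor.fields.items():
        for term in terms:
            index[term][doc_id][field] += 1
    return {field: len(terms) for field, terms in extractor.fields.items()}
```

Because `index_document` only adds postings for the document it is given, newly crawled pages can be indexed without rebuilding the postings of earlier ones, which is the essence of an incremental update. The returned field lengths are what a fielded ranking function needs for length normalization.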

The ranking engine combines two scoring mechanisms for robust ranking:

- BM25F (Fielded BM25): relevance-based scoring
  - Field weighting: assigns a different weight to each field:
    - Title: 2.0
    - Heading: 1.5
    - Body: 1.0
  - Term frequency normalization: adjusts term frequency (TF) per field to account for varying field lengths, preventing bias toward longer fields.
  - Field length normalization: uses field-specific length normalization parameters to adjust for differences in field verbosity.
- PageRank: popularity-based scoring
  - Computes the global importance of pages based on link structure.
  - Used to boost commonly cited or authoritative sources.

The final score is a hybrid of relevance and popularity, improving both precision and trustworthiness of results.
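
The two scores and their combination can be sketched in Python as follows. This is a hedged illustration rather than Navi's implementation. The field weights are the ones listed above, but the `K1` and `FIELD_B` values, the damping factor, the iteration count, and the linear mix in `hybrid_score` are assumptions chosen for the example, and all function names are hypothetical.

```python
import math

FIELD_WEIGHTS = {"title": 2.0, "heading": 1.5, "body": 1.0}  # weights listed above
FIELD_B = {"title": 0.75, "heading": 0.75, "body": 0.75}     # assumed length normalization
K1 = 1.2                                                      # assumed saturation parameter


def bm25f(query_terms, doc_tf, doc_len, avg_len, df, n_docs):
    """Score one document.

    doc_tf:  {term: {field: term frequency}}
    doc_len: {field: length of this document's field}
    avg_len: {field: average field length over the collection}
    df:      {term: number of documents containing the term}
    """
    score = 0.0
    for term in query_terms:
        # Combine per-field frequencies into one length-normalized pseudo-frequency.
        pseudo_tf = 0.0
        for field, weight in FIELD_WEIGHTS.items():
            tf = doc_tf.get(term, {}).get(field, 0)
            if tf == 0:
                continue
            b = FIELD_B[field]
            norm = 1.0 - b + b * doc_len[field] / avg_len[field]
            pseudo_tf += weight * tf / norm
        if pseudo_tf == 0.0:
            continue
        n_t = df.get(term, 0)
        idf = math.log(1.0 + (n_docs - n_t + 0.5) / (n_t + 0.5))
        score += idf * pseudo_tf / (K1 + pseudo_tf)
    return score


def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


def hybrid_score(relevance, popularity, alpha=0.8):
    """Assumed linear mix; alpha controls how much relevance dominates."""
    return alpha * relevance + (1.0 - alpha) * popularity
```

BM25F scores are unbounded while PageRank values sum to 1 across the collection, so a practical mix would usually rescale one or both before combining them.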

Query processor:

- Supports single-word, multi-word, and phrase searches (phrases in quotation marks).
- Applies stemming for better query matching (e.g., “travel” matches “traveler”).
- Displays:
  - Page title
  - URL
  - Snippet with query terms in bold
- Efficient performance:
  - Search time: ~0.5 seconds per query.
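
A small Python sketch can make the query handling concrete: splitting a query into quoted phrases and loose terms, stemming, phrase matching, and bolding matches in a snippet. Everything here is illustrative. The `stem` function is a toy suffix stripper standing in for a real stemmer (the source does not say which one Navi uses), and `parse_query`, `contains_phrase`, and `highlight` are hypothetical names.

```python
import re

PHRASE_OR_WORD = re.compile(r'"([^"]+)"|(\S+)')
SUFFIXES = ("ers", "er", "ing", "ed", "s")  # toy rule set, not a real stemmer
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def stem(word):
    """Strip one common suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def parse_query(query):
    """Return (phrases, terms); each phrase is a list of stems."""
    phrases, terms = [], []
    for phrase, word in PHRASE_OR_WORD.findall(query):
        if phrase:
            phrases.append([stem(w) for w in phrase.split()])
        else:
            word = word.strip('"')  # tolerate an unbalanced quote
            if word:
                terms.append(stem(word))
    return phrases, terms


def contains_phrase(doc_tokens, phrase):
    """True if the stemmed phrase occurs as consecutive tokens."""
    stems = [stem(token) for token in doc_tokens]
    n = len(phrase)
    return any(stems[i:i + n] == phrase for i in range(len(stems) - n + 1))


def highlight(snippet, query_stems):
    """Wrap every word whose stem matches a query stem in <b> tags."""
    def bold(match):
        word = match.group(0)
        return f"<b>{word}</b>" if stem(word) in query_stems else word
    return WORD_RE.sub(bold, snippet)
```

With this toy stemmer, `stem("traveler")` and `stem("travel")` both yield `travel`, which is how a query for one can match a page containing the other.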

- Includes:
  - Pagination
  - Popular query suggestions (autocomplete)
  - Boolean search: `AND`, `OR`, `NOT` operators (max 2 per query)
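
The three features above can be sketched in a few lines of Python. This is a guess at reasonable behavior, not a description of Navi's code. It assumes single-word operands, strict left-to-right evaluation, and a binary `NOT` (`a NOT b` meaning "a but not b"); the names `boolean_search`, `suggest`, and `paginate` are hypothetical.

```python
from collections import Counter

OPERATORS = {"AND", "OR", "NOT"}
MAX_OPERATORS = 2  # limit stated in the feature list


def boolean_search(query, postings):
    """Evaluate 'term OP term [OP term]' against {term: set of doc ids}."""
    tokens = query.split()
    if not tokens or len(tokens) % 2 == 0:
        raise ValueError("expected: term (OPERATOR term)*")
    operators = tokens[1::2]
    if any(op not in OPERATORS for op in operators):
        raise ValueError("operators must be AND, OR, or NOT")
    if len(operators) > MAX_OPERATORS:
        raise ValueError(f"at most {MAX_OPERATORS} operators per query")
    result = set(postings.get(tokens[0].lower(), set()))
    for op, term in zip(operators, tokens[2::2]):
        docs = postings.get(term.lower(), set())
        if op == "AND":
            result &= docs
        elif op == "OR":
            result |= docs
        else:  # NOT
            result -= docs
    return result


def suggest(prefix, query_log, limit=5):
    """Most frequent past queries that start with the typed prefix."""
    prefix = prefix.lower()
    counts = Counter(q.lower() for q in query_log)
    matches = [(q, c) for q, c in counts.items() if q.startswith(prefix)]
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]


def paginate(results, page, page_size=10):
    """Return the slice of results for a 1-based page number."""
    start = (page - 1) * page_size
    return results[start:start + page_size]
```

For example, with postings for `cat`, `dog`, and `fish`, the query `cat AND dog NOT fish` first intersects the `cat` and `dog` sets and then removes every document that contains `fish`.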

To run Navi locally:

- Clone the repository

      git clone https://github.com/AmiraKhalid04/Navi.git
      cd Navi

- Run the backend

      cd backend/navi
      ./mvnw spring-boot:run

- Run the frontend

      cd frontend
      npm install
      npm start

- Start crawling and indexing
  - Run the crawler to fetch pages.
  - Run the indexer to store parsed terms and metadata.
- Search
  - Access the web interface at http://localhost:3000.
  - Enter search queries and explore ranked results.

| Amira Khalid | Esraa Hassan | Alyaa Ali | Hagar Abdelsalam |
