A web-based document search engine for TXT and PDF files using BM25 ranking, with lemmatization, synonym expansion, and visualizations.
Upload TXT or PDF documents. Text is tokenized and preprocessed with NLTK. Lemmatization reduces plurals and other inflected forms to a base form. BM25 ranking scores each document's relevance to a query. The top results are displayed with document previews and BM25 scores. Top 20 words bar chart: the most frequent words across all documents.
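To make the BM25 ranking step concrete, here is a minimal standard-library sketch of Okapi BM25 scoring. It is not necessarily how this project implements ranking: the function names `tokenize` and `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, and the smoothed IDF variant are all assumptions, and the regex tokenizer is only a stand-in for the NLTK tokenization and lemmatization described above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters. Stand-in for NLTK tokenization
    # and lemmatization, which this sketch does not perform.
    return re.findall(r"[a-z]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the query (assumed defaults)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1.0" keeps the value non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "The cat sat on the mat.",
    "Dogs chase cats in the park.",
    "Stock prices fell sharply today.",
]
scores = bm25_scores("cat", docs)
# Rank document indices from most to least relevant.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking[0], scores)
```

In this example only the first document gets a non-zero score, because "cats" in the second document does not match "cat" without lemmatization. That gap is what the lemmatization step is meant to close.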
Synonym expansion: query terms are expanded using NLTK WordNet. Word cloud: a visual representation of frequently occurring words.
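The synonym expansion step could look roughly like the sketch below. The helper names `expand_query`, `wordnet_synonyms`, and `toy_synonyms`, the `max_per_term` cap, and the `TOY_SYNONYMS` table are all hypothetical. The toy table exists only so the example runs without the WordNet corpus installed, and `wordnet_synonyms` shows how NLTK's `wordnet.synsets` and `lemma_names` might be plugged in instead.

```python
def expand_query(terms, synonyms_for, max_per_term=3):
    """Return the original terms followed by up to max_per_term new synonyms each."""
    expanded = list(terms)
    seen = set(terms)
    for term in terms:
        added = 0
        for synonym in synonyms_for(term):
            if added >= max_per_term:
                break
            if synonym not in seen:
                seen.add(synonym)
                expanded.append(synonym)
                added += 1
    return expanded


def wordnet_synonyms(term):
    # Needs NLTK plus the WordNet corpus (nltk.download("wordnet")).
    # Imported lazily so the rest of the sketch runs without NLTK.
    from nltk.corpus import wordnet

    names = []
    for synset in wordnet.synsets(term):
        for name in synset.lemma_names():
            # WordNet joins multi-word lemmas with underscores.
            names.append(name.replace("_", " ").lower())
    return names


# Hypothetical stand-in data, not taken from WordNet.
TOY_SYNONYMS = {
    "car": ["auto", "automobile", "machine", "motorcar"],
    "fast": ["quick", "rapid"],
}


def toy_synonyms(term):
    return TOY_SYNONYMS.get(term, [])


print(expand_query(["fast", "car"], toy_synonyms))
# ['fast', 'car', 'quick', 'rapid', 'auto', 'automobile', 'machine']
```

Capping the number of synonyms per term is a design choice assumed here: unrestricted expansion tends to pull in unrelated word senses, which can dilute the BM25 ranking.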