Mini Search Engine

A simple command-line search engine for plain-text stories, with color-coded output. It builds an inverted index with lemmatization and stopword removal, runs entirely locally, and is designed to be easy to extend.

Features

Inverted Index: Fast word and multi-word search using lemmatization.
Stopword Removal: Ignores common English stopwords for smarter search.
Scalable: Add new .txt files to the documents/ folder and rerun to update the index.
Colorful Terminal Output: Results and prompts are color-coded for clarity.
Unique Results: Only unique document titles are shown, with a preview of the first line (up to 50 characters).
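
To illustrate how an inverted index with stopword removal works, here is a minimal sketch. It is not the code in src/inverted_index.py: the tiny `STOPWORDS` set and the `normalize()` helper are assumptions that stand in for NLTK's English stopword list and lemmatizer, so the example runs without any downloads.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; the project itself uses NLTK's English stopwords.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to", "was"}


def normalize(word):
    """Crude stand-in for lemmatization: lowercase and strip a plural 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def tokenize(text):
    """Split text into normalized terms, dropping stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [normalize(w) for w in words if w not in STOPWORDS]


def build_index(documents):
    """Map each term to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


documents = {
    "the_fox.txt": "The fox jumps over the sleeping dogs.",
    "two_dogs.txt": "A dog and a fox were friends.",
}
index = build_index(documents)
print(sorted(index["dog"]))  # both files: "dogs" and "dog" normalize to the same term
```

Because both the documents and the query pass through the same `tokenize()` step, a search for "dogs" and a search for "dog" hit the same index entry.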

Usage

Install dependencies (in your virtual environment):
```
pip install -r requirements.txt
```
Run the search engine:
```
python src/main.py
```
Add your stories:
- Place your .txt files in the documents/ folder. Each file is a separate document.
Search:
- Enter a word or phrase at the prompt. The engine will show up to 2 unique matching documents, with the title and a preview.
- Type exit to quit.
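
A rough sketch of how a query could be resolved against the index is shown below. The `search()` function here is hypothetical, not the real DocumentManager.search(). It assumes that a multi-word query must match every term (the README does not say whether terms are combined with AND or OR), and that the query terms have already been lemmatized and stripped of stopwords.

```python
def search(index, query_terms, limit=2):
    """Return up to `limit` documents that contain every query term."""
    matches = None
    for term in query_terms:
        docs = index.get(term, set())
        # Intersect with the documents found so far (AND semantics, an assumption).
        matches = docs if matches is None else matches & docs
    # Sets already guarantee uniqueness; sorting gives a stable order.
    return sorted(matches or [])[:limit]


# Hypothetical pre-built index: term -> set of document names.
index = {
    "fox": {"the_fox.txt", "two_friends.txt", "forest.txt"},
    "dog": {"two_friends.txt", "forest.txt"},
}

print(search(index, ["fox"]))         # three matches, capped at two
print(search(index, ["fox", "dog"]))  # only documents containing both terms
print(search(index, ["cat"]))         # no match: empty list
```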

Project Structure

Mini-Search-Engine/
├── documents/           # Your .txt story files go here
├── nltk_data/           # NLTK resources (auto-managed)
├── src/
│   ├── main.py          # Entry point
│   ├── document_manager.py
│   ├── inverted_index.py
│   └── utils/
│       └── terminal_utils.py
├── .gitignore
├── requirements.txt
└── README.md

Customization

Add more stories: Just drop new .txt files in documents/ and rerun.
Change preview length: Edit the preview length in src/main.py (default: 50 characters).
Change number of results: Edit the return value in DocumentManager.search() (default: 2).
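
New stories are picked up on rerun because the folder is scanned at startup. A minimal sketch of such a scan, assuming a hypothetical `load_documents()` helper (the actual loading code in src/document_manager.py may differ):

```python
import tempfile
from pathlib import Path


def load_documents(folder):
    """Read every .txt file in `folder` into a {filename: text} dict."""
    documents = {}
    for path in sorted(Path(folder).glob("*.txt")):
        documents[path.name] = path.read_text(encoding="utf-8")
    return documents


# Demo in a temporary folder so the example does not touch documents/.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "first_story.txt").write_text("Once upon a time.", encoding="utf-8")
    Path(folder, "notes.md").write_text("Not indexed.", encoding="utf-8")
    documents = load_documents(folder)

print(list(documents))  # only the .txt file is loaded
```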

Notes

All NLTK data is stored locally in nltk_data/ (see .gitignore).
Only English stopwords are used for filtering queries.
Document titles are shown in uppercase, with underscores and .txt removed.
The preview shows only the first line of each document, truncated to 50 characters with a trailing ... if the line is longer.
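
The title and preview rules above can be sketched as follows. `format_title()` and `make_preview()` are hypothetical names, and the sketch assumes underscores are replaced with spaces rather than deleted outright.

```python
def format_title(filename):
    """Turn 'the_little_fox.txt' into 'THE LITTLE FOX'."""
    name = filename[:-4] if filename.endswith(".txt") else filename
    # Assumption: underscores become spaces so the words stay separated.
    return name.replace("_", " ").upper()


def make_preview(text, max_length=50):
    """Return the first line, cut to `max_length` characters plus '...' if longer."""
    lines = text.splitlines()
    first_line = lines[0].strip() if lines else ""
    if len(first_line) > max_length:
        return first_line[:max_length] + "..."
    return first_line


print(format_title("the_little_fox.txt"))
print(make_preview("A short first line.\nThe second line is never shown."))
```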

License

MIT License
