A lightweight desktop search engine built in C++, using wxWidgets for the GUI and libcurl for fetching pages. It stores fetched pages as plain-text files and provides fast offline full-text search with context snippets, which makes it suitable as a recruiter-facing demo or a small offline search utility.
- Fetch HTTP(S) pages using libcurl and save them as plain-text .txt files (SHA-256 hashed filenames).
- Offline full-text search across saved files with filename, line number, and snippet preview.
- Search options: case-sensitive toggle and whole-word matching.
- Simple 3-pane GUI (downloads, fetcher, search/results + preview) built with wxWidgets.
- No external database: data is stored under <app_data>/downloads.
- Context actions: Open file, Copy snippet, Reveal in Explorer.
- Extensible: designed for adding multithreaded crawling, inverted index, or SQLite metadata.
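The search options listed above (case-sensitive toggle, whole-word matching, line number, snippet) can be illustrated with a small standard-library sketch. This is not the project's actual code: `SearchOptions`, `Match`, `search_text`, and the `context` parameter are hypothetical names chosen for this example, and word boundaries are assumed to be ASCII letters, digits, and underscores.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types for illustration; the project's real names may differ.
struct SearchOptions {
    bool case_sensitive = false;
    bool whole_word = false;
};

struct Match {
    std::size_t line_number;  // 1-based line number of the hit
    std::string snippet;      // text surrounding the first hit on that line
};

namespace detail {
inline bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}
inline std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}
}  // namespace detail

// Scans `text` line by line and returns at most one Match per line.
inline std::vector<Match> search_text(const std::string& text,
                                      const std::string& query,
                                      const SearchOptions& opts,
                                      std::size_t context = 20) {
    std::vector<Match> results;
    if (query.empty()) return results;
    const std::string needle =
        opts.case_sensitive ? query : detail::to_lower(query);

    std::istringstream in(text);
    std::string line;
    std::size_t line_no = 0;
    while (std::getline(in, line)) {
        ++line_no;
        // ASCII lowercasing keeps the length, so offsets match `line`.
        const std::string hay =
            opts.case_sensitive ? line : detail::to_lower(line);
        for (std::size_t pos = hay.find(needle); pos != std::string::npos;
             pos = hay.find(needle, pos + 1)) {
            const std::size_t end = pos + needle.size();
            const bool left_ok = pos == 0 || !detail::is_word_char(hay[pos - 1]);
            const bool right_ok = end == hay.size() || !detail::is_word_char(hay[end]);
            if (opts.whole_word && !(left_ok && right_ok)) continue;
            // Keep up to `context` characters on each side of the hit.
            const std::size_t start = pos > context ? pos - context : 0;
            results.push_back(
                {line_no, line.substr(start, (pos - start) + needle.size() + context)});
            break;  // one result per line is enough for a results list
        }
    }
    return results;
}
```

Calling `search_text(contents, "cat", SearchOptions{false, true})` on the contents of one saved file would return the lines where "cat" appears as a whole word, ignoring case. A real implementation would also carry the filename alongside each match.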
Prerequisites
- C++17 toolchain (g++, clang, or MSVC)
- wxWidgets (development headers)
- libcurl (development headers)
- CMake (optional but recommended)
Linux (Ubuntu/Debian example)
sudo apt update
sudo apt install build-essential cmake libwxgtk3.0-gtk3-dev libcurl4-openssl-dev
macOS (Homebrew)
brew install wxwidgets curl cmake
Windows (MSYS2 / MinGW recommended)
- Install MSYS2, then:
pacman -S mingw-w64-x86_64-toolchain mingw-w64-x86_64-wxWidgets mingw-w64-x86_64-curl
Build (CMake recommended)
git clone https://github.com/your-username/your-repo.git
cd your-repo
mkdir build && cd build
cmake ..
cmake --build . --config Release
# Resulting binary: search_engine (or search_engine.exe)
Quick single-file build (example)
# Linux/macOS using wx-config
g++ main.cpp `wx-config --cxxflags --libs` -lcurl -std=c++17 -o search_engine
Output location
- Saved pages: <app_data>/downloads/page_.txt
Each file contains the original URL, fetch timestamp, and plain-text content.
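As a rough illustration of such a file, the sketch below writes and reads a page with two header lines followed by a blank line and the content. The exact on-disk layout is not documented here, so the `URL: ` and `Fetched: ` prefixes, the `SavedPage` struct, and the `write_page` / `read_page` helpers are all assumptions made for this example.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical in-memory form of one saved page.
struct SavedPage {
    std::string url;
    std::string fetched_at;  // timestamp kept as text, format not prescribed
    std::string content;     // plain-text body
};

// Writes the assumed layout: two header lines, a blank line, then the body.
inline void write_page(std::ostream& out, const SavedPage& page) {
    out << "URL: " << page.url << '\n'
        << "Fetched: " << page.fetched_at << '\n'
        << '\n'
        << page.content;
}

// Parses the assumed layout; returns false if the header is malformed.
inline bool read_page(std::istream& in, SavedPage& page) {
    const std::string url_key = "URL: ";
    const std::string time_key = "Fetched: ";
    std::string url_line, time_line, blank;
    if (!std::getline(in, url_line) ||
        url_line.compare(0, url_key.size(), url_key) != 0) {
        return false;
    }
    if (!std::getline(in, time_line) ||
        time_line.compare(0, time_key.size(), time_key) != 0) {
        return false;
    }
    if (!std::getline(in, blank) || !blank.empty()) return false;

    page.url = url_line.substr(url_key.size());
    page.fetched_at = time_line.substr(time_key.size());
    std::ostringstream rest;
    rest << in.rdbuf();  // everything after the blank line is the body
    page.content = rest.str();
    return true;
}
```

Keeping the header as plain text means the saved files stay readable in any editor, and the search code can skip the first three lines if header matches are not wanted.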
Start the app (GUI)
./search_engine
Typical workflow
- Enter a URL in the fetcher pane and click Fetch. The app downloads and saves the page.
- Switch to the Search pane and type a keyword or phrase.
- Toggle the case-sensitivity and whole-word options if needed.
- Click a result to view the file at the matching line. Use the context menu to Open, Copy snippet, or Reveal in Explorer.
Command-line (headless fetch helper, if included)
# Example helper to fetch a URL and save as text (if provided)
./search_engine --fetch "https://example.com"
Tips
- Saved files are added incrementally and never modified after writing; duplicates are detected via SHA-256.
- Use the downloads pane to remove or re-fetch pages.
Include these assets in the assets/ folder of the repo (or place repository screenshots under docs/ as shown):
- docs/screenshot_gui.png — full GUI screenshot (committed to docs/)
- assets/ui_fetcher.png — fetcher panel
- assets/ui_search.png — search & results
- assets/demo.gif — short GIF showing fetch → search → preview
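Example markdown to show images:
![Full GUI screenshot](docs/screenshot_gui.png)
![Fetcher panel](assets/ui_fetcher.png)
![Search and results](assets/ui_search.png)
![Fetch, search, preview demo](assets/demo.gif)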
Guidelines
- Fork the repo, create a feature branch: git checkout -b feat/short-desc
- Keep PRs focused and include a concise description of changes.
- Add tests for new logic where applicable (see Tests section).
- Follow existing code style and comment non-obvious logic.
- For UI changes, include updated screenshots or GIFs.
Suggested improvements
- Add inverted index for fast queries
- Parallel fetcher with polite rate-limiting
- Highlight matched snippets in preview
- Optional SQLite store for metadata and query stats
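To give a feel for the first suggestion, here is a minimal in-memory inverted index: a map from each lowercase term to the set of documents containing it. It is only a sketch under stated assumptions. The `InvertedIndex` class, the numeric document ids, and the tokenizer (ASCII alphanumeric runs) are hypothetical and not part of the current code.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: maps each term to the ids of documents containing it.
class InvertedIndex {
public:
    // Splits `text` into lowercase alphanumeric terms and records `doc_id`
    // in the posting set of each term.
    void add_document(std::size_t doc_id, const std::string& text) {
        std::string term;
        for (char raw : text) {
            const unsigned char c = static_cast<unsigned char>(raw);
            if (std::isalnum(c)) {
                term.push_back(static_cast<char>(std::tolower(c)));
            } else if (!term.empty()) {
                postings_[term].insert(doc_id);
                term.clear();
            }
        }
        if (!term.empty()) postings_[term].insert(doc_id);
    }

    // Returns the sorted ids of documents containing `term`, ignoring case.
    std::vector<std::size_t> lookup(const std::string& term) const {
        std::string key;
        for (char raw : term) {
            key.push_back(static_cast<char>(
                std::tolower(static_cast<unsigned char>(raw))));
        }
        const auto it = postings_.find(key);
        if (it == postings_.end()) return {};
        return std::vector<std::size_t>(it->second.begin(), it->second.end());
    }

private:
    std::map<std::string, std::set<std::size_t>> postings_;
};
```

With an index like this, a single-term query becomes one map lookup instead of a scan over every saved file. The line-by-line scan would still be needed afterwards to produce line numbers and snippets for the matching files.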
If tests are included, run them from the build directory:
# Example for CTest
ctest --output-on-failure
Recommended test areas
- HTML → plain-text normalization
- SHA-256 filename generation (duplicate detection)
- Line-by-line search behavior: case/whole-word correctness
- File I/O: read/write integrity across platforms
Consider adding a small unit-test harness using Catch2 or GoogleTest for parser and search logic.
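As a starting point for the HTML → plain-text test area, the function below is a deliberately naive normalizer that such tests could target. It is not the project's parser: `html_to_text` and `replace_all` are hypothetical names, only four common entities are decoded, and `<script>` / `<style>` bodies are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Replaces every occurrence of `from` in `s` with `to`.
inline void replace_all(std::string& s, const std::string& from,
                        const std::string& to) {
    for (std::size_t pos = s.find(from); pos != std::string::npos;
         pos = s.find(from, pos + to.size())) {
        s.replace(pos, from.size(), to);
    }
}

// Naive HTML-to-text pass: drops tags, collapses whitespace while keeping
// line breaks, then decodes a few entities.
inline std::string html_to_text(const std::string& html) {
    // 1. Drop everything between '<' and '>', leaving a space in its place.
    //    (This also splits words around inline tags, a known limitation.)
    std::string stripped;
    bool in_tag = false;
    for (char c : html) {
        if (c == '<') {
            in_tag = true;
            stripped.push_back(' ');
        } else if (c == '>' && in_tag) {
            in_tag = false;
        } else if (!in_tag) {
            stripped.push_back(c);
        }
    }

    // 2. Collapse whitespace runs: a run containing '\n' becomes one '\n',
    //    any other run becomes one space; leading/trailing runs are dropped.
    std::string out;
    bool pending_ws = false;
    bool pending_nl = false;
    for (char c : stripped) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            pending_ws = true;
            if (c == '\n') pending_nl = true;
        } else {
            if (pending_ws && !out.empty()) out.push_back(pending_nl ? '\n' : ' ');
            pending_ws = false;
            pending_nl = false;
            out.push_back(c);
        }
    }

    // 3. Decode entities; "&amp;" goes last so "&amp;lt;" yields "&lt;".
    replace_all(out, "&lt;", "<");
    replace_all(out, "&gt;", ">");
    replace_all(out, "&quot;", "\"");
    replace_all(out, "&amp;", "&");
    return out;
}
```

Line breaks are preserved on purpose, since the search reports results by line number. A unit test in Catch2 or GoogleTest could feed small HTML fragments through a function like this and compare against the expected text.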
MIT License — see LICENSE file for full text.