|
| 1 | +# 🕸️ wxWidgets Search Engine with Web Crawler |
| 2 | + |
1 | 3 | <p align="center"> |
2 | | - <h1>🕸️ wxWidgets Search Engine with Web Crawler</h1> |
3 | | - <p> |
4 | | - <a href="https://github.com/your-username/your-repo"><img alt="build" src="https://img.shields.io/badge/build-passing-brightgreen" /></a> |
5 | | - <a href="https://github.com/your-username/your-repo/actions"><img alt="ci" src="https://img.shields.io/github/actions/workflow/status/your-username/your-repo/ci.yml?branch=main&label=CI" /></a> |
6 | | - <a href="https://github.com/your-username/your-repo/releases"><img alt="release" src="https://img.shields.io/github/v/release/your-username/your-repo" /></a> |
7 | | - <a href="LICENSE"><img alt="license" src="https://img.shields.io/badge/license-MIT-blue" /></a> |
8 | | - </p> |
| 4 | + <a href="https://github.com/your-username/your-repo"><img alt="build" src="https://img.shields.io/badge/build-passing-brightgreen" /></a> |
| 5 | + <a href="https://github.com/your-username/your-repo/actions"><img alt="ci" src="https://img.shields.io/github/actions/workflow/status/your-username/your-repo/ci.yml?branch=main&label=CI" /></a> |
| 6 | + <a href="https://img.shields.io/badge/license-MIT-blue"><img alt="license" src="https://img.shields.io/badge/license-MIT-blue" /></a> |
| 7 | + <img alt="language" src="https://img.shields.io/badge/language-C%2B%2B-blue" /> |
9 | 8 | </p> |
10 | 9 |
|
11 | | -A lightweight C++ desktop application using wxWidgets and libcurl for downloading web pages and searching them locally. Ideal as a mini offline search engine and GUI demo for recruiters. |
12 | | - |
13 | | ---- |
14 | | - |
15 | | -## Table of Contents |
16 | | - |
17 | | -- Features |
18 | | -- Preview |
19 | | -- Tech Stack |
20 | | -- Quick Start |
21 | | -- Build & Run |
22 | | -- How It Works |
23 | | -- Code Overview |
24 | | -- Contributing |
25 | | -- Author & License |
26 | | -- Keywords |
| 10 | +Lightweight desktop search engine built in C++ using wxWidgets for GUI and libcurl for fetching pages. Stores fetched pages as plain-text files and provides fast offline full-text search with context snippets — ideal as a recruiter-facing demo or a small offline search utility. |
27 | 11 |
|
28 | 12 | --- |
29 | 13 |
|
30 | | -## Features |
| 14 | +## 🚀 Features |
31 | 15 |
|
32 | | -- Download web pages via libcurl and save as plain-text `.txt` files (hashed filenames). |
33 | | -- Offline full-text search across downloaded files with filename, line number, and context snippet. |
34 | | -- Simple, responsive GUI built with wxWidgets. |
35 | | -- Self-contained: no external DB — data saved under app data/downloads. |
36 | | -- Search options: case sensitivity and whole-word matching. |
37 | | -- Keyboard shortcuts and context actions (Open, Copy snippet, Reveal in Explorer). |
| 16 | +- Fetch HTTP(S) pages using libcurl and save as plain-text `.txt` files (SHA-256 hashed filenames). |
| 17 | +- Offline full-text search across saved files with filename, line number, and snippet preview. |
| 18 | +- Search options: case-sensitive toggle and whole-word matching. |
| 19 | +- Simple 3-pane GUI (downloads, fetcher, search/results + preview) built with wxWidgets. |
| 20 | +- No external DB — data stored under app data/downloads. |
| 21 | +- Context actions: Open file, Copy snippet, Reveal in Explorer. |
| 22 | +- Extensible: designed for adding multithreaded crawling, inverted index, or SQLite metadata. |
38 | 23 |
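The case-sensitive toggle and whole-word option listed above can be illustrated with a small line matcher. This is a minimal sketch, not the project's actual code: the names `SearchOptions` and `find_match` are hypothetical, case folding is ASCII-only, and a "word" is assumed to be a run of letters, digits, or underscores.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical option set mirroring the two toggles described in the README.
struct SearchOptions {
    bool case_sensitive = false;
    bool whole_word = false;
};

// Assumption: word characters are ASCII letters, digits, and '_'.
static bool is_word_char(unsigned char c) {
    return std::isalnum(c) != 0 || c == '_';
}

// ASCII-only lowercasing; real text may need locale- or Unicode-aware folding.
static std::string to_lower_ascii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Returns the offset of the first match in `line`, or std::string::npos.
std::size_t find_match(const std::string& line, const std::string& query,
                       const SearchOptions& opt) {
    if (query.empty()) return std::string::npos;
    const std::string hay = opt.case_sensitive ? line : to_lower_ascii(line);
    const std::string needle = opt.case_sensitive ? query : to_lower_ascii(query);

    std::size_t pos = hay.find(needle);
    while (pos != std::string::npos) {
        if (!opt.whole_word) return pos;
        // A whole-word hit must not touch a word character on either side.
        const std::size_t end = pos + needle.size();
        const bool left_ok =
            pos == 0 || !is_word_char(static_cast<unsigned char>(hay[pos - 1]));
        const bool right_ok =
            end >= hay.size() || !is_word_char(static_cast<unsigned char>(hay[end]));
        if (left_ok && right_ok) return pos;
        pos = hay.find(needle, pos + 1);
    }
    return std::string::npos;
}
```

Lowercasing both strings keeps the matcher simple at the cost of one copy per line; that is likely acceptable for a small local corpus, but a larger one may want an index instead.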
|
39 | 24 | --- |
40 | 25 |
|
41 | | -## Preview |
| 26 | +## 🛠️ Installation |
42 | 27 |
|
43 | | -<p align="center"> |
44 | | - <img src="assets/ui_fetcher.png" alt="Fetcher panel" width="520" /> |
45 | | - <br /> |
46 | | - <img src="assets/ui_search.png" alt="Search & Results panel" width="520" /> |
47 | | -</p> |
| 28 | +Prerequisites |
| 29 | +- C++17 toolchain (g++, clang, or MSVC) |
| 30 | +- wxWidgets (development headers) |
| 31 | +- libcurl (development headers) |
| 32 | +- CMake (optional but recommended) |
48 | 33 |
|
49 | | ---- |
| 34 | +Linux (Ubuntu/Debian example) |
| 35 | +```bash |
| 36 | +sudo apt update |
| 37 | +sudo apt install build-essential cmake libwxgtk3.0-gtk3-dev libcurl4-openssl-dev |
| 38 | +``` |
50 | 39 |
|
51 | | -## Tech Stack |
| 40 | +macOS (Homebrew) |
| 41 | +```bash |
| 42 | +brew install wxwidgets curl cmake |
| 43 | +``` |
52 | 44 |
|
53 | | -| Component | Technology | |
54 | | -| -------------- | ---------- | |
55 | | -| GUI | wxWidgets | |
56 | | -| HTTP Fetching | libcurl | |
57 | | -| File Handling | C++17 <filesystem> | |
58 | | -| Language | C++ (STL) | |
59 | | -| Build | g++, clang, or MSVC | |
| 45 | +Windows (MSYS2 / MinGW recommended) |
| 46 | +- Install MSYS2, then: |
| 47 | +```bash |
| 48 | +pacman -S mingw-w64-x86_64-toolchain mingw-w64-x86_64-wxWidgets mingw-w64-x86_64-curl |
| 49 | +``` |
60 | 50 |
|
61 | | ---- |
| 51 | +Build (CMake recommended) |
| 52 | +```bash |
| 53 | +git clone https://github.com/your-username/your-repo.git |
| 54 | +cd your-repo |
| 55 | +mkdir build && cd build |
| 56 | +cmake .. |
| 57 | +cmake --build . --config Release |
| 58 | +# Resulting binary: search_engine (or search_engine.exe) |
| 59 | +``` |
62 | 60 |
|
63 | | -## Quick Start (Developer-friendly) |
| 61 | +Quick single-file build (example) |
| 62 | +```bash |
| 63 | +# Linux/macOS using wx-config |
| 64 | +g++ main.cpp `wx-config --cxxflags --libs` -lcurl -std=c++17 -o search_engine |
| 65 | +``` |
64 | 66 |
|
65 | | -1. Clone the repo |
66 | | - git clone https://github.com/your-username/your-repo.git |
67 | | -2. Install dependencies: wxWidgets, libcurl, C++17 toolchain |
68 | | -3. Build and run (examples below) |
| 67 | +Output location |
| 68 | +- Saved pages: <app_data>/downloads/page_<sha256>.txt |
| 69 | +Each file contains the original URL, fetch timestamp, and plain-text content. |
69 | 70 |
|
70 | 71 | --- |
71 | 72 |
|
72 | | -## Build & Run |
| 73 | +## 💡 Usage |
73 | 74 |
|
74 | | -Linux / macOS (example) |
| 75 | +Start the app (GUI) |
75 | 76 | ```bash |
76 | | -sudo apt install libwxgtk3.0-gtk3-dev libcurl4-openssl-dev |
77 | | -g++ main.cpp `wx-config --cxxflags --libs` -lcurl -std=c++17 -o search_engine |
78 | 77 | ./search_engine |
79 | 78 | ``` |
80 | 79 |
|
81 | | -Windows (MinGW example) |
| 80 | +Typical workflow |
| 81 | +1. Enter a URL in the fetcher pane and click Fetch. The app downloads and saves the page. |
| 82 | +2. Switch to Search pane, type a keyword or phrase. |
| 83 | +3. Toggle case-sensitivity / whole-word options if needed. |
| 84 | +4. Click a result to view the file at the matching line. Use context menu to Open, Copy snippet, or Reveal in Explorer. |
| 85 | + |
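The search step in the workflow above amounts to scanning the saved `.txt` files line by line and reporting filename, line number, and a snippet. The following is a rough sketch of that idea using `std::filesystem`; `Match`, `search_downloads`, and the 40-character context window are illustrative assumptions rather than names or values from the repository, and matching here is plain case-sensitive substring search.

```cpp
#include <algorithm>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <tuple>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical result record: one entry per matching line.
struct Match {
    std::string filename;
    std::size_t line_number;  // 1-based
    std::string snippet;      // text around the first hit on the line
};

// Scans every regular *.txt file directly inside `dir` (non-recursive).
std::vector<Match> search_downloads(const fs::path& dir, const std::string& query) {
    constexpr std::size_t kContext = 40;  // assumed snippet padding per side
    std::vector<Match> results;
    std::error_code ec;
    if (query.empty() || !fs::is_directory(dir, ec)) return results;

    for (const auto& entry : fs::directory_iterator(dir, ec)) {
        if (!entry.is_regular_file() || entry.path().extension() != ".txt") continue;
        std::ifstream in(entry.path());
        std::string line;
        std::size_t n = 0;
        while (std::getline(in, line)) {
            ++n;
            const std::size_t pos = line.find(query);
            if (pos == std::string::npos) continue;
            const std::size_t start = pos > kContext ? pos - kContext : 0;
            results.push_back({entry.path().filename().string(), n,
                               line.substr(start, query.size() + 2 * kContext)});
        }
    }
    // Directory iteration order is unspecified, so sort for a stable listing.
    std::sort(results.begin(), results.end(), [](const Match& a, const Match& b) {
        return std::tie(a.filename, a.line_number) < std::tie(b.filename, b.line_number);
    });
    return results;
}
```

Streaming with `std::getline` keeps memory use bounded by the longest line rather than the file size, which suits a scan-on-demand design with no index.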
| 86 | +Command-line (headless fetch helper, if included) |
82 | 87 | ```bash |
83 | | -g++ main.cpp -std=c++17 -IC:\wxWidgets\include -LC:\wxWidgets\lib -lwxmsw31u_core -lwxbase31u -lcurl -o search_engine.exe |
84 | | -search_engine.exe |
| 88 | +# Example helper to fetch a URL and save as text (if provided) |
| 89 | +./search_engine --fetch "https://example.com" |
85 | 90 | ``` |
86 | 91 |
|
87 | | -Output files: <app_data>/downloads/<sha256>.txt — each file stores original URL, fetch timestamp, and plain-text content. |
| 92 | +Tips |
| 93 | +- Files are incremental and immutable: duplicates detected via SHA-256. |
| 94 | +- Use the downloads pane to remove or re-fetch pages. |
88 | 95 |
|
89 | 96 | --- |
90 | 97 |
|
91 | | -## How It Works |
| 98 | +## 📸 Screenshots / Demo |
| 99 | + |
| 100 | +Include these assets in the `assets/` folder of the repo: |
| 101 | +- assets/ui_fetcher.png — fetcher panel |
| 102 | +- assets/ui_search.png — search & results |
| 103 | +- assets/demo.gif — short GIF showing fetch → search → preview |
92 | 104 |
|
93 | | -1. Enter URL → app downloads HTML via libcurl. |
94 | | -2. Save Page → written as `page_<sha256>.txt` in downloads folder. |
95 | | -3. Search → type a keyword/phrase; app scans `.txt` files and lists matches with filename, line number, and snippet. |
96 | | -4. Preview → click result to open file at matching line in preview pane. |
| 105 | +Example markdown to show images: |
| 106 | +<p align="center"> |
| 107 | + <img src="assets/ui_fetcher.png" alt="Fetcher panel" width="520" /> |
| 108 | + <br/> |
| 109 | + <img src="assets/ui_search.png" alt="Search & Results" width="520" /> |
| 110 | + <br/> |
| 111 | + <img src="assets/demo.gif" alt="Demo" width="600" /> |
| 112 | +</p> |
97 | 113 |
|
98 | 114 | --- |
99 | 115 |
|
100 | | -## Code Overview |
| 116 | +## 👨‍💻 Contributing |
101 | 117 |
|
102 | | -- OnCrawl() |
103 | | - - Performs HTTP GET with libcurl, normalizes HTML to text, computes SHA-256 filename, and saves the file. |
104 | | -- OnSearch() |
105 | | - - Walks downloads directory using std::filesystem, streams files line-by-line, applies search options, and populates results list. |
106 | | -- UI |
107 | | - - Three-pane pattern: left (downloads), top-right (fetcher), bottom-right (search & results + preview). |
| 118 | +Guidelines |
| 119 | +- Fork the repo, create a feature branch: git checkout -b feat/short-desc |
| 120 | +- Keep PRs focused and include a concise description of changes. |
| 121 | +- Add tests for new logic where applicable (see Tests section). |
| 122 | +- Follow existing code style and comment non-obvious logic. |
| 123 | +- For UI changes, include updated screenshots or GIFs. |
108 | 124 |
|
109 | | -Suggestions for improvements: |
110 | | -- Multi-threaded crawling |
111 | | -- Inverted index for fast queries |
112 | | -- Highlighted matches in preview |
113 | | -- Recursive link crawling |
114 | | -- Optional SQLite integration for metadata |
| 125 | +Suggested improvements |
| 126 | +- Add inverted index for fast queries |
| 127 | +- Parallel fetcher with polite rate-limiting |
| 128 | +- Highlight matched snippets in preview |
| 129 | +- Optional SQLite store for metadata and query stats |
115 | 130 |
|
116 | 131 | --- |
117 | 132 |
|
118 | | -## Contributing |
119 | | - |
120 | | -- Fork → create feature branch → open PR with concise description and tests (if applicable). |
121 | | -- Keep commits focused and include build instructions for any new dependency. |
| 133 | +## 🧪 Tests |
122 | 134 |
|
123 | | ---- |
| 135 | +If tests are included, run them from the build directory: |
| 136 | +```bash |
| 137 | +# Example for CTest |
| 138 | +ctest --output-on-failure |
| 139 | +``` |
124 | 140 |
|
125 | | -## Author |
| 141 | +Recommended test areas |
| 142 | +- HTML → plain-text normalization |
| 143 | +- SHA-256 filename generation (duplicate detection) |
| 144 | +- Line-by-line search behavior: case/whole-word correctness |
| 145 | +- File I/O: read/write integrity across platforms |
126 | 146 |
|
127 | | -Rahul Singh — C++ developer exploring GUI and search engine design. |
128 | | -(Replace author info with your contact/portfolio link for recruiter-friendly profile.) |
| 147 | +Consider adding a small unit-test harness using Catch2 or GoogleTest for parser and search logic. |
129 | 148 |
|
130 | 149 | --- |
131 | 150 |
|
132 | | -## License |
| 151 | +## 📄 License |
133 | 152 |
|
134 | | -MIT License — see LICENSE file. |
| 153 | +MIT License — see LICENSE file for full text. |
135 | 154 |
|
136 | 155 | --- |
137 | 156 |
|
138 | | -## Keywords |
139 | | - |
140 | | -C++, wxWidgets, libcurl, filesystem, search-engine, web-crawler, offline-search, GUI |
| 157 | +If you want, I can produce a ready-to-copy CMakeLists.txt, a sample demo GIF script, or a concise CONTRIBUTING.md next. |