
Commit 3bbc750

Updated README.md
1 parent 617d08c commit 3bbc750

1 file changed: +107 -90 lines changed


README.md

Lines changed: 107 additions & 90 deletions
@@ -1,140 +1,157 @@
1+
# 🕸️ wxWidgets Search Engine with Web Crawler
2+
13
<p align="center">
2-
<h1>🕸️ wxWidgets Search Engine with Web Crawler</h1>
3-
<p>
4-
<a href="https://github.com/your-username/your-repo"><img alt="build" src="https://img.shields.io/badge/build-passing-brightgreen" /></a>
5-
<a href="https://github.com/your-username/your-repo/actions"><img alt="ci" src="https://img.shields.io/github/actions/workflow/status/your-username/your-repo/ci.yml?branch=main&label=CI" /></a>
6-
<a href="https://github.com/your-username/your-repo/releases"><img alt="release" src="https://img.shields.io/github/v/release/your-username/your-repo" /></a>
7-
<a href="LICENSE"><img alt="license" src="https://img.shields.io/badge/license-MIT-blue" /></a>
8-
</p>
4+
<a href="https://github.com/your-username/your-repo"><img alt="build" src="https://img.shields.io/badge/build-passing-brightgreen" /></a>
5+
<a href="https://github.com/your-username/your-repo/actions"><img alt="ci" src="https://img.shields.io/github/actions/workflow/status/your-username/your-repo/ci.yml?branch=main&label=CI" /></a>
6+
<a href="https://img.shields.io/badge/license-MIT-blue"><img alt="license" src="https://img.shields.io/badge/license-MIT-blue" /></a>
7+
<img alt="language" src="https://img.shields.io/badge/language-C%2B%2B-blue" />
98
</p>
109

11-
A lightweight C++ desktop application using wxWidgets and libcurl for downloading web pages and searching them locally. Ideal as a mini offline search engine and GUI demo for recruiters.
12-
13-
---
14-
15-
## Table of Contents
16-
17-
- Features
18-
- Preview
19-
- Tech Stack
20-
- Quick Start
21-
- Build & Run
22-
- How It Works
23-
- Code Overview
24-
- Contributing
25-
- Author & License
26-
- Keywords
10+
A lightweight desktop search engine built in C++, using wxWidgets for the GUI and libcurl for fetching pages. It stores fetched pages as plain-text files and provides offline full-text search with context snippets, which makes it suitable as a recruiter-facing demo or a small offline search utility.
2711

2812
---
2913

30-
## Features
14+
## 🚀 Features
3115

32-
- Download web pages via libcurl and save as plain-text `.txt` files (hashed filenames).
33-
- Offline full-text search across downloaded files with filename, line number, and context snippet.
34-
- Simple, responsive GUI built with wxWidgets.
35-
- Self-contained: no external DB — data saved under app data/downloads.
36-
- Search options: case sensitivity and whole-word matching.
37-
- Keyboard shortcuts and context actions (Open, Copy snippet, Reveal in Explorer).
16+
- Fetch HTTP(S) pages using libcurl and save as plain-text `.txt` files (SHA-256 hashed filenames).
17+
- Offline full-text search across saved files with filename, line number, and snippet preview.
18+
- Search options: case-sensitive toggle and whole-word matching.
19+
- Simple 3-pane GUI (downloads, fetcher, search/results + preview) built with wxWidgets.
20+
- No external DB — data stored under app data/downloads.
21+
- Context actions: Open file, Copy snippet, Reveal in Explorer.
22+
- Extensible: designed for adding multithreaded crawling, inverted index, or SQLite metadata.
3823

3924
---
4025

41-
## Preview
26+
## 🛠️ Installation
4227

43-
<p align="center">
44-
<img src="assets/ui_fetcher.png" alt="Fetcher panel" width="520" />
45-
<br />
46-
<img src="assets/ui_search.png" alt="Search & Results panel" width="520" />
47-
</p>
28+
Prerequisites
29+
- C++17 toolchain (g++, clang, or MSVC)
30+
- wxWidgets (development headers)
31+
- libcurl (development headers)
32+
- CMake (optional but recommended)
4833

49-
---
34+
Linux (Ubuntu/Debian example)
35+
```bash
36+
sudo apt update
37+
sudo apt install build-essential cmake libwxgtk3.0-gtk3-dev libcurl4-openssl-dev
38+
```
5039

51-
## Tech Stack
40+
macOS (Homebrew)
41+
```bash
42+
brew install wxwidgets curl cmake
43+
```
5244

53-
| Component | Technology |
54-
| -------------- | ---------- |
55-
| GUI | wxWidgets |
56-
| HTTP Fetching | libcurl |
57-
| File Handling | C++17 <filesystem> |
58-
| Language | C++ (STL) |
59-
| Build | g++, clang, or MSVC |
45+
Windows (MSYS2 / MinGW recommended)
46+
- Install MSYS2, then:
47+
```bash
48+
pacman -S mingw-w64-x86_64-toolchain mingw-w64-x86_64-wxWidgets mingw-w64-x86_64-curl
49+
```
6050

61-
---
51+
Build (CMake recommended)
52+
```bash
53+
git clone https://github.com/your-username/your-repo.git
54+
cd your-repo
55+
mkdir build && cd build
56+
cmake ..
57+
cmake --build . --config Release
58+
# Resulting binary: search_engine (or search_engine.exe)
59+
```
6260

63-
## Quick Start (Developer-friendly)
61+
Quick single-file build (example)
62+
```bash
63+
# Linux/macOS using wx-config
64+
g++ main.cpp `wx-config --cxxflags --libs` -lcurl -std=c++17 -o search_engine
65+
```
6466

65-
1. Clone the repo
66-
git clone https://github.com/your-username/your-repo.git
67-
2. Install dependencies: wxWidgets, libcurl, C++17 toolchain
68-
3. Build and run (examples below)
67+
Output location
68+
- Saved pages: <app_data>/downloads/page_<sha256>.txt
69+
Each file contains the original URL, fetch timestamp, and plain-text content.
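
The output layout described above could be produced by a routine along the following lines. This is a minimal sketch, not the project's actual code: the `save_page` function, the `SaveResult` enum, and the `URL:` / `Fetched:` header lines are all hypothetical, and the SHA-256 digest is assumed to be computed elsewhere and passed in as a hex string (the README does not say whether the URL or the page content is hashed).

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical result type: distinguishes a fresh write from a duplicate.
enum class SaveResult { Written, Duplicate, Failed };

// Writes <downloads_dir>/page_<sha256_hex>.txt containing the URL, a fetch
// timestamp (seconds since the Unix epoch), and the plain-text content.
// An existing file with the same digest is treated as a duplicate and left
// untouched.
SaveResult save_page(const fs::path& downloads_dir,
                     const std::string& sha256_hex,
                     const std::string& url,
                     const std::string& text) {
    fs::create_directories(downloads_dir);
    const fs::path target = downloads_dir / ("page_" + sha256_hex + ".txt");
    if (fs::exists(target)) {
        return SaveResult::Duplicate;
    }

    const std::time_t now =
        std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());

    std::ofstream out(target);
    if (!out) {
        return SaveResult::Failed;
    }
    // Assumed header format: two labelled lines, a blank line, then the text.
    out << "URL: " << url << '\n'
        << "Fetched: " << static_cast<long long>(now) << '\n'
        << '\n'
        << text;
    return out ? SaveResult::Written : SaveResult::Failed;
}
```

Calling `save_page` twice with the same digest would return `SaveResult::Written` the first time and `SaveResult::Duplicate` the second, which is one way to get the duplicate detection mentioned under Tips.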
6970

7071
---
7172

72-
## Build & Run
73+
## 💡 Usage
7374

74-
Linux / macOS (example)
75+
Start the app (GUI)
7576
```bash
76-
sudo apt install libwxgtk3.0-gtk3-dev libcurl4-openssl-dev
77-
g++ main.cpp `wx-config --cxxflags --libs` -lcurl -std=c++17 -o search_engine
7877
./search_engine
7978
```
8079

81-
Windows (MinGW example)
80+
Typical workflow
81+
1. Enter a URL in the fetcher pane and click Fetch. The app downloads and saves the page.
82+
2. Switch to the Search pane and type a keyword or phrase.
83+
3. Toggle case-sensitivity / whole-word options if needed.
84+
4. Click a result to view the file at the matching line. Use the context menu to Open, Copy snippet, or Reveal in Explorer.
85+
86+
Command-line (headless fetch helper, if included)
8287
```bash
83-
g++ main.cpp -std=c++17 -IC:\wxWidgets\include -LC:\wxWidgets\lib -lwxmsw31u_core -lwxbase31u -lcurl -o search_engine.exe
84-
search_engine.exe
88+
# Example helper to fetch a URL and save as text (if provided)
89+
./search_engine --fetch "https://example.com"
8590
```
8691

87-
Output files: <app_data>/downloads/<sha256>.txt — each file stores original URL, fetch timestamp, and plain-text content.
92+
Tips
93+
- Saved files are not modified after they are written; duplicates are detected via the SHA-256 filename.
94+
- Use the downloads pane to remove or re-fetch pages.
8895

8996
---
9097

91-
## How It Works
98+
## 📸 Screenshots / Demo
99+
100+
Include these assets in the `assets/` folder of the repo:
101+
- assets/ui_fetcher.png — fetcher panel
102+
- assets/ui_search.png — search & results
103+
- assets/demo.gif — short GIF showing fetch → search → preview
92104

93-
1. Enter URL → app downloads HTML via libcurl.
94-
2. Save Page → written as `page_<sha256>.txt` in downloads folder.
95-
3. Search → type a keyword/phrase; app scans `.txt` files and lists matches with filename, line number, and snippet.
96-
4. Preview → click result to open file at matching line in preview pane.
105+
Example markdown to show images:
106+
<p align="center">
107+
<img src="assets/ui_fetcher.png" alt="Fetcher panel" width="520" />
108+
<br/>
109+
<img src="assets/ui_search.png" alt="Search & Results" width="520" />
110+
<br/>
111+
<img src="assets/demo.gif" alt="Demo" width="600" />
112+
</p>
97113

98114
---
99115

100-
## Code Overview
116+
## 👨‍💻 Contributing
101117

102-
- OnCrawl()
103-
- Performs HTTP GET with libcurl, normalizes HTML to text, computes SHA-256 filename, and saves the file.
104-
- OnSearch()
105-
- Walks downloads directory using std::filesystem, streams files line-by-line, applies search options, and populates results list.
106-
- UI
107-
- Three-pane pattern: left (downloads), top-right (fetcher), bottom-right (search & results + preview).
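
The `OnSearch()` description above (walk the downloads directory, stream each file line by line, apply the search options) can be sketched roughly as follows. This is an illustration under assumptions, not the code from `main.cpp`: the names `SearchOptions`, `Match`, `line_matches`, and `search_directory` are hypothetical, and "whole word" is assumed to mean that the match is not adjacent to a letter, digit, or underscore.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

struct SearchOptions {
    bool case_sensitive = false;
    bool whole_word = false;
};

struct Match {
    std::string file;     // file name only, e.g. page_<sha256>.txt
    std::size_t line_no;  // 1-based line number
    std::string snippet;  // the matching line
};

static bool is_word_char(unsigned char c) {
    return std::isalnum(c) || c == '_';
}

static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Returns true if `needle` occurs in `line` under the given options.
bool line_matches(const std::string& line, const std::string& needle,
                  const SearchOptions& opt) {
    if (needle.empty()) return false;
    const std::string hay = opt.case_sensitive ? line : to_lower(line);
    const std::string key = opt.case_sensitive ? needle : to_lower(needle);
    for (std::size_t pos = hay.find(key); pos != std::string::npos;
         pos = hay.find(key, pos + 1)) {
        if (!opt.whole_word) return true;
        const std::size_t end = pos + key.size();
        const bool left_ok =
            pos == 0 || !is_word_char(static_cast<unsigned char>(hay[pos - 1]));
        const bool right_ok =
            end == hay.size() || !is_word_char(static_cast<unsigned char>(hay[end]));
        if (left_ok && right_ok) return true;
    }
    return false;
}

// Scans every .txt file directly inside `dir`, one line at a time.
std::vector<Match> search_directory(const fs::path& dir,
                                    const std::string& needle,
                                    const SearchOptions& opt) {
    std::vector<Match> results;
    if (!fs::is_directory(dir)) return results;
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (!entry.is_regular_file() || entry.path().extension() != ".txt") continue;
        std::ifstream in(entry.path());
        std::string line;
        for (std::size_t n = 1; std::getline(in, line); ++n) {
            if (line_matches(line, needle, opt)) {
                results.push_back({entry.path().filename().string(), n, line});
            }
        }
    }
    return results;
}
```

Because every query rescans every file, the cost grows with the total size of the downloads folder, which is the motivation for the inverted-index suggestion listed among the improvements.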
118+
Guidelines
119+
- Fork the repo, create a feature branch: git checkout -b feat/short-desc
120+
- Keep PRs focused and include a concise description of changes.
121+
- Add tests for new logic where applicable (see Tests section).
122+
- Follow existing code style and comment non-obvious logic.
123+
- For UI changes, include updated screenshots or GIFs.
108124

109-
Suggestions for improvements:
110-
- Multi-threaded crawling
111-
- Inverted index for fast queries
112-
- Highlighted matches in preview
113-
- Recursive link crawling
114-
- Optional SQLite integration for metadata
125+
Suggested improvements
126+
- Add inverted index for fast queries
127+
- Parallel fetcher with polite rate-limiting
128+
- Highlight matched snippets in preview
129+
- Optional SQLite store for metadata and query stats
115130

116131
---
117132

118-
## Contributing
119-
120-
- Fork → create feature branch → open PR with concise description and tests (if applicable).
121-
- Keep commits focused and include build instructions for any new dependency.
133+
## 🧪 Tests
122134

123-
---
135+
If tests are included, run them from the build directory:
136+
```bash
137+
# Example for CTest
138+
ctest --output-on-failure
139+
```
124140

125-
## Author
141+
Recommended test areas
142+
- HTML → plain-text normalization
143+
- SHA-256 filename generation (duplicate detection)
144+
- Line-by-line search behavior: case/whole-word correctness
145+
- File I/O: read/write integrity across platforms
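
As a starting point for the first test area, HTML to plain-text normalization could be exercised against a small pure function such as the one below. This is a deliberately naive sketch and not the project's normalizer: `strip_tags` is a hypothetical name, and the function only drops `<...>` tags and collapses whitespace. It does not decode entities, skip `<script>` or `<style>` bodies, or handle comments.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Naive tag stripper: removes anything between '<' and '>' and collapses
// runs of whitespace (and removed tags) into a single space.
std::string strip_tags(const std::string& html) {
    std::string out;
    bool in_tag = false;
    bool pending_space = false;  // emit one space before the next visible char
    for (unsigned char c : html) {
        if (c == '<') {
            in_tag = true;
            pending_space = true;  // treat a tag as a word separator
            continue;
        }
        if (c == '>' && in_tag) {
            in_tag = false;
            continue;
        }
        if (in_tag) continue;
        if (std::isspace(c)) {
            pending_space = true;
            continue;
        }
        if (pending_space && !out.empty()) out.push_back(' ');
        pending_space = false;
        out.push_back(static_cast<char>(c));
    }
    return out;
}
```

With this sketch, `strip_tags("<p>Hello <b>world</b></p>")` yields `Hello world`. Treating each tag as a separator keeps adjacent block elements from running together, at the cost of inserting a space inside words that are split by inline tags.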
126146

127-
Rahul Singh — C++ developer exploring GUI and search engine design.
128-
(Replace author info with your contact/portfolio link for recruiter-friendly profile.)
147+
Consider adding a small unit-test harness using Catch2 or GoogleTest for parser and search logic.
129148

130149
---
131150

132-
## License
151+
## 📄 License
133152

134-
MIT License — see LICENSE file.
153+
MIT License — see LICENSE file for full text.
135154

136155
---
137156

138-
## Keywords
139-
140-
C++, wxWidgets, libcurl, filesystem, search-engine, web-crawler, offline-search, GUI
157+
If you want, I can produce a ready-to-copy CMakeLists.txt, a sample demo GIF script, or a concise CONTRIBUTING.md next.
