A lightweight, fast, in-memory inverted-index search engine with HTTP handlers and on-disk persistence.
- Full-text indexing on arbitrary JSON documents
- Prefix & fuzzy matching: prefix lists + SymSpell with Levenshtein distance calculation
- Filters on arbitrary document fields (OR within a field, AND across fields)
- HTTP API for index creation, search, single-doc upsert/delete, bulk add
- Persistence: saves documents + index metadata and rebuilds all derived indexes on load
- Docker-ready: runs as an unprivileged user, persists under a mounted volume
Latency statistics for 5 million records
| Avg | P50 | P95 | P99 | Index Time |
|---|---|---|---|---|
| 6.76 ms | 3.52 ms | 20.95 ms | 56.55 ms | 1m42s |
- Total requests: 10,000 requests with 8 parallel workers in 9 seconds, against 5 million records
- Query lengths: 1 to 4 terms, e.g., `Last Night`, `The Only Thing Br`, `Modern`, `Real Ghod Time`
- Prefix matching: e.g., `Modern ta` matches `Modern Talking`
- Typo tolerance: ~10% of queries include deliberate misspellings (e.g., `Never Was an mngel` for “angel”)
- Dataset: MusicBrainz
- Record format (JSON lines):
  ```json
  {"artist":"Modern Talking","song":"Heart of an Angel","id":"c7eda459-c11c-362f-a5c9-2c108c4a27e4","album":"Universe: The 12th Album"}
  {"artist":"Modern Talking","song":"Who Will Be There","id":"366f83d1-bd0a-3ed8-9974-b148ae6d6dd9","album":"Universe: The 12th Album"}
  ```
- Indexed fields: `artist`, `song`, `album`
- Machine: MacBook Air M3, 24GB memory
- Pagination: 10 pages per request
- `loadtest/loadtest.go` is used to perform the test
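
For readers who want to reproduce the Avg/P50/P95/P99 columns from their own measurements, here is a minimal sketch of one common way to summarize latency samples. It uses the nearest-rank percentile definition on millisecond values; this is an assumption for illustration and not necessarily how `loadtest/loadtest.go` computes its figures. The names `percentile` and `summarize` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (1 <= p <= 100)
// of an ascending-sorted slice. Integer arithmetic avoids float rounding
// when computing the rank.
func percentile(sorted []float64, p int) float64 {
	if len(sorted) == 0 {
		return 0
	}
	rank := (p*len(sorted) + 99) / 100 // ceil(p/100 * n)
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// summarize returns the mean and the P50/P95/P99 latencies (all in ms).
func summarize(samplesMs []float64) (avg, p50, p95, p99 float64) {
	if len(samplesMs) == 0 {
		return 0, 0, 0, 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]float64(nil), samplesMs...)
	sort.Float64s(sorted)
	total := 0.0
	for _, v := range sorted {
		total += v
	}
	avg = total / float64(len(sorted))
	return avg, percentile(sorted, 50), percentile(sorted, 95), percentile(sorted, 99)
}

func main() {
	// Hypothetical samples; real values would come from timing each request.
	samples := []float64{3.1, 2.8, 20.4, 3.6, 55.0, 4.2, 3.3, 6.9}
	avg, p50, p95, p99 := summarize(samples)
	fmt.Printf("avg=%.2fms p50=%.2fms p95=%.2fms p99=%.2fms\n", avg, p50, p95, p99)
}
```

With only a handful of samples the upper percentiles collapse onto the maximum, which is why a run of several thousand requests gives more meaningful tail numbers.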
```bash
# Run locally
cd cmd/service/
go run .
```

```bash
# Build the Docker image
docker build -t searchengine:latest .

# Run the container
docker run -d \
  -p 8080:8080 \
  -v search_data:/data \
  -e INDEX_DATA_DIR=/data \
  --name searchengine \
  searchengine:latest
```

This engine keeps the core inverted index in memory and uses tombstones for updates/deletes.

```go
DataMap map[string]map[uint32]int // term -> internalDocID -> score
```

- Each document is tokenized from the configured `IndexFields`.
- Each token contributes a per-document score of `100000 ÷ (total tokens in that document)`.
- If a token appears multiple times in a doc, its score is summed.

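
The scoring rule above can be sketched as follows. This is an illustration under stated assumptions, not the engine's actual code: `tokenize` is a hypothetical stand-in that lowercases and splits on whitespace, `addDocument` is a made-up helper name, and integer division is assumed for the per-token score.

```go
package main

import (
	"fmt"
	"strings"
)

// scoreScale mirrors the 100000 constant from the scoring rule.
const scoreScale = 100000

// tokenize is a hypothetical tokenizer: lowercase, then split on whitespace.
func tokenize(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// addDocument adds the tokens of one document's indexed field values to
// dataMap (term -> internalDocID -> score).
func addDocument(dataMap map[string]map[uint32]int, docID uint32, fieldValues []string) {
	var tokens []string
	for _, v := range fieldValues {
		tokens = append(tokens, tokenize(v)...)
	}
	if len(tokens) == 0 {
		return
	}
	perToken := scoreScale / len(tokens) // integer division (assumed)
	for _, tok := range tokens {
		if dataMap[tok] == nil {
			dataMap[tok] = make(map[uint32]int)
		}
		// Repeated tokens accumulate, so their score is summed.
		dataMap[tok][docID] += perToken
	}
}

func main() {
	dataMap := make(map[string]map[uint32]int)
	// Three tokens in total; "modern" appears twice.
	addDocument(dataMap, 1, []string{"Modern Talking", "Modern"})
	fmt.Println(dataMap["modern"][1], dataMap["talking"][1]) // 66666 33333
}
```

Because the score is divided by the document's total token count, a match in a short document outweighs the same match in a long one.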
Documents are stored and referenced by an internal numeric ID:
- `ExternalToInternal`: external string ID → current internal ID
- `InternalToExternal`: internal ID → external ID
- `DocDeleted[internalID] = true` tombstones old versions

Update semantics:
- Updating a doc creates a new internal ID and tombstones the old internal ID.
- Searches always skip tombstoned internal IDs.
This avoids expensive deletions from posting lists at update time.
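
A minimal sketch of this bookkeeping is shown below. The field names follow the ones listed above, but the `docStore` type, its methods, the use of a map for `DocDeleted`, and the sequential ID counter are assumptions made for illustration.

```go
package main

import "fmt"

// docStore is a hypothetical, trimmed-down view of the ID bookkeeping.
type docStore struct {
	ExternalToInternal map[string]uint32
	InternalToExternal map[uint32]string
	DocDeleted         map[uint32]bool // assumed to be a map; could equally be a slice
	nextID             uint32
}

func newDocStore() *docStore {
	return &docStore{
		ExternalToInternal: make(map[string]uint32),
		InternalToExternal: make(map[uint32]string),
		DocDeleted:         make(map[uint32]bool),
	}
}

// upsert assigns a fresh internal ID and tombstones the previous version, if any.
func (s *docStore) upsert(externalID string) uint32 {
	if old, ok := s.ExternalToInternal[externalID]; ok {
		s.DocDeleted[old] = true
	}
	id := s.nextID
	s.nextID++
	s.ExternalToInternal[externalID] = id
	s.InternalToExternal[id] = externalID
	return id
}

// remove tombstones the current version; posting lists are left untouched.
func (s *docStore) remove(externalID string) bool {
	id, ok := s.ExternalToInternal[externalID]
	if !ok {
		return false
	}
	s.DocDeleted[id] = true
	delete(s.ExternalToInternal, externalID)
	return true
}

// live filters a posting list down to internal IDs that are not tombstoned.
func (s *docStore) live(postings []uint32) []uint32 {
	var out []uint32
	for _, id := range postings {
		if !s.DocDeleted[id] {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	s := newDocStore()
	v1 := s.upsert("14")
	v2 := s.upsert("14") // the update tombstones v1
	fmt.Println(s.live([]uint32{v1, v2})) // only the second version survives
}
```

The trade-off is that stale entries stay in the posting lists until the index is rebuilt, in exchange for constant-time updates and deletes.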
During indexing, tokens are also inserted into:
- `Prefix map[string][]string`: a precomputed list of prefix → candidate terms (capped by `MaxPrefixTerms`)
- `Keys`: a set of known terms (fast exact existence checks)
- `SymSpell`: used to suggest fuzzy terms when prefix candidates are insufficient

Single-term search prefers prefix candidates; when prefix isn’t enough it falls back to SymSpell suggestions.
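
The prefix-first lookup with a fuzzy fallback can be sketched roughly as below. Several things here are assumptions: the `termIndex` type and its methods are hypothetical, the cap value is illustrative, "prefix isn't enough" is simplified to "no prefix candidates", and the fallback is a brute-force Levenshtein scan standing in for SymSpell (which instead precomputes deletion variants to avoid scanning every term).

```go
package main

import (
	"fmt"
	"sort"
)

// maxPrefixTerms is an illustrative stand-in for the MaxPrefixTerms cap.
const maxPrefixTerms = 3

type termIndex struct {
	Prefix map[string][]string // prefix -> candidate terms (capped)
	Keys   map[string]struct{} // set of known terms
}

func newTermIndex() *termIndex {
	return &termIndex{Prefix: map[string][]string{}, Keys: map[string]struct{}{}}
}

// add registers a term under every one of its prefixes, respecting the cap.
func (ti *termIndex) add(term string) {
	if _, seen := ti.Keys[term]; seen {
		return
	}
	ti.Keys[term] = struct{}{}
	runes := []rune(term)
	for i := 1; i <= len(runes); i++ {
		p := string(runes[:i])
		if len(ti.Prefix[p]) < maxPrefixTerms {
			ti.Prefix[p] = append(ti.Prefix[p], term)
		}
	}
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the edit distance using two-row dynamic programming.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// candidates prefers prefix matches and falls back to a fuzzy scan
// (edit distance <= 1) only when no prefix candidate exists.
func (ti *termIndex) candidates(q string) []string {
	if c := ti.Prefix[q]; len(c) > 0 {
		return c
	}
	var out []string
	for term := range ti.Keys {
		if levenshtein(q, term) <= 1 {
			out = append(out, term)
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	ti := newTermIndex()
	for _, w := range []string{"talking", "talk", "angel"} {
		ti.add(w)
	}
	fmt.Println(ti.candidates("ta"))    // prefix hit
	fmt.Println(ti.candidates("mngel")) // fuzzy fallback
}
```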
All endpoints return JSON and use HTTP status codes (201 Created, 200 OK, 500 on errors).

```http
POST /create-index
Content-Type: application/json

{
  "indexName": "products",
  "indexFields": ["name","tags"],
  "filters": ["year"],
  "pageCount": 10
}
```

Creates an empty index.

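
As a rough client-side sketch, the request above could be built in Go like this. The `createIndexRequest` struct and `newCreateIndexRequest` helper are hypothetical names, and the base URL `http://localhost:8080` assumes the port mapping from the Docker example; the JSON field names are taken from the request body shown above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// createIndexRequest mirrors the JSON body of POST /create-index.
type createIndexRequest struct {
	IndexName   string   `json:"indexName"`
	IndexFields []string `json:"indexFields"`
	Filters     []string `json:"filters"`
	PageCount   int      `json:"pageCount"`
}

// newCreateIndexRequest builds (but does not send) the HTTP request.
func newCreateIndexRequest(baseURL string, body createIndexRequest) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/create-index", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newCreateIndexRequest("http://localhost:8080", createIndexRequest{
		IndexName:   "products",
		IndexFields: []string{"name", "tags"},
		Filters:     []string{"year"},
		PageCount:   10,
	})
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it against a running instance: resp, err := http.DefaultClient.Do(req)
}
```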
```http
POST /add-to-index?indexName=products
Content-Type: multipart/form-data

Content-Disposition: form-data; name="file"; filename="docs.json"
Content-Type: application/json

[ { "id":"1", "name":"foo", "tags":["a","b"], "year":"2020" }, ... ]
```

```http
GET /search?index=products&q=laptop&page=0&filter=year:2020,category:electronics
```

- `index`: name of the index
- `q`: search query
- `page`: zero-based page number
- `filter`: comma-separated `field:value` pairs

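
To illustrate how a `filter` string could be interpreted with "OR within a field, AND across fields" semantics, here is a small sketch. It assumes that OR is expressed by repeating a field (for example `year:2020,year:2021`) and that document fields are plain strings; both are assumptions, and `parseFilter` and `matches` are hypothetical names rather than the engine's own functions.

```go
package main

import (
	"fmt"
	"strings"
)

// parseFilter turns "year:2020,category:electronics" into
// field -> accepted values. Malformed pairs are skipped.
func parseFilter(raw string) map[string][]string {
	out := make(map[string][]string)
	for _, pair := range strings.Split(raw, ",") {
		parts := strings.SplitN(strings.TrimSpace(pair), ":", 2)
		if len(parts) != 2 || parts[0] == "" {
			continue
		}
		out[parts[0]] = append(out[parts[0]], parts[1])
	}
	return out
}

// matches reports whether doc passes every filtered field (AND), where a
// field passes if it equals any one of its accepted values (OR).
func matches(doc map[string]string, filters map[string][]string) bool {
	for field, accepted := range filters {
		ok := false
		for _, v := range accepted {
			if doc[field] == v {
				ok = true
				break
			}
		}
		if !ok {
			return false
		}
	}
	return true
}

func main() {
	filters := parseFilter("year:2020,year:2021,category:electronics")
	doc := map[string]string{"year": "2021", "category": "electronics"}
	fmt.Println(matches(doc, filters)) // true
}
```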
```http
POST /document?indexName=products
Content-Type: application/json

{
  "document": { "id":"14", "name":"New Name", "tags":["x"], "year":"2021" }
}
```

Upserts one document (creates a new internal ID; tombstones old version if it exists).

```http
DELETE /document?indexName=products&id=14
```

Tombstones the current version of the document.

- Save:
  ```http
  POST /save-controller
  Content-Type: application/json

  { "indexName":"products" }
  ```
  Writes the snapshot to `/data/<indexName>/engine.gob` (or under `INDEX_DATA_DIR` if set).
- Load:
  ```http
  POST /load-controller
  Content-Type: application/json

  { "indexName":"products" }
  ```
  Loads the snapshot and rebuilds indexes from stored documents.

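
The "store documents, rebuild derived indexes on load" approach can be sketched with `encoding/gob` as below. The `snapshot` struct, its fields, and the helper names are hypothetical and much simpler than whatever the engine actually serializes; an in-memory buffer stands in for the `engine.gob` file.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"io"
	"strings"
)

// snapshot is a hypothetical persisted shape: raw documents plus index settings.
// Derived structures (postings, prefix lists) are deliberately not stored.
type snapshot struct {
	IndexName   string
	IndexFields []string
	Documents   []map[string]string
}

func save(w io.Writer, s snapshot) error {
	return gob.NewEncoder(w).Encode(s)
}

func load(r io.Reader) (snapshot, error) {
	var s snapshot
	err := gob.NewDecoder(r).Decode(&s)
	return s, err
}

// rebuild recreates a derived index (term -> document positions) from the
// stored documents, as would happen after loading a snapshot.
func rebuild(s snapshot) map[string][]int {
	index := make(map[string][]int)
	for i, doc := range s.Documents {
		for _, field := range s.IndexFields {
			for _, tok := range strings.Fields(strings.ToLower(doc[field])) {
				index[tok] = append(index[tok], i)
			}
		}
	}
	return index
}

// roundTrip saves to an in-memory buffer and loads it back.
func roundTrip(s snapshot) (snapshot, error) {
	var buf bytes.Buffer
	if err := save(&buf, s); err != nil {
		return snapshot{}, err
	}
	return load(&buf)
}

func main() {
	in := snapshot{
		IndexName:   "products",
		IndexFields: []string{"name"},
		Documents: []map[string]string{
			{"id": "1", "name": "foo bar"},
			{"id": "2", "name": "Foo"},
		},
	}
	out, err := roundTrip(in)
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println(rebuild(out)["foo"]) // positions of documents containing "foo"
}
```

Persisting only the source documents keeps the snapshot small and the format stable, at the cost of re-indexing time on load.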
```http
GET /ping
```

Returns:

```json
{ "status":"ok", "duration":"5µs", "durationMs":0 }
```

```bash
# Run tests without caching
go test -count=1 ./...

# With race detection
go test -race -count=1 ./...

# With coverage report
go test -race -count=1 ./... \
  -coverpkg=./... \
  -coverprofile=coverage.out
go tool cover -func=coverage.out
```

MIT © mg52