Commit b0c4ed0

Merge branch 'master' of github.com:aaronlifton/fastcrawl
2 parents: b2a6cb3 + 5ed8ca1

1 file changed: +5 -0 lines

README.md

Lines changed: 5 additions & 0 deletions
@@ -9,6 +9,11 @@ Current fastest speed, with default controls of `max-depth` 4, `max-links-per-pa
 `duration-secs` 4 (it crawls for 4 seconds, but any enqueued link is still awaited, so it ran for 26.61s) is **75.12
 pages/sec**.

+- fastcrawl runs a sharded, multi-threaded crawler (one OS thread per shard) with independent frontiers, bounded cross‑shard routing, and striped normalization output for fast downstream ingestion.
+- Uses tokio for async I/O and lol_html for low‑latency, streaming HTML parsing.
+- Includes an end‑to‑end RAG pipeline: normalize → embed (OpenAI or Qdrant inference) → load into Postgres/pgvector → retrieve via HTTP → answer with an OpenAI chat model.
+- Hybrid retrieval fuses pgvector similarity with Postgres full‑text search using Reciprocal Rank Fusion (RRF) for higher‑precision context.
+
 ## Metrics

 When running
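The first added bullet describes the crawler's architecture only at a high level. The following is a minimal std-only Rust sketch of the sharding and routing idea, under stated assumptions: each URL is owned by the shard selected by hashing its host, each shard keeps its own frontier, and cross-shard hand-off goes through a bounded channel that drops the link when the owner's inbox is full. `host_of`, `shard_for`, `Shard`, `Routed`, and `build_shards` are hypothetical names, not fastcrawl's API, and the real crawler (built on tokio) may block, retry, or hash differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::VecDeque;
use std::hash::{Hash, Hasher};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical helper: naive host extraction. A real crawler would use a URL parser.
fn host_of(url: &str) -> &str {
    let rest = url.split_once("://").map(|(_, r)| r).unwrap_or(url);
    rest.split(&['/', '?', '#'][..]).next().unwrap_or(rest)
}

/// Hash the host so every URL of a given host is owned by exactly one shard (assumed policy).
fn shard_for(url: &str, num_shards: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    host_of(url).hash(&mut hasher);
    (hasher.finish() % num_shards as u64) as usize
}

#[derive(Debug, PartialEq)]
enum Routed {
    Local,          // pushed onto this shard's own frontier
    Remote(usize),  // handed to the owning shard's bounded inbox
    Dropped(usize), // owner's inbox was full (or closed), so the link was discarded
}

struct Shard {
    id: usize,
    frontier: VecDeque<String>,     // independent, per-shard frontier
    inbox: Receiver<String>,        // links that other shards routed to this one
    peers: Vec<SyncSender<String>>, // bounded senders, indexed by shard id
}

impl Shard {
    /// Route a newly discovered link to the shard that owns its host.
    fn route(&mut self, url: String) -> Routed {
        let owner = shard_for(&url, self.peers.len());
        if owner == self.id {
            self.frontier.push_back(url);
            return Routed::Local;
        }
        // try_send never blocks: a full inbox applies backpressure by dropping (assumption).
        match self.peers[owner].try_send(url) {
            Ok(()) => Routed::Remote(owner),
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => Routed::Dropped(owner),
        }
    }

    /// Move links received from other shards onto the local frontier.
    fn drain_inbox(&mut self) {
        while let Ok(url) = self.inbox.try_recv() {
            self.frontier.push_back(url);
        }
    }
}

/// Create `num_shards` shards whose inboxes each hold at most `inbox_capacity` links.
fn build_shards(num_shards: usize, inbox_capacity: usize) -> Vec<Shard> {
    let (senders, receivers): (Vec<SyncSender<String>>, Vec<Receiver<String>>) =
        (0..num_shards).map(|_| sync_channel(inbox_capacity)).unzip();
    receivers
        .into_iter()
        .enumerate()
        .map(|(id, inbox)| Shard { id, frontier: VecDeque::new(), inbox, peers: senders.clone() })
        .collect()
}
```

Because each `Shard` owns its frontier, its inbox `Receiver`, and clones of the peer `SyncSender`s, it can be moved into its own `std::thread::spawn` closure to get one OS thread per shard. Hashing by host also keeps per-host politeness state on a single shard.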

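The last added bullet names Reciprocal Rank Fusion. RRF scores each document by summing 1/(k + rank) over every ranked list it appears in (rank is 1-based), so a chunk ranked well by both the pgvector search and the Postgres full-text search rises to the top. This is a minimal Rust sketch that fuses two lists of document ids. The function name `rrf_fuse` and the choice of `k = 60` (a constant commonly used with RRF) are assumptions; fastcrawl may use a different `k` or perform the fusion in SQL rather than in application code.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over several ranked lists of document ids.
/// score(d) = sum over lists containing d of 1 / (k + rank), with rank starting at 1.
fn rrf_fuse(ranked_lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for list in ranked_lists {
        for (i, &id) in list.iter().enumerate() {
            *scores.entry(id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> =
        scores.into_iter().map(|(id, s)| (id.to_string(), s)).collect();
    // Highest fused score first; break ties by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

For example, fusing a vector ranking `a, b, c` with a full-text ranking `c, a, d` at `k = 60` yields the order `a, c, b, d`: `a` and `c` each appear in both lists, and `a` edges ahead because its weaker rank (2) beats `c`'s weaker rank (3). Because RRF uses only ranks, it needs no calibration between cosine distances and full-text relevance scores.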