Polite, Secure, and Production-Ready Backend for Hallucination Verification.
TruthLens AI is a VIBE-CODING backend designed to power a Hallucination & Citation Verification Engine. It provides secure authentication, protected content management, and a highly polite, resilient server-side web scraper for gathering verification data.
- Runtime: Node.js 20 + Express.js
- Language: TypeScript
- Database: MongoDB (Mongoose)
- Caching: Redis
- Auth: JWT (Access in Body, Refresh in HttpOnly Cookie)
- Scraper: Custom logic with `robots.txt` compliance, caching, and rate limiting.
- Node.js v20+
- Docker & Docker Compose
- MongoDB & Redis (or use Docker)
- Clone the repo
- Install dependencies:

  ```bash
  npm install
  ```
- Configure Environment:

  ```bash
  cp .env.example .env  # Edit .env with your secrets
  ```
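A hypothetical `.env` for the stack above might look like the following; all variable names here are assumptions for illustration, so check `.env.example` for the actual keys:

```ini
# Illustrative only — real keys live in .env.example
MONGODB_URI=mongodb://localhost:27017/truthlens
REDIS_URL=redis://localhost:6379
JWT_ACCESS_SECRET=change-me
JWT_REFRESH_SECRET=change-me-too
PORT=3000
```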
```bash
# Development Mode (with hot-reload)
npm run dev

# Build & Start Production
npm run build
npm start
```

Or with Docker:

```bash
docker-compose up --build
```

We use Jest and Supertest with mongodb-memory-server for deterministic integration testing.

```bash
npm test
```

Expected Output:
```
PASS tests/auth.test.ts
PASS tests/scraper.test.ts
...
Test Suites: 2 passed, 2 total
Tests:       4 passed, 4 total
Snapshots:   0 total
Time:        3.456 s
```
- Token Delivery:
- Access Tokens: Short-lived (15m), returned in JSON body.
- Refresh Tokens: Long-lived (7d), returned in an `HttpOnly`, `Secure` cookie so client-side scripts can never read them, preventing XSS token theft.
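The delivery policy above can be sketched as a small helper. This is only illustrative: the names (`refreshCookieOptions`, `ACCESS_TOKEN_TTL`) are assumptions, not the project's actual API.

```typescript
// Illustrative sketch of the token-delivery policy; names are assumptions.

const ACCESS_TOKEN_TTL = 15 * 60; // 15 minutes, in seconds

interface CookieOptions {
  httpOnly: boolean;
  secure: boolean;
  sameSite: "strict" | "lax" | "none";
  maxAge: number; // milliseconds
}

// Refresh token: long-lived, HttpOnly + Secure so client-side JS cannot read it.
function refreshCookieOptions(): CookieOptions {
  return {
    httpOnly: true,
    secure: true,
    sameSite: "strict",
    maxAge: 7 * 24 * 60 * 60 * 1000, // 7 days
  };
}

// An Express login handler could then do something like:
//   res.cookie("refreshToken", refreshJwt, refreshCookieOptions());
//   res.json({ accessToken: accessJwt, expiresIn: ACCESS_TOKEN_TTL });
```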
- Rate Limiting:
- Global: 100 req/min
- Login: 5 req/min (Brute-force protection)
- Scraper: 10 req/min (Resource protection)
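The three tiers above can be expressed with a simple fixed-window limiter. The real service would more likely use `express-rate-limit` with a Redis-backed store; this in-memory class is only a sketch of the policy.

```typescript
// Minimal in-memory fixed-window limiter; illustrative, not the production store.

class FixedWindowLimiter {
  private hits = new Map<string, { windowStart: number; count: number }>();

  constructor(private limit: number, private windowMs: number) {}

  // Returns true if the request is allowed, false if rate-limited.
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { windowStart: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}

// Tiers from the policy above (per minute).
const globalLimiter = new FixedWindowLimiter(100, 60_000);
const loginLimiter = new FixedWindowLimiter(5, 60_000);
const scraperLimiter = new FixedWindowLimiter(10, 60_000);
```

Keying by client IP for the global/login tiers and by target domain for the scraper tier would match the protections described above.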
- Circuit Breaker Strategy (Proposed):
- Track consecutive failures per domain.
- Open circuit after 5 failures.
- Half-open after 30s cooldown.
- Close on success.
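The proposed strategy could be implemented as one breaker per domain; the class and method names below are illustrative, not the project's actual API.

```typescript
// Sketch of the proposed per-domain circuit breaker: open after 5 consecutive
// failures, half-open after a 30s cooldown, close again on success.

type CircuitState = "closed" | "open" | "half-open";

class DomainCircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: CircuitState = "closed";

  constructor(
    private failureThreshold = 5,
    private cooldownMs = 30_000,
  ) {}

  // Should we attempt a request to this domain right now?
  canRequest(now: number = Date.now()): boolean {
    if (this.state === "open") {
      if (now - this.openedAt >= this.cooldownMs) {
        this.state = "half-open"; // allow a single probe request
        return true;
      }
      return false;
    }
    return true;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  recordFailure(now: number = Date.now()): void {
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }
}
```

The scraper would hold a `Map<string, DomainCircuitBreaker>` keyed by hostname, consulting `canRequest()` before each fetch.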
This scraper is built with strict adherence to web etiquette:
- Robots.txt: Always checked and respected per origin.
- Identification: Sends `User-Agent: TruthLensAI-Bot/1.0`.
- Rate Limiting: Includes exponential backoff for HTTP 429 responses.
- Consent: Intended for educational and verification use only.
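The exponential backoff mentioned above might look like the following; the base delay and cap are assumptions, not the project's actual values.

```typescript
// Exponential backoff with full jitter for HTTP 429 responses.
// baseMs/capMs are illustrative defaults, not the real configuration.

function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  // attempt 0 -> up to ~500ms, 1 -> up to ~1s, 2 -> up to ~2s, ... capped at 30s.
  // Full jitter spreads retries so clients don't stampede in sync.
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * exp);
}

// A retry loop around fetch() could then sleep for backoffDelayMs(attempt)
// whenever the response status is 429, up to some maximum attempt count.
```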
Presentation link: https://drive.google.com/file/d/1t5f1ZGoNMRmVcHmWWkWWh7dutGg-_H84/view?usp=sharing
- Redis is optional but highly recommended for distributed rate limiting.
- Scraper currently fetches raw HTML; complex client-side rendered (SPA) pages may need Puppeteer/Playwright (out of scope for now).
License: ISC