All notable changes to GEO AI Core will be documented in this file.
- User-Agent truncation: `GeoAIMiddleware`, `GeoAIGuard`, and `@IsAIBot()` now truncate the `User-Agent` header to 1 024 characters before processing, preventing memory abuse from maliciously large UA strings stored in `CrawlEntry.userAgent`.
- `generateTimeout` option: new `GeoAIOptions` field (default 30 000 ms). `GeoAIMiddleware` wraps `generateLlms()` in `Promise.race()` against a configurable timeout and returns `500 Internal Server Error` if the `ContentProvider` hangs.
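The timeout behaviour described for `generateTimeout` can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: `withTimeout`, `TIMED_OUT`, and `respond` are hypothetical names, and the response shape is an assumption.

```typescript
// Sentinel value returned when the timer wins the race (hypothetical name).
const TIMED_OUT = Symbol("timed-out");

// Races `work` against a timer; resolves to TIMED_OUT if the timer fires first.
async function withTimeout<T>(
  work: Promise<T>,
  timeoutMs: number,
): Promise<T | typeof TIMED_OUT> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<typeof TIMED_OUT>((resolve) => {
    timer = setTimeout(() => resolve(TIMED_OUT), timeoutMs);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so it does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Assumed mapping: a timeout becomes a 500 response, success becomes a 200.
async function respond(
  generate: () => Promise<string>,
  timeoutMs = 30_000,
): Promise<{ status: number; body: string }> {
  const result = await withTimeout(generate(), timeoutMs);
  if (result === TIMED_OUT) {
    return { status: 500, body: "Internal Server Error" };
  }
  return { status: 200, body: result };
}
```

Note that `Promise.race()` only stops waiting; the losing promise keeps running in the background, so a hanging provider is abandoned rather than cancelled.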
- [CRITICAL] XSS fix, HTML attribute escaping in `SeoGenerator`: config-derived values (`siteUrl`, `siteName`, etc.) are now escaped via `escapeHtmlAttr()` before interpolation into `<meta>` and `<link>` attribute values. The characters `"`, `'`, `<`, `>`, and `&` are replaced with their corresponding HTML entities. `generateJsonLd()` is unaffected (returns a plain object).
- [HIGH] Unbounded `MemoryCacheAdapter` growth: when all entries were within TTL, `evictExpired()` found nothing to remove and the store grew past `maxEntries`. Added a FIFO eviction pass after `evictExpired()`: the oldest-inserted entries are deleted until `store.size <= maxEntries`.
- [MEDIUM] No fetch timeout: `validateRemote` and the `--url` branch of `runInspect` called `fetchFn(url)` without an `AbortController`. Introduced a `safeFetch(url, fetchFn, timeoutMs, maxBytes)` helper that aborts after 10 s and returns `{ timedOut: true }`.
- [MEDIUM] No response size limit: `await res.text()` read the full body with no cap. `safeFetch` now streams via `ReadableStream` and returns `{ tooLarge: true }` if the body exceeds 1 MiB (1 048 576 bytes).
- [LOW] Direct `process.cwd()` calls in command functions: `runGenerate` and `runInspect` called `process.cwd()` inline, making them hard to test. Both now accept `cwd: string` as the first parameter (matching `runInit`); `cli.ts` passes `process.cwd()` at call sites.
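The two `safeFetch` fixes above (abort on timeout, cap on body size) might look roughly like this. It is a sketch under assumptions: only the parameter list and the `{ timedOut: true }` / `{ tooLarge: true }` results come from the changelog; the `FetchFn` type, the success shape, and the defaults expressed as parameters are hypothetical.

```typescript
// Assumed shape of the injected fetch function (hypothetical type).
type FetchFn = (url: string, init?: { signal?: AbortSignal }) => Promise<Response>;

// The success variant is an assumption; the changelog only names the two failure results.
type SafeFetchResult =
  | { timedOut: true }
  | { tooLarge: true }
  | { status: number; text: string };

async function safeFetch(
  url: string,
  fetchFn: FetchFn,
  timeoutMs = 10_000,
  maxBytes = 1_048_576,
): Promise<SafeFetchResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchFn(url, { signal: controller.signal });
    if (!res.body) return { status: res.status, text: "" };

    // Read the body chunk by chunk so oversized responses are never fully buffered.
    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let received = 0;
    let text = "";
    for (;;) {
      const chunk = await reader.read();
      if (chunk.done) break;
      received += chunk.value.byteLength;
      if (received > maxBytes) {
        await reader.cancel();
        return { tooLarge: true };
      }
      text += decoder.decode(chunk.value, { stream: true });
    }
    text += decoder.decode(); // flush any buffered partial character
    return { status: res.status, text };
  } catch (err) {
    // An error after our own abort is reported as a timeout; anything else is rethrown.
    if (controller.signal.aborted) return { timedOut: true };
    throw err;
  } finally {
    clearTimeout(timer);
  }
}
```

Injecting `fetchFn` rather than calling the global `fetch` directly keeps the helper testable with a stub that returns a prepared `Response`.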
A new standalone CLI package for the GEO AI ecosystem. Install once, use anywhere — no application code required.
```bash
npm install -g geo-ai-cli
# or per-project:
npm install --save-dev geo-ai-cli
```

Commands:
- `geo-ai init`: scaffolds a `geo-ai.config.ts` starter file in the current directory with placeholder values for `siteName`, `siteUrl`, `siteDescription`, `crawlers`, and a `provider` section. Exits safely if a config already exists.
- `geo-ai generate`: loads config (auto-discovers `geo-ai.config.ts` → `.js` → `.json`), calls `geo-ai-core` to generate content, and writes `llms.txt` + `llms-full.txt` to `./public` (or `--out <path>`). Creates the output directory if missing.
- `geo-ai validate`: checks that `llms.txt` and `llms-full.txt` are present and have valid content. Supports local files (`--path <dir>`) and remote URLs (`--url <url>`). Reports `pass` / `warn` / `fail` / `not_found` per file with actionable recommendations. Exits `1` on any `fail` or `not_found`.
- `geo-ai inspect`: previews your config (site name, URL, description, crawler rules, output directory, and resource sections with item counts). With `--url`, fetches and displays remote `llms.txt` / `llms-full.txt` content.

Flags:
- `--config <path>`: override config file path
- `--out <path>`: override output directory (`generate`)
- `--path <dir>`: local directory to validate
- `--url <url>`: remote base URL for `validate` / `inspect`
- `--help` / `-h`: show help (global or per-command)
- `--version` / `-v`: show version

Error handling:
- Typed error classes with exit codes: `ConfigNotFoundError`, `ConfigParseError`, `ConfigValidationError`, `FsWriteError`, `NetworkError` → exit `1`; `InternalError` → exit `2`
- `DEBUG=geo-ai` env var prints stack traces to stderr
- No raw stack traces on stdout
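One way to wire typed errors to exit codes is sketched below. Only the class names, the exit codes, and the `DEBUG=geo-ai` switch come from the list above; the `CliError` base class, the `report` helper, the exact-match check on `DEBUG`, and the fallback of treating unknown errors as exit `2` are assumptions made for illustration.

```typescript
// Hypothetical base class that carries the exit code alongside the message.
class CliError extends Error {
  constructor(message: string, readonly exitCode: number) {
    super(message);
    this.name = new.target.name;
  }
}

class ConfigNotFoundError extends CliError {
  constructor(path: string) {
    super(`Config not found: ${path}`, 1);
  }
}

class InternalError extends CliError {
  constructor(message: string) {
    super(message, 2);
  }
}

// Assumption: anything that is not a CliError is treated like an internal error.
function exitCodeFor(err: unknown): number {
  return err instanceof CliError ? err.exitCode : 2;
}

// Builds the stderr text: a one-line message by default, the stack when DEBUG=geo-ai.
function report(
  err: unknown,
  env: Record<string, string | undefined>,
): { exitCode: number; stderr: string } {
  const error = err instanceof Error ? err : new Error(String(err));
  const debug = env.DEBUG === "geo-ai";
  const stderr =
    debug && error.stack ? error.stack : `${error.name}: ${error.message}`;
  return { exitCode: exitCodeFor(err), stderr };
}
```

A CLI entry point could then call `report(err, process.env)`, write `stderr` with `console.error`, and pass `exitCode` to `process.exit`.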
- `generateLlmsFiles(config)`: static file generation for `public/llms.txt` and `public/llms-full.txt` before `next build`
- Creates output directory if missing
- Atomic writes (temp file + rename) to prevent partial files
- Configurable `outDir` (default: `public`) and `locale`
- Logs progress and file sizes to stdout
- Throws with descriptive error on failure
- `geo-ai-generate` CLI binary: run via `npx geo-ai-generate` with optional `--config` flag
- New exports: `GenerateLlmsFilesConfig`, `GenerateLlmsFilesResult` types
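The "temp file + rename" pattern mentioned above can be illustrated with Node's `fs` module. This is a generic sketch, not the package's code: `writeFileAtomic` and the temp-file naming scheme are hypothetical.

```typescript
import {
  mkdirSync,
  mkdtempSync,
  readFileSync,
  readdirSync,
  renameSync,
  rmSync,
  writeFileSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Writes `content` to a temp file next to the target, then renames it into place.
// Readers therefore see either the old file or the complete new one, never a partial write.
function writeFileAtomic(targetPath: string, content: string): void {
  const dir = dirname(targetPath);
  mkdirSync(dir, { recursive: true }); // create the output directory if missing
  // Same directory as the target, so the rename stays on one filesystem.
  const tempPath = join(dir, `.tmp-${process.pid}-${Date.now()}`);
  try {
    writeFileSync(tempPath, content, "utf8");
    renameSync(tempPath, targetPath);
  } catch (err) {
    rmSync(tempPath, { force: true }); // best-effort cleanup of the temp file
    throw err;
  }
}

// Usage: write into a throwaway directory so the example leaves the project untouched.
const demoDir = mkdtempSync(join(tmpdir(), "llms-demo-"));
const demoFile = join(demoDir, "public", "llms.txt");
writeFileAtomic(demoFile, "# Example Site\n");
console.log(readFileSync(demoFile, "utf8"));
console.log(readdirSync(join(demoDir, "public")));
```

Keeping the temp file in the target directory matters because a rename across filesystems is not atomic and may fail outright.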
- 9 new tests for `generateLlmsFiles`: file creation, directory creation, overwrite safety, non-empty content, `isFull` flags, locale passthrough, re-generation, error handling, logging
- Production 404 on `/llms.txt`: middleware and route handler only served content dynamically at runtime; static hosting (Vercel, Netlify, `next export`) had no files to serve. `generateLlmsFiles()` now writes static files to `public/` as a pre-build step.
- README: added Static File Generation section (recommended approach), CLI usage, troubleshooting for 404 on `/llms.txt`
- Updated documentation with new domain geoai.run
- Added ecosystem overview
- Type checking now works from root: added `tsconfig.json` with `paths` mapping for cross-package resolution, `typecheck` script changed to `tsc --noEmit`
- Fixed tsup DTS build: removed `composite: true` from package tsconfigs that conflicted with tsup's declaration generation
- `MemoryCrawlStore`: added `maxEntries` cap (default 10 000) with automatic eviction of the oldest 20% on overflow, preventing unbounded memory growth
- `MemoryCacheAdapter`: added `maxEntries` cap (default 1 000) with proactive expired-entry eviction on `set()`, preventing stale entries from accumulating indefinitely
- `getPriceRange`: replaced `Math.min(...spread)` / `Math.max(...spread)` with an iterative loop, eliminating potential stack overflow on large variant arrays
- `bulkGenerate`: batches now process concurrently via `Promise.allSettled()` instead of sequentially, making `batchSize` control actual parallelism
- `AiBulkConfig.onProgress`: callback type changed from `Resource` to `AiContext`, removing the unsafe `as unknown as Resource` cast
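The "evict the oldest 20% on overflow" policy can be sketched with a `Map`, which iterates in insertion order. `BoundedLog` is a hypothetical stand-in for illustration; the real store's fields and rounding behaviour are not described in this changelog and are assumed here.

```typescript
// Hypothetical bounded store: keeps at most `maxEntries` items, dropping the oldest 20% on overflow.
class BoundedLog<T> {
  private readonly store = new Map<number, T>();
  private nextId = 0;

  constructor(private readonly maxEntries = 10_000) {}

  log(entry: T): void {
    this.store.set(this.nextId++, entry);
    if (this.store.size > this.maxEntries) this.evictOldest();
  }

  private evictOldest(): void {
    // Assumption: round up and always remove at least one entry.
    const toRemove = Math.max(1, Math.ceil(this.maxEntries * 0.2));
    let removed = 0;
    // Map keys are yielded in insertion order, so the first keys are the oldest.
    for (const key of this.store.keys()) {
      if (removed >= toRemove) break;
      this.store.delete(key);
      removed++;
    }
  }

  get size(): number {
    return this.store.size;
  }

  entries(): T[] {
    return [...this.store.values()];
  }
}

// Usage: with a cap of 10, the 11th insert evicts the two oldest entries.
const demoLog = new BoundedLog<number>(10);
for (let i = 0; i <= 10; i++) demoLog.log(i);
console.log(demoLog.size, demoLog.entries()[0]); // prints "9 2"
```

Evicting a batch rather than a single entry means the eviction pass runs once per batch of inserts instead of on every insert after the cap is reached.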
- `geoAIMiddleware`: added `Cache-Control: public, max-age=3600` header to llms.txt responses, consistent with `createLlmsHandler`
- Both `GeoAIMiddlewareConfig` and `LlmsHandlerConfig` now accept optional `cacheMaxAge` (seconds, default 3600) for configurable Cache-Control
- `createGeoAI(config)` factory function: single config object initializes all modules
- `ContentProvider` interface and `StaticContentProvider` for `Record<string, Resource[]>` shorthand
- `LlmsGenerator`: generates llms.txt and llms-full.txt Markdown from provider data
  - Helper functions: `stripHtml`, `trimWords`, `getPriceRange`, `getSalePrices`, `getAvailabilityStatus`
  - Standard format: `- [title](url): description [keywords]`
  - Full format: content (trimmed to 200 words), pricing, availability, variants
  - Locale support with `Language` metadata line
  - AI Crawler Rules section with per-bot status
  - Footer with generator name and UTC timestamp
- `BotRulesEngine`: per-bot allow/disallow rules, robots.txt block generation, bot detection by User-Agent
- `AI_BOTS` registry with 16 supported crawlers: GPTBot, OAI-SearchBot, ClaudeBot, Google-Extended, PerplexityBot, DeepSeekBot, GrokBot, meta-externalagent, PanguBot, YandexBot, SputnikBot, Bytespider, Baiduspider, claude-web, Amazonbot, Applebot
- `CrawlTracker`: bot visit logging with GDPR-compliant IP anonymization (SHA-256 via Web Crypto API, Edge Runtime compatible)
- `MemoryCrawlStore`: in-memory crawl store with `log()`, `getActivity()`, `cleanup()`
- `CrawlStore` interface for pluggable storage backends
- `MemoryCacheAdapter`: in-memory TTL cache with expiry check on read
- `FileCacheAdapter`: file-based TTL cache with JSON metadata
- `CacheAdapter` interface for pluggable cache backends
- `CryptoService`: AES-256-GCM encryption/decryption via `node:crypto`, format: `base64(IV + authTag + ciphertext)`
- `SeoGenerator`: HTML meta tags, HTTP Link header, JSON-LD (WebSite/Product/Article)
- `AiGenerator` (separate entry point `geo-ai-core/ai`): Claude and OpenAI API integration via `globalThis.fetch`
- `RateLimiter`: sliding window rate limiter (default 10 req/min)
- `buildPrompt`: template placeholder replacement (`{title}`, `{content}`, `{type}`, `{price}`, `{category}`)
- `classifyAiError`: HTTP status to error type mapping (auth, rate_limit, server, network, unknown)
- `AiProviderError`: typed error class
- `bulkGenerate`: batch processing (default 5 per batch, max 50 items) with `onProgress` callback
- `parseDuration`: cache duration string parser (`'1h'`, `'24h'`, `'7d'` → seconds)
- Dual ESM/CJS build via tsup (`.mjs` / `.cjs` extensions)
- Two entry points: `.` (main) and `./ai` (AI generator)
- Full TypeScript declarations (`.d.ts`) for all exports
- Zero runtime dependencies
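As an illustration of the duration parsing listed above, a parser for strings such as `'1h'` or `'7d'` could look like the sketch below. Only the hour and day examples appear in this changelog; the `s` and `m` units, the regular expression, and the error behaviour are assumptions, and the function shares the `parseDuration` name purely for readability.

```typescript
// Seconds per unit. Only "h" and "d" are confirmed by the changelog; "s" and "m" are assumed.
const UNIT_SECONDS: Record<string, number> = { s: 1, m: 60, h: 3600, d: 86_400 };

// Converts strings like "1h" or "7d" into a number of seconds.
function parseDuration(input: string): number {
  const match = /^(\d+)([smhd])$/.exec(input.trim());
  if (!match) throw new Error(`Invalid duration: "${input}"`);
  const amount = Number(match[1]);
  const perUnit = UNIT_SECONDS[match[2] ?? ""];
  if (perUnit === undefined) throw new Error(`Unknown unit in: "${input}"`);
  return amount * perUnit;
}

console.log(parseDuration("1h")); // 3600
console.log(parseDuration("24h")); // 86400
console.log(parseDuration("7d")); // 604800
```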
- `geoAIMiddleware(config)`: Next.js middleware intercepting `/llms.txt` and `/llms-full.txt`
  - Returns `text/plain` response for llms paths
  - `NextResponse.next()` passthrough for all other paths
  - Optional `Link` header injection via `injectLinkHeader` config
  - Fire-and-forget bot visit tracking
- `createLlmsHandler(config)`: Next.js App Router route handler
  - File type detection by URL path or `?type=full` query parameter
  - `Content-Type: text/plain`, `Cache-Control: public, max-age=3600`
  - Bot visit logging
- Re-exports all public types, interfaces, and classes from `geo-ai-core`
- `geo-ai-core` as regular dependency (not peer), `next >= 16` as peerDependency
- Vitest test suite with 114 tests across 9 test files
- Property-based tests via fast-check (100+ iterations per property)
- Properties tested: stripHtml, trimWords, empty sections, llms.txt structure, locale metadata, bot rules, bot detection, IP anonymization, buildPrompt, classifyAiError
- Unit tests: BotRulesEngine, AiGenerator (mock fetch), createGeoAI, middleware, route handler
- npm workspaces monorepo (`packages/*`)
- Shared `tsconfig.base.json` (strict, ESNext, bundler moduleResolution)
- Shared `vitest.config.ts` with v8 coverage
- Kiro steering files: product.md, tech.md, structure.md
- Kiro skills: new-bot, new-module, new-wrapper, new-test, new-cache-adapter, new-crawl-store