o8o0o8o/audit-tool

SEO Audit Tool

An automated SEO audit tool that analyzes websites for SEO compliance, performance metrics, and best practices.

Installation

yarn
npx playwright install

Usage

Basic Audit

npx tsx start.ts --origin=https://example.com

With Options

# Enable performance metrics collection (LCP, CLS, TTFB)
npx tsx start.ts --origin=https://example.com --perf

# Resume from saved progress
npx tsx start.ts --origin=https://example.com --proceed

# Save HTML snapshots of each page
npx tsx start.ts --origin=https://example.com --save-html

# Crawl more pages in parallel (faster audits)
npx tsx start.ts --origin=https://example.com --concurrency=10

# Combine options
npx tsx start.ts --origin=https://example.com --perf --save-html --proceed --concurrency=10

Options:

  • --origin=<url> (required) - Website URL to audit
  • --perf (optional) - Enable performance metrics collection (LCP, CLS, TTFB)
  • --proceed (optional) - Resume from saved progress after interruption
  • --save-html (optional) - Save full HTML snapshots of each crawled page
  • --concurrency=<number> (optional) - Number of pages to crawl in parallel (1-20, default: 5)
  • --config=<path> (optional) - Path to custom config file
  • --init-config (optional) - Generate sample .seoauditrc config file

Notes:

  • Performance metrics collection is disabled by default for faster audits
  • HTML snapshots are not saved by default to reduce disk usage
  • Higher concurrency uses more system resources but completes audits faster
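The --concurrency flag caps how many pages are fetched at once. A minimal sketch of that pattern (a hypothetical helper for illustration, not the tool's actual crawler):

```typescript
// Run async tasks with at most `limit` in flight at any moment.
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker pulls the next unclaimed task from a shared cursor.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  await Promise.all(
    Array.from({ length: Math.min(limit, tasks.length) }, worker)
  );
  return results;
}
```

With --concurrency=10, ten such workers would share one task queue; raising the limit trades memory and CPU for wall-clock time.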

Configuration File

Create a .seoauditrc file to customize thresholds and settings:

# Generate sample config
npx tsx start.ts --init-config

Example .seoauditrc:

{
  "thresholds": {
    "TTFB": 1000,
    "LCP": 3000,
    "CLS": 0.15,
    "TITLE_MIN": 40,
    "TITLE_MAX": 70,
    "DESCRIPTION_MIN": 100,
    "DESCRIPTION_MAX": 180
  },
  "crawl": {
    "concurrency": 10,
    "timeout": 45000
  },
  "excludeUrls": [
    "/admin/*",
    "/staging/*",
    "*.pdf"
  ]
}
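A config file like the one above would typically be merged over built-in defaults. A sketch of such a loader (the function name and the crawl timeout default are assumptions; the threshold defaults follow the values listed under "What Gets Checked"):

```typescript
import { existsSync, readFileSync } from 'node:fs';

interface AuditConfig {
  thresholds: Record<string, number>;
  crawl: { concurrency: number; timeout: number };
  excludeUrls: string[];
}

const DEFAULTS: AuditConfig = {
  thresholds: { TTFB: 800, LCP: 2500, CLS: 0.1, TITLE_MIN: 40, TITLE_MAX: 70 },
  crawl: { concurrency: 5, timeout: 30000 }, // timeout default is assumed
  excludeUrls: [],
};

// Shallow-merge a user .seoauditrc over the defaults; missing sections fall back.
function loadConfig(path = '.seoauditrc'): AuditConfig {
  if (!existsSync(path)) return DEFAULTS;
  const user = JSON.parse(readFileSync(path, 'utf8'));
  return {
    thresholds: { ...DEFAULTS.thresholds, ...user.thresholds },
    crawl: { ...DEFAULTS.crawl, ...user.crawl },
    excludeUrls: user.excludeUrls ?? DEFAULTS.excludeUrls,
  };
}
```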

Export to CSV

npx tsx generateCSV.ts audit-results/domain.com/2024-01-15_10-30-00/SEOAnalysis__domain.com.json.gz

Creates CSV files in domain.com_csv_exports/:

  • pages_overview.csv - All metrics per page
  • issues_breakdown.csv - Issues by severity
  • performance_metrics.csv - Performance data
  • quick_wins.csv - Prioritized action items
  • summary.csv - High-level statistics
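Values exported to CSV need escaping when they contain commas, quotes, or newlines (page titles and descriptions often do). A sketch of the standard RFC 4180 quoting rule (hypothetical helpers, not the tool's own exporter):

```typescript
// Quote a CSV cell when it contains a comma, double quote, or newline;
// embedded quotes are doubled per RFC 4180.
function csvCell(value: string | number): string {
  const s = String(value);
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function csvRow(values: Array<string | number>): string {
  return values.map(csvCell).join(',');
}
```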

Compare Audits

# List available audit history
npx tsx historyManager.ts list domain.com

# Compare two audits (by index from list)
npx tsx generateComparison.ts domain.com 0 1

Creates domain.com_comparison_0_vs_1.html with a side-by-side comparison of the two audits.

History Management

# Save current audit to history
npx tsx historyManager.ts save audit-results/domain.com/2024-01-15_10-30-00/SEOAnalysis__domain.com.json.gz

# List all audits for a domain
npx tsx historyManager.ts list domain.com

# Compare two historical audits
npx tsx historyManager.ts compare domain.com 0 1

Programmatic Usage

import { SEOData } from './robots';

// Create audit instance
const seo = new SEOData(
  'https://example.com',  // origin
  false,                  // proceed (resume from checkpoint)
  true,                   // perf (collect performance metrics)
  false,                  // saveHtml (save HTML snapshots)
  5                       // concurrency
);

// Run audit
await seo.audit();

// Access results
console.log(seo.data);            // Full results object
console.log(seo.domainName);      // 'example.com'
console.log(seo.file);            // Path to JSON results

Output Files

All results are stored in audit-results/domain.com/YYYY-MM-DD_HH-MM-SS/:

  • SEOAnalysis__domain.com.json.gz - Audit data (gzip compressed)
  • domain.com.html - Interactive HTML report
  • *__archive_part*.json.gz - Archive files for large sites (>5000 pages)

Other files:

  • audit-results/domain.com/audit-history.json - Historical audit snapshots
  • audit-results/failedPages.json - Failed URLs list
  • audit_domain.com/ - HTML snapshots (only with --save-html)

What Gets Checked

Robots.txt

  • Existence and structure
  • Sitemap presence
  • Crawlability rules

Sitemap

  • Duplicate URLs
  • Trailing slash consistency
  • Origin consistency
  • Size limits (50,000 URLs)

Page Analysis

  • Title (presence, uniqueness, length)
  • Meta description (presence, uniqueness, length)
  • Canonical URLs
  • Alternate language links (hreflang)
  • Heading structure (h1-h6)
  • Open Graph and Twitter meta tags
  • Structured data (Schema.org)
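The title and description length checks compare against the configurable thresholds shown earlier. A sketch of what one such check might look like (hypothetical function and issue shape; thresholds match the defaults in the configuration example):

```typescript
interface PageIssue {
  field: string;
  message: string;
}

// Length thresholds matching the .seoauditrc example defaults.
const TITLE_MIN = 40;
const TITLE_MAX = 70;

// Flag a missing, too-short, or too-long <title>.
function checkTitle(title: string | null): PageIssue[] {
  const issues: PageIssue[] = [];
  if (!title || title.trim() === '') {
    issues.push({ field: 'title', message: 'missing <title>' });
  } else if (title.length < TITLE_MIN) {
    issues.push({ field: 'title', message: `title shorter than ${TITLE_MIN} chars` });
  } else if (title.length > TITLE_MAX) {
    issues.push({ field: 'title', message: `title longer than ${TITLE_MAX} chars` });
  }
  return issues;
}
```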

Performance (with --perf)

  • TTFB (threshold: 800ms)
  • LCP (threshold: 2500ms)
  • CLS (threshold: 0.1)
  • Page size (threshold: 6MB)
  • Request count (threshold: 20)

Security

  • HTTPS enforcement
  • Security headers (CSP, X-Frame-Options, etc.)

Accessibility

  • WCAG 2.1 compliance checks
  • Mobile-friendly viewport

Space Optimizations

  • All JSON files are automatically gzip compressed (70-80% reduction)
  • Incoming links stored as counts, not arrays
  • Automatic archiving for sites with 5000+ pages
  • Backward compatible with legacy uncompressed files
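Gzip works well on audit JSON because the same keys repeat on every page record. A round-trip sketch using Node's built-in zlib (illustrative helpers, not the tool's internals):

```typescript
import { gzipSync, gunzipSync } from 'node:zlib';

// Serialize a results object and compress it to gzip bytes.
function compressJson(data: unknown): Buffer {
  return gzipSync(Buffer.from(JSON.stringify(data), 'utf8'));
}

// Restore an object from gzip bytes (e.g. a *.json.gz audit file).
function decompressJson<T = unknown>(buf: Buffer): T {
  return JSON.parse(gunzipSync(buf).toString('utf8')) as T;
}
```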

Large Website Support

For sites with thousands of pages:

  • Automatic archiving splits processed pages into separate files
  • Progress files stay manageable (<100MB)
  • Resume with --proceed loads all archive files automatically
  • No manual intervention required
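Splitting processed pages into fixed-size archive parts is a simple chunking operation. A sketch, assuming the 5000-page part size mentioned above (function name is hypothetical):

```typescript
// Split a large list of page records into archive parts of at most `size` each.
function chunkPages<T>(pages: T[], size = 5000): T[][] {
  const parts: T[][] = [];
  for (let i = 0; i < pages.length; i += size) {
    parts.push(pages.slice(i, i + size));
  }
  return parts;
}
```

On resume, the parts would simply be concatenated back into one list before crawling continues.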

License

ISC

Author

Sergey Labut
