Skip to content

oneshot2001/Nerve

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NERVE

Normalized Extraction and Retrieval for Vehicle/Edge-device Knowledge

NERVE is the knowledge extraction backbone of AxisX. It ingests the full Axis Communications product portfolio — 474 camera, audio, access control, and network device datasheets — and produces a structured, queryable corpus that AI agents can query with zero hallucination on product specs.

If NERVE is wrong, AxisX is wrong. If NERVE is incomplete, AxisX hallucinates. NERVE is the moat.


What It Does

  1. Ingests raw product datasheets (PDF + scraped TXT) from the Axis corpus
  2. Normalizes text — encoding repair, column-reflow, footnote stripping, section header standardization
  3. Extracts structured product specs via regex-first parsing with Claude Haiku LLM fallback for complex fields
  4. Enriches every record with decoded Axis naming attributes (form factor, outdoor rating, vandal resistance, LED illumination, etc.)
  5. Stores results as JSON files + SQLite index for fast filtered queries
  6. Serves specs to AxisX agents via a simple CLI query interface

Corpus Scale

Metric Value
Source files 474 real PDFs + paired TXT files
Product types Cameras (P/M/Q/F/C/V), Audio (D), Access Control (A/I), Body-worn (W), Network, Accessories (T/TQ/TU), Specialty
Storage ~5 MB clean text → JSON corpus + SQLite index
Ingest time < 10 minutes for full corpus
Update cadence ~12 new products/year (monthly incremental runs)

Quick Start

# Install
pip install -e .

# Ingest full corpus
nerve ingest --source /path/to/raw-pdfs

# Query a product
nerve query --model "P3268-SLVE"

# Filter search
nerve search --chipset ARTPEC-8 --dlpu
nerve search --outdoor --vandal --form-factor dome
nerve search --type camera --series P32

# Corpus stats
nerve stats

With LLM enrichment (Claude Haiku)

export ANTHROPIC_API_KEY=sk-ant-...
nerve ingest --source /path/to/raw-pdfs

When ANTHROPIC_API_KEY is set, NERVE uses Claude Haiku for complex fields (ONVIF profiles, protocol lists, analytics apps, certifications) when regex confidence falls below threshold. Skip with --no-llm.


CLI Reference

nerve ingest --source DIR [--corpus DIR] [--limit N] [--no-llm]
nerve query  --model "P3268-SLVE" [--field chipset]
nerve search [--chipset ARTPEC-8] [--type camera] [--series P32]
             [--dlpu] [--outdoor] [--vandal] [--led] [--stainless]
             [--wireless] [--form-factor dome|ptz|panoramic|box]
             [--min-confidence 0.8]
nerve stats  [--corpus DIR]

Architecture

[Source Files: PDF + TXT]
        ↓
[Ingestion]     nerve/ingest.py    — format detection, TXT/PDF/ZIP readers
        ↓
[Normalization] nerve/normalize.py — ftfy encoding repair, reflow, section headers
        ↓
[Extraction]    nerve/extract.py   — regex + Claude Haiku hybrid
        ↓
[Naming]        axis_naming.py     — decode model string → form factor, extensions, flags
        ↓
[Storage]       nerve/storage.py   — JSON corpus + SQLite index
        ↓
[Query API]     nerve/query.py     — CorpusQuery class (used by CLI + AxisX)

Corpus Output Schema

Each product is stored as a JSON file under nerve-corpus/products/{series}/. Key fields:

{
  "nerve_id": "uuid-v4",
  "model": "AXIS P3268-SLVE",
  "model_short": "P3268-SLVE",
  "series": "P32",
  "product_type": "camera",
  "extraction_confidence": 0.94,
  "naming": {
    "product_line": "P",
    "product_line_name": "Professional",
    "form_factor": "Fixed dome",
    "resolution_name": "4K/8MP",
    "resolution_pixels": "3840×2160",
    "extensions": ["S", "L", "V", "E"],
    "extension_names": ["Stainless steel", "LED illumination", "Vandal-resistant", "Outdoor-ready"],
    "is_outdoor": true,
    "is_vandal_resistant": true,
    "has_led": true,
    "is_stainless": true,
    "is_rugged": true,
    "summary": "P-line fixed dome, 4K/8MP, Stainless steel, LED illumination, Vandal-resistant, Outdoor-ready"
  },
  "chipset": { "model": "ARTPEC-8", "dlpu": true },
  "video": { "max_resolution": "3840x2160", "codecs": ["H.265", "H.264", "AV1"], "zipstream": true },
  "power": { "poe_type": "PoE+", "max_watts": 25.5 },
  "physical": { "ip_ratings": ["IP66", "IP67"], "ik_rating": "IK10" },
  "features": { "edge_vault": true, "signed_video": true, "acap": true }
}

SQLite Index (nerve-corpus/index.db)

Fast-filter columns without loading JSON:

Column Use
chipset, artpec_gen, dlpu Chipset filtering
product_type, series Product category filtering
ip_rating, max_resolution Spec filtering
form_factor Form factor filtering
is_outdoor, is_vandal_resistant, has_led Naming flag filtering
is_stainless, is_wireless, is_explosion_protected, is_panoramic Naming flag filtering
extraction_confidence Quality filtering

axis_naming.py

Standalone canonical decoder for Axis model naming conventions. Shared between NERVE and AxisX — never duplicate this logic.

from axis_naming import decode_model, nl_to_naming_filters

decode_model("M4225-LVE")
# → {'form_factor': 'Fixed dome', 'is_outdoor': True, 'is_vandal_resistant': True,
#    'has_led': True, 'resolution_name': 'HDTV 1080/2MP', ...}

nl_to_naming_filters("outdoor vandal dome cameras")
# → {'is_outdoor': True, 'is_vandal_resistant': True, 'form_factor': 'dome'}

Stack

Component Technology
Language Python 3.12
PDF extraction pdfplumber
Encoding repair ftfy
LLM extraction Claude Haiku (claude-haiku-4-5) via Anthropic SDK
Storage JSON files + SQLite
CLI click

Project Structure

axis_naming.py          # Shared Axis model naming decoder (NERVE + AxisX)
nerve/
  ingest.py             # Source file scanning and reading
  normalize.py          # Text cleaning pipeline
  extract.py            # Regex + LLM field extraction
  storage.py            # JSON + SQLite persistence
  query.py              # CorpusQuery class
  cli.py                # CLI entry point
nerve-corpus/
  products/             # One JSON per product, organized by series prefix
  index.db              # SQLite index for fast queries
docs/
  spec.md               # Full Software Design Document

Spec

Full system design is in docs/spec.md.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages