Sequential pattern analysis through variable-order Markov chains. Built for detecting deviations in finite-alphabet sequences.

Anomaly Grid

 █████╗ ███╗   ██╗ ██████╗ ███╗   ███╗ █████╗ ██╗  ██╗   ██╗
██╔══██╗████╗  ██║██╔═══██╗████╗ ████║██╔══██╗██║  ╚██╗ ██╔╝
███████║██╔██╗ ██║██║   ██║██╔████╔██║███████║██║   ╚████╔╝ 
██╔══██║██║╚██╗██║██║   ██║██║╚██╔╝██║██╔══██║██║    ╚██╔╝  
██║  ██║██║ ╚████║╚██████╔╝██║ ╚═╝ ██║██║  ██║███████╗██║   
╚═╝  ╚═╝╚═╝  ╚═══╝ ╚═════╝ ╚═╝     ╚═╝╚═╝  ╚═╝╚══════╝╚═╝   
[ANOMALY-GRID v0.3.0] - SEQUENCE ANOMALY DETECTION ENGINE


A Rust library implementing variable-order Markov chains for sequence anomaly detection in finite alphabets.

A Python wrapper for this library is available in a separate repository: https://github.com/Abimael10/anomaly-grid-py

Quick Start

[dependencies]
anomaly-grid = "0.3.0"

use anomaly_grid::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create detector
    let mut detector = AnomalyDetector::new(3)?;
    
    // Train on normal patterns
    let normal_sequence: Vec<String> = vec!["A", "B", "C", "A", "B", "C"]
        .iter().map(|s| s.to_string()).collect();
    detector.train(&normal_sequence)?;
    
    // Detect anomalies
    let test_sequence: Vec<String> = vec!["A", "X", "Y"]
        .iter().map(|s| s.to_string()).collect();
    let anomalies = detector.detect_anomalies(&test_sequence, 0.1)?;
    
    for anomaly in anomalies {
        println!("Anomaly: {:?}, Strength: {:.3}", 
                 anomaly.sequence, anomaly.anomaly_strength);
    }
    
    Ok(())
}

What This Library Does

  • Variable-Order Markov Models: Builds contexts of length 1 to max_order from training sequences with hierarchical context selection
  • Adaptive Context Selection: Uses longest available context with sufficient data, falls back to shorter contexts automatically
  • Information-Theoretic Scoring: Shannon entropy and KL divergence calculations with lazy computation and caching
  • Memory-Optimized Storage: String interning, trie-based context storage with prefix sharing, and SmallVec for efficient small collections
  • Parallel Batch Processing: Processes multiple sequences concurrently using Rayon for improved throughput
  • Comprehensive Testing: 162 tests covering unit, integration, domain, and performance validation with mathematical correctness verification
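The longest-context-with-fallback idea can be illustrated with a toy model. This is a minimal sketch of the technique, not the crate's actual implementation: it counts next-symbol frequencies for every context length up to `max_order`, then scores a transition by trying the longest matching context first and falling back to shorter ones.

```rust
use std::collections::HashMap;

// Map each context (the last k symbols) to counts of the symbol that followed it.
fn build_counts(seq: &[&str], max_order: usize) -> HashMap<Vec<String>, HashMap<String, u32>> {
    let mut counts: HashMap<Vec<String>, HashMap<String, u32>> = HashMap::new();
    for k in 1..=max_order {
        for window in seq.windows(k + 1) {
            let ctx: Vec<String> = window[..k].iter().map(|s| s.to_string()).collect();
            let next = window[k].to_string();
            *counts.entry(ctx).or_default().entry(next).or_insert(0) += 1;
        }
    }
    counts
}

// Try the longest available context first; fall back to shorter ones.
fn prob(
    counts: &HashMap<Vec<String>, HashMap<String, u32>>,
    history: &[&str],
    next: &str,
    max_order: usize,
) -> f64 {
    for k in (1..=max_order.min(history.len())).rev() {
        let ctx: Vec<String> = history[history.len() - k..].iter().map(|s| s.to_string()).collect();
        if let Some(dist) = counts.get(&ctx) {
            let total: u32 = dist.values().sum();
            return dist.get(next).copied().unwrap_or(0) as f64 / total as f64;
        }
    }
    0.0 // never seen at any context order
}

fn main() {
    let train = ["A", "B", "C", "A", "B", "C", "A", "B"];
    let counts = build_counts(&train, 2);
    // Order-2 context "A B" was always followed by "C" in training.
    println!("P(C | A B) = {}", prob(&counts, &["A", "B"], "C", 2));
    // "X B" was never observed, so the lookup falls back to the order-1 context "B".
    println!("P(C | X B) = {}", prob(&counts, &["X", "B"], "C", 2));
}
```

A real implementation would also apply smoothing and require a minimum count before trusting a long context; the sketch only shows the fallback order.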

Configuration

let config = AnomalyGridConfig::default()
    .with_max_order(4)?                    // Higher order = more memory, better accuracy
    .with_smoothing_alpha(0.5)?            // Lower = more sensitive to training data
    .with_weights(0.8, 0.2)?               // Likelihood vs information weight
    .with_memory_limit(100 * 1024 * 1024); // 100MB memory limit

let detector = AnomalyDetector::with_config(config)?;
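The `smoothing_alpha` knob matches the conventional additive (Laplace-style) smoothing formula; assuming alpha acts as a pseudo-count (an interpretation, not confirmed library internals), its effect can be sketched independently of the crate:

```rust
// Additive (Laplace-style) smoothing: unseen symbols keep nonzero probability.
// alpha = 1.0 is classic Laplace; a smaller alpha leans harder on observed counts,
// which is why the README describes lower values as more sensitive to training data.
fn smoothed_prob(count: u32, total: u32, alphabet_size: u32, alpha: f64) -> f64 {
    (count as f64 + alpha) / (total as f64 + alpha * alphabet_size as f64)
}

fn main() {
    // 3-symbol alphabet, 10 observations in this context.
    // A never-seen symbol still gets a small nonzero probability:
    println!("{}", smoothed_prob(0, 10, 3, 0.5));
    // A frequent symbol stays close to its empirical frequency:
    println!("{}", smoothed_prob(7, 10, 3, 0.5));
}
```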

Use Cases

✅ Excellent Fit

  • Software Development Workflows: Git command sequences, CI/CD pipeline analysis, code review patterns
  • Database Query Optimization: SQL operation sequences, transaction pattern analysis, N+1 query detection
  • Network Protocol Analysis: TCP/HTTP/TLS state transitions, protocol compliance verification, traffic flow analysis
  • System Administration: CLI command sequences, automation pattern detection, user proficiency analysis
  • Creative Pattern Analysis: Musical composition analysis, artistic workflow patterns, style classification
  • Security Monitoring: Login sequences, access patterns, behavioral anomaly detection
  • IoT and Sensor Networks: Device state transitions, sensor reading patterns, equipment health monitoring

✅ Good Fit

  • Business Process Mining: Workflow step sequences, process compliance, bottleneck identification
  • User Experience Analysis: Click sequences, navigation patterns, conversion funnel analysis
  • Manufacturing Quality Control: Production step sequences, assembly line monitoring, defect pattern detection
  • Financial Transaction Analysis: Payment sequences, fraud pattern detection, risk assessment
  • Healthcare Workflow Analysis: Treatment sequences, care pathway optimization, protocol adherence

⚠️ Requires Preprocessing

  • Natural Language Processing: Tokenize to categorical sequences (POS tags, named entities, semantic categories)
  • Time Series Data: Discretize continuous values into categorical states or trend patterns
  • High-Resolution Sensor Data: Aggregate into categorical states or pattern classifications
  • Large Vocabularies: Apply dimensionality reduction or clustering to create manageable alphabets
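For time-series data, discretization can be as simple as mapping successive deltas to trend symbols. A minimal sketch (the threshold value and symbol names are illustrative choices, not part of the library):

```rust
// Turn a continuous series into trend symbols a finite-alphabet detector can consume.
// `threshold` sets the dead band for "FLAT"; tune it to your data's noise floor.
fn discretize(values: &[f64], threshold: f64) -> Vec<String> {
    values
        .windows(2)
        .map(|w| {
            let delta = w[1] - w[0];
            if delta > threshold {
                "UP".to_string()
            } else if delta < -threshold {
                "DOWN".to_string()
            } else {
                "FLAT".to_string()
            }
        })
        .collect()
}

fn main() {
    let readings = [1.0, 1.2, 1.21, 0.9, 0.91];
    // The resulting Vec<String> can be passed to AnomalyDetector::train.
    println!("{:?}", discretize(&readings, 0.05));
}
```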

❌ Poor Fit

  • Raw Continuous Data: Unprocessed sensor readings, audio waveforms, high-frequency financial data
  • Extremely Large Alphabets: >1000 unique states without preprocessing
  • Real-Time Streaming: Microsecond-latency requirements (though batch processing is efficient)
  • Unstructured Data: Images, videos, raw binary data without categorical interpretation

Testing

# Run all tests (162 tests)
cargo test

# Run specific test suites
cargo test unit_           # Unit tests (39 tests)
cargo test integration_    # Integration tests (24 tests)  
cargo test domain_         # Domain tests (5 tests)
cargo test performance_    # Performance tests (36 tests)

# Run examples
cargo run --example quick_start
cargo run --example network_security_monitoring
cargo run --example financial_fraud_detection

Documentation

Dependencies

[dependencies]
rayon = "1.10.0"    # Parallel batch processing
smallvec = "1.13.0" # Memory-efficient small collections

Minimal dependencies for core functionality and memory optimization.

License

MIT License - see LICENSE file.


Performance Note: The library efficiently handles alphabets up to ~100 unique states with excellent memory usage (typically <100MB). For larger alphabets, consider preprocessing techniques like clustering, dimensionality reduction, or hierarchical categorization.
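One simple preprocessing sketch for oversized alphabets (not part of the library): keep the N most frequent symbols and collapse the long tail into a single "OTHER" state, shrinking the alphabet while preserving the dominant transition structure.

```rust
use std::collections::HashMap;

// Keep the `keep` most frequent symbols; map everything else to "OTHER".
// Ties are broken alphabetically so the reduction is deterministic.
fn reduce_alphabet(seq: &[&str], keep: usize) -> Vec<String> {
    let mut freq: HashMap<&str, usize> = HashMap::new();
    for s in seq {
        *freq.entry(s).or_insert(0) += 1;
    }
    let mut by_count: Vec<(&str, usize)> = freq.into_iter().collect();
    by_count.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    let kept: Vec<&str> = by_count.iter().take(keep).map(|(s, _)| *s).collect();
    seq.iter()
        .map(|s| if kept.contains(s) { s.to_string() } else { "OTHER".to_string() })
        .collect()
}

fn main() {
    let seq = ["GET", "GET", "POST", "GET", "TRACE", "POST", "PATCH"];
    // Rare methods collapse into "OTHER", leaving a 3-symbol alphabet.
    println!("{:?}", reduce_alphabet(&seq, 2));
}
```

Frequency bucketing is the bluntest option; clustering on symbol co-occurrence or a domain-specific hierarchy usually preserves more signal.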