█████╗ ███╗ ██╗ ██████╗ ███╗ ███╗ █████╗ ██╗ ██╗ ██╗
██╔══██╗████╗ ██║██╔═══██╗████╗ ████║██╔══██╗██║ ╚██╗ ██╔╝
███████║██╔██╗ ██║██║ ██║██╔████╔██║███████║██║ ╚████╔╝
██╔══██║██║╚██╗██║██║ ██║██║╚██╔╝██║██╔══██║██║ ╚██╔╝
██║ ██║██║ ╚████║╚██████╔╝██║ ╚═╝ ██║██║ ██║███████╗██║
╚═╝ ╚═╝╚═╝ ╚═══╝ ╚═════╝ ╚═╝ ╚═╝╚═╝ ╚═╝╚══════╝╚═╝
[ANOMALY-GRID v0.3.0] - SEQUENCE ANOMALY DETECTION ENGINE
A Rust library implementing variable-order Markov chains for sequence anomaly detection in finite alphabets.
For Python bindings of this library, see my other repository: https://github.com/Abimael10/anomaly-grid-py
```toml
[dependencies]
anomaly-grid = "0.3.0"
```
```rust
use anomaly_grid::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a detector with maximum context order 3
    let mut detector = AnomalyDetector::new(3)?;

    // Train on normal patterns
    let normal_sequence: Vec<String> = vec!["A", "B", "C", "A", "B", "C"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    detector.train(&normal_sequence)?;

    // Detect anomalies
    let test_sequence: Vec<String> = vec!["A", "X", "Y"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let anomalies = detector.detect_anomalies(&test_sequence, 0.1)?;

    for anomaly in anomalies {
        println!("Anomaly: {:?}, Strength: {:.3}",
            anomaly.sequence, anomaly.anomaly_strength);
    }
    Ok(())
}
```
- Variable-Order Markov Models: Builds contexts of length 1 to max_order from training sequences with hierarchical context selection
- Adaptive Context Selection: Uses longest available context with sufficient data, falls back to shorter contexts automatically
- Information-Theoretic Scoring: Shannon entropy and KL divergence calculations with lazy computation and caching
- Memory-Optimized Storage: String interning, trie-based context storage with prefix sharing, and SmallVec for efficient small collections
- Parallel Batch Processing: Processes multiple sequences concurrently using Rayon for improved throughput
- Comprehensive Testing: 162 tests covering unit, integration, domain, and performance validation with mathematical correctness verification
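To make the first two features concrete, here is a minimal, self-contained sketch of a variable-order Markov model with longest-context fallback and additive smoothing. This is illustrative code written for this README, not the crate's internal implementation; the `Vmm` type, field names, and smoothing constant are assumptions for the example.

```rust
use std::collections::HashMap;

/// Illustrative variable-order Markov model (not the crate's internals).
struct Vmm {
    max_order: usize,
    // counts[context][symbol] = occurrences of `symbol` after `context`
    counts: HashMap<Vec<String>, HashMap<String, usize>>,
}

impl Vmm {
    fn new(max_order: usize) -> Self {
        Vmm { max_order, counts: HashMap::new() }
    }

    /// Record next-symbol counts for every context length 1..=max_order.
    fn train(&mut self, seq: &[String]) {
        for i in 0..seq.len() {
            for order in 1..=self.max_order.min(i) {
                let ctx = seq[i - order..i].to_vec();
                *self.counts.entry(ctx).or_default()
                    .entry(seq[i].clone()).or_default() += 1;
            }
        }
    }

    /// Probability of `symbol` after `history`: use the longest context
    /// with training data, falling back to shorter contexts automatically.
    fn prob(&self, history: &[String], symbol: &str, alpha: f64, alphabet: usize) -> f64 {
        for order in (1..=self.max_order.min(history.len())).rev() {
            let ctx = &history[history.len() - order..];
            if let Some(next) = self.counts.get(ctx) {
                let total: usize = next.values().sum();
                let c = next.get(symbol).copied().unwrap_or(0);
                // Additive (Laplace-style) smoothing
                return (c as f64 + alpha) / (total as f64 + alpha * alphabet as f64);
            }
        }
        1.0 / alphabet as f64 // no context observed: fall back to uniform
    }
}

fn main() {
    let seq: Vec<String> = ["A", "B", "C", "A", "B", "C", "A", "B", "C"]
        .iter().map(|s| s.to_string()).collect();
    let mut m = Vmm::new(3);
    m.train(&seq);
    let hist: Vec<String> = vec!["A".into(), "B".into()];
    let p_c = m.prob(&hist, "C", 0.5, 3);
    let p_x = m.prob(&hist, "X", 0.5, 3);
    println!("P(C | A,B) = {:.3}, P(X | A,B) = {:.3}", p_c, p_x);
    assert!(p_c > p_x); // the trained continuation scores higher
}
```

An unseen symbol after a familiar context gets a small smoothed probability rather than zero, which is what makes anomaly scores well-defined.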
```rust
let config = AnomalyGridConfig::default()
    .with_max_order(4)?                    // Higher order = more memory, better accuracy
    .with_smoothing_alpha(0.5)?            // Lower = more sensitive to training data
    .with_weights(0.8, 0.2)?               // Likelihood vs information weight
    .with_memory_limit(100 * 1024 * 1024); // 100MB memory limit

let detector = AnomalyDetector::with_config(config)?;
```
- Software Development Workflows: Git command sequences, CI/CD pipeline analysis, code review patterns
- Database Query Optimization: SQL operation sequences, transaction pattern analysis, N+1 query detection
- Network Protocol Analysis: TCP/HTTP/TLS state transitions, protocol compliance verification, traffic flow analysis
- System Administration: CLI command sequences, automation pattern detection, user proficiency analysis
- Creative Pattern Analysis: Musical composition analysis, artistic workflow patterns, style classification
- Security Monitoring: Login sequences, access patterns, behavioral anomaly detection
- IoT and Sensor Networks: Device state transitions, sensor reading patterns, equipment health monitoring
- Business Process Mining: Workflow step sequences, process compliance, bottleneck identification
- User Experience Analysis: Click sequences, navigation patterns, conversion funnel analysis
- Manufacturing Quality Control: Production step sequences, assembly line monitoring, defect pattern detection
- Financial Transaction Analysis: Payment sequences, fraud pattern detection, risk assessment
- Healthcare Workflow Analysis: Treatment sequences, care pathway optimization, protocol adherence
- Natural Language Processing: Tokenize to categorical sequences (POS tags, named entities, semantic categories)
- Time Series Data: Discretize continuous values into categorical states or trend patterns
- High-Resolution Sensor Data: Aggregate into categorical states or pattern classifications
- Large Vocabularies: Apply dimensionality reduction or clustering to create manageable alphabets
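As a sketch of the time-series case above, the snippet below bins continuous sensor readings into a three-symbol alphabet before they reach the detector. This is generic preprocessing written for this README, not part of anomaly-grid; the `discretize` function, state names, and thresholds are illustrative assumptions.

```rust
/// Illustrative preprocessing (not part of anomaly-grid): discretize
/// continuous readings into a small categorical alphabet.
fn discretize(values: &[f64], low: f64, high: f64) -> Vec<String> {
    values
        .iter()
        .map(|&v| {
            // Three bins: below `low`, between `low` and `high`, above `high`
            (if v < low { "LOW" } else if v <= high { "MID" } else { "HIGH" }).to_string()
        })
        .collect()
}

fn main() {
    let readings = [18.2, 21.5, 22.0, 35.7, 20.9];
    // Thresholds chosen for the example; tune them per deployment
    let states = discretize(&readings, 20.0, 25.0);
    println!("{:?}", states); // ["LOW", "MID", "MID", "HIGH", "MID"]
}
```

The resulting `Vec<String>` can be fed straight into the detector's training and detection calls.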
- Raw Continuous Data: Unprocessed sensor readings, audio waveforms, high-frequency financial data
- Extremely Large Alphabets: >1000 unique states without preprocessing
- Real-Time Streaming: Microsecond-latency requirements (though batch processing is efficient)
- Unstructured Data: Images, videos, raw binary data without categorical interpretation
```bash
# Run all tests (162 tests)
cargo test

# Run specific test suites
cargo test unit_        # Unit tests (39 tests)
cargo test integration_ # Integration tests (24 tests)
cargo test domain_      # Domain tests (5 tests)
cargo test performance_ # Performance tests (36 tests)

# Run examples
cargo run --example quick_start
cargo run --example network_security_monitoring
cargo run --example financial_fraud_detection
```
- Complete Documentation - Comprehensive guides and API reference
- API Reference - Online API documentation
- Examples - Production-ready examples with validation
- Changelog - Version history and changes
```toml
[dependencies]
rayon = "1.10.0"    # Parallel batch processing
smallvec = "1.13.0" # Memory-efficient small collections
```
Minimal dependencies for core functionality and memory optimization.
MIT License - see LICENSE file.
Performance Note: The library efficiently handles alphabets up to ~100 unique states with excellent memory usage (typically <100MB). For larger alphabets, consider preprocessing techniques like clustering, dimensionality reduction, or hierarchical categorization.
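One simple alphabet-reduction technique along these lines is to keep only the k most frequent symbols and collapse the long tail into a single catch-all state. The sketch below is generic preprocessing written for this README, not part of anomaly-grid; `reduce_alphabet` and the `"OTHER"` sentinel are illustrative assumptions.

```rust
use std::collections::{HashMap, HashSet};

/// Illustrative vocabulary reduction (not part of anomaly-grid): keep the
/// k most frequent symbols and map the long tail to "OTHER".
fn reduce_alphabet(seq: &[String], k: usize) -> Vec<String> {
    // Count symbol frequencies
    let mut freq: HashMap<&str, usize> = HashMap::new();
    for s in seq {
        *freq.entry(s.as_str()).or_default() += 1;
    }

    // Rank by descending frequency, then alphabetically for determinism
    let mut ranked: Vec<(&str, usize)> = freq.into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));

    let keep: HashSet<&str> = ranked.iter().take(k).map(|(s, _)| *s).collect();
    seq.iter()
        .map(|s| if keep.contains(s.as_str()) { s.clone() } else { "OTHER".to_string() })
        .collect()
}

fn main() {
    let seq: Vec<String> = ["GET", "GET", "POST", "GET", "TRACE", "POST", "PATCH"]
        .iter().map(|s| s.to_string()).collect();
    // Keep the 2 most common symbols; rare ones become "OTHER"
    let reduced = reduce_alphabet(&seq, 2);
    println!("{:?}", reduced); // ["GET", "GET", "POST", "GET", "OTHER", "POST", "OTHER"]
}
```

This keeps the effective alphabet well under the ~100-state sweet spot while preserving the dominant transition structure.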