**Note:** This project was developed with assistance from AI tools.

A Python tool for analyzing log files with ML-enhanced error detection, clustering, and professional reporting.
```bash
# Install dependencies
pip install -r requirements.txt

# Quick start: analyze the sample logs and view the HTML report
python -m src.main --input test_logs/ --detector pattern --output analysis_report.html
python serve_results.py --format html

# Basic analysis
python -m src.main --input logs/ --output results.json

# With similarity error clustering
python -m src.main --input logs/ --detector hybrid --enable-clustering --output analysis_report.html
```
## Key Components
### Detectors (`src/detectors/`)
- **`pattern.py`** - Fast regex/keyword matching (production-ready)
- **`semantic.py`** - NLP-based similarity detection
- **`hybrid.py`** - Combines pattern + semantic + ML features
- **`statistical.py`** - Anomaly detection for durations, frequencies
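As a rough, stdlib-only illustration of the similarity idea behind `semantic.py` (a sketch only — the actual detector presumably uses real NLP techniques rather than `difflib`, and these phrases and function names are illustrative):

```python
from difflib import SequenceMatcher

# Known error phrases, as they might appear under semantic_phrases in patterns.yaml
ERROR_PHRASES = [
    "connection refused by remote host",
    "task failed with non-zero exit code",
]

def semantic_score(line: str, phrases=ERROR_PHRASES) -> float:
    """Return the best fuzzy-match ratio between a log line and known phrases."""
    line = line.lower()
    return max(SequenceMatcher(None, line, p).ratio() for p in phrases)

def detect(line: str, threshold: float = 0.6) -> bool:
    """Flag a line as an error if it is sufficiently similar to a known phrase."""
    return semantic_score(line) >= threshold
```

This is why the semantic detector is slower than plain regex matching: every line is compared against every known phrase instead of being scanned once.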
### Configuration (`config/patterns.yaml`)
```yaml
# Add new error patterns
ansible_patterns:
  your_new_category:
    - "your pattern here"
    - "another.*regex.*pattern"

# Add semantic phrases
semantic_phrases:
  your_category:
    - "natural language error description"

# Exclude false positives
false_positives:
  exclude_patterns:
    - "Success.*completed"  # Won't flag as error
```
### Processors (`src/processors/`)
- **`stream.py`** - File processing + multiprocessing
- **`context.py`** - Context extraction around errors
- **`clusterer.py`** - ML-based error grouping
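To show how the `patterns.yaml` categories fit together, here is a stdlib-only sketch of applying patterns with false-positive exclusion. The dict mirrors the YAML structure (the real tool would parse the file with a YAML library), and `match_line` is a hypothetical helper, not the actual API:

```python
import re

# Mirrors the structure of config/patterns.yaml (normally loaded with a YAML parser)
CONFIG = {
    "ansible_patterns": {
        "your_new_category": ["your pattern here", "another.*regex.*pattern"],
    },
    "false_positives": {
        "exclude_patterns": ["Success.*completed"],
    },
}

def match_line(line: str, config=CONFIG):
    """Return the matching category, or None if the line is clean or excluded."""
    # Exclusions win: a line matching an exclude pattern is never flagged
    for pat in config["false_positives"]["exclude_patterns"]:
        if re.search(pat, line):
            return None
    for category, patterns in config["ansible_patterns"].items():
        for pat in patterns:
            if re.search(pat, line, re.IGNORECASE):
                return category
    return None
```

Checking exclusions before error patterns is one simple way to guarantee that a `false_positives` entry always suppresses a match.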
```bash
# Test against your log files
python -m src.main --input your_logs/ --detector pattern --verbose

# Check what patterns matched
python serve_results.py --format cli --no-context
```

```bash
# Create detector in src/detectors/your_detector.py
# Follow pattern.py structure with detect() method
# Register in src/main.py create_detector()
# Test it
python -m src.main --detector your_detector --input test_logs/
```

```bash
# Pattern detector (fastest)
python -m src.main --detector pattern --input large_logs/ --parallel 8

# Semantic (slowest, most accurate)
python -m src.main --detector semantic --input small_logs/ --parallel 1

# Hybrid (balanced)
python -m src.main --detector hybrid --input logs/ --enable-clustering
```

```
src/
├── detectors/        # Add new detection methods here
├── processors/       # File processing logic
├── models/           # Data structures (DetectionResult, etc.)
└── main.py           # CLI entry point
config/patterns.yaml  # Pattern definitions - edit this frequently
serve_results.py      # Results viewer - multiple output formats
requirements.txt      # Dependencies
test_logs/            # Sample data for testing
```
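Putting the detector-creation steps together, a new detector might be sketched like this. The `detect()` method name comes from the instructions above; the class name, constructor, and result-dict fields are assumptions about the codebase (the real `DetectionResult` may differ):

```python
import re

class TimeoutDetector:
    """Hypothetical detector: flags lines that look like timeouts."""

    def __init__(self, patterns=None):
        # Compile once up front; detect() is called once per line
        self.patterns = [re.compile(p, re.IGNORECASE)
                         for p in (patterns or [r"timed? ?out", r"deadline exceeded"])]

    def detect(self, line: str, line_number: int = 0):
        """Return a result dict for a match, or None (shape loosely mirrors DetectionResult)."""
        for pat in self.patterns:
            m = pat.search(line)
            if m:
                return {
                    "line_number": line_number,
                    "category": "timeout",
                    "matched_text": m.group(0),
                    "confidence": 0.9,  # illustrative fixed confidence
                }
        return None
```

After following `pattern.py`'s structure, the remaining step is registering the class in `create_detector()` in `src/main.py` so `--detector` can select it.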
- Edit `config/patterns.yaml` - add to the appropriate category or create a new one
- Test: `python -m src.main --input test_logs/ --detector pattern`
- Adjust `pattern_weights` in `patterns.yaml`, or modify the confidence threshold: `--confidence-threshold 0.8`
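One way pattern weights and the confidence threshold could interact, as a sketch — the weight values, the max-based scoring, and both function names are assumptions, not the tool's actual formula:

```python
# Hypothetical weights, as they might appear under pattern_weights in patterns.yaml
PATTERN_WEIGHTS = {"fatal": 1.0, "error": 0.8, "warning": 0.4}

def confidence(matched_categories, weights=PATTERN_WEIGHTS):
    """Score a line by its strongest matched category (one simple scoring choice)."""
    return max((weights.get(c, 0.0) for c in matched_categories), default=0.0)

def is_error(matched_categories, threshold=0.8):
    """Apply the same cutoff as --confidence-threshold 0.8."""
    return confidence(matched_categories) >= threshold
```

Under this scheme, raising the threshold from 0.8 to 0.9 would drop lines that only matched `error`-weight patterns.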
```bash
# Verbose output shows what patterns matched
python -m src.main --input problem_log.log --verbose --show-details

# Check clustering results (after running with --enable-clustering)
python show_clustering_results.py
```

- Add patterns to `config/patterns.yaml`
- Test with sample files
- Adjust context windows if needed: `--context-before 10 --context-after 20`
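The context-window flags can be pictured with a small slicing sketch (a hypothetical helper, not the actual `context.py` API):

```python
def extract_context(lines, error_index, before=10, after=20):
    """Return the lines surrounding an error, clamped to the file bounds."""
    start = max(0, error_index - before)          # don't run past the start of the file
    end = min(len(lines), error_index + after + 1)  # +1 so the error line itself is included
    return lines[start:end]
```

Clamping matters for errors near the top or bottom of a file: asking for 10 lines of context before line 2 should just return what exists.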
Results include clustering info, retry grouping, and workflow analysis. Key fields:

- `cluster_id` - groups similar errors together
- `retry_count` - grouped retry attempts
- `match_details` - what patterns/phrases caused detection

Use `serve_results.py` for easy result browsing - it handles the clustering visualization automatically.
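A minimal illustration of `cluster_id`-style grouping: normalize away the volatile parts of a message (here, just digit runs) so retries and repeats collapse into one cluster. This is a sketch of the idea, not `clusterer.py`'s ML-based approach:

```python
import re
from collections import defaultdict

def normalize(message: str) -> str:
    """Replace digit runs with a placeholder so similar errors share a key."""
    return re.sub(r"\d+", "<N>", message.lower()).strip()

def cluster_errors(messages):
    """Group messages by normalized form; returns {cluster_key: [messages]}."""
    clusters = defaultdict(list)
    for msg in messages:
        clusters[normalize(msg)].append(msg)
    return dict(clusters)
```

With this scheme, "Timeout after 30s" and "Timeout after 45s" land in the same cluster, which is the behavior the `retry_count` field summarizes.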
- Slow semantic processing: use `--detector pattern` or `--parallel 1`
- Memory issues: process smaller batches, reduce parallel workers
- Missing patterns: check `config/patterns.yaml`, add verbose logging
- False positives: add exclusions to the `false_positives` section