An intelligent log classification system that uses a hybrid AI approach (BERT + LLM) to categorize log messages into actionable categories for better operational insights and automated SOC (Security Operations Center) workflows.
Traditional log monitoring approaches rely on basic keyword matching and log levels, missing critical operational patterns. This system provides intelligent classification into meaningful categories like Security Alerts, Resource Usage, and Workflow Errors, enabling proactive incident response and system monitoring.
- Hybrid AI Classification: Combines BERT and LLM models to balance accuracy, latency, and cost
- Actionable Categories: Security Alert, Resource Usage, Workflow Error classifications
- Real-time Processing: Low-latency classification with confidence scoring and severity assessment
- PostgreSQL Integration: Persistent storage with analytics and trend analysis
- Web Interface: Streamlit-based dashboard with analytics and monitoring
- JIRA & Slack Integration: Automated incident creation and notifications
- Performance Analytics: Real-time metrics and system monitoring
The system uses a hybrid classification pipeline, sketched in code after the diagram:

```
Log Input → Regex Filter → BERT Classification → LLM Fallback → Database Storage
                                                                       ↓
Analytics Dashboard ← PostgreSQL ← Confidence & Severity Scoring
```
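To make the routing concrete, here is a minimal sketch of the pipeline's control flow. The helper names (`bert_classify`, `llm_classify`), the regex rules, and the 0.6 confidence threshold are all assumptions for illustration; the real processors live under `src/processors/` and their interfaces may differ.

```python
import re
from typing import Tuple

BERT_CONFIDENCE_THRESHOLD = 0.6  # illustrative cutoff, not the project's setting

# Regex filter: cheap rules that label obvious messages before any model runs.
REGEX_RULES = [
    (re.compile(r"multiple login failures", re.I), "Security Alert"),
    (re.compile(r"memory usage exceeded", re.I), "Resource Usage"),
]

def bert_classify(message: str) -> Tuple[str, float]:
    """Stub standing in for the local BERT model; returns (label, confidence)."""
    return "Workflow Error", 0.55

def llm_classify(message: str) -> Tuple[str, float]:
    """Stub standing in for the GROQ-hosted LLM fallback."""
    return "Workflow Error", 0.92

def classify(message: str) -> Tuple[str, float]:
    """Regex filter first, then BERT; fall back to the LLM when BERT is unsure."""
    for pattern, label in REGEX_RULES:
        if pattern.search(message):
            return label, 1.0  # rule hits are treated as fully confident
    label, confidence = bert_classify(message)
    if confidence < BERT_CONFIDENCE_THRESHOLD:
        label, confidence = llm_classify(message)
    return label, confidence

print(classify("Escalation failed for ticket 4521"))  # ('Workflow Error', 0.92)
```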
- Security Alert: Multiple login failures, abnormal system behavior, security breaches
- Resource Usage: Memory/CPU exceeded, resource exhaustion, performance issues
- Workflow Error: Escalation failures, task assignment errors, process breakdowns (illustrative example messages for all three categories follow below)
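The messages below are invented examples of what each category is meant to capture; they are not rows from the project's datasets.

```python
# Invented example messages per category; handy as smoke-test input for the
# classifier, but not taken from the repository's sample data.
EXAMPLES = {
    "Security Alert": [
        "5 failed login attempts for user admin from 203.0.113.7",
        "Unauthorized access attempt detected on /etc/shadow",
    ],
    "Resource Usage": [
        "Memory usage exceeded 95% on worker-3",
        "CPU at 100% for 10 minutes on api-server-1",
    ],
    "Workflow Error": [
        "Escalation failed: no on-call engineer assigned",
        "Task assignment error: queue 'triage' not found",
    ],
}

for category, messages in EXAMPLES.items():
    for message in messages:
        print(category, "<-", message)
```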
- Python 3.8+
- PostgreSQL 12+
- GROQ API key for LLM classification
1. Clone the repository

   ```bash
   git clone https://github.com/sohamvsonar/Intelligent-SOC-Log-Classification-System.git
   cd Log-Classification-System
   ```

2. Set up PostgreSQL database

   ```bash
   # Create database
   sudo -u postgres psql
   CREATE DATABASE log_classification;
   CREATE USER log_user WITH PASSWORD 'secure_password';
   GRANT ALL PRIVILEGES ON DATABASE log_classification TO log_user;
   \q
   ```

3. Configure environment

   ```bash
   cp .env.example .env
   # Edit .env with your database credentials and GROQ API key
   ```

4. Install dependencies

   ```bash
   cd src
   pip install -r requirements.txt
   ```

5. Initialize database

   ```bash
   python init_database.py
   ```
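   The repository ships its own `init_database.py`; as a rough idea of what such a script does, here is a minimal sketch that assumes the ORM models are SQLAlchemy declarative classes sharing one `Base` (the import path is a guess based on the project layout):

   ```python
   # Sketch only: the Base import path is an assumption, not the repo's actual API.
   import os

   from sqlalchemy import create_engine
   from database.models import Base  # assumed location of the declarative Base

   def init_database() -> None:
       """Create every table declared on the ORM Base, if it does not exist yet."""
       engine = create_engine(os.environ["DATABASE_URL"])
       Base.metadata.create_all(engine)

   if __name__ == "__main__":
       init_database()
   ```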
6. Launch application

   ```bash
   streamlit run app.py
   ```
Access the application at http://localhost:8501
The Streamlit interface provides multiple pages:
- Log Classification: Upload CSV files for batch processing
- Analytics Dashboard: Real-time classification trends and metrics
- Log History: Browse and filter historical log data
- Single Log Test: Test individual log messages (a programmatic equivalent is sketched below)
- System Status: Database health and performance monitoring
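Outside the UI, the same classification can presumably be invoked programmatically. The sketch below guesses at the interface of `src/processors/enhanced_processor.py`, so the class and method names are assumptions, not the confirmed API.

```python
# Hypothetical programmatic equivalent of the "Single Log Test" page; the
# EnhancedProcessor name comes from the file layout, its interface is a guess.
from processors.enhanced_processor import EnhancedProcessor

processor = EnhancedProcessor()
result = processor.classify("3 failed login attempts for user root")
print(result)  # expected shape: a label plus confidence and severity scores
```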
```
# Database Configuration
DATABASE_URL=postgresql://log_user:password@localhost:5432/log_classification

# API Keys
GROQ_API_KEY=your_groq_api_key_here

# Optional: JIRA Integration
JIRA_SERVER=https://your-domain.atlassian.net
JIRA_EMAIL=your-email@example.com
JIRA_API_TOKEN=your_api_token

# Optional: Slack Integration
SLACK_BOT_TOKEN=xoxb-your-bot-token
SLACK_CHANNEL=#security-alerts
```
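As one way this configuration might be consumed, the sketch below loads the `.env` file, posts a Slack alert, and opens a JIRA issue. It assumes the `python-dotenv`, `slack_sdk`, and `jira` packages plus a hypothetical `SOC` project key; the project's real integration code lives under `src/integrations/` and may differ.

```python
import os

from dotenv import load_dotenv   # pip install python-dotenv
from jira import JIRA            # pip install jira
from slack_sdk import WebClient  # pip install slack-sdk

load_dotenv()  # reads DATABASE_URL, GROQ_API_KEY, JIRA_*, SLACK_* from .env

def notify_slack(text: str) -> None:
    """Post an alert to the channel configured in SLACK_CHANNEL."""
    client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])
    client.chat_postMessage(
        channel=os.environ.get("SLACK_CHANNEL", "#security-alerts"),
        text=text,
    )

def open_jira_incident(summary: str, description: str):
    """Create a JIRA issue; the 'SOC' project key and issue type are assumptions."""
    client = JIRA(
        server=os.environ["JIRA_SERVER"],
        basic_auth=(os.environ["JIRA_EMAIL"], os.environ["JIRA_API_TOKEN"]),
    )
    return client.create_issue(
        project="SOC",
        summary=summary,
        description=description,
        issuetype={"name": "Task"},
    )

notify_slack("Security Alert: multiple failed logins on api-server-1")
```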
- BERT Classification: ~100ms per log
- LLM Classification: ~2s per log
- Database Storage: ~50ms per log entry
- Batch Processing: 1000+ logs/minute
- Classification Confidence: >85% on average
```
Log-Classification-System/
├── src/
│   ├── app.py                      # Streamlit web interface
│   ├── processors/                 # Classification processors
│   │   ├── enhanced_processor.py
│   │   └── high_performance_processor.py
│   ├── database/                   # Database services
│   │   ├── connection.py
│   │   ├── models.py
│   │   └── service.py
│   ├── integrations/               # External integrations
│   │   ├── jira/
│   │   └── slack/
│   └── requirements.txt
├── docs/                           # Documentation
├── resources/                      # Sample data and datasets
├── models/                         # Trained models
└── training/                       # Training notebooks and data
```
- Database credentials stored in environment variables
- API keys properly secured and not committed to repository
- Input validation for all log processing endpoints
- Sanitized database queries to prevent SQL injection (see the parameterized-query sketch below)
- Rate limiting for API endpoints
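To make the SQL injection point concrete, here is a minimal parameterized-query sketch using psycopg2; the `log_entries` table and its columns are hypothetical, not the project's actual schema.

```python
import os

import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect(os.environ["DATABASE_URL"])
with conn, conn.cursor() as cur:
    # The %s placeholder lets the driver escape the value itself; never build
    # the SQL string by interpolating untrusted log text directly.
    cur.execute(
        "SELECT category, confidence FROM log_entries WHERE message = %s",
        ("Failed login attempt for user admin",),
    )
    rows = cur.fetchall()
conn.close()
```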
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
For issues and questions:
- Check the troubleshooting guide
- Review existing documentation in /docs
- Create an issue with detailed error information