
Log Classification System

An intelligent log classification system that uses a hybrid AI approach (BERT + LLM) to sort log messages into actionable categories for better operational insight and automated SOC (Security Operations Center) workflows.

🎯 Overview

Traditional log monitoring approaches rely on basic keyword matching and log levels, missing critical operational patterns. This system provides intelligent classification into meaningful categories like Security Alerts, Resource Usage, and Workflow Errors, enabling proactive incident response and system monitoring.

Key Features

  • Hybrid AI Classification: Combines BERT and LLM models for optimal performance and cost
  • Actionable Categories: Security Alert, Resource Usage, and Workflow Error
  • Real-time Processing: Fast processing with confidence scoring and severity assessment
  • PostgreSQL Integration: Persistent storage with analytics and trend analysis
  • Web Interface: Streamlit-based dashboard with analytics and monitoring
  • JIRA & Slack Integration: Automated incident creation and notifications (see the JIRA sketch after this list)
  • Performance Analytics: Real-time metrics and system monitoring
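
As an example of the incident-creation path, here is a minimal sketch, assuming the `jira` Python package and the JIRA_* environment variables listed under Configuration; the variable name JIRA_EMAIL, the project key, and the field values are illustrative assumptions, not the project's confirmed setup:

    # Create a JIRA incident for a classified Security Alert.
    # Assumes the `jira` package; project key and field values are illustrative.
    import os
    from jira import JIRA

    jira = JIRA(
        server=os.environ["JIRA_SERVER"],
        basic_auth=(os.environ["JIRA_EMAIL"], os.environ["JIRA_API_TOKEN"]),
    )
    issue = jira.create_issue(
        project="SOC",  # illustrative project key
        summary="Security Alert: multiple login failures on host web-1",
        description="Auto-created by the log classification pipeline.",
        issuetype={"name": "Task"},
    )
    print(issue.key)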

🏗️ Architecture

The system uses a hybrid classification pipeline:

Log Input → Regex Filter → BERT Classification → LLM Fallback → Database Storage
                                                                       ↓
                      Analytics Dashboard ← PostgreSQL ← Confidence & Severity Scoring
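
The pipeline is a fall-through chain: each stage only runs when the previous one cannot decide. Below is a minimal sketch of that control flow; the function names, regex rules, and 0.5 threshold are illustrative assumptions, not the repository's actual API.

    # Minimal sketch of the hybrid fall-through pipeline (names are assumed).
    import re

    CONFIDENCE_THRESHOLD = 0.5  # below this, the BERT result defers to the LLM

    # Stage 1 rules: a cheap regex pass catches obvious patterns for free.
    REGEX_RULES = {
        r"failed login|authentication failure": "Security Alert",
        r"memory usage exceeded|cpu usage exceeded": "Resource Usage",
        r"escalation failed|task assignment error": "Workflow Error",
    }

    def bert_classify(message: str) -> tuple[str, float]:
        # Stand-in for the fine-tuned BERT model (fast, handles most traffic).
        return "Workflow Error", 0.42

    def llm_classify(message: str) -> tuple[str, float]:
        # Stand-in for the Groq-hosted LLM fallback (slow, for hard cases).
        return "Security Alert", 0.90

    def classify(message: str) -> tuple[str, float]:
        """Return (category, confidence) for a single log line."""
        for pattern, category in REGEX_RULES.items():
            if re.search(pattern, message, re.IGNORECASE):
                return category, 1.0      # regex hit: exit at zero model cost
        category, confidence = bert_classify(message)
        if confidence >= CONFIDENCE_THRESHOLD:
            return category, confidence   # BERT is confident enough
        return llm_classify(message)      # fall back to the LLM

    print(classify("sshd: authentication failure for user root"))
    # ('Security Alert', 1.0)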

Classification Categories

  1. Security Alert: Multiple login failures, abnormal system behavior, security breaches
  2. Resource Usage: Memory/CPU exceeded, resource exhaustion, performance issues
  3. Workflow Error: Escalation failures, task assignment errors, process breakdowns

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • PostgreSQL 12+
  • Groq API key for LLM classification

Installation

  1. Clone the repository

    git clone https://github.com/sohamvsonar/Intelligent-SOC-Log-Classification-System.git
    cd Intelligent-SOC-Log-Classification-System
  2. Set up PostgreSQL database

    # Create database
    sudo -u postgres psql
    CREATE DATABASE log_classification;
    CREATE USER log_user WITH PASSWORD 'secure_password';
    GRANT ALL PRIVILEGES ON DATABASE log_classification TO log_user;
    \q
  3. Configure environment

    cp .env.example .env
    # Edit .env with your database credentials and GROQ API key
  4. Install dependencies

    cd src
    pip install -r requirements.txt
  5. Initialize database

    python init_database.py
  6. Launch application

    streamlit run app.py

Access the application at http://localhost:8501
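
Before launching the full app, the Groq key can be smoke-tested with a short script. This sketch assumes the `groq` Python package; the model name is an assumption, so substitute whichever model the project configures.

    # Quick check that GROQ_API_KEY is valid (model name is an assumption).
    import os
    from groq import Groq

    client = Groq(api_key=os.environ["GROQ_API_KEY"])
    reply = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=[{"role": "user", "content": "Reply with OK"}],
    )
    print(reply.choices[0].message.content)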

📊 Usage

Web Interface

The Streamlit interface provides multiple pages:

  • Log Classification: Upload CSV files for batch processing
  • Analytics Dashboard: Real-time classification trends and metrics
  • Log History: Browse and filter historical log data
  • Single Log Test: Test individual log messages (see the sketch after this list)
  • System Status: Database health and performance monitoring
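
Outside the UI, a single message can be classified the same way the Single Log Test page does. The import path, class, and method below are guesses from the project layout, not a confirmed API; check src/processors/enhanced_processor.py for the real names.

    # Classify one message from a Python shell (run from the src/ directory).
    # Class and method names are assumptions inferred from the project layout.
    from processors.enhanced_processor import EnhancedProcessor  # hypothetical

    processor = EnhancedProcessor()
    result = processor.classify("CPU usage exceeded 95% on worker-3")
    print(result)  # expected: category plus confidence and severity fields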

🔧 Configuration

Environment Variables

# Database Configuration
DATABASE_URL=postgresql://log_user:password@localhost:5432/log_classification

# API Keys
GROQ_API_KEY=your_groq_api_key_here

# Optional: JIRA Integration
JIRA_SERVER=https://your-domain.atlassian.net
[email protected]
JIRA_API_TOKEN=your_api_token

# Optional: Slack Integration  
SLACK_BOT_TOKEN=xoxb-your-bot-token
SLACK_CHANNEL=#security-alerts
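
These values can be loaded into the process environment with python-dotenv before any integration is used. A minimal sketch, assuming the `python-dotenv` and `slack_sdk` packages; the message text is illustrative:

    # Load .env and post a test alert to the configured Slack channel.
    # Assumes python-dotenv and slack_sdk; message text is illustrative.
    import os
    from dotenv import load_dotenv
    from slack_sdk import WebClient

    load_dotenv()  # reads DATABASE_URL, GROQ_API_KEY, SLACK_BOT_TOKEN, ...

    slack = WebClient(token=os.environ["SLACK_BOT_TOKEN"])
    slack.chat_postMessage(
        channel=os.environ.get("SLACK_CHANNEL", "#security-alerts"),
        text="Test: log classification alerting is online",
    )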

📈 Performance Benchmarks

  • BERT Classification: ~100ms per log
  • LLM Classification: ~2s per log
  • Database Storage: ~50ms per log entry
  • Batch Processing: 1000+ logs/minute (see the timing sketch after this list)
  • Average classification confidence: >85%
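
These figures depend on hardware and model choice. Per-log latency and throughput can be re-measured on your own data with a loop like the one below, assuming a `classify(message)` function such as the sketch in the Architecture section:

    # Rough latency/throughput measurement (assumes classify() from above).
    import time

    samples = [
        "authentication failure for user root",
        "memory usage exceeded threshold on node-2",
        "escalation failed for ticket INC-1042",
    ] * 100  # 300 calls for a more stable average

    start = time.perf_counter()
    for message in samples:
        classify(message)
    elapsed = time.perf_counter() - start

    print(f"{elapsed / len(samples) * 1000:.1f} ms/log, "
          f"{len(samples) / elapsed * 60:.0f} logs/minute")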


📁 Project Structure

Log-Classification-System/
├── src/
│   ├── app.py                    # Streamlit web interface
│   ├── processors/               # Classification processors
│   │   ├── enhanced_processor.py
│   │   └── high_performance_processor.py
│   ├── database/                 # Database services
│   │   ├── connection.py
│   │   ├── models.py
│   │   └── service.py
│   ├── integrations/             # External integrations
│   │   ├── jira/
│   │   └── slack/
│   └── requirements.txt
├── docs/                         # Documentation
├── resources/                    # Sample data and datasets
├── models/                       # Trained models
└── training/                     # Training notebooks and data

🛡️ Security Considerations

  • Database credentials stored in environment variables
  • API keys properly secured and never committed to the repository
  • Input validation for all log processing endpoints
  • Sanitized, parameterized database queries to prevent SQL injection (see the sketch after this list)
  • Rate limiting for API endpoints
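
To illustrate the parameterized-query point above: user-controlled text must be bound as a query parameter, never interpolated into the SQL string. A minimal sketch using SQLAlchemy (an assumption; the table and column names are illustrative, not the project's actual schema):

    # Bound parameters keep hostile input from altering the SQL statement.
    # Assumes SQLAlchemy; table and column names are illustrative.
    import os
    from sqlalchemy import create_engine, text

    engine = create_engine(os.environ["DATABASE_URL"])

    def logs_by_category(category: str):
        with engine.connect() as conn:
            rows = conn.execute(
                text("SELECT message, confidence FROM logs WHERE category = :cat"),
                {"cat": category},  # bound value, safe against injection
            )
            return rows.fetchall()

    # Safe even with hostile input:
    logs_by_category("Security Alert'; DROP TABLE logs; --")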

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Submit a pull request

🆘 Support

For issues and questions:

  1. Check the troubleshooting guide
  2. Review existing documentation in /docs
  3. Create an issue with detailed error information
