
SOCBot Technical Documentation

AI-Powered IOC Classification System for Security Operations


1. Technology Stack

1.1 Frontend Stack

Location: Frontend/ directory (index.html, script.js, style.css)

Technologies:

  • Vanilla JavaScript with Fetch API for HTTP requests
  • HTML5 for semantic markup
  • CSS3 for styling and responsive design
  • No build tools required; runs directly in the browser (no Node.js/Vite at runtime)

Design & Features:

  • WhatsApp-style chat interface with left-aligned user messages and right-aligned bot messages
  • Top banner: Displays "Welcome to SOCbot. Your AI agent for identification of Malicious Changes to Secure your Infra."
  • Scrollable message container for conversation history
  • Right-side static instruction panel showing IOC selection menu (1=Domain, 2=URL, 3=IP, 4=RegKey)
  • Bottom input bar with text field and send button
  • Session persistence via session_id maintained across conversation lifecycle

Client-Side Validation Rules:

  • IOC Selection: Accepts comma/semicolon-separated numeric values (1,2,3,4)
  • Domain validation: Must match example.com format (no scheme or path)
  • URL validation: Must parse as http(s)://... with valid scheme
  • IP validation: IPv4 dotted-quad format with octets 0-255
  • Registry key validation: Must start with HKEY_LOCAL_MACHINE\, HKEY_CURRENT_USER\, HKEY_CLASSES_ROOT\, HKEY_USERS\, or HKEY_CURRENT_CONFIG\
  • Y/N phases: Accepts only Y/Yes or N/No (case-insensitive)
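
The rules above live in the frontend JavaScript, but a hedged server-side sketch in Python conveys the same checks. The function names (`is_valid_domain`, etc.) are illustrative, not from the codebase:

```python
import re
import ipaddress
from urllib.parse import urlparse

# Hedged Python equivalents of the client-side validation rules; the real
# checks are implemented in script.js and may differ in detail.
DOMAIN_RE = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)(\.[A-Za-z0-9-]{1,63})+$")
HIVES = ("HKEY_LOCAL_MACHINE\\", "HKEY_CURRENT_USER\\", "HKEY_CLASSES_ROOT\\",
         "HKEY_USERS\\", "HKEY_CURRENT_CONFIG\\")

def is_valid_domain(s: str) -> bool:
    return bool(DOMAIN_RE.match(s))          # bare host, no scheme or path

def is_valid_url(s: str) -> bool:
    p = urlparse(s)
    return p.scheme in ("http", "https") and bool(p.netloc)

def is_valid_ip(s: str) -> bool:
    try:
        ipaddress.IPv4Address(s)             # dotted quad, octets 0-255
        return True
    except ValueError:
        return False

def is_valid_regkey(s: str) -> bool:
    return s.upper().startswith(HIVES)       # must start with a known hive
```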

1.2 Flask Backend

Location: app.py

Technologies:

  • Flask 2.x - Python web framework
  • Flask-CORS - Cross-Origin Resource Sharing support
  • Python 3.x - Backend runtime

Purpose & Responsibilities:

  1. Static file server: Serves frontend assets from Frontend/ directory
  2. Session management: Maintains in-memory conversation state per session_id
  3. Conversation orchestration: Implements finite state machine (FSM) for multi-phase dialogue
  4. Model delegation: Calls model_tester.py for ML predictions
  5. Result aggregation: Computes per-entry, per-IOC cumulative, and total-sample verdicts

API Endpoints:

  • GET / - Serves index.html
  • GET /<path> - Serves static assets (CSS, JS, images)
  • POST /api/send_message - Main chat endpoint accepting user messages and returning bot responses

Backend-Frontend Connection:

Frontend sends JSON payload:

{ "message": "1,2", "session_id": "uuid-string" }

Backend responds with:

{ "bot_message": "Select IOC Types...", "session_id": "uuid-string", "state_phase": "choose_types", "ready": false, "final_report": null }
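
The request/response contract above can be exercised from any HTTP client. A minimal Python sketch of the payload construction (the `make_payload` helper and base URL are illustrative; only the field names come from the documented contract):

```python
import json
import uuid
from typing import Optional

# Hedged sketch of the /api/send_message contract; field names mirror the
# documented payloads, the helper itself is illustrative.
def make_payload(message: str, session_id: Optional[str] = None) -> dict:
    return {"message": message, "session_id": session_id or str(uuid.uuid4())}

payload = make_payload("1,2")
body = json.dumps(payload)

# A real client would POST this, e.g. with the requests library:
#   requests.post("http://127.0.0.1:5000/api/send_message", json=payload)

# Expected response shape from the backend:
reply = {"bot_message": "Select IOC Types...", "session_id": payload["session_id"],
         "state_phase": "choose_types", "ready": False, "final_report": None}
```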

1.3 Machine Learning Pipeline

Model Storage: models/ or Models/ directory containing .joblib serialized pipelines

Supported IOC Types:

  • domain.joblib - Domain name classifier (Naive Bayes)
  • url.joblib - URL classifier (Logistic Regression)
  • ip.joblib - IP address classifier (Naive Bayes)
  • regkey.joblib - Windows Registry key classifier (Naive Bayes)

1.3.1 Evolution from Original Design

Original Approach (Rejected - 79% Accuracy):

  • Single unified dataset with columns: sha256, ioc_type, ioc_value, label
  • One-hot encoding for IOC type combined with TF-IDF features
  • Single multi-class model attempting to classify all IOC types simultaneously
  • Critical Failures:
    • 25% false negative rate (malicious samples flagged as benign)
    • 15% false positive rate (benign samples flagged as malicious)
    • No contextual understanding of IOC role in attack lifecycle

Why the Original Approach Failed: Treating all IOCs uniformly ignored the fundamental differences in their operational context within the Cyber Kill Chain (CKC). A SOC analyst consultation revealed that each IOC type serves a distinct purpose and requires tailored sensitivity thresholds.


1.3.2 Current SOC-Aligned Design Philosophy

Design Principle: Each IOC type is trained independently based on its role in the Cyber Kill Chain, with algorithm and threshold tuning matching operational requirements.

SOC Head-Defined Classification Strategy:

| IOC Type | CKC Phase | Operational Role | Detection Strategy | Algorithm Choice |
|---|---|---|---|---|
| Domain | Reconnaissance / C2 Communication | Linked to IPs; determines where system connects | Conservative: avoid false positives that block legitimate workflows | Naive Bayes (balanced priors) |
| IP | C2 Communication | Direct network connections; tied to domains | Conservative: same as domains; must not disrupt critical services | Naive Bayes (balanced priors) |
| URL | Delivery / Exploitation | Phishing vectors, payload delivery mechanisms | Balanced: equal weight to false positives/negatives; primary attack vector | Logistic Regression (class-balanced) |
| RegKey | Installation / Actions on Objectives | Offline persistence changes; high-impact modifications | Aggressive: can trigger system shutdown/forensic isolation; zero tolerance for FNs | Naive Bayes (strict threshold) |

Rationale from SOC Operations:

  • Domains/IPs: Conservative filtering prevents blocking legitimate infrastructure that may share IP space or use CDNs. False positives here disrupt business operations.
  • URLs: Balanced approach because URLs are the primary delivery mechanism for phishing and drive-by downloads. Requires equal sensitivity to both error types.
  • Registry Keys: Aggressive detection because unauthorized registry modifications indicate persistence mechanisms or system compromise. These are forensic indicators requiring immediate response, even at the cost of false positives.

1.3.3 Revised Dataset Architecture

Per-IOC Training Sets:

  • domain.csv, ip.csv, url.csv, regkey.csv
  • Columns: sha256 (sample identifier), value (IOC string), label (0=Benign, 1=Malicious)
  • Benefits:
    • Independent feature spaces optimized per IOC type
    • Algorithm selection tailored to detection requirements
    • Eliminates cross-contamination from unrelated IOC types

1.3.4 Pipeline Architecture by IOC Type

Domain Model (Naive Bayes - Conservative):

  • TF-IDF: Character-level n-grams (3-5), max 6000 features
  • Custom Features: Length, digits, special chars, entropy, suspicious keywords (exe, cmd, powershell, run, c2, dll, temp, appdata)
  • Classifier: MultinomialNB (alpha=0.3)
  • Output Format: (tfidf, feature_extractor, classifier) tuple

IP Model (Naive Bayes - Conservative):

  • TF-IDF: Character-level n-grams (1-3), max 6000 features
  • Custom Features: Same 5-feature set as Domain
  • Classifier: MultinomialNB (alpha=0.3)
  • Output Format: (tfidf, feature_extractor, classifier) tuple

URL Model (Logistic Regression - Balanced):

  • TF-IDF Only: Character-level n-grams (3-6), max 6000 features
  • Classifier: LogisticRegression (C=3.0, class_weight='balanced', solver='liblinear', max_iter=2000)
  • Output Format: Scikit-learn Pipeline object
  • Note: No custom features; relies purely on character patterns due to URL structural complexity

RegKey Model (Naive Bayes - Aggressive):

  • TF-IDF: Character-level n-grams (3-6), max 6000 features
  • Custom Features (RegKey-Specific): Length, backslash count (path depth), digits, entropy, suspicious keywords (run, startup, powershell, cmd, exe, dll)
  • Classifier: MultinomialNB (alpha=0.25) + MinMaxScaler for numeric features
  • Output Format: Scikit-learn Pipeline with FeatureUnion

1.3.5 Feature Engineering Details

Shannon Entropy Calculation: H(X) = -Σ p(x) · log₂(p(x)). Measures randomness; higher entropy indicates obfuscation or encoding (common in malicious IOCs).
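
A worked example of this feature, matching the `_entropy` helper shown later in section 2.4.5:

```python
import math

# Shannon entropy over character frequencies: H(X) = -sum(p(x) * log2(p(x)))
def shannon_entropy(s: str) -> float:
    if not s:
        return 0.0
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four equiprobable characters carry 2 bits; a constant string carries 0 bits.
# Obfuscated or DGA-style IOC strings score noticeably higher than plain words.
```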

Suspicious Keyword Detection:

  • Domain/IP: Flags command execution indicators (exe, cmd, powershell, dll, temp, appdata)
  • RegKey: Flags persistence locations (run, startup) and execution paths (powershell, cmd, exe, dll)

RegKey Path Depth:

  • Counts backslashes (\) to measure registry hierarchy depth
  • Deeper paths often correlate with hidden persistence mechanisms

1.3.6 Model Performance Metrics

| Model | Accuracy | Precision (Mal) | Recall (Mal) | F1-Score (Mal) | Training Strategy |
|---|---|---|---|---|---|
| Domain | 88.10% | 100.00% | 37.50% | 54.55% | High precision, low recall (conservative) |
| IP | 85.42% | 100.00% | 30.00% | 46.15% | High precision, low recall (conservative) |
| URL | 82.14% | 80.00% | 72.73% | 76.19% | Balanced precision/recall |
| RegKey | 83.17% | 81.03% | 88.68% | 84.68% | High recall, acceptable precision (aggressive) |

Interpretation:

  • Domain/IP: Perfect precision (zero false positives) at the cost of recall - aligns with conservative operational requirement
  • URL: Balanced metrics - appropriate for primary attack vector
  • RegKey: High recall prioritizes catching all malicious changes - aligns with aggressive forensic requirement

1.3.7 Training Configuration Summary

Common Hyperparameters:

  • Test split: 20% (stratified)
  • Random state: 42 (reproducibility)
  • TF-IDF: min_df=2, max_df=0.9 (filter rare/common terms)

Algorithm-Specific Settings:

  • Naive Bayes: Laplace smoothing (alpha=0.25-0.3) to handle unseen n-grams
  • Logistic Regression: L2 regularization (C=3.0), balanced class weights to handle dataset imbalance
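
The shared configuration above can be sketched as follows; the helper names (`make_vectorizer`, `split`) are illustrative, while the hyperparameters are the documented ones:

```python
# Hedged sketch of the common training configuration described above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

def make_vectorizer(ngram_range=(3, 5)):
    # min_df/max_df filter n-grams that are too rare or too common;
    # the n-gram range varies per IOC type ((3, 5) shown here).
    return TfidfVectorizer(analyzer="char", ngram_range=ngram_range,
                           min_df=2, max_df=0.9, max_features=6000)

def split(X, y):
    # 80/20 stratified split with a fixed seed for reproducibility
    return train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
```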

2. Code Explanation

2.1 app.py

Purpose: Central Flask application managing conversation flow and serving frontend.

Key Components:

Global Configuration:

    IOC_MAP = {"1": "domain", "2": "url", "3": "ip", "4": "regkey"}
    THRESHOLDS = [(0, 15, "Benign"), (15, 30, "Possibly Benign"),
                  (30, 65, "Possibly Malicious"), (65, 101, "Malicious")]
    sessions: Dict[str, Dict] = {}  # In-memory session store

Session State Structure:

    {
        "phase": str,              # Current FSM phase
        "selected": List[str],     # Selected IOC types
        "current_ioc_index": int,  # Index for iterating IOC collection
        "entries": Dict,           # Collected IOC entries per type
        "per_ioc_same": Dict,      # Per-IOC sample relationship (Y/N)
        "global_same": bool,       # Global sample relationship
        "final_report": Dict       # Classification results
    }

Finite State Machine Phases:

  1. choose_types - User selects IOC types (1,2,3,4)
  2. collect_entries - User submits individual IOC strings, types DONE to proceed
  3. per_ioc_sample - Bot asks sample relationship for each IOC type
  4. global_sample - Bot asks if all IOCs belong to one sample (only if all per-IOC = YES)
  5. final - Results computed and displayed

Key Functions:

  • available_model_types() - Scans models/ directory for available .joblib files
  • verdict_from_percent(p) - Maps numeric score to threshold-based verdict
  • ensure_session(sid) - Creates or retrieves session state
  • next_prompt(state) - Generates context-appropriate bot message
  • state_machine(state, user_text) - Processes user input and advances FSM
  • finalize(state) - Delegates to classify_entries(), computes cumulative scores, generates final report

Scoring Logic in finalize():

    # Per-entry scoring via model_tester
    per_entry = classify_entries(state["entries"])

    # Per-IOC cumulative (only if per_ioc_same[ioc] == True)
    per_ioc_cum[ioc] = avg(scores) if per_ioc_same[ioc] else None

    # Total sample average (only if global_same and all IOC cumulatives exist)
    total_sample = avg(per_ioc_cum.values()) if conditions_met else None
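
The aggregation rules in finalize() can be sketched as a self-contained function (the name `aggregate` and its exact signature are illustrative):

```python
from typing import Dict, List, Optional, Tuple

# Hedged sketch of finalize()'s aggregation rules; names are illustrative.
def aggregate(per_entry: Dict[str, List[dict]],
              per_ioc_same: Dict[str, bool],
              global_same: bool) -> Tuple[Dict[str, Optional[float]], Optional[float]]:
    # Per-IOC cumulative: average of entry scores, only when the user said
    # all entries for that IOC type belong to the same sample.
    per_ioc_cum: Dict[str, Optional[float]] = {}
    for ioc, rows in per_entry.items():
        scores = [r["score"] for r in rows if r["score"] is not None]
        per_ioc_cum[ioc] = (sum(scores) / len(scores)
                            if (per_ioc_same.get(ioc) and scores) else None)

    # Total sample average: only when all IOCs share one sample and every
    # per-IOC cumulative exists.
    cums = list(per_ioc_cum.values())
    total = (sum(cums) / len(cums)
             if (global_same and cums and all(c is not None for c in cums)) else None)
    return per_ioc_cum, total
```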

2.2 history.py

Purpose: Persistent storage for session records using JSON Lines format.

Storage Format:

  • File: history.jsonl in project root
  • Format: Newline-delimited JSON (one JSON object per line)
  • Max Records: 100 (auto-pruning via prune_history())

Key Functions:

append_session(record: Dict)

  • Appends new session record to history.jsonl
  • Triggers automatic pruning if record count exceeds MAX_RECORDS

load_history() -> List[Dict]

  • Reads all records from file
  • Returns list of dictionaries
  • Skips malformed JSON lines gracefully

prune_history()

  • Keeps only the most recent 100 records
  • Overwrites file with trimmed dataset

Use Case: Enables future analytics, session replay, and model retraining from production data. Currently imported but not actively called in app.py (ready for integration).


2.3 model_tester.py

Purpose: Model loader and inference engine for IOC classification.

Architecture:

Model Caching:

    models_cache: Dict[str, Optional[object]] = {}

Lazy-loads models on first use and caches them in memory to avoid repeated disk I/O.

Custom Transformers:

IOCFeatureExtractor

  • Computes 5 numeric features: length, digits, special chars, entropy, suspicious keywords
  • Shannon entropy calculation: Σ -p(x) * log₂(p(x))
  • Registered in sys.modules['__main__'] for joblib unpickling compatibility

RegKeyNumericFeatures

  • Specialized for Windows registry paths
  • Detects registry-specific suspicious patterns (e.g., \software\, \currentversion\)

Key Functions:

get_model(ioc_type: str)

  • Loads model from models/{ioc_type}.joblib
  • Returns None if model file missing or loading fails
  • Memoizes result in models_cache

classify_entries(entries: Dict[str, List[str]])

  • Input: {"url": ["http://evil.com", ...], "ip": ["192.0.2.1", ...]}
  • Process:
    1. For each IOC type, load corresponding model via get_model()
    2. Transform IOC strings using pipeline (TF-IDF + features)
    3. Extract probability from predict_proba() (malicious class index)
    4. Scale to 0-100% range
  • Output: { "url": [ {"value": "http://evil.com", "score": 87.3, "verdict": "Malicious"}, ... ] }

Model Compatibility: Supports both:

  • Scikit-learn Pipeline objects (with predict_proba()) - used by URL and RegKey models
  • Tuple format: (tfidf, feature_extractor, classifier) - used by Domain and IP models

Inference Logic:

    if isinstance(model, (list, tuple)) and len(model) >= 3:
        # Tuple format (Domain/IP)
        tfidf, fe, clf = model[0], model[1], model[2]
        X = hstack([tfidf.transform([val]), fe.transform([val])])
        proba = clf.predict_proba(X)[0]
        score = float(proba[1] * 100)  # Malicious class probability
    elif hasattr(model, "predict_proba"):
        # Pipeline format (URL/RegKey)
        proba = model.predict_proba([val])[0]
        score = float(proba[1] * 100)

Verdict Mapping:

    def verdict_from_score(score: Optional[float]) -> str:
        if score is None:
            return "Unavailable"
        return "Malicious" if score >= 50 else "Benign"

Integration with app.py: Called during the finalize() phase:

    per_entry = classify_entries(state["entries"])

Returns per-entry scores used for cumulative calculations.

Error Handling:

  • Missing models return score=None, verdict="Unavailable"
  • Gracefully handles prediction failures with try-except blocks
  • Continues processing remaining IOCs if one model fails

2.4 model_trainer.py (ModelTrainer.ipynb)

Purpose: Training script for all four IOC classification models using SOC-aligned design principles.

Location: Implemented as Jupyter Notebook (ModelTrainer.ipynb) with four independent training cells.


2.4.1 Domain Model Training

Function: train_domain_nb(csv_path='domain.csv')

Implementation Steps:

  1. Data Loading:

     df = pd.read_csv(csv_path)[['value', 'label']].dropna()
     X = df['value'].astype(str)
     y = df['label']

  2. Train-Test Split:

    • 80/20 split with stratification to preserve class balance
    • Random state: 42
  3. Feature Extraction:

    • TF-IDF Vectorizer:
      • Analyzer: char (character-level)
      • N-gram range: (3, 5)
      • Min document frequency: 2 (filters rare patterns)
      • Max document frequency: 0.9 (filters common patterns)
      • Max features: 6000
    • IOCFeatureExtractor: Computes 5 numeric features
      • Length, digit count, special character count, Shannon entropy, suspicious keyword flag
  4. Model Training:

    • Algorithm: MultinomialNB (alpha=0.3 for Laplace smoothing)
    • Feature matrix: hstack([tfidf_features, numeric_features])
  5. Evaluation:

    • Accuracy: 88.10%
    • Confusion Matrix: [[34, 0], [5, 3]] (34 TN, 0 FP, 5 FN, 3 TP)
    • Precision (Malicious): 100% (zero false positives)
    • Recall (Malicious): 37.5% (conservative detection)
  6. Model Persistence: joblib.dump((tfidf, fe, nb), "domain.joblib")

    • Saved as tuple: (TfidfVectorizer, IOCFeatureExtractor, MultinomialNB)

Design Rationale: Conservative approach prioritizes precision over recall to avoid blocking legitimate domains used in business workflows.
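
The steps above can be sketched end to end. The hyperparameters and the tuple save format follow the documentation; the file paths, the `out_path` parameter, and the compact feature extractor below are illustrative stand-ins:

```python
# Hedged end-to-end sketch of the domain training cell; see 2.4.5 for the
# full IOCFeatureExtractor. Not the verbatim notebook code.
import math

import joblib
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

KEYWORDS = ("exe", "cmd", "powershell", "run", "c2", "dll", "temp", "appdata")

class IOCFeatureExtractor(BaseEstimator, TransformerMixin):
    """Compact version of the 5-feature extractor described in 2.4.5."""
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        feats = []
        for v in X:
            v = str(v)
            probs = [v.count(c) / len(v) for c in set(v)] if v else []
            entropy = -sum(p * math.log2(p) for p in probs if p > 0)
            feats.append([len(v),
                          sum(c.isdigit() for c in v),
                          sum(not c.isalnum() for c in v),
                          entropy,
                          int(any(k in v.lower() for k in KEYWORDS))])
        return csr_matrix(np.array(feats), dtype=float)

def train_domain_nb(csv_path="domain.csv", out_path="domain.joblib"):
    df = pd.read_csv(csv_path)[["value", "label"]].dropna()
    X, y = df["value"].astype(str), df["label"]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    tfidf = TfidfVectorizer(analyzer="char", ngram_range=(3, 5),
                            min_df=2, max_df=0.9, max_features=6000)
    fe = IOCFeatureExtractor().fit(X_tr)
    M_tr = hstack([tfidf.fit_transform(X_tr), fe.transform(X_tr)])
    M_te = hstack([tfidf.transform(X_te), fe.transform(X_te)])

    nb = MultinomialNB(alpha=0.3).fit(M_tr, y_tr)
    print(f"test accuracy: {nb.score(M_te, y_te):.4f}")
    joblib.dump((tfidf, fe, nb), out_path)  # tuple format expected by model_tester
    return tfidf, fe, nb
```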

2.4.2 IP Model Training

Function: train_ip_nb_model(csv_path='ip.csv')

Implementation Steps:

  1. Data Loading: Same as Domain model

  2. Feature Extraction:

    • TF-IDF Vectorizer:
      • N-gram range: (1, 3) (shorter than domains due to IP structure)
      • Other params identical to Domain
    • IOCFeatureExtractor: Same 5-feature set
  3. Model Training:

    • Algorithm: MultinomialNB (alpha=0.3)
    • Feature stacking: hstack([tfidf, numeric])
  4. Evaluation:

    • Accuracy: 85.42%
    • Confusion Matrix: [[38, 0], [7, 3]]
    • Precision (Malicious): 100%
    • Recall (Malicious): 30% (highly conservative)
  5. Model Persistence: joblib.dump((tfidf, fe, nb), "ip.joblib")

Design Rationale: Conservative like Domain - IPs are tied to infrastructure connections; false positives disrupt critical services.

2.4.3 RegKey Model Training

Function: train_regkey_nb(csv_path='regkey.csv')

Implementation Steps:

  1. Data Loading: Same pattern as other models

  2. Custom Feature Extractor:

     class RegKeyNumericFeatures(BaseEstimator, TransformerMixin):
         def transform(self, X):
             return [[
                 len(v),                          # Total path length
                 v.count('\\'),                   # Path depth (backslashes)
                 sum(c.isdigit() for c in v),     # Numeric characters
                 self._entropy(v),                # Shannon entropy
                 int(any(k in v.lower() for k in [
                     'run', 'startup', 'powershell', 'cmd', 'exe', 'dll'
                 ]))                              # Persistence/execution flags
             ] for v in X]

  3. Feature Pipeline:

    • FeatureUnion combining:
      • TF-IDF (char n-grams 3-6)
      • Numeric features (with MinMaxScaler)
    • N-gram range: (3, 6) to capture registry path patterns
  4. Model Training:

    • Algorithm: MultinomialNB (alpha=0.25) - lower smoothing for aggressive detection
    • Full Pipeline: Pipeline([('features', FeatureUnion), ('nb', MultinomialNB)])
  5. Evaluation:

    • Accuracy: 83.17%
    • Confusion Matrix: [[37, 11], [6, 47]]
    • Precision (Malicious): 81.03%
    • Recall (Malicious): 88.68% (aggressive detection)
  6. Model Persistence: joblib.dump(model, "regkey.joblib")

    • Saved as full Pipeline object

Design Rationale: High recall prioritizes catching all malicious registry changes - acceptable false positive rate for offline forensic analysis where system isolation is standard procedure.
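
The pipeline wiring described above (FeatureUnion of char TF-IDF and MinMax-scaled numeric features feeding MultinomialNB) can be sketched as follows; the compact `RegKeyNumericFeatures` here is an illustrative stand-in for the real transformer:

```python
# Hedged sketch of the RegKey pipeline wiring; hyperparameters follow the
# documentation, the transformer body is a compact stand-in.
import math

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler

class RegKeyNumericFeatures(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        rows = []
        for v in X:
            v = str(v)
            probs = [v.count(c) / len(v) for c in set(v)] if v else []
            entropy = -sum(p * math.log2(p) for p in probs if p > 0)
            rows.append([len(v),
                         v.count("\\"),               # registry path depth
                         sum(c.isdigit() for c in v),
                         entropy,
                         int(any(k in v.lower() for k in
                                 ("run", "startup", "powershell", "cmd", "exe", "dll")))])
        return np.array(rows, dtype=float)

regkey_model = Pipeline([
    ("features", FeatureUnion([
        ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=(3, 6),
                                  min_df=2, max_df=0.9, max_features=6000)),
        ("numeric", Pipeline([("extract", RegKeyNumericFeatures()),
                              ("scale", MinMaxScaler())])),  # keep NB inputs in [0, 1]
    ])),
    ("nb", MultinomialNB(alpha=0.25)),
])
```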

2.4.4 URL Model Training

Function: train_url_lr(csv_path='url.csv')

Implementation Steps:

  1. Data Loading: Identical to other models

  2. Feature Extraction:

    • TF-IDF Only (no custom numeric features)
      • N-gram range: (3, 6)
      • Character-level analysis captures URL structure (protocols, paths, parameters)
  3. Model Training:

    • Algorithm: Logistic Regression (different from others)
      • Max iterations: 2000
      • Class weight: 'balanced' (handles imbalanced dataset)
      • Regularization: C=3.0 (moderate L2 penalty)
      • Solver: 'liblinear' (efficient for small datasets)
  4. Evaluation:

    • Accuracy: 82.14%
    • Confusion Matrix: [[15, 2], [3, 8]]
    • Precision (Malicious): 80%
    • Recall (Malicious): 72.73%
    • Balanced metrics appropriate for phishing/payload delivery detection
  5. Model Persistence:

    joblib.dump(lr_pipeline, "url.joblib")
    • Saved as full Pipeline: Pipeline([('tfidf', TfidfVectorizer), ('lr', LogisticRegression)])

Design Rationale: Logistic Regression chosen for balanced precision/recall. URLs are primary attack vectors (phishing, drive-by downloads) requiring equal sensitivity to both false positives and false negatives.
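
The URL pipeline described above is the simplest of the four: char TF-IDF straight into a class-balanced logistic regression, persisted as one object. A hedged sketch (the variable name `url_pipeline` is illustrative):

```python
# Hedged sketch of the URL training pipeline; hyperparameters as documented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

url_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=(3, 6),
                              min_df=2, max_df=0.9, max_features=6000)),
    ("lr", LogisticRegression(C=3.0, class_weight="balanced",
                              solver="liblinear", max_iter=2000)),
])
# After fitting, the whole pipeline is saved in one step:
#   url_pipeline.fit(X_train, y_train)
#   joblib.dump(url_pipeline, "url.joblib")
```

Saving the whole Pipeline means inference only needs `predict_proba()` on raw strings, with no separate vectorizer handling.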


2.4.5 Common Components

IOCFeatureExtractor Class:

    import math
    import numpy as np
    from scipy.sparse import csr_matrix
    from sklearn.base import BaseEstimator, TransformerMixin

    class IOCFeatureExtractor(BaseEstimator, TransformerMixin):
        def fit(self, X, y=None):
            return self

        def transform(self, X):
            feats = []
            for v in X:
                v = str(v)
                feats.append([
                    len(v),                          # String length
                    sum(c.isdigit() for c in v),     # Digit count
                    sum(not c.isalnum() for c in v), # Special char count
                    self._entropy(v),                # Shannon entropy
                    int(any(k in v.lower() for k in [
                        "exe", "cmd", "powershell", "run", "c2",
                        "dll", "temp", "appdata"
                    ]))  # Suspicious keyword flag
                ])
            return csr_matrix(np.array(feats), dtype=float)

        def _entropy(self, s):
            if len(s) == 0:
                return 0.0
            probs = [s.count(c) / len(s) for c in set(s)]
            return -sum(p * math.log2(p) for p in probs if p > 0)

Key Design Decisions:

  • Sparse matrix output (csr_matrix) for memory efficiency when stacking with TF-IDF
  • Suspicious keywords tuned per IOC type operational context
  • Entropy captures randomness often found in obfuscated/encoded IOCs

2.4.6 Training Workflow Summary

For each IOC type:

  1. Load CSV (sha256, value, label)
  2. Split 80/20 (stratified)
  3. Extract features (TF-IDF + Numeric)
  4. Train classifier (NB or LR)
  5. Evaluate on test set
  6. Print confusion matrix, classification report
  7. Serialize to .joblib

Output Files:

  • domain.joblib - (TfidfVectorizer, IOCFeatureExtractor, MultinomialNB) tuple
  • ip.joblib - (TfidfVectorizer, IOCFeatureExtractor, MultinomialNB) tuple
  • url.joblib - Pipeline([tfidf, LogisticRegression])
  • regkey.joblib - Pipeline([FeatureUnion, MultinomialNB])

Integration with model_tester.py: The serialized models are loaded by model_tester.py which handles:

  • Lazy loading and caching
  • Inference on new IOC entries
  • Probability extraction via predict_proba()
  • Verdict mapping (0-100% score)

3. Prediction Logic and Decision Rules

SOCBot implements a hierarchical evaluation strategy that adapts based on user intent regarding sample relationships.

3.1 Single IOC Type Selected

Workflow:

  1. Sample Relationship Question:

    • Bot: "Do all entries for IOC {type} belong to the same sample? (Y/N)"
  2. User Answers YES:

    • Classify each entry individually
    • Compute cumulative maliciousness percentage = avg(all_entry_scores)
    • Apply threshold mapping to cumulative score
    • Output:
      • Per-entry predictions with individual verdicts
      • Cumulative percentage and verdict for the IOC type
  3. User Answers NO:

    • Classify each entry individually
    • Output: Per-entry predictions only
    • No cumulative scoring

3.2 Multiple IOC Types Selected

Multi-stage decision tree ensuring valid cross-IOC aggregation.

Step 1: Per-IOC Sample Relationship

For each selected IOC type independently:

  • Bot: "Do all entries for IOC {type} belong to the same sample? (Y/N)"

Per-IOC Outcomes:

  • YES: Individual classification + cumulative % computed → stored for potential global averaging
  • NO: Individual classification only → cumulative scoring disabled for this IOC → global averaging becomes impossible

Critical Rule: If ANY IOC receives "NO", the system must not ask the global question.


Step 2: Global Sample Relationship (Conditional)

Trigger Condition: All per-IOC answers were "YES"

  • Bot: "Do all IOC types belong to one sample? (Y/N)"

Global Outcomes:

  • YES:

    • Compute Total Sample Average = avg(all_ioc_cumulative_scores)
    • Map to final verdict using classification thresholds
    • Output: Per-entry + per-IOC cumulative + total sample verdict
  • NO:

    • Output: Per-entry + per-IOC cumulative only
    • No total sample average
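
The gating rule in Steps 1-2 reduces to a single predicate; a hedged sketch (the function name is illustrative):

```python
from typing import Dict, List

# Hedged sketch of the critical rule: the global sample question is only
# asked when multiple IOC types are selected AND every per-IOC answer was YES.
def should_ask_global(selected: List[str], per_ioc_same: Dict[str, bool]) -> bool:
    return len(selected) > 1 and all(per_ioc_same.get(t, False) for t in selected)
```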

3.3 Decision Logic Summary Table

| Scenario | Per-IOC Answers | Global Question Asked? | Output Includes |
|---|---|---|---|
| Single IOC + Same Sample = YES | N/A (only 1 IOC) | No | Per-entry + cumulative |
| Single IOC + Same Sample = NO | N/A | No | Per-entry only |
| Multiple IOCs + Any "NO" | At least 1 NO | No | Per-entry + per-IOC cumulative (where YES), no total |
| Multiple IOCs + All "YES" | All YES | Yes | Depends on global answer ↓ |
| ↳ Global = YES | All YES | Yes (answered YES) | Per-entry + per-IOC cumulative + total sample |
| ↳ Global = NO | All YES | Yes (answered NO) | Per-entry + per-IOC cumulative, no total sample |

4. Classification Thresholds

All percentage-based scores (per-entry, per-IOC cumulative, total sample) use the following uniform mapping:

| Percentage Range | Verdict |
|---|---|
| 0% – 15% | Benign |
| 15% – 30% | Possibly Benign |
| 30% – 65% | Possibly Malicious |
| 65% – 100% | Malicious |

Application:

  • Individual IOC entry scores
  • IOC-level cumulative averages
  • Total sample-level final verdict
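
This mapping mirrors the THRESHOLDS tuples shown in section 2.1; a sketch assuming lower-inclusive, upper-exclusive bands (the boundary convention follows the (0, 15), (15, 30), ... tuples):

```python
# Threshold mapping from section 2.1's THRESHOLDS; lower bound inclusive,
# upper bound exclusive, so the (65, 101) band covers scores up to 100.
THRESHOLDS = [(0, 15, "Benign"), (15, 30, "Possibly Benign"),
              (30, 65, "Possibly Malicious"), (65, 101, "Malicious")]

def verdict_from_percent(p: float) -> str:
    for lo, hi, verdict in THRESHOLDS:
        if lo <= p < hi:
            return verdict
    return "Malicious"  # fallback for out-of-range inputs
```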

5. Test Cases

Section 1: Model Availability Tests

Objective: Verify all four IOC models load successfully

Results:

  • ✅ Domain model (1) - Working
  • ✅ URL model (2) - Working
  • ✅ IP model (3) - Working
  • ✅ RegKey model (4) - Working

Section 2: Single IOC Type Tests

Test Case 2.1: URL + Same Sample = YES

Steps:

  1. Select IOC type: 2 (URL)
  2. Submit URL entries, type DONE
  3. Q: "Do all entries for IOC URL belong to the same sample?" → Answer: YES

Expected Output:

  • Per-entry predictions with individual scores and verdicts
  • Cumulative percentage for URL type
  • Cumulative verdict based on threshold mapping

Status: ✅ Passed


Test Case 2.2: URL + Same Sample = NO

Steps:

  1. Select IOC type: 2 (URL)
  2. Submit URL entries, type DONE
  3. Q: "Do all entries for IOC URL belong to the same sample?" → Answer: NO

Expected Output:

  • Per-entry predictions with individual scores and verdicts
  • No cumulative scoring

Status: ✅ Passed


Section 3: Multiple IOC Type Tests

Test Case 3.1: URL + RegKey, All YES, Global YES

Steps:

  1. Select IOC types: 2,4 (URL, RegKey)
  2. Submit URL entries → DONE
  3. Submit RegKey entries → DONE
  4. Q: "Do all entries for IOC URL belong to the same sample?" → YES
  5. Q: "Do all entries for IOC RegKey belong to the same sample?" → YES
  6. Q: "Do all IOC types belong to one sample?" → YES

Expected Output:

  • Per-entry predictions for URL and RegKey
  • Per-IOC cumulative % for URL and RegKey
  • Total Sample Average = (URL_cumulative + RegKey_cumulative) / 2
  • Total Sample Verdict based on average

Status: ✅ Passed


Test Case 3.2: URL + RegKey, All YES, Global NO

Steps:

  1. Select: 2,4
  2. Submit entries for both
  3. Q: URL same sample? → YES
  4. Q: RegKey same sample? → YES
  5. Q: Global sample? → NO

Expected Output:

  • Per-entry predictions for URL and RegKey
  • Per-IOC cumulative % for URL and RegKey
  • No Total Sample Average

Status: ✅ Passed


Test Case 3.3: URL + RegKey, Mixed (URL=YES, RegKey=NO)

Steps:

  1. Select: 2,4
  2. Submit entries
  3. Q: URL same sample? → YES
  4. Q: RegKey same sample? → NO

Expected Output:

  • URL: Per-entry + cumulative %
  • RegKey: Per-entry only (no cumulative)
  • Global question not asked
  • No Total Sample Average

Status: ✅ Passed


Test Case 3.4: URL + RegKey, Both NO

Steps:

  1. Select: 2,4
  2. Submit entries
  3. Q: URL same sample? → NO
  4. Q: RegKey same sample? → NO

Expected Output:

  • URL: Per-entry only
  • RegKey: Per-entry only
  • Global question not asked
  • No cumulative scoring at any level

Status: ✅ Passed


6. Web Application Architecture

6.1 System Overview

┌─────────────────┐
│   Frontend      │  (HTML/CSS/JS)
│   Static Files  │
└────────┬────────┘
         │ HTTP
         ↓
┌─────────────────┐
│  Flask Backend  │  (app.py)
│  Session Store  │
└────────┬────────┘
         │ Function Call
         ↓
┌─────────────────┐
│  Model Tester   │  (model_tester.py)
│  ML Inference   │
└────────┬────────┘
         │ Joblib Load
         ↓
┌─────────────────┐
│  Trained Models │  (models/*.joblib)
│  Pipelines      │
└─────────────────┘

6.2 Request Flow

  1. User types message in frontend input field
  2. JavaScript sends POST to /api/send_message with {message, session_id}
  3. Flask state_machine() processes input, updates session state
  4. If finalization triggered:
    • Calls classify_entries() from model_tester.py
    • Loads models, runs inference
    • Computes aggregated scores
  5. Flask returns JSON response with bot message and results
  6. Frontend renders bot message in chat UI

6.3 Session Lifecycle

User connects → UUID generated → Session state initialized
      ↓
Choose IOC types → Validate selection → Store in state
      ↓
Collect entries → Loop per IOC type → Store in state
      ↓
Per-IOC questions → Store Y/N answers → Conditional progression
      ↓
Global question (if eligible) → Store Y/N → Finalize
      ↓
Classification → Display results → Session persists (in-memory)

Appendix: File Structure

SOCBot/
├── .venv/                      # Python virtual environment (library root)
├── Frontend/                   # Static web assets
│   ├── index.html              # Chat UI structure
│   ├── script.js               # Client-side logic
│   └── style.css               # WhatsApp-style theme
├── models/                     # Trained model artifacts
│   ├── domain.csv              # Training data - Domain IOCs
│   ├── domain.joblib           # Trained Domain classifier
│   ├── ip.csv                  # Training data - IP IOCs
│   ├── ip.joblib               # Trained IP classifier
│   ├── ModelTrainer.ipynb      # Jupyter notebook for model training
│   ├── modeltrainer.py         # Python script version of training code
│   ├── regkey.csv              # Training data - Registry Key IOCs
│   ├── regkey.joblib           # Trained RegKey classifier
│   ├── url.csv                 # Training data - URL IOCs
│   └── url.joblib              # Trained URL classifier
├── app.py                      # Flask backend + FSM logic
├── history.py                  # Session logging utilities
├── model_tester.py             # ML inference engine
├── requirements.txt            # Python dependencies
└── history.jsonl               # Session records (auto-generated)

Document Version: 1.0
Last Updated: December 21, 2025
Prepared By: SOCBot Development Team