
SOCBot Technical Documentation

AI-Powered IOC Classification System for Security Operations


1. Technology Stack

1.1 Frontend Stack

Location: Frontend/ directory (index.html, script.js, style.css)

Technologies:

  • Vanilla JavaScript with Fetch API for HTTP requests
  • HTML5 for semantic markup
  • CSS3 for styling and responsive design
  • No build tools required; runs directly in the browser (no Node.js/Vite at runtime)

Design & Features:

  • WhatsApp-style chat interface with left-aligned user messages and right-aligned bot messages
  • Top banner: Displays "Welcome to SOCbot. Your AI agent for identification of Malicious Changes to Secure your Infra."
  • Scrollable message container for conversation history
  • Right-side static instruction panel showing IOC selection menu (1=Domain, 2=URL, 3=IP, 4=RegKey)
  • Bottom input bar with text field and send button
  • Session persistence via session_id maintained across conversation lifecycle

Client-Side Validation Rules:

  • IOC Selection: Accepts comma/semicolon-separated numeric values (1,2,3,4)
  • Domain validation: Must match example.com format (no scheme or path)
  • URL validation: Must parse as http(s)://... with valid scheme
  • IP validation: IPv4 dotted-quad format with octets 0-255
  • Registry key validation: Must start with HKEY_LOCAL_MACHINE\, HKEY_CURRENT_USER\, HKEY_CLASSES_ROOT\, HKEY_USERS\, or HKEY_CURRENT_CONFIG\
  • Y/N phases: Accepts only Y/Yes or N/No (case-insensitive)
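
The rules above live in the frontend JavaScript, but a hedged server-side sketch in Python conveys the same checks. The function names (`is_valid_domain`, etc.) are illustrative, not from the codebase:

```python
import re
import ipaddress
from urllib.parse import urlparse

# Hedged Python equivalents of the client-side validation rules; the real
# checks are implemented in script.js and may differ in detail.
DOMAIN_RE = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)(\.[A-Za-z0-9-]{1,63})+$")
HIVES = ("HKEY_LOCAL_MACHINE\\", "HKEY_CURRENT_USER\\", "HKEY_CLASSES_ROOT\\",
         "HKEY_USERS\\", "HKEY_CURRENT_CONFIG\\")

def is_valid_domain(s: str) -> bool:
    return bool(DOMAIN_RE.match(s))          # bare host, no scheme or path

def is_valid_url(s: str) -> bool:
    p = urlparse(s)
    return p.scheme in ("http", "https") and bool(p.netloc)

def is_valid_ip(s: str) -> bool:
    try:
        ipaddress.IPv4Address(s)             # dotted quad, octets 0-255
        return True
    except ValueError:
        return False

def is_valid_regkey(s: str) -> bool:
    return s.upper().startswith(HIVES)       # must start with a known hive
```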

1.2 Flask Backend

Location: app.py

Technologies:

  • Flask 2.x - Python web framework
  • Flask-CORS - Cross-Origin Resource Sharing support
  • Python 3.x - Backend runtime

Purpose & Responsibilities:

  1. Static file server: Serves frontend assets from Frontend/ directory
  2. Session management: Maintains in-memory conversation state per session_id
  3. Conversation orchestration: Implements finite state machine (FSM) for multi-phase dialogue
  4. Model delegation: Calls model_tester.py for ML predictions
  5. Result aggregation: Computes per-entry, per-IOC cumulative, and total-sample verdicts

API Endpoints:

  • GET / - Serves index.html
  • GET /<path> - Serves static assets (CSS, JS, images)
  • POST /api/send_message - Main chat endpoint accepting user messages and returning bot responses

Backend-Frontend Connection:

Frontend sends JSON payload:

{ "message": "1,2", "session_id": "uuid-string" }

Backend responds with:

{ "bot_message": "Select IOC Types...", "session_id": "uuid-string", "state_phase": "choose_types", "ready": false, "final_report": null }
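
The request/response contract above can be exercised from any HTTP client. A minimal Python sketch of the payload construction (the `make_payload` helper and base URL are illustrative; only the field names come from the documented contract):

```python
import json
import uuid
from typing import Optional

# Hedged sketch of the /api/send_message contract; field names mirror the
# documented payloads, the helper itself is illustrative.
def make_payload(message: str, session_id: Optional[str] = None) -> dict:
    return {"message": message, "session_id": session_id or str(uuid.uuid4())}

payload = make_payload("1,2")
body = json.dumps(payload)

# A real client would POST this, e.g. with the requests library:
#   requests.post("http://127.0.0.1:5000/api/send_message", json=payload)

# Expected response shape from the backend:
reply = {"bot_message": "Select IOC Types...", "session_id": payload["session_id"],
         "state_phase": "choose_types", "ready": False, "final_report": None}
```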

1.3 Machine Learning Pipeline

Model Storage: models/ or Models/ directory containing .joblib serialized pipelines

Supported IOC Types:

  • domain.joblib - Domain name classifier (Naive Bayes)
  • url.joblib - URL classifier (Logistic Regression)
  • ip.joblib - IP address classifier (Naive Bayes)
  • regkey.joblib - Windows Registry key classifier (Naive Bayes)

1.3.1 Evolution from Original Design

Original Approach (Rejected - 79% Accuracy):

  • Single unified dataset with columns: sha256, ioc_type, ioc_value, label
  • One-hot encoding for IOC type combined with TF-IDF features
  • Single multi-class model attempting to classify all IOC types simultaneously
  • Critical Failures:
    • 25% false negative rate (malicious samples flagged as benign)
    • 15% false positive rate (benign samples flagged as malicious)
    • No contextual understanding of IOC role in attack lifecycle

Why the Original Approach Failed: Treating all IOCs uniformly ignored the fundamental differences in their operational context within the Cyber Kill Chain (CKC). A SOC analyst consultation revealed that each IOC type serves a distinct purpose and requires tailored sensitivity thresholds.


1.3.2 Current SOC-Aligned Design Philosophy

Design Principle: Each IOC type is trained independently based on its role in the Cyber Kill Chain, with algorithm and threshold tuning matching operational requirements.

SOC Head-Defined Classification Strategy:

| IOC Type | CKC Phase | Operational Role | Detection Strategy | Algorithm Choice |
|---|---|---|---|---|
| Domain | Reconnaissance / C2 Communication | Linked to IPs; determines where system connects | Conservative: avoid false positives that block legitimate workflows | Naive Bayes (balanced priors) |
| IP | C2 Communication | Direct network connections; tied to domains | Conservative: same as domains; must not disrupt critical services | Naive Bayes (balanced priors) |
| URL | Delivery / Exploitation | Phishing vectors, payload delivery mechanisms | Balanced: equal weight to false positives/negatives; primary attack vector | Logistic Regression (class-balanced) |
| RegKey | Installation / Actions on Objectives | Offline persistence changes; high-impact modifications | Aggressive: can trigger system shutdown/forensic isolation; zero tolerance for FNs | Naive Bayes (strict threshold) |

Rationale from SOC Operations:

  • Domains/IPs: Conservative filtering prevents blocking legitimate infrastructure that may share IP space or use CDNs. False positives here disrupt business operations.
  • URLs: Balanced approach because URLs are the primary delivery mechanism for phishing and drive-by downloads. Requires equal sensitivity to both error types.
  • Registry Keys: Aggressive detection because unauthorized registry modifications indicate persistence mechanisms or system compromise. These are forensic indicators requiring immediate response, even at the cost of false positives.

1.3.3 Revised Dataset Architecture

Per-IOC Training Sets:

  • domain.csv, ip.csv, url.csv, regkey.csv
  • Columns: sha256 (sample identifier), value (IOC string), label (0=Benign, 1=Malicious)
  • Benefits:
    • Independent feature spaces optimized per IOC type
    • Algorithm selection tailored to detection requirements
    • Eliminates cross-contamination from unrelated IOC types

1.3.4 Pipeline Architecture by IOC Type

Domain Model (Naive Bayes - Conservative):

  • TF-IDF: Character-level n-grams (3-5), max 6000 features
  • Custom Features: Length, digits, special chars, entropy, suspicious keywords (exe, cmd, powershell, run, c2, dll, temp, appdata)
  • Classifier: MultinomialNB (alpha=0.3)
  • Output Format: (tfidf, feature_extractor, classifier) tuple

IP Model (Naive Bayes - Conservative):

  • TF-IDF: Character-level n-grams (1-3), max 6000 features
  • Custom Features: Same 5-feature set as Domain
  • Classifier: MultinomialNB (alpha=0.3)
  • Output Format: (tfidf, feature_extractor, classifier) tuple

URL Model (Logistic Regression - Balanced):

  • TF-IDF Only: Character-level n-grams (3-6), max 6000 features
  • Classifier: LogisticRegression (C=3.0, class_weight='balanced', solver='liblinear', max_iter=2000)
  • Output Format: Scikit-learn Pipeline object
  • Note: No custom features; relies purely on character patterns due to URL structural complexity

RegKey Model (Naive Bayes - Aggressive):

  • TF-IDF: Character-level n-grams (3-6), max 6000 features
  • Custom Features (RegKey-Specific): Length, backslash count (path depth), digits, entropy, suspicious keywords (run, startup, powershell, cmd, exe, dll)
  • Classifier: MultinomialNB (alpha=0.25) + MinMaxScaler for numeric features
  • Output Format: Scikit-learn Pipeline with FeatureUnion

1.3.5 Feature Engineering Details

Shannon Entropy Calculation: H(X) = -Σ p(x) · log₂(p(x)). Measures randomness; higher entropy indicates obfuscation or encoding (common in malicious IOCs).
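
A worked example of this feature, matching the `_entropy` helper shown later in section 2.4.5:

```python
import math

# Shannon entropy over character frequencies: H(X) = -sum(p(x) * log2(p(x)))
def shannon_entropy(s: str) -> float:
    if not s:
        return 0.0
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four equiprobable characters carry 2 bits; a constant string carries 0 bits.
# Obfuscated or DGA-style IOC strings score noticeably higher than plain words.
```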

Suspicious Keyword Detection:

  • Domain/IP: Flags command execution indicators (exe, cmd, powershell, dll, temp, appdata)
  • RegKey: Flags persistence locations (run, startup) and execution paths (powershell, cmd, exe, dll)

RegKey Path Depth:

  • Counts backslashes (\) to measure registry hierarchy depth
  • Deeper paths often correlate with hidden persistence mechanisms

1.3.6 Model Performance Metrics

| Model | Accuracy | Precision (Mal) | Recall (Mal) | F1-Score (Mal) | Training Strategy |
|---|---|---|---|---|---|
| Domain | 88.10% | 100.00% | 37.50% | 54.55% | High precision, low recall (conservative) |
| IP | 85.42% | 100.00% | 30.00% | 46.15% | High precision, low recall (conservative) |
| URL | 82.14% | 80.00% | 72.73% | 76.19% | Balanced precision/recall |
| RegKey | 83.17% | 81.03% | 88.68% | 84.68% | High recall, acceptable precision (aggressive) |

Interpretation:

  • Domain/IP: Perfect precision (zero false positives) at the cost of recall - aligns with conservative operational requirement
  • URL: Balanced metrics - appropriate for primary attack vector
  • RegKey: High recall prioritizes catching all malicious changes - aligns with aggressive forensic requirement

1.3.7 Training Configuration Summary

Common Hyperparameters:

  • Test split: 20% (stratified)
  • Random state: 42 (reproducibility)
  • TF-IDF: min_df=2, max_df=0.9 (filter rare/common terms)

Algorithm-Specific Settings:

  • Naive Bayes: Laplace smoothing (alpha=0.25-0.3) to handle unseen n-grams
  • Logistic Regression: L2 regularization (C=3.0), balanced class weights to handle dataset imbalance
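
The shared configuration above can be sketched as follows; the helper names (`make_vectorizer`, `split`) are illustrative, while the hyperparameters are the documented ones:

```python
# Hedged sketch of the common training configuration described above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

def make_vectorizer(ngram_range=(3, 5)):
    # min_df/max_df filter n-grams that are too rare or too common;
    # the n-gram range varies per IOC type ((3, 5) shown here).
    return TfidfVectorizer(analyzer="char", ngram_range=ngram_range,
                           min_df=2, max_df=0.9, max_features=6000)

def split(X, y):
    # 80/20 stratified split with a fixed seed for reproducibility
    return train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
```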

2. Code Explanation

2.1 app.py

Purpose: Central Flask application managing conversation flow and serving frontend.

Key Components:

Global Configuration:

    IOC_MAP = {"1": "domain", "2": "url", "3": "ip", "4": "regkey"}
    THRESHOLDS = [(0, 15, "Benign"), (15, 30, "Possibly Benign"),
                  (30, 65, "Possibly Malicious"), (65, 101, "Malicious")]
    sessions: Dict[str, Dict] = {}  # In-memory session store

Session State Structure:

    {
        "phase": str,              # Current FSM phase
        "selected": List[str],     # Selected IOC types
        "current_ioc_index": int,  # Index for iterating IOC collection
        "entries": Dict,           # Collected IOC entries per type
        "per_ioc_same": Dict,      # Per-IOC sample relationship (Y/N)
        "global_same": bool,       # Global sample relationship
        "final_report": Dict       # Classification results
    }

Finite State Machine Phases:

  1. choose_types - User selects IOC types (1,2,3,4)
  2. collect_entries - User submits individual IOC strings, types DONE to proceed
  3. per_ioc_sample - Bot asks sample relationship for each IOC type
  4. global_sample - Bot asks if all IOCs belong to one sample (only if all per-IOC = YES)
  5. final - Results computed and displayed

Key Functions:

  • available_model_types() - Scans models/ directory for available .joblib files
  • verdict_from_percent(p) - Maps numeric score to threshold-based verdict
  • ensure_session(sid) - Creates or retrieves session state
  • next_prompt(state) - Generates context-appropriate bot message
  • state_machine(state, user_text) - Processes user input and advances FSM
  • finalize(state) - Delegates to classify_entries(), computes cumulative scores, generates final report

Scoring Logic in finalize():

    # Per-entry scoring via model_tester
    per_entry = classify_entries(state["entries"])

    # Per-IOC cumulative (only if per_ioc_same[ioc] == True)
    per_ioc_cum[ioc] = avg(scores) if per_ioc_same[ioc] else None

    # Total sample average (only if global_same and all IOC cumulatives exist)
    total_sample = avg(per_ioc_cum.values()) if conditions_met else None
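
The aggregation rules in finalize() can be sketched as a self-contained function (the name `aggregate` and its exact signature are illustrative):

```python
from typing import Dict, List, Optional, Tuple

# Hedged sketch of finalize()'s aggregation rules; names are illustrative.
def aggregate(per_entry: Dict[str, List[dict]],
              per_ioc_same: Dict[str, bool],
              global_same: bool) -> Tuple[Dict[str, Optional[float]], Optional[float]]:
    # Per-IOC cumulative: average of entry scores, only when the user said
    # all entries for that IOC type belong to the same sample.
    per_ioc_cum: Dict[str, Optional[float]] = {}
    for ioc, rows in per_entry.items():
        scores = [r["score"] for r in rows if r["score"] is not None]
        per_ioc_cum[ioc] = (sum(scores) / len(scores)
                            if (per_ioc_same.get(ioc) and scores) else None)

    # Total sample average: only when all IOCs share one sample and every
    # per-IOC cumulative exists.
    cums = list(per_ioc_cum.values())
    total = (sum(cums) / len(cums)
             if (global_same and cums and all(c is not None for c in cums)) else None)
    return per_ioc_cum, total
```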

2.2 history.py

Purpose: Persistent storage for session records using JSON Lines format.

Storage Format:

  • File: history.jsonl in project root
  • Format: Newline-delimited JSON (one JSON object per line)
  • Max Records: 100 (auto-pruning via prune_history())

Key Functions:

append_session(record: Dict)

  • Appends new session record to history.jsonl
  • Triggers automatic pruning if record count exceeds MAX_RECORDS

load_history() -> List[Dict]

  • Reads all records from file
  • Returns list of dictionaries
  • Skips malformed JSON lines gracefully

prune_history()

  • Keeps only the most recent 100 records
  • Overwrites file with trimmed dataset

Use Case: Enables future analytics, session replay, and model retraining from production data. Currently imported but not actively called in app.py (ready for integration).


2.3 model_tester.py

Purpose: Model loader and inference engine for IOC classification.

Architecture:

Model Caching:

    models_cache: Dict[str, Optional[object]] = {}

Lazy-loads models on first use and caches them in memory to avoid repeated disk I/O.

Custom Transformers:

IOCFeatureExtractor

  • Computes 5 numeric features: length, digits, special chars, entropy, suspicious keywords
  • Shannon entropy calculation: Σ -p(x) * log₂(p(x))
  • Registered in sys.modules['__main__'] for joblib unpickling compatibility

RegKeyNumericFeatures

  • Specialized for Windows registry paths
  • Detects registry-specific suspicious patterns (e.g., \software\, \currentversion\)

Key Functions:

get_model(ioc_type: str)

  • Loads model from models/{ioc_type}.joblib
  • Returns None if model file missing or loading fails
  • Memoizes result in models_cache

classify_entries(entries: Dict[str, List[str]])

  • Input: {"url": ["http://evil.com", ...], "ip": ["192.0.2.1", ...]}
  • Process:
    1. For each IOC type, load corresponding model via get_model()
    2. Transform IOC strings using pipeline (TF-IDF + features)
    3. Extract probability from predict_proba() (malicious class index)
    4. Scale to 0-100% range
  • Output: { "url": [ {"value": "http://evil.com", "score": 87.3, "verdict": "Malicious"}, ... ] }

Model Compatibility: Supports both:

  • Scikit-learn Pipeline objects (with predict_proba()) - used by URL and RegKey models
  • Tuple format: (tfidf, feature_extractor, classifier) - used by Domain and IP models

Inference Logic:

    if isinstance(model, (list, tuple)) and len(model) >= 3:
        # Tuple format (Domain/IP)
        tfidf, fe, clf = model[0], model[1], model[2]
        X = hstack([tfidf.transform([val]), fe.transform([val])])
        proba = clf.predict_proba(X)[0]
        score = float(proba[1] * 100)  # Malicious class probability
    elif hasattr(model, "predict_proba"):
        # Pipeline format (URL/RegKey)
        proba = model.predict_proba([val])[0]
        score = float(proba[1] * 100)

Verdict Mapping:

    def verdict_from_score(score: Optional[float]) -> str:
        if score is None:
            return "Unavailable"
        return "Malicious" if score >= 50 else "Benign"

Integration with app.py: Called during the finalize() phase:

    per_entry = classify_entries(state["entries"])

Returns per-entry scores used for cumulative calculations.

Error Handling:

  • Missing models return score=None, verdict="Unavailable"
  • Gracefully handles prediction failures with try-except blocks
  • Continues processing remaining IOCs if one model fails

2.4 model_trainer.py (ModelTrainer.ipynb)

Purpose: Training script for all four IOC classification models using SOC-aligned design principles.

Location: Implemented as Jupyter Notebook (ModelTrainer.ipynb) with four independent training cells.


2.4.1 Domain Model Training

Function: train_domain_nb(csv_path='domain.csv')

Implementation Steps:

  1. Data Loading:

     df = pd.read_csv(csv_path)[['value', 'label']].dropna()
     X = df['value'].astype(str)
     y = df['label']

  2. Train-Test Split:

    • 80/20 split with stratification to preserve class balance
    • Random state: 42
  3. Feature Extraction:

    • TF-IDF Vectorizer:
      • Analyzer: char (character-level)
      • N-gram range: (3, 5)
      • Min document frequency: 2 (filters rare patterns)
      • Max document frequency: 0.9 (filters common patterns)
      • Max features: 6000
    • IOCFeatureExtractor: Computes 5 numeric features
      • Length, digit count, special character count, Shannon entropy, suspicious keyword flag
  4. Model Training:

    • Algorithm: MultinomialNB (alpha=0.3 for Laplace smoothing)
    • Feature matrix: hstack([tfidf_features, numeric_features])
  5. Evaluation:

    • Accuracy: 88.10%
    • Confusion Matrix: [[34, 0], [5, 3]] (34 TN, 0 FP, 5 FN, 3 TP)
    • Precision (Malicious): 100% (zero false positives)
    • Recall (Malicious): 37.5% (conservative detection)
  6. Model Persistence: joblib.dump((tfidf, fe, nb), "domain.joblib")

    • Saved as tuple: (TfidfVectorizer, IOCFeatureExtractor, MultinomialNB)

Design Rationale: Conservative approach prioritizes precision over recall to avoid blocking legitimate domains used in business workflows.
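
The steps above can be sketched end to end. The hyperparameters and the tuple save format follow the documentation; the file paths, the `out_path` parameter, and the compact feature extractor below are illustrative stand-ins:

```python
# Hedged end-to-end sketch of the domain training cell; see 2.4.5 for the
# full IOCFeatureExtractor. Not the verbatim notebook code.
import math

import joblib
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

KEYWORDS = ("exe", "cmd", "powershell", "run", "c2", "dll", "temp", "appdata")

class IOCFeatureExtractor(BaseEstimator, TransformerMixin):
    """Compact version of the 5-feature extractor described in 2.4.5."""
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        feats = []
        for v in X:
            v = str(v)
            probs = [v.count(c) / len(v) for c in set(v)] if v else []
            entropy = -sum(p * math.log2(p) for p in probs if p > 0)
            feats.append([len(v),
                          sum(c.isdigit() for c in v),
                          sum(not c.isalnum() for c in v),
                          entropy,
                          int(any(k in v.lower() for k in KEYWORDS))])
        return csr_matrix(np.array(feats), dtype=float)

def train_domain_nb(csv_path="domain.csv", out_path="domain.joblib"):
    df = pd.read_csv(csv_path)[["value", "label"]].dropna()
    X, y = df["value"].astype(str), df["label"]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    tfidf = TfidfVectorizer(analyzer="char", ngram_range=(3, 5),
                            min_df=2, max_df=0.9, max_features=6000)
    fe = IOCFeatureExtractor().fit(X_tr)
    M_tr = hstack([tfidf.fit_transform(X_tr), fe.transform(X_tr)])
    M_te = hstack([tfidf.transform(X_te), fe.transform(X_te)])

    nb = MultinomialNB(alpha=0.3).fit(M_tr, y_tr)
    print(f"test accuracy: {nb.score(M_te, y_te):.4f}")
    joblib.dump((tfidf, fe, nb), out_path)  # tuple format expected by model_tester
    return tfidf, fe, nb
```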

2.4.2 IP Model Training

Function: train_ip_nb_model(csv_path='ip.csv')

Implementation Steps:

  1. Data Loading: Same as Domain model

  2. Feature Extraction:

    • TF-IDF Vectorizer:
      • N-gram range: (1, 3) (shorter than domains due to IP structure)
      • Other params identical to Domain
    • IOCFeatureExtractor: Same 5-feature set
  3. Model Training:

    • Algorithm: MultinomialNB (alpha=0.3)
    • Feature stacking: hstack([tfidf, numeric])
  4. Evaluation:

    • Accuracy: 85.42%
    • Confusion Matrix: [[38, 0], [7, 3]]
    • Precision (Malicious): 100%
    • Recall (Malicious): 30% (highly conservative)
  5. Model Persistence: joblib.dump((tfidf, fe, nb), "ip.joblib")

Design Rationale: Conservative like Domain - IPs are tied to infrastructure connections; false positives disrupt critical services.

2.4.3 RegKey Model Training

Function: train_regkey_nb(csv_path='regkey.csv')

Implementation Steps:

  1. Data Loading: Same pattern as other models

  2. Custom Feature Extractor:

     class RegKeyNumericFeatures(BaseEstimator, TransformerMixin):
         def transform(self, X):
             return [[
                 len(v),                          # Total path length
                 v.count('\\'),                   # Path depth (backslashes)
                 sum(c.isdigit() for c in v),     # Numeric characters
                 self._entropy(v),                # Shannon entropy
                 int(any(k in v.lower() for k in [
                     'run', 'startup', 'powershell', 'cmd', 'exe', 'dll'
                 ]))                              # Persistence/execution flags
             ] for v in X]

  3. Feature Pipeline:

    • FeatureUnion combining:
      • TF-IDF (char n-grams 3-6)
      • Numeric features (with MinMaxScaler)
    • N-gram range: (3, 6) to capture registry path patterns
  4. Model Training:

    • Algorithm: MultinomialNB (alpha=0.25) - lower smoothing for aggressive detection
    • Full Pipeline: Pipeline([('features', FeatureUnion), ('nb', MultinomialNB)])
  5. Evaluation:

    • Accuracy: 83.17%
    • Confusion Matrix: [[37, 11], [6, 47]]
    • Precision (Malicious): 81.03%
    • Recall (Malicious): 88.68% (aggressive detection)
  6. Model Persistence: joblib.dump(model, "regkey.joblib")

    • Saved as full Pipeline object

Design Rationale: High recall prioritizes catching all malicious registry changes - acceptable false positive rate for offline forensic analysis where system isolation is standard procedure.
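
The pipeline wiring described above (FeatureUnion of char TF-IDF and MinMax-scaled numeric features feeding MultinomialNB) can be sketched as follows; the compact `RegKeyNumericFeatures` here is an illustrative stand-in for the real transformer:

```python
# Hedged sketch of the RegKey pipeline wiring; hyperparameters follow the
# documentation, the transformer body is a compact stand-in.
import math

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler

class RegKeyNumericFeatures(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        rows = []
        for v in X:
            v = str(v)
            probs = [v.count(c) / len(v) for c in set(v)] if v else []
            entropy = -sum(p * math.log2(p) for p in probs if p > 0)
            rows.append([len(v),
                         v.count("\\"),               # registry path depth
                         sum(c.isdigit() for c in v),
                         entropy,
                         int(any(k in v.lower() for k in
                                 ("run", "startup", "powershell", "cmd", "exe", "dll")))])
        return np.array(rows, dtype=float)

regkey_model = Pipeline([
    ("features", FeatureUnion([
        ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=(3, 6),
                                  min_df=2, max_df=0.9, max_features=6000)),
        ("numeric", Pipeline([("extract", RegKeyNumericFeatures()),
                              ("scale", MinMaxScaler())])),  # keep NB inputs in [0, 1]
    ])),
    ("nb", MultinomialNB(alpha=0.25)),
])
```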

2.4.4 URL Model Training

Function: train_url_lr(csv_path='url.csv')

Implementation Steps:

  1. Data Loading: Identical to other models

  2. Feature Extraction:

    • TF-IDF Only (no custom numeric features)
      • N-gram range: (3, 6)
      • Character-level analysis captures URL structure (protocols, paths, parameters)
  3. Model Training:

    • Algorithm: Logistic Regression (different from others)
      • Max iterations: 2000
      • Class weight: 'balanced' (handles imbalanced dataset)
      • Regularization: C=3.0 (moderate L2 penalty)
      • Solver: 'liblinear' (efficient for small datasets)
  4. Evaluation:

    • Accuracy: 82.14%
    • Confusion Matrix: [[15, 2], [3, 8]]
    • Precision (Malicious): 80%
    • Recall (Malicious): 72.73%
    • Balanced metrics appropriate for phishing/payload delivery detection
  5. Model Persistence:

    joblib.dump(lr_pipeline, "url.joblib")
    • Saved as full Pipeline: Pipeline([('tfidf', TfidfVectorizer), ('lr', LogisticRegression)])

Design Rationale: Logistic Regression chosen for balanced precision/recall. URLs are primary attack vectors (phishing, drive-by downloads) requiring equal sensitivity to both false positives and false negatives.
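
The URL pipeline described above is the simplest of the four: char TF-IDF straight into a class-balanced logistic regression, persisted as one object. A hedged sketch (the variable name `url_pipeline` is illustrative):

```python
# Hedged sketch of the URL training pipeline; hyperparameters as documented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

url_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=(3, 6),
                              min_df=2, max_df=0.9, max_features=6000)),
    ("lr", LogisticRegression(C=3.0, class_weight="balanced",
                              solver="liblinear", max_iter=2000)),
])
# After fitting, the whole pipeline is saved in one step:
#   url_pipeline.fit(X_train, y_train)
#   joblib.dump(url_pipeline, "url.joblib")
```

Saving the whole Pipeline means inference only needs `predict_proba()` on raw strings, with no separate vectorizer handling.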


2.4.5 Common Components

IOCFeatureExtractor Class:

    import math
    import numpy as np
    from scipy.sparse import csr_matrix
    from sklearn.base import BaseEstimator, TransformerMixin

    class IOCFeatureExtractor(BaseEstimator, TransformerMixin):
        def fit(self, X, y=None):
            return self

        def transform(self, X):
            feats = []
            for v in X:
                v = str(v)
                feats.append([
                    len(v),                          # String length
                    sum(c.isdigit() for c in v),     # Digit count
                    sum(not c.isalnum() for c in v), # Special char count
                    self._entropy(v),                # Shannon entropy
                    int(any(k in v.lower() for k in [
                        "exe", "cmd", "powershell", "run", "c2",
                        "dll", "temp", "appdata"
                    ]))  # Suspicious keyword flag
                ])
            return csr_matrix(np.array(feats), dtype=float)

        def _entropy(self, s):
            if len(s) == 0:
                return 0.0
            probs = [s.count(c) / len(s) for c in set(s)]
            return -sum(p * math.log2(p) for p in probs if p > 0)

Key Design Decisions:

  • Sparse matrix output (csr_matrix) for memory efficiency when stacking with TF-IDF
  • Suspicious keywords tuned per IOC type operational context
  • Entropy captures randomness often found in obfuscated/encoded IOCs

2.4.6 Training Workflow Summary

For each IOC type:

  1. Load CSV (sha256, value, label)
  2. Split 80/20 (stratified)
  3. Extract features (TF-IDF + Numeric)
  4. Train classifier (NB or LR)
  5. Evaluate on test set
  6. Print confusion matrix, classification report
  7. Serialize to .joblib

Output Files:

  • domain.joblib - (TfidfVectorizer, IOCFeatureExtractor, MultinomialNB) tuple
  • ip.joblib - (TfidfVectorizer, IOCFeatureExtractor, MultinomialNB) tuple
  • url.joblib - Pipeline([tfidf, LogisticRegression])
  • regkey.joblib - Pipeline([FeatureUnion, MultinomialNB])

Integration with model_tester.py: The serialized models are loaded by model_tester.py which handles:

  • Lazy loading and caching
  • Inference on new IOC entries
  • Probability extraction via predict_proba()
  • Verdict mapping (0-100% score)

3. Prediction Logic and Decision Rules

SOCBot implements a hierarchical evaluation strategy that adapts based on user intent regarding sample relationships.

3.1 Single IOC Type Selected

Workflow:

  1. Sample Relationship Question:

    • Bot: "Do all entries for IOC {type} belong to the same sample? (Y/N)"
  2. User Answers YES:

    • Classify each entry individually
    • Compute cumulative maliciousness percentage = avg(all_entry_scores)
    • Apply threshold mapping to cumulative score
    • Output:
      • Per-entry predictions with individual verdicts
      • Cumulative percentage and verdict for the IOC type
  3. User Answers NO:

    • Classify each entry individually
    • Output: Per-entry predictions only
    • No cumulative scoring

3.2 Multiple IOC Types Selected

Multi-stage decision tree ensuring valid cross-IOC aggregation.

Step 1: Per-IOC Sample Relationship

For each selected IOC type independently:

  • Bot: "Do all entries for IOC {type} belong to the same sample? (Y/N)"

Per-IOC Outcomes:

  • YES: Individual classification + cumulative % computed → stored for potential global averaging
  • NO: Individual classification only → cumulative scoring disabled for this IOC → global averaging becomes impossible

Critical Rule: If ANY IOC receives "NO", the system must not ask the global question.


Step 2: Global Sample Relationship (Conditional)

Trigger Condition: All per-IOC answers were "YES"

  • Bot: "Do all IOC types belong to one sample? (Y/N)"

Global Outcomes:

  • YES:

    • Compute Total Sample Average = avg(all_ioc_cumulative_scores)
    • Map to final verdict using classification thresholds
    • Output: Per-entry + per-IOC cumulative + total sample verdict
  • NO:

    • Output: Per-entry + per-IOC cumulative only
    • No total sample average
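
The gating rule in Steps 1-2 reduces to a single predicate; a hedged sketch (the function name is illustrative):

```python
from typing import Dict, List

# Hedged sketch of the critical rule: the global sample question is only
# asked when multiple IOC types are selected AND every per-IOC answer was YES.
def should_ask_global(selected: List[str], per_ioc_same: Dict[str, bool]) -> bool:
    return len(selected) > 1 and all(per_ioc_same.get(t, False) for t in selected)
```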

3.3 Decision Logic Summary Table

| Scenario | Per-IOC Answers | Global Question Asked? | Output Includes |
|---|---|---|---|
| Single IOC + Same Sample = YES | N/A (only 1 IOC) | No | Per-entry + cumulative |
| Single IOC + Same Sample = NO | N/A | No | Per-entry only |
| Multiple IOCs + Any "NO" | At least 1 NO | No | Per-entry + per-IOC cumulative (where YES), no total |
| Multiple IOCs + All "YES" | All YES | Yes | Depends on global answer ↓ |
| ↳ Global = YES | All YES | Yes (answered YES) | Per-entry + per-IOC cumulative + total sample |
| ↳ Global = NO | All YES | Yes (answered NO) | Per-entry + per-IOC cumulative, no total sample |

4. Classification Thresholds

All percentage-based scores (per-entry, per-IOC cumulative, total sample) use the following uniform mapping:

| Percentage Range | Verdict |
|---|---|
| 0% – 15% | Benign |
| 15% – 30% | Possibly Benign |
| 30% – 65% | Possibly Malicious |
| 65% – 100% | Malicious |

Application:

  • Individual IOC entry scores
  • IOC-level cumulative averages
  • Total sample-level final verdict
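
This mapping mirrors the THRESHOLDS tuples shown in section 2.1; a sketch assuming lower-inclusive, upper-exclusive bands (the boundary convention follows the (0, 15), (15, 30), ... tuples):

```python
# Threshold mapping from section 2.1's THRESHOLDS; lower bound inclusive,
# upper bound exclusive, so the (65, 101) band covers scores up to 100.
THRESHOLDS = [(0, 15, "Benign"), (15, 30, "Possibly Benign"),
              (30, 65, "Possibly Malicious"), (65, 101, "Malicious")]

def verdict_from_percent(p: float) -> str:
    for lo, hi, verdict in THRESHOLDS:
        if lo <= p < hi:
            return verdict
    return "Malicious"  # fallback for out-of-range inputs
```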

5. Test Cases

Section 1: Model Availability Tests

Objective: Verify all four IOC models load successfully

Results:

  • ✅ Domain model (1) - Working
  • ✅ URL model (2) - Working
  • ✅ IP model (3) - Working
  • ✅ RegKey model (4) - Working

Section 2: Single IOC Type Tests

Test Case 2.1: URL + Same Sample = YES

Steps:

  1. Select IOC type: 2 (URL)
  2. Submit URL entries, type DONE
  3. Q: "Do all entries for IOC URL belong to the same sample?" → Answer: YES

Expected Output:

  • Per-entry predictions with individual scores and verdicts
  • Cumulative percentage for URL type
  • Cumulative verdict based on threshold mapping

Status: ✅ Passed


Test Case 2.2: URL + Same Sample = NO

Steps:

  1. Select IOC type: 2 (URL)
  2. Submit URL entries, type DONE
  3. Q: "Do all entries for IOC URL belong to the same sample?" → Answer: NO

Expected Output:

  • Per-entry predictions with individual scores and verdicts
  • No cumulative scoring

Status: ✅ Passed


Section 3: Multiple IOC Type Tests

Test Case 3.1: URL + RegKey, All YES, Global YES

Steps:

  1. Select IOC types: 2,4 (URL, RegKey)
  2. Submit URL entries → DONE
  3. Submit RegKey entries → DONE
  4. Q: "Do all entries for IOC URL belong to the same sample?" → YES
  5. Q: "Do all entries for IOC RegKey belong to the same sample?" → YES
  6. Q: "Do all IOC types belong to one sample?" → YES

Expected Output:

  • Per-entry predictions for URL and RegKey
  • Per-IOC cumulative % for URL and RegKey
  • Total Sample Average = (URL_cumulative + RegKey_cumulative) / 2
  • Total Sample Verdict based on average

Status: ✅ Passed


Test Case 3.2: URL + RegKey, All YES, Global NO

Steps:

  1. Select: 2,4
  2. Submit entries for both
  3. Q: URL same sample? → YES
  4. Q: RegKey same sample? → YES
  5. Q: Global sample? → NO

Expected Output:

  • Per-entry predictions for URL and RegKey
  • Per-IOC cumulative % for URL and RegKey
  • No Total Sample Average

Status: ✅ Passed


Test Case 3.3: URL + RegKey, Mixed (URL=YES, RegKey=NO)

Steps:

  1. Select: 2,4
  2. Submit entries
  3. Q: URL same sample? → YES
  4. Q: RegKey same sample? → NO

Expected Output:

  • URL: Per-entry + cumulative %
  • RegKey: Per-entry only (no cumulative)
  • Global question not asked
  • No Total Sample Average

Status: ✅ Passed


Test Case 3.4: URL + RegKey, Both NO

Steps:

  1. Select: 2,4
  2. Submit entries
  3. Q: URL same sample? → NO
  4. Q: RegKey same sample? → NO

Expected Output:

  • URL: Per-entry only
  • RegKey: Per-entry only
  • Global question not asked
  • No cumulative scoring at any level

Status: ✅ Passed


6. Web Application Architecture

6.1 System Overview

┌─────────────────┐
│   Frontend      │  (HTML/CSS/JS)
│   Static Files  │
└────────┬────────┘
         │ HTTP
         ↓
┌─────────────────┐
│  Flask Backend  │  (app.py)
│  Session Store  │
└────────┬────────┘
         │ Function Call
         ↓
┌─────────────────┐
│  Model Tester   │  (model_tester.py)
│  ML Inference   │
└────────┬────────┘
         │ Joblib Load
         ↓
┌─────────────────┐
│  Trained Models │  (models/*.joblib)
│  Pipelines      │
└─────────────────┘

6.2 Request Flow

  1. User types message in frontend input field
  2. JavaScript sends POST to /api/send_message with {message, session_id}
  3. Flask state_machine() processes input, updates session state
  4. If finalization triggered:
    • Calls classify_entries() from model_tester.py
    • Loads models, runs inference
    • Computes aggregated scores
  5. Flask returns JSON response with bot message and results
  6. Frontend renders bot message in chat UI

6.3 Session Lifecycle

User connects → UUID generated → Session state initialized
      ↓
Choose IOC types → Validate selection → Store in state
      ↓
Collect entries → Loop per IOC type → Store in state
      ↓
Per-IOC questions → Store Y/N answers → Conditional progression
      ↓
Global question (if eligible) → Store Y/N → Finalize
      ↓
Classification → Display results → Session persists (in-memory)

Appendix: File Structure

SOCBot/
├── .venv/                      # Python virtual environment (library root)
├── Frontend/                   # Static web assets
│   ├── index.html              # Chat UI structure
│   ├── script.js               # Client-side logic
│   └── style.css               # WhatsApp-style theme
├── models/                     # Trained model artifacts
│   ├── domain.csv              # Training data - Domain IOCs
│   ├── domain.joblib           # Trained Domain classifier
│   ├── ip.csv                  # Training data - IP IOCs
│   ├── ip.joblib               # Trained IP classifier
│   ├── ModelTrainer.ipynb      # Jupyter notebook for model training
│   ├── modeltrainer.py         # Python script version of training code
│   ├── regkey.csv              # Training data - Registry Key IOCs
│   ├── regkey.joblib           # Trained RegKey classifier
│   ├── url.csv                 # Training data - URL IOCs
│   └── url.joblib              # Trained URL classifier
├── app.py                      # Flask backend + FSM logic
├── history.py                  # Session logging utilities
├── model_tester.py             # ML inference engine
├── requirements.txt            # Python dependencies
└── history.jsonl               # Session records (auto-generated)

Document Version: 1.0
Last Updated: December 21, 2025
Prepared By: SOCBot Development Team