AI-Powered IOC Classification System for Security Operations
Location: Frontend/ directory (index.html, script.js, style.css)
Technologies:
- Vanilla JavaScript with Fetch API for HTTP requests
- HTML5 for semantic markup
- CSS3 for styling and responsive design
- No build tools required - runs directly in browser without Node.js/Vite at runtime
Design & Features:
- WhatsApp-style chat interface with left-aligned user messages and right-aligned bot messages
- Top banner: Displays "Welcome to SOCbot. Your AI agent for identification of Malicious Changes to Secure your Infra."
- Scrollable message container for conversation history
- Right-side static instruction panel showing IOC selection menu (1=Domain, 2=URL, 3=IP, 4=RegKey)
- Bottom input bar with text field and send button
- Session persistence via `session_id`, maintained across the conversation lifecycle
Client-Side Validation Rules:
- IOC Selection: Accepts comma/semicolon-separated numeric values (1,2,3,4)
- Domain validation: Must match `example.com` format (no scheme or path)
- URL validation: Must parse as `http(s)://...` with a valid scheme
- IP validation: IPv4 dotted-quad format with octets 0-255
- Registry key validation: Must start with `HKEY_LOCAL_MACHINE\`, `HKEY_CURRENT_USER\`, `HKEY_CLASSES_ROOT\`, `HKEY_USERS\`, or `HKEY_CURRENT_CONFIG\`
- Y/N phases: Accepts only Y/Yes or N/No (case-insensitive)
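These rules can be mirrored in a few lines of Python for reference (a hypothetical sketch; the shipped checks live in `script.js`, and the exact patterns there may differ):

```python
import re

# Illustrative mirror of the client-side rules; the real checks are in script.js.
DOMAIN_RE = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(\.[A-Za-z0-9-]{1,63})+$")
URL_RE = re.compile(r"^https?://\S+$")
IP_RE = re.compile(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$")
REG_ROOTS = ("HKEY_LOCAL_MACHINE\\", "HKEY_CURRENT_USER\\", "HKEY_CLASSES_ROOT\\",
             "HKEY_USERS\\", "HKEY_CURRENT_CONFIG\\")

def is_domain(s: str) -> bool:
    # Bare host name only: no scheme, no path
    return "://" not in s and "/" not in s and bool(DOMAIN_RE.match(s))

def is_url(s: str) -> bool:
    return bool(URL_RE.match(s))

def is_ip(s: str) -> bool:
    m = IP_RE.match(s)
    return bool(m) and all(0 <= int(octet) <= 255 for octet in m.groups())

def is_regkey(s: str) -> bool:
    return s.upper().startswith(REG_ROOTS)
```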
Location: app.py
Technologies:
- Flask 2.x - Python web framework
- Flask-CORS - Cross-Origin Resource Sharing support
- Python 3.x - Backend runtime
Purpose & Responsibilities:
- Static file server: Serves frontend assets from the `Frontend/` directory
- Session management: Maintains in-memory conversation state per `session_id`
- Conversation orchestration: Implements a finite state machine (FSM) for multi-phase dialogue
- Model delegation: Calls `model_tester.py` for ML predictions
- Result aggregation: Computes per-entry, per-IOC cumulative, and total-sample verdicts
API Endpoints:
- `GET /` - Serves `index.html`
- `GET /<path>` - Serves static assets (CSS, JS, images)
- `POST /api/send_message` - Main chat endpoint accepting user messages and returning bot responses
Backend-Frontend Connection:
Request:

    { "message": "1,2", "session_id": "uuid-string" }

Response:

    { "bot_message": "Select IOC Types...", "session_id": "uuid-string", "state_phase": "choose_types", "ready": false, "final_report": null }
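A minimal client sketch exercising this endpoint (assumes the Flask app is listening on `http://127.0.0.1:5000`; adjust the base URL for your deployment):

```python
import json
import urllib.request

def build_payload(message, session_id=None):
    # Shape matches the documented request body
    return {"message": message, "session_id": session_id}

def send_message(message, session_id=None, base="http://127.0.0.1:5000"):
    req = urllib.request.Request(
        base + "/api/send_message",
        data=json.dumps(build_payload(message, session_id)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # e.g. {"bot_message": ..., "session_id": ...}
```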
Model Storage: models/ or Models/ directory containing .joblib serialized pipelines
Supported IOC Types:
- `domain.joblib` - Domain name classifier (Naive Bayes)
- `url.joblib` - URL classifier (Logistic Regression)
- `ip.joblib` - IP address classifier (Naive Bayes)
- `regkey.joblib` - Windows Registry key classifier (Naive Bayes)
Original Approach (Rejected - 79% Accuracy):
- Single unified dataset with columns: `sha256`, `ioc_type`, `ioc_value`, `label`
- One-hot encoding for IOC type combined with TF-IDF features
- Single multi-class model attempting to classify all IOC types simultaneously
- Critical Failures:
- 25% false negative rate (malicious samples flagged as benign)
- 15% false positive rate (benign samples flagged as malicious)
- No contextual understanding of IOC role in attack lifecycle
Why the Original Approach Failed: Treating all IOCs uniformly ignored the fundamental differences in their operational context within the Cyber Kill Chain (CKC). A SOC analyst consultation revealed that each IOC type serves a distinct purpose and requires tailored sensitivity thresholds.
Design Principle: Each IOC type is trained independently based on its role in the Cyber Kill Chain, with algorithm and threshold tuning matching operational requirements.
SOC Head-Defined Classification Strategy:
| IOC Type | CKC Phase | Operational Role | Detection Strategy | Algorithm Choice |
|---|---|---|---|---|
| Domain | Reconnaissance / C2 Communication | Linked to IPs; determines where system connects | Conservative - Avoid false positives that block legitimate workflows | Naive Bayes (balanced priors) |
| IP | C2 Communication | Direct network connections; tied to domains | Conservative - Same as domains; must not disrupt critical services | Naive Bayes (balanced priors) |
| URL | Delivery / Exploitation | Phishing vectors, payload delivery mechanisms | Balanced - Equal weight to false positives/negatives; primary attack vector | Logistic Regression (class-balanced) |
| RegKey | Installation / Actions on Objectives | Offline persistence changes; high-impact modifications | Aggressive - Can trigger system shutdown/forensic isolation; zero-tolerance for FNs | Naive Bayes (strict threshold) |
Rationale from SOC Operations:
- Domains/IPs: Conservative filtering prevents blocking legitimate infrastructure that may share IP space or use CDNs. False positives here disrupt business operations.
- URLs: Balanced approach because URLs are the primary delivery mechanism for phishing and drive-by downloads. Requires equal sensitivity to both error types.
- Registry Keys: Aggressive detection because unauthorized registry modifications indicate persistence mechanisms or system compromise. These are forensic indicators requiring immediate response, even at the cost of false positives.
Per-IOC Training Sets:
- Files: `domain.csv`, `ip.csv`, `url.csv`, `regkey.csv`
- Columns: `sha256` (sample identifier), `value` (IOC string), `label` (0=Benign, 1=Malicious)
- Benefits:
- Independent feature spaces optimized per IOC type
- Algorithm selection tailored to detection requirements
- Eliminates cross-contamination from unrelated IOC types
Domain Model (Naive Bayes - Conservative):
- TF-IDF: Character-level n-grams (3-5), max 6000 features
- Custom Features: Length, digits, special chars, entropy, suspicious keywords (`exe`, `cmd`, `powershell`, `run`, `c2`, `dll`, `temp`, `appdata`)
- Classifier: MultinomialNB (alpha=0.3)
- Output Format: `(tfidf, feature_extractor, classifier)` tuple
IP Model (Naive Bayes - Conservative):
- TF-IDF: Character-level n-grams (1-3), max 6000 features
- Custom Features: Same 5-feature set as Domain
- Classifier: MultinomialNB (alpha=0.3)
- Output Format: `(tfidf, feature_extractor, classifier)` tuple
URL Model (Logistic Regression - Balanced):
- TF-IDF Only: Character-level n-grams (3-6), max 6000 features
- Classifier: LogisticRegression (C=3.0, class_weight='balanced', solver='liblinear', max_iter=2000)
- Output Format: Scikit-learn `Pipeline` object
- Note: No custom features; relies purely on character patterns due to URL structural complexity
RegKey Model (Naive Bayes - Aggressive):
- TF-IDF: Character-level n-grams (3-6), max 6000 features
- Custom Features (RegKey-Specific): Length, backslash count (path depth), digits, entropy, suspicious keywords (`run`, `startup`, `powershell`, `cmd`, `exe`, `dll`)
- Classifier: MultinomialNB (alpha=0.25) + MinMaxScaler for numeric features
- Output Format: Scikit-learn `Pipeline` with `FeatureUnion`
Shannon Entropy Calculation: `H(X) = -Σ p(x) * log₂(p(x))`. Measures randomness; higher entropy indicates obfuscation or encoding (common in malicious IOCs).
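For reference, the formula maps directly to a few lines of Python (equivalent in spirit to the `_entropy` helper used by the feature extractors shown later in this document):

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """H(X) = -sum p(x) * log2(p(x)) over the character distribution of s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())
```

`shannon_entropy("aaaa")` is 0.0 and `shannon_entropy("abab")` is exactly 1.0; random-looking strings (e.g., DGA-style domains) score noticeably higher than natural-language ones.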
Suspicious Keyword Detection:
- Domain/IP: Flags command execution indicators (`exe`, `cmd`, `powershell`, `dll`, `temp`, `appdata`)
- RegKey: Flags persistence locations (`run`, `startup`) and execution paths (`powershell`, `cmd`, `exe`, `dll`)
RegKey Path Depth:
- Counts backslashes (`\`) to measure registry hierarchy depth
- Deeper paths often correlate with hidden persistence mechanisms
| Model | Accuracy | Precision (Mal) | Recall (Mal) | F1-Score (Mal) | Training Strategy |
|---|---|---|---|---|---|
| Domain | 88.10% | 100.00% | 37.50% | 54.55% | High precision, low recall (conservative) |
| IP | 85.42% | 100.00% | 30.00% | 46.15% | High precision, low recall (conservative) |
| URL | 82.14% | 80.00% | 72.73% | 76.19% | Balanced precision/recall |
| RegKey | 83.17% | 81.03% | 88.68% | 84.68% | High recall, acceptable precision (aggressive) |
Interpretation:
- Domain/IP: Perfect precision (zero false positives) at the cost of recall - aligns with conservative operational requirement
- URL: Balanced metrics - appropriate for primary attack vector
- RegKey: High recall prioritizes catching all malicious changes - aligns with aggressive forensic requirement
Common Hyperparameters:
- Test split: 20% (stratified)
- Random state: 42 (reproducibility)
- TF-IDF: `min_df=2`, `max_df=0.9` (filter rare/common terms)
Algorithm-Specific Settings:
- Naive Bayes: Laplace smoothing (alpha=0.25-0.3) to handle unseen n-grams
- Logistic Regression: L2 regularization (C=3.0), balanced class weights to handle dataset imbalance
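Gathered in one place, the settings above look like this (a reference sketch; the dict names are illustrative and do not appear in ModelTrainer.ipynb):

```python
# Shared TF-IDF settings, as documented above, ready to pass to
# sklearn's TfidfVectorizer. Names here are illustrative only.
TFIDF_COMMON = dict(analyzer="char", min_df=2, max_df=0.9, max_features=6000)

# Per-model n-gram ranges
NGRAM_RANGES = {"domain": (3, 5), "ip": (1, 3), "url": (3, 6), "regkey": (3, 6)}

# Per-model Naive Bayes smoothing and Logistic Regression settings
NB_ALPHAS = {"domain": 0.3, "ip": 0.3, "regkey": 0.25}
LR_PARAMS = dict(C=3.0, class_weight="balanced", solver="liblinear", max_iter=2000)

# 80/20 stratified split with a fixed seed (stratify=y supplied at call time)
SPLIT_PARAMS = dict(test_size=0.2, random_state=42)
```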
Purpose: Central Flask application managing conversation flow and serving frontend.
Key Components:
Global Configuration:

    IOC_MAP = {"1": "domain", "2": "url", "3": "ip", "4": "regkey"}
    THRESHOLDS = [(0, 15, "Benign"), (15, 30, "Possibly Benign"),
                  (30, 65, "Possibly Malicious"), (65, 101, "Malicious")]
    sessions: Dict[str, Dict] = {}  # In-memory session store
Session State Structure:

    {
        "phase": str,              # Current FSM phase
        "selected": List[str],     # Selected IOC types
        "current_ioc_index": int,  # Index for iterating IOC collection
        "entries": Dict,           # Collected IOC entries per type
        "per_ioc_same": Dict,      # Per-IOC sample relationship (Y/N)
        "global_same": bool,       # Global sample relationship
        "final_report": Dict       # Classification results
    }
Finite State Machine Phases:
- choose_types - User selects IOC types (1,2,3,4)
- collect_entries - User submits individual IOC strings, types DONE to proceed
- per_ioc_sample - Bot asks sample relationship for each IOC type
- global_sample - Bot asks if all IOCs belong to one sample (only if all per-IOC = YES)
- final - Results computed and displayed
Key Functions:
- `available_model_types()` - Scans the `models/` directory for available `.joblib` files
- `verdict_from_percent(p)` - Maps a numeric score to a threshold-based verdict
- `ensure_session(sid)` - Creates or retrieves session state
- `next_prompt(state)` - Generates a context-appropriate bot message
- `state_machine(state, user_text)` - Processes user input and advances the FSM
- `finalize(state)` - Delegates to `classify_entries()`, computes cumulative scores, generates the final report
Scoring Logic in `finalize()`:

    per_entry = classify_entries(state["entries"])
    per_ioc_cum[ioc] = avg(scores) if per_ioc_same[ioc] else None
    total_sample = avg(per_ioc_cum.values()) if conditions_met else None
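A self-contained sketch of this aggregation (the helper name `aggregate` and its exact signature are illustrative; `THRESHOLDS` and `verdict_from_percent` mirror the definitions above):

```python
from typing import Dict, List, Optional, Tuple

THRESHOLDS = [(0, 15, "Benign"), (15, 30, "Possibly Benign"),
              (30, 65, "Possibly Malicious"), (65, 101, "Malicious")]

def verdict_from_percent(p: float) -> str:
    for lo, hi, verdict in THRESHOLDS:
        if lo <= p < hi:
            return verdict
    return "Unavailable"

def aggregate(per_entry: Dict[str, List[dict]],
              per_ioc_same: Dict[str, bool],
              global_same: bool) -> Tuple[Dict[str, Optional[float]], Optional[float]]:
    """Sketch of the cumulative scoring in finalize(); assumes per_entry maps
    ioc_type -> [{"score": float, ...}, ...] as returned by classify_entries()."""
    per_ioc_cum: Dict[str, Optional[float]] = {}
    for ioc, entries in per_entry.items():
        scores = [e["score"] for e in entries if e.get("score") is not None]
        # Cumulative average only when the user said entries share one sample
        per_ioc_cum[ioc] = sum(scores) / len(scores) if per_ioc_same.get(ioc) and scores else None
    cums = list(per_ioc_cum.values())
    # Total only when all IOCs were cumulative and the global answer was YES
    total = sum(cums) / len(cums) if global_same and cums and all(c is not None for c in cums) else None
    return per_ioc_cum, total
```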
Purpose: Persistent storage for session records using JSON Lines format.
Storage Format:
- File: `history.jsonl` in the project root
- Format: Newline-delimited JSON (one JSON object per line)
- Max Records: 100 (auto-pruning via `prune_history()`)
Key Functions:
append_session(record: Dict)
- Appends a new session record to `history.jsonl`
- Triggers automatic pruning if the record count exceeds `MAX_RECORDS`

load_history() -> List[Dict]
- Reads all records from the file
- Returns a list of dictionaries
- Skips malformed JSON lines gracefully

prune_history()
- Keeps only the most recent 100 records
- Overwrites the file with the trimmed dataset
Use Case:
Enables future analytics, session replay, and model retraining from production data. Currently imported but not actively called in app.py (ready for integration).
Purpose: Model loader and inference engine for IOC classification.
Architecture:
Model Caching:

    models_cache: Dict[str, Optional[object]] = {}

Lazy-loads models on first use and caches them in memory to avoid repeated disk I/O.
Custom Transformers:
IOCFeatureExtractor
- Computes 5 numeric features: length, digits, special chars, entropy, suspicious keywords
- Shannon entropy calculation: `-Σ p(x) * log₂(p(x))`
- Registered in `sys.modules['__main__']` for joblib unpickling compatibility
RegKeyNumericFeatures
- Specialized for Windows registry paths
- Detects registry-specific suspicious patterns (e.g., `\software\`, `\currentversion\`)
Key Functions:
get_model(ioc_type: str)
- Loads the model from `models/{ioc_type}.joblib`
- Returns `None` if the model file is missing or loading fails
- Memoizes the result in `models_cache`
classify_entries(entries: Dict[str, List[str]])
- Input: `{"url": ["http://evil.com", ...], "ip": ["192.0.2.1", ...]}`
- Process:
  - For each IOC type, load the corresponding model via `get_model()`
  - Transform IOC strings using the pipeline (TF-IDF + features)
  - Extract the probability from `predict_proba()` (malicious class index)
  - Scale to the 0-100% range
- Output:

      { "url": [ {"value": "http://evil.com", "score": 87.3, "verdict": "Malicious"}, ... ] }
Model Compatibility: Supports both:
- Scikit-learn `Pipeline` objects (with `predict_proba()`) - used by URL and RegKey models
- Tuple format: `(tfidf, feature_extractor, classifier)` - used by Domain and IP models
Inference Logic:

    if isinstance(model, (list, tuple)) and len(model) >= 3:
        # Tuple format (Domain/IP)
        tfidf, fe, clf = model[0], model[1], model[2]
        X = hstack([tfidf.transform([val]), fe.transform([val])])
        proba = clf.predict_proba(X)[0]
        score = float(proba[1] * 100)  # Malicious class probability
    elif hasattr(model, "predict_proba"):
        # Pipeline format (URL/RegKey)
        proba = model.predict_proba([val])[0]
        score = float(proba[1] * 100)
Verdict Mapping:

    def verdict_from_score(score: Optional[float]) -> str:
        if score is None:
            return "Unavailable"
        return "Malicious" if score >= 50 else "Benign"
Integration with app.py:
Called during the `finalize()` phase:

    per_entry = classify_entries(state["entries"])
Error Handling:
- Missing models return `score=None`, `verdict="Unavailable"`
- Gracefully handles prediction failures with try-except blocks
- Continues processing remaining IOCs if one model fails
Purpose: Training script for all four IOC classification models using SOC-aligned design principles.
Location: Implemented as Jupyter Notebook (ModelTrainer.ipynb) with four independent training cells.
Function: train_domain_nb(csv_path='domain.csv')
Implementation Steps:

1. Data Loading:

       df = pd.read_csv(csv_path)[['value', 'label']].dropna()
       X = df['value'].astype(str)
       y = df['label']

2. Train-Test Split:
   - 80/20 split with stratification to preserve class balance
   - Random state: 42

3. Feature Extraction:
   - TF-IDF Vectorizer:
     - Analyzer: `char` (character-level)
     - N-gram range: (3, 5)
     - Min document frequency: 2 (filters rare patterns)
     - Max document frequency: 0.9 (filters common patterns)
     - Max features: 6000
   - IOCFeatureExtractor: Computes 5 numeric features
     - Length, digit count, special character count, Shannon entropy, suspicious keyword flag

4. Model Training:
   - Algorithm: MultinomialNB (alpha=0.3 for Laplace smoothing)
   - Feature matrix: `hstack([tfidf_features, numeric_features])`

5. Evaluation:
   - Accuracy: 88.10%
   - Confusion Matrix: `[[34, 0], [5, 3]]` (34 TN, 0 FP, 5 FN, 3 TP)
   - Precision (Malicious): 100% (zero false positives)
   - Recall (Malicious): 37.5% (conservative detection)

6. Model Persistence:

       joblib.dump((tfidf, fe, nb), "domain.joblib")

   - Saved as tuple: `(TfidfVectorizer, IOCFeatureExtractor, MultinomialNB)`
Design Rationale: Conservative approach prioritizes precision over recall to avoid blocking legitimate domains used in business workflows.
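The steps above can be condensed into a runnable sketch (assuming scikit-learn, pandas, and scipy are installed; the inline `IOCFeatureExtractor` here is a minimal stand-in for the full transformer shown later in this document):

```python
import math
import joblib
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

KEYWORDS = ("exe", "cmd", "powershell", "run", "c2", "dll", "temp", "appdata")

class IOCFeatureExtractor:
    """Minimal stand-in: 5 numeric features per IOC string."""
    def fit(self, X, y=None):
        return self
    def fit_transform(self, X, y=None):
        return self.transform(X)
    def transform(self, X):
        rows = []
        for v in map(str, X):
            probs = [v.count(c) / len(v) for c in set(v)] if v else []
            rows.append([
                len(v),                                          # length
                sum(c.isdigit() for c in v),                     # digits
                sum(not c.isalnum() for c in v),                 # special chars
                -sum(p * math.log2(p) for p in probs if p > 0),  # entropy
                int(any(k in v.lower() for k in KEYWORDS)),      # keyword flag
            ])
        return csr_matrix(np.array(rows), dtype=float)

def train_domain_nb(csv_path="domain.csv", out_path="domain.joblib"):
    df = pd.read_csv(csv_path)[["value", "label"]].dropna()
    X, y = df["value"].astype(str), df["label"]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    tfidf = TfidfVectorizer(analyzer="char", ngram_range=(3, 5),
                            min_df=2, max_df=0.9, max_features=6000)
    fe = IOCFeatureExtractor()
    Xtr = hstack([tfidf.fit_transform(X_tr), fe.fit_transform(X_tr)])
    Xte = hstack([tfidf.transform(X_te), fe.transform(X_te)])

    nb = MultinomialNB(alpha=0.3).fit(Xtr, y_tr)
    print("test accuracy:", nb.score(Xte, y_te))
    joblib.dump((tfidf, fe, nb), out_path)  # tuple format: (tfidf, fe, classifier)
    return tfidf, fe, nb
```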
Function: train_ip_nb_model(csv_path='ip.csv')
Implementation Steps:

1. Data Loading: Same as the Domain model

2. Feature Extraction:
   - TF-IDF Vectorizer:
     - N-gram range: (1, 3) (shorter than domains due to IP structure)
     - Other parameters identical to Domain
   - IOCFeatureExtractor: Same 5-feature set

3. Model Training:
   - Algorithm: MultinomialNB (alpha=0.3)
   - Feature stacking: `hstack([tfidf, numeric])`

4. Evaluation:
   - Accuracy: 85.42%
   - Confusion Matrix: `[[38, 0], [7, 3]]`
   - Precision (Malicious): 100%
   - Recall (Malicious): 30% (highly conservative)

5. Model Persistence:

       joblib.dump((tfidf, fe, nb), "ip.joblib")
Design Rationale: Conservative like Domain - IPs are tied to infrastructure connections; false positives disrupt critical services.
Function: train_regkey_nb(csv_path='regkey.csv')
Implementation Steps:

1. Data Loading: Same pattern as the other models

2. Custom Feature Extractor (excerpt; `fit()` and `_entropy()` follow the same pattern as `IOCFeatureExtractor`):

       class RegKeyNumericFeatures(BaseEstimator, TransformerMixin):
           def transform(self, X):
               return [
                   [
                       len(v),                       # Total path length
                       v.count('\\'),                # Path depth (backslashes)
                       sum(c.isdigit() for c in v),  # Numeric characters
                       self._entropy(v),             # Shannon entropy
                       int(any(k in v.lower() for k in [
                           'run', 'startup', 'powershell', 'cmd', 'exe', 'dll'
                       ]))                           # Persistence/execution flags
                   ]
                   for v in X
               ]

3. Feature Pipeline:
   - FeatureUnion combining:
     - TF-IDF (char n-grams 3-6)
     - Numeric features (with MinMaxScaler)
   - N-gram range: (3, 6) to capture registry path patterns

4. Model Training:
   - Algorithm: MultinomialNB (alpha=0.25) - lower smoothing for aggressive detection
   - Full Pipeline: `Pipeline([('features', FeatureUnion), ('nb', MultinomialNB)])`

5. Evaluation:
   - Accuracy: 83.17%
   - Confusion Matrix: `[[37, 11], [6, 47]]`
   - Precision (Malicious): 81.03%
   - Recall (Malicious): 88.68% (aggressive detection)

6. Model Persistence:

       joblib.dump(model, "regkey.joblib")

   - Saved as a full Pipeline object
Design Rationale: High recall prioritizes catching all malicious registry changes - acceptable false positive rate for offline forensic analysis where system isolation is standard procedure.
Function: train_url_lr(csv_path='url.csv')
Implementation Steps:

1. Data Loading: Identical to the other models

2. Feature Extraction:
   - TF-IDF only (no custom numeric features)
   - N-gram range: (3, 6)
   - Character-level analysis captures URL structure (protocols, paths, parameters)

3. Model Training:
   - Algorithm: Logistic Regression (different from the other models)
   - Max iterations: 2000
   - Class weight: 'balanced' (handles imbalanced datasets)
   - Regularization: C=3.0 (moderate L2 penalty)
   - Solver: 'liblinear' (efficient for small datasets)

4. Evaluation:
   - Accuracy: 82.14%
   - Confusion Matrix: `[[15, 2], [3, 8]]`
   - Precision (Malicious): 80%
   - Recall (Malicious): 72.73%
   - Balanced metrics appropriate for phishing/payload delivery detection

5. Model Persistence:

       joblib.dump(lr_pipeline, "url.joblib")

   - Saved as a full Pipeline: `Pipeline([('tfidf', TfidfVectorizer), ('lr', LogisticRegression)])`
Design Rationale: Logistic Regression chosen for balanced precision/recall. URLs are primary attack vectors (phishing, drive-by downloads) requiring equal sensitivity to both false positives and false negatives.
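The steps above reduce to a compact Pipeline sketch (illustrative; assumes scikit-learn and pandas are installed):

```python
import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

def train_url_lr(csv_path="url.csv", out_path="url.joblib"):
    df = pd.read_csv(csv_path)[["value", "label"]].dropna()
    X_tr, X_te, y_tr, y_te = train_test_split(
        df["value"].astype(str), df["label"],
        test_size=0.2, stratify=df["label"], random_state=42)

    pipe = Pipeline([
        ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=(3, 6),
                                  min_df=2, max_df=0.9, max_features=6000)),
        ("lr", LogisticRegression(C=3.0, class_weight="balanced",
                                  solver="liblinear", max_iter=2000)),
    ])
    pipe.fit(X_tr, y_tr)
    print("test accuracy:", pipe.score(X_te, y_te))
    # Persisted as a full Pipeline; model_tester calls predict_proba() on it directly
    joblib.dump(pipe, out_path)
    return pipe
```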
IOCFeatureExtractor Class:
    class IOCFeatureExtractor(BaseEstimator, TransformerMixin):
        def fit(self, X, y=None):
            return self

        def transform(self, X):
            feats = []
            for v in X:
                v = str(v)
                feats.append([
                    len(v),                           # String length
                    sum(c.isdigit() for c in v),      # Digit count
                    sum(not c.isalnum() for c in v),  # Special char count
                    self._entropy(v),                 # Shannon entropy
                    int(any(k in v.lower() for k in [
                        "exe", "cmd", "powershell", "run", "c2",
                        "dll", "temp", "appdata"
                    ]))                               # Suspicious keyword flag
                ])
            return csr_matrix(np.array(feats), dtype=float)

        def _entropy(self, s):
            if len(s) == 0:
                return 0.0
            probs = [s.count(c) / len(s) for c in set(s)]
            return -sum(p * math.log2(p) for p in probs if p > 0)
Key Design Decisions:
- Sparse matrix output (`csr_matrix`) for memory efficiency when stacking with TF-IDF
- Suspicious keywords tuned to each IOC type's operational context
- Entropy captures randomness often found in obfuscated/encoded IOCs
For each IOC type:
- Load CSV (sha256, value, label)
- Split 80/20 (stratified)
- Extract features (TF-IDF + Numeric)
- Train classifier (NB or LR)
- Evaluate on test set
- Print confusion matrix, classification report
- Serialize to .joblib
Output Files:
- `domain.joblib` → `(TfidfVectorizer, IOCFeatureExtractor, MultinomialNB)`
- `ip.joblib` → `(TfidfVectorizer, IOCFeatureExtractor, MultinomialNB)`
- `url.joblib` → `Pipeline([tfidf, LogisticRegression])`
- `regkey.joblib` → `Pipeline([FeatureUnion, MultinomialNB])`
Integration with model_tester.py:
The serialized models are loaded by `model_tester.py`, which handles:
- Lazy loading and caching
- Inference on new IOC entries
- Probability extraction via `predict_proba()`
- Verdict mapping (0-100% score)
SOCBot implements a hierarchical evaluation strategy that adapts based on user intent regarding sample relationships.
Workflow:
1. Sample Relationship Question:
   - Bot: "Do all entries for IOC {type} belong to the same sample? (Y/N)"

2. User Answers YES:
   - Classify each entry individually
   - Compute cumulative maliciousness percentage = `avg(all_entry_scores)`
   - Apply threshold mapping to the cumulative score
   - Output:
     - Per-entry predictions with individual verdicts
     - Cumulative percentage and verdict for the IOC type

3. User Answers NO:
   - Classify each entry individually
   - Output: Per-entry predictions only
   - No cumulative scoring
Multi-stage decision tree ensuring valid cross-IOC aggregation.
Step 1: Per-IOC Sample Relationship
For each selected IOC type independently:
- Bot: "Do all entries for IOC {type} belong to the same sample? (Y/N)"
Per-IOC Outcomes:
- YES: Individual classification + cumulative % computed → stored for potential global averaging
- NO: Individual classification only → cumulative scoring disabled for this IOC → global averaging becomes impossible
Critical Rule: If ANY IOC receives "NO", the system must not ask the global question.
Step 2: Global Sample Relationship (Conditional)
Trigger Condition: All per-IOC answers were "YES"
- Bot: "Do all IOC types belong to one sample? (Y/N)"
Global Outcomes:
- YES:
  - Compute Total Sample Average = `avg(all_ioc_cumulative_scores)`
  - Map to a final verdict using the classification thresholds
  - Output: Per-entry + per-IOC cumulative + total sample verdict

- NO:
  - Output: Per-entry + per-IOC cumulative only
  - No total sample average
| Scenario | Per-IOC Answers | Global Question Asked? | Output Includes |
|---|---|---|---|
| Single IOC + Same Sample = YES | N/A (only 1 IOC) | No | Per-entry + cumulative |
| Single IOC + Same Sample = NO | N/A | No | Per-entry only |
| Multiple IOCs + Any "NO" | At least 1 NO | No | Per-entry + per-IOC cumulative (where YES), no total |
| Multiple IOCs + All "YES" | All YES | Yes | Depends on global answer ↓ |
| ↳ Global = YES | All YES | Yes (answered YES) | Per-entry + per-IOC cumulative + total sample |
| ↳ Global = NO | All YES | Yes (answered NO) | Per-entry + per-IOC cumulative, no total sample |
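The eligibility rule in the matrix can be captured in a small helper (a hypothetical function for illustration, not the literal app.py code):

```python
from typing import Dict, List

def should_ask_global(selected: List[str], per_ioc_same: Dict[str, bool]) -> bool:
    """The global question is asked only when multiple IOC types were selected
    and every per-IOC answer was YES; any single NO disables cross-IOC
    aggregation entirely."""
    return len(selected) > 1 and all(per_ioc_same.get(t, False) for t in selected)
```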
All percentage-based scores (per-entry, per-IOC cumulative, total sample) use the following uniform mapping:
| Percentage Range | Verdict |
|---|---|
| 0% – 15% | Benign |
| 15% – 30% | Possibly Benign |
| 30% – 65% | Possibly Malicious |
| 65% – 100% | Malicious |
Application:
- Individual IOC entry scores
- IOC-level cumulative averages
- Total sample-level final verdict
Objective: Verify all four IOC models load successfully
Results:
- ✅ Domain model (1) - Working
- ✅ URL model (2) - Working
- ✅ IP model (3) - Working
- ✅ RegKey model (4) - Working
Test Case 2.1: URL + Same Sample = YES
Steps:
1. Select IOC type: `2` (URL)
2. Submit URL entries, type `DONE`
3. Q: "Do all entries for IOC URL belong to the same sample?" → Answer: `YES`
Expected Output:
- Per-entry predictions with individual scores and verdicts
- Cumulative percentage for URL type
- Cumulative verdict based on threshold mapping
Status: ✅ Passed
Test Case 2.2: URL + Same Sample = NO
Steps:
1. Select IOC type: `2` (URL)
2. Submit URL entries, type `DONE`
3. Q: "Do all entries for IOC URL belong to the same sample?" → Answer: `NO`
Expected Output:
- Per-entry predictions with individual scores and verdicts
- No cumulative scoring
Status: ✅ Passed
Test Case 3.1: URL + RegKey, All YES, Global YES
Steps:
1. Select IOC types: `2,4` (URL, RegKey)
2. Submit URL entries → `DONE`
3. Submit RegKey entries → `DONE`
4. Q: "Do all entries for IOC URL belong to the same sample?" → `YES`
5. Q: "Do all entries for IOC RegKey belong to the same sample?" → `YES`
6. Q: "Do all IOC types belong to one sample?" → `YES`
Expected Output:
- Per-entry predictions for URL and RegKey
- Per-IOC cumulative % for URL and RegKey
- Total Sample Average = `(URL_cumulative + RegKey_cumulative) / 2`
- Total Sample Verdict based on the average
Status: ✅ Passed
Test Case 3.2: URL + RegKey, All YES, Global NO
Steps:
1. Select: `2,4`
2. Submit entries for both
3. Q: URL same sample? → `YES`
4. Q: RegKey same sample? → `YES`
5. Q: Global sample? → `NO`
Expected Output:
- Per-entry predictions for URL and RegKey
- Per-IOC cumulative % for URL and RegKey
- No Total Sample Average
Status: ✅ Passed
Test Case 3.3: URL + RegKey, Mixed (URL=YES, RegKey=NO)
Steps:
1. Select: `2,4`
2. Submit entries
3. Q: URL same sample? → `YES`
4. Q: RegKey same sample? → `NO`
Expected Output:
- URL: Per-entry + cumulative %
- RegKey: Per-entry only (no cumulative)
- Global question not asked
- No Total Sample Average
Status: ✅ Passed
Test Case 3.4: URL + RegKey, Both NO
Steps:
1. Select: `2,4`
2. Submit entries
3. Q: URL same sample? → `NO`
4. Q: RegKey same sample? → `NO`
Expected Output:
- URL: Per-entry only
- RegKey: Per-entry only
- Global question not asked
- No cumulative scoring at any level
Status: ✅ Passed
    ┌─────────────────┐
    │    Frontend     │  (HTML/CSS/JS)
    │  Static Files   │
    └────────┬────────┘
             │ HTTP
             ↓
    ┌─────────────────┐
    │  Flask Backend  │  (app.py)
    │  Session Store  │
    └────────┬────────┘
             │ Function Call
             ↓
    ┌─────────────────┐
    │  Model Tester   │  (model_tester.py)
    │  ML Inference   │
    └────────┬────────┘
             │ Joblib Load
             ↓
    ┌─────────────────┐
    │ Trained Models  │  (models/*.joblib)
    │    Pipelines    │
    └─────────────────┘
1. User types a message in the frontend input field
2. JavaScript sends a POST to `/api/send_message` with `{message, session_id}`
3. Flask `state_machine()` processes the input and updates session state
4. If finalization is triggered:
   - Calls `classify_entries()` from `model_tester.py`
   - Loads models, runs inference
   - Computes aggregated scores
5. Flask returns a JSON response with the bot message and results
6. Frontend renders the bot message in the chat UI
    User connects → UUID generated → Session state initialized
            ↓
    Choose IOC types → Validate selection → Store in state
            ↓
    Collect entries → Loop per IOC type → Store in state
            ↓
    Per-IOC questions → Store Y/N answers → Conditional progression
            ↓
    Global question (if eligible) → Store Y/N → Finalize
            ↓
    Classification → Display results → Session persists (in-memory)
    SOCBot/
    ├── .venv/                 # Python virtual environment (library root)
    ├── Frontend/              # Static web assets
    │   ├── index.html         # Chat UI structure
    │   ├── script.js          # Client-side logic
    │   └── style.css          # WhatsApp-style theme
    ├── models/                # Trained model artifacts
    │   ├── domain.csv         # Training data - Domain IOCs
    │   ├── domain.joblib      # Trained Domain classifier
    │   ├── ip.csv             # Training data - IP IOCs
    │   ├── ip.joblib          # Trained IP classifier
    │   ├── ModelTrainer.ipynb # Jupyter notebook for model training
    │   ├── modeltrainer.py    # Python script version of training code
    │   ├── regkey.csv         # Training data - Registry Key IOCs
    │   ├── regkey.joblib      # Trained RegKey classifier
    │   ├── url.csv            # Training data - URL IOCs
    │   └── url.joblib         # Trained URL classifier
    ├── app.py                 # Flask backend + FSM logic
    ├── history.py             # Session logging utilities
    ├── model_tester.py        # ML inference engine
    ├── requirements.txt       # Python dependencies
    └── history.jsonl          # Session records (auto-generated)
Document Version: 1.0
Last Updated: December 21, 2025
Prepared By: SOCBot Development Team