Supporting code for ECH-Resilient Malware Detection via Flow-Level Statistical Features

FlowFrontiers/MalwareDet-JA4vsFlowStats

Beyond JA4+: Flow Statistics vs. TLS Fingerprinting for Encrypted Malware Detection

Supporting repository for the paper "Beyond JA4+: Flow Statistics vs. TLS Fingerprinting for Encrypted Malware Detection" by Márton Pál Lipcsey-Magyar, Attila Ármin Madarász, and Adrian Pekar.

Paper Abstract

The deployment of Encrypted Client Hello (ECH) challenges TLS fingerprinting, a widely used approach for encrypted malware detection, by encrypting the handshake fields these methods rely on. This paper presents a systematic evaluation of flow-based statistical features as a handshake-independent alternative to fingerprinting. Through validation against the official JA4+ implementation, we establish limitations in fingerprinting approaches for this corpus: only 64.9% of malware families possess unique signatures, placing an inherent ceiling on achievable recall in our evaluation. We evaluate flow-level features—packet counts, timing patterns, and size distributions—across 27 experimental configurations on a dataset of 16,542 flows spanning 101 families (59 malware and 42 benign applications). Random Forest classifiers using combined flow statistics and sequential packet length features achieve 98.11% F1-score for binary malware detection with 97.22% recall, substantially exceeding fingerprinting’s theoretical recall bound of 64.9%. For fine-grained family identification, we obtain 54.81% macro F1 across 101 classes and 48.71% macro F1 for malware-only attribution, demonstrating that flow-based methods retain meaningful discriminative power where fingerprinting abstains. Across all tasks, Random Forest consistently outperforms neural networks and k-NN, with performance gaps widening in complex multiclass scenarios. These findings highlight flow-based classification as a practical and reproducible approach that can help maintain network security visibility as ECH deployment progresses, showing that behavioral traffic patterns are expected to provide durable signals for detection even as handshake fields become encrypted.

Repository Structure

├── reproduce-research/          # Validation pipelines
│   ├── paper-pipeline/          # Reproduce using original authors' data
│   ├── nfstream-pipeline/       # Reproduce using NFStream extraction
│   └── verify-ja4-calculation/  # JA4+ conformance validation
│
└── paper-code/                  # Main classification system (Python)

See paper-code/README.md for detailed usage instructions.

Key Results

Binary Classification (Malware vs Benign)

| Model | Feature Set | Accuracy | Precision | Recall | F1-Score |
| --- | --- | --- | --- | --- | --- |
| Random Forest | Combined | 97.07% | 99.02% | 97.22% | 98.11% |
| Random Forest | Core | 96.55% | 98.66% | 96.91% | 97.78% |
| Random Forest | SPL | 96.65% | 98.74% | 96.95% | 97.84% |
| Neural Network | Combined | 90.03% | 94.39% | 92.78% | 93.58% |
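
The F1-scores in this table are consistent with the harmonic mean of the listed precision and recall. The following is a minimal check, assuming the standard definition F1 = 2PR / (P + R); the function name is illustrative and not taken from the repository.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the binary classification table.
rows = {
    "Random Forest / Combined": (99.02, 97.22, 98.11),
    "Random Forest / Core": (98.66, 96.91, 97.78),
    "Random Forest / SPL": (98.74, 96.95, 97.84),
    "Neural Network / Combined": (94.39, 92.78, 93.58),
}

for name, (p, r, reported) in rows.items():
    computed = 100 * f1_from_precision_recall(p / 100, r / 100)
    print(f"{name}: computed {computed:.2f}%, reported {reported:.2f}%")
```

Each computed value agrees with the reported figure to two decimal places.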

Full Multiclass (101 Families)

| Model | Feature Set | Accuracy | Macro F1 |
| --- | --- | --- | --- |
| Random Forest | Combined | 61.62% | 54.81% |
| Random Forest | Core | 59.66% | 52.39% |
| FAISS k-NN | Combined | 43.97% | 34.30% |
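
Macro F1 sits below accuracy in every row. Under the usual definition, macro F1 is the unweighted mean of per-class F1, so a rarely seen family counts as much as a common one. The sketch below illustrates that effect on toy labels; the family names are hypothetical and the function is a plain-Python illustration, not the repository's evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: a common family classified perfectly, a rare family always missed.
y_true = ["family_a"] * 8 + ["family_b"] * 2
y_pred = ["family_a"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}, macro F1 = {macro_f1(y_true, y_pred):.2f}")
```

In this toy case accuracy is 0.80 while macro F1 is about 0.44, because the missed rare family contributes an F1 of zero to the average.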

Comparison with TLS Fingerprinting

| Metric | TLS Fingerprinting (JA4+JA4S+SNI) | Flow-Based ML (RF+Combined) |
| --- | --- | --- |
| Recall | ≤64.9% (theoretical max) | 97.22% |
| F1-Score | ≤78.6% | 98.11% |
| Handshake-Independent | No | Yes |
| Malware Coverage | 64.9% | 100% |
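
The coverage figure for fingerprinting reflects how many malware families have a signature of their own. This README does not spell out the exact criterion, so the sketch below assumes that a family counts as covered when at least one of its fingerprints appears in no other family. The family names and fingerprint strings are hypothetical placeholders, not real JA4 values.

```python
from collections import defaultdict


def families_with_unique_signature(flows):
    """Return (covered_families, all_families) for a list of (family, fingerprint) pairs.

    Assumption: a family is "covered" if it owns at least one fingerprint
    that is never observed in any other family.
    """
    families_by_fingerprint = defaultdict(set)
    for family, fingerprint in flows:
        families_by_fingerprint[fingerprint].add(family)

    all_families = {family for family, _ in flows}
    covered = {
        next(iter(families))
        for families in families_by_fingerprint.values()
        if len(families) == 1
    }
    return covered, all_families


# Toy data: fam_b only ever uses a fingerprint that fam_a also uses.
toy_flows = [
    ("fam_a", "fp_1"),
    ("fam_a", "fp_2"),
    ("fam_b", "fp_2"),
    ("fam_c", "fp_3"),
]

covered, everyone = families_with_unique_signature(toy_flows)
print(f"coverage = {len(covered) / len(everyone):.1%}")
```

Here two of three toy families are covered. A family without a signature of its own cannot be attributed by fingerprint matching alone, which is why coverage limits the recall fingerprinting can reach.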

Interactive Exploration

Open paper-code/notebooks/experiments.ipynb for an interactive notebook with all experiments, visualizations, and analysis.

Dataset

The experiments use the malware traffic dataset from:

Matoušek, P., Přívora, J., & Ryšavý, O. (2024). "TLS Traffic Analysis: Malware Classification with JA4+ Fingerprints"

Dataset characteristics:

  • 16,542 flows across 101 families (59 malware, 42 benign)
  • Sources: Desktop malware, mobile malware, desktop apps, mobile apps
  • Authenticated and labeled network traces

Note: Processed CSV files with extracted features are available under reproduce-research/. For the original PCAPs, refer to Matoušek et al.

Features

Core Flow Statistics (33 features)

  • Volumetric: Packet counts, byte volumes (bidirectional, src→dst, dst→src)
  • Temporal: Flow duration per direction
  • Statistical: Packet size distributions (min, mean, stddev, max)
  • Timing: Packet inter-arrival times (PIAT) distributions
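
To make these feature groups concrete, the sketch below computes packet counts, byte volume, duration, and the min/mean/stddev/max summaries for packet sizes and inter-arrival times from one direction of a flow. It is a standard-library illustration, not the repository's extraction code: the function names, the millisecond unit, and the use of population standard deviation are all assumptions.

```python
import statistics


def summarize(values):
    """Return min, mean, stddev, and max; all zeros for an empty sequence."""
    if not values:
        return {"min": 0.0, "mean": 0.0, "stddev": 0.0, "max": 0.0}
    return {
        "min": float(min(values)),
        "mean": statistics.fmean(values),
        "stddev": statistics.pstdev(values),  # population stddev (assumed)
        "max": float(max(values)),
    }


def flow_statistics(packets):
    """packets: list of (timestamp_ms, size_bytes) tuples in arrival order."""
    sizes = [size for _, size in packets]
    times = [ts for ts, _ in packets]
    # Packet inter-arrival times: gaps between consecutive timestamps.
    piat = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "packets": len(packets),
        "bytes": sum(sizes),
        "duration_ms": times[-1] - times[0] if times else 0,
        "size": summarize(sizes),
        "piat_ms": summarize(piat),
    }


example = flow_statistics([(0, 100), (10, 300), (40, 200)])
print(example)
```

Running the same function over the src→dst packets, the dst→src packets, and both together would yield the per-direction and bidirectional variants listed above.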

Sequential Packet Lengths (25 features)

  • First 25 packet sizes in arrival order
  • Captures protocol-specific patterns
  • Early detection capability
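
A fixed-length SPL vector can be built by truncating long flows and padding short ones. The sketch below assumes zero padding for flows with fewer than 25 packets, which this README does not state; the function and parameter names are illustrative.

```python
SPL_LENGTH = 25  # number of leading packets kept, per the feature description


def sequential_packet_lengths(packet_sizes, length=SPL_LENGTH, pad_value=0):
    """Return the first `length` packet sizes in arrival order, padded to a fixed width.

    Assumption: flows shorter than `length` packets are right-padded with `pad_value`.
    """
    head = list(packet_sizes[:length])
    return head + [pad_value] * (length - len(head))


short_flow = sequential_packet_lengths([517, 1460, 1460, 93])
long_flow = sequential_packet_lengths(list(range(1, 41)))
print(short_flow)
print(long_flow)
```

Because only the leading packets are used, such a vector is available as soon as the 25th packet arrives, without waiting for the flow to end.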

Combined Feature Set (58 features)

  • Synergy between macro-level (flow stats) and micro-level (SPL) patterns
  • Best performance across all tasks

Experimental Design

  • 3 Classification Tasks: Binary, Full Multiclass (101 classes), Malware-only (59 classes)
  • 3 Feature Sets: Core (33), SPL (25), Combined (58)
  • 3 ML Models: Neural Network, Random Forest, FAISS k-NN
  • Total: 27 experimental configurations
  • Reproducibility: Fixed random seeds (42), stratified 80/20 splits
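
The design above can be sketched as a 3 × 3 × 3 grid plus a seeded, stratified 80/20 split. This README does not say which library routine performs the split, so the version below is a hand-rolled standard-library illustration; all identifiers are hypothetical.

```python
import itertools
import random
from collections import defaultdict

TASKS = ["binary", "full_multiclass", "malware_only"]
FEATURE_SETS = {"core": 33, "spl": 25, "combined": 58}  # name -> feature count
MODELS = ["neural_network", "random_forest", "faiss_knn"]
SEED = 42

# Every (task, feature set, model) combination: 3 * 3 * 3 = 27.
configurations = list(itertools.product(TASKS, FEATURE_SETS, MODELS))


def stratified_split(labels, test_fraction=0.2, seed=SEED):
    """Return (train_indices, test_indices), keeping class proportions in both parts."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    train, test = [], []
    for label in sorted(indices_by_class):
        indices = indices_by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = ["malware"] * 50 + ["benign"] * 50
train_idx, test_idx = stratified_split(labels)
print(len(configurations), len(train_idx), len(test_idx))
```

Splitting per class keeps the malware/benign ratio the same in the training and test portions, and fixing the seed makes the partition repeatable across runs.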

Authors

  • Márton Pál Lipcsey-Magyar - Budapest University of Technology and Economics
  • Attila Ármin Madarász - Budapest University of Technology and Economics
  • Adrian Pekar - Budapest University of Technology and Economics & CUJO LLC

Contact

For questions about the paper or code:

Acknowledgments

Supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences and Celtic-Next project RAI-6Green: Robust and AI Native 6G for Green Networks (C2023/1-9, funded by 2024-1.2.6-EUREKA-2024-00009).


Note: This repository contains the complete implementation and validation pipelines supporting the paper. All experimental results are reproducible using the provided code and methodology.
