Beyond JA4+: Flow Statistics vs. TLS Fingerprinting for Encrypted Malware Detection

Supporting repository for the paper "Beyond JA4+: Flow Statistics vs. TLS Fingerprinting for Encrypted Malware Detection" by Márton Pál Lipcsey-Magyar, Attila Ármin Madarász, and Adrian Pekar.

Paper Abstract

The deployment of Encrypted Client Hello (ECH) challenges TLS fingerprinting, a widely used approach for encrypted malware detection, by encrypting the handshake fields these methods rely on. This paper presents a systematic evaluation of flow-based statistical features as a handshake-independent alternative to fingerprinting. Through validation against the official JA4+ implementation, we establish limitations in fingerprinting approaches for this corpus: only 64.9% of malware families possess unique signatures, placing an inherent ceiling on achievable recall in our evaluation. We evaluate flow-level features—packet counts, timing patterns, and size distributions—across 27 experimental configurations on a dataset of 16,542 flows spanning 101 families (59 malware and 42 benign applications). Random Forest classifiers using combined flow statistics and sequential packet length features achieve 98.11% F1-score for binary malware detection with 97.22% recall, substantially exceeding fingerprinting’s theoretical recall bound of 64.9%. For fine-grained family identification, we obtain 54.81% macro F1 across 101 classes and 48.71% macro F1 for malware-only attribution, demonstrating that flow-based methods retain meaningful discriminative power where fingerprinting abstains. Across all tasks, Random Forest consistently outperforms neural networks and k-NN, with performance gaps widening in complex multiclass scenarios. These findings highlight flow-based classification as a practical and reproducible approach that can help maintain network security visibility as ECH deployment progresses, showing that behavioral traffic patterns are expected to provide durable signals for detection even as handshake fields become encrypted.

Repository Structure

├── reproduce-research/          # Validation pipelines
│   ├── paper-pipeline/          # Reproduce using the original authors' data
│   ├── nfstream-pipeline/       # Reproduce using NFStream extraction
│   └── verify-ja4-calculation/  # JA4+ conformance validation
│
└── paper-code/                  # Main classification system (Python)

See paper-code/README.md for detailed usage instructions.

Key Results

Binary Classification (Malware vs Benign)

Model	Feature Set	Accuracy	Precision	Recall	F1-Score
Random Forest	Combined	97.07%	99.02%	97.22%	98.11%
Random Forest	Core	96.55%	98.66%	96.91%	97.78%
Random Forest	SPL	96.65%	98.74%	96.95%	97.84%
Neural Network	Combined	90.03%	94.39%	92.78%	93.58%

Full Multiclass (101 Families)

Model	Feature Set	Accuracy	Macro F1
Random Forest	Combined	61.62%	54.81%
Random Forest	Core	59.66%	52.39%
FAISS k-NN	Combined	43.97%	34.30%
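Macro F1 averages the per-class F1 scores without weighting by class size, which is why it can sit well below accuracy when small families are misclassified. The following is a minimal sketch of that calculation in plain Python; the function name and the toy labels are hypothetical and are not taken from the repository code.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: one large family and two single-flow families.
y_true = ["a"] * 8 + ["b", "c"]
y_pred = ["a"] * 10  # the classifier always predicts the large family
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
```

In this toy case accuracy is 0.8 while macro F1 is about 0.30, because the two small families each contribute an F1 of zero to the average.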

Comparison with TLS Fingerprinting

Metric	TLS Fingerprinting (JA4+JA4S+SNI)	Flow-Based ML (RF+Combined)
Recall	≤64.9% (theoretical max)	97.22%
F1-Score	≤78.6%	98.11%
Handshake-Independent	No	Yes
Malware Coverage	64.9%	100%
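The F1 bound for fingerprinting follows from the recall ceiling: even with perfect precision, F1 cannot exceed what the recall allows. The sketch below assumes that the bound corresponds to the best case of 100% precision, which is an assumption of this example rather than something stated in this README. It also recomputes the Random Forest F1 from the precision and recall reported in the binary classification table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

RECALL_CEILING = 0.649  # fingerprinting recall bound from the table above

# Best case for fingerprinting: assume every flagged flow is truly malware.
best_case_f1 = f1_score(1.0, RECALL_CEILING)

# Random Forest + Combined, using the reported precision and recall.
rf_f1 = f1_score(0.9902, 0.9722)
```

Under that assumption the best-case F1 comes out at roughly 0.787, in line with the reported ≤78.6%; the small gap is presumably due to rounding of the recall figure. The Random Forest value reproduces the reported 98.11%.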

Interactive Exploration

Open paper-code/notebooks/experiments.ipynb for an interactive notebook with all experiments, visualizations, and analysis.

Dataset

The experiments use the malware traffic dataset from:

Matoušek, P., Přívora, J., & Ryšavý, O. (2024). "TLS Traffic Analysis: Malware Classification with JA4+ Fingerprints"

Dataset characteristics:

16,542 flows across 101 families (59 malware, 42 benign)
Sources: Desktop malware, mobile malware, desktop apps, mobile apps
Authenticated and labeled network traces

Note: Processed CSV files with extracted features are available under reproduce-research/. For the original PCAPs, refer to Matoušek et al. (2024).

Features

Core Flow Statistics (33 features)

Volumetric: Packet counts, byte volumes (bidirectional, src→dst, dst→src)
Temporal: Flow duration per direction
Statistical: Packet size distributions (min, mean, stddev, max)
Timing: Packet inter-arrival times (PIAT) distributions
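To illustrate the kind of per-flow statistics listed above, here is a minimal sketch that derives a handful of them from a packet list. The input layout (timestamp in milliseconds, size in bytes, direction) and all feature names are hypothetical; the repository's actual extraction pipeline is not reproduced here, and the full Core set has 33 features.

```python
from statistics import mean, pstdev

def summarize(values):
    """Return min, mean, stddev, and max of a list (all zeros if empty)."""
    if not values:
        return {"min": 0.0, "mean": 0.0, "stddev": 0.0, "max": 0.0}
    return {
        "min": float(min(values)),
        "mean": float(mean(values)),
        "stddev": float(pstdev(values)),
        "max": float(max(values)),
    }

def flow_statistics(packets):
    """packets: list of (timestamp_ms, size_bytes, direction) tuples,
    where direction is 'src2dst' or 'dst2src'."""
    packets = sorted(packets, key=lambda p: p[0])
    times = [t for t, _, _ in packets]
    sizes = [s for _, s, _ in packets]
    # Packet inter-arrival times: gaps between consecutive packets.
    piat = [later - earlier for earlier, later in zip(times, times[1:])]
    stats = {
        "bidirectional_packets": len(packets),
        "bidirectional_bytes": sum(sizes),
        "src2dst_packets": sum(1 for _, _, d in packets if d == "src2dst"),
        "dst2src_packets": sum(1 for _, _, d in packets if d == "dst2src"),
        "bidirectional_duration_ms": times[-1] - times[0] if times else 0,
    }
    for key, value in summarize(sizes).items():
        stats[f"ps_{key}"] = value
    for key, value in summarize(piat).items():
        stats[f"piat_{key}"] = value
    return stats

# Hypothetical four-packet flow.
example = [
    (0, 517, "src2dst"),
    (12, 1400, "dst2src"),
    (15, 1400, "dst2src"),
    (40, 93, "src2dst"),
]
stats = flow_statistics(example)
```

Per-direction byte volumes, durations, and size or timing distributions would follow the same pattern by filtering the packet list on direction before summarizing.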

Sequential Packet Lengths (25 features)

First 25 packet sizes in arrival order
Captures protocol-specific patterns
Supports early detection, since only the first 25 packets of a flow are needed
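A fixed-length SPL vector can be sketched as follows. The zero padding for flows shorter than 25 packets is an assumption of this example; the padding value used in the paper's pipeline is not stated in this README, and the function name is hypothetical.

```python
SPL_LENGTH = 25  # number of leading packets kept, per the feature description

def sequential_packet_lengths(sizes, length=SPL_LENGTH, pad_value=0):
    """Return the first `length` packet sizes in arrival order,
    right-padded with `pad_value` when the flow is shorter."""
    head = list(sizes[:length])
    return head + [pad_value] * (length - len(head))

short_flow = sequential_packet_lengths([517, 1400, 93])
long_flow = sequential_packet_lengths(list(range(100, 140)))
```

Both results have exactly 25 entries: the short flow keeps its three sizes followed by padding, and the long flow is truncated after the 25th packet.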

Combined Feature Set (58 features)

Combines macro-level (flow statistics) and micro-level (SPL) patterns: 33 + 25 = 58 features
Best performance across all tasks

Experimental Design

3 Classification Tasks: Binary, Full Multiclass (101 classes), Malware-only (59 classes)
3 Feature Sets: Core (33), SPL (25), Combined (58)
3 ML Models: Neural Network, Random Forest, FAISS k-NN
Total: 27 experimental configurations
Reproducibility: Fixed random seeds (42), stratified 80/20 splits
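The 27 configurations are the Cartesian product of the three tasks, three feature sets, and three models. The sketch below enumerates them and adds a small, seeded, stratified 80/20 split in plain Python. All identifiers are hypothetical, and the split is an illustration only: the repository may rely on a library routine instead (for example scikit-learn's train_test_split with stratify and random_state), which would select different indices.

```python
import random
from collections import defaultdict
from itertools import product

TASKS = ["binary", "multiclass_101", "malware_only_59"]
FEATURE_SETS = {"core": 33, "spl": 25, "combined": 58}
MODELS = ["neural_network", "random_forest", "faiss_knn"]
SEED = 42

# One entry per (task, feature set, model) combination.
configurations = [
    {"task": task, "features": name, "n_features": FEATURE_SETS[name],
     "model": model, "seed": SEED}
    for task, name, model in product(TASKS, FEATURE_SETS, MODELS)
]

def stratified_split(labels, test_fraction=0.2, seed=SEED):
    """Split indices so each class keeps roughly the same train/test ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

# Hypothetical label vector: 50 malware flows and 50 benign flows.
labels = ["malware"] * 50 + ["benign"] * 50
train_idx, test_idx = stratified_split(labels)
```

Because the random generator is seeded, repeated calls return the same split, which is the property the fixed seed is meant to provide.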

Authors

Márton Pál Lipcsey-Magyar - Budapest University of Technology and Economics
Attila Ármin Madarász - Budapest University of Technology and Economics
Adrian Pekar - Budapest University of Technology and Economics & CUJO LLC

Contact

For questions about the paper or code:

Adrian Pekar: [email protected]

Acknowledgments

Supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences and the Celtic-Next project RAI-6Green: Robust and AI Native 6G for Green Networks (C2023/1-9, funded by 2024-1.2.6-EUREKA-2024-00009).

Note: This repository contains the complete implementation and validation pipelines supporting the paper. All experimental results are reproducible using the provided code and methodology.

FlowFrontiers/MalwareDet-JA4vsFlowStats
