Add JOSS paper

phamquiluan · phamquiluan · commit c5c191fdd067 · 2025-12-09T22:41:54.000+11:00
diff --git a/paper.bib b/paper.bib
@@ -0,0 +1,67 @@
+@article{Soldani2018microservice,
+  title = {The pains and gains of microservices: A Systematic grey literature review},
+  journal = {Journal of Systems and Software},
+  volume = {146},
+  pages = {215--232},
+  year = {2018},
+  doi = {10.1016/j.jss.2018.09.016},
+  author = {Jacopo Soldani and Damian Andrew Tamburri and Willem-Jan {Van Den Heuvel}}
+}
+
+@article{Soldani2022rcasurvey,
+  author = {Soldani, Jacopo and Brogi, Antonio},
+  title = {Anomaly Detection and Failure Root Cause Analysis in (Micro) Service-Based Cloud Applications: A Survey},
+  year = {2022},
+  volume = {55},
+  number = {3},
+  doi = {10.1145/3501297},
+  journal = {ACM Computing Surveys}
+}
+
+
+@inproceedings{pham2024baro,
+  title = {BARO: Robust Root Cause Analysis for Microservices via Multivariate Bayesian Online Change Point Detection},
+  author = {Pham, Luan and Ha, Huong and Zhang, Hongyu},
+  booktitle = {Proceedings of the ACM on Software Engineering},
+  volume = {1},
+  number = {FSE},
+  pages = {2214--2237},
+  year = {2024},
+  doi = {10.1145/3660810}
+}
+
+@article{Xin2023CausalRCA,
+  title = {CausalRCA: Causal inference based precise fine-grained root cause localization for microservice applications},
+  journal = {Journal of Systems and Software},
+  volume = {203},
+  pages = {111724},
+  year = {2023},
+  doi = {10.1016/j.jss.2023.111724},
+  author = {Ruyue Xin and Peng Chen and Zhiming Zhao}
+}
+
+@inproceedings{pham2024root,
+  title = {Root Cause Analysis for Microservices based on Causal Inference: How Far Are We?},
+  author = {Pham, Luan and Ha, Huong and Zhang, Hongyu},
+  booktitle = {Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering},
+  pages = {706--715},
+  year = {2024},
+  doi = {10.1145/3691620.3695063}
+}
+
+@inproceedings{pham2025rcaeval,
+  title = {RCAEval: A Benchmark for Root Cause Analysis of Microservice Systems with Telemetry Data},
+  author = {Pham, Luan and Zhang, Hongyu and Ha, Huong and Salim, Flora and Zhang, Xiuzhen},
+  booktitle = {Companion Proceedings of the ACM Web Conference 2025},
+  pages = {777--780},
+  year = {2025},
+  doi = {10.1145/3701716.3715290}
+}
+
+@inproceedings{altenbernd2025amocrca,
+  title = {Amocrca: at most one change segmentation and relative correlation ranking for root cause analysis},
+  author = {Altenbernd, Anton and Wu, Zhiyuan and Kao, Odej},
+  booktitle = {Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering},
+  pages = {1386--1393},
+  year = {2025}
+}
diff --git a/paper.md b/paper.md
@@ -0,0 +1,51 @@
+---
+title: 'RCAEval: A Benchmark for Multimodal Root Cause Analysis'
+tags:
+  - Python
+  - root cause analysis
+  - multimodal data
+  - benchmark
+  - telemetry data
+  - causal inference
+authors:
+  - name: Luan Pham
+    orcid: 0000-0001-7243-3225
+    affiliation: "1"
+affiliations:
+  - name: RMIT University, Australia
+    index: 1
+date: 10 December 2025
+bibliography: paper.bib
+repository: https://github.com/phamquiluan/RCAEval
+archive_doi: 10.5281/zenodo.15616876
+---
+
+# Summary
+
+RCAEval is an open-source Python framework for benchmarking root cause analysis (RCA) methods on multimodal data. When failures or incidents occur in software systems, engineers must quickly identify root causes from large volumes of observability data, including time-series metrics, textual logs, and topological tracing data. RCAEval addresses the lack of standardized, reproducible tools and benchmarks in this domain by providing (1) ready-to-use RCA methods spanning metric-based, trace-based, and multi-source approaches, and (2) comprehensive datasets containing 735 failure cases collected from real-world software systems.
+
+RCAEval is the first framework to combine a broad set of reproducible RCA tools with comprehensive benchmark datasets covering diverse fault types and modalities, enabling researchers to evaluate RCA methods under realistic conditions. The library is pip-installable, provides a simple Python API for running experiments, and includes standardized evaluation metrics (AC@k, Avg@k) for fair comparison across methods.
+
+# Statement of need
+
+Modern cloud applications generate massive amounts of telemetry data including metrics, logs, and traces [@Soldani2022rcasurvey]. When failures occur, they can propagate across multiple components, making it challenging for operators to identify root causes from the overwhelming volume of observable data. Root cause analysis (RCA) aims to pinpoint the faulty component and the specific indicators (e.g., CPU usage, error logs) responsible for the failure [@Soldani2022rcasurvey].
+
+Despite growing research interest in automated RCA, the field lacks a standardized, reproducible benchmark. Existing studies typically evaluate on a limited number of systems with few fault types, often using private datasets that prevent fair comparison [@pham2024root]. Available resources support only single-modality analysis (e.g., metrics only) and do not cover multimodal data that combines metrics with logs and traces. Commercial observability platforms offer RCA capabilities but are proprietary and cannot be reproduced for research purposes.
+
+RCAEval fills this gap by providing: (1) three large-scale datasets with 735 failure cases across three software systems, covering resource faults (CPU, memory, disk), network faults (delay, packet loss), and code-level faults; (2) multimodal telemetry data including metrics, logs, and traces; (3) 15 reproducible RCA tool implementations, including state-of-the-art methods such as BARO [@pham2024baro] and CausalRCA [@Xin2023CausalRCA]; and (4) standardized evaluation metrics for consistent comparison.
+
+RCAEval targets researchers developing new RCA algorithms, practitioners evaluating methods for production deployment, and educators teaching AIOps and site reliability engineering concepts.
+
+# State of the field
+
+Several tools and datasets exist for RCA, but none provide comprehensive coverage of multimodal telemetry with reproducible methods. Existing libraries focus on metric-based RCA, supporting methods like Bayesian networks and Granger causality, but lack support for log and trace analysis. Available datasets provide limited fault types and no benchmarking framework for systematic evaluation. Commercial observability platforms offer automated root cause analysis features, but their proprietary nature prevents reproducible research comparisons.
+
+RCAEval distinguishes itself by providing the first open-source benchmark framework that combines 15 reproducible RCA tools [@pham2025rcaeval] with large-scale datasets of multimodal observability data. This enables fair, systematic comparison of RCA methods under realistic failure scenarios.
+
+![Overview of RCAEval benchmark framework.\label{fig:overview}](docs/readme.jpg){ width=100% }
+
+# Acknowledgements
+
+We acknowledge contributions from the open-source community and the developers of the software systems (Online Boutique, Sock Shop, Train Ticket) used in our benchmark datasets.
+
+# References
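
Note on the evaluation metrics named in the Summary (AC@k, Avg@k): the paper does not define them, so the following is a minimal sketch that assumes the definitions commonly used in the RCA literature, with exactly one ground-truth root cause per failure case. Under that assumption, AC@k is the fraction of cases whose true root cause appears in the top k of the ranked output, and Avg@k is the mean of AC@1 through AC@k. The function names `ac_at_k` and `avg_at_k` and the sample rankings are hypothetical and are not RCAEval's API.

```python
from typing import Sequence


def ac_at_k(rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int) -> float:
    """Fraction of failure cases whose true root cause is within the top-k ranking.

    Assumes one ground-truth root cause per case (an illustrative simplification).
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    if not rankings or len(rankings) != len(truths):
        raise ValueError("rankings and truths must be non-empty and equally long")
    hits = sum(truth in ranking[:k] for ranking, truth in zip(rankings, truths))
    return hits / len(truths)


def avg_at_k(rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int) -> float:
    """Mean of AC@1 ... AC@k, summarising accuracy over several cut-offs."""
    return sum(ac_at_k(rankings, truths, j) for j in range(1, k + 1)) / k


# Hypothetical ranked outputs for four failure cases (best candidate first).
rankings = [
    ["cart_cpu", "checkout_latency", "frontend_mem"],
    ["frontend_mem", "cart_cpu", "checkout_latency"],
    ["checkout_latency", "frontend_mem", "cart_cpu"],
    ["cart_cpu", "frontend_mem", "checkout_latency"],
]
# Hypothetical ground truth: the same indicator is the root cause in every case.
truths = ["cart_cpu", "cart_cpu", "cart_cpu", "cart_cpu"]

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"AC@{k} = {ac_at_k(rankings, truths, k):.2f}")
    print(f"Avg@3 = {avg_at_k(rankings, truths, 3):.2f}")
```

In this toy example two of four cases are solved at rank 1, three at rank 2, and all four at rank 3, which is why averaging over cut-offs gives a smoother single score than any one AC@k value.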