RCAEval is an open-source Python framework for root cause analysis (RCA) methods using multimodal data. When failures or incidents occur in large and dynamic systems, humans must quickly identify root causes from massive amounts of observable data including time-series metrics, textual logs, and topological tracing data. RCAEval addresses the lack of standardized, reproducible tools and benchmarks in this domain by providing (1) ready-to-use RCA methods spanning metric-based, trace-based, and multi-source approaches, and (2) comprehensive datasets containing 735 failure cases collected from real-world software systems.
RCAEval is the first framework to combine a broad suite of reproducible RCA tools with comprehensive benchmark datasets covering diverse fault types and modalities, enabling researchers to evaluate RCA methods under realistic conditions. The library is pip-installable, provides a simple Python API for running experiments, and includes standardized evaluation metrics (AC@k, Avg@k) for fair comparison across methods.
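To make the evaluation metrics concrete, the sketch below computes AC@k and Avg@k as they are commonly defined in the RCA literature: AC@k is the fraction of failure cases whose ground-truth root cause appears among the top-k ranked candidates, and Avg@k averages AC@1 through AC@k. The function and service names are illustrative, not part of RCAEval's actual API.

```python
def ac_at_k(rankings, truths, k):
    """AC@k: fraction of cases whose true root cause appears
    in the top-k ranked candidates."""
    hits = sum(truth in ranked[:k] for ranked, truth in zip(rankings, truths))
    return hits / len(rankings)

def avg_at_k(rankings, truths, k):
    """Avg@k: mean of AC@1 .. AC@k, rewarding methods that
    place the root cause nearer the top of the ranking."""
    return sum(ac_at_k(rankings, truths, j) for j in range(1, k + 1)) / k

# Two toy failure cases; each ranking lists suspected services
# in descending order of root-cause score (hypothetical names).
rankings = [["cart", "checkout", "db"], ["db", "cart", "frontend"]]
truths = ["checkout", "db"]

print(ac_at_k(rankings, truths, 1))   # 0.5: only the second case ranks the culprit first
print(avg_at_k(rankings, truths, 3))  # ~0.833
```

Because Avg@k rewards higher placement of the true root cause, two methods with identical AC@3 can still be separated by their Avg@3 scores.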
# Statement of need
Modern large and dynamic systems generate massive amounts of observability data including metrics, logs, and traces. When failures occur, they can propagate across multiple components, making it challenging for operators to identify root causes from the overwhelming volume of observable data. Root cause analysis (RCA) aims to pinpoint the faulty component and the specific indicators responsible for the failure.
Despite growing research interest in automated RCA, the field lacks a standardized, reproducible benchmark. Existing studies typically evaluate on limited systems with few fault types, often using private datasets that prevent fair comparison. Available resources provide only single-modality analysis (e.g., metrics only) without support for multimodal data combining logs and traces. Commercial observability platforms offer RCA capabilities but are proprietary and not reproducible for research purposes.
RCAEval fills this gap by providing: (1) three large-scale datasets with 735 failure cases across three software systems, covering resource faults (CPU, memory, disk), network faults (delay, packet loss), and code-level faults; (2) multimodal telemetry data including metrics, logs, and traces; (3) 15 reproducible RCA tool implementations, including state-of-the-art methods such as BARO [@pham2024baro] and CausalRCA [@Xin2023CausalRCA]; and (4) standardized evaluation metrics for consistent comparison.
RCAEval targets researchers developing new RCA algorithms, practitioners evaluating methods for production deployment, and educators teaching RCA concepts.
# State of the field
RCAEval distinguishes itself by providing the first open-source benchmark framework for reproducible RCA research.
# Acknowledgements
We would like to express our sincere gratitude to the researchers and developers who created the baseline methods included in our library; their time, effort, and expertise have been instrumental in making this project possible. This library builds upon our previously published work [@pham2024baro;@pham2024root;@pham2025rcaeval].