CausalCompass is a flexible and extensible benchmark suite for evaluating the robustness of time-series causal discovery (TSCD) methods under misspecified modeling assumptions.
- Abstract
- Key Features
- Data Generation
- Benchmark Scenarios
- Running Experiments
- Result Analysis
- Project Structure
- Citation
- License
- Contributing
- Contact
Causal discovery from time series is a fundamental task in machine learning. However, its widespread adoption is hindered by a reliance on untestable causal assumptions and by the lack of robustness-oriented evaluation in existing benchmarks. To address these challenges, we propose CausalCompass, a flexible and extensible benchmark suite designed to assess the robustness of time-series causal discovery (TSCD) methods under violations of modeling assumptions. To demonstrate the practical utility of CausalCompass, we conduct extensive benchmarking of representative TSCD algorithms across eight assumption-violation scenarios. Our experimental results indicate that no single method consistently attains optimal performance across all settings. Nevertheless, the methods exhibiting superior overall performance across diverse scenarios are almost invariably deep learning-based approaches. We further provide hyperparameter sensitivity analyses to deepen the understanding of these findings. We also find, somewhat surprisingly, that NTS-NOTEARS relies heavily on standardized preprocessing in practice, performing poorly in the vanilla setting but exhibiting strong performance after standardization. Finally, our work aims to provide a comprehensive and systematic evaluation of TSCD methods under assumption violations, thereby facilitating their broader adoption in real-world applications.
- 8 assumption-violation scenarios: confounders, nonstationarity, measurement error, standardization, missing data, mixed data, min-max normalization, and trend/seasonality
- 2 vanilla models: VAR (linear) and Lorenz-96 (nonlinear)
- 11 TSCD algorithms spanning 6 major methodological categories:
- Granger causality-based: VAR, LGC
- Constraint-based: PCMCI
- Noise-based: VARLiNGAM
- Score-based: DYNOTEARS, NTS-NOTEARS
- Topology-based: TSCI
- Deep learning-based: cMLP, cLSTM, CUTS, CUTS+
- Rigorous experimental protocols:
- Multiple random seeds for statistical reliability
- Comprehensive hyperparameter grids
- Automated infrastructure:
- Shell scripts for reproducible experiment execution
- LaTeX table generation for publication-ready results
- Origin-compatible data export for radar plots
All datasets can be generated using the scripts in the data_generation/ directory.
Generate all datasets for a specific scenario:
cd data_generation
# Vanilla datasets
python vanilla.py
# Assumption violation scenarios
python confounder.py
python measurement_error.py
python missing.py
python mixed_data.py
python nonstationary.py
python standardized.py # Includes z-score and min-max normalization
python trendseason.py
Datasets can also be generated programmatically, for example:
from data_generation.measurement_error import simulate_var_with_measure_error
# Generate VAR data with measurement error
p = 10       # Number of variables
T = 1000     # Time steps
lag = 3      # Lag order
gamma = 1.2  # Error variance = 1.2 × data variance
seed = 0     # Random seed for reproducibility
data, beta, gc = simulate_var_with_measure_error(
    p=p, T=T, lag=lag, gamma=gamma, seed=seed
)
print(f"Data shape: {data.shape}")       # (1000, 10)
print(f"Ground truth GC: {gc.shape}")    # (10, 10)
Generated datasets will be saved in the following structure:
datasets/
├── vanilla/
├── confounder/
├── measurement_error/
├── missing/
├── mixed_data/
├── nonstationary/
├── standardized/
└── trendseason/
The generated datasets follow the naming convention:
[scenario]_[params]_[model]_p[p]_T[T]_[optional]_seed[seed].npz
Example: confounder_rho0.5_VAR_p10_T1000_seed0.npz
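For illustration, such filenames can be parsed with a small regular expression; the pattern below is a hypothetical sketch mirroring the convention above, not part of the repository:

```python
import re

# Hypothetical parser for the naming convention above; not part of the
# repository. The pattern mirrors [scenario]_..._p[p]_T[T]_..._seed[seed].npz.
name = "confounder_rho0.5_VAR_p10_T1000_seed0.npz"
pattern = r"(?P<scenario>[a-z_]+)_.*_p(?P<p>\d+)_T(?P<T>\d+).*_seed(?P<seed>\d+)\.npz"
m = re.match(pattern, name)
print(m.group("scenario"), m.group("p"), m.group("T"), m.group("seed"))
# → confounder 10 1000 0
```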
Each .npz file contains:
- data: Time series observations (T × D)
- gc: Ground-truth causality graph (D × D)
- Additional scenario-specific metadata
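As a minimal sketch of this layout, a round trip through NumPy's .npz format looks like the following (toy shapes and synthetic random data, not an actual benchmark file):

```python
import numpy as np

# Toy round trip illustrating the .npz layout described above (keys "data"
# and "gc"); the shapes mimic a small p=3, T=100 dataset, not a real one.
rng = np.random.default_rng(0)
data = rng.standard_normal((100, 3))          # (T, D) observations
gc = (rng.random((3, 3)) > 0.5).astype(int)   # (D, D) ground-truth graph

np.savez("toy_example.npz", data=data, gc=gc)

loaded = np.load("toy_example.npz")
print(loaded["data"].shape)  # (100, 3)
print(loaded["gc"].shape)    # (3, 3)
```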
The datasets/ directory contains sample datasets. Complete datasets can be generated using the provided scripts.
For convenience and reproducibility, the complete datasets archive is publicly available on Google Drive.
- Vanilla: Standard VAR and Lorenz-96 systems without assumption violations.
- Confounders: Hidden confounders create spurious correlations between observed variables.
- Measurement error: Gaussian noise with variance proportional to the data variance is added to observations.
- Missing data: Values are dropped at random with a specified probability and imputed via zero-order hold.
- Mixed data: A mixture of continuous and discrete variables.
- Nonstationarity: Time-varying noise variance and time-varying coefficients.
- Standardization: Z-score and min-max normalization applied to the time series.
- Trend/seasonality: Trends and seasonal patterns added to observations.
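For instance, the zero-order-hold imputation used in the missing-data scenario can be sketched as follows (a simplified illustration, not the repository's implementation):

```python
import numpy as np

# Simplified sketch of the missing-data scenario: entries are dropped at
# random with a given probability and imputed by zero-order hold, i.e. the
# last observed value is carried forward. Not the repository's exact code.
def zero_order_hold(x: np.ndarray, miss_mask: np.ndarray) -> np.ndarray:
    """Forward-fill masked entries column by column; x has shape (T, D)."""
    filled = x.copy()
    T, D = x.shape
    for d in range(D):
        for t in range(1, T):
            if miss_mask[t, d]:
                filled[t, d] = filled[t - 1, d]
    return filled

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 10))
mask = rng.random(x.shape) < 0.3   # 30% of entries missing at random
mask[0, :] = False                 # keep the first row fully observed
x_imputed = zero_order_hold(x, mask)
print(x_imputed.shape)  # (1000, 10)
```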
Run all TSCD algorithms automatically using the provided shell scripts:
# Navigate to scripts directory
cd scripts
# Run all experiments (11 algorithms)
chmod +x run_all.sh
./run_all.sh
# Or run individual algorithms
chmod +x run_*.sh
./run_var.sh # VAR
./run_lgc.sh # LGC
./run_pcmci.sh # PCMCI
./run_varlingam.sh # VARLiNGAM
./run_dynotears.sh # DYNOTEARS
./run_ntsnotears.sh # NTS-NOTEARS
./run_tsci.sh # TSCI
./run_ngc.sh # NGC (cMLP and cLSTM)
./run_cuts.sh # CUTS
./run_cutsplus.sh # CUTS+
The run_all.sh script orchestrates all 11 algorithms and handles:
- Automatic error detection and reporting
- Progress tracking with timestamps
- Failed script counting and exit code management
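The orchestration pattern above can be sketched roughly as follows (a hypothetical simplification, not the actual contents of run_all.sh):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the run_all.sh pattern: run each script with a
# timestamped progress line, count failures, and return nonzero if any failed.
set -u

run_scripts() {
  local failed=0
  for script in "$@"; do
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] Running ${script}..."
    if ! bash "${script}"; then
      echo "[$(date '+%Y-%m-%d %H:%M:%S')] FAILED: ${script}" >&2
      failed=$((failed + 1))
    fi
  done
  echo "Done: $(( $# - failed ))/$# scripts succeeded."
  return "${failed}"
}
```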
Note: Results are saved in JSON format with performance metrics (AUPRC, AUROC) and hyperparameter configurations.
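Such results files can then be post-processed with standard JSON tooling. The key names below ("auprc", "auroc", "hyperparameters") and the metric values are illustrative assumptions, not the repository's exact schema:

```python
import json

# Hypothetical results record; key names and values are illustrative
# assumptions, not the repository's exact schema.
result = {
    "method": "PCMCI",
    "scenario": "confounder",
    "auprc": 0.71,
    "auroc": 0.83,
    "hyperparameters": {"tau_max": 3, "pc_alpha": 0.05},
}
with open("result_example.json", "w") as f:
    json.dump(result, f, indent=2)

with open("result_example.json") as f:
    loaded = json.load(f)
print(f'{loaded["method"]}: AUPRC={loaded["auprc"]}, AUROC={loaded["auroc"]}')
# → PCMCI: AUPRC=0.71, AUROC=0.83
```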
Convert experimental results to publication-ready LaTeX tables:
python result2latex.py
This generates:
- Comparison tables across all scenarios and methods
- Performance metrics (AUPRC/AUROC) with best results highlighted
- Separate tables for VAR and Lorenz-96 with different parameters
Output files: table_VAR_p10_T1000.tex, table_Lorenz_p10_T1000_F10.tex, etc.
Export results for radar plots and visualization:
python generate_origin_tables.py
This script generates .txt files compatible with Origin for creating:
- Radar plots comparing method performance across scenarios
- Hyperparameter sensitivity visualizations
CausalCompass/
│
├── algs/ # Algorithm implementations
│ ├── cuts/ # CUTS implementation
│ ├── cutsplus/ # CUTS+ implementation
│ ├── lgc/ # LGC implementation
│ ├── ngc/ # NGC implementation
│ ├── ntsnotears/ # NTS-NOTEARS implementation
│ ├── tsci/ # TSCI implementation
│ ├── var/ # VAR implementation
│ ├── varlingam/ # VARLiNGAM implementation
│ └── __init__.py # Package initialization
│
├── data_generation/ # Data generation scripts
│ ├── vanilla.py # VAR and Lorenz-96
│ ├── confounder.py # Confounders scenario
│ ├── measurement_error.py # Measurement error scenario
│ ├── missing.py # Missing data scenario
│ ├── mixed_data.py # Mixed data scenario
│ ├── non_gaussian.py # Non-Gaussian noise scenario
│ ├── nonstationary.py # Nonstationarity scenario
│ ├── standardized.py # z-score and min-max scenario
│ └── trendseason.py # Trend and seasonality scenario
│
├── datasets/ # Sample datasets (fully reproducible via scripts)
│ └── [scenario]/ # Organized by scenario
│
├── scripts/ # Experiment execution scripts
│ ├── run_all.sh # Master script to run all experiments
│ ├── run_var.sh # VAR experiments
│ ├── run_lgc.sh # LGC experiments
│ ├── run_pcmci.sh # PCMCI experiments
│ ├── run_varlingam.sh # VARLiNGAM experiments
│ ├── run_dynotears.sh # DYNOTEARS experiments
│ ├── run_ntsnotears.sh # NTS-NOTEARS experiments
│ ├── run_tsci.sh # TSCI experiments
│ ├── run_ngc.sh # NGC experiments
│ ├── run_cuts.sh # CUTS experiments
│ └── run_cutsplus.sh # CUTS+ experiments
│
├── result2latex.py # Generate LaTeX tables from results
├── generate_origin_tables.py # Generate Origin data files
│
└── README.md # This file
If you use this code or datasets in your research, please cite:
@misc{yi2026causalcompass,
title = {{CausalCompass}: Evaluating the Robustness of Time-Series Causal Discovery in Misspecified Scenarios},
author = {Yi, Huiyang and Shen, Xiaojian and Wu, Yonggang and Chen, Duxin and Wang, He and Yu, Wenwu},
year = {2026},
note = {Under review as a conference paper}
}
Note: The final bibliographic information (e.g., venue and proceedings details) will be updated upon paper acceptance.
- The code in this repository is released under the MIT License.
- The datasets generated and provided by this repository are released under the CC BY 4.0 License.
Contributions are welcome! If you encounter bugs, have suggestions for improvements, or would like to extend CausalCompass with additional assumption-violation scenarios or evaluation protocols, please feel free to open an issue or submit a pull request.
For questions or issues, please:
- Open an issue in this repository
- Email: yihuiyang@seu.edu.cn
