This is the source code for the paper "DistJoin: A Decoupled Join Cardinality Estimator based on Adaptive Neural Predicate Modulation".
- Install Python 3.12
- Install the required packages listed in requirements.txt
pip install -r requirements.txt
- Install our sampler package, which is responsible for generating training data dynamically during training
python ./MySampler/setup.py install
- Put the JOB datasets into ./datasets/job. All CSV tables must have headers.
- To update the true cardinalities, first remove all {workload}.pkl files in ./queries, then run ./queries/ConvertMSCNTestWorkload.py, which automatically calculates the true cardinalities and converts the test workloads to MSCN's format.
- If needed, use ./queries/GetJoinWithoutPredicatesCard.py to pre-calculate the cardinalities of the queries' join schemas.
- Use ./Configs/IMDB/IMDB.yaml to configure the experiments, or keep the default configuration to run the experiments from the paper
- Run
python train.py
- Copy the exp mark (a timestamp) from the output for later testing
- Run
python eval-IMDB-all.py --config=IMDB --no_wandb
and enter the exp mark to evaluate the workloads configured in the IMDB.yaml file. This covers all five join conditions on that workload.
- Check the results in the output and in ./results/DistJoin
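
The data preparation steps above (checking CSV headers and removing the cached workload pickles) can be sketched as a small pre-flight helper. This is a minimal sketch and not part of the repository: the function names `first_rows` and `remove_workload_pickles` are hypothetical, and the code assumes that the tables are UTF-8 `.csv` files directly inside `./datasets/job` and that every `.pkl` file directly inside `./queries` is a cached workload file.

```python
import csv
from pathlib import Path


def first_rows(dataset_dir):
    """Return {file name: first CSV row} so the header rows can be inspected by eye."""
    rows = {}
    for path in sorted(Path(dataset_dir).glob("*.csv")):
        # errors="replace" avoids crashing on unexpected bytes (assumption: UTF-8 data)
        with path.open(newline="", encoding="utf-8", errors="replace") as f:
            rows[path.name] = next(csv.reader(f), [])
    return rows


def remove_workload_pickles(queries_dir, dry_run=True):
    """List the .pkl files directly inside queries_dir; delete them unless dry_run."""
    targets = sorted(Path(queries_dir).glob("*.pkl"))
    if not dry_run:
        for path in targets:
            path.unlink()
    return [p.name for p in targets]


if __name__ == "__main__":
    # Print the first row of every table; each should be a header row.
    for name, row in first_rows("./datasets/job").items():
        print(name, row)
    # Dry run: only report which cached workload files would be removed.
    print(remove_workload_pickles("./queries", dry_run=True))
```

The deletion defaults to a dry run so that the list of files can be reviewed first; calling `remove_workload_pickles("./queries", dry_run=False)` actually deletes them, after which ./queries/ConvertMSCNTestWorkload.py can be rerun.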