tabularpriors

A Python module that provides interfaces to various public tabular priors.

You can use tabularpriors as a command-line tool to pre-generate data from a prior, e.g. via

python -m tabularpriors --lib tabicl \
       --num_batches 1000 --batch_size 4 \
       --min_features 3 --max_features 3 \
       --max_seq_len 50 --max_classes 3 \
       --save_path tabicl_4k_50x3.h5

The resulting file can afterwards be loaded via

from tabularpriors.dataloader import PriorDumpDataLoader
prior = PriorDumpDataLoader('tabicl_4k_50x3.h5', num_steps=20, batch_size=4, device='cpu')
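
As a rough sanity check on the numbers in the two snippets above, the arithmetic can be sketched as follows. This is a back-of-the-envelope estimate, not part of tabularpriors: the helper name `dump_size` is hypothetical, and it assumes that every batch holds `batch_size` tables, that no table exceeds `max_seq_len` rows or `max_features` columns, that values are stored as 4-byte floats, and that each loader step consumes `batch_size` tables.

```python
def dump_size(num_batches, batch_size, max_seq_len, max_features, bytes_per_value=4):
    """Estimate how much data a pre-generated prior dump holds (hypothetical helper)."""
    # Assumption: every batch contains exactly `batch_size` tables.
    num_tables = num_batches * batch_size
    # Upper bound on feature values: each table has at most max_seq_len x max_features cells.
    max_cells = num_tables * max_seq_len * max_features
    # Assumption: values are stored as 4-byte floats (float32), with no compression.
    approx_megabytes = max_cells * bytes_per_value / 1e6
    return {"num_tables": num_tables, "max_cells": max_cells, "approx_megabytes": approx_megabytes}


# Values taken from the command-line example above.
stats = dump_size(num_batches=1000, batch_size=4, max_seq_len=50, max_features=3)
print(stats)

# Assumption: each of the 20 loader steps draws batch_size=4 tables from the file.
tables_per_pass = 20 * 4
print(tables_per_pass, "of", stats["num_tables"], "tables are read in one pass")
```

Under these assumptions the dump holds 4000 tables, which would explain the `4k` in the file name, while `50x3` matches the maximum table shape.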

You can also generate the data on the fly:

from tabularpriors.dataloader import TabICLPriorDataLoader
prior = TabICLPriorDataLoader(
    num_steps=20,
    batch_size=4,
    num_datapoints_max=50,
    min_features=3,
    max_features=3,
    max_num_classes=3,
    device='cpu'
)

To see an example batch, inspect next(iter(prior)).
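
Since the structure of a batch is not described here, a small generic helper can make that inspection easier. The sketch below is not part of tabularpriors: `describe` is a hypothetical function that makes no assumption about the batch layout and simply walks dicts, lists and tuples, reporting the `shape` and `dtype` of anything that exposes them (as tensors and arrays do).

```python
def describe(obj, name="batch", indent=0):
    """Return a list of lines summarising a (possibly nested) batch object."""
    pad = "  " * indent
    if isinstance(obj, dict):
        lines = [f"{pad}{name}: dict"]
        for key, value in obj.items():
            lines += describe(value, str(key), indent + 1)
        return lines
    if isinstance(obj, (list, tuple)):
        lines = [f"{pad}{name}: {type(obj).__name__} of length {len(obj)}"]
        for i, value in enumerate(obj):
            lines += describe(value, f"[{i}]", indent + 1)
        return lines
    # Tensors and arrays expose .shape and .dtype; report those instead of the values.
    shape = getattr(obj, "shape", None)
    if shape is not None:
        dtype = getattr(obj, "dtype", "?")
        return [f"{pad}{name}: {type(obj).__name__} shape={tuple(shape)} dtype={dtype}"]
    # Anything else (ints, floats, strings, None) is shown directly.
    return [f"{pad}{name}: {type(obj).__name__} = {obj!r}"]


# Demo on a plain Python object; with a real loader one would instead call
#   print("\n".join(describe(next(iter(prior)))))
print("\n".join(describe({"example_key": [1, 2.5]})))
```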

See visualization_demo.ipynb for more examples.

Supported Priors

Future work

We are planning to extend this repository by

  • adding interfaces for more priors (e.g. TabPFNv1, TabForestPFN)
  • enabling easy mixing of different priors
  • improving the storage format for pre-generated datasets
    • supporting varying number of datapoints
    • efficiently storing/loading tables with the same shape
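
Until mixing is supported directly, one way to approximate it is to interleave whole batches from several loaders. The following is only a minimal sketch of that idea, not the planned implementation: `mix_priors` is a hypothetical helper, it assumes each loader is an ordinary Python iterable of batches, and it leaves every batch untouched, so batches from different priors may differ in shape.

```python
import random


def mix_priors(loaders, weights=None, seed=0):
    """Yield batches from several iterables, choosing the source at random at each step.

    Hypothetical helper: sources are sampled in proportion to `weights`, and a
    source is dropped once it is exhausted. Iteration ends when all are exhausted.
    """
    rng = random.Random(seed)
    iterators = [iter(loader) for loader in loaders]
    weights = list(weights) if weights is not None else [1.0] * len(iterators)
    if len(weights) != len(iterators):
        raise ValueError("need exactly one weight per loader")
    while iterators:
        # Pick the index of the source that provides the next batch.
        idx = rng.choices(range(len(iterators)), weights=weights)[0]
        try:
            yield next(iterators[idx])
        except StopIteration:
            # This source is empty: remove it and keep sampling from the rest.
            del iterators[idx]
            del weights[idx]


# Demo with plain lists standing in for two prior data loaders.
for batch in mix_priors([["a1", "a2", "a3"], ["b1", "b2"]], weights=[0.7, 0.3]):
    print(batch)
```

Each source keeps its own internal order, and only the interleaving is random. Mixing individual tables within a single batch would need knowledge of the batch layout and is not attempted here.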
