Here is a brief overview of the library with links to the detailed descriptions.
Library modules:
- `ptls.preprocessing` - transforms data to a `ptls`-compatible format with `pandas` or `pyspark`: categorical encoding, datetime transformation, numerical feature preprocessing.
- `ptls.data_load` - everything you need to prepare your data for training and validation.
  - `ptls.data_load.datasets` - PyTorch `Dataset` API implementation for data access.
  - `ptls.data_load.iterable_processing` - generator-style filters for data transformation.
  - `ptls.data_load.augmentations` - functions for data augmentation.
- `ptls.frames` - tools for training encoders with popular frameworks like CoLES, SimCLR, CPC, VICReg, ...
  - `ptls.frames.coles` - contrastive learning on sub-sequences.
  - `ptls.frames.cpc` - contrastive learning for future event state prediction.
  - `ptls.frames.bert` - methods inspired by NLP and transformer models.
  - `ptls.frames.supervised` - modules for supervised training.
  - `ptls.frames.inference` - inference module.
- `ptls.nn` - layers for model creation:
  - `ptls.nn.trx_encoder` - layers that produce the representation of a single transaction.
  - `ptls.nn.seq_encoder` - layers for sequence processing, like `RNN` or `Transformer`.
  - `ptls.nn.pb` - `PaddedBatch`-compatible layers, similar to `torch.nn` modules, but working with `ptls` data.
  - `ptls.nn.head` - composite layers for final embedding transformation.
  - `ptls.nn.seq_step.py` - layers that change the sequence along the time axis.
  - `ptls.nn.binarization`, `ptls.nn.normalization` - other groups of layers.
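The generator-style filters mentioned above consume an iterable of per-user records and yield transformed records, so they compose into a pipeline. A minimal stdlib-only sketch of this idea (the filter names, record fields, and function signatures here are illustrative, not the ptls implementation):

```python
# Sketch of the generator-style filter idea: each filter wraps an
# iterable of per-user records and yields the records it keeps.
# Toy illustration only; ptls ships its own filter classes.

def seq_len_filter(records, min_len):
    """Drop users whose event sequence is shorter than min_len."""
    for rec in records:
        if len(rec["event_time"]) >= min_len:
            yield rec

def feature_selector(records, keep_keys):
    """Keep only the listed feature arrays in each record."""
    for rec in records:
        yield {k: v for k, v in rec.items() if k in keep_keys}

records = [
    {"client_id": 1, "event_time": [1, 2, 3], "amount": [10.0, 5.0, 7.0]},
    {"client_id": 2, "event_time": [4], "amount": [1.0]},
]

# Filters compose like a pipeline: the output of one feeds the next.
pipeline = feature_selector(seq_len_filter(records, min_len=2),
                            keep_keys={"event_time", "amount"})
result = list(pipeline)
print(result)  # only client 1 survives, and without its client_id field
```

Because the filters are generators, the whole chain is lazy: records stream through one at a time, which keeps memory usage flat on large datasets.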
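CoLES builds positive pairs by sampling several sub-sequences from the same user's history: slices of one sequence are positives, slices from different users are negatives. A toy sketch of the slicing step, assuming random contiguous slices (the real ptls splitters are configurable classes; this function and its parameters are made up for illustration):

```python
import random

# Toy sketch of CoLES-style sub-sequence sampling: several random
# contiguous slices of one user's sequence serve as positive views.
# Illustrative only; not the ptls splitter API.

def sample_slices(seq, n_slices, min_len, max_len, rng):
    views = []
    for _ in range(n_slices):
        length = rng.randint(min_len, min(max_len, len(seq)))
        start = rng.randint(0, len(seq) - length)
        views.append(seq[start:start + length])
    return views

rng = random.Random(0)
events = list(range(10))  # one user's event sequence
views = sample_slices(events, n_slices=3, min_len=2, max_len=5, rng=rng)
for v in views:
    print(v)  # each view is a contiguous slice of the original sequence
```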
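The `PaddedBatch` idea is that variable-length sequences are padded to a common length and carried together with their true lengths, so downstream layers can mask out the padding. A rough stdlib-only sketch of that representation (plain lists stand in for the tensors ptls actually uses):

```python
# Illustrative sketch of padding variable-length sequences into one
# rectangular batch plus a lengths vector - the idea behind PaddedBatch.
# Not the ptls API: real ptls works with torch tensors.

def pad_batch(sequences, pad_value=0):
    lengths = [len(s) for s in sequences]
    max_len = max(lengths)
    padded = [s + [pad_value] * (max_len - len(s)) for s in sequences]
    return padded, lengths

padded, lengths = pad_batch([[5, 3, 9], [7], [2, 4]])
print(padded)   # [[5, 3, 9], [7, 0, 0], [2, 4, 0]]
print(lengths)  # [3, 1, 2]
```

Keeping the lengths alongside the padded data is what lets recurrent and attention layers ignore the pad positions instead of treating them as real events.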
- Prepare your data.
  - Use `Pyspark` in local or cluster mode for big datasets and `Pandas` for small ones.
  - Split data into the required parts (train, valid, test, ...).
  - Use `ptls.preprocessing` for simple data preparation.
  - Transform features to a compatible format using `Pyspark` or `Pandas` functions. You can also use `ptls.data_load.preprocessing` for common data transformation patterns.
  - Split sequences into the `ptls`-data format with `ptls.data_load.split_tools`. Save prepared data in `Parquet` format or keep it in memory (`Pickle` also works).
  - Use one of the available `ptls.data_load.datasets` to define input for the models.
- Choose a framework for encoder training.
  - There are both supervised and unsupervised frameworks in `ptls.frames`.
  - Keep in mind that each framework requires its own batch format. Tools for batch collate can be found in the selected framework package.
- Build the encoder.
  - All parts are available in `ptls.nn`.
  - You can also use pretrained layers.
- Train your encoder with the selected framework and `pytorch_lightning`.
  - Provide data with one of the DataLoaders compatible with the selected framework.
  - Monitor the progress on TensorBoard.
  - Optionally tune hyperparameters.
- Save the trained encoder for future use.
  - You can use it as a standalone solution (e.g. to get class label probabilities).
  - Or it can be a pretrained part of another neural network.
- Use the encoder in your project.
  - Run predict on your data to get logits, probas, scores or embeddings.
  - Use the `ptls.data_load` and `ptls.data_load.datasets` tools to keep your data transformations and collect batches for inference.
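The data-preparation step above boils down to grouping a flat event table (one row per transaction) into one record per client with time-ordered feature arrays. A rough stdlib-only sketch of that transformation, with made-up column names (`ptls.preprocessing` does this for you, with extra encoding steps, on `pandas` or `pyspark` data):

```python
from collections import defaultdict

# Sketch of turning a flat event table (one row per transaction) into
# per-client records with time-ordered feature arrays - the shape of
# data that ptls datasets consume. Column names are illustrative.

rows = [
    {"client_id": "a", "event_time": 2, "mcc": 5411, "amount": 10.0},
    {"client_id": "b", "event_time": 1, "mcc": 6011, "amount": 50.0},
    {"client_id": "a", "event_time": 1, "mcc": 5912, "amount": 3.5},
]

grouped = defaultdict(list)
for row in rows:
    grouped[row["client_id"]].append(row)

records = []
for client_id, events in grouped.items():
    events.sort(key=lambda r: r["event_time"])  # order each client's events by time
    records.append({
        "client_id": client_id,
        "event_time": [e["event_time"] for e in events],
        "mcc": [e["mcc"] for e in events],
        "amount": [e["amount"] for e in events],
    })

print(records)  # one record per client, features as parallel arrays
```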
It is possible to create a specific component for every library module. Here are the links to the detailed descriptions: