Releases: pyg-team/pytorch-frame
0.3.0: Broader Compatibility & Usability Enhancements
What's Changed
- Add `Amphibians` to `torch_frame/datasets/__init__.py` by @akihironitta in #504
- Update docs by @akihironitta in #506
- Fix minor code formatting in docs by @akihironitta in #507
- Add `return_stype` argument in `TensorFrame.get_col_feat` by @rusty1s in #509
- Add simple TabPFN example by @zechengz in #510
- Add `num_bytes` utility by @rusty1s in #516
- Update .gitignore by @akihironitta in #518
- Support 0-dim tensors in tensor slicing by @rusty1s in #519
- Add default `dim` to `cat` (similar to `torch.cat`) by @rusty1s in #521
- Fix NumPy and PyTorch incompatibility error in CI by @akihironitta in #525
- Support CatBoost in Python 3.13 by @akihironitta in #523
- Only trigger automerge workflow on opening a PR by @akihironitta in #527
- Support PyTorch 2.7 by @akihironitta in #528
- Enable `flake8-bugbear` by @akihironitta in #530
- Migrate to modern logger interface by @emmanuel-ferdman in #537
- Fix auto-merge workflow by @akihironitta in #539
- Fix device mismatch in computing `num_rows` on empty tensor frames by @rusty1s in #546
- Support PyTorch 2.8 by @akihironitta in #551
- Update dependabot.yml by @akihironitta in #552
- Drop support for Python 3.9 by @akihironitta in #558
- Add license information by @jamesmyatt in #534
- Support PyTorch 2.9 by @akihironitta in #574
New Contributors
- @emmanuel-ferdman made their first contribution in #537
- @jamesmyatt made their first contribution in #534
Full Changelog: 0.2.5...0.3.0
0.2.5: Python 3.13 and PyTorch 2.6 support
What's Changed
- Add support for PyTorch 2.6 by @akihironitta in #494
- Support Python 3.12 and Python 3.13 by @akihironitta in #496
- Add copy button for code in docs by @akihironitta in #489
- CI: Consolidate unit test CI workflows by @akihironitta in #493
- CI: Add concurrency to workflows triggered on PRs by @akihironitta in #495
- Let pre-commit fix formatting issue in master by @akihironitta in #498
- Automate package build and release by @akihironitta in #497
- Fix auto-merging bot PRs by @akihironitta in #501
- lint: switch `pyupgrade` to Ruff's rule UP by @Borda in #499
- Prepare `0.2.5` release by @akihironitta in #502
Full Changelog: 0.2.4...0.2.5
PyTorch Frame 0.2.4
What's Changed
- fix multicategorical stype inference and add test case by @yiweny in #420
- correctly infer boolean stypes by @yiweny in #421
- support xgboost early stopping by @yiweny in #424
- Update testing torch version by @zechengz in #428
- Update Excelformer benchmark results on small binary and regression tasks by @zechengz in #427
- update xgboost numbers by @yiweny in #425
- Update excelformer benchmark results by @zechengz in #431
- Remove CUDA synchronizations by slicing input tensor with `int` instead of CUDA tensors in `nn.LinearEmbeddingEncoder` by @akihironitta in #432
- Don't put assertions on N/A imputation correctness by @akihironitta in #433
- Don't create the same tensor every iteration in N/A handling by @akihironitta in #434
- chore: Update pre-commit by @akihironitta in #435
- Add benchmark results for large-scale multiclass classification task by @akihironitta in #436
- Fixed warning and added safe globals by @NeelKondapalli in #423
- fix error in xgboost by @puririshi98 in #443
- Add `is_floating_point()` to multi tensors by @akihironitta in #445
- Fix size mismatch error when `CatToNumTransform` sees only a subset of labels at test time by @akihironitta in #446
- add pytorch tabular benchmark by @yiweny in #398
- Compare more models across frame and tabular by @wsad1 in #444
- Add benchmark result from `ExcelFormer` on a large-scale multi-class classification task by @akihironitta in #447
- Fail `torch.load(weights_only=True)` gracefully by @akihironitta in #448
- Fix offset in `LinearEmbeddingEncoder` by @toenshoff in #455
- Fix docs build in CI by @akihironitta in #456
- Removing the deprecated `categorical_feature` parameter from `lightgbm.train(...)` function calls by @drivanov in #454
- Tighten assert condition in graph break tests by @akihironitta in #458
- Update pytorch_tabular_benchmark.py by @wsad1 in #457
- Drop support for Python 3.8 by @akihironitta in #462
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #461
- Update benchmark numbers by @yiweny in #411
- Add support for PyTorch 2.5 by @akihironitta in #464
- Allow empty `TensorFrame` with non-zero number of rows by @rusty1s in #466
- Support index select for empty `TensorFrame` by @rusty1s in #467
- Consistent PyPI name `pytorch-frame` by @akihironitta in #468
- Raise a friendly message when a `str` is provided to `TensorFrame(col_names_dict)` instead of a `list[str]` by @akihironitta in #469
- Update README.md by @akihironitta in #471
- Materialize train test by @HoustonJ2013 in #472
- Add an example of training a tabular model on multiple GPUs by @akihironitta in #474
- Support `pin_memory()` in `Multi{Embedding,Nested}Tensor` and `TensorFrame` by @akihironitta in #437
- Run `MultiNestedTensor` tests on both CPU and GPU by @akihironitta in #476
- Optimize the `Trompt` example to reduce training time by ~30% by @akihironitta in #477
- Add dependabot and auto-merge PRs by dependabot once CI passes by @akihironitta in #478
- Bump tj-actions/changed-files from 41 to 45 by @dependabot in #479
- Bump codecov/codecov-action from 2 to 5 by @dependabot in #481
- Bump dangoslen/changelog-enforcer from 2 to 3 by @dependabot in #480
- Bump actions/labeler from 4 to 5 by @dependabot in #482
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #483
- Update `.pre-commit-config.yaml` weekly by @akihironitta in #484
- Fix documentation build by @akihironitta in #486
- Label bot PRs `skip-changelog` by @akihironitta in #487
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #485
- update version to `0.2.4` by @weihua916 in #488
New Contributors
- @NeelKondapalli made their first contribution in #423
- @puririshi98 made their first contribution in #443
- @wsad1 made their first contribution in #444
- @HoustonJ2013 made their first contribution in #472
Full Changelog: 0.2.3...0.2.4
PyTorch Frame 0.2.3
What's Changed
- Fix `test_trompt.py` by @weihua916 in #373
- Add `torchmetrics` to `pyproject.toml` full dependencies by @zechengz in #374
- Add light-weight MLP by @weihua916 in #372
- Handle label imbalance in binary classification tasks on text benchmark by @vid-koci in #376
- Fix `MLP` normalization argument by @weihua916 in #377
- Add retry to get OpenAI embeddings by @zechengz in #378
- Make `DataFrameTextBenchmark` script `pos_weight` optional by @zechengz in #379
- Fix text dataset stats and benchmark materialize return by @zechengz in #380
- Add citation by @weihua916 in #383
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #382
- Update README by @zechengz in #384
- Fix README image size by @zechengz in #385
- Add PyTorch Frame paper link to readme by @zechengz in #386
- Make sure binary classification `FakeDataset` has both pos/neg labels by @weihua916 in #392
- Update the key implementation and corresponding compatibility for ExcelFormer by @jyansir in #391
- Better error message for `CatToNumTransform` by @weihua916 in #394
- Fix `split_by_sep` in `multicategorical` stype by @weihua916 in #395
- add support for autoinfer bool type by @yiweny in #399
- Add R2 metric by @rishabh-ranjan in #403
- [FutureWarn] Fix FutureWarning in `CategoricalTensorMapper` by @drivanov in #401
- fix readme link by @yiweny in #407
- update benchmark by @yiweny in #400
- Add `MovieLens 1M` dataset by @xnuohz in #397
- Fixing Bug in Version Handling by @drivanov in #410
- update benchmark numbers by @yiweny in #408
- [UserWarning] Fixing UserWarnings in two tests. by @drivanov in #409
- fix embedding script by @yiweny in #412
- Allow column indexing with custom stypes by @rusty1s in #413
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #414
- Fix ExcelFormer Example Link by @crunai in #415
- Towards supporting `MultiCategorical` encoder for target in torchframe by @XinweiHe in #417
- Update to version `0.2.3` by @weihua916 in #418
New Contributors
- @jyansir made their first contribution in #391
- @rishabh-ranjan made their first contribution in #403
- @drivanov made their first contribution in #401
- @crunai made their first contribution in #415
Full Changelog: 0.2.2...0.2.3
PyTorch Frame 0.2.2
This release introduces the `image_embedded` stype to handle image columns, fixes bugs in `MultiNestedTensor` indexing, and improves the efficiency of missing-value imputation and categorical column encoders.
Added
- Avoided for-loop in `EmbeddingEncoder` (#366)
- Added `image_embedded` and one tabular image dataset (#344)
- Added benchmarking suite for encoders (#360)
- Added dataframe text benchmark script (#354, #367)
- Added `DataFrameTextBenchmark` dataset (#349)
- Added support for empty `TensorFrame` (#339)
Changed
- Changed the workflow of Encoder's `na_forward` method, resulting in a performance boost (#364)
- Removed ReLU applied in `FCResidualBlock` (#368)
PyTorch Frame 0.2.1
This release makes the following fixes and extensions to 0.2.0.
Added
- Support more stypes in `LinearModelEncoder` (#325)
- Added `stype_encoder_dict` to some models (#319)
- Added `HuggingFaceDatasetDict` (#287)
Changed
- Supported decoder embedding model in `examples/transformers_text.py` (#333)
- Removed implicit clones in `StypeEncoder` (#286)
PyTorch Frame 0.2.0
We are excited to announce the second release of PyTorch Frame πΆ
PyTorch Frame 0.2.0 is the culmination of work from many contributors inside and outside Kumo, totaling over 120 commits since `torch-frame==0.1.0`.
PyTorch Frame is featured in the Relational Deep Learning paper and used as the encoding layer for PyG.
Kumo is also hiring interns working on cool deep learning projects. If you are interested, feel free to apply through this link.
If you have any questions or would like to contribute to PyTorch Frame, feel free to ask in our Slack channel.
Highlights
Support for `multicategorical`, `timestamp`, `text_tokenized`, and `embedding` stypes
We have added support for four more semantic types, allowing more flexibility in encoding raw data. To learn how to specify different semantic types for your data, take a look at the tutorial. We also added many new `StypeEncoder` classes for the new semantic types.
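To make the new stypes concrete, here is a minimal, library-free sketch of the kind of raw data they describe: a `multicategorical` cell holds several separator-joined labels, and a `timestamp` cell holds a date that decomposes into calendar parts. The column names and parsing helpers below are hypothetical illustrations, not PyTorch Frame APIs.

```python
from datetime import datetime

# Hypothetical raw rows: a multicategorical column ("genres") holds
# separator-joined labels; a timestamp column ("release") holds dates.
rows = [
    {"genres": "action|comedy", "release": "2021-05-01"},
    {"genres": "drama", "release": "2019-11-20"},
]

def split_multicategorical(value, sep="|"):
    # A multicategorical cell expands to a list of category labels.
    return value.split(sep)

def timestamp_parts(value):
    # A timestamp cell decomposes into calendar components that an
    # encoder can embed (year/month/day here; real encoders use more).
    dt = datetime.strptime(value, "%Y-%m-%d")
    return (dt.year, dt.month, dt.day)

parsed = [
    (split_multicategorical(r["genres"]), timestamp_parts(r["release"]))
    for r in rows
]
```

In PyTorch Frame itself, you would instead tag such columns with the corresponding stype when constructing a dataset and let the matching `StypeEncoder` handle the encoding.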
Integration with Large Language Models
We now support two types of integration with LLMs: embedding and fine-tuning.
You can use any embeddings generated by LLMs with PyTorch Frame, either by directly feeding the embeddings as raw data of the `embedding` stype, or by using text as raw data of the `text_embedded` stype and specifying a `text_embedder` for each column. Here is an example of how to use PyTorch Frame with text embeddings generated by OpenAI, Cohere, VoyageAI, and HuggingFace transformers.
`text_tokenized` enables users to fine-tune large language models on text columns, alongside other types of raw tabular data, for any downstream task. In this example, we fine-tune the full `distilbert-base-uncased` model as well as a LoRA variant.
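The per-column `text_embedder` is essentially a callable that maps a batch of strings to fixed-size vectors. The toy embedder below is a hypothetical, deterministic stand-in (hash-based, no model calls) just to show the interface shape; a real setup would call an embedding model here instead.

```python
import hashlib

def toy_text_embedder(texts, dim=8):
    # Hypothetical stand-in for a per-column text embedder: maps a
    # batch of strings to fixed-size float vectors deterministically.
    # A real setup would call an embedding model (OpenAI, Cohere,
    # a sentence-transformer, ...) here instead of hashing.
    vectors = []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        vectors.append([b / 255.0 for b in digest[:dim]])
    return vectors

embeddings = toy_text_embedder(["great product", "arrived broken"])
```

Any callable with this batch-of-strings-in, matrix-out shape can be plugged in per text column, which is what makes swapping embedding providers cheap.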
More Benchmarks
We added more benchmark results in the benchmark section. LightGBM is included in the list of GBDTs that we compare with the deep learning models. We did initial experiments on various LLMs as well.
Breaking Changes
- `text_tokenized_cfg` and `text_embedder_cfg` are renamed to `col_to_text_tokenized_cfg` and `col_to_text_embedder_cfg`, respectively (#257). This allows users to specify different embedders and tokenizers for different text columns.
- `Trompt` now outputs 2-dim embeddings in `forward`.
Features
- We now support the following new encoders: `LinearEmbeddingEncoder` for the `embedding` stype, `TimestampEncoder` for the `timestamp` stype, and `MultiCategoricalEmbeddingEncoder` for the `multicategorical` stype.
- `LightGBM` is added to the GBDTs module.
- Auto-inference of stypes from raw DataFrame columns is supported through the `infer_df_stype` function. However, the correctness of the inference is not guaranteed, and we suggest that you double-check the results.
Bugfixes
We fixed the `in_channels` calculation of `ResNet` (#220) and improved the overall user experience on handling dirty data (#171, #234, #264).
Full Changelog
Full Changelog: 0.1.0...0.2.0
PyTorch Frame 0.1.0
We are excited to announce the initial release of PyTorch Frame πππ
PyTorch Frame is a deep learning extension for PyTorch, designed for heterogeneous tabular data with different column types, including numerical, categorical, time, text, and images.
To get started, please refer to:
- our README.md for the overview of PyTorch Frame,
- "Introduction by Example" tutorial and its code at `examples/tutorial.py` to get started with using PyTorch Frame, and
- "Modular Design of Deep Tabular Models" tutorial in our documentation, along with the existing implementations in the `torch_frame/nn/models/` directory, to create your own PyTorch Frame model for tabular data.
Highlights
Models, datasets and examples
In our initial release, we introduce 6 models, 9 feature encoders, 5 table convolution layers, 3 decoders, and 14 datasets.
- Models
  - `Trompt`: "Trompt: Towards a Better Deep Neural Network for Tabular Data" (`examples/trompt.py`)
  - `FTTransformer`: "Revisiting Deep Learning Models for Tabular Data" (`examples/ft_transformer_text.py`)
  - `ExcelFormer`: "ExcelFormer: A Neural Network Surpassing GBDTs on Tabular Data" (`examples/excelformer.py`)
  - `TabNet`: "TabNet: Attentive Interpretable Tabular Learning" (`examples/tabnet.py`)
  - `ResNet`: "Revisiting Deep Learning Models for Tabular Data" (`examples/revisiting.py`)
  - `TabTransformer`: "TabTransformer: Tabular Data Modeling Using Contextual Embeddings" (`examples/tab_transformer.py`)
- Encoders
  - `FeatureEncoder`
  - `StypeWiseFeatureEncoder`
  - `StypeEncoder`
  - `EmbeddingEncoder`
  - `LinearEncoder`
  - `LinearBucketEncoder`: "On Embeddings for Numerical Features in Tabular Deep Learning"
  - `LinearPeriodicEncoder`: "On Embeddings for Numerical Features in Tabular Deep Learning"
  - `ExcelFormerEncoder`: "ExcelFormer: A Neural Network Surpassing GBDTs on Tabular Data"
  - `StackEncoder`
- Table Convolution Layers
  - `TableConv`
  - `FTTransformerConvs`: "Revisiting Deep Learning Models for Tabular Data"
  - `TromptConv`: "Trompt: Towards a Better Deep Neural Network for Tabular Data"
  - `ExcelFormerConv`: "ExcelFormer: A Neural Network Surpassing GBDTs on Tabular Data"
  - `TabTransformerConv`: "TabTransformer: Tabular Data Modeling Using Contextual Embeddings"
- Decoders
- Datasets: `AdultCensusIncome`, `BankMarketing`, `DataFrameBenchmark`, `Dota2`, `FakeDataset`, `ForestCoverType`, `KDDCensusIncome`, `Mercari`, `MultimodalTextBenchmark`, `Mushroom`, `PokerHand`, `TabularBenchmark`, `Titanic`, `Yandex`
Benchmarks
With our initial set of models and datasets under `torch_frame.nn` and `torch_frame.datasets`, we benchmarked their performance on binary classification and regression tasks. Rows denote model names and columns denote dataset indices. Each cell reports the mean and standard deviation of model performance, as well as the total time spent, including Optuna-based hyper-parameter search and final model training.
Note
- For the latest benchmark scripts and results, see [`benchmark/`](https://github.com/pyg-team/pytorch-frame/tree/master/benchmark#leaderbo...