All notable changes to this project will be documented in this file.
- Add contrastive loss pre-training for graph-level datasets like PCQM4M-v2
- Add inference of graph-level embeddings with models trained with the contrastive loss
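As a rough illustration of the kind of objective this refers to, a minimal NumPy sketch of an InfoNCE / NT-Xent-style contrastive loss over two views of a batch of graph embeddings (the exact loss, temperature, and batching used by the repo are assumptions, not its actual implementation):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE / NT-Xent between two views of the same batch.

    z1, z2: (batch, dim) embeddings; row i of z1 and row i of z2
    are treated as a positive pair, all other rows as negatives.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (batch, batch) cosine sims
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal.
    return -np.mean(np.diag(log_probs))
```

Lower loss means the matched pairs are more similar than the mismatched ones.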
- Code refactoring for edge-level example `ogbl-ppa`
- Add generation functionality analogous to discrete diffusion LMs after pre-training with the `pretrain-mlm` objective
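As a toy sketch of the general idea (not this repo's decoding code): discrete-diffusion-style generation with an MLM can be viewed as iterative unmasking, starting from an all-`<mask>` sequence and committing a fraction of the model's predictions each step. The `predict` callable and the commit schedule below are placeholders; real implementations usually commit the most confident predictions rather than random positions:

```python
import random

def iterative_unmask(length, predict, mask, steps=4, rng=None):
    """Start fully masked, then repeatedly let the model fill positions,
    committing roughly 1/steps of the sequence per step (random positions
    here; confidence-based selection in practice)."""
    rng = rng or random.Random()
    seq = [mask] * length
    masked = list(range(length))
    for step in range(steps, 0, -1):
        preds = predict(seq)                 # model fills every slot
        k = max(1, len(masked) // step)      # commit a fraction this step
        for i in rng.sample(masked, k):
            seq[i] = preds[i]
            masked.remove(i)
        if not masked:
            break
    return seq
```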
- Manage configurations using `omegaconf`, `hydra`, and YAML files.
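Such setups typically compose Hydra-style config groups; a hypothetical minimal layout (file name, group names, and keys invented here purely for illustration):

```yaml
# conf/config.yaml (hypothetical layout)
defaults:
  - model: base          # selects conf/model/base.yaml
  - dataset: pcqm4m_v2   # selects conf/dataset/pcqm4m_v2.yaml

trainer:
  max_epochs: 100
  precision: bf16
```

Individual keys can then be overridden from the command line, e.g. `trainer.max_epochs=50`.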
- Upgrade `python` to 3.10
- Upgrade `pytorch` to 2.5.1
- Upgrade `transformers` to 4.53.3
- Refactor model architectures
- Release 4 checkpoints for the PCQM4M-v2 dataset in ModelScope
- Code refactoring.
- Update paper.
- Add edge-level examples `ogbl-citation2` and `ogbl-wikikg2`
- Add node-level example `ogbn-products`
- Code refactoring.
- Update README to include details of the Eulerian sequence
- Add drop path to regularize large models; it works well for deep models
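Drop path (stochastic depth) zeroes a residual branch for a random subset of samples during training and rescales the survivors; a NumPy sketch of the standard formulation (a generic illustration, not this repo's code):

```python
import numpy as np

def drop_path(x, drop_prob, training=False, rng=None):
    """Stochastic depth: drop the residual branch per sample.

    x: (batch, ...) branch output; drop_prob in [0, 1).
    """
    if not training or drop_prob == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per sample, broadcast over remaining dims.
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = rng.random(shape) < keep_prob
    # Rescale kept samples so the expected value is unchanged.
    return x * mask / keep_prob
```

At inference time the branch is left untouched, so no extra cost is paid.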
- Add one package dependency: `timm`, to implement EMA
- Update README to include details of the Eulerian sequence and cyclic node re-indexing
- Code refactoring.
- Refactor tokenization config JSON.
- Update vocab by adding some special tokens, e.g., `<bos>`, `<new>`, `<mask>`, etc.
- Turn off optimizer offload in the DeepSpeed config to boost training speed.
- Add toy examples using dataset `TUDataset/reddit_threads`.
- Add `src/utils/dataset_utils.py::StructureDataset` to generate random non-attributed graphs for structural-understanding pre-training.
- Add a new type of node re-indexing: `cyclic`.
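The exact scheme is defined in the repo's code; as a guess at the general idea, cyclic re-indexing shifts node indices by an offset and wraps them around the index vocabulary, so the same node can receive different absolute indices across samples (the function below is a hypothetical sketch):

```python
def cyclic_reindex(node_ids, num_index_tokens, offset):
    """Shift every node index by `offset` and wrap modulo the index
    vocabulary size (hypothetical sketch of 'cyclic' re-indexing)."""
    return [(i + offset) % num_index_tokens for i in node_ids]
```

Randomizing `offset` per sample would prevent the model from tying node identity to absolute index values.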
- Add a new pre-train objective, i.e., Masked Language Modeling (MLM) from BERT.
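The objective presumably follows BERT's standard masking recipe (select ~15% of positions; of those, 80% become `<mask>`, 10% a random token, 10% unchanged). A self-contained sketch, with the usual `-100` ignore label assumed for unselected positions:

```python
import random

def mlm_mask(tokens, mask_id, vocab_size, mask_prob=0.15, rng=None):
    """BERT-style corruption: returns (corrupted tokens, labels),
    where labels = -100 at positions not selected for prediction."""
    rng = rng or random.Random()
    out, labels = list(tokens), [-100] * len(tokens)
    for i, t in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = t                    # this position is a target
            r = rng.random()
            if r < 0.8:
                out[i] = mask_id             # 80%: replace with <mask>
            elif r < 0.9:
                out[i] = rng.randrange(vocab_size)  # 10%: random token
            # remaining 10%: keep the original token
    return out, labels
```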
- Code refactoring.
- Upgrade two package dependencies: `deepspeed==0.14.0` and `transformers==4.38.2`.
- Add one package dependency: `tensorboardX`.