
Awesome-Weight-Space-Learning

This repo will be continuously updated. Don't forget to star it and stay tuned!

Weight Space Learning

Weight Space Learning is a research perspective that shifts focus from studying neural networks only through their input–output functions to directly analyzing and leveraging their parameters. Unlike conventional training, which treats weights merely as optimization variables, weight space learning regards them as a meaningful domain of study and operation. Existing works in this area can be organized along three complementary dimensions: (1) weight space understanding, which investigates the geometry, symmetry, and statistical properties of weights; (2) weight space discrimination, which treats weights as a modality for tasks such as embedding, retrieval, and behavior prediction; and (3) weight space generation, which explores how new parameters can be produced via generative models, hypernetworks, or model merging. This framing highlights weight space learning as distinct from function-space or purely optimization-centric views, aiming to build a systematic foundation for reasoning about, operating on, and reusing neural network parameters.

Table of Contents

Weight Space Understanding

Structural Foundations

Invariance

  • [ICML 24] Improved generalization of weight space networks via augmentations [PDF] [Code]
  • [NeurIPS-NeurReps 23] Data Augmentations in Deep Weight Spaces [PDF]
  • [2021] Lossless Compression of Structured Convolutional Models via Lifting [PDF] [Code]
  • [ICLR 23] Git Re-Basin: Merging Models modulo Permutation Symmetries [PDF] [Code]
  • [ICLR 22] The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks [PDF] [Code]
  • [2022] Weight-space symmetry in deep networks gives rise to permutation saddles, connected by equal-loss valleys across the loss landscape [PDF]
  • [ICML 21] Geometry of the loss landscape in overparameterized neural networks: Symmetries and invariances [PDF]
  • [2025] Understanding Mode Connectivity via Parameter Space Symmetry [PDF]
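The permutation symmetry studied in the papers above is easy to verify directly: permuting the hidden units of an MLP (rows of the first weight matrix and bias, matching columns of the second) yields a different point in weight space that computes exactly the same function. A minimal pure-Python sketch (toy network, illustrative names):

```python
import random

# Toy 1-hidden-layer MLP: y = W2 @ relu(W1 @ x + b1) + b2
def mlp(x, W1, b1, W2, b2):
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(W2, b2)]

def permute_hidden(W1, b1, W2, perm):
    # Apply the same permutation to: rows of W1, entries of b1, columns of W2.
    W1p = [W1[p] for p in perm]
    b1p = [b1[p] for p in perm]
    W2p = [[row[p] for p in perm] for row in W2]
    return W1p, b1p, W2p

random.seed(0)
n_in, n_hid, n_out = 3, 4, 2
W1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]
b1 = [random.uniform(-1, 1) for _ in range(n_hid)]
W2 = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_out)]
b2 = [random.uniform(-1, 1) for _ in range(n_out)]

x = [0.5, -1.0, 2.0]
perm = [2, 0, 3, 1]
W1p, b1p, W2p = permute_hidden(W1, b1, W2, perm)

y = mlp(x, W1, b1, W2, b2)     # original weights
yp = mlp(x, W1p, b1p, W2p, b2)  # permuted weights, same function
assert all(abs(a - b) < 1e-12 for a, b in zip(y, yp))
```

A hidden layer of width n admits n! such permutations, which is why naive weight-space interpolation or merging fails without first aligning these symmetries (the subject of the re-basin and mode-connectivity papers above).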

Equivariance

  • [2021] Universal approximation and model compression for radial neural networks [PDF]
  • [2025] Generalized Linear Mode Connectivity for Transformers [PDF]
  • [ICML 23] Equivariant Architectures for Learning in Deep Weight Spaces [PDF] [Code]
  • [NeurIPS 23] Permutation Equivariant Neural Functionals [PDF] [Code]
  • [NeurIPS 24] Universal neural functionals [PDF] [Code]
  • [CVPR 25] Few-shot Implicit Function Generation via Equivariance [PDF] [Code]
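Permutation equivariance, the property these architectures build into weight-space networks, means that permuting the input permutes the output the same way. The classic DeepSets-style layer f(x)_i = a·x_i + b·Σ_j x_j has this property; a toy check (coefficients chosen arbitrarily for illustration):

```python
def equivariant_layer(xs, a=0.7, b=0.3):
    # Each output mixes its own input with a permutation-invariant sum.
    s = sum(xs)
    return [a * x + b * s for x in xs]

xs = [1.0, 2.0, 3.0, 4.0]
perm = [3, 1, 0, 2]
out = equivariant_layer(xs)
out_perm = equivariant_layer([xs[p] for p in perm])
# Permuting the input permutes the output identically.
assert out_perm == [out[p] for p in perm]
```

The equivariant weight-space architectures above generalize this idea from sets of scalars to the structured permutation symmetries of entire weight matrices.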

Practical Implications

Model Compression

  • [2021] Lossless Compression of Structured Convolutional Models via Lifting [PDF] [Code]
  • [2021] Universal approximation and model compression for radial neural networks [PDF]
  • [CVPR 21] Permute, quantize, and fine-tune: Efficient compression of neural networks [PDF]
  • [MIPR 24] TQCompressor: improving tensor decomposition methods in neural networks via permutations [PDF]
  • [ICLR 24] Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy [PDF]

Model Optimization

  • [TMLR 23] Weight-balancing fixes and flows for deep learning [PDF]
  • [NeurIPS 15] Path-SGD: Path-Normalized Optimization in Deep Neural Networks [PDF] [Code]
  • [ICLR 19] G-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space [PDF]
  • [PR 20] Projection based weight normalization: Efficient method for optimization on oblique manifold in DNNs [PDF]
  • [UAI 22] Accelerating training of batch normalization: A manifold perspective [PDF]
  • [Mathematics 23] Neural optimizer adaptations for weight spaces [PDF]
  • [NeurIPS 20] Optimizing deep models: practical methods [PDF] [Code]

Weight Space Augmentation

  • [NeurIPS-NeurReps 23] Data Augmentations in Deep Weight Spaces [PDF]
  • [ICML 24] Improved generalization of weight space networks via augmentations [PDF] [Code]
  • [CVPR 25] Few-shot Implicit Function Generation via Equivariance [PDF] [Code]
  • [ICML 24] Equivariant Deep Weight Space Alignment [PDF] [Code]

Weight Space Representation

Representation Approaches

Model-based

  • [ECAI 20] Classifying the classifier: dissecting the weight space of neural networks [PDF] [Code]
  • [2020] Predicting Neural Network Accuracy from Weights [PDF] [Code]
  • [Nature Communications 21] Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data [PDF] [Code]
  • [ICML 23] Equivariant Architectures for Learning in Deep Weight Spaces [PDF] [Code]
  • [NeurIPS 23] Permutation Equivariant Neural Functionals [PDF] [Code]
  • [NeurIPS 24] Universal neural functionals [PDF] [Code]
  • [NeurIPS 17] Deep sets [PDF] [Code]
  • [ICML 18] Deep models of interactions across sets [PDF]
  • [NeurIPS 21] Self-supervised representation learning on neural network weights for model characteristic prediction [PDF] [Code]
  • [NeurIPS 23] Neural Functional Transformers [PDF] [Code]
  • [ICML 25] Equivariant Polynomial Functional Networks [PDF] [Code]
  • [NeurIPS 24] Monomial matrix group equivariant neural functional networks [PDF] [Code]
  • [ICML 25] Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion [PDF] [Code]
  • [ICLR 25] Revisiting Multi-Permutation Equivariance through the Lens of Irreducible Representations [PDF] [Code]
  • [ICLR 23] NeRN: Learning Neural Representations for Neural Networks [PDF] [Code]
  • [NeurIPS 24] Set-based Neural Network Encoding Without Weight Tying [PDF]
  • [ICML-TAGML 23] On genuine invariance learning without weight-tying [PDF]
  • [ICLR 24] Graph Neural Networks for Learning Equivariant Representations of Neural Networks [PDF] [Code]
  • [ICLR 24] Graph Metanetworks for Processing Diverse Neural Architectures [PDF]
  • [NeurIPS 24] Scale equivariant graph metanetworks [PDF] [Code]
  • [2025] Weight Space Representation Learning on Diverse NeRF Architectures [PDF]

Model-free

  • [2025] Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights [PDF]
  • [2024] Deep Linear Probe Generators for Weight Space Learning [PDF]
  • [ICLR 24] Graph Neural Networks for Learning Equivariant Representations of Neural Networks [PDF] [Code]
  • [ICML 24] Learning Useful Representations of Recurrent Neural Network Weight Matrices [PDF]
  • [CVPR 25] Learning on Model Weights using Tree Experts [PDF] [Code]

Practical Implications

Function Prediction

  • [ECAI 20] Classifying the Classifier: Dissecting the Weight Space of Neural Networks [PDF] [Code]
  • [2020] Predicting Neural Network Accuracy from Weights [PDF] [Code]
  • [ICLR 24] Graph Metanetworks for Processing Diverse Neural Architectures [PDF]
  • [ICLR 24] Graph Neural Networks for Learning Equivariant Representations of Neural Networks [PDF] [Code]
  • [2024] Deep Linear Probe Generators for Weight Space Learning [PDF]
  • [NeurIPS 22] Model Zoos: A Dataset of Diverse Populations of Neural Network Models [PDF] [Code]
  • [ICLR-SNN 23] Sparsified Model Zoo Twins: Investigating Populations of Sparsified Neural Network Models [PDF] [Code]
  • [ICML 24] Towards Scalable and Versatile Weight Space Learning [PDF] [Code]
  • [2025] Learning Model Representations Using Publicly Available Model Hubs [PDF]

Model Retrieval

  • [2025] Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights [PDF]
  • [CVPR 25] Learning on Model Weights using Tree Experts [PDF] [Code]
  • [2025] We Should Chart an Atlas of All the World's Models [PDF] [Code]

Model Editing

  • [NeurIPS 24] Interpreting the Weight Space of Customized Diffusion Models [PDF] [Code]
  • [ICML 24] Towards Scalable and Versatile Weight Space Learning [PDF] [Code]
  • [NeurIPS 23] Permutation Equivariant Neural Functionals [PDF] [Code]
  • [ICLR 24] Graph Metanetworks for Processing Diverse Neural Architectures [PDF]
  • [ICLR 24] Graph Neural Networks for Learning Equivariant Representations of Neural Networks [PDF] [Code]
  • [NeurIPS 23] Neural Functional Transformers [PDF] [Code]

Weight Space Generation

Generation Approaches

Hypernetworks

  • [ICLR 17] HyperNetworks [PDF] [Code]
  • [2017] Bayesian Hypernetworks [PDF]
  • [2017] Implicit weight uncertainty in neural networks [PDF] [Code]
  • [ICLR 18] SMASH: One-Shot Model Architecture Search through HyperNetworks [PDF] [Code]
  • [ICLR 19] Graph HyperNetworks for Neural Architecture Search [PDF]
  • [NeurIPS 21] Parameter prediction for unseen deep architectures [PDF] [Code]
  • [ICLR 20] Continual learning with hypernetworks [PDF] [Code]
  • [ECCV 20] DHP: Differentiable Meta Pruning via HyperNetworks [PDF] [Code]
  • [CVPR 22] Sylph: A Hypernetwork Framework for Incremental Few-shot Object Detection [PDF]
  • [TIP 24] Learning to Generate Parameters of ConvNets for Unseen Image Data [PDF] [Code]
  • [ICCV 19] Deep meta functionals for shape representation [PDF]
  • [ICML 21] Personalized Federated Learning using Hypernetworks [PDF] [Code]
  • [NeurIPS-ML 21] Meta-learning via hypernetworks [PDF]
  • [CVPR 21] HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation [PDF]
  • [ACL-IJCNLP 21] Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks [PDF] [Code]
  • [CVPR 24] Hyperdreambooth: Hypernetworks for fast personalization of text-to-image models [PDF] [Code]
  • [CVPR 22] HyperStyle: StyleGAN Inversion With HyperNetworks for Real Image Editing [PDF] [Code]
  • [AAAI 24] Hypereditor: Achieving both authenticity and cross-domain capability in image editing via hypernetworks [PDF]
  • [CVPR 22] HyperInverter: Improving StyleGAN Inversion via Hypernetwork [PDF] [Code]
  • [NeurIPS 22] Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks [PDF]
  • [ICML 22] HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning [PDF] [Code]
  • [2018] Approximating the predictive distribution via adversarially-trained hypernetworks [PDF]
  • [ICML 19] HyperGAN: A Generative Model for Diverse, Performant Neural Networks [PDF] [Code]
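The core idea shared by the works above is that a hypernetwork is itself a network whose output is the weights of a target network, conditioned on some embedding (a task, layer, or data descriptor). A minimal sketch, assuming the simplest possible case of a linear hypernetwork generating a linear target layer; the names `H`, `generate_weights`, and `target_forward` are illustrative, not from any paper above:

```python
import random

random.seed(1)

# Hypernetwork: a linear map from a task embedding z (dim z_dim) to the
# flattened weights of a target layer y = W @ x, with W of shape (n_out, n_in).
n_in, n_out, z_dim = 3, 2, 4
H = [[random.uniform(-0.5, 0.5) for _ in range(z_dim)]
     for _ in range(n_in * n_out)]  # the hypernetwork's own parameters

def generate_weights(z):
    # Flat weight vector, then reshape into the target layer's weight matrix.
    flat = [sum(h * zi for h, zi in zip(row, z)) for row in H]
    return [flat[i * n_in:(i + 1) * n_in] for i in range(n_out)]

def target_forward(x, z):
    W = generate_weights(z)  # weights are generated, not stored
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Different task embeddings yield different target-layer weights.
x = [1.0, 2.0, 3.0]
y_a = target_forward(x, [1, 0, 0, 0])
y_b = target_forward(x, [0, 1, 0, 0])
```

In practice the generator is a deep network and training backpropagates through the generated weights into `H`, so one hypernetwork amortizes over a whole family of target models.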

Generative Models

  • [NeurIPS 21] Self-supervised representation learning on neural network weights for model characteristic prediction [PDF] [Code]
  • [NeurIPS 22] Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights [PDF] [Code]
  • [ICLR-WSL 25] Instruction-Guided Autoregressive Neural Network Parameter Generation [PDF]
  • [ICLR-WSL 25] Structure Is Not Enough: Leveraging Behavior for Neural Network Weight Reconstruction [PDF] [Code]
  • Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights [PDF] [Code]
  • [IJCAI 25] In-Context Meta LoRA Generation [Code]
  • [ICML 24] Towards Scalable and Versatile Weight Space Learning [PDF] [Code]
  • [2025] Learning Model Representations Using Publicly Available Model Hubs [PDF]
  • [ICLR-WSL 25] Flow to Learn: Flow Matching on Neural Network Parameters [PDF]
  • [2025] NeuroGen: Neural Network Parameter Generation via Large Language Models [PDF]
  • [2022] Learning to Learn with Generative Models of Neural Network Checkpoints [PDF] [Code]
  • [ICCV 23] HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion [PDF] [Code]
  • [ICLR 24] Spatio-Temporal Few-Shot Learning via Diffusive Neural Network Generation [PDF] [Code]
  • [2024] BEND: Bagging Deep Learning Training Based on Efficient Neural Network Diffusion [PDF]
  • [2024] Neural Network Diffusion [PDF] [Code]
  • [MM 25] Text2Weight: Bridging Natural Language and Neural Network Weight Spaces [PDF]
  • [2024] DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion [PDF]
  • [2025] Recurrent Diffusion for Large-Scale Parameter Generation [PDF] [Code]
  • [2025] ORAL: Prompting Your Large-Scale LoRAs via Conditional Recurrent Diffusion [PDF]
  • [ICLR 25] Diffusion-Based Neural Network Weights Generation [PDF] [Code]

Practical Implications

Conditional Weight Generation

  • [CVPR 22] Sylph: A Hypernetwork Framework for Incremental Few-shot Object Detection [PDF]
  • [TMLR 23] Meta-Learning via Classifier(-free) Diffusion Guidance [PDF]
  • [MM 25] Text2Weight: Bridging Natural Language and Neural Network Weight Spaces [PDF]
  • [2025] Continual Adaptation: Environment-Conditional Parameter Generation for Object Detection in Dynamic Scenarios [PDF]

Real-Time Weight Optimization

  • [CVPR 21] HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation [PDF]
  • [CVPR 22] HyperStyle: StyleGAN Inversion With HyperNetworks for Real Image Editing [PDF] [Code]
  • [ICML-EXAIT 25] Reimagining Parameter Space Exploration with Diffusion Models [PDF]

Model Merging

  • [NeurIPS 22] Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights [PDF] [Code]
  • [2025] Generative Modeling of Weights: Generalization or Memorization? [PDF] [Code]
  • [ICML 24] Equivariant Deep Weight Space Alignment [PDF] [Code]
  • [ICML 23] Equivariant Architectures for Learning in Deep Weight Spaces [PDF] [Code]
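The merging pipeline implicit in the works above is: first align the models' permutation (or rotation) symmetries, then combine the aligned weights, often by simple coordinate-wise averaging. The averaging step reduces to a one-liner; this sketch assumes the models are already aligned and represented as flat weight vectors (names illustrative):

```python
def average_weights(models):
    # Coordinate-wise mean of flattened, already-aligned weight vectors.
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]

# Two toy "models" as flat weight vectors (assumed permutation-aligned).
m1 = [0.25, -1.0, 3.0, 0.5]
m2 = [0.75,  1.0, 1.0, 0.5]
merged = average_weights([m1, m2])
assert merged == [0.5, 0.0, 2.0, 0.5]
```

Without the alignment step this average interpolates between symmetry-equivalent but differently-permuted solutions and typically lands in a high-loss region, which is precisely the barrier the equivariant alignment papers above remove.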

Weight Initialization

  • [CVPR 24] Hyperdreambooth: Hypernetworks for fast personalization of text-to-image models [PDF] [Code]
  • [2022] Learning to Learn with Generative Models of Neural Network Checkpoints [PDF] [Code]
  • [ICML 23] Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models? [PDF] [Code]
  • [ICLR 25] Accelerating Training with Neuron Interaction and Nowcasting Networks [PDF] [Code]

Training Acceleration

  • [ICML 23] Learning to Boost Training by Periodic Nowcasting Near Future Weights [PDF] [Code]
  • [ICLR 25] Accelerating Training with Neuron Interaction and Nowcasting Networks [PDF] [Code]

Data Generation

  • [AISTATS 22] Generative Models as Distributions of Functions [PDF] [Code]
  • [ECCV 24] Neural Metamorphosis [PDF] [Code]
  • [CVPR 21] pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis [PDF] [Code]

Applications to Related Domains

Implicit Neural Representations

  • [NeurIPS-NeurReps 23] Data Augmentations in Deep Weight Spaces [PDF]
  • [ICML 24] Improved Generalization of Weight Space Networks via Augmentations [PDF] [Code]
  • [CVPR 25] Few-shot Implicit Function Generation via Equivariance [PDF] [Code]
  • [NeurIPS 23] Neural Functional Transformers [PDF] [Code]
  • [ICLR 23] Deep Learning on Implicit Neural Representations of Shapes [PDF] [Code]
  • [ICML 22] From data to functa: Your data point is a function and you can treat it like one [PDF] [Code]
  • Spatial Functa: Scaling Functa to ImageNet Classification and Generation [PDF] [Code]
  • From MLP to NeoMLP: Leveraging Self-Attention for Neural Fields [PDF] [Code]
  • [NeurIPS 20] Graf: Generative radiance fields for 3d-aware image synthesis [PDF] [Code]
  • [CVPR 21] pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis [PDF] [Code]
  • [ICCV 23] HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion [PDF] [Code]
  • [ECCV 24] Neural Metamorphosis [PDF] [Code]
  • [CVPR 25] End-to-End Implicit Neural Representations for Classification [PDF] [Code]

Model Unification

  • [NeurIPS 21] Learning signal-agnostic manifolds of neural fields [PDF] [Code]
  • [ICML 22] From data to functa: Your data point is a function and you can treat it like one [PDF] [Code]
  • Spatial Functa: Scaling Functa to ImageNet Classification and Generation [PDF] [Code]
  • GNN-based Unified Deep Learning [PDF] [Code]

Continual Learning

  • [ICLR 20] Continual learning with hypernetworks [PDF] [Code]
  • [NeurIPS 24] Weight Diffusion for Future: Learn to Generalize in Non-Stationary Environments [PDF] [Code]
  • [2025] Continual Adaptation: Environment-Conditional Parameter Generation for Object Detection in Dynamic Scenarios [PDF]

Meta Learning

  • [AAAI 24] MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning [PDF]
  • Learning to Learn Weight Generation via Local Consistency Diffusion [PDF]
  • [NeurIPS-ML 21] Meta-learning via hypernetworks [PDF]

Federated Learning

  • [CIKM 24] Beyond Aggregation: Efficient Federated Model Consolidation with Heterogeneity-Adaptive Weights Diffusion [PDF]
  • [ICML 21] Personalized Federated Learning using Hypernetworks [PDF] [Code]
  • [AAAI 25] pFedGPA: Diffusion-based Generative Parameter Aggregation for Personalized Federated Learning [PDF]

Neural Architecture Search

  • [ICLR 19] Graph HyperNetworks for Neural Architecture Search [PDF]
  • [NeurIPS 21] Parameter prediction for unseen deep architectures [PDF] [Code]

Benchmarks

Model Zoo

  • [NeurIPS 22] Model Zoos: A Dataset of Diverse Populations of Neural Network Models [PDF] [Code]
  • [ICLR-SNN 23] Sparsified Model Zoo Twins: Investigating Populations of Sparsified Neural Network Models [PDF] [Code]
  • [ICLR 25] Unsupervised Model Tree Heritage Recovery [PDF] [Code]
  • [NeurIPS 24] Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes [PDF] [Code]
  • [NeurIPS 24] Interpreting the Weight Space of Customized Diffusion Models [PDF] [Code]
  • Scaling LLaNA: Advancing NeRF-Language Understanding Through Large-Scale Training [PDF] [Code]
  • [ICLR-WSL 25] A Model Zoo of Vision Transformers [PDF] [Code]
  • [Electronics] An Open Dataset of Neural Networks for Hypernetwork Research [PDF]
  • [ICCS 25] Towards Weight-Space Interpretation of Low-Rank Adapters for Diffusion Models [PDF]
  • [ECAI 20] Classifying the classifier: dissecting the weight space of neural networks [PDF] [Code]
  • [2020] Predicting Neural Network Accuracy from Weights [PDF] [Code]
  • [NeurIPS 21] Self-supervised representation learning on neural network weights for model characteristic prediction [PDF] [Code]
  • [ICML 24] Learning Useful Representations of Recurrent Neural Network Weight Matrices [PDF]

Others

Survey

  • A Brief Review of Hypernetworks in Deep Learning [PDF]
  • Implicit Neural Representation in Medical Imaging: A Comparative Survey [PDF]
  • Learning from Models Beyond Fine-Tuning [PDF]
  • Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities [PDF]
  • Where Do We Stand with Implicit Neural Representations? A Technical and Performance Survey [PDF]
  • Symmetry in Neural Network Parameter Spaces [PDF]
  • [EDBT 25] Model Lakes [PDF]

Thesis

  • [PhD Thesis] Hyper-Representations: Learning from Populations of Neural Networks [PDF]
  • [PhD Thesis] Acquiring and Adapting Priors for Novel Tasks via Neural Meta-Architectures [PDF]
  • [MSc Thesis] Geometric Flow Models over Neural Network Weights [PDF] [Code]
