README.md: 1 addition, 1 deletion
@@ -3,7 +3,7 @@
 ## Introduction
 
 This repository offers a tool for training JAX models using mixed precision, called **mpx**. It builds upon [JMP](https://github.com/google-deepmind/jmp)—another mixed precision library for JAX—but extends its capabilities.
-I discovered that JMP does not support arbitrary PyTrees and is particularly incompatible with models developed using [Equinox](https://docs.kidger.site/equinox/). To overcome these limitations, I created mpx, which leverages Equinox's flexibility to work with any PyTree.
+JMP does not support arbitrary PyTrees and is particularly incompatible with models developed using [Equinox](https://docs.kidger.site/equinox/). mpx overcomes these limitations by leveraging Equinox's flexibility to work with any PyTree.
doc/paper/main.tex: 35 additions, 26 deletions
@@ -2,15 +2,17 @@
 \usepackage[left=2cm,right=2cm]{geometry}
 \usepackage[utf8]{inputenc}
-\usepackage{hyperref}
+\usepackage[hidelinks]{hyperref}
 \usepackage{amsmath}
 \usepackage{amssymb}
 \usepackage{graphicx}
 \usepackage{listings}
 \usepackage{xcolor}
 \usepackage{enumitem}
 
-\title{MPX: Mixed Precision Training for JAX}
+\newcommand{\mpx}{\textsc{MPX}}
+
+\title{\mpx{}: Mixed Precision Training for JAX}
 \author{}
 \date{}
@@ -19,7 +21,7 @@
 \maketitle
 
 \section{Introduction}
-This paper presents \textbf{mpx}, a tool for training JAX models using mixed precision. The library extends the capabilities of JMP (JAX Mixed Precision) \cite{jmp}, addressing its limitations in handling arbitrary PyTrees and compatibility with models developed using Equinox \cite{equinox}. By leveraging Equinox's flexibility, mpx provides a solution that works with any PyTree structure.
+This paper presents \mpx{}, a tool for training JAX models using mixed precision. The library extends the capabilities of JMP (JAX Mixed Precision) \cite{jmp}, addressing its limitations in handling arbitrary PyTrees and compatibility with models developed using Equinox \cite{kidger2021equinox}. By leveraging Equinox's flexibility, \mpx{} provides a solution that works with any PyTree structure.
 
 \section{Basics of Mixed Precision Training}
 This section summarizes the original Mixed Precision method from NVIDIA's Automatic Mixed Precision \cite{nvidia_amp} and the paper by Micikevicius et al. \cite{mixed_precision_paper}.
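The central trick of the cited method, loss scaling, can be illustrated with a small runnable toy. The sketch below is not part of mpx: `to_fp16` is a crude stand-in for float16 storage (it only models the underflow-to-zero behaviour), and the scale value is illustrative.

```python
# Toy model of why loss scaling is needed: small float16 gradients
# underflow to zero, but scaling the loss (and therefore all gradients)
# moves them back into the representable range.

FP16_MIN_SUBNORMAL = 2.0 ** -24   # smallest positive float16 value

def to_fp16(x):
    # Crude stand-in for storing x in float16: flush tiny values to zero.
    return 0.0 if abs(x) < FP16_MIN_SUBNORMAL else x

grad = 2.0 ** -27                 # true gradient, too small for float16
lost = to_fp16(grad)              # stored without scaling: becomes 0.0

SCALE = 2.0 ** 12                 # loss scale S
kept = to_fp16(SCALE * grad)      # gradient of S * loss survives storage
recovered = kept / SCALE          # unscaling yields the true gradient
```

Because powers of two are exact in floating point, the unscaling step recovers the gradient without rounding error.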
@@ -46,15 +48,15 @@ \subsection{Loss Scaling}
 \end{itemize}
 
 \section{Implementation Details}
-mpx is to provides transformations that allow users to transform their existing training pipeline into mixed precision.
+\mpx{} provides transformations that allow users to convert their existing training pipeline to mixed precision.
 For this, it provides several functions that allow casting PyTrees, functions, and gradients.
 
-The mpx library provides essential transformations for mixed precision training while maintaining JAX's low-level approach. Key components include:
+The \mpx{} library provides essential transformations for mixed precision training while maintaining JAX's low-level approach. Key components include:
 
 \begin{enumerate}
-\item\textbf{Transformations to Cast PyTrees}: mpx features the following functions to cast arbitrary PyTrees: \texttt{cast\_tree(tree, dtype)} \texttt{cast\_to\_half\_precision(x)}, \texttt{cast\_to\_half\_precision(x)}, \texttt{cast\_to\_float16(x)}, \texttt{cast\_to\_bfloat16(x)}, \texttt{cast\_to\_float32(x)}. All these functions cast all leaves of the input that are JAX arrays and of type float to the corresponding float datatype. All other leaves, including arrays that are of non-float types, like int32, remain unchanged.
-\item\textbf{Transformations to Cast Functions}: mpx contains a transformation \texttt{cast\_function(func, dtype, return\_dtype=None)} for functions. This transformation returns a function that casts all its inputs to the desired input datatype (using \texttt{cast\_tree(tree, dtype)}), calls the function and then casts the outputs of the function. Moreover, mpx contains \texttt{force\_full\_precision(func, return\_dtype)}, which forces a function to perform its computations with full precision. This is important as some operations, such as sum, mean or softmax, are sensitive to overflows when calculated in float16.
-\item\textbf{Transformations to Cast Gradients}: mpx contains the Equinox equivalents \texttt{filter\_grad(func, scaling, has\_aux=False, use\_mixed\_precision=True)} and \texttt{filter\_value\_and\_grad(func, scaling, has\_aux=False, use\_mixed\_precision=True)} that calculate the gradient of a function using mixed precision with loss scaling (as described above). Additional to calculating the gradient, the functions also perform the automatic adaption of the loss scaling value.
+\item \textbf{Transformations to Cast PyTrees}: \mpx{} features the following functions to cast arbitrary PyTrees: \texttt{cast\_tree(tree, dtype)}, \texttt{cast\_to\_half\_precision(x)}, \texttt{cast\_to\_float16(x)}, \texttt{cast\_to\_bfloat16(x)}, and \texttt{cast\_to\_float32(x)}. All these functions cast every leaf of the input that is a JAX array of float type to the corresponding float datatype. All other leaves, including arrays of non-float types such as int32, remain unchanged.
+\item \textbf{Transformations to Cast Functions}: \mpx{} contains a transformation \texttt{cast\_function(func, dtype, return\_dtype=None)} for functions. This transformation returns a function that casts all its inputs to the desired input datatype (using \texttt{cast\_tree(tree, dtype)}), calls the function, and then casts its outputs. Moreover, \mpx{} contains \texttt{force\_full\_precision(func, return\_dtype)}, which forces a function to perform its computations in full precision. This is important because some operations, such as sum, mean, or softmax, are sensitive to overflows when calculated in float16.
+\item \textbf{Transformations to Cast Gradients}: \mpx{} contains the Equinox equivalents \texttt{filter\_grad(func, scaling, has\_aux=False, use\_mixed\_precision=True)} and \texttt{filter\_value\_and\_grad(func, scaling, has\_aux=False, use\_mixed\_precision=True)}, which calculate the gradient of a function using mixed precision with loss scaling (as described above). In addition to calculating the gradient, these functions also perform the automatic adaptation of the loss-scaling value.
 
 These drop-in replacements allow users to reuse their existing Equinox training pipelines without major changes to their structure (cf. Section todo).
 \end{enumerate}
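The casting rule from item 1 (cast float leaves, leave everything else alone) can be sketched in plain Python. This is an illustrative stand-in for mpx's `cast_tree`, not its implementation: nested dicts/lists play the role of JAX PyTrees, and a callable plays the role of the target dtype.

```python
# Illustrative stand-in for the PyTree casting rule described above:
# cast only float leaves; leave integer and other leaves unchanged.

def cast_tree(tree, cast):
    if isinstance(tree, dict):
        return {k: cast_tree(v, cast) for k, v in tree.items()}
    if isinstance(tree, (list, tuple)):
        return type(tree)(cast_tree(v, cast) for v in tree)
    if isinstance(tree, float):        # float leaf: apply the cast
        return cast(tree)
    return tree                        # int/bool/other leaf: untouched

# `round` stands in for a dtype cast; the integer "step" leaf survives.
demo = cast_tree({"weights": [0.5, 1.25], "step": 3}, round)
```

The real function additionally has to distinguish JAX arrays by dtype, but the traversal-and-filter shape is the same.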
@@ -68,35 +70,42 @@ \subsection{Automatic Loss Scaling Implementation}
 \end{itemize}
 
 \subsection{Optimizer}
-mpx works with all optax optimizers. However, as explained above, one might need to skip optimizer updates if gradients became infinite due to loss scaling.
+\mpx{} works with all optax optimizers. However, as explained above, one might need to skip optimizer updates if gradients become infinite due to loss scaling.
 The \texttt{optimizer\_update(model, optimizer, optimizer\_state, grads, grads\_finite)} function handles model updates based on gradient finiteness.
-This means, instead of calling \texttt{optimizer.update}, followed by \texttt{eqx.apply\_updates} as done in regular Equinox training pipelines, one just have to call \texttt{mpx.optimizer\_update}.
+This means that instead of calling \texttt{optimizer.update} followed by \texttt{eqx.apply\_updates}, as done in regular Equinox training pipelines, one just has to call \texttt{mpx.optimizer\_update}.
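The skip-on-overflow behaviour, together with a common dynamic loss-scaling rule (halve the scale on overflow, grow it after a long run of finite steps), can be sketched as follows. This is a hypothetical stand-in for `mpx.optimizer_update` and the scaling adaptation, not the library's code: the growth interval is an illustrative constant, and plain lists stand in for parameter PyTrees.

```python
import math

# Sketch of skipping the optimizer step when scaled gradients overflowed,
# plus a common dynamic loss-scaling rule: halve on overflow, double
# after a long stretch of finite steps.

GROWTH_INTERVAL = 2000   # illustrative, not mpx's actual constant

def optimizer_update(params, updates, grads_finite):
    # Apply the already-computed optimizer updates only if every
    # gradient was finite; otherwise leave the parameters unchanged.
    if not grads_finite:
        return params
    return [p + u for p, u in zip(params, updates)]

def adapt_scaling(scale, good_steps, grads_finite):
    if not grads_finite:
        return scale / 2.0, 0          # overflow: shrink, restart count
    if good_steps + 1 >= GROWTH_INTERVAL:
        return scale * 2.0, 0          # long finite stretch: try larger
    return scale, good_steps + 1

def all_finite(grads):
    return all(math.isfinite(g) for g in grads)
```

Skipping the occasional step is harmless in practice; the same batch contributes again once the scale has been reduced to a safe value.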
 \section{Example}
-I t
+Here, we provide an example and show which parts of a training pipeline need to be changed for mixed precision training.
+
+\subsection{Model Implementation}
+For the most part, the implementation of the model does not need to be changed.
+As \mpx{} works with arbitrary PyTrees, every toolbox that defines its models/parameters as PyTrees, such as Flax~\cite{flax2020github} or Equinox~\cite{kidger2021equinox}, can be used in conjunction with \mpx{}.
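To make the pipeline shape concrete, here is a runnable toy mock of a training step in this style. Everything below is a stand-in: plain lists play the role of PyTrees, finite differences play the role of autodiff, and the finiteness check mirrors the role of `mpx.optimizer_update`; none of it is the real mpx/JAX API.

```python
import math

# Mock of the training-step shape: compute value and gradients, check
# finiteness, and apply the parameter update only when finite.

def loss_fn(params, batch):
    return sum((p - x) ** 2 for p, x in zip(params, batch))  # toy loss

def value_and_grad(fn, params, batch, eps=1e-6):
    # Forward differences stand in for automatic differentiation.
    value = fn(params, batch)
    grads = []
    for i in range(len(params)):
        bumped = list(params)
        bumped[i] += eps
        grads.append((fn(bumped, batch) - value) / eps)
    return value, grads

def train_step(params, batch, lr=0.1):
    value, grads = value_and_grad(loss_fn, params, batch)
    grads_finite = all(math.isfinite(g) for g in grads)
    if grads_finite:  # mirrors the role of mpx.optimizer_update
        params = [p - lr * g for p, g in zip(params, grads)]
    return params, value

params, value = train_step([0.0, 0.0], [1.0, 2.0])  # one gradient step
```

In a real pipeline only the two calls change: the gradient transformation and the guarded update; the model and loss definitions stay as they are.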
 \section{Acknowledgements}
 We express our gratitude to Patrick Kidger for Equinox and Google DeepMind for JMP, which served as the foundation for this implementation.
 
 The authors acknowledge the computing time provided by the NHR Center NHR4CES at RWTH Aachen University (project number p0021919), funded by the Federal Ministry of Education and Research, and participating state governments through the GWK resolutions for national high performance computing at universities.
+@article{mixed_precision_paper,
+  title={Mixed precision training},
+  author={Micikevicius, Paulius and Narang, Sharan and Alben, Jonah and Diamos, Gregory and Elsen, Erich and Garcia, David and Ginsburg, Boris and Houston, Michael and Kuchaiev, Oleksii and Venkatesh, Ganesh and others},
+  journal={arXiv preprint arXiv:1710.03740},
+  year={2017}
+}
+
+@software{flax2020github,
+  author = {Jonathan Heek and Anselm Levskaya and Avital Oliver and Marvin Ritter and Bertrand Rondepierre and Andreas Steiner and Marc van {Z}ee},
+  title = {{F}lax: A neural network library and ecosystem for {JAX}},
+  url = {http://github.com/google/flax},
+  version = {0.10.6},
+  year = {2024},
+}
+
+@article{kidger2021equinox,
+  author={Patrick Kidger and Cristian Garcia},
+  title={{E}quinox: neural networks in {JAX} via callable {P}y{T}rees and filtered transformations},
+  year={2021},
+  journal={Differentiable Programming workshop at Neural Information Processing Systems 2021}
+}