Commit c672a97

Doc
1 parent 475c6e1 commit c672a97

4 files changed, +59 -10 lines changed

docs/source/api.rst

Lines changed: 1 addition & 1 deletion
@@ -8,9 +8,9 @@ API
 Scalar Functions
 ----------------
 
-.. autofunction:: decode_float
 .. autofunction:: round_float
 .. autofunction:: encode_float
+.. autofunction:: decode_float
 
 Array Functions
 ---------------

docs/source/index.rst

Lines changed: 17 additions & 6 deletions
@@ -17,21 +17,32 @@ of:
 * Precision (p)
 * Maximum exponent (emax)
 
-with additional fields defining the encoding of infinities, Not-a-number (NaN) values,
-and negative zero, among others (see :class:`gfloat.FormatInfo`.)
+with additional fields defining the presence/encoding of:
+
+* Infinities
+* Not-a-number (NaN) values
+* Negative zero
+* Subnormal numbers
+* Signed/unsigned
+* Two's complement encoding (of the significand)
 
 This allows an implementation of generic floating point encode/decode logic,
 handling various current and proposed floating point types:
 
 - `IEEE 754 <https://en.wikipedia.org/wiki/IEEE_754>`_: Binary16, Binary32
-- `OCP Float8 <https://www.opencompute.org/documents/ocp-8-bit-floating-point-specification-ofp8-revision-1-0-2023-06-20-pdf>`_: E5M2, E4M3, and MX formats
+- `Brain floating point <https://en.wikipedia.org/wiki/Bfloat16_floating-point_format>`_: BFloat16
+- `OCP Float8 <https://www.opencompute.org/documents/ocp-8-bit-floating-point-specification-ofp8-revision-1-0-2023-06-20-pdf>`_: E5M2, E4M3
 - `IEEE WG P3109 <https://github.com/awf/P3109-Public/blob/main/Shared%20Reports/P3109%20WG%20Interim%20report.pdf>`_: P{p} for p in 1..7
+- Types from the `OCP MX <https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf>`_ spec: E8M0, INT8, and FP4, FP6 types
+
 
-The library favours readability and extensibility over speed - for fast
-implementations of these datatypes see, for example,
+GFloat, being a pure Python library, favours readability and extensibility over speed
+(although the `*_ndarray` functions are reasonably fast for large arrays).
+For fast implementations of these datatypes see, for example,
 `ml_dtypes <https://github.com/jax-ml/ml_dtypes>`_,
 `bitstring <https://github.com/scott-griffiths/bitstring>`_,
-`MX PyTorch Emulation Library <https://github.com/microsoft/microxcaling>`_.
+`MX PyTorch Emulation Library <https://github.com/microsoft/microxcaling>`_,
+`APyTypes <https://apytypes.github.io/apytypes>`_.
 
 To get started with the library, we recommend perusing the notebooks,
 otherwise you may wish to jump straight into the API.
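The two core parameters in this description, precision (p) and maximum exponent (emax), already fix most of a format's range. As an illustration only, the hypothetical helper below (not part of the GFloat API) computes the key magnitudes of an IEEE-style binary format, assuming `emin = 1 - emax` and an all-ones exponent reserved for Inf/NaN.

```python
def ieee_like_limits(p: int, emax: int) -> dict:
    """Hypothetical helper (not GFloat API): key magnitudes of an IEEE-style
    binary format with precision p (significand bits, including the implicit
    leading bit) and maximum exponent emax.

    Assumes emin = 1 - emax and that the all-ones exponent encodes Inf/NaN.
    """
    emin = 1 - emax
    return {
        # Largest finite value: all significand bits set, exponent emax
        "max": (2.0 - 2.0 ** (1 - p)) * 2.0 ** emax,
        # Smallest positive normal number
        "min_normal": 2.0 ** emin,
        # Smallest positive subnormal: one unit in the last place at emin
        "min_subnormal": 2.0 ** (emin - p + 1),
        # Gap between 1.0 and the next representable value
        "eps": 2.0 ** (1 - p),
    }


print(ieee_like_limits(11, 15))  # Binary16
print(ieee_like_limits(3, 15))   # OCP E5M2
```

OCP E4M3 does not fit these assumptions: it has no infinities and uses the top exponent for finite values, so its largest value is 448 rather than what this formula would give. Formats like that are why the additional fields listed above are needed.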

src/gfloat/decode_ndarray.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ def decode_ndarray(
     fi: FormatInfo, codes: np.ndarray, np: ModuleType = np
 ) -> np.ndarray:
     r"""
-    Vectorized version of :function:`decode_float`
+    Vectorized version of :meth:`decode_float`
 
     Args:
         fi (FormatInfo): Floating point format descriptor.
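`decode_ndarray` is documented as a vectorized version of `decode_float`. To show roughly what vectorized decoding involves, the hypothetical function below decodes codes of an IEEE-style format using plain NumPy bit operations. It assumes bias equal to `emax` and an all-ones exponent reserved for Inf/NaN. It does not use GFloat's `FormatInfo` and ignores the non-IEEE options listed in the docs.

```python
import numpy as np


def decode_sketch(codes, k, p, emax):
    """Hypothetical sketch (not GFloat API): decode k-bit codes of an IEEE-style
    format (1 sign bit, k - p exponent bits, p - 1 stored significand bits,
    bias emax, all-ones exponent reserved for Inf/NaN) to float64."""
    codes = np.asarray(codes, dtype=np.int64)
    frac_bits = p - 1
    exp_mask = (1 << (k - p)) - 1

    sign = np.where((codes >> (k - 1)) & 1, -1.0, 1.0)
    biased = (codes >> frac_bits) & exp_mask
    frac = codes & ((1 << frac_bits) - 1)

    # Normals carry an implicit leading 1; subnormals (biased == 0) do not and
    # share the exponent of the smallest normal, 1 - emax
    sig = np.where(biased == 0, frac, frac | (1 << frac_bits)).astype(np.float64)
    exp = (np.maximum(biased, 1) - emax - frac_bits).astype(np.int32)
    val = sign * np.ldexp(sig, exp)

    # All-ones exponent: Inf when the fraction is zero, NaN otherwise
    special = biased == exp_mask
    val = np.where(special & (frac == 0), sign * np.inf, val)
    return np.where(special & (frac != 0), np.nan, val)
```

For Binary16 (`k=16, p=11, emax=15`), the output can be checked against NumPy's own interpretation of the same bits, `codes.astype(np.uint16).view(np.float16)`. This gives a quick way to test such a sketch over all 65536 codes.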

src/gfloat/round_ndarray.py

Lines changed: 40 additions & 2 deletions
@@ -18,7 +18,30 @@ def round_ndarray(
     np: ModuleType = np,
 ) -> np.ndarray:
     """
-    Vectorized version of round_float.
+    Vectorized version of :meth:`round_float`.
+
+    Round inputs to the given :py:class:`FormatInfo`, given rounding mode and
+    saturation flag
+
+    Input NaNs will convert to NaNs in the target, not necessarily preserving payload.
+    An input Infinity will convert to the largest float if :paramref:`sat`,
+    otherwise to an Inf, if present, otherwise to a NaN.
+    Negative zero will be returned if the format has negative zero, otherwise zero.
+
+    Args:
+        fi (FormatInfo): Describes the target format
+        v (float): Input value to be rounded
+        rnd (RoundMode): Rounding mode to use
+        sat (bool): Saturation flag: if True, round overflowed values to `fi.max`
+        np (Module): May be `numpy`, `jax.numpy` or another module cloning numpy
+
+    Returns:
+        An array of floats which is a subset of the format's value set.
+
+    Raises:
+        ValueError: The target format cannot represent an input
+            (e.g. converting a `NaN`, or an `Inf` when the target has no
+            `NaN` or `Inf`, and :paramref:`sat` is false)
     """
     p = fi.precision
     bias = fi.expBias
@@ -109,7 +132,22 @@ def round_ndarray(
 
 def encode_ndarray(fi: FormatInfo, v: np.ndarray) -> np.ndarray:
     """
-    Vectorized version of encode_float.
+    Vectorized version of :meth:`encode_float`.
+
+    Encode inputs to the given :py:class:`FormatInfo`.
+
+    Will round toward zero if :paramref:`v` is not in the value set.
+    Will saturate to `Inf`, `NaN`, `fi.max` in order of precedence.
+    Encode -0 to 0 if not `fi.has_nz`
+
+    For other roundings and saturations, call :func:`round_ndarray` first.
+
+    Args:
+        fi (FormatInfo): Describes the target format
+        v (float array): The value to be encoded.
+
+    Returns:
+        The integer code point
     """
     k = fi.bits
     p = fi.precision
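A similarly hedged sketch of the encode step follows. The hypothetical function below is not GFloat's `encode_ndarray`. It maps float64 values to `k`-bit codes of an IEEE-style format, truncating toward zero when a value is not representable. It also clears the sign of `-0.0` when `has_nz` is false. Magnitudes at or above `2**(emax + 1)` are sent to the Inf code. That is one reading of the precedence order stated in the docstring, and the real function may differ.

```python
import numpy as np


def encode_rtz_sketch(v, k, p, emax, has_nz=True):
    """Hypothetical sketch (not GFloat API): encode float64 values as k-bit codes
    of an IEEE-style format with 1 sign bit, k - p exponent bits, p - 1 stored
    significand bits and bias emax, truncating toward zero."""
    v = np.asarray(v, dtype=np.float64)
    emin = 1 - emax
    frac_bits = p - 1
    inf_code = ((1 << (k - p)) - 1) << frac_bits   # all-ones exponent, zero fraction
    nan_code = inf_code | (1 << (frac_bits - 1))   # a quiet NaN (top fraction bit set)

    a = np.abs(np.where(np.isfinite(v), v, 0.0))
    _, e = np.frexp(a)                  # a = m * 2**e with m in [0.5, 1)
    exp = np.maximum(e - 1, emin)       # unbiased exponent, clamped for subnormals
    # Significand in units of 2**(exp - frac_bits); floor truncates toward zero
    sig = np.floor(np.ldexp(a, frac_bits - exp)).astype(np.int64)
    is_normal = sig >= (1 << frac_bits)            # implicit leading bit present
    biased = np.where(is_normal, exp + emax, 0).astype(np.int64)
    code = (biased << frac_bits) | (sig & ((1 << frac_bits) - 1))

    # Exponent above emax, or an infinite input, maps to the Inf code
    code = np.where((exp > emax) | np.isinf(v), inf_code, code)
    code = np.where(np.isnan(v), nan_code, code)

    sign = np.signbit(v) & ~np.isnan(v)
    if not has_nz:
        sign &= v != 0                  # -0.0 becomes +0 when there is no negative zero
    return code | (sign.astype(np.int64) << (k - 1))
```

For values already in the Binary16 value set, the codes should match `x.astype(np.float16).view(np.uint16)`. Values between representable points land on the neighbour closer to zero.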
