Normal distribution of `jax.random.normal` with `dtype` `float16` or `bfloat16` has surprisingly worse quality than `float32` + `astype` #13798

skirsten · 2022-12-26T17:58:49Z

So I am not sure if I am doing something wrong here, if it's a bug, or if it is expected (jax v0.4.1, cuda):

The following code generates random noise in three steps:

1. Sample with `jax.random.normal` using different dtypes.
2. (Optionally) convert to another dtype.
3. As a last step, convert whatever it is to float32 to avoid any potential resolution issues in the histogram.

```python
import jax
import jax.numpy as jnp
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw

sample_size = 1000000
bins = 500

# Same key for every draw, so only the dtype differs between samples.
key = jax.random.PRNGKey(0)
sample_fp32 = jax.random.normal(key, (sample_size,), dtype=jnp.float32)
sample_fp16 = jax.random.normal(key, (sample_size,), dtype=jnp.float16)
sample_bf16 = jax.random.normal(key, (sample_size,), dtype=jnp.bfloat16)


@jax.jit
def calc_hist(sample):
    counts, bin_edges = jnp.histogram(sample, bins=bins, range=(-5, 5))
    return (bin_edges[:-1], counts)


def custom_plot(counts):
    """Render the histogram as a 1-bit image, one pixel column per bin."""
    assert len(counts.shape) == 1
    dim = counts.shape[0]

    counts = np.array(counts)
    peak = np.max(counts)  # renamed from `max` to avoid shadowing the builtin
    values = (counts / peak) * dim

    img = Image.new("1", (dim, dim))
    draw = ImageDraw.Draw(img)
    for i in range(dim):
        # Top-left corner first: Pillow may reject a box with y1 < y0.
        draw.rectangle([(i, dim - values[i]), (i, dim)], "white")

    return img


fig, axs = plt.subplots(3, 2)
fig.tight_layout(h_pad=2)

# (title, sample, subplot position): the left column is sampled directly in
# the dtype, the right column is sampled in float32 and then cast.
panels = [
    ("fp32", sample_fp32, (0, 0)),
    ("fp16", sample_fp16, (1, 0)),
    ("bf16", sample_bf16, (2, 0)),
    ("fp32 -> fp32", sample_fp32.astype(jnp.float32), (0, 1)),
    ("fp32 -> fp16", sample_fp32.astype(jnp.float16), (1, 1)),
    ("fp32 -> bf16", sample_fp32.astype(jnp.bfloat16), (2, 1)),
]

for title, sample, (row, col) in panels:
    # Convert to float32 so the histogram itself adds no resolution issues.
    bin_edges, counts = calc_hist(sample.astype(jnp.float32))
    axs[row, col].plot(bin_edges, counts)
    axs[row, col].set_title(title)
    custom_plot(counts).show(title)

plt.show()
```

The left column shows the distributions when the samples are generated directly in the specified dtype.
The right column shows the distributions when the samples are generated in float32 and then cast to the specified dtype.

As can be seen, the direct version (left column) has considerably lower quality.

The only explanation I can think of is that a lot of precision is lost while sampling the normal, or that the random number generator does not work well with non-float32 dtypes. Any ideas, explanations, or input are appreciated.
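
One way to put a number on the quality gap visible in the histograms is to count how many distinct values each sample contains. The sketch below is a minimal helper and not part of the original post: `distinct_values` is a hypothetical name, and it uses only NumPy, on the assumption that the input can be converted with `np.asarray`.

```python
import numpy as np


def distinct_values(sample):
    """Return how many different values occur in `sample`."""
    # Upcast to float32 first; float16 and bfloat16 values are exactly
    # representable there, so no two distinct inputs are merged.
    values = np.asarray(sample, dtype=np.float32).ravel()
    return int(np.unique(values).size)


# Illustration with NumPy only: float32 normals cast to float16.
rng = np.random.default_rng(0)
cast_fp16 = rng.standard_normal(100_000, dtype=np.float32).astype(np.float16)
print(distinct_values(cast_fp16), "distinct values out of", cast_fp16.size)
```

With the samples from the code above, comparing `distinct_values(sample_fp16)` against `distinct_values(sample_fp32.astype(jnp.float16))` should show whether the directly sampled array covers far fewer values than the cast one.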

Answered by jakevdp

jakevdp (Maintainer) · 2022-12-26T20:33:03Z

Thanks for pointing this out – I think the issue is that in converting random bits to a uniform distribution (which is the first step in generating normal random values) we use a procedure that only randomizes the mantissa of the floating point representation. For float16, this means there are essentially only $2^{10} = 1024$ possible values, and for bfloat16 it's only $2^7 = 128$ possible values.

There's probably a better method out there for generating uniformly-distributed floating point values at low precisions, but I'm not sure what the alternative approach would be.
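
The mantissa-only procedure described above can be illustrated with plain NumPy. This is a minimal sketch of the general bit trick, not JAX's actual implementation. The function names are hypothetical, and bfloat16 is emulated as the top 16 bits of a float32 because NumPy has no native bfloat16 type.

```python
import numpy as np


def uniform_fp16_mantissa_only(rng, n):
    """Uniform samples in [0, 1) built from 10 random mantissa bits."""
    mantissa = rng.integers(0, 1 << 10, size=n, dtype=np.uint16)
    one_bits = np.float16(1.0).view(np.uint16)  # sign and exponent bits of 1.0
    # A fixed exponent plus a random mantissa gives a float in [1, 2).
    in_one_two = (mantissa | one_bits).view(np.float16)
    return in_one_two - np.float16(1.0)


def uniform_bf16_mantissa_only(rng, n):
    """Same trick with 7 mantissa bits, stored in float32 for convenience."""
    mantissa = rng.integers(0, 1 << 7, size=n, dtype=np.uint32)
    one_bits = np.float32(1.0).view(np.uint32)
    # bfloat16's 7 mantissa bits correspond to bits 16..22 of a float32.
    in_one_two = ((mantissa << np.uint32(16)) | one_bits).view(np.float32)
    return in_one_two - np.float32(1.0)


rng = np.random.default_rng(0)
u16 = uniform_fp16_mantissa_only(rng, 100_000)
ubf = uniform_bf16_mantissa_only(rng, 100_000)
# At most 1024 and 128 distinct values, however large n is.
print(np.unique(u16).size, np.unique(ubf).size)
```

Any transform applied afterwards, such as the mapping from uniform to normal values, can only produce as many distinct outputs as it receives distinct inputs, which is consistent with the coarse histograms in the left column of the question.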
