-
Hi all, I'm toying around with a data structure that represents a centered, variance-scaled dataset using a sparse representation, with the centering and scaling operations expressed as a linear operator. While I had initially rolled my own, I've had fun using the lineax framework to simplify some of the internal mechanics. However, I've run into some numerical stability issues: the output of dense matrix/vector products differs from my data structure's matrix/vector products. I've tried digging into when/how the errors accumulate, but haven't been successful so far. The individual operations seem to be numerically close, but their combination results in too large an atol/rtol for `jnp.allclose` to pass. I'm including a minimal working example below to highlight where the issue arises, using an 'un-rolled' representation of the linear operator.
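For context, the lineax-wrapped version of the operator looks roughly like the sketch below; this is a simplified illustration (a tiny toy `G`, and it assumes lineax's `FunctionLinearOperator`/`.mv` API), not my actual implementation. The full un-rolled MWE follows after it.

```python
import jax
import jax.numpy as jnp
import jax.experimental.sparse as sparse
import lineax as lx

jax.config.update("jax_enable_x64", True)

# toy stand-in for the sparse genotype-like matrix
N, P = 4, 3
G = jnp.array([[0, 1, 2], [2, 0, 1], [1, 1, 0], [0, 2, 1]], dtype=jnp.int8)
Gsp = sparse.BCOO.fromdense(G)
m = jnp.mean(G, axis=0)       # column means
s = 1.0 / jnp.std(G, axis=0)  # inverse column standard deviations


def centered_scaled_mv(v):
    # Z @ v = (G - 1 m^T) diag(s) v, computed without ever densifying G
    sv = s * v
    return Gsp @ sv - jnp.ones(N) * (m @ sv)


# wrap the matrix-vector product as a linear operator
Z_op = lx.FunctionLinearOperator(
    centered_scaled_mv, jax.ShapeDtypeStruct((P,), jnp.float64)
)

v = jnp.arange(P, dtype=jnp.float64)
print(Z_op.mv(v))
```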
```python
#! /usr/bin/env python
import argparse as ap
import os
import sys

import jax.experimental.sparse as sparse
import jax.numpy as jnp
import jax.random as rdm
from jax.config import config

config.update("jax_enable_x64", True)
config.update("jax_default_matmul_precision", "highest")


def _binomial(key, N, p, shape):
    B = jnp.sum(rdm.bernoulli(key, p, shape=(N,) + shape).astype(int), axis=0)
    return B


def main(args):
    argp = ap.ArgumentParser(description="")
    argp.add_argument("-s", "--seed", type=int, default=0)
    argp.add_argument("-n", type=int, default=50)
    argp.add_argument("-p", type=int, default=100)
    argp.add_argument("-o", "--output", type=ap.FileType("w"), default=sys.stdout)
    args = argp.parse_args(args)

    key = rdm.PRNGKey(args.seed)
    maf = 0.1
    N, P = args.n, args.p

    key, g_key = rdm.split(key)

    # simulate matrix
    G = _binomial(g_key, 2, maf, (N, P)).astype(jnp.int8)

    # grab center/scaling values
    M = jnp.mean(G, axis=0)
    S = 1.0 / jnp.std(G, axis=0)

    # centered G, centered-scaled G
    C = G - M
    Z = C * S

    # sparse representation
    Gsp = sparse.BCOO.fromdense(G)

    # (Gsp - T @ M) @ Sd should be mathematically equiv to Z
    # does it behave the same numerically?
    T = jnp.ones((N, 1))
    Mj = M.reshape((1, P))
    Sd = jnp.diag(S)

    key, r_key = rdm.split(key)
    R = rdm.normal(r_key, shape=(P,))
    SdR = Sd @ R

    # standard dot product G @ R
    args.output.write(
        f"Stable[G @ R] = {jnp.allclose(G @ R, Gsp @ R)}" + os.linesep
    )  # dot product is same

    # centered-G @ R
    args.output.write(
        f"Stable[C @ R] = {jnp.allclose(C @ R, Gsp @ R - T @ Mj @ R)}" + os.linesep
    )  # centered dot product is same; centering is okay!

    # scaled-G @ R
    args.output.write(
        f"Stable[G/S @ R] = {jnp.allclose(G @ SdR, Gsp @ SdR)}" + os.linesep
    )  # scaled dot product is same; scaling is okay!

    # save for atol/rtol
    expected = Z @ R
    observed = Gsp @ SdR - T @ Mj @ SdR
    atol = jnp.abs(expected - observed)
    rtol = atol / jnp.minimum(jnp.abs(expected), jnp.abs(observed))

    # centered-scaled-G @ R
    args.output.write(
        f"Stable[Z @ R] = {jnp.allclose(expected, observed)}" + os.linesep
    )  # ope! scaling + centering is not okay!
    args.output.write(f"Max atol[Z @ R] = {jnp.max(atol)}" + os.linesep)
    args.output.write(f"Max rtol[Z @ R] = {jnp.max(rtol)}" + os.linesep)

    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Running this, the first three checks report True, but the final centered+scaled check `Stable[Z @ R]` comes back False, with a non-negligible max atol/rtol.
I should note that this is on CPU (Apple M1), jax version '0.4.11', jaxlib version '0.4.10'.
-
For what it's worth, all the
-
It looks like the issue is that you're comparing float32 arithmetic to float64 arithmetic. If I change this line:

```python
M = jnp.mean(G, axis=0)
```

to this:

```python
M = jnp.mean(G, axis=0).astype('float64')
```

Then I get this output:
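A quick way to see where the float32 creeps in is to print the dtypes of the intermediates along each path. Here is a minimal self-contained sketch (toy stand-ins for `G`, `M`, `S`, and `R`, not the original script):

```python
import jax
import jax.numpy as jnp
import jax.experimental.sparse as sparse

jax.config.update("jax_enable_x64", True)

# small stand-ins for the arrays in the original script
G = jnp.array([[0, 1, 2], [2, 0, 1]], dtype=jnp.int8)
M = jnp.mean(G, axis=0)
S = 1.0 / jnp.std(G, axis=0)
R = jnp.ones(3)
Gsp = sparse.BCOO.fromdense(G)

# dense path: Z is built from M and S before touching R
Z = (G - M) * S
# sparse path: the scaled vector is applied to the int8 sparse matrix directly
SR = S * R

print("M:", M.dtype, " S:", S.dtype, " R:", R.dtype)
print("dense Z @ R:", (Z @ R).dtype)
print("sparse G @ (S * R):", (Gsp @ SR).dtype)
```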