Segment_sum + jit + grad #9762
I have a jagged array of inputs, which I call descriptors. The dimension of this array is (i, n_i, k). I would like to keep memory use and computation time low through the update loops. Below is a failed attempt at getting around the jagged array by flattening the first two dimensions into an array of shape (j, k), batching the predictions, and then summing over the indices specified by n_i to obtain a predictions array of length i. (Note: this does work without jit.)
I have a working jit version where I pad my jagged array with zeros, giving an array of dimension (i, max(n_i), k). The snippet below uses this array. While epochs with this version are twice as fast as the previous version without jit, n_i can vary from 2 up to 1000 and SGD/Adam never converges. I imagine there should be a way to get around padding the array. Maybe padding with sparsity?
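The original padded snippet isn't reproduced here. One thing worth checking with zero-padding is whether the padded rows are masked out of the per-structure sum: a model can still return a nonzero value for an all-zero descriptor, which would bias both the loss and its gradients. A minimal sketch of such a mask, where `predict`, the shapes, and all variable names are illustrative assumptions rather than the original code:

```python
import jax
import jax.numpy as jnp


def predict(params, x):
    # Placeholder per-descriptor model (assumption): a single linear layer.
    return x @ params


@jax.jit
def padded_predictions(params, padded_desc, n_i):
    # padded_desc: (i, max_n, k) zero-padded descriptors; n_i: (i,) true counts.
    i, max_n, k = padded_desc.shape
    per_desc = predict(params, padded_desc.reshape(-1, k)).reshape(i, max_n)
    # Zero out the padded slots so they contribute neither to the prediction
    # nor to the gradient, even if the model output for a zero row is nonzero.
    mask = jnp.arange(max_n)[None, :] < n_i[:, None]
    return jnp.sum(per_desc * mask, axis=-1)  # (i,)


key = jax.random.PRNGKey(0)
params = jax.random.normal(key, (4,))
padded = jax.random.normal(key, (3, 5, 4))  # i=3 structures, max_n=5, k=4
counts = jnp.array([3, 2, 5])
print(padded_predictions(params, padded, counts))  # shape (3,)
```

Whether a missing mask is actually what prevents convergence here is speculation, but masking keeps the padded version mathematically equivalent to the jagged one.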
See the dummy example below of the first scenario, with jit commented out.
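The dummy example itself isn't reproduced here. For reference, a minimal sketch of the first scenario might look like the following, again with `predict`, the shapes, and the variable names as illustrative assumptions. The usual catch under `jit` is that `jax.ops.segment_sum` needs `num_segments` to be a static Python int so the output shape is known at trace time:

```python
from functools import partial

import jax
import jax.numpy as jnp
from jax import ops


def predict(params, x):
    # Placeholder per-descriptor model (assumption): a single linear layer.
    return x @ params


@partial(jax.jit, static_argnums=3)
def segment_predictions(params, flat_desc, segment_ids, num_segments):
    per_desc = predict(params, flat_desc)  # (j,)
    # Sum per-descriptor outputs into one prediction per structure.
    # num_segments must be static so jit knows the output shape.
    return ops.segment_sum(per_desc, segment_ids, num_segments=num_segments)


n_i = [3, 2, 5]  # jagged: descriptors per structure
segment_ids = jnp.repeat(jnp.arange(len(n_i)), jnp.array(n_i))  # (j,)
key = jax.random.PRNGKey(0)
flat_desc = jax.random.normal(key, (sum(n_i), 4))  # (j, k)
params = jax.random.normal(key, (4,))
print(segment_predictions(params, flat_desc, segment_ids, len(n_i)))  # shape (i,)
```

Marking `num_segments` as static triggers a recompile per distinct value, which is cheap if the number of structures per batch stays fixed.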
Replies: 1 comment 4 replies
I'm finding it difficult to guess the correct shapes/sizes of input arrays to reproduce the results/errors you're seeing. Could you edit your question to add a minimal reproducible example?