Associative scan with multiple args via byte packing/unpacking? #2357

jackd · 2023-09-21T05:18:34Z

jackd
Sep 21, 2023

I'm trying to perform an associative_scan using a pair of tensors. Unfortunately, the scan implementation currently only supports a single tensor.

ValueError('Current implementation only support single tensor input')

I was thinking of getting around this by combining the bit representations of the two tensors outside the associative scan and splitting the result into its component parts inside the scan. In NumPy it would look something like the following:

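import numpy as np


def merge(a, b):
    # Both inputs must be float32 so that each (a, b) pair fills exactly 8 bytes.
    assert a.dtype == np.float32 and b.dtype == np.float32
    stacked = np.stack((a, b), axis=-1)
    # Reinterpret each 8-byte pair as one float64 (bits only, no numeric conversion).
    return np.frombuffer(stacked.tobytes(), dtype=np.float64).reshape(a.shape)


def unmerge(merged):
    assert merged.dtype == np.float64
    # Reinterpret each float64 as the two float32 values it was built from.
    stacked = np.frombuffer(merged.tobytes(), dtype=np.float32).reshape(
        *merged.shape, 2
    )
    return stacked[..., 0], stacked[..., 1]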
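
A quick round-trip check of the idea, as a minimal sketch. It uses `reshape`/`view` instead of `tobytes`/`frombuffer`, which should be equivalent for contiguous float32 inputs; the sample values are made up for illustration. The packed float64 values are meaningless as numbers and should only be moved around, never used in arithmetic.

```python
import numpy as np


def merge(a, b):
    # Sketch only: assumes both inputs are float32 arrays of the same shape.
    assert a.dtype == np.float32 and b.dtype == np.float32
    pairs = np.stack((a, b), axis=-1)  # shape (..., 2), C-contiguous
    # Flatten, reinterpret every two float32 values as one float64, restore shape.
    return pairs.reshape(-1).view(np.float64).reshape(a.shape)


def unmerge(merged):
    assert merged.dtype == np.float64
    flat = np.ascontiguousarray(merged).reshape(-1)
    # Reinterpret every float64 as the two float32 values it was built from.
    pairs = flat.view(np.float32).reshape(*merged.shape, 2)
    return pairs[..., 0], pairs[..., 1]


# Illustrative inputs, including negative values and an infinity.
a = np.array([[1.5, -2.0], [0.0, 3.25]], dtype=np.float32)
b = np.array([[-0.5, 4.0], [np.inf, -7.0]], dtype=np.float32)

merged = merge(a, b)
a2, b2 = unmerge(merged)
print(merged.dtype, merged.shape)
print(np.array_equal(a, a2), np.array_equal(b, b2))
```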

Is there some equivalent way of doing this in Triton? Can I access a pointer's address for use in a different pointer with a different dtype? Or am I better off waiting for multi-arg associative scan support?

jackd · 2023-09-21T09:16:07Z

jackd
Sep 21, 2023
Author

Ended up solving it:

import triton
import triton.language as tl


@triton.jit
def bitcast_merge_triton(a, b):
    tl.static_assert(a.dtype == tl.float32)
    tl.static_assert(b.dtype == tl.float32)
    # Reinterpret the float32 bits as int32, then widen to int64.
    a = a.to(dtype=tl.int32, bitcast=True).to(tl.int64)
    a = a << 32  # move a into the upper 32 bits
    b = b.to(dtype=tl.int32, bitcast=True).to(tl.int64)
    # Widening a negative int32 sign-extends, so clear the upper 32 bits of b
    # before OR-ing; otherwise a negative b would overwrite a.
    b = b & 0xFFFFFFFF
    return a | b


@triton.jit
def bitcast_unmerge_triton(merged):
    tl.static_assert(merged.dtype == tl.int64)
    # The lower 32 bits hold b; the upper 32 bits hold a.
    b = (merged & 0xFFFFFFFF).to(tl.int32).to(tl.float32, bitcast=True)
    a = (merged >> 32).to(tl.int32).to(tl.float32, bitcast=True)
    return a, b
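
For anyone wanting to sanity-check this packing scheme without a GPU, here is a rough NumPy emulation. It assumes Triton's signed int32 to int64 cast sign-extends the way NumPy's does, which is why the low word is masked before the OR. The names `pack`, `unpack`, `combine` and `inclusive_scan` are hypothetical, and the affine-map operator (h -> a*h + b) is only an illustrative choice of associative function, not something from this thread.

```python
import numpy as np

LOW_MASK = np.int64(0xFFFFFFFF)


def pack(a, b):
    # a goes into the upper 32 bits, b into the lower 32 bits of an int64.
    hi = a.view(np.int32).astype(np.int64) << 32
    # astype(int64) sign-extends negative values, so mask b back to 32 bits.
    lo = b.view(np.int32).astype(np.int64) & LOW_MASK
    return hi | lo


def unpack(merged):
    lo = (merged & LOW_MASK).astype(np.uint32).view(np.float32)
    hi = (merged >> 32).astype(np.int32).view(np.float32)
    return hi, lo


def combine(x, y):
    # Hypothetical operator: compose two affine maps stored as packed (a, b).
    ax, bx = unpack(x)
    ay, by = unpack(y)
    return pack(ax * ay, ay * bx + by)


def inclusive_scan(packed, fn):
    # Sequential reference scan; a real associative scan would run in parallel.
    out = packed.copy()
    for t in range(1, len(out)):
        out[t:t + 1] = fn(out[t - 1:t], packed[t:t + 1])
    return out


# Illustrative inputs with negative values in both slots.
a = np.array([1.0, 0.5, -2.0, 0.25], dtype=np.float32)
b = np.array([1.0, -3.0, 4.0, -0.5], dtype=np.float32)

prod, h = unpack(inclusive_scan(pack(a, b), combine))
print(prod)  # running product of a
print(h)     # h[t] = a[t] * h[t-1] + b[t], starting from h[0] = b[0]
```

The combine function only ever sees the single packed value, so it unpacks on entry and repacks on exit, which is the pattern a single-tensor scan would need.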
