
BIR error: "Access pattern out of bound" (Instruction: I-6012-337-accel_sg0000) while training ResNet50 with --optlevel=1 #1122

@hannahingham

Description


When I train a ResNet50 model on Trainium with the following compiler flags:

--target=trn1 --framework=XLA --optlevel=1
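For anyone trying to reproduce this, here is a minimal sketch of how these flags can be applied, assuming training runs through torch-neuronx / torch-xla, which forward the `NEURON_CC_FLAGS` environment variable to neuronx-cc. The `XLA_IR_DEBUG` and `XLA_HLO_DEBUG` variables are the ones the compiler error recommends for getting more information; setting them the same way is an assumption about how the reproduction script is structured.

```python
import os

# ASSUMPTION: the training script uses torch-neuronx / torch-xla, which pass
# NEURON_CC_FLAGS through to the neuronx-cc compiler.
os.environ["NEURON_CC_FLAGS"] = "--target=trn1 --framework=XLA --optlevel=1"

# Recommended by the error message: attach Python source information to the
# lazy IR and HLO so the failing op can be traced back to model code.
os.environ["XLA_IR_DEBUG"] = "1"
os.environ["XLA_HLO_DEBUG"] = "1"

# Conservative choice: set all of the above at the very top of the script,
# before torch_xla is imported (import omitted so this sketch runs anywhere).
```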

I am getting this compiler error:

2025-03-26T03:43:31Z ERROR 42759 [neuronxcc.driver.CommandDriver]: ***************************************************************
2025-03-26T03:43:31Z ERROR 42759 [neuronxcc.driver.CommandDriver]:  An Internal Compiler Error has occurred
2025-03-26T03:43:31Z ERROR 42759 [neuronxcc.driver.CommandDriver]: ***************************************************************
2025-03-26T03:43:31Z ERROR 42759 [neuronxcc.driver.CommandDriver]: 
2025-03-26T03:43:31Z USER 42759 [neuronxcc.driver.CommandDriver]: Warning: Non-output memory location with no reader: {bias_memset.719}@SB<0,0>(128x2)#Internal DebugInfo: <bias_memset.719||UNDEF||[128, 1, 1]>
[NLA001]  Unhandled exception with message: === BIR error ===
Reason: Access pattern out of bound.
Instruction: I-6012-337-accel_sg0000
Opcode: Memset
Instruction Source: (bfloat16<27 x 460> $6012[i0_250_0_0_0, i0_250_0_0_1, i0_250_0_1, i2_250_0, i0_250_1_0, i2_163_2067_0_i2_163_2067_1_1_0_0_0, i2_163_2067_0_i2_163_2067_1_1_0_0_1, c0_2878_1_1_4745, c1_2879_4745_0_1_1, c1_2879_4745_1]:6012)0:
Argument AP:
Access Pattern: [[231,27],[1,232],[1,1]]
Offset: 0
Memory Location: {_convolution.13.4743_i90_sg0000}@SB<0,55840>(27x924)#Internal DebugInfo: <_convolution.13.4743||UNDEF||[27, 460, 1]>
 - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.
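The log reports the access pattern and the target buffer but not which bound was exceeded. The following is a back-of-the-envelope check, not the compiler's actual logic. It assumes each `[step, count]` pair is measured in elements, that the first pair describes the partition dimension and the remaining pairs describe free dimensions within a partition, and that `(27x924)` means 27 partitions of 924 bytes each. `free_extent` and `fits` are hypothetical helpers written for this illustration.

```python
from typing import List, Tuple

# Hypothetical model of a BIR access pattern: a list of (step, count) pairs.
# ASSUMPTION: the first pair is the partition dimension; the rest are free
# dimensions inside one partition, with steps counted in elements.
AccessPattern = List[Tuple[int, int]]


def free_extent(ap: AccessPattern, offset: int = 0) -> int:
    """Elements per partition spanned by the free dimensions.

    The highest index touched is offset + sum((count - 1) * step),
    so the span is that index plus one.
    """
    return offset + sum((count - 1) * step for step, count in ap[1:]) + 1


def fits(ap: AccessPattern, partitions: int, bytes_per_partition: int,
         elem_bytes: int, offset: int = 0) -> bool:
    """Check the pattern against a partitions x bytes_per_partition buffer."""
    _, num_partitions = ap[0]
    capacity = bytes_per_partition // elem_bytes  # elements per partition
    return num_partitions <= partitions and free_extent(ap, offset) <= capacity


# Access pattern and memory location copied from the error log.
ap = [(231, 27), (1, 232), (1, 1)]
print(free_extent(ap))  # 232
for elem_bytes in (2, 4):
    # 2-byte elements -> True, 4-byte elements -> False
    print(elem_bytes, fits(ap, partitions=27, bytes_per_partition=924,
                           elem_bytes=elem_bytes))
```

Under these assumptions, the pattern fits if elements are 2 bytes wide (462 per partition) but overruns by one element if they are 4 bytes wide (231 per partition). The partition step of 231 equals 924 / 4, which is at least consistent with the second reading, but the log alone does not confirm how the compiler interprets this pattern.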

I have attached two files to help diagnose the issue. (traffic.txt is actually a Python script, but I could not upload it in its original form.)

log-neuron-cc (1).txt

traffic.txt

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working), compiler