
[Bug] `layer_norm` Relax/TIR module crashes the process on Windows #19582

@lrcyyds1

Description

layer_norm_only_driver.py
layer_norm_only_reproducer.py

Expected behavior

TVM should either:

  • compile and execute the provided Relax/TIR module successfully, or
  • raise a normal Python / TVM exception if the module is unsupported.

It should not terminate the whole process with a native crash.

Actual behavior

TVM terminates the whole Python process with a native crash while compiling or executing the module. No Python or TVM exception is raised.

Observed failure signature:

  • process_error
  • return code: 3221225477 (0xC0000005, the Windows access-violation status code)
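
The decimal return code is easier to search for once it is written as an unsigned 32-bit hex value. The helper below is a small illustrative sketch, not part of the reproducer. The function name `describe_exit_code` and its one-entry lookup table are hypothetical.

```python
def describe_exit_code(code: int) -> str:
    """Render a process exit code as an unsigned 32-bit hex value."""
    # Masking also normalizes codes that a tool reports as a negative signed int.
    unsigned = code & 0xFFFFFFFF
    # Only the status relevant to this report is listed here.
    known = {0xC0000005: "STATUS_ACCESS_VIOLATION"}
    name = known.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


print(describe_exit_code(3221225477))
```

The same value sometimes appears as -1073741819 when a tool prints the exit code as a signed integer, and the mask maps both forms to the same hex string.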

This issue was first found by donor-based differential fuzzing around FuseOps, but it was later isolated to a standalone reproducer that no longer depends on:

  • multi-donor composition
  • a specific NNSmith seed
  • aggressive bridge / adapter expansion
  • the relu path

The current evidence points to the layer_norm path itself.

Environment

  • OS: Windows
  • Python: 3.11.15
  • TVM Python version: 0.24.dev0
  • TVM git commit: 14751b34913394bd4b621338f702874c51cffb9a
  • TVM short SHA: 14751b3
  • TVM checkout: D:\code\keyan\apache-tvm
  • TVM_HOME: D:\code\keyan\apache-tvm
  • TVM_LIBRARY_PATH: D:\code\keyan\apache-tvm\build-llvm18\RelWithDebInfo
  • PYTHONPATH: D:\code\keyan\apache-tvm\python
  • Target: llvm
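
For anyone trying to reproduce this on another machine, a snippet along these lines could collect the same fields. It is a sketch: `collect_environment` is a hypothetical helper, and it assumes the TVM Python package exposes `__version__`, falling back to a placeholder if it does not.

```python
import os
import platform


def collect_environment():
    """Gather the environment fields listed in this report into a dict."""
    info = {
        "os": platform.system(),
        "python": platform.python_version(),
    }
    # Environment variables that control which TVM build is picked up.
    for name in ("TVM_HOME", "TVM_LIBRARY_PATH", "PYTHONPATH"):
        info[name] = os.environ.get(name, "<unset>")
    try:
        import tvm  # may fail if the native library cannot be loaded

        info["tvm"] = getattr(tvm, "__version__", "<unknown>")
    except Exception:
        info["tvm"] = "<not importable>"
    return info


for key, value in collect_environment().items():
    print(f"{key}: {value}")
```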

Steps to reproduce

Minimal reproducer files:

  • layer_norm_only_reproducer.py
  • layer_norm_only_driver.py

The reproducer module contains:

  • one TIR layer_norm function
  • one Relax main
  • one R.call_tir(cls.layer_norm, ...)

Standalone runner:

from pathlib import Path

import numpy as np
import tvm
from tvm import relax
from tvm.script import ir as I, relax as R, tirx as T


def load_module():
    # Execute the TVMScript reproducer file and return the IRModule it defines.
    script = Path("layer_norm_only_reproducer.py").read_text(encoding="utf-8")
    namespace = {"I": I, "R": R, "T": T, "tvm": tvm}
    exec(script, namespace)  # noqa: S102
    return namespace["Module"]


mod = load_module()

# Deterministic float32 inputs (fixed seed).
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 512, 64, 64), dtype=np.float32)
mean = rng.standard_normal((64, 64), dtype=np.float32)
var = rng.standard_normal((64, 64), dtype=np.float32)

# Compile for the CPU (LLVM) target and run "main" on the Relax VM.
ex = tvm.compile(mod, target="llvm")
vm = relax.VirtualMachine(ex, tvm.cpu())
out = vm["main"](
    tvm.runtime.tensor(x),
    tvm.runtime.tensor(mean),
    tvm.runtime.tensor(var),
)
# Not reached when the crash occurs.
print(type(out).__name__)

Observed result:

  • the process exits abnormally instead of returning normally
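
Because the crash takes down the interpreter, it can help to launch the driver in a child process so that the parent survives and can record the return code. The following is a minimal sketch using only the standard library. `run_isolated` and `driver_command` are hypothetical helper names, and the `-X faulthandler` option is added on the assumption that a Python-level traceback at the fault would be useful.

```python
import subprocess
import sys


def run_isolated(argv, timeout=600):
    """Run a command in a child process and return (returncode, stderr text)."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stderr


def driver_command(script="layer_norm_only_driver.py"):
    """Build the command line for the driver, with faulthandler enabled."""
    # -X faulthandler asks CPython to dump a Python traceback on fatal native faults.
    return [sys.executable, "-X", "faulthandler", script]


# Demonstration with a harmless child: the parent keeps running and sees the code.
code, _ = run_isolated([sys.executable, "-c", "import sys; sys.exit(3)"])
print("child exited with", code)
```

Calling `run_isolated(driver_command())` from the directory that holds the reproducer files would then report the abnormal return code without terminating the calling process.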

Why this does not look like a false positive

The following have already been checked:

  1. Donor-combination isolation
     • the crash still happens with donor 0010__test_layer_norm_silu alone
  2. Seed isolation
     • the crash still happens on an exact-shape synthetic host
  3. Bridge isolation
     • the exact-shape host removes the need for aggressive scalar-to-large-tensor expansion
     • the crash still remains
  4. Component split
     • relu_only runs successfully
     • layer_norm_only crashes

So the issue appears to be concentrated in the layer_norm path.

Shape sensitivity

This issue is shape-sensitive.

For fixed:

  • n = 1
  • h = 64
  • w = 64

Observed:

  • c = 90 -> OK
  • c = 91 -> crash

For fixed:

  • n = 1
  • c = 91

Observed:

  • h = w = 64 -> crash
  • h = w = 32 -> crash
  • h = w = 16 -> crash
  • h = w = 8 -> crash
  • h = w = 4 -> crash
  • h = w = 2 -> crash
  • h = w = 1 -> OK

This suggests the issue is not random and is sensitive to exact shape / lowering conditions.
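
One way to read the observations above is to compare total element counts. The sketch below simply tabulates the reported cases. It assumes the listed shapes are in (n, c, h, w) order and describe the same tensor in every case, which the report does not state explicitly.

```python
from math import prod

# (n, c, h, w) -> outcome, copied from the observations listed above.
observations = {
    (1, 90, 64, 64): "OK",
    (1, 91, 64, 64): "crash",
    (1, 91, 32, 32): "crash",
    (1, 91, 16, 16): "crash",
    (1, 91, 8, 8): "crash",
    (1, 91, 4, 4): "crash",
    (1, 91, 2, 2): "crash",
    (1, 91, 1, 1): "OK",
}

for shape, outcome in observations.items():
    print(shape, prod(shape), outcome)

# Largest passing tensor versus smallest crashing tensor, by element count.
largest_ok = max(prod(s) for s, o in observations.items() if o == "OK")
smallest_crash = min(prod(s) for s, o in observations.items() if o == "crash")
print("largest OK:", largest_ok, "smallest crash:", smallest_crash)
```

Under that assumption, a 364-element case crashes while a 368640-element case runs, so total tensor size alone does not separate the passing and failing cases.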

Triage

  • needs-triage

Additional supporting evidence

If needed, I can also provide:

  • the original fuzzing manifest
  • the exact-shape isolation case
  • the internal minimization dossier
  • the standalone reproducer files directly

Labels

  • needs-triage
  • type: bug
