
Performance issue when doing scalar add for non-traced array #2074

@ptiede

Description


I am seeing a massive performance hit when performing a reduction that combines a traced scalar with a rather large non-traced array. Interestingly, if I split the mapreduce into a separate map followed by a reduction, performance rebounds.

MWE

using Reactant
using BenchmarkTools

# Fused mapreduce: sum with a do-block mapping log(a + Ax) over A
function testmapreduce(a, A)
    return sum(A) do Ax
        log(a + Ax)
    end
end

# Split version: broadcast the map first, then reduce with sum
function testmap_reduce(a, A)
    return sum(log.(a .+ A))
end

ρr = ConcreteRNumber(2.0)
x = rand(16, 16)

f1 = @compile sync=true testmapreduce(ρr, x)
f2 = @compile sync=true testmap_reduce(ρr, x)



@benchmark f1($ρr, $x)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  175.838 μs …   6.051 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     185.833 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   198.689 μs ± 87.604 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄██▇▆▅▄▃▂▁                           ▁▁    ▁                 ▂
  ████████████▇▆▄▃▅▅▃▄▄▁▁▁▁▃▄▃▄▃▃▁▃▄▅▅▇███████▇██▇▇▇▆▇▇▆▆▅▆▅▆▅ █
  176 μs        Histogram: log(frequency) by time       379 μs <

 Memory estimate: 704 bytes, allocs estimate: 16.


@benchmark f2($ρr, $x)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  29.210 μs … 11.455 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     42.670 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   48.200 μs ± 154.119 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▇█▅▄▁                                                      ▁
  ▇▇█████▇▅▅▅▄▃▁▃▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▅▆▁▃▃▃▃▅▆ █
  29.2 μs       Histogram: log(frequency) by time       259 μs <

 Memory estimate: 704 bytes, allocs estimate: 16.

This gets progressively worse as the array x grows.
Looking at @code_hlo, testmapreduce appears to be unrolled into a giant function, with the array x split into a bunch of 1-element tensors. I don't understand Reactant's internals, so I have no idea why it chose to split the array.
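For reference, this is how I inspected the generated IR (a minimal sketch; the comments describe what I observe in the output, not the literal IR text):

# Compare the lowered HLO of the two versions
@code_hlo testmapreduce(ρr, x)   # unrolled: many scalar ops over 1-element tensors
@code_hlo testmap_reduce(ρr, x)  # stays a single broadcast followed by one reduction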

Compute environment

Julia Version 1.10.10
Commit 95f30e51f41 (2025-06-27 09:51 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × AMD Ryzen 9 7950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)
Environment:
  JULIA_EDITOR = code
  JULIA_VSCODE_REPL = 1
  JULIA_NUM_THREADS = 1

Project.toml

  [6e4b80f9] BenchmarkTools v1.6.3
  [3c362404] Reactant v0.2.193
