Segmentation fault with categorical variables #322

@simsurace

After encountering a problem when fitting a model with categorical features, I found the following minimal reproducible example that triggers a segfault:

julia> versioninfo()
Julia Version 1.11.9
Commit 53a02c0720c (2026-02-06 00:27 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin24.0.0)
  CPU: 12 × Apple M3 Pro
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m2)
Threads: 1 default, 0 interactive, 1 GC (on 6 virtual cores)
Environment:
  JULIA_EDITOR = code

julia> using CategoricalArrays

julia> using DataFrames

julia> using EvoTrees

julia> for max_value in 1:255
           values = 1:max_value

           df = DataFrame()
           df.cat_feature = map(1:10^3) do _
               rand(values)
           end |> categorical 
           df.y = rand(10^3)

           config = EvoTreeRegressor()

           EvoTrees.fit(
               config,
               df;
               target_name = "y",
               feature_names = ["cat_feature"],
           )
           @info "Successful for max_value = $max_value"
       end
[ Info: Successful for max_value = 1
... # (intermediate log messages omitted for brevity)
[ Info: Successful for max_value = 67

[98526] signal 11 (2): Segmentation fault: 11
in expression starting at REPL[4]:1
dataids at ./abstractarray.jl:1561 [inlined]
dataids at ./abstractarray.jl:1562 [inlined]
mightalias at ./multidimensional.jl:1065 [inlined]
unalias at ./abstractarray.jl:1500 [inlined]
broadcast_unalias at ./broadcast.jl:946 [inlined]
preprocess at ./broadcast.jl:953 [inlined]
preprocess_args at ./broadcast.jl:956 [inlined]
preprocess_args at ./broadcast.jl:955 [inlined]
preprocess at ./broadcast.jl:952 [inlined]
copyto! at ./broadcast.jl:969 [inlined]
copyto! at ./broadcast.jl:925 [inlined]
materialize! at ./broadcast.jl:883 [inlined]
materialize! at ./broadcast.jl:880 [inlined]
get_best_split at /Users/simone/.julia/packages/EvoTrees/LRLAF/src/fit-utils.jl:429
macro expansion at /Users/simone/.julia/packages/EvoTrees/LRLAF/src/fit.jl:94 [inlined]
#980#threadsfor_fun#149 at ./threadingconstructs.jl:253
#980#threadsfor_fun at ./threadingconstructs.jl:220 [inlined]
#1 at ./threadingconstructs.jl:154
unknown function (ip: 0x120f20053)
jl_apply at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-release-1-dot-11/src/./julia.h:2157 [inlined]
start_task at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-release-1-dot-11/src/task.c:1204
Allocations: 38616310 (Pool: 38614614; Big: 1696); GC: 28
[1]    98526 segmentation fault  julia --project=.

When running Julia with bounds checking enabled, I get the following error instead of a segfault:

ERROR: TaskFailedException

    nested task error: TaskFailedException
    
        nested task error: BoundsError: attempt to access 3×64×1 Array{Float64, 3} at index [1, 65, 1]
        Stacktrace:
          [1] throw_boundserror(A::Array{Float64, 3}, I::Tuple{Int64, Int64, Int64})
            @ Base ./essentials.jl:14
          [2] checkbounds
            @ ./abstractarray.jl:699 [inlined]
          [3] getindex
            @ ./array.jl:929 [inlined]
          [4] getindex
            @ ./abstractarray.jl:1315 [inlined]
          [5] macro expansion
            @ ~/.julia/packages/EvoTrees/LRLAF/src/fit-utils.jl:329 [inlined]
          [6] macro expansion
            @ ./simdloop.jl:77 [inlined]
          [7] macro expansion
            @ ~/.julia/packages/EvoTrees/LRLAF/src/fit-utils.jl:327 [inlined]
          [8] (::EvoTrees.var"#899#threadsfor_fun#139"{EvoTrees.var"#899#threadsfor_fun#138#140"{…}})(tid::Int64; onethread::Bool)
            @ EvoTrees ./threadingconstructs.jl:253
          [9] #899#threadsfor_fun
            @ ./threadingconstructs.jl:220 [inlined]
         [10] (::Base.Threads.var"#1#2"{EvoTrees.var"#899#threadsfor_fun#139"{EvoTrees.var"#899#threadsfor_fun#138#140"{…}}, Int64})()
            @ Base.Threads ./threadingconstructs.jl:154
    Stacktrace:
     [1] threading_run(fun::EvoTrees.var"#899#threadsfor_fun#139"{EvoTrees.var"#899#threadsfor_fun#138#140"{…}}, static::Bool)
       @ Base.Threads ./threadingconstructs.jl:173
     [2] macro expansion
       @ ./threadingconstructs.jl:190 [inlined]
     [3] update_hist!
       @ ~/.julia/packages/EvoTrees/LRLAF/src/fit-utils.jl:326 [inlined]
     [4] macro expansion
       @ ~/.julia/packages/EvoTrees/LRLAF/src/fit.jl:82 [inlined]
     [5] (::EvoTrees.var"#952#threadsfor_fun#150"{EvoTrees.var"#952#threadsfor_fun#147#151"{…}})(tid::Int64; onethread::Bool)
       @ EvoTrees ./threadingconstructs.jl:253
     [6] #952#threadsfor_fun
       @ ./threadingconstructs.jl:220 [inlined]
     [7] (::Base.Threads.var"#1#2"{EvoTrees.var"#952#threadsfor_fun#150"{EvoTrees.var"#952#threadsfor_fun#147#151"{…}}, Int64})()
       @ Base.Threads ./threadingconstructs.jl:154
Stacktrace:
 [1] threading_run(fun::EvoTrees.var"#952#threadsfor_fun#150"{EvoTrees.var"#952#threadsfor_fun#147#151"{…}}, static::Bool)
   @ Base.Threads ./threadingconstructs.jl:173
 [2] macro expansion
   @ ./threadingconstructs.jl:190 [inlined]
 [3] grow_tree!(tree::EvoTrees.Tree{…}, nodes::Vector{…}, params::EvoTreeRegressor, ∇::Matrix{…}, js::Vector{…}, is::Vector{…}, left::Vector{…}, right::Vector{…}, x_bin::Matrix{…}, feattypes::Vector{…}, monotone_constraints::Vector{…})
   @ EvoTrees ~/.julia/packages/EvoTrees/LRLAF/src/fit.jl:81
 [4] grow_evotree!(m::EvoTree{…}, cache::EvoTrees.CacheBaseCPU{…}, params::EvoTreeRegressor)
   @ EvoTrees ~/.julia/packages/EvoTrees/LRLAF/src/fit.jl:21
 [5] fit(params::EvoTreeRegressor, dtrain::DataFrame; target_name::String, feature_names::Vector{…}, weight_name::Nothing, offset_name::Nothing, deval::Nothing, print_every_n::Int64, verbosity::Int64)
   @ EvoTrees ~/.julia/packages/EvoTrees/LRLAF/src/fit.jl:339
 [6] top-level scope
   @ REPL[4]:12
Some type information was truncated. Use `show(err)` to see complete types.
