Why do the data types of `X` and `y` need to match? #996

ma-sadeghi · 2025-07-25T22:41:23Z

Example:

$$ y(x) = jx + x^a + b,\quad j = \sqrt{-1} $$

Here, $x$, $a$, and $b$ are real, but $y$ is complex. To make this work in SymbolicRegression.jl, we can add a zero imaginary part to `X` (`X = X .+ 0im`). This works for simple cases but slows down more complicated ones, since the search now explores the complex domain even though all constants are real.

Is there any way to avoid this and keep the constants real? I know there's a workaround via custom loss functions, but more generally, why is the type match enforced in the first place?
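For readers following along, the promotion workaround described in this post can be sketched in plain Julia. This is only an illustration of the element types involved; it makes no SymbolicRegression.jl calls, and the values of `a` and `b` are made up for the example.

```julia
# Plain-Julia sketch of the element types involved (no SymbolicRegression.jl calls).
n = 100
a, b = 1.5, 0.3        # hypothetical real constants, chosen only for illustration

# Real-valued features stored as a 1 × n matrix.
X = reshape(collect(range(0.0, 1.0; length=n)), 1, n)

# Complex target built from real inputs: y(x) = j*x + x^a + b.
y = @. im * X + X^a + b

# The workaround from the post: add a zero imaginary part so the eltypes match.
Xc = X .+ 0im

println(eltype(X))     # Float64
println(eltype(y))     # ComplexF64
println(eltype(Xc))    # ComplexF64
```

The promoted `Xc` holds the same values as `X`, only with a complex element type, which is why the search then treats every quantity as complex.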

MilesCranmer (Maintainer) · 2025-07-26T11:06:37Z

If you use the Julia backend directly, there is no such limitation. The Python side is a bit more rigid about this, partly for ease of use (so that most stuff “just works”) but also because of limitations in Scikit-Learn. For example, Scikit-Learn’s interface doesn’t even allow complex inputs to begin with (!), so PySR has to monkey patch it to force it to allow complex numbers. But MLJ.jl, the Julia equivalent of Scikit-Learn that SymbolicRegression.jl’s backend hooks into, allows arbitrary input and output types. (Even strings are allowed! https://ai.damtp.cam.ac.uk/symbolicregression/stable/examples/custom_types/)

I guess in principle we could try to set this up on the Python side, but it might be a pain and I feel the return on investment is low. Not sure.

5 replies

ma-sadeghi Jul 29, 2025
Author

Thanks for the quick reply. I'm using the Julia backend directly (via the low-level API, not MLJ). Here's an MWE:

using SymbolicRegression

n = 100

X = reshape(range(0, 1, n), 1, n)
y = @. X + X^2 * im

options = Options(;
    binary_operators=[+, *],
    populations=40,
    verbosity=1,
    should_simplify=true,
)

@time hall_of_fame = equation_search(
    X, y; niterations=60, options=options, parallelism=:multithreading
);

ERROR: Element type of `x` is Float64 is different from element type of `y` which is ComplexF64.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] _loss(::Vector{Float64}, ::Vector{ComplexF64}, ::L2DistLoss)
    @ SymbolicRegression.LossFunctionsModule ~/.julia/packages/SymbolicRegression/qoxyo/src/LossFunctions.jl:25
  [3] _eval_loss
    @ ~/.julia/packages/SymbolicRegression/qoxyo/src/LossFunctions.jl:109 [inlined]
  [4] eval_loss(tree::Expression{…}, dataset::SymbolicRegression.CoreModule.DatasetModule.BasicDataset{…}, options::Options{…}; regularization::Bool, idx::Nothing)
    @ SymbolicRegression.LossFunctionsModule ~/.julia/packages/SymbolicRegression/qoxyo/src/LossFunctions.jl:155
  [5] eval_loss
    @ ~/.julia/packages/SymbolicRegression/qoxyo/src/LossFunctions.jl:139 [inlined]
  [6] update_baseline_loss!
    @ ~/.julia/packages/SymbolicRegression/qoxyo/src/LossFunctions.jl:225 [inlined]
  [7] _validate_options(datasets::Vector{…}, ropt::SymbolicRegression.SearchUtilsModule.RuntimeOptions{…}, options::Options{…})
    @ SymbolicRegression ~/.julia/packages/SymbolicRegression/qoxyo/src/SymbolicRegression.jl:597
  [8] _equation_search(datasets::Vector{…}, ropt::SymbolicRegression.SearchUtilsModule.RuntimeOptions{…}, options::Options{…}, saved_state::Nothing)
    @ SymbolicRegression ~/.julia/packages/SymbolicRegression/qoxyo/src/SymbolicRegression.jl:567
  [9] equation_search(datasets::Vector{…}; options::Options{…}, saved_state::Nothing, runtime_options::Nothing, runtime_options_kws::@Kwargs{…})
    @ SymbolicRegression ~/.julia/packages/SymbolicRegression/qoxyo/src/SymbolicRegression.jl:561
 [10] equation_search
    @ ~/.julia/packages/SymbolicRegression/qoxyo/src/SymbolicRegression.jl:542 [inlined]
 [11] #equation_search#23
    @ ~/.julia/packages/SymbolicRegression/qoxyo/src/SymbolicRegression.jl:511 [inlined]
 [12] macro expansion
    @ ./timing.jl:581 [inlined]
 [13] top-level scope
    @ ~/Code/ecm-sr/archive.jl:36
Some type information was truncated. Use `show(err)` to see complete types.

MilesCranmer Jul 29, 2025
Maintainer

Is the expectation here that SymbolicRegression.jl would automatically promote types?

ma-sadeghi Jul 29, 2025
Author

Sorry, this was a bad example. Currently, y can be complex only if x is complex. Here's the refined version:

Suppose we add a new unary operator `g(x) = x*im`. Then we can potentially produce a complex y from a real-valued x. However, SR.jl complains that the output type of `g` must match its input type:

using SymbolicRegression

n = 100

g(x) = im * x

X = reshape(range(0, 1, n), 1, n)
y = @. X + (X * im)^2

options = Options(;
    binary_operators=[+, *], unary_operators=[g], populations=40, verbosity=1, should_simplify=true
)

@time hall_of_fame = equation_search(
    X, y; niterations=60, options=options, parallelism=:multithreading
);

ERROR: The operator `g` returned an output of type `ComplexF64`, when it was given an input -100.0 of type `Float64`. Please ensure that your operators return the same type as their inputs.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] test_operator(op::Function, x::Float64, y::Nothing)
    @ SymbolicRegression ~/.julia/packages/SymbolicRegression/qoxyo/src/Configure.jl:21
  [3] test_operator(op::Function, x::Float64)
    @ SymbolicRegression ~/.julia/packages/SymbolicRegression/qoxyo/src/Configure.jl:6
  [4] assert_operators_well_defined(T::Type, options::Options{…})
    @ SymbolicRegression ~/.julia/packages/SymbolicRegression/qoxyo/src/Configure.jl:55
  [5] test_option_configuration(parallelism::Symbol, datasets::Vector{…}, options::Options{…}, verbosity::Int64)
    @ SymbolicRegression ~/.julia/packages/SymbolicRegression/qoxyo/src/Configure.jl:85
  [6] _validate_options(datasets::Vector{…}, ropt::SymbolicRegression.SearchUtilsModule.RuntimeOptions{…}, options::Options{…})
    @ SymbolicRegression ~/.julia/packages/SymbolicRegression/qoxyo/src/SymbolicRegression.jl:593
  [7] _equation_search(datasets::Vector{…}, ropt::SymbolicRegression.SearchUtilsModule.RuntimeOptions{…}, options::Options{…}, saved_state::Nothing)
    @ SymbolicRegression ~/.julia/packages/SymbolicRegression/qoxyo/src/SymbolicRegression.jl:567
  [8] equation_search(datasets::Vector{…}; options::Options{…}, saved_state::Nothing, runtime_options::Nothing, runtime_options_kws::@Kwargs{…})
    @ SymbolicRegression ~/.julia/packages/SymbolicRegression/qoxyo/src/SymbolicRegression.jl:561
  [9] equation_search
    @ ~/.julia/packages/SymbolicRegression/qoxyo/src/SymbolicRegression.jl:542 [inlined]
 [10] #equation_search#23
    @ ~/.julia/packages/SymbolicRegression/qoxyo/src/SymbolicRegression.jl:511 [inlined]
 [11] macro expansion
    @ ./timing.jl:581 [inlined]
 [12] top-level scope
    @ ~/Code/ecm-sr/temp.jl:315
Some type information was truncated. Use `show(err)` to see complete types.

MilesCranmer Jul 29, 2025
Maintainer

Ah, I see. Yes, this is the correct and expected behaviour. We want operators to return the same type as their input. There are expectations of type stability throughout the library, so that arrays can be preallocated with the correct type.
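The preallocation point can be illustrated in plain Julia. The sketch below is not the library's actual check: `same_type` is a hypothetical helper that only mirrors what the error message describes, and the buffer stands in for whatever arrays the library preallocates internally.

```julia
# Plain-Julia sketch; `same_type` is a hypothetical helper, not a library function.
g(x) = im * x                      # the operator from the example above

x = [0.5, 1.0, 2.0]                # real inputs
out = similar(x)                   # preallocated Vector{Float64} buffer

# Does the operator return the same type it was given?
same_type(op, v) = typeof(op(v)) == typeof(v)

println(same_type(g, 1.0))         # false
println(same_type(g, 1.0 + 0im))   # true

# Writing complex results into the real buffer fails at runtime.
failed = try
    out .= g.(x)
    false
catch err
    err isa InexactError
end

println(failed)                    # true
```

With a complex input, `g` is type-preserving, which is consistent with the `X .+ 0im` workaround passing the operator check.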

MilesCranmer Jul 29, 2025
Maintainer

You can use `GenericOperatorEnum` to get around such restrictions, though; just know it will be slower.
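As a rough analogy for the speed trade-off (this is plain Julia and does not use the `GenericOperatorEnum` API), a buffer with an abstract element type accepts mixed or changing result types, but Julia can no longer store the values inline or specialize code on a single concrete type. The buffer names below are made up for the illustration.

```julia
# Plain-Julia analogy only; this does not call GenericOperatorEnum.
g(x) = im * x
x = [0.5, 1.0, 2.0]

# Loosely typed buffer: holds any Number, so real or complex results both fit.
loose = Vector{Number}(undef, length(x))
loose .= g.(x)

# Concretely typed buffer: only ComplexF64, stored inline.
tight = Vector{ComplexF64}(undef, length(x))
tight .= g.(x)

println(isconcretetype(eltype(loose)))   # false
println(isconcretetype(eltype(tight)))   # true
println(loose == tight)                  # true
```

Both buffers hold the same values; the difference is only in how much the compiler knows about them, which is the usual source of the slowdown with loosely typed code.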
