Serializability of SVMs with user-defined/callable kernels #91

@till-m

As noted by @barucden in #88, SVMs with user-defined/callable kernels are generally not (de-)serializable. Since the issue has recently come up again in connection with downstream changes in JuliaAI/MLJLIBSVMInterface.jl#13, it seems worth having an issue that can be referenced to track the problem and collate discussion.

Current situation:

An SVM with a user-defined/callable kernel can be serialized and deserialized without problems as long as the kernel function is still defined in the session:

using LIBSVM
using Serialization

X = [-2 -1 -1 1 1 2;
     -1 -1 -2 1 2 1]
y = [1, 1, 1, 2, 2, 2]

kernel(x1, x2) = x1' * x2

model = svmtrain(X, y, kernel=kernel)

ỹ, _ = svmpredict(model, X)
print(y == ỹ) #true

serialize("serialized_svm.jls", model)

model = deserialize("serialized_svm.jls")
T = [-1 2 3;
     -1 2 2]

ŷ, _ = svmpredict(model, T)
print([1, 2, 2] == ŷ) #true

After exiting and re-entering the REPL, kernel is no longer defined:

using LIBSVM
using Serialization

model = deserialize("serialized_svm.jls") #error

Execution fails with:

model = deserialize("serialized_svm.jls")
ERROR: UndefVarError: #kernel not defined
Stacktrace:
 [1] open(f::typeof(deserialize), args::String; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ Base ./io.jl:330
 [2] open
   @ ./io.jl:328 [inlined]
 [3] deserialize(filename::String)
   @ Serialization /usr/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:798
 [4] top-level scope
   @ REPL[3]:1

If kernel is defined at the time deserialize is called, the code works:

using LIBSVM
using Serialization

kernel(x1, x2) = x1' * x2

model = deserialize("serialized_svm.jls")
T = [-1 2 3;
     -1 2 2]

ỹ, _ = svmpredict(model, T)

print([1, 2, 2] == ỹ) #true
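
The behaviour above is consistent with Serialization storing a named function as a reference to its name rather than as code, which would explain why the lookup of #kernel fails in a fresh session. Below is a minimal sketch of that round trip using only the Serialization standard library. `ToyModel` is a hypothetical stand-in for a trained SVM and is not part of LIBSVM.jl, and the temporary file path is likewise just an assumption for illustration.

```julia
using Serialization

# Hypothetical stand-in for a trained model that holds a callable kernel.
struct ToyModel
    kernel::Function
    weights::Vector{Float64}
end

# Named, top-level kernel function, as in the examples above.
kernel(x1, x2) = x1' * x2

# Serialize to a temporary file so the working directory is left untouched.
path = joinpath(mktempdir(), "toy_model.jls")
serialize(path, ToyModel(kernel, [1.0, 2.0]))

# Within the same session `kernel` is still defined, so the reference resolves.
restored = deserialize(path)

println(restored.kernel === kernel)           # the very same function object
println(restored.kernel([1, 2], [3, 4]))      # dot product of the two vectors
```

Deserializing the same file in a fresh session without first defining `kernel` (and `ToyModel`) would be expected to fail in the same way as the LIBSVM example.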

In contrast, serialization using built-in kernels works without a problem:

using LIBSVM
using Serialization

X = [-2 -1 -1 1 1 2;
     -1 -1 -2 1 2 1]
y = [1, 1, 1, 2, 2, 2]

model = svmtrain(X, y, kernel=Kernel.Linear)

ỹ, _ = svmpredict(model, X)

print(y == ỹ) #true

serialize("serialized_svm.jls", model)

After exiting and re-entering the REPL:

using LIBSVM
using Serialization

model = deserialize("serialized_svm.jls")
T = [-1 2 3;
     -1 2 2]

ỹ, _ = svmpredict(model, T)

print([1, 2, 2] == ỹ) #true

Possible Courses

I don't have much experience with Julia, or with Serialization.jl in particular, but I see a few ways of tackling this issue:

  • Leave things as they are, since the current behaviour is not misleading. The error message seems pretty clear, at least to me.
  • Additionally, add a note to the README mentioning that serialization doesn't work for user-defined/callable kernels unless the kernel is defined before deserializing (I haven't gotten around to Doc: add callable kernel example in README #89 yet; I will try to work on that over Easter or slightly after).
  • Maybe it is possible to provide a custom serialization strategy that properly serializes trained models with custom/user-defined kernels. This is probably not possible with Serialization.jl, since its functionality seems rather restricted, but I think JLD.jl may be able to do it?
