Commit cc5fe23

✨ Prevent standardization warning
1 parent 569e087 commit cc5fe23

2 files changed: +7 -92 lines changed

docs/src/tutorials/standardization/notebook.jl

Lines changed: 2 additions & 1 deletion
@@ -57,7 +57,7 @@ first(df, 5)

 ## Coerce columns to the right scientific types
 df = coerce(df,
-    :NPreg => Count, # Number of pregnancies is a count
+    :NPreg => Continuous, # Number of pregnancies will be treated as continuous
     :Glu => Continuous, # Glucose level is continuous
     :BP => Continuous, # Blood pressure is continuous
     :Skin => Continuous, # Skin thickness is continuous
@@ -66,6 +66,7 @@ df = coerce(df,
     :Age => Continuous, # Age is continuous
     :Type => Multiclass, # Diabetes status is our target (Yes/No)
 );
+# Notice we treat `NPreg` as continuous for broader compatibility with various MLJ models.

 # Let's verify that our schema looks correct:
 schema(df)

docs/src/tutorials/standardization/notebook.md

Lines changed: 5 additions & 91 deletions
@@ -40,54 +40,6 @@ using RDatasets # To load sample datasets
 using Random # For reproducibility
 using ScientificTypes # For proper data typing
 using Plots # For visualizations
-using MLJLinearModels # For Logistic Regression
-````
-
-````
-Precompiling LIBSVM...
-922.3 ms ✓ liblinear_jll
-919.7 ms ✓ libsvm_jll
-837.3 ms ✓ LIBLINEAR
-1393.2 ms ✓ LIBSVM
-4 dependencies successfully precompiled in 3 seconds. 33 already precompiled.
-Precompiling RDatasets...
-3340.2 ms ✓ TimeZones
-3632.3 ms ✓ RData
-2434.4 ms ✓ RDatasets
-3 dependencies successfully precompiled in 10 seconds. 67 already precompiled.
-Precompiling FileIOExt...
-1067.6 ms ✓ FileIO → HTTPExt
-2541.2 ms ✓ Plots → FileIOExt
-2 dependencies successfully precompiled in 3 seconds. 182 already precompiled.
-Precompiling TimeZonesRecipesBaseExt...
-470.1 ms ✓ TimeZones → TimeZonesRecipesBaseExt
-1 dependency successfully precompiled in 1 seconds. 27 already precompiled.
-Precompiling MLJLinearModels...
-437.0 ms ✓ OpenSpecFun_jll
-1694.4 ms ✓ SpecialFunctions
-2526.4 ms ✓ ForwardDiff
-972.1 ms ✓ DifferentiationInterface → DifferentiationInterfaceForwardDiffExt
-966.6 ms ✓ NLSolversBase
-1396.3 ms ✓ LineSearches
-2234.5 ms ✓ Optim
-2572.4 ms ✓ MLJLinearModels
-8 dependencies successfully precompiled in 13 seconds. 73 already precompiled.
-2 dependencies precompiled but different versions are currently loaded. Restart julia to access the new versions. Otherwise, loading dependents of these packages may trigger further precompilation to work with the unexpected versions.
-Precompiling DifferentiationInterfaceStaticArraysExt...
-581.9 ms ✓ DifferentiationInterface → DifferentiationInterfaceStaticArraysExt
-1 dependency successfully precompiled in 1 seconds. 10 already precompiled.
-Precompiling FiniteDiffStaticArraysExt...
-573.6 ms ✓ FiniteDiff → FiniteDiffStaticArraysExt
-586.8 ms ✓ ConstructionBase → ConstructionBaseStaticArraysExt
-2 dependencies successfully precompiled in 1 seconds. 20 already precompiled.
-1 dependency precompiled but a different version is currently loaded. Restart julia to access the new version. Otherwise, loading dependents of this package may trigger further precompilation to work with the unexpected version.
-Precompiling NNlibForwardDiffExt...
-630.6 ms ✓ KernelAbstractions → LinearAlgebraExt
-703.5 ms ✓ ForwardDiff → ForwardDiffStaticArraysExt
-1217.8 ms ✓ NNlib → NNlibForwardDiffExt
-3 dependencies successfully precompiled in 2 seconds. 46 already precompiled.
-1 dependency precompiled but a different version is currently loaded. Restart julia to access the new version. Otherwise, loading dependents of this package may trigger further precompilation to work with the unexpected version.
-
 ````

 ## Data Preparation
@@ -127,7 +79,7 @@ We'll convert our columns to their appropriate types:
 ````julia
 # Coerce columns to the right scientific types
 df = coerce(df,
-    :NPreg => Count, # Number of pregnancies is a count
+    :NPreg => Continuous, # Number of pregnancies will be treated as continuous
     :Glu => Continuous, # Glucose level is continuous
     :BP => Continuous, # Blood pressure is continuous
     :Skin => Continuous, # Skin thickness is continuous
@@ -138,17 +90,19 @@ df = coerce(df,
 );
 ````

+Notice we treat `NPreg` as continuous for broader compatibility with various MLJ models.
+
 Let's verify that our schema looks correct:

 ````julia
-ScientificTypes.schema(df)
+schema(df)
 ````

 ````
 ┌───────┬───────────────┬─────────────────────────────────┐
 │ names │ scitypes      │ types                           │
 ├───────┼───────────────┼─────────────────────────────────┤
-│ NPreg │ Count         │ Int32                           │
+│ NPreg │ Continuous    │ Float64                         │
 │ Glu   │ Continuous    │ Float64                         │
 │ BP    │ Continuous    │ Float64                         │
 │ Skin  │ Continuous    │ Float64                         │
@@ -278,26 +232,6 @@ end
 ````

 ````
-┌ Warning: The number and/or types of data arguments do not match what the specified model
-│ supports. Suppress this type check by specifying `scitype_check_level=0`.
-
-│ Run `@doc MLJLinearModels.LogisticClassifier` to learn more about your model's requirements.
-
-│ Commonly, but non exclusively, supervised models are constructed using the syntax
-│ `machine(model, X, y)` or `machine(model, X, y, w)` while most other models are
-│ constructed with `machine(model, X)`. Here `X` are features, `y` a target, and `w`
-│ sample or class weights.
-
-│ In general, data in `machine(model, data...)` is expected to satisfy
-
-│ scitype(data) <: MLJ.fit_data_scitype(model)
-
-│ In the present case:
-
-│ scitype(data) = Tuple{ScientificTypesBase.Table{Union{AbstractVector{ScientificTypesBase.Continuous}, AbstractVector{ScientificTypesBase.Count}}}, AbstractVector{ScientificTypesBase.Multiclass{2}}}
-
-│ fit_data_scitype(model) = Tuple{ScientificTypesBase.Table{<:AbstractVector{<:ScientificTypesBase.Continuous}}, AbstractVector{<:ScientificTypesBase.Finite}}
-└ @ MLJBase ~/.julia/packages/MLJBase/7nGJF/src/machines.jl:237
 [ Info: Training machine(LogisticClassifier(lambda = 2.220446049250313e-16, …), …).
 ┌ Info: Solver: MLJLinearModels.LBFGS{Optim.Options{Float64, Nothing}, @NamedTuple{}}
 │ optim_options: Optim.Options{Float64, Nothing}
@@ -308,26 +242,6 @@ end
 ┌ Info: Solver: MLJLinearModels.LBFGS{Optim.Options{Float64, Nothing}, @NamedTuple{}}
 │ optim_options: Optim.Options{Float64, Nothing}
 └ lbfgs_options: @NamedTuple{} NamedTuple()
-┌ Warning: The number and/or types of data arguments do not match what the specified model
-│ supports. Suppress this type check by specifying `scitype_check_level=0`.
-
-│ Run `@doc LIBSVM.SVC` to learn more about your model's requirements.
-
-│ Commonly, but non exclusively, supervised models are constructed using the syntax
-│ `machine(model, X, y)` or `machine(model, X, y, w)` while most other models are
-│ constructed with `machine(model, X)`. Here `X` are features, `y` a target, and `w`
-│ sample or class weights.
-
-│ In general, data in `machine(model, data...)` is expected to satisfy
-
-│ scitype(data) <: MLJ.fit_data_scitype(model)
-
-│ In the present case:
-
-│ scitype(data) = Tuple{ScientificTypesBase.Table{Union{AbstractVector{ScientificTypesBase.Continuous}, AbstractVector{ScientificTypesBase.Count}}}, AbstractVector{ScientificTypesBase.Multiclass{2}}}
-
-│ fit_data_scitype(model) = Union{Tuple{ScientificTypesBase.Table{<:AbstractVector{<:ScientificTypesBase.Continuous}}, AbstractVector{<:ScientificTypesBase.Finite}}, Tuple{ScientificTypesBase.Table{<:AbstractVector{<:ScientificTypesBase.Continuous}}, AbstractVector{<:ScientificTypesBase.Finite}, Any}}
-└ @ MLJBase ~/.julia/packages/MLJBase/7nGJF/src/machines.jl:237
 [ Info: Training machine(SVC(kernel = RadialBasis, …), …).
 [ Info: Training machine(DeterministicPipeline(standardizer = Standardizer(features = Symbol[], …), …), …).
 [ Info: Training machine(:standardizer, …).