-# We notice that one-hot-encoding was the most performant here followed by target encoding.
-# Ordinal encoding also produced decent results because we can perceive all the categorical variables to be ordered
-# On the other hand, frequency encoding lagged behind. Observe that this method doesn't distinguish categories from one another if they occur with similar frequencies.
+#
+#
+# ### Performance Summary
+# The results show OneHot encoding performing best, followed by Target encoding, with Ordinal and Frequency encoders showing lower performance.
+#
+# The confidence intervals should be interpreted with caution: they illustrate uncertainty and are not a definitive test of statistical significance.
+# Scores from different cross-validation folds are not independent, because the folds share training data, and no unbiased estimator of the variance of k-fold cross-validation exists (see Bengio & Grandvalet, 2004, "No Unbiased Estimator of the Variance of K-Fold Cross-Validation"). That said, reporting the interval is still more informative than reporting only the mean.
+
+# Prepare data for plotting
+labels = results.pipeline
+mean_acc = results.accuracy
+ci_lower = results.ci_lower
+ci_upper = results.ci_upper
+
+# Error bars: distance from each mean to its lower and upper CI bounds
+lower_err = mean_acc .- ci_lower
+upper_err = ci_upper .- mean_acc
+
+
+bar(
+    labels,
+    mean_acc,
+    yerror = (lower_err, upper_err),
+    legend = false,
+    xlabel = "Encoder + SVM",
+    ylabel = "Accuracy",
+    title = "Mean Accuracy with 95% Confidence Intervals",
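The removed note attributes frequency encoding's weaker results to its inability to tell apart categories that occur with similar frequencies. The minimal Base-Julia sketch below illustrates that collision. `frequency_encode` is a hypothetical helper written for illustration, not MLJ's frequency encoder, and the `colors` data is made up.

```julia
# Hypothetical helper (illustration only): replace each category with its
# relative frequency in `x`.
function frequency_encode(x::AbstractVector)
    counts = Dict{eltype(x),Int}()
    for v in x
        counts[v] = get(counts, v, 0) + 1   # count occurrences of each category
    end
    n = length(x)
    return [counts[v] / n for v in x]       # relative frequency per observation
end

# Made-up data: "red" and "blue" both occur twice, "green" four times.
colors = ["red", "blue", "red", "blue", "green", "green", "green", "green"]
encoded = frequency_encode(colors)

# "red" and "blue" receive the same code, so a downstream model such as the
# SVM cannot separate them using this feature alone.
```

Any two categories with equal counts collapse to the same number, which is exactly the information loss the note describes; one-hot encoding keeps them in separate columns instead.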
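The diff does not show how `results.ci_lower` and `results.ci_upper` were computed. One common recipe, assumed here and not confirmed by the source, is a Student t interval over per-fold accuracies. The sketch below uses only the `Statistics` standard library and takes the critical value as a parameter to avoid a package dependency. The default 2.776 is the two-sided 95% t value for 4 degrees of freedom, so it only fits five folds. `naive_cv_interval` and the fold accuracies are hypothetical.

```julia
using Statistics: mean, std

# Hypothetical helper: naive t-based interval over k fold scores.
# It treats the fold scores as independent, which k-fold CV violates
# (training sets overlap), so the interval can be too narrow.
function naive_cv_interval(scores::AbstractVector{<:Real}; tcrit::Real = 2.776)
    k = length(scores)
    m = mean(scores)
    half = tcrit * std(scores) / sqrt(k)   # `std` uses the k - 1 denominator
    return (m - half, m + half)
end

fold_acc = [0.81, 0.84, 0.79, 0.83, 0.82]   # made-up accuracies for five folds
lo, hi = naive_cv_interval(fold_acc)

# Offsets in the (lower, upper) form passed to `yerror` in the plotting code
lower_err = mean(fold_acc) - lo
upper_err = hi - mean(fold_acc)
```

Because this interval is symmetric, `lower_err` and `upper_err` coincide. The plotting code computes them separately, so it also handles asymmetric intervals such as bootstrap percentiles.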