Commit 3be5ed0

fix: error for large sample sizes in Multinomial NB
I was seeing SystemStackError issues with Multinomial NB when running with larger sample sizes. I believe this is due to splatting the samples into `Numo::DFloat[]` as params. That param list can become very large with large sample sets, which overflows Ruby's stack. I think we can instead use `Numo::DFloat.cast` here and pass in the array, since that is just a single param. I've also lifted this cast to the `bin_x` variable so we don't have to do it per class.
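The reasoning above rests on how a splat expands an array into one positional argument per element. The following is a minimal plain-Ruby sketch of that mechanism only. It does not use Numo, and `received_arg_count` is a hypothetical helper introduced for illustration, not part of Rumale or Numo. The row count is kept small so the example runs safely on any Ruby version.

```ruby
# Hypothetical helper (not part of Rumale or Numo): reports how many
# positional arguments the callee actually received.
def received_arg_count(*args)
  args.size
end

# A small stand-in for the binarized sample matrix: 1_000 rows, 4 features.
rows = Array.new(1_000) { [1.0, 0.0, 1.0, 0.0] }

# Splatting passes one argument per row, so the argument list grows
# with the number of samples.
splat_count = received_arg_count(*rows)

# Passing the array itself is always a single argument, whatever its length.
single_count = received_arg_count(rows)

puts "splat:  #{splat_count} arguments"
puts "single: #{single_count} argument"
```

With the splat, the argument count tracks the number of samples. Passing the array as a single object keeps the call's argument list constant, which is the property the commit relies on when it switches to `Numo::DFloat.cast`.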
1 parent fd12b80 commit 3be5ed0

2 files changed, 14 additions and 2 deletions


rumale-naive_bayes/lib/rumale/naive_bayes/multinomial_nb.rb

Lines changed: 2 additions & 2 deletions
@@ -65,9 +65,9 @@ def decision_function(x)
         x = ::Rumale::Validation.check_convert_sample_array(x)

         n_classes = @classes.size
-        bin_x = x.gt(0)
+        bin_x = Numo::DFloat.cast(x.gt(0))
         log_likelihoods = Array.new(n_classes) do |l|
-          Math.log(@class_priors[l]) + (Numo::DFloat[*bin_x] * Numo::NMath.log(@feature_probs[l, true])).sum(axis: 1)
+          Math.log(@class_priors[l]) + (bin_x * Numo::NMath.log(@feature_probs[l, true])).sum(axis: 1)
         end
         Numo::DFloat[*log_likelihoods].transpose.dup
       end

rumale-naive_bayes/spec/rumale/naive_bayes/multinomial_nb_spec.rb

Lines changed: 12 additions & 0 deletions
@@ -51,4 +51,16 @@
     expect(probs.shape[1]).to eq(n_classes)
     expect(predicted_by_probs).to eq(y)
   end
+
+  context 'with large sample sizes' do
+    let(:x) { Numo::DFloat.new(1_000_000, 4).rand }
+    let(:y) { Numo::Int32.new(1_000_000).rand(-1, 1) }
+
+    it 'does not raise SystemStackError', :aggregate_failures do
+      expect do
+        estimator
+        score
+      end.not_to raise_error
+    end
+  end
 end
