
Commit bc8dc88

willy-liu and hcho3 authored
fix: Depth Calculation for iForest importer (#617)
* fix: Depth Calculation for iForest importer

Corrected the path depth calculation for `IsolationForest` model conversion to match sklearn's implementation.

In Isolation Forest, the anomaly score is computed as:

    s(x) = 2^(-E[h(x)] / c(n))

where:
- E[h(x)] is the expected path length for sample x
- c(n) = 2 * H(n - 1) - (2 * (n - 1)) / n
- H(n) is the nth harmonic number

Previously, Treelite used the digamma-based formulation for the harmonic number (γ is the Euler-Mascheroni constant):

    H(n) = ψ(n + 1) + γ

and computed c(n) as:

    c(n) = float(2 * (harmonic(n) - 1))

While mathematically valid (with the exact H(n), 2 * (H(n) - 1) equals 2 * H(n - 1) - (2 * (n - 1)) / n, because H(n - 1) = H(n) - 1/n), this differs from sklearn's implementation and the original paper. sklearn approximates H(n) as:

    H(n) ≈ ln(n) + γ

and computes:

    c(n) = 2 * H(n - 1) - (2 * (n - 1)) / n

This PR changes Treelite's calculation to match sklearn's implementation, ensuring consistent path lengths and anomaly scores. The results differ most for small n, where the ln(n) + γ approximation is furthest from the exact H(n).

* Fix formatting

* Update pytest

---------

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
Co-authored-by: Hyunsu Cho <phcho@nvidia.com>
1 parent a78b6cf commit bc8dc88
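To make the difference between the two formulations of c(n) in the commit message concrete, here is a minimal numerical sketch. It is written in dependency-free Python with a hardcoded Euler-Mascheroni constant, and the helper names (`harmonic_exact`, `c_old`, `c_paper_exact`, `c_new`) are hypothetical, not part of Treelite. It checks that the old form and the paper/sklearn form agree when the exact harmonic number is used, and that the patched values differ from the old ones only through the ln + γ approximation.

```python
import math

# Euler-Mascheroni constant (same value as numpy.euler_gamma), hardcoded
# here to keep the sketch dependency-free
EULER_GAMMA = 0.5772156649015329


def harmonic_exact(n: int) -> float:
    """Exact H(n) = 1 + 1/2 + ... + 1/n (equal to psi(n + 1) + gamma for integers)."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


def harmonic_approx(n: float) -> float:
    """Asymptotic approximation H(n) ~ ln(n) + gamma, as used by sklearn."""
    return math.log(n) + EULER_GAMMA


def c_old(n: int) -> float:
    """Previous Treelite form: c(n) = 2 * (H(n) - 1) with the exact H(n)."""
    return 2.0 * (harmonic_exact(n) - 1.0)


def c_paper_exact(n: int) -> float:
    """Paper/sklearn form 2 * H(n - 1) - 2 * (n - 1) / n, but with the exact H."""
    return 2.0 * harmonic_exact(n - 1) - 2.0 * (n - 1) / n


def c_new(n: int) -> float:
    """Patched form: the same formula with H approximated by ln + gamma."""
    return 2.0 * harmonic_approx(n - 1) - 2.0 * (n - 1) / n


# Columns: n, old value, exact paper form (same as old), patched value
for n in (3, 10, 256):
    print(n, c_old(n), c_paper_exact(n), c_new(n))
```

For n = 3 the old value is 5/3 (about 1.667), while the patched value is about 1.207. The gap shrinks roughly like 1/(n - 1), so by n = 256 it is below 0.004.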


3 files changed (+4, -5 lines)


.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ repos:
     rev: v1.17.1
     hooks:
       - id: mypy
-        additional_dependencies: [types-setuptools, numpy]
+        additional_dependencies: [types-setuptools]
   - repo: https://github.com/astral-sh/ruff-pre-commit
     rev: v0.12.8
     hooks:

python/treelite/sklearn/isolation_forest.py

Lines changed: 2 additions & 3 deletions
@@ -1,12 +1,11 @@
 """Utility functions for loading IsolationForest models"""
 
 import numpy as np
-from scipy.special import psi
 
 
 def harmonic(number):
     """Calculates the n-th harmonic number"""
-    return psi(number + 1) + np.euler_gamma
+    return np.log(number) + np.euler_gamma
 
 
 def expected_depth(n_remainder):
@@ -15,7 +14,7 @@ def expected_depth(n_remainder):
         return 0.0
     if n_remainder == 2:
         return 1.0
-    return float(2 * (harmonic(n_remainder) - 1))
+    return float(2 * harmonic(n_remainder - 1) - 2 * (n_remainder - 1) / n_remainder)
 
 
 def calculate_depths(isolation_depths, tree, curr_node, curr_depth):
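To connect the patched `expected_depth` back to the score formula s(x) = 2^(-E[h(x)] / c(n)), here is a minimal sketch that turns a mean path length into an anomaly score. It rests on a few assumptions: `expected_depth` is restated in pure Python rather than imported from Treelite; the `n <= 1` guard is assumed, because that line falls outside the diff hunk (sklearn uses the same cutoff); and `anomaly_score` is a hypothetical helper, not a Treelite function.

```python
import math

EULER_GAMMA = 0.5772156649015329  # same value as numpy.euler_gamma


def expected_depth(n_remainder: int) -> float:
    """c(n): average path length of an unsuccessful search in a BST of n points.

    Mirrors the patched formula; the n <= 1 guard is an assumption.
    """
    if n_remainder <= 1:
        return 0.0
    if n_remainder == 2:
        return 1.0
    # H(n - 1) approximated by ln(n - 1) + gamma, as in sklearn
    harmonic = math.log(n_remainder - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (n_remainder - 1) / n_remainder


def anomaly_score(mean_path_length: float, n_samples: int) -> float:
    """s(x) = 2 ** (-E[h(x)] / c(n)); requires n_samples >= 2 so that c(n) > 0."""
    return 2.0 ** (-mean_path_length / expected_depth(n_samples))


n = 256  # hypothetical subsample size per tree
for path_length in (0.0, expected_depth(n), 2 * expected_depth(n)):
    print(path_length, anomaly_score(path_length, n))
```

A mean path length equal to c(n) gives exactly s = 0.5. Shorter paths (points isolated quickly) push s toward 1, and longer paths push it toward 0. sklearn's `IsolationForest.score_samples` reports the negative of this quantity, so lower values there mean more abnormal.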

tests/python/test_sklearn_integration.py

Lines changed: 1 addition & 1 deletion
@@ -192,7 +192,7 @@ def test_skl_converter_iforest(dataset):
 
     tl_model = treelite.sklearn.import_model(clf)
     out_pred = treelite.gtil.predict(tl_model, X)
-    np.testing.assert_almost_equal(out_pred, expected_pred, decimal=2)
+    np.testing.assert_almost_equal(out_pred, expected_pred)
 
 
 @given(
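Dropping `decimal=2` makes this test considerably stricter. NumPy's `assert_almost_equal` checks `abs(desired - actual) < 1.5 * 10**(-decimal)`, and `decimal` defaults to 7. The short sketch below uses made-up arrays rather than real model output to show a 1e-3 discrepancy that the old assertion tolerated and the new one rejects. `passes` is a hypothetical helper written for this illustration.

```python
import numpy as np


def passes(actual, desired, **kwargs) -> bool:
    """Return True if np.testing.assert_almost_equal accepts the pair."""
    try:
        np.testing.assert_almost_equal(actual, desired, **kwargs)
    except AssertionError:
        return False
    return True


# Hypothetical scores: the "converted" values are off from the reference by 1e-3
reference = np.array([0.50, 0.62, 0.71])
converted = reference + 1e-3

loose = passes(converted, reference, decimal=2)  # tolerance 1.5e-2: accepted
strict = passes(converted, reference)            # default decimal=7, tolerance 1.5e-7: rejected
print(loose, strict)  # True False
```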
