FIX Isolation forest path length for small samples #19087

Konrad0 · Jan 1, 2021

Reference Issues/PRs

Fixes #15724: iForest average path length for small samples (sklearn.ensemble.IsolationForest)
Also, to mitigate the numerical instability issue outlined in #16721 and #16967, a threshold value is implemented as suggested in the aforementioned PRs.

What does this implement/fix?

An improved approximation for the average path length in isolation forests is introduced, based on a more accurate harmonic number calculation. The routine _average_path_length() is mostly replaced. This leads to more accurate anomaly scores.

Any other comments?

For the harmonic number calculation, see Wikipedia (https://en.wikipedia.org/wiki/Harmonic_number#Calculation) or the following publications and references therein. Further improvement regarding accuracy and/or performance may be feasible using e.g. a Ramanujan expansion.
Villarino, M. Ramanujan's Harmonic Number Expansion into Negative Powers of a Triangular Number. https://arxiv.org/abs/0707.3950
or
Wang, W. Harmonic Number Expansions of the Ramanujan Type. Results Math 73, 161 (2018). https://doi.org/10.1007/s00025-018-0920-8

…#15724) An improved approximation for the average path length in isolation forests is introduced, based on a more accurate harmonic number calculation. The routine _average_path_length() is mostly replaced.

Since the decision function is subject to numerical fluctuations, an anomaly threshold of -1e-15 is defined as suggested in issues

kyrajeep · Jan 4, 2021

Thank you. Was the paper you cite published? Also, there were a couple tests that didn't pass.

Konrad0 · Jan 4, 2021

Hi @kyrajeep,

yes, both references are published, the latter as mentioned above, the former here: Villarino, M.B.: Ramanujan’s harmonic number expansion into negative powers of a triangular number. JIPAM. J. Inequal. Pure Appl. Math. 9(3), 89 (2008). (https://www.emis.de/journals/JIPAM/article1026.html?sid=1026).
Both references contain the asymptotic expansion applied here.

The failing tests seem to be unrelated to these changes as far as I can see, maybe you could take a look? Thanks!

kyrajeep · Jan 5, 2021

Hi @Konrad0,

Thank you for providing the links to the publications. I am not familiar with the testing framework here yet, but I may be able to review in the next few days after learning it. Perhaps someone more experienced with testing for scikit-learn could take a look? Thanks!

Konrad0 · Jan 13, 2021

Hi @albertcthomas, @jnothman and all,

could one of you review this PR or maybe suggest someone else who might be able to do it? Thanks!
(As mentioned above I suspect at least some of the failed tests are unrelated to the changes implemented here.)

Konrad0 · Jan 24, 2021

Hi @cmarmo,

since you commented on the related PR #16967, could you maybe take a look at this PR? As mentioned above I suspect the test failures are unrelated to the changes implemented here. Thanks!

cmarmo · Jan 25, 2021

Hi @Konrad0, thanks for your patience! The linting issue is related to the renaming of the main branch; I can't tell about the other one as the build is no longer available. I'm going to close and reopen the PR in order to trigger the builds.

Konrad0 · Jan 25, 2021

Thanks @cmarmo, now all checks passed 😄

Konrad0 · Jan 27, 2021

Hi @david-cortes,
I saw that you were working on another issue regarding isolation forests, maybe you'd like to review this one? It's a small fix and all checks pass already.

david-cortes · Jan 27, 2021

@Konrad0 I'm not a member of scikit-learn's team so I'm afraid I wouldn't qualify as a reviewer.

That being said, I took a look at the PR and it looks fine, but I wonder: why stop at 50? Why not, for example, use a non-hard-coded harmonic calculation for higher numbers up to a bigger threshold? At 50 observations the gap between real and approximate is roughly 1e-2, which is the same difference as when having 1 more observation in a node.

As a suggestion, this technique is quite fast and precise (algorithm 4):
http://fredrik-j.blogspot.com/2009/02/how-not-to-compute-harmonic-numbers.html

You can check an implementation of it here in this alternative isolation forest package that I maintain:
https://github.com/david-cortes/isotree/blob/b5f5688bf59e2c6efebd84d6e0ccb33081d65ad6/src/utils.cpp#L114
(hard-codes first couple numbers, then uses that formula up to some threshold, then uses approximation)

Konrad0 · Jan 27, 2021

Hi @david-cortes,
thanks for your input. I'm not sure where you get the number 1e-2 for 50 observations. Besides the lookup table, an improved approximation was implemented which is accurate at the level of the roundoff error for double precision (~2e-16) for more than 50 observations. Please let me know if I've misunderstood your point, I'd be glad to improve the implemented approximation!
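
The accuracy claim above can be checked with a short sketch. It compares the expansion from the diff under review against exact harmonic sums computed with Python's `fractions` module. The helper names `expansion` and `exact` are hypothetical and only used for illustration; the range 52 to 512 is taken from the discussion in this thread.

```python
from fractions import Fraction

import numpy as np


def expansion(n):
    """Asymptotic expansion of 2*(H(n) - 1), as in the diff under review."""
    n = np.asarray(n, dtype=float)
    n2_inv = 1.0 / np.square(n)
    return (
        2.0 * (np.log(n) - 1.0 + np.euler_gamma)
        + 1.0 / n
        - n2_inv * (1.0 / 6.0 - n2_inv * (1.0 / 60.0 - n2_inv / 126.0))
    )


def exact(n_max):
    """Exact 2*(H(n) - 1) for n = 1..n_max, rounded once to double."""
    values, total = [], Fraction(0)
    for k in range(1, n_max + 1):
        total += Fraction(1, k)
        values.append(float(2 * (total - 1)))
    return np.array(values)


# Compare on the range where the expansion (not the lookup table) is used.
n = np.arange(52, 513)
max_abs_error = np.max(np.abs(expansion(n) - exact(512)[n - 1]))
print(max_abs_error)
```

The printed value is the largest absolute deviation on that range; it should be on the order of the double-precision roundoff of the result itself.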

david-cortes · Jan 27, 2021

@Konrad0 Apologies, hadn't noticed the part in which you added an extra term on the approximation. I did some testing against actual harmonic numbers up to 512 and can indeed confirm that the difference is at most 1e-15.

I'm now looking at the PR in more detail, and would like to comment:

* In function _average_path_length_small, I think it'd be faster to put the hard-coded array outside of the function instead of re-generating it every time the function is called.
* Why do you reshape n_samples_leaf? Why not better reshape average_path_length at the end?
* Would be better to delete the commented-out line in _average_path_length_small.

Konrad0 · Jan 27, 2021

Hi @david-cortes,
thanks for your suggestions, they are implemented in the latest commit.

* In function `_average_path_length_small`, I think it'd be faster to put the hard-coded array outside of the function instead of re-generating it every time the function is called.

You're right, changed. BTW, I expect some PEP8 problems here since I wanted to keep the list compact, so I didn't indent it as far as suggested.

* Why do you reshape `n_samples_leaf`? Why not better reshape `average_path_length` at the end?

That was kept from the previously existing code, so I'm also not sure why it was originally done this way.

* Would be better to delete the commented-out line in `_average_path_length_small`.

Resolved in first point.

@cmarmo: Can anything be done about the failing lint tests?

cmarmo · Jan 28, 2021

Hi @Konrad0, it is a renaming problem again... Have you synchronized with upstream/main? This should solve the issue.

david-cortes · Jan 28, 2021

For the failing tests, you can use the command line utility flake8 to check the code formatting. Also you'd probably need to add an entry in the what's new for the next version.

…small_samples

Konrad0 · Jan 28, 2021

Thanks @cmarmo for the hint, now the lint tests pass.

@david-cortes The entry in what's new was added. For PEP8 compliance I had just missed a line break at the start of the array...

Konrad0 · Jul 4, 2021

Any chance of getting the review going? Do you have an idea, @cmarmo?

cmarmo · Jul 5, 2021

Hi @Konrad0, thanks for your patience and your work so far! Do you mind synchronizing with upstream? Then perhaps @jjerphan and @albertcthomas might want to have a look? Thanks!

albertcthomas

Sorry for the late review @Konrad0. Please see my first comments. Just to make sure I understand, the new implementation (including the lookup table) is based on the 2 references you mentioned in the PR description and the code? You are solving the numerical issue at the same time, so please add it to the PR description with the already opened PRs (I think you already added this information in a comment of the associated issue but it would be better to find it here as well). Thanks a lot!

albertcthomas · Jul 5, 2021

sklearn/ensemble/_iforest.py

@@ -311,7 +312,8 @@ def predict(self, X):
        check_is_fitted(self)
        X = check_array(X, accept_sparse='csr')
        is_inlier = np.ones(X.shape[0], dtype=int)
-        is_inlier[self.decision_function(X) < 0] = -1
+        # is_inlier[self.decision_function(X) < 0] = -1
+        is_inlier[self.decision_function(X) < -1.0e-15] = -1


Could we use -np.finfo(float).eps instead of -1.0e-15?

Sure, but I had to double it to -2*np.finfo(float).eps. Anyway, I have a proper fix for the numerical instability lined up, but in order for it to work, the path length has to be fixed first.
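
To make the effect of such a tolerance concrete, here is a minimal sketch. The `scores` array is made-up data standing in for decision-function outputs, not values produced by scikit-learn; only `np.finfo(float).eps` and the doubled threshold come from the discussion above.

```python
import numpy as np

eps = np.finfo(float).eps  # machine epsilon for float64, about 2.2e-16
threshold = -2 * eps       # the doubled value mentioned in the reply above

# Hypothetical decision-function outputs: two values that are zero up to
# roundoff noise, one clear inlier and one clear outlier.
scores = np.array([-1.0e-16, 3.0e-17, 0.12, -0.2])

naive = np.where(scores < 0, -1, 1)             # strict comparison with zero
tolerant = np.where(scores < threshold, -1, 1)  # comparison with tolerance

print(naive)     # the tiny negative value is labelled as an outlier
print(tolerant)  # the tiny negative value is treated as an inlier
```

With the strict comparison, a score that is negative only because of roundoff flips the label; the tolerance absorbs fluctuations of that size.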

albertcthomas · Jul 5, 2021

sklearn/ensemble/tests/test_iforest.py

-    result_one = 2.0 * (np.log(4.0) + np.euler_gamma) - 2.0 * 4.0 / 5.0
-    result_two = 2.0 * (np.log(998.0) + np.euler_gamma) - 2.0 * 998.0 / 999.0
+    result_5 = 77.0/30.0
+    result_999 = 12.9689417211006898253130364


where is this number coming from?

These are the exact values for the test cases, can be calculated using e.g. SymPy.

so this is a number with a finite number of decimals?

No, it's a rounded rational number. It can be written as an exact fraction, but then the numerator as well as the denominator are both several hundred digits long.

Then maybe replace the exact value comment by saying that it is a rounded rational number
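
As a sketch of how such reference values can be reproduced without SymPy, the following uses exact rational arithmetic from the standard library. The function name `exact_average_path_length` is hypothetical, and the use of `fractions` is an assumption made here; the thread itself only mentions SymPy.

```python
from fractions import Fraction


def exact_average_path_length(n):
    """Return 2*(H(n) - 1) as an exact fraction (0 for n <= 1)."""
    if n <= 1:
        return Fraction(0)
    return 2 * sum(Fraction(1, k) for k in range(2, n + 1))


result_5 = exact_average_path_length(5)
result_999 = exact_average_path_length(999)

print(result_5)                        # 77/30
print(float(result_999))               # rounded once to double precision
print(len(str(result_999.numerator)))  # number of digits in the numerator
```

Converting the fraction to `float` rounds only once, at the very end, which is why the decimal constant in the test is a rounded rational number rather than an exact value.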

albertcthomas · Jul 5, 2021

sklearn/ensemble/_iforest.py

@@ -463,6 +465,27 @@ def _more_tags(self):
        }


+_average_path_length_small = np.array((


could you add a small comment above to say that this is a lookup table used in the _average_path_length function below?

albertcthomas · Jul 5, 2021

sklearn/ensemble/_iforest.py

+    average_path_length[mask_small] = _average_path_length_small[
+        n_samples_leaf[mask_small]]
+
+    # Average path length equals 2*(H(n)-1), with H(n) the nth harmonic number.


maybe put this in a Notes section of the function docstring?

Thanks for clarifying it with a comment @Konrad0.

I would also recommend adding a reference with Sphinx for the resource provided below.

Great suggestion, I tried to put in a note and a reference. All tests pass, but maybe one of you could check the documentation anyway, since for some strange reason Sphinx is crashing on my machine, so I couldn't look at the result.

I don't think Sphinx renders the docstrings of private functions.

are you suggesting we move it to the top level description? To me, this is a rather minute detail, so I wouldn't expect someone looking to use the isolation forest to be interested in the implementation of the harmonic number calculation...

No this is fine here.

jjerphan

Thanks @Konrad0 for your fix proposal!

Here are a few suggestions.

Can you prefix the title of this PR with FIX, please?

jjerphan · Jul 5, 2021

sklearn/ensemble/_iforest.py

+    average_path_length[mask_small] = _average_path_length_small[
+        n_samples_leaf[mask_small]]
+
+    # Average path length equals 2*(H(n)-1), with H(n) the nth harmonic number.


Thanks for clarifying it with a comment @Konrad0.

I would also recommend adding a reference with Sphinx for the resource provided below.

jjerphan · Jul 5, 2021

sklearn/ensemble/_iforest.py

-    mask_1 = n_samples_leaf <= 1
-    mask_2 = n_samples_leaf == 2
-    not_mask = ~np.logical_or(mask_1, mask_2)
+    mask_small = n_samples_leaf < 52


Suggested change

-    mask_small = n_samples_leaf < 52
+    mask_small = n_samples_leaf < len(_average_path_length_small)

More elegant, done.
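
The hybrid scheme discussed in this thread can be sketched as follows. This is an illustration, not the PR's code: the table is built here with a cumulative sum purely to keep the example short (the PR hard-codes the values), the names `_table` and `average_path_length` are hypothetical, and the input is assumed to contain non-negative integers.

```python
import numpy as np

# Illustrative lookup table for 2*(H(n) - 1), n = 0..51. The PR hard-codes
# these values; building them with a cumulative sum is an assumption made
# here only to keep the sketch short.
_terms = np.zeros(52)
_terms[2:] = 2.0 / np.arange(2, 52)
_table = np.cumsum(_terms)


def average_path_length(n_samples_leaf):
    """Sketch of the hybrid scheme: table for small n, expansion otherwise."""
    n = np.asarray(n_samples_leaf).reshape(-1)
    result = np.zeros(n.shape, dtype=float)

    # Deriving the cutoff from the table keeps the two in sync.
    mask_small = n < len(_table)
    not_mask = ~mask_small

    result[mask_small] = _table[n[mask_small]]

    n_large = n[not_mask].astype(float)
    n2_inv = 1.0 / np.square(n_large)
    result[not_mask] = (
        2.0 * (np.log(n_large) - 1.0 + np.euler_gamma)
        + 1.0 / n_large
        - n2_inv * (1.0 / 6.0 - n2_inv * (1.0 / 60.0 - n2_inv / 126.0))
    )
    return result


print(average_path_length([1, 2, 5, 999]))
```

Because the mask is computed from `len(_table)`, extending or shortening the table changes the switchover point without touching the function body.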

jjerphan · Jul 5, 2021

sklearn/ensemble/_iforest.py

+    # Powers Of A Triangular Number. JIPAM. J. Inequal. Pure Appl. Math. 9(3),
+    # 89 (2008). https://www.emis.de/journals/JIPAM/article1026.html?sid=1026.
+    # Preprint at https://arxiv.org/abs/0707.3950.


The query parameter in the URL can be removed.

Suggested change

-    # Powers Of A Triangular Number. JIPAM. J. Inequal. Pure Appl. Math. 9(3),
-    # 89 (2008). https://www.emis.de/journals/JIPAM/article1026.html?sid=1026.
-    # Preprint at https://arxiv.org/abs/0707.3950.
+    # Powers Of A Triangular Number. JIPAM. J. Inequal. Pure Appl. Math. 9(3),
+    # 89 (2008). https://www.emis.de/journals/JIPAM/article1026.html.
+    # Preprint at https://arxiv.org/abs/0707.3950.

jjerphan · Jul 5, 2021

sklearn/ensemble/_iforest.py

-    average_path_length[mask_2] = 1.
+    tmp = 1.0/np.square(n_samples_leaf[not_mask])
    average_path_length[not_mask] = (
-        2.0 * (np.log(n_samples_leaf[not_mask] - 1.0) + np.euler_gamma)
-        - 2.0 * (n_samples_leaf[not_mask] - 1.0) / n_samples_leaf[not_mask]
+        2.0 * (np.log(n_samples_leaf[not_mask]) - 1.0 + np.euler_gamma)
+        + 1.0/n_samples_leaf[not_mask]
+        - tmp*(1.0/6.0 - tmp*(1.0/60.0 - tmp/126.0))


Can you give a better name to tmp? Also how can one find those constants here?

Sure, renamed to n2_inv.

About the second question, the coefficients are from the Euler asymptotic expansion of the harmonic numbers H(n) as found on Wikipedia (https://en.wikipedia.org/wiki/Harmonic_number#Calculation) or in the references provided above. The coefficients just have to be doubled since the path length is 2*(H(n)-1).
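
Written out, the doubling described above looks as follows. The symbol c(n) for the average path length is notation assumed here for readability; the expansion of H(n) is the standard asymptotic series referred to in the reply.

```latex
H(n) = \ln n + \gamma + \frac{1}{2n} - \frac{1}{12 n^{2}}
       + \frac{1}{120 n^{4}} - \frac{1}{252 n^{6}} + \dots

c(n) = 2\,\bigl(H(n) - 1\bigr)
     = 2\,(\ln n - 1 + \gamma) + \frac{1}{n} - \frac{1}{6 n^{2}}
       + \frac{1}{60 n^{4}} - \frac{1}{126 n^{6}} + \dots
```

With x = 1/n^2, the last three terms equal -x(1/6 - x(1/60 - x/126)), which is the nested (Horner) form used in the diff.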

jjerphan · Jul 5, 2021

sklearn/ensemble/tests/test_iforest.py

@@ -322,6 +322,7 @@ def test_iforest_with_uniform_data():

    rng = np.random.RandomState(0)

+    assert all(np.abs(iforest.decision_function(X)) < 1.0e-15)


To complete @albertcthomas's remark.

Suggested change

-    assert all(np.abs(iforest.decision_function(X)) < 1.0e-15)
+    assert all(np.abs(iforest.decision_function(X)) < np.finfo(float).eps)

As above, I had to double it to 2*np.finfo(float).eps, but hopefully the numerical instability issue will be resolved altogether soon.

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn into iForest_pathlength_small_samples

Konrad0 · Jul 6, 2021

Thanks @albertcthomas and @jjerphan for your reviews!
To answer a couple of questions: the implementation is based on an expansion that can be found in either of the two references or on Wikipedia (link added to comment and notes). The lookup table was calculated separately, it cannot be found in the references.
As mentioned above, a workaround for the numerical instability issue (PRs are now mentioned in the description) was applied here out of necessity, but I'll submit a PR for a proper fix without the need for thresholds like 2*np.finfo(float).eps after the path length is working correctly.

albertcthomas · Jul 7, 2021

Thanks @Konrad0, I'll have a look soon.

One important thing: please change the last commit message where you put @albertcthomas and @jjerphan, otherwise we'll get unwanted notifications. As this is your last commit, this can easily be done with git commit --amend to change the commit message, followed by a force push.

jjerphan · Jul 7, 2021

Note that you can also add co-authors to commits if relevant.

Edit: regarding the mention in the commit title, notification-wise it's fine for me, @Konrad0.

Konrad0 · Jul 7, 2021

Sorry about that, didn't mean to inundate you with useless notifications. Commit message has been changed.

Konrad0 · Aug 2, 2021

Hi @albertcthomas and @jjerphan,

are there any more adjustments you'd suggest? The previous ones should be implemented. Thanks!

jjerphan · Aug 2, 2021

sklearn/ensemble/_iforest.py

+_average_path_length_small = np.array(
+    (
+        0.0,
+        0.0,
+        1.0,
+        1.6666666666666666667,
+        2.1666666666666666667,
+        2.5666666666666666667,
+        2.9000000000000000000,
+        3.1857142857142857143,
+        3.4357142857142857143,
+        3.6579365079365079365,
+        3.8579365079365079365,
+        4.0397546897546897547,
+        4.2064213564213564214,
+        4.3602675102675102675,
+        4.5031246531246531247,
+        4.6364579864579864580,
+        4.7614579864579864580,
+        4.8791050452815158698,
+        4.9902161563926269809,
+        5.0954793142873638230,
+        5.1954793142873638230,
+        5.2907174095254590611,
+        5.3816265004345499702,
+        5.4685830221736804049,
+        5.5519163555070137383,
+        5.6319163555070137383,
+        5.7088394324300906613,
+        5.7829135065041647354,
+        5.8543420779327361640,
+        5.9233075951741154743,
+        5.9899742618407821410,
+        6.0544903908730402055,
+        6.1169903908730402055,
+        6.1775964514791008116,
+        6.2364199808908655175,
+        6.2935628380337226603,
+        6.3491183935892782159,
+        6.4031724476433322699,
+        6.4558040265907006910,
+        6.5070860778727519730,
+        6.5570860778727519730,
+        6.6058665656776300218,
+        6.6534856132966776409,
+        6.6999972412036543850,
+        6.7454517866581998396,
+        6.7898962311026442840,
+        6.8333744919722095014,
+        6.8759276834615712036,
+        6.9175943501282378702,
+        6.9584106766588501151,
+        6.9984106766588501151,
+        7.0376263629333599190,
+    )
+)


Where is this table coming from? Could you generate it instead of hard-coding it?

Having a lookup table here is useful because it's fast and accurate. The corresponding value could be calculated each time from the definition of the harmonic numbers, but especially for the later values, this isn't very efficient. It is significantly slower than the asymptotic expansion, which can be used for larger values, where it becomes sufficiently accurate.
BTW, I'm not really happy about the formatting, but that's what the utility Black generated.

Yes, I agree with your point, but I think having its definition be explicitly analytic rather than hard-coded is better for readers and for maintenance.

Probably we can find a common ground combining efficiency and explicitness.

What is the snippet that you have used to generate this table?

Thanks for your suggestion. The values were generated using SymPy to get correct results. One could do something like np.sum(2/np.arange(2, n+1)), but as you can easily see this would be much slower and may include roundoff errors. I understand your point about readability, so how about we keep the list but document it explicitly with the definition of the harmonic numbers and a code snippet explaining how the list can be generated? This way it will be clear to anyone reading the code where the list came from and we also keep the benefit of superior performance and accuracy.

It would be great including the snippet generating this list. 👍

Alright, I added the documentation to the code. The main line to generate the list is [Sum(2/k, (k, 2, max(1, i))).evalf(22).round(19) for i in range(52)]; it is mentioned just above the list. Let me know what you think of it. I also added an explanation of how the calculation is split for large and small samples.

…nto iForest_pathlength_small_samples

jjerphan · Aug 7, 2021

sklearn/ensemble/_iforest.py

+# Lookup table used below in _average_path_length() for small samples.
+# Since the average path length equals 2*(H(n)-1), with H(n) the nth
+# harmonic number, it can be calculated as Sum(2/k, k=2...n) for n>=2
+# and is zero for n<=1.
+# To achieve correct results to full rounding precision, the list below
+# can be generated using SymPy (https://www.sympy.org) as follows (r is
+# the result)
+#
+# from sympy import Sum, Symbol
+# k = Symbol('k')
+# r = [Sum(2/k, (k, 2, max(1, i))).evalf(22).round(19) for i in range(52)]
+#
 _average_path_length_small = np.array(


Thanks for the snippet!

This seems to have a quadratic complexity; using a cumulative sum can make it linear. This can be done easily with numpy, with a similar runtime.

In [1]: import warnings, numpy as np

In [2]: def f(n):
   ...:     # Catch the RuntimeWarning due to the
   ...:     # division by 0 for the first term
   ...:     with warnings.catch_warnings():
   ...:         warnings.simplefilter("ignore")
   ...:         terms = 2 / np.arange(n)
   ...:     terms[0] = terms[1] = 0.
   ...:     # Compute the series of harmonic number
   ...:     _average_path_length_small = np.cumsum(terms)

In [3]: %timeit f(52)
8.57 µs ± 2.29 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [4]: %timeit f(100000)
594 µs ± 2.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: def a():
   ...:     # insert this inline definition of the array

In [6]: %timeit a()
3.34 µs ± 7.51 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Importing warnings at the top of the file:

Suggested change

-# Lookup table used below in _average_path_length() for small samples.
-# Since the average path length equals 2*(H(n)-1), with H(n) the nth
-# harmonic number, it can be calculated as Sum(2/k, k=2...n) for n>=2
-# and is zero for n<=1.
-# To achieve correct results to full rounding precision, the list below
-# can be generated using SymPy (https://www.sympy.org) as follows (r is
-# the result)
-#
-# from sympy import Sum, Symbol
-# k = Symbol('k')
-# r = [Sum(2/k, (k, 2, max(1, i))).evalf(22).round(19) for i in range(52)]
-#
-_average_path_length_small = np.array(
+# Lookup table used below in _average_path_length() for small samples.
+# Since the average path length equals 2*(H(n)-1), with H(n) the nth
+# harmonic number, it can be calculated as Sum(2/k, k=2...n) for n>=2
+# and is zero for n<=1.
+# Catch the RuntimeWarning due to the division by 0 for the first term
+with warnings.catch_warnings():
+    warnings.simplefilter("ignore")
+    terms = 2 / np.arange(n)
+terms[0] = terms[1] = 0.
+# Compute the series of harmonic number
+_average_path_length_small = np.cumsum(terms)

What do you think?

Hi @jjerphan,
thanks for the suggestion, I appreciate it. Sorry for the misunderstanding, of course I'm aware that the list can be generated in linear time, the code snippet in the comment was written to be as easily understandable as possible, not as an actual implementation. IMO we should keep the list, it's faster as you've shown (can't beat a lookup table on that ;-)) and I honestly don't think your code is more comprehensible to an uninitiated reader. Let me know what you think

I think we need someone else to express their opinion on this point (cc @albertcthomas?). 🙂

Agreed, both solutions would work, we just have different preferences 😄

jjerphan · Aug 7, 2021

sklearn/ensemble/_iforest.py

+    # The path length is determined in different ways depending on
+    # n_samples_leaf. For small values, a lookup table is used, see above for a
+    # more detailed explanation. For large values, an asymptotic expansion is
+    # used as described below.


cmarmo · Dec 13, 2022

Hi @Konrad0 if you are still interested in working on this do you mind fixing conflicts and synchronizing with main?
Thank you so much for your patience and your work so far.

Konrad0 added 2 commits January 1, 2021 16:22

FIX iForest average path length for small samples (issue scikit-learn…

9860306

…#15724) An improved approximation for the average path length in isolation forests is introduced, based on a more accurate harmonic number calculation. The routine _average_path_length() is mostly replaced.

PEP8 cleanup

9f3f617

github-actions bot added the module:ensemble label Jan 1, 2021

Konrad0 changed the title ~~I forest pathlength small samples~~ Isolation forest path length for small samples Jan 1, 2021

Threshold for decision function introduced and tests adjusted

2e81081

Since the decision function is subject to numerical fluctuations, an anomaly threshold of -1e-15 is defined as suggested in issues

Konrad0 mentioned this pull request Jan 4, 2021

Average path length in iForest is inaccurate for small sizes #15724

Open

Base automatically changed from master to main January 22, 2021 10:53

cmarmo closed this Jan 25, 2021

cmarmo reopened this Jan 25, 2021

cmarmo added the Waiting for Reviewer label Jan 25, 2021

Improvements suggested by @david-cortes implemented

44c9c72

Konrad0 added 4 commits January 28, 2021 17:33

small PEP8 adjustment

bec8ccd

Merge remote-tracking branch 'upstream/main' into iForest_pathlength_…

ffacf4d

…small_samples

Added entry to what's new

e43646f

Corrected entry in what's new

0330fcb

albertcthomas reviewed Jul 5, 2021

jjerphan requested changes Jul 5, 2021

cmarmo removed the Waiting for Reviewer label Jul 6, 2021

Konrad0 added 2 commits July 6, 2021 20:32

Resolved merge conflicts.

f81e131

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn into iForest_pathlength_small_samples

Formatting adjusted by black.

ed967c7

Konrad0 changed the title ~~Isolation forest path length for small samples~~ FIX Isolation forest path length for small samples Jul 6, 2021

Suggestions from review implemented.

5b9edfb

Konrad0 force-pushed the iForest_pathlength_small_samples branch from 782225c to 5b9edfb Compare July 7, 2021 21:05

inline comments corrected

7f9d3cb

jjerphan reviewed Aug 2, 2021

Konrad0 added 2 commits August 7, 2021 11:02

better documentation of the path length calculation

4c144fa

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn i…

750eb9b

…nto iForest_pathlength_small_samples

jjerphan reviewed Aug 7, 2021

david-cortes mentioned this pull request Oct 28, 2021

Support for isolation forests dmlc/treelite#322

Merged

jjerphan added the Waiting for Reviewer label Nov 15, 2021

thomasjpfan mentioned this pull request Apr 24, 2022

Fix #7141 #16967

Closed

cmarmo removed the Waiting for Reviewer label Dec 13, 2022

cmarmo added Stalled help wanted labels Dec 24, 2022
