Commit 1e392d3

ArturoAmorQ, jeremiedbb, and glemaitre authored; committed by Itay
DOC Use notebook style in plot_lof_outlier_detection.py (scikit-learn#26017)
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
1 parent 4e99ac5 commit 1e392d3
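The "notebook style" named in the commit message is the sphinx-gallery convention: an example script is split into cells with `# %%` markers, and comment lines directly under a marker are rendered as reST text while the code below them becomes a runnable cell. A minimal illustrative sketch (the section title and code here are made up, not part of the commit):

```python
# %%
# A section title
# ---------------
#
# Comment lines after a `# %%` marker are rendered as reST text by
# sphinx-gallery; the plain code below becomes a separate cell.

# %%
total = sum(range(5))  # an ordinary code cell
print(total)
```

The file itself stays a plain Python script, so it runs unchanged outside sphinx-gallery.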

1 file changed: 48 additions, 29 deletions

examples/neighbors/plot_lof_outlier_detection.py
@@ -6,56 +6,74 @@
 The Local Outlier Factor (LOF) algorithm is an unsupervised anomaly detection
 method which computes the local density deviation of a given data point with
 respect to its neighbors. It considers as outliers the samples that have a
-substantially lower density than their neighbors. This example shows how to
-use LOF for outlier detection which is the default use case of this estimator
-in scikit-learn. Note that when LOF is used for outlier detection it has no
-predict, decision_function and score_samples methods. See
-:ref:`User Guide <outlier_detection>`: for details on the difference between
-outlier detection and novelty detection and how to use LOF for novelty
-detection.
-
-The number of neighbors considered (parameter n_neighbors) is typically
-set 1) greater than the minimum number of samples a cluster has to contain,
-so that other samples can be local outliers relative to this cluster, and 2)
-smaller than the maximum number of close by samples that can potentially be
-local outliers.
-In practice, such information is generally not available, and taking
-n_neighbors=20 appears to work well in general.
+substantially lower density than their neighbors. This example shows how to use
+LOF for outlier detection which is the default use case of this estimator in
+scikit-learn. Note that when LOF is used for outlier detection it has no
+`predict`, `decision_function` and `score_samples` methods. See the :ref:`User
+Guide <outlier_detection>` for details on the difference between outlier
+detection and novelty detection and how to use LOF for novelty detection.
+
+The number of neighbors considered (parameter `n_neighbors`) is typically set 1)
+greater than the minimum number of samples a cluster has to contain, so that
+other samples can be local outliers relative to this cluster, and 2) smaller
+than the maximum number of close by samples that can potentially be local
+outliers. In practice, such information is generally not available, and taking
+`n_neighbors=20` appears to work well in general.
 
 """
 
+# %%
+# Generate data with outliers
+# ---------------------------
+
+# %%
 import numpy as np
-import matplotlib.pyplot as plt
-from sklearn.neighbors import LocalOutlierFactor
 
 np.random.seed(42)
 
-# Generate train data
 X_inliers = 0.3 * np.random.randn(100, 2)
 X_inliers = np.r_[X_inliers + 2, X_inliers - 2]
-
-# Generate some outliers
 X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))
 X = np.r_[X_inliers, X_outliers]
 
 n_outliers = len(X_outliers)
 ground_truth = np.ones(len(X), dtype=int)
 ground_truth[-n_outliers:] = -1
 
-# fit the model for outlier detection (default)
+# %%
+# Fit the model for outlier detection (default)
+# ---------------------------------------------
+#
+# Use `fit_predict` to compute the predicted labels of the training samples
+# (when LOF is used for outlier detection, the estimator has no `predict`,
+# `decision_function` and `score_samples` methods).
+
+from sklearn.neighbors import LocalOutlierFactor
+
 clf = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
-# use fit_predict to compute the predicted labels of the training samples
-# (when LOF is used for outlier detection, the estimator has no predict,
-# decision_function and score_samples methods).
 y_pred = clf.fit_predict(X)
 n_errors = (y_pred != ground_truth).sum()
 X_scores = clf.negative_outlier_factor_
 
-plt.title("Local Outlier Factor (LOF)")
+# %%
+# Plot results
+# ------------
+
+# %%
+import matplotlib.pyplot as plt
+from matplotlib.legend_handler import HandlerPathCollection
+
+
+def update_legend_marker_size(handle, orig):
+    "Customize size of the legend marker"
+    handle.update_from(orig)
+    handle.set_sizes([20])
+
+
 plt.scatter(X[:, 0], X[:, 1], color="k", s=3.0, label="Data points")
 # plot circles with radius proportional to the outlier scores
 radius = (X_scores.max() - X_scores) / (X_scores.max() - X_scores.min())
-plt.scatter(
+scatter = plt.scatter(
     X[:, 0],
     X[:, 1],
     s=1000 * radius,
@@ -67,7 +85,8 @@
 plt.xlim((-5, 5))
 plt.ylim((-5, 5))
 plt.xlabel("prediction errors: %d" % (n_errors))
-legend = plt.legend(loc="upper left")
-legend.legendHandles[0]._sizes = [10]
-legend.legendHandles[1]._sizes = [20]
+plt.legend(
+    handler_map={scatter: HandlerPathCollection(update_func=update_legend_marker_size)}
+)
+plt.title("Local Outlier Factor (LOF)")
 plt.show()
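As a usage sketch of the API the example relies on (not part of the commit itself): in outlier-detection mode `LocalOutlierFactor` exposes only `fit_predict`, and the fitted per-sample scores live in `negative_outlier_factor_`. The data setup below mirrors the example's, assuming scikit-learn is installed:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Two inlier blobs plus 20 uniformly scattered outliers, as in the example.
rng = np.random.RandomState(42)
X_inliers = 0.3 * rng.randn(100, 2)
X = np.r_[X_inliers + 2, X_inliers - 2, rng.uniform(low=-4, high=4, size=(20, 2))]

# contamination=0.1 means roughly 10% of samples get flagged as outliers.
clf = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
y_pred = clf.fit_predict(X)  # 1 = inlier, -1 = outlier
X_scores = clf.negative_outlier_factor_  # more negative = more abnormal
```

Calling `clf.predict(X)` here would raise an error; `predict`, `decision_function` and `score_samples` only exist when the estimator is constructed with `novelty=True`.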
