Commit c854b83

albertcthomas, TomDLT, and ogrisel authored
[MRG] Linear One-Class SVM using SGD implementation (#10027)
Co-authored-by: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
1 parent 8110214 commit c854b83

File tree

12 files changed (+1381, -48 lines)

benchmarks/bench_online_ocsvm.py

279 additions & 0 deletions (new file)
@@ -0,0 +1,279 @@
"""
=====================================
SGDOneClassSVM benchmark
=====================================
This benchmark compares the :class:`SGDOneClassSVM` with :class:`OneClassSVM`.
The former is an online One-Class SVM implemented with Stochastic Gradient
Descent (SGD). The latter is based on the LibSVM implementation. The
complexity of :class:`SGDOneClassSVM` is linear in the number of samples,
whereas that of :class:`OneClassSVM` is at best quadratic in the number of
samples. Here we compare the performance in terms of AUC and training time on
classical anomaly detection datasets.

The :class:`OneClassSVM` is applied with a Gaussian kernel and we therefore
use a kernel approximation prior to the application of :class:`SGDOneClassSVM`.
"""

from time import time
import numpy as np

from scipy.interpolate import interp1d

from sklearn.metrics import roc_curve, auc
from sklearn.datasets import fetch_kddcup99, fetch_covtype
from sklearn.preprocessing import LabelBinarizer, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.utils import shuffle
from sklearn.kernel_approximation import Nystroem
from sklearn.svm import OneClassSVM
from sklearn.linear_model import SGDOneClassSVM

import matplotlib.pyplot as plt
import matplotlib

font = {'weight': 'normal',
        'size': 15}

matplotlib.rc('font', **font)

print(__doc__)


def print_outlier_ratio(y):
    """
    Helper function to show the distinct value counts of the elements in the
    target. Useful indicator for the datasets used in bench_isolation_forest.py.
    """
    uniq, cnt = np.unique(y, return_counts=True)
    print("----- Target count values: ")
    for u, c in zip(uniq, cnt):
        print("------ %s -> %d occurrences" % (str(u), c))
    print("----- Outlier ratio: %.5f" % (np.min(cnt) / len(y)))


# for roc curve computation
n_axis = 1000
x_axis = np.linspace(0, 1, n_axis)

datasets = ['http', 'smtp', 'SA', 'SF', 'forestcover']

novelty_detection = False  # if False, training set polluted by outliers

random_states = [42]
nu = 0.05

results_libsvm = np.empty((len(datasets), n_axis + 5))
results_online = np.empty((len(datasets), n_axis + 5))

for dat, dataset_name in enumerate(datasets):

    print(dataset_name)

    # Loading datasets
    if dataset_name in ['http', 'smtp', 'SA', 'SF']:
        dataset = fetch_kddcup99(subset=dataset_name, shuffle=False,
                                 percent10=False, random_state=88)
        X = dataset.data
        y = dataset.target

    if dataset_name == 'forestcover':
        dataset = fetch_covtype(shuffle=False)
        X = dataset.data
        y = dataset.target
        # normal data are those with attribute 2
        # abnormal those with attribute 4
        s = (y == 2) + (y == 4)
        X = X[s, :]
        y = y[s]
        y = (y != 2).astype(int)

    # Vectorizing data
    if dataset_name == 'SF':
        # Casting type of X (object) as string is needed for string categorical
        # features to apply LabelBinarizer
        lb = LabelBinarizer()
        x1 = lb.fit_transform(X[:, 1].astype(str))
        X = np.c_[X[:, :1], x1, X[:, 2:]]
        y = (y != b'normal.').astype(int)

    if dataset_name == 'SA':
        lb = LabelBinarizer()
        # Casting type of X (object) as string is needed for string categorical
        # features to apply LabelBinarizer
        x1 = lb.fit_transform(X[:, 1].astype(str))
        x2 = lb.fit_transform(X[:, 2].astype(str))
        x3 = lb.fit_transform(X[:, 3].astype(str))
        X = np.c_[X[:, :1], x1, x2, x3, X[:, 4:]]
        y = (y != b'normal.').astype(int)

    if dataset_name in ['http', 'smtp']:
        y = (y != b'normal.').astype(int)

    print_outlier_ratio(y)

    n_samples, n_features = np.shape(X)
    if dataset_name == 'SA':  # LibSVM too long with n_samples // 2
        n_samples_train = n_samples // 20
    else:
        n_samples_train = n_samples // 2

    n_samples_test = n_samples - n_samples_train
    print('n_train: ', n_samples_train)
    print('n_features: ', n_features)

    tpr_libsvm = np.zeros(n_axis)
    tpr_online = np.zeros(n_axis)
    fit_time_libsvm = 0
    fit_time_online = 0
    predict_time_libsvm = 0
    predict_time_online = 0

    X = X.astype(float)

    gamma = 1 / n_features  # OCSVM default parameter

    for random_state in random_states:

        print('random state: %s' % random_state)

        X, y = shuffle(X, y, random_state=random_state)
        X_train = X[:n_samples_train]
        X_test = X[n_samples_train:]
        y_train = y[:n_samples_train]
        y_test = y[n_samples_train:]

        if novelty_detection:
            X_train = X_train[y_train == 0]
            y_train = y_train[y_train == 0]

        std = StandardScaler()

        print('----------- LibSVM OCSVM ------------')
        ocsvm = OneClassSVM(kernel='rbf', gamma=gamma, nu=nu)
        pipe_libsvm = make_pipeline(std, ocsvm)

        tstart = time()
        pipe_libsvm.fit(X_train)
        fit_time_libsvm += time() - tstart

        tstart = time()
        # scoring such that the lower, the more normal
        scoring = -pipe_libsvm.decision_function(X_test)
        predict_time_libsvm += time() - tstart
        fpr_libsvm_, tpr_libsvm_, _ = roc_curve(y_test, scoring)

        f_libsvm = interp1d(fpr_libsvm_, tpr_libsvm_)
        tpr_libsvm += f_libsvm(x_axis)

        print('----------- Online OCSVM ------------')
        nystroem = Nystroem(gamma=gamma, random_state=random_state)
        online_ocsvm = SGDOneClassSVM(nu=nu, random_state=random_state)
        pipe_online = make_pipeline(std, nystroem, online_ocsvm)

        tstart = time()
        pipe_online.fit(X_train)
        fit_time_online += time() - tstart

        tstart = time()
        # scoring such that the lower, the more normal
        scoring = -pipe_online.decision_function(X_test)
        predict_time_online += time() - tstart
        fpr_online_, tpr_online_, _ = roc_curve(y_test, scoring)

        f_online = interp1d(fpr_online_, tpr_online_)
        tpr_online += f_online(x_axis)

    tpr_libsvm /= len(random_states)
    tpr_libsvm[0] = 0.
    fit_time_libsvm /= len(random_states)
    predict_time_libsvm /= len(random_states)
    auc_libsvm = auc(x_axis, tpr_libsvm)

    results_libsvm[dat] = ([fit_time_libsvm, predict_time_libsvm,
                            auc_libsvm, n_samples_train,
                            n_features] + list(tpr_libsvm))

    tpr_online /= len(random_states)
    tpr_online[0] = 0.
    fit_time_online /= len(random_states)
    predict_time_online /= len(random_states)
    auc_online = auc(x_axis, tpr_online)

    results_online[dat] = ([fit_time_online, predict_time_online,
                            auc_online, n_samples_train,
                            n_features] + list(tpr_online))


# -------- Plotting bar charts -------------
fit_time_libsvm_all = results_libsvm[:, 0]
predict_time_libsvm_all = results_libsvm[:, 1]
auc_libsvm_all = results_libsvm[:, 2]
n_train_all = results_libsvm[:, 3]
n_features_all = results_libsvm[:, 4]

fit_time_online_all = results_online[:, 0]
predict_time_online_all = results_online[:, 1]
auc_online_all = results_online[:, 2]


width = 0.7
ind = 2 * np.arange(len(datasets))
x_tickslabels = [(name + '\n' + r'$n={:,d}$' + '\n' + r'$d={:d}$')
                 .format(int(n), int(d))
                 for name, n, d in zip(datasets, n_train_all, n_features_all)]


def autolabel_auc(rects, ax):
    """Attach a text label above each bar displaying its height."""
    for rect in rects:
        height = rect.get_height()
        ax.text(rect.get_x() + rect.get_width() / 2., 1.05 * height,
                '%.3f' % height, ha='center', va='bottom')


def autolabel_time(rects, ax):
    """Attach a text label above each bar displaying its height."""
    for rect in rects:
        height = rect.get_height()
        ax.text(rect.get_x() + rect.get_width() / 2., 1.05 * height,
                '%.1f' % height, ha='center', va='bottom')


fig, ax = plt.subplots(figsize=(15, 8))
ax.set_ylabel('AUC')
ax.set_ylim((0, 1.3))
rect_libsvm = ax.bar(ind, auc_libsvm_all, width=width, color='r')
rect_online = ax.bar(ind + width, auc_online_all, width=width, color='y')
ax.legend((rect_libsvm[0], rect_online[0]), ('LibSVM', 'Online SVM'))
ax.set_xticks(ind + width / 2)
ax.set_xticklabels(x_tickslabels)
autolabel_auc(rect_libsvm, ax)
autolabel_auc(rect_online, ax)
plt.show()


fig, ax = plt.subplots(figsize=(15, 8))
ax.set_ylabel('Training time (sec) - Log scale')
ax.set_yscale('log')
rect_libsvm = ax.bar(ind, fit_time_libsvm_all, color='r', width=width)
rect_online = ax.bar(ind + width, fit_time_online_all, color='y', width=width)
ax.legend((rect_libsvm[0], rect_online[0]), ('LibSVM', 'Online SVM'))
ax.set_xticks(ind + width / 2)
ax.set_xticklabels(x_tickslabels)
autolabel_time(rect_libsvm, ax)
autolabel_time(rect_online, ax)
plt.show()


fig, ax = plt.subplots(figsize=(15, 8))
ax.set_ylabel('Testing time (sec) - Log scale')
ax.set_yscale('log')
rect_libsvm = ax.bar(ind, predict_time_libsvm_all, color='r', width=width)
rect_online = ax.bar(ind + width, predict_time_online_all,
                     color='y', width=width)
ax.legend((rect_libsvm[0], rect_online[0]), ('LibSVM', 'Online SVM'))
ax.set_xticks(ind + width / 2)
ax.set_xticklabels(x_tickslabels)
autolabel_time(rect_libsvm, ax)
autolabel_time(rect_online, ax)
plt.show()

doc/modules/classes.rst

1 addition & 0 deletions
@@ -762,6 +762,7 @@ Linear classifiers
    linear_model.RidgeClassifier
    linear_model.RidgeClassifierCV
    linear_model.SGDClassifier
+   linear_model.SGDOneClassSVM

 Classical linear regressors
 ---------------------------

doc/modules/outlier_detection.rst

27 additions & 5 deletions
@@ -110,9 +110,14 @@ does not perform very well for outlier detection. That being said, outlier
 detection in high-dimension, or without any assumptions on the distribution
 of the inlying data is very challenging. :class:`svm.OneClassSVM` may still
 be used with outlier detection but requires fine-tuning of its hyperparameter
-`nu` to handle outliers and prevent overfitting. Finally,
-:class:`covariance.EllipticEnvelope` assumes the data is Gaussian and learns
-an ellipse. For more details on the different estimators refer to the example
+`nu` to handle outliers and prevent overfitting.
+:class:`linear_model.SGDOneClassSVM` provides an implementation of a
+linear One-Class SVM with a linear complexity in the number of samples. This
+implementation is used here with a kernel approximation technique to obtain
+results similar to :class:`svm.OneClassSVM`, which uses a Gaussian kernel
+by default. Finally, :class:`covariance.EllipticEnvelope` assumes the data is
+Gaussian and learns an ellipse. For more details on the different estimators
+refer to the example
 :ref:`sphx_glr_auto_examples_miscellaneous_plot_anomaly_comparison.py` and the
 sections hereunder.

@@ -173,6 +178,23 @@ but regular, observation outside the frontier.
    :scale: 75%


+Scaling up the One-Class SVM
+----------------------------
+
+An online linear version of the One-Class SVM is implemented in
+:class:`linear_model.SGDOneClassSVM`. This implementation scales linearly with
+the number of samples and can be used with a kernel approximation to
+approximate the solution of a kernelized :class:`svm.OneClassSVM` whose
+complexity is at best quadratic in the number of samples. See section
+:ref:`sgd_online_one_class_svm` for more details.
+
+.. topic:: Examples:
+
+   * See :ref:`sphx_glr_auto_examples_linear_model_plot_sgdocsvm_vs_ocsvm.py`
+     for an illustration of the approximation of a kernelized One-Class SVM
+     with :class:`linear_model.SGDOneClassSVM` combined with kernel approximation.
+
+
 Outlier Detection
 =================

@@ -278,8 +300,8 @@ allows you to add more trees to an already fitted model::
 for a comparison of :class:`ensemble.IsolationForest` with
 :class:`neighbors.LocalOutlierFactor`,
 :class:`svm.OneClassSVM` (tuned to perform like an outlier detection
-method) and a covariance-based outlier detection with
-:class:`covariance.EllipticEnvelope`.
+method), :class:`linear_model.SGDOneClassSVM`, and a covariance-based
+outlier detection with :class:`covariance.EllipticEnvelope`.

 .. topic:: References:
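
A minimal sketch of the pipeline this section describes (kernel approximation followed by the linear One-Class SVM), for comparison against the exact kernelized estimator. The toy data and the parameter values (gamma=2.0, nu=0.05, n_components=100) are illustrative assumptions, not values taken from this commit:

import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDOneClassSVM
from sklearn.pipeline import make_pipeline
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)
X_train = 0.3 * rng.randn(500, 2)                           # inliers around the origin
X_test = np.r_[0.3 * rng.randn(20, 2),                      # more inliers
               rng.uniform(low=-4, high=4, size=(20, 2))]   # scattered outliers

nu, gamma = 0.05, 2.0

# Exact kernelized One-Class SVM (at best quadratic in n_samples).
ocsvm = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(X_train)

# Linear One-Class SVM on Nystroem features (linear in n_samples).
sgd_ocsvm = make_pipeline(
    Nystroem(gamma=gamma, n_components=100, random_state=42),
    SGDOneClassSVM(nu=nu, random_state=42),
).fit(X_train)

# Both estimators predict +1 for inliers and -1 for outliers.
print(ocsvm.predict(X_test))
print(sgd_ocsvm.predict(X_test))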

doc/modules/sgd.rst

52 additions & 0 deletions
@@ -232,6 +232,58 @@ For regression with a squared loss and a l2 penalty, another variant of
 SGD with an averaging strategy is available with Stochastic Average
 Gradient (SAG) algorithm, available as a solver in :class:`Ridge`.

+.. _sgd_online_one_class_svm:
+
+Online One-Class SVM
+====================
+
+The class :class:`sklearn.linear_model.SGDOneClassSVM` implements an online
+linear version of the One-Class SVM using stochastic gradient descent.
+Combined with kernel approximation techniques,
+:class:`sklearn.linear_model.SGDOneClassSVM` can be used to approximate the
+solution of a kernelized One-Class SVM, implemented in
+:class:`sklearn.svm.OneClassSVM`, with a linear complexity in the number of
+samples. Note that the complexity of a kernelized One-Class SVM is at best
+quadratic in the number of samples.
+:class:`sklearn.linear_model.SGDOneClassSVM` is thus well suited for datasets
+with a large number of training samples (> 10,000), for which the SGD
+variant can be several orders of magnitude faster.
+
+Its implementation is based on the stochastic gradient descent
+implementation. Indeed, the original optimization problem of the One-Class
+SVM is given by
+
+.. math::
+
+  \begin{aligned}
+  \min_{w, \rho, \xi} & \quad \frac{1}{2}\Vert w \Vert^2 - \rho + \frac{1}{\nu n} \sum_{i=1}^n \xi_i \\
+  \text{s.t.} & \quad \langle w, x_i \rangle \geq \rho - \xi_i \quad 1 \leq i \leq n \\
+  & \quad \xi_i \geq 0 \quad 1 \leq i \leq n
+  \end{aligned}
+
+where :math:`\nu \in (0, 1]` is the user-specified parameter controlling the
+proportion of outliers and the proportion of support vectors. Getting rid of
+the slack variables :math:`\xi_i`, this problem is equivalent to
+
+.. math::
+
+  \min_{w, \rho} \frac{1}{2}\Vert w \Vert^2 - \rho + \frac{1}{\nu n} \sum_{i=1}^n \max(0, \rho - \langle w, x_i \rangle) \, .
+
+Multiplying by the constant :math:`\nu` and introducing the intercept
+:math:`b = 1 - \rho`, we obtain the following equivalent optimization problem
+
+.. math::
+
+  \min_{w, b} \frac{\nu}{2}\Vert w \Vert^2 + b\nu + \frac{1}{n} \sum_{i=1}^n \max(0, 1 - (\langle w, x_i \rangle + b)) \, .
+
+This is similar to the optimization problems studied in section
+:ref:`sgd_mathematical_formulation` with :math:`y_i = 1, 1 \leq i \leq n` and
+:math:`\alpha = \nu/2`, :math:`L` being the hinge loss function and :math:`R`
+being the L2 norm. We just need to add the term :math:`b\nu` in the
+optimization loop.
+
+Like :class:`SGDClassifier` and :class:`SGDRegressor`, :class:`SGDOneClassSVM`
+supports averaged SGD. Averaging can be enabled by setting ``average=True``.

 Stochastic Gradient Descent for sparse data
 ===========================================
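
A minimal usage sketch of the estimator documented above: a plain fit, the averaged variant mentioned in the last paragraph, and incremental training with partial_fit. The toy data and parameter values are illustrative assumptions, not taken from this commit:

import numpy as np
from sklearn.linear_model import SGDOneClassSVM

rng = np.random.RandomState(0)
X = 0.5 * rng.randn(1000, 4)  # toy training data, mostly inliers

# Plain SGD fit; nu plays the same role as in svm.OneClassSVM.
clf = SGDOneClassSVM(nu=0.05, random_state=0).fit(X)

# Averaged SGD, enabled as described above.
clf_avg = SGDOneClassSVM(nu=0.05, average=True, random_state=0).fit(X)

# The estimator is online: it can also be trained incrementally.
clf_online = SGDOneClassSVM(nu=0.05, random_state=0)
for batch in np.array_split(X, 10):
    clf_online.partial_fit(batch)

# decision_function is positive for inliers, negative for outliers;
# predict returns +1 for inliers and -1 for outliers.
print(clf.decision_function(X[:5]))
print(clf_avg.predict(X[:5]))
print(clf_online.predict(X[:5]))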
