DOC Merge plot_svm_margin.py and plot_separating_hyperplane.py into plot_svm_hyperplane_margin.py #31045


Status: Open. Wants to merge 23 commits into base: main.

Commits (23):
30bf31a
Merged plot_svm_margin.py and plot_separating_hyperplane.py into a si…
SwathiR1999 Mar 21, 2025
b5018ef
Merge branch 'scikit-learn:main' into doc_addlink
SwathiR1999 Mar 21, 2025
4211cca
Formatted code using Black and fixed linting issues with Ruff
SwathiR1999 Mar 21, 2025
cbcffba
Merge branch 'main' into doc_addlink
SwathiR1999 Mar 21, 2025
3116f1f
Merge branch 'main' of https://github.com/scikit-learn/scikit-learn i…
SwathiR1999 May 3, 2025
8c1fa62
Merge branch 'doc_addlink' of https://github.com/SwathiR1999/scikit-l…
SwathiR1999 May 3, 2025
4216bec
updated plot_svm_hyperplane_margin.py
SwathiR1999 May 3, 2025
499fc1b
updated conf.py
SwathiR1999 May 3, 2025
7aa4078
updated plot_svm_hyperplane_margin.py
SwathiR1999 May 3, 2025
b50fefa
updated plot_svm_hyperplane_margin.py
SwathiR1999 May 3, 2025
902a787
updated plot_svm_hyperplane_margin.py
SwathiR1999 May 3, 2025
d29225d
updated plot_svm_hyperplane_margin.py
SwathiR1999 May 3, 2025
9fb3c50
updated plot_svm_hyperplane_margin.py
SwathiR1999 May 3, 2025
2de5c47
updated plot_svm_hyperplane_margin.py
SwathiR1999 May 3, 2025
49ee55e
updated plot_svm_hyperplane_margin.py
SwathiR1999 May 3, 2025
2ef1d5f
updated plot_svm_hyperplane_margin.py
SwathiR1999 May 3, 2025
53bd994
Merge branch 'main' of https://github.com/scikit-learn/scikit-learn i…
SwathiR1999 May 7, 2025
2257e4f
formatted plot_svm_hyperplane_margin.py
SwathiR1999 May 7, 2025
a730252
formatted plot_svm_hyperplane_margin.py
SwathiR1999 May 7, 2025
bd74817
formatted plot_svm_hyperplane_margin.py
SwathiR1999 May 7, 2025
8b1eb56
Update SVM docs and remove deprecated example
SwathiR1999 May 7, 2025
762644d
modified svm.rst
SwathiR1999 May 7, 2025
8612d2d
modified svm.rst
SwathiR1999 May 7, 2025
3 changes: 3 additions & 0 deletions 3 doc/conf.py
@@ -508,6 +508,9 @@ def add_js_css_files(app, pagename, templatename, context, doctree):
"auto_examples/linear_model/plot_sgd_comparison": (
"auto_examples/linear_model/plot_sgd_loss_functions"
),
"auto_examples/svm/plot_svm_margin": (
"auto_examples/svm/plot_svm_hyperplane_margin"
),
}
html_context["redirects"] = redirects
for old_link in redirects:
12 changes: 10 additions & 2 deletions 12 doc/modules/svm.rst
@@ -404,13 +404,14 @@ Tips on Practical Use

* **Setting C**: ``C`` is ``1`` by default and it's a reasonable default
choice. If you have a lot of noisy observations you should decrease it:
-   decreasing C corresponds to more regularization.
+   decreasing C corresponds to more regularization (see example below).
Contributor (review comment):

Suggested change:
- decreasing C corresponds to more regularization (see example below).
+ decreasing C corresponds to more regularization.

I think it's better to leave this unchanged. We add links in suitable spots, but these quasi-links just occupy the reader's attention without adding real value.

:class:`LinearSVC` and :class:`LinearSVR` are less sensitive to ``C`` when
it becomes large, and prediction results stop improving after a certain
threshold. Meanwhile, larger ``C`` values will take more time to train,
sometimes up to 10 times longer, as shown in [#3]_.
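As an illustrative aside (not part of this PR), the inverse-regularization role of `C` described above can be sanity-checked by inspecting the learned weight norm; the dataset and the `C` grid below are arbitrary choices for demonstration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Smaller C means stronger regularization, which shrinks the weight vector;
# the weight norm grows as C is relaxed.
norms = []
for C in (0.01, 1.0, 100.0):
    clf = LinearSVC(C=C, dual=False, max_iter=10_000).fit(X, y)
    norms.append(float(np.linalg.norm(clf.coef_)))
    print(f"C={C}: ||w|| = {norms[-1]:.3f}")
```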


Contributor (review comment):

Suggested change: (whitespace only)

Let's also leave this unchanged.

* Support Vector Machine algorithms are not scale invariant, so **it
is highly recommended to scale your data**. For example, scale each
attribute on the input vector X to [0,1] or [-1,+1], or standardize it
@@ -468,6 +469,9 @@ Tips on Practical Use
The ``C`` value that yields a "null" model (all weights equal to zero) can
be calculated using :func:`l1_min_c`.
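As a hedged aside, :func:`l1_min_c` can be exercised directly; the dataset below and the `0.5` factor are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC, l1_min_c

X, y = make_classification(n_samples=100, n_features=10, random_state=0)

# Smallest C at which an L1-penalized linear model is guaranteed to have
# at least one non-zero coefficient (defaults match LinearSVC's defaults).
c_min = l1_min_c(X, y, loss="squared_hinge")
print(f"l1_min_c = {c_min:.4f}")

# Below that threshold the optimum is the all-zero ("null") model.
clf = LinearSVC(penalty="l1", dual=False, C=0.5 * c_min, max_iter=10_000)
clf.fit(X, y)
print("non-zero coefficients:", (clf.coef_ != 0).sum())
```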

.. rubric:: Examples

* :ref:`sphx_glr_auto_examples_svm_plot_svm_hyperplane_margin.py`
Contributor (comment on lines +472 to +474):

It would be great to add the reference to the example in the text instead, including a description of what the example is about.

.. _svm_kernels:

@@ -632,7 +636,11 @@ indicates a perfect prediction. But problems are usually not always perfectly
separable with a hyperplane, so we allow some samples to be at a distance :math:`\zeta_i` from
their correct margin boundary. The penalty term `C` controls the strength of
this penalty, and as a result, acts as an inverse regularization parameter
Contributor (review comment):

Suggested change:
- this penalty, and as a result, acts as an inverse regularization parameter
+ this penalty, and as a result, acts as an inverse regularization parameter:

- (see note below).
+ (see the figure below). Also please refer to the note below.

Contributor (review comment):

Suggested change:
- (see the figure below). Also please refer to the note below.

I would suggest to simply finish the last sentence with a colon and then show the plot directly.


.. figure:: ../auto_examples/svm/images/sphx_glr_plot_svm_hyperplane_margin_001.png
:align: center
:scale: 75
Contributor (comment on lines +641 to +643):

Suggested change:
- .. figure:: ../auto_examples/svm/images/sphx_glr_plot_svm_hyperplane_margin_001.png
-    :align: center
-    :scale: 75
+ .. figure:: ../auto_examples/svm/images/sphx_glr_plot_svm_hyperplane_margin_001.png
+    :target: ../auto_examples/examples/svm/plot_svm_hyperplane_margin.py
+    :align: center
+    :scale: 75

If we add a target line, the plot serves as a link to the example. I think it should be like this.


The dual problem to the primal is

108 changes: 108 additions & 0 deletions 108 examples/svm/plot_svm_hyperplane_margin.py
@@ -0,0 +1,108 @@
"""
=========================================================================
SVM: Effect of Regularization (C) on Maximum Margin Separating Hyperplane
=========================================================================

This script demonstrates the concept of a maximum margin separating hyperplane
in a two-class separable dataset using a Support Vector Machine (SVM)
with a linear kernel, and shows how different values of `C` influence the margin width.
Contributor (comment on lines +6 to +8):

I would suggest to add a short note on what a margin is to make this more beginner-friendly.


- **Small C (e.g., 0.05)**:
  - Allows some misclassifications, resulting in a wider margin.
- **Moderate C (e.g., 1)**:
  - Balances classification accuracy and margin width.
- **Large C (e.g., 1000)**:
  - Prioritizes classifying all points correctly, leading to a narrower margin.

"""

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

# %%
import matplotlib.pyplot as plt

from sklearn import svm
from sklearn.datasets import make_blobs
from sklearn.inspection import DecisionBoundaryDisplay

# %%
# Create 40 separable points
X, y = make_blobs(n_samples=40, centers=2, cluster_std=1.5, random_state=6)

# %%
# Define different values of C to observe its effect on the margin
C_values = [0.05, 1, 1000]

# %%
# Visualize
plt.figure(figsize=(12, 4))
for i, C_val in enumerate(C_values, 1):
clf = svm.SVC(kernel="linear", C=C_val)
Contributor (review comment):

I think we should add a random state here, so this example looks the same with every run.
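For context on this suggestion: per the scikit-learn documentation, `random_state` in `SVC` only seeds the shuffling used for probability estimates (`probability=True`) and is ignored otherwise, so a linear-kernel fit like this one is already deterministic. A quick illustrative check (not part of the PR):

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=40, centers=2, cluster_std=1.5, random_state=6)

# random_state only seeds the RNG used for probability calibration
# (probability=True); with the default probability=False the fit of a
# linear-kernel SVC is deterministic regardless of the seed.
coef_a = svm.SVC(kernel="linear", C=1.0, random_state=0).fit(X, y).coef_
coef_b = svm.SVC(kernel="linear", C=1.0, random_state=123).fit(X, y).coef_
print(np.allclose(coef_a, coef_b))
```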

clf.fit(X, y)
y_pred = clf.predict(X)
misclassified = y_pred != y

plt.subplot(1, 3, i)
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired, edgecolors="k")
Contributor (review comment):

Please add axis labels. Something simple like "Feature 1" and "Feature 2" would be enough.

Contributor (reply):

This is resolved. (Just the "resolved" button doesn't show here.)

# misclassified samples
plt.scatter(
X[misclassified, 0],
X[misclassified, 1],
facecolors="none",
edgecolors="k",
Contributor (review comment):

Using the same color (edgecolors="k") for correctly classified and misclassified points doesn't make them stand out as expected. Could you fix this, please?

s=80,
linewidths=1.5,
label="Misclassified",
)

# plot the decision function
ax = plt.gca()
DecisionBoundaryDisplay.from_estimator(
clf,
X,
plot_method="contour",
colors="k",
levels=[-1, 0, 1],
alpha=0.5,
linestyles=["--", "-", "--"],
ax=ax,
)

# plot support vectors
ax.scatter(
clf.support_vectors_[:, 0],
clf.support_vectors_[:, 1],
s=120,
linewidth=1.5,
facecolors="none",
edgecolors="r",
Contributor (review comment):

What do you think about having a different color for misclassified samples?

Contributor (reply):

This is resolved. (Just the "resolved" button doesn't show here.)

label="Support Vectors",
)

plt.title(f"SVM Decision Boundary (C={C_val})")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()

plt.tight_layout()
plt.show()

# %% [markdown]
# - **Small `C` (e.g., 0.01, 0.05)**:
Contributor (review comment):

It would be nice to have the representation of C (as in "Small `C`") be the same as above, either with or without the backticks everywhere.

# - Use when:
Contributor (review comment):

The "Use when:" items are not rendered correctly.

# - You expect noisy or overlapping data.
# - You can tolerate some misclassification in training.
# - Your priority is better generalization on unseen data.
# - Note:
# - May underfit if the margin is too lenient.
# - **Moderate `C` (e.g., 1)**:
# - Use when:
# - You're unsure about noise levels.
# - You want good balance between margin width and classification accuracy.
# - **Large `C` (e.g., 1000)**:
# - Use when:
# - The data is clean and linearly separable.
# - You want to avoid any training misclassification.
# - Note:
# - May overfit noisy data by trying to classify all samples correctly.
Contributor (comment on lines +107 to +108):

That's not rendered as part of the Large C section, but it should be.
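The qualitative guidance above can be checked numerically: for a linear kernel the geometric margin width equals `2 / ||w||`. An illustrative sketch reusing the example's data-generation settings (not part of the PR):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Same data-generation settings as the example above.
X, y = make_blobs(n_samples=40, centers=2, cluster_std=1.5, random_state=6)

widths = {}
for C in (0.05, 1, 1000):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # For a linear kernel the geometric margin width is 2 / ||w||.
    widths[C] = 2 / np.linalg.norm(clf.coef_[0])
    print(f"C={C}: margin width = {widths[C]:.3f}")
```

As expected, the margin shrinks as `C` grows.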

90 changes: 0 additions & 90 deletions 90 examples/svm/plot_svm_margin.py

This file was deleted.
