FIX improve error message when no samples are available in mutual information #25192

Open. Wants to merge 11 commits into base: main
doc/whats_new/v1.3.rst (4 changes: 4 additions & 0 deletions)
@@ -164,6 +164,10 @@ Changelog
   :pr:`24935` by :user:`Seladus <seladus>`, :user:`Guillaume Lemaitre <glemaitre>`, and
   :user:`Dea María Léon <deamarialeon>`, :pr:`25257` by :user:`Gleb Levitski <glevv>`.

+- |Enhancement| Improve error handling in :func:`feature_selection.mutual_info_classif`
+  that now checks if instances are left after masking of unique labels.
+  :pr:`25192` by :user:`makoeppel`.
+
 Code and Documentation Contributors
 -----------------------------------

sklearn/feature_selection/_mutual_info.py (9 changes: 8 additions & 1 deletion)
@@ -9,7 +9,7 @@
 from ..neighbors import NearestNeighbors, KDTree
 from ..preprocessing import scale
 from ..utils import check_random_state
-from ..utils.validation import check_array, check_X_y
+from ..utils.validation import check_array, check_X_y, _num_samples
 from ..utils.multiclass import check_classification_targets


@@ -135,6 +135,13 @@ def _compute_mi_cd(c, d, n_neighbors):
     c = c[mask]
     radius = radius[mask]

+    if _num_samples(c) == 0:
+        raise ValueError(
+            "Found array with 0 samples after masking"
+            " points with unique labels. Ensure that at least"
+            " two instances share the same label."
+        )
+
     kd = KDTree(c)
     m_all = kd.query_radius(c, radius, count_only=True, return_distance=False)
     m_all = np.array(m_all)
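
To make the failure mode behind this check concrete, here is a minimal standard-library sketch. It assumes, as the error message suggests, that the mask keeps only points whose discrete label occurs at least twice. The helper name `mask_unique_labels` is hypothetical and is not part of scikit-learn; the real code operates on NumPy arrays.

```python
from collections import Counter


def mask_unique_labels(c, d):
    """Keep only the values in `c` whose label in `d` occurs more than once.

    Hypothetical helper for illustration only; not scikit-learn's code.
    """
    counts = Counter(d)
    kept = [value for value, label in zip(c, d) if counts[label] > 1]
    if len(kept) == 0:
        # Same message as the check added in this diff.
        raise ValueError(
            "Found array with 0 samples after masking"
            " points with unique labels. Ensure that at least"
            " two instances share the same label."
        )
    return kept


# Label "b" is unique, so its point is dropped; the two "a" points remain.
print(mask_unique_labels([0.1, 0.4, 0.9], ["a", "a", "b"]))  # [0.1, 0.4]

# Every label is unique, so nothing is left and the guard fires.
try:
    mask_unique_labels([0.1, 0.4], ["a", "b"])
except ValueError as exc:
    print(exc)
```

Without such a guard, the empty array would be passed on to the `KDTree` constructor, where the resulting failure is harder to relate to the labels.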

sklearn/feature_selection/tests/test_mutual_info.py (16 changes: 16 additions & 0 deletions)
@@ -236,3 +236,19 @@ def test_mutual_information_symmetry_classif_regression(correlated, global_rando
     )

     assert mi_classif == pytest.approx(mi_regression)
+
+
+def test_mutual_info_error_handling_for_unique_labels():
+    """Check that the correct ValueError is raised when calling `mutual_info_classif`
+    with only unique labels.
+    """
+
+    a = [[1, 0, 1], [0, 1, 1]]
+    b = [0, 1]
+    err_msg = (
+        "Found array with 0 samples after masking"
+        " points with unique labels. Ensure that at least"
+        " two instances share the same label."
+    )
+    with pytest.raises(ValueError, match=err_msg):
+        mutual_info_classif(a, b)
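
A note on the `match=` argument used in this test: `pytest.raises` treats it as a regular expression and applies it with `re.search`, so the periods in `err_msg` match any character. The sketch below uses only the standard `re` module to show the difference; the `altered` string is an invented example, and escaping the pattern is offered as one possible tightening, not as something this PR does.

```python
import re

err_msg = (
    "Found array with 0 samples after masking"
    " points with unique labels. Ensure that at least"
    " two instances share the same label."
)

# Invented variant of the message with different punctuation.
altered = err_msg.replace(".", "!")

# As a regex, each "." matches any single character, so the variant passes.
assert re.search(err_msg, altered) is not None

# Escaping makes the periods literal: the exact message still matches,
# while the altered one no longer does.
literal = re.escape(err_msg)
assert re.search(literal, err_msg) is not None
assert re.search(literal, altered) is None
```

For this message the unescaped pattern is still a reasonable check, since the wording itself has to match.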