Closed
Description
Describe the bug
LabelPropagation (and LabelSpreading) error out for sparse matrices.
Steps/Code to Reproduce
import sklearn
from scipy.sparse import csr_matrix
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelPropagation
print(sklearn.__version__)
X, y = make_classification()
classifier = LabelPropagation(kernel='knn')
classifier.fit(X, y)
y_pred = classifier.predict(X)
X, y = make_classification()
classifier = LabelPropagation(kernel='knn')
classifier.fit(csr_matrix(X), y)
y_pred = classifier.predict(csr_matrix(X))
Expected Results
Sparse case should work as does the dense one.
Actual Results
0.22.2.post1
Traceback (most recent call last):
[...]
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
Fix
Changing
X, y = check_X_y(X, y)
in _label_propagation.py line 224 to
X, y = check_X_y(X, y, accept_sparse=['csc', 'csr', 'coo', 'dok',
'bsr', 'lil', 'dia'])
seems to fix the problem for me (BTW: a similar check accepting sparse matrices is done in BaseLabelPropagations predict_proba at line 189). This fix also heals LabelSpreading.