Description
Context
scikit-learn builds with Cython, C and C++ warnings (when building wheels or for when installing locally (for development)). There are several kinds of warnings, each kind having its own cause, solutions and mitigations.
🏷 Use of the deprecated NumPy API (via Cython) (-Wcpp
)
Some Cython implementations might use old Cython idiomatic constructs or old symbols of NumPy directly via NumPy's Cython API (generally used via cnp
from cimport numpy as cnp
) which are deprecated.
Examples of such symbols non-exhaustively involves:
- the
cnp.ndarray
- some constants (see this enum for instance)
Such warnings are not raised compiling Cython code, but are when compiling C and C++ code, generated by Cython (or even in standalone implementations using the NumPy C API directly).
🧰 Resolutions involves:
-
using newest Cython idiomatic constructs (notably memoryviews) and up-to-date symbols of NumPy via NumPy's Cython API
-
adding extensions in the
USE_NEWEST_NUMPY_C_API
list (so that it#define
macros indicating not to use of old deprecated symbols from NumPy)
Lines 64 to 70 in 3e47fa9
-
at least the following Cython extensions:
-
sklearn.cluster._dbscan_inner
MAINT Remove -Wcpp warnings when compiling sklearn.cluster._dbscan_inner #24881 -
sklearn.cluster._hierarchical_fast
MAINT Remove -Wcpp warnings when compiling sklearn.cluster._hierarchical_fast #24914 -
sklearn.linear_model._sgd_fast
MAINT Remove -Wcpp warnings when compiling sklearn.linear_model._sgd_fast #24883 -
sklearn.utils.murmurhash
MAINT Remove -Wcpp warnings when compilingsklearn.utils.murmurhash
#24958 -
sklearn.ensemble._gradient_boosting
MAINT Remove -Wcpp warnings when compiling sklearn.ensemble._gradient_boosting #25033 -
sklearn.manifold._utils
MAINT Remove-Wcpp
warnings when compilingsklearn.manifold._utils
#24925 -
sklearn.neighbors._ball_tree
MAINT Remove -Wcpp warnings when compiling_kd_tree
and_ball_tree
#24965 -
sklearn.neighbors._kd_tree
MAINT Remove -Wcpp warnings when compiling_kd_tree
and_ball_tree
#24965 -
sklearn.linear_model._sag_fast
MAINT Remove -Wcpp warnings when compiling sklearn.linear_model._sag_fast #24975 -
sklearn.svm._libsvm
MAINT Remove -Wcpp warnings when compiling sklearn.svm._libsvm #25064 -
sklearn.svm._liblinear
MAINT Remove -Wcpp warnings when compiling sklearn.svm._liblinear #25112 -
sklearn.svm._libsvm_sparse
MAINT Remove -Wcpp warnings when compiling sklearn.svm._libsvm #25064 -
sklearn.decomposition._online_lda_fast
MAINT Remove -Wcpp warnings when compiling sklearn.decomposition._online_lda_fast #25020 -
sklearn.preprocessing._csr_polynomial_expansion
MAINT remove -Wcpp warnings when compiling sklearn.preprocessing._csr_polynomial_expansion #25041 -
sklearn.tree._tree
MAINT Replace cnp.ndarray with memory views in sklearn.tree._tree (where possible) #25540 -
sklearn.tree._criterion
(this one is hard) MAINT Use newest NumPy C API in tree._criterion #25615 -
sklearn.utils.arrayfuncs
MAINT Remove -Wcpp warnings when compiling arrayfuncs #25415 -
sklearn.neighbors._quad_tree
MAINT Remove -Wsign-compare warnings when compiling sklearn.neighbors._quad_tree #25271 -
sklearn.utils.sparsefuncs_fast
MAINT remove -Wsign-compare when compilingsklearn.utils.sparsefuncs_fast
#25244 -
sklearn.linear_model._cd_fast
MAINT Remove -Wsign-compare when compiling sklearn.linear_model._cd_fast #24895 -
sklearn.metrics._dist_metrics
MAINT Use newest NumPy C API in metrics._dist_metrics #25702 -
sklearn.utils._seq_dataset
-
Example of such warning in context
building 'sklearn.cluster._dbscan_inner' extension
creating /tmp/tmpuptw2_72.build-temp/sklearn/cluster
gcc -pthread -B /home/jjerphan/.local/share/miniconda3/envs/sk/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/jjerphan/.local/share/miniconda3/envs/sk/include -fPIC -O2 -isystem /home/jjerphan/.local/share/miniconda3/envs/sk/include -fPIC -I/home/jjerphan/.local/share/miniconda3/envs/sk/lib/python3.11/site-packages/numpy/core/include -I/home/jjerphan/.local/share/miniconda3/envs/sk/include/python3.11 -c sklearn/cluster/_dbscan_inner.cpp -o /tmp/tmpuptw2_72.build-temp/sklearn/cluster/_dbscan_inner.o -O3 -fopenmp
In file included from /home/jjerphan/.local/share/miniconda3/envs/sk/lib/python3.11/site-packages/numpy/core/include/numpy/ndarraytypes.h:1948,
from /home/jjerphan/.local/share/miniconda3/envs/sk/lib/python3.11/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from /home/jjerphan/.local/share/miniconda3/envs/sk/lib/python3.11/site-packages/numpy/core/include/numpy/arrayobject.h:5,
from sklearn/cluster/_dbscan_inner.cpp:794:
/home/jjerphan/.local/share/miniconda3/envs/sk/lib/python3.11/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
17 | #warning "Using deprecated NumPy API, disable it with " \
| ^~~~~~~
🏷 Discarded const
-qualification (-Wdiscarded-qualifiers
)
In some implementation, some variables are being defined using const
-qualified variables.
For instance in C:
// a is const-qualified and can't be modified then
const int a = 5;
// The const-qualification of a is discarded for the definition of b.
// b is not const-qualified and thus can be modified.
int b = a;
// Note that modifying b does not modify a.
// This would if a and b were pointers to objects
// (hence the reason for this kind of warning).
While most of such definitions aren't harmful, some might be and compilers generally raise warning when the const
-qualifications are discarded.
🧰 Resolutions involves:
const
qualifying variables (like b) which currently aren't
Example of such warning in context
sklearn/metrics/_dist_metrics.c: In function ‘__pyx_pf_7sklearn_7metrics_13_dist_metrics_14DistanceMetric_22_pairwise_sparse_dense’:
sklearn/metrics/_dist_metrics.c:9143:29: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
9143 | __pyx_v_x2_data = ((&(*((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const *) ( /* dim=1 */ ((char *) (((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const *) ( /* dim=0 */ (__pyx_v_Y_data.data + __pyx_t_14 * __pyx_v_Y_data.strides[0]) )) + __pyx_t_18)) )))) + (__pyx_v_i2 * __pyx_v_n_features));
| ^
🏷 Comparison on different integral-typed values (Wsign-compare
) and integer overflow (-Woverflow
)
Various integral dtypes (namely (unsigned) {short,int,long}
, an NumPy aliases i.e. cnp.int*
, size
, Py_ssize_t
, other custom aliases in scikit-learn, etc.) are used in scikit-learn.
In Cython and C and C++ implementation, it is possible that binary operations (comparison, assignation, etc.) are applied on variable having different dtypes. Some of
While such instructions aren't harmful, some might be and compilers generally raise warning in their presence.
🧰 Resolutions involves:
- adapting type so that both values have the same one
- explicitly casting some variables' value to other temporary variables of the same type
- reworking the use of fused types which generate specialised versions of functions for various combinations of types
Example of such warning in context
sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors.cpp: In function ‘void __pyx_f_7sklearn_7metrics_29_pairwise_distances_reduction_17_radius_neighbors_17RadiusNeighbors64_compute_exact_distances(__pyx_obj_7sklearn_7metrics_29_pairwise_distances_reduction_17_radius_neighbors_RadiusNeighbors64*)’:
sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors.cpp:6285:59: warning: comparison of integer expressions of different signedness: ‘__pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t’ {aka ‘long int’} and ‘std::vector<long int>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
6285 | for (__pyx_t_6 = 0; __pyx_t_6 < __pyx_t_5; __pyx_t_6+=1) {
| ~~~~~~~~~~^~~~~~~~~~~
🏷 Incompatible pointers types casts (-Wincompatible-pointer-types
) (Difficulty level: 🧙)
On some Cython class, polymorphism implementations apparently cast to a incompatible pointers type.
This runs and works fine but this is a code smell which yet is to be inspected.
This problems mainly is present for the DistanceMetric
and BinaryTree
class hierarchies.
A similar problem has been reported in cython/cython#2747.
🧰 Resolutions involves:
- experimenting with the
@final
Cython decorator forDistanceMetrics
' methods' - understanding how polymorphism is implemented
- changing
DistanceMetrics
'' methods' signature
Example of such warning in context
sklearn/metrics/_dist_metrics.c: In function ‘__pyx_f_7sklearn_7metrics_13_dist_metrics_17EuclideanDistance_dist_csr’:
sklearn/metrics/_dist_metrics.c:11320:88: warning: passing argument 1 of ‘__pyx_f_7sklearn_7metrics_13_dist_metrics_17EuclideanDistance_rdist_csr’ from incompatible pointer type [-Wincompatible-pointer-types]
11320 | __pyx_t_1 = __pyx_f_7sklearn_7metrics_13_dist_metrics_17EuclideanDistance_rdist_csr(((struct __pyx_obj_7sklearn_7metrics_13_dist_metrics_DistanceMetric *)__pyx_v_self), __pyx_v_x1_data, __pyx_v_x1_indices, __pyx_v_x2_data, __pyx_v_x2_indices, __pyx_v_x1_start, __pyx_v_x1_end, __pyx_v_x2_start, __pyx_v_x2_end, __pyx_v_size); if (unlikely(__pyx_t_1 == ((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t)-1.0))) __PYX_ERR(1, 927, __pyx_L1_error)
| ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| |
| struct __pyx_obj_7sklearn_7metrics_13_dist_metrics_DistanceMetric *
Proposed solution: treat one kind of warning for one file/extension at a time
-
Only perform changes necessary to remove one kind of warning (🏷) for one file or one Cython extension at a time using information given in 🧰.
- ⚠ It might not be possible to remove all warnings at once
- ⚠ Changes might not be trivial or might necessitate adaption. In this case, please do explain the logic.
-
Create PRs for each warning with the following title:
MAINT Remove `-Wkind` warnings when compiling `source_file.ext`
where:
kind
is the kind of warning (🏷), here represented as the formatted string in the subtitles, i.e. an element of (cpp
,incompatible-pointer-types
,sign-compare
,overflow
)source_file.ext
is the file being treated
-
To be discussed among maintainers: once a kind of warnings as been entirely removed, turn on strict compilations compilation flags to raise error for this kind of warning.
Miscellaneous
Example of compilation trace
Examples of warnings in their context stem from a verbose trace when building scikit-learn locally.
To see this full development installation trace (as of 2e481f1), see this gist.
Filtering compilation trace
You can compile the whole project and filter the compilation trace for some module or source, using grep
for instance. Note that stderr
better be redirected to stdout
to have all the information.
For instance, to perform this for sklearn.linear_model._cd_fast
, you can running in the scikit-learn's worktree (using -C
to have 6 surroundings lines in the context of matchings):
pip install --verbose --editable . 2>&1 | grep -C 3 "sklearn.linear_model._cd_fast"