Description
As reported here: https://skops.readthedocs.io/en/latest/auto_examples/plot_intelex.html
For small n_features
, KNeighborsClassifier
uses either BallTree of KD-tree algorithms. At this time, scikit-learn builds this tree sequentially.
We could recursively build the branches in parallel using a ThreadPoolExecutor
instance and call submit for each children of a given node. Note that each node splitting operation (starting from the root) is one Cython function call that releases the GIL, so using thread-based parallelism should be efficient.
The second speed-up reported in plot_intelex.html above (at inference time) is mostly linked to the fact that scikit-learn KNeighborsClassifier
does not enable parallel neighbors queries with the default value of n_jobs
when using the kd-tree or ball-tree algorithms.
From a user's point of view, this is not ideal either: when using the brute-force algorithm, we rely on OpenMP and automatically use as many threads as CPUs available.