[MRG] Improved parallel execution of plot_partial_dependence / partial_dependence #19392
Conversation
…llelization to _partial_dependence.py
Thanks @timo-stoettner. This indeed looks like an interesting proposition for parallelisation.
To better assess the performance of your solution, I would suggest running a benchmark between main and your branch with a varying number of features and jobs.
You can report the results in a table and/or graphically here; as an example, see this comment from @thomasjpfan.
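Such a benchmark could be sketched roughly as follows (a hypothetical script, not the PR's actual benchmark; dataset sizes and the parameter grid are illustrative). The joblib loop emulates main's per-feature parallelization; the same grid would then be re-run on the PR branch, where `partial_dependence` itself accepts `n_jobs`, and the two sets of timings compared:

```python
# Hypothetical benchmark sketch: time per-feature parallelization over a
# grid of (n_features, n_jobs) settings. Run once per branch and compare.
from itertools import product
from time import perf_counter

from joblib import Parallel, delayed
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

X, y = make_regression(n_samples=300, n_features=8, random_state=0)
est = GradientBoostingRegressor(random_state=0).fit(X, y)

results = []
for n_features, n_jobs in product([1, 2, 4], [1, 2]):
    tic = perf_counter()
    # Emulate main's strategy: one partial_dependence call per feature,
    # dispatched across n_jobs workers.
    Parallel(n_jobs=n_jobs)(
        delayed(partial_dependence)(est, X, [f]) for f in range(n_features)
    )
    results.append((n_features, n_jobs, perf_counter() - tic))

for n_features, n_jobs, duration in results:
    print(f"n_features={n_features} n_jobs={n_jobs} time={duration:.3f}s")
```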
```python
input_shape = (20, 3, 3)
data = np.ones(input_shape)


def shapes_after_split(n_jobs):
    return [x.shape for x in
            _split_data_for_parallel_execution(data, n_jobs)]


assert shapes_after_split(None)[0] == input_shape
assert shapes_after_split(2)[0] == (10, 3, 3)
assert shapes_after_split(2)[-1] == (10, 3, 3)
assert shapes_after_split(32)[0] == (1, 3, 3)
assert len(shapes_after_split(32)) == 20
```
Maybe also test with an odd `input_shape`. In that case:
```python
input_shape = (20, 3, 3)
data = np.ones(input_shape)


def shapes_after_split(n_jobs):
    return [x.shape for x in
            _split_data_for_parallel_execution(data, n_jobs)]


assert shapes_after_split(None)[0] == input_shape
assert shapes_after_split(2)[0] == (10, 3, 3)
assert shapes_after_split(2)[-1] == (10, 3, 3)
assert shapes_after_split(32)[0] == (1, 3, 3)
assert len(shapes_after_split(32)) == 20

input_shape = (21, 3, 3)
data = np.ones(input_shape)


def shapes_after_split(n_jobs):
    return [x.shape for x in
            _split_data_for_parallel_execution(data, n_jobs)]


assert shapes_after_split(None)[0] == input_shape
assert shapes_after_split(2)[0] == (10, 3, 3)
assert shapes_after_split(2)[-1] == (11, 3, 3)
assert len(shapes_after_split(32)) == 21
```
Thanks @jjerphan! I followed your suggestion and ran some more extensive benchmarks. In some cases that I haven't tested before […]
Thanks for the proposal @timo-stoettner.
As far as I understand, this PR splits the data into chunks and parallelizes over them. Parallelizing over samples might be a good idea, but indeed we need to figure out in which cases this makes sense. Typically if […] and […]
Thanks @NicolasHug! Good points, I agree that we should stick to the current parallelization if there are more features / plots than jobs. I've tried that already, and it seems to catch at least most of the cases where my proposal led to a decrease in performance. I'll run some more extensive benchmarks and update the PR in the next few days when I get to it.
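A minimal sketch of this selection heuristic (hypothetical function name and return labels, not the PR's actual code) could look like:

```python
from joblib import effective_n_jobs


def choose_parallel_strategy(n_features, n_jobs):
    """Pick which axis to parallelize over, given the workload shape."""
    if n_jobs is None:
        return "sequential"
    if n_features >= effective_n_jobs(n_jobs):
        # At least one feature per worker: the existing per-feature
        # parallelization already keeps every job busy.
        return "per-feature"
    # Fewer features than workers: splitting the samples/grid lets the
    # otherwise idle workers contribute.
    return "per-sample"


print(choose_parallel_strategy(8, 4))  # per-feature
print(choose_parallel_strategy(1, 4))  # per-sample
```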
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
… based on ratio of n_features / effective_n_jobs
@NicolasHug As suggested, I've added different parallelization strategies based on the comparison of `n_features` and `effective_n_jobs`.
What does this implement/fix? Explain your changes.
Currently, when specifying `n_jobs` for `plot_partial_dependence`, it is parallelized by concurrently computing the partial dependences per feature (by calling the function `partial_dependence`). When only one feature is to be plotted, specifying `n_jobs` will lead to no performance boost at best (or, more generally, whenever `n_jobs` > number of features). Apart from the fact that I don't think this is documented anywhere, this can be implemented differently to also provide a speedup when the number of features is smaller than `n_jobs`.
I moved the parallelization to `partial_dependence` instead. It divides up the grid for which the values are computed and parallelizes among those different parts of the grid. This has two benefits:
- `n_jobs` now also leads to a performance boost when `n_jobs` > number of features
- `partial_dependence` itself now has a parameter `n_jobs` and can therefore also be parallelized

Benchmarks
Some results of `timeit` before and after the changes with different parameter settings (sorted by relative speedup, executed on my 4-core MacBook):

So the modification seems to lead to a speedup in cases where `effective_n_jobs > n_features`, and to no significant change in most other cases.

The code I used to come up with those benchmarks: