-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Specialized path collection #16872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Specialized path collection #16872
Conversation
Previously PathIterator has kept vertices and codes as (owned) PyArrayObjects, while after this patch is keeps them as strided memory regions. This allows to generalize the kinds of objects those iterators can be constructed from, opening up opportunities for optimized path collections.
Right now most paths are usually stored as separate objects in regular Python lists. This format is unfortunately very inefficient when the paths are homogenous (i.e. have the same length and codes). This patch adds a custom collection that still allows iteration over individual paths, while also supporting a more efficient code path in the rendering pipeline.
cc @anntzer |
#define INHERIT_CONSTRUCTORS(CLS, DIM) \ | ||
CLS() {} \ | ||
explicit CLS(char *data, npy_intp *strides): StridedMemoryBase<T, DIM>(data, strides) {} \ | ||
explicit CLS(PyArrayObject *array): StridedMemoryBase<T, DIM>(array) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit awkward, but I don't think there's a way around it unless I can use C++11. Are there plans to migrate to C++11 by any chance?
Py_XDECREF(vertices_obj); | ||
Py_XDECREF(codes_obj); | ||
Py_XDECREF(should_simplify_obj); | ||
Py_XDECREF(simplify_threshold_obj); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I look at this code the more I want to write a RAII wrapper for managing Python refcounts. I've seen some potential memory leaks in the PathIterator
code before changes too, so it might be useful to start using it in the whole codebase instead of dealing with it all manually. pybind11
is quite standard for that too, but it requires C++11...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree pybind11 would be nice, I'll mention this during the dev call, this may save a lot of energy on reviewing this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well you don't need all of pybind here, so I could easily write RAII wrappers for refcount management. I just don't want to make this patch too large and it seems like it should undergo a larger discussion. This patch is definitely more consistent with the rest of the code for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think even a trivial RAII wrapper (basically the equivalent of pybind11::object
) would be nice as a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. Would you rather merge the RAII thing before this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
@@ -19,6 +19,59 @@ | ||
from .cbook import _to_unmasked_float_array, simple_linear_interpolation | ||
|
||
|
||
class PathCollection: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps another name would avoid confusion with collections.PathCollection
?
A tiny bit of docstring would be nice too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, name isn't important for me. I didn't know this one is taken already. Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing comes to my mind for now, but we can change this later (before this is merged).
@apaszke were you still interested in this? |
I could finish it in the next few weeks, if there's interest from your side. |
You can probably use C++11 at this point. |
As an update, we now have pybind11 as a build time dependency. |
PR Summary
This PR adds an optimized path collection that can be used to store a set of paths of the same length without the need to create temporary
Path
objects. It still allows iterating over individual paths so that it is pretty much a drop-in replacement, but it also allows for using an optimized path in the renderer.Note that this PR is a bit of a WIP that I'm putting out to gather feedback about the solution. It does introduce a bit of complexity, and I want to explore whether it would be possible to make
Collection._paths
use this data structure more often. In particular, I'm pretty sure it opens up more opportunities for vectorization which don't exist today. As for the performance, after all my previous PRs making a 400x400 surface plot takes around 0.7s, while with this patch applied this time drops to 0.4s. The breakdown now looks like this:Axes3D.plot_surface
(0.04s if colormap is used instead of shading)Poly3DCollection.do_3d_projection
The fact that saving the PNG slowly starts to show up as a measurable portion of the plotting time makes me quite happy, as it means that matplotlib stops being the main reason for slowness as we start hitting some natural performance barriers.
PR Checklist