-
Notifications
You must be signed in to change notification settings - Fork 28
Add iterator to memoryview #19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two nits and a bugfix.
Your next step sounds right -- "hoist" the checks that validate the object each time. I would think you can start by adding members to the memoryiterobject struct to hold the length and the fmt code. After that we can look into inlining ptr_from_index (which is mostly lookup_dimension). I see all kinds of redundant checks there.
In the meantime, let's keep the eye on the ball.
I used this benchmark, two lines of shell code:
./python.exe -m timeit -s "a=memoryview(b'x'*1000000)" "for x in a: pass"
./python.exe -m timeit -s "a=b'x'*1000000" "for x in a: pass"
Right now memoryview is about 11% slower than bytes.
Objects/memoryobject.c
Outdated
The function is used by memory_subscript(), memoryiter_next() and | ||
memory_as_sequence. | ||
Note: Iteration of multi-dimensional arrays is not yet implemented | ||
(as of Python 3.10.0a0) */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd drop the version -- whoever implements it should remove the comment. :-)
Objects/memoryobject.c
Outdated
} | ||
if (seq->view.ndim != 1) { | ||
PyErr_SetString(PyExc_NotImplementedError, | ||
"multi-dimensional sub-views are not implemented"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add return NULL;
-- the PyErr...
function doesn't automagically return!
You may be able to save a few usec by moving this check to memory_iter.
(Note that the error also gets invoked for a 0-dimensional memoryview object. In the past that would give TypeError: invalid indexing of 0-dim memory
, so still an error. But not really "multi-dimensional". :-)
Objects/memoryobject.c
Outdated
@@ -3200,4 +3289,4 @@ PyTypeObject PyMemoryView_Type = { | ||
0, /* tp_init */ | ||
0, /* tp_alloc */ | ||
memoryview, /* tp_new */ | ||
}; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure the last line of the file ends in a newline character! (Most editors have a setting to make sure of this.)
The last 2 commits address the issues you raised. If everything checks out for you at this stage, I look forward to continue working on this patch and try to close the performance difference. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Now let's optimize. Do you have ideas where to start?
I think it is worth to try out the suggestions you made above:
I actually just tried the first part and there was no noticeable difference in comparison to the actual implementation. Another idea could be to try and make a new lookup_dimension(), which would be called directly (instead of from ptr_to_index() ), which could do without some of the existent checks that the current version has. What do you think? Any other ideas? |
Just try everything and see what makes a difference.--
--Guido (mobile)
|
Ok, I will work on it tonight. |
Hi Guido, Please find this first attempt at optimizing our patch. It might be possible to push it further but I would like to hear your thoughts on this current version before proceeding. Thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving along nicely!
Objects/memoryobject.c
Outdated
Py_ssize_t it_index; | ||
PyMemoryViewObject *it_seq; // Set to NULL when iterator is exhausted | ||
Py_ssize_t length; | ||
const char *fmt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You forgot the it_
prefix here. :-)
Objects/memoryobject.c
Outdated
{ | ||
Py_ssize_t nitems = view->shape[0]; | ||
|
||
// handle negative lookups (i.e [-2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you even handle these? In ...iter_next
the index is always 0 or greater. You can replace this check with an assert.
Objects/memoryobject.c
Outdated
if (index < 0) { | ||
index += nitems; | ||
} | ||
if (index < 0 || index >= nitems) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since your only caller already checks that the index is within bounds, you could skip the index >= nitems
check as well. And then you don't even need the nitems
variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honest mistake - was reading lookup_dimension() and forgot that I didn't need those extra checks coming from memoryiter_next()
Objects/memoryobject.c
Outdated
return 0; | ||
} | ||
|
||
static char * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider making this function inline
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried and it didn't add any noticeable performance improvement, however "inlining" the function myself did yield a very noticeable performance improvement as iterating over memoryview is now faster than iterating over bytes, when compilling without optimizations enabled (i.e debug mode).
Do you have an explanation as to why enabling optimizations seems to have less effect on memoryview in comparison to bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not faster than bytes for me yet, but it's definitely the fastest I've seen it. On my platform the time went from 12.7 to 9.96 msec (looping over a memoryview 1,000,000 of bytes.) Congratulations!
The main overhead that --with-pydebug
adds is not so much compiler optimizations as extra asserts. E.g. remember the assert you added to your GET_SIZE
macro? Probably the memoryview code still has more asserts than the bytes iterator.
A fun experiment: if the fmt code is 'B'
and the data is contiguous, inline even more (you can just call PyLong_FromLong
, see unpack_single
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All credit goes to you. Thank you once again for all the time you are taking to teach me, I can't wait to learn more from you and I couldn't be more grateful.
I will try your idea after dinner and report on it. On that note, wouldn't we be able to use that same hack for all the different formats, if we could infer that the data is contiguous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, you don't want to duplicate all the logic from unpack_single()
. In the end, who's going to care whether iterating over a memoryview takes 10 nsec or 11 nsec per iteration. For yourself, what's the most important data type when you're iterating over a numpy array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Float.
I tried the idea you mentioned above and my specific implementation (which might be non-optimal) made things significantly slower.
Back to the current commit, when compiling with optimizations enabled, iterating over bytes is still ~30% faster than iterating over memoryview on my setup, however
with debug enabled, memoryview is ~1% faster.
Do you see any further optimizations, besides the one you mentioned, that could be done to try and close the difference between bytes and memoryview?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I think we should leave well enough alone. This is only a micro-optimization anyway -- nobody's going to care about any further optimizations, but the core devs would object against the added complexity of the code (e.g. if someone were to change the implementation of ptr_from_index
or lookup_dimension
they would have to remember to also update memoryiter_next
.
i have one further suggestion before we move on to bigger and better things. If you move PyMemoryIter_Type
after the various functions you won't need to add those forward declarations. This is how things are traditionally written. (For example, this is why PyMemoryView_Type
is at the end of the file.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Guido,
I pushed a new commit that addresses your suggestion above, however after some consideration I opted to forward declare PyMemoryIter_type (instead of all the functions) because otherwise I would have to leave it above memor_iter() - which uses it to register with the garbage collector - and in my opinion it would negatively affect the reading flow.
I think the code is very readable this way, but please let me know if you would prefer no forward declarations at all.
(...) before we move on to bigger and better things.
Can't wait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, just one nit. Let's get this submitted as a PR for cpython. In order to do that it's probably best to first file an issue on bugs.python.org where you show that memoryview iteration is slower than it could be. You can then submit your PR to fix the issue (read the instructions in the template carefully).
One final thing. Does your GitHub username really have to be so obscure? Most people just use some abbreviation of their name. :-)
Objects/memoryobject.c
Outdated
The function is used by memory_subscript(), memoryiter_next() and | ||
memory_as_sequence. | ||
Note: Iteration of multi-dimensional arrays is not yet implemented */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can revert this whole comment, you no longer use memory_item
in memoryiter_next
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
I will re-read the dev-guide part related to PR.
No. I will fix that now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Congrats!
Thank you! to the first of many. |
Hi Guido,
Here's the cleaned up version, without the extra commits.
Please let me know what you think.
PS: I think a next step in this module could be to add a new memory_item() function to work exclusively with our new memoryiter_next(), this would avoid a few redudant/duplicated checks that now happen when we call memory_item() from memoryiter_next(). (Note: memory_item() serves two extra functions, so those checks must exist for when the function is not called from memoryiter_next(), although they are duplicated when called from memoryiter_next() since we already perform those on memoryiter_next() itself)