Add iterator to memoryview #19


Closed · wants to merge 8 commits

Conversation


@ghost ghost commented Sep 2, 2020

Hi Guido,

Here's the cleaned up version, without the extra commits.

Please let me know what you think.

PS: I think a next step in this module could be to add a new memory_item()-style function that works exclusively with our new memoryiter_next(); this would avoid a few redundant/duplicated checks that currently happen when we call memory_item() from memoryiter_next(). (Note: memory_item() serves two other callers, so those checks must remain for when the function is not called from memoryiter_next(); they are only duplicated on the memoryiter_next() path, since we already perform them in memoryiter_next() itself.)

Owner

@gvanrossum gvanrossum left a comment

Two nits and a bugfix.

Your next step sounds right -- "hoist" the checks that validate the object each time. I would think you can start by adding members to the memoryiterobject struct to hold the length and the fmt code. After that we can look into inlining ptr_from_index (which is mostly lookup_dimension). I see all kinds of redundant checks there.

In the meantime, let's keep the eye on the ball.

I used this benchmark, two lines of shell code:

./python.exe -m timeit -s "a=memoryview(b'x'*1000000)" "for x in a: pass"
./python.exe -m timeit -s "a=b'x'*1000000" "for x in a: pass"

Right now memoryview is about 11% slower than bytes.
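For concreteness, the kind of struct change being suggested would look roughly like this (a sketch only; it_length and it_fmt are illustrative names following the existing it_ prefix convention, not the actual patch):

typedef struct {
    PyObject_HEAD
    Py_ssize_t it_index;          /* current position in the view */
    PyMemoryViewObject *it_seq;   /* NULL once the iterator is exhausted */
    Py_ssize_t it_length;         /* hoisted: number of items, validated once when the iterator is created */
    const char *it_fmt;           /* hoisted: format code, validated once when the iterator is created */
} memoryiterobject;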

The function is used by memory_subscript(), memoryiter_next() and
memory_as_sequence.
Note: Iteration of multi-dimensional arrays is not yet implemented
(as of Python 3.10.0a0) */
Owner

I'd drop the version -- whoever implements it should remove the comment. :-)

}
if (seq->view.ndim != 1) {
PyErr_SetString(PyExc_NotImplementedError,
"multi-dimensional sub-views are not implemented");
Owner

Add return NULL; -- the PyErr... function doesn't automagically return!

You may be able to save a few usec by moving this check to memory_iter.

(Note that the error also gets invoked for a 0-dimensional memoryview object. In the past that would give TypeError: invalid indexing of 0-dim memory, so still an error. But not really "multi-dimensional". :-)
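In other words, the corrected branch would read roughly (a sketch of the fix being requested, matching the hunk above):

if (seq->view.ndim != 1) {
    PyErr_SetString(PyExc_NotImplementedError,
                    "multi-dimensional sub-views are not implemented");
    return NULL;   /* PyErr_SetString() only sets the exception; the caller must return NULL itself */
}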

@@ -3200,4 +3289,4 @@ PyTypeObject PyMemoryView_Type = {
0, /* tp_init */
0, /* tp_alloc */
memoryview, /* tp_new */
};
};
Owner

Make sure the last line of the file ends in a newline character! (Most editors have a setting to make sure of this.)

@ghost
Author

ghost commented Sep 3, 2020

The last two commits address the issues you raised.
Regarding performance compared to bytes, I can also confirm that memoryview is still slower (even more than 11% slower if optimizations are enabled).

If everything checks out for you at this stage, I look forward to continuing work on this patch and trying to close the performance gap.

Objects/memoryobject.c (resolved)
Owner

@gvanrossum gvanrossum left a comment

Looks good! Now let's optimize. Do you have ideas where to start?

@ghost
Author

ghost commented Sep 3, 2020

I think it is worth trying out the suggestions you made above:

"you can start by adding members to the memoryiterobject struct to hold the length and the fmt code"

I actually just tried the first part and there was no noticeable difference compared with the current implementation.
The second part might be more successful, as there are multiple indirections that could be cut out.

Another idea could be to make a new lookup_dimension(), which would be called directly (instead of via ptr_from_index()) and which could do without some of the checks that the current version has.

What do you think? Any other ideas?
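Roughly, such a dedicated helper could look like this (a sketch with a hypothetical name, assuming a one-dimensional buffer without suboffsets):

static char *
iter_ptr_from_index(const Py_buffer *view, Py_ssize_t index)
{
    /* Sketch: the only caller (memoryiter_next) has already validated
       ndim == 1 and 0 <= index < shape[0], so no rechecking here. */
    assert(view->ndim == 1);
    assert(0 <= index && index < view->shape[0]);
    return (char *)view->buf + index * view->strides[0];
}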

@gvanrossum
Owner

gvanrossum commented Sep 3, 2020 via email

@ghost
Author

ghost commented Sep 3, 2020

Ok, I will work on it tonight.

@ghost
Author

ghost commented Sep 3, 2020

Hi Guido,

Here is a first attempt at optimizing our patch. It might be possible to push it further, but I would like to hear your thoughts on this version before proceeding.

Thank you.

Owner

@gvanrossum gvanrossum left a comment

Moving along nicely!

Py_ssize_t it_index;
PyMemoryViewObject *it_seq; // Set to NULL when iterator is exhausted
Py_ssize_t length;
const char *fmt;
Owner

You forgot the it_ prefix here. :-)
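That is, following the existing naming convention the two new fields would become (sketch):

Py_ssize_t it_length;
const char *it_fmt;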

{
Py_ssize_t nitems = view->shape[0];

// handle negative lookups (i.e [-2])
Owner

Why do you even handle these? In ...iter_next the index is always 0 or greater. You can replace this check with an assert.

if (index < 0) {
index += nitems;
}
if (index < 0 || index >= nitems) {
Owner

Since your only caller already checks that the index is within bounds, you could skip the index >= nitems check as well. And then you don't even need the nitems variable.
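Concretely, the whole lookup check could then shrink to a single debug-build assertion (a sketch, reusing the names from the hunk above):

/* memoryiter_next() already guarantees 0 <= index < length. */
assert(index >= 0 && index < view->shape[0]);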

Author

Honest mistake. I was reading lookup_dimension() and forgot that I didn't need those extra checks when coming from memoryiter_next().

return 0;
}

static char *
Owner

Consider making this function inline.

Author

@ghost ghost Sep 4, 2020

I tried that, and it didn't add any noticeable performance improvement. However, "inlining" the function myself did yield a very noticeable improvement: iterating over memoryview is now faster than iterating over bytes when compiling without optimizations enabled (i.e. debug mode).

Do you have an explanation as to why enabling optimizations seems to have less effect on memoryview than on bytes?

Owner

It's not faster than bytes for me yet, but it's definitely the fastest I've seen it. On my platform the time went from 12.7 to 9.96 msec (looping over a memoryview of 1,000,000 bytes). Congratulations!

The main overhead that --with-pydebug adds is not so much the lack of compiler optimizations as the extra asserts. E.g. remember the assert you added to your GET_SIZE macro? Probably the memoryview code still has more asserts than the bytes iterator.

A fun experiment: if the fmt code is 'B' and the data is contiguous, inline even more (you can just call PyLong_FromLong; see unpack_single).
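A sketch of that experiment (variable names are assumptions, not the patch; the view is taken to be 1-D, contiguous, with itemsize 1):

if (it->it_fmt[0] == 'B' && it->it_seq->view.strides[0] == 1) {
    /* Contiguous unsigned bytes: skip unpack_single() and build the int
       directly, which is roughly what unpack_single() does for 'B'. */
    unsigned char item = ((unsigned char *)it->it_seq->view.buf)[it->it_index];
    it->it_index++;
    return PyLong_FromLong(item);
}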

Author

All credit goes to you. Thank you once again for all the time you are taking to teach me; I can't wait to learn more from you, and I couldn't be more grateful.

I will try your idea after dinner and report back. On that note, wouldn't we be able to use that same hack for all the different formats, if we could infer that the data is contiguous?

Owner

Well, you don't want to duplicate all the logic from unpack_single(). In the end, who's going to care whether iterating over a memoryview takes 10 nsec or 11 nsec per iteration? For yourself, what's the most important data type when you're iterating over a numpy array?

Author

Float.

I tried the idea you mentioned above and my specific implementation (which might be non-optimal) made things significantly slower.

Back to the current commit: when compiling with optimizations enabled, iterating over bytes is still ~30% faster than iterating over memoryview on my setup; with debug enabled, however, memoryview is ~1% faster.

Do you see any further optimizations, besides the one you mentioned, that could be done to try and close the difference between bytes and memoryview?

Owner

No, I think we should leave well enough alone. This is only a micro-optimization anyway -- nobody's going to care about any further optimizations, but the core devs would object to the added complexity of the code (e.g. if someone were to change the implementation of ptr_from_index or lookup_dimension they would have to remember to also update memoryiter_next).

I have one further suggestion before we move on to bigger and better things. If you move PyMemoryIter_Type after the various functions you won't need to add those forward declarations. This is how things are traditionally written. (For example, this is why PyMemoryView_Type is at the end of the file.)

Author

Hi Guido,

I pushed a new commit that addresses your suggestion above. However, after some consideration I opted to forward-declare PyMemoryIter_Type (instead of all the functions), because otherwise I would have to leave it above memory_iter(), which uses it to register with the garbage collector, and in my opinion that would negatively affect the reading flow.
I think the code is very readable this way, but please let me know if you would prefer no forward declarations at all.

(...) before we move on to bigger and better things.

Can't wait.
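For reference, the layout being described looks roughly like this (a sketch; slot values abbreviated, tp_name illustrative):

static PyTypeObject PyMemoryIter_Type;   /* forward declaration so memory_iter() can reference it */

/* ... memoryiter_next(), the other iterator functions, and memory_iter() ... */

static PyTypeObject PyMemoryIter_Type = {
    PyVarObject_HEAD_INIT(NULL, 0)
    "memory_iterator",                   /* tp_name (illustrative) */
    /* remaining slots as in the patch */
};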

@gvanrossum gvanrossum mentioned this pull request Sep 3, 2020
Owner

@gvanrossum gvanrossum left a comment

Nice, just one nit. Let's get this submitted as a PR for cpython. In order to do that it's probably best to first file an issue on bugs.python.org where you show that memoryview iteration is slower than it could be. You can then submit your PR to fix the issue (read the instructions in the template carefully).

One final thing. Does your GitHub username really have to be so obscure? Most people just use some abbreviation of their name. :-)

Comment on lines 2335 to 2337
The function is used by memory_subscript(), memoryiter_next() and
memory_as_sequence.
Note: Iteration of multi-dimensional arrays is not yet implemented */
Owner

I think you can revert this whole comment; you no longer use memory_item in memoryiter_next.

Author

Fixed.

@ghost
Author

ghost commented Sep 5, 2020

I will re-read the part of the devguide related to PRs.

Does your GitHub username really have to be so obscure?

No. I will fix that now.

Owner

@gvanrossum gvanrossum left a comment

LGTM. Congrats!

@ghost
Author

ghost commented Sep 5, 2020

Thank you! To the first of many.

@ghost
Author

ghost commented Sep 6, 2020

CPython PR

@gvanrossum gvanrossum closed this Nov 23, 2020