DOC: Clarify (potentially misleading) nbytes docstring #28943

Open
wants to merge 5 commits into
base: main

Conversation

@zvun zvun commented May 11, 2025

The documentation for numpy.ndarray.nbytes describes it as the "total bytes consumed by the elements of the array", which is potentially misleading: for a view, nbytes doesn't reflect the memory its elements actually consume, but rather what that consumption would have been if the view were a copy. This was mentioned before in #22925, but the issue was closed before the docstring was clarified. I have included an additional example in the docstring that demonstrates this.
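
For readers following along, the behaviour described above can be sketched as follows. This is not the example from the PR itself; the array sizes and the use of `np.broadcast_to` are assumptions chosen to make the gap between the real buffer and the reported `nbytes` easy to see.

```python
import numpy as np

# A small base array: 10 int64 elements, so 80 bytes of element data.
base = np.arange(10, dtype=np.int64)

# A read-only broadcast view with 1000 rows; no new element data is allocated.
view = np.broadcast_to(base, (1000, 10))

print(base.nbytes)                        # size of the real buffer
print(view.nbytes)                        # size a full copy would need
print(np.shares_memory(base, view))       # the view reuses base's buffer
print(view.copy().nbytes == view.nbytes)  # a real copy has the reported size
```

Here the view reports far more bytes than the buffer it is backed by, which is the "what it would use if copied" reading of `nbytes`.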

Member

@seberg seberg left a comment

Thanks, adding a note seems good, but the current change is much too complicated for the extra information it adds.


Notes
-----
If the array is a view, this shows how much memory it *would* use
if it were copied into a separate array.
Does not include memory consumed by non-element attributes of the
array object.
Member

Maybe we can add that it also doesn't include memory indirectly held by the elements
(i.e. if you store Python objects or the new StringDType).

>>> arr_1.nbytes
800000
>>> arr_2.nbytes
2400
Member

This introduces way too much complexity for very little gain. If anything at all, just do some slicing like arr[::2] or so.
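
A minimal sketch of what such a slicing-based example might look like; the array length and dtype are arbitrary assumptions, not taken from the PR.

```python
import numpy as np

arr = np.zeros(100, dtype=np.float64)  # owns 100 * 8 = 800 bytes
every_other = arr[::2]                 # strided view, no data copied

print(arr.nbytes)                # 800
print(every_other.nbytes)        # 400: 50 elements * 8 bytes each
print(every_other.base is arr)   # True: still backed by arr's buffer
```

The view reports 400 bytes even though it keeps the full 800-byte buffer of `arr` alive.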

@@ -2698,6 +2701,17 @@
>>> np.prod(x.shape) * x.itemsize
480

>>> import numpy as np
Member

No need for this, I think, but a sentence explaining why the next example is there would help.

@ngoldbaum
Member

Another wrinkle is that it's only the memory used by the array buffer. For object arrays or StringDType arrays, it's an underestimate.
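
The underestimate for object arrays can be illustrated roughly as below. The string lengths are arbitrary, and `sys.getsizeof` is used only as an approximate measure of the Python objects' size, so treat the numbers as indicative. StringDType is left out of this sketch because it needs a recent NumPy release.

```python
import sys
import numpy as np

# Three long Python strings stored in an object array.
strings = ["x" * 10_000 for _ in range(3)]
obj_arr = np.array(strings, dtype=object)

# nbytes only counts the per-element slots in the array buffer.
buffer_bytes = obj_arr.nbytes

# Rough size of the string objects those slots refer to.
# sys.getsizeof is itself only an approximation for Python objects.
payload_bytes = sum(sys.getsizeof(s) for s in obj_arr)

print(buffer_bytes, payload_bytes)
```

On a typical build the first number is a few dozen bytes while the second is tens of thousands, so `nbytes` is only a lower bound here.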

@zvun
Author

zvun commented May 12, 2025

Thank you, I have edited the docstring based on the suggestions.

@mattip
Member

mattip commented May 12, 2025

Maybe it is enough to use qualifiers like "approximately" and "at least"/"at most" rather than try to describe all the ways the number is wrong. Then point to a documentation page like https://numpy.org/devdocs/dev/internals.code-explanations.html#memory-model, and maybe add nuanced qualifications there instead of in the docstring.

@zvun
Author

zvun commented May 14, 2025

I guess one future-proof way could be to describe how nbytes is calculated, and then mention some examples for different dtypes. From a quick search, it seems to be the product of the array's dimensions multiplied by the item size. For object elements the latter is probably the size of a pointer; I'm not sure how it works for StringDType, though. What do you think? @mattip @seberg
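
The calculation described here can be checked with a short sketch. `computed_nbytes` is a hypothetical helper written for this illustration, not a NumPy function, and the shapes and dtypes are arbitrary (the first one mirrors the 480-byte case in the existing docstring).

```python
import math
import numpy as np

def computed_nbytes(a):
    # Hypothetical helper: number of elements times bytes per element.
    return math.prod(a.shape) * a.itemsize

examples = [
    np.zeros((3, 5, 2), dtype=np.complex128),  # 30 elements * 16 bytes
    np.ones((2, 7), dtype=np.uint8),           # 14 elements * 1 byte
    np.empty((4,), dtype=object),              # itemsize is the slot size
    np.zeros((0, 3), dtype=np.float32),        # empty array: 0 bytes
]

for a in examples:
    print(a.dtype, a.shape, a.itemsize, a.nbytes, computed_nbytes(a))
```

For every array in the list the two numbers agree, including the object array, where the item size is that of the stored reference rather than of the Python object it points to.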

@ngoldbaum
Member

> What do you think?

That sounds good. Something like "This is the memory used by the main array buffer and does not account for any memory used for array metadata or for data stored outside of the array buffer. For example, nbytes is a lower limit for the object and StringDType types because these types can store data outside the main array buffer."

Projects
Status: Awaiting a code review
4 participants