-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
DOC: Clarify (potentially misleading) nbytes docstring #28943
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, adding a note seems good, but that much is much too complicated for the extra information.
numpy/_core/_add_newdocs.py
Outdated
|
||
Notes | ||
----- | ||
If the array is a view, this shows how much memory it *would* use | ||
if it were copied into a separate array. | ||
Does not include memory consumed by non-element attributes of the | ||
array object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we ca add that it also doesn't include memory indirectly held by the elements.
(I.e. if you store Python objects or the new StringDType
)
numpy/_core/_add_newdocs.py
Outdated
>>> arr_1.nbytes | ||
800000 | ||
>>> arr_2.nbytes | ||
2400 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This introduces way too much complexity for very little gain. If anything at all, just do some slicing like arr[::2]
or so.
numpy/_core/_add_newdocs.py
Outdated
@@ -2698,6 +2701,17 @@ | ||
>>> np.prod(x.shape) * x.itemsize | ||
480 | ||
|
||
>>> import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need I think, but a sentence on why the next thing comes would help.
Another wrinkle is that it's only the memory used by the array buffer. For object arrays or StringDType arrays, it's an underestimate. |
Thank you, I have edited the docstring based on the suggestions. |
Maybe it is enough to use qualifiers like "approximately" and "at least"/"at most" rather than try to describe all the ways the number is wrong. Then point to a documentation page like https://numpy.org/devdocs/dev/internals.code-explanations.html#memory-model, and maybe add nuanced qualifications there instead of in the docstring. |
I guess one future-proof way could be to describe how |
That sounds good. Something like "This is the memory used by the main array buffer and does not account for any memory used for array metadata or for data stored outside of the array buffer. For example, |
The documentation for
numpy.ndarray.nbytes
has the potentially misleading description that it's the "total bytes consumed by the elements of the array", but thenbytes
for a view doesn't reflect the memory consumption of its elements, but rather what that consumption would've been if it were a copy. This has been mentioned before in #22925, but the issue was closed before this was clarified. I have included an additional example in the docstring that demonstrates this.