Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Show shape any time it cannot be inferred in repr #27482

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 7, 2024

Conversation

mhvk
Copy link
Contributor

@mhvk mhvk commented Sep 30, 2024

As noted in #27461, when an array is summarized, the repr no longer makes it clear what the shape is. This is a bit odd, since for the case of a zero-sized array, the shape is shown when it is not clear what it would be (i.e., when the size is zero but the shape multi-dimensional). Similarly, if the dtype is not obvious, it is shown. So, this PR also shows the shape when summaries are made:

In [2]: np.arange(1001)
Out[2]: array([   0,    1,    2, ...,  998,  999, 1000], shape=(1001,))
In [3]: np.arange(1001, dtype="i2")
Out[3]: 
array([   0,    1,    2, ...,  998,  999, 1000],
      shape=(1001,), dtype=int16)

One can get the old behaviour by using legacy='2.1' - I'm not sure this is really required; happy to remove it if desired.

Fixes #27461

p.s. The first commit reorganizes the routine to make it easier to add the shape, while the second has the actual enhancement and tests.

...,
[[ 0.]]]])""")
)
try:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change does two things: correctly put the printoptions setting inside a try/finally, and ensure that there are never any spaces at the end of a line (my editor removed those, thus causing the tests to fail).

@rkern
Copy link
Member

rkern commented Sep 30, 2024

We're not going to change the default representation without a wider discussion on the mailing list and a NEP. This will cause a ton of churn for docstring examples in all downstream projects, so we won't pull the trigger on this without a significant consensus from a lot of stakeholders. I have my doubts that we'll get that kind of strong agreement at this time. NEP 51 was the last change of this kind, and it was significantly smaller in scope.

In any case, since there is no shape= argument to np.array(), this particular choice for __repr__ is probably not going to happen.

@mhvk mhvk force-pushed the arrayprint-shape-for-summarized branch from 2530d2d to 20b2bab Compare September 30, 2024 16:44
@mhvk
Copy link
Contributor Author

mhvk commented Sep 30, 2024

In any case, since there is no shape= argument to np.array(), this particular choice for __repr__ is probably not going to happen.

Hmm, we do give the shape already for shapes that include zeros, and the repr cannot be used anyway to round-trip the array if it is being summarized... Also, I don't actually think this is going to be very disruptive, since the shape is only shown in the rare case that summarization is done (needs a size > 1000). Indeed, in numpy, there is all of one docstring test that now fails. (EDIT: I also checked astropy: also one test failure for a case where it explicitly checks what happens for an array that exceeds the threshold).

Anyway, point taken that this should go by the mailing list (though if the consensus is that this requires a NEP, I will just close this...).

@stefanv
Copy link
Contributor

stefanv commented Sep 30, 2024

since there is no shape= argument to np.array(), this particular choice for __repr__

This part does strike me as contrary to what a repr does, so would prefer the version with of shape=(...) hinted at in the mailing list post (given that our repr's cannot be eval'd anyway).

@seberg
Copy link
Member

seberg commented Oct 1, 2024

I don't mind this change. It seems right that this is far less impactful than the scalar repr change if it only kicks in when eval round-tripping can't possibly work (if we need a NEP, then just a tiny one for the process of agreeing to it, I think).
Using shape= as if it was a call is a bit awkward, but not sure that trailing of shape=(...) will read better, so I honestly don't have an opinion on which part.

@mhvk
Copy link
Contributor Author

mhvk commented Oct 2, 2024

Given the feedback on the mailing list, I tried implementing adding # shape=... for the cases where the shape is not obvious, but at least locally spin check-docs now hangs where that happens (in np.set_printoptions docstring): it looks like one may be able to have a comment that does not get checked by the doctest, but not have output that has a comment... I pushed it up so people can see it, but one may have to kill the relevant test run... Any pointers to where these doctests are run?

Current sample output

In [4]: np.arange(1001)
Out[4]: array([   0,    1,    2, ...,  998,  999, 1000])  # shape=(1001,)

In [5]: np.arange(1001, dtype="i2")
Out[5]: 
array([   0,    1,    2, ...,  998,  999, 1000],
      dtype=int16)  # shape=(1001,)

In [6]: np.empty((10,2,0), dtype="i2")
Out[6]: array([], dtype=int16)  # shape=(10, 2, 0)

@mhvk mhvk force-pushed the arrayprint-shape-for-summarized branch from 98e5bbf to 4eb54b2 Compare October 2, 2024 19:47
@mhvk
Copy link
Contributor Author

mhvk commented Oct 2, 2024

OK, I figured out the doctests - it is just very slow if something fails. Must admit that the comment after the array works less well when the array is part of a list, as is the case in the examples for the various split functions. E.g.,

x = np.arange(16.0).reshape(2, 2, 4)
np.dsplit(x, np.array([3, 6]))
[array([[[ 0.,  1.,  2.],
         [ 4.,  5.,  6.]],
 
        [[ 8.,  9., 10.],
         [12., 13., 14.]]]),
 array([[[ 3.],
         [ 7.]],
 
        [[11.],
         [15.]]]),
 array([], dtype=float64)  # shape=(2, 2, 0)]

But arguably rare enough that it doesn't matter much. I do like not having the shape argument inside the array(...) part.

@stefanv
Copy link
Contributor

stefanv commented Oct 3, 2024

I guess the whole point here is not to be disruptive, otherwise <ndarray data=[...] shape=(...) dtype=float> may have been the better choice.

That # shape=(...) in a list is pretty confusing, since the closing bracket no longer parses :/

There is no constructor for which it makes sense to derive shape from an input object AND specify the shape (ndarray comes closest, but takes a buffer instead of obj), so I suppose it's futile.

Slightly radical, perhaps, but we could add a shape argument to array(), ignoring it if it matches the object shape, or raise if it doesn't. Kind of useless, but at least then we have some valid excuse to have shape inside the keyword list. (We'll tell marketing to sell it as a Shape Verifier™.)

@seberg
Copy link
Member

seberg commented Oct 3, 2024

Yeah, while an ndim or ndmax would be useful information, a shape tuple is probably really just useful for verification (even if you allow (-1, 3)); unless you allow reshaping, but that seems too strange to me.
FWIW, my "I don't care", is actually being in favor of using , shape=; I suppose I don't care about the "broken" syntax all that much.

@mhvk
Copy link
Contributor Author

mhvk commented Oct 3, 2024

Slightly radical, perhaps, but we could add a shape argument to array(), ignoring it if it matches the object shape, or raise if it doesn't. Kind of useless,

Actually, not totally useless for dtype=object - it could tell what the desired depth is to which lists of lists should be parsed, i.e., which level the object is meant to refer to.

Back to this PR, I find the comment inside the list rather bothersome too, and agree with @seberg that the "broken" syntax is not that bad - after all, it is only shown for cases where the data string itself could not be parsed anyway. Shall I go ahead and remove the last commit that tried it?

p.s. I did have one further alternative, which would be to replace the ... in summarization with something indicating the number of elements omitted. I felt it was getting too complex, but could revisit...

@seberg
Copy link
Member

seberg commented Oct 3, 2024

p.s. I did have one further alternative, which would be to replace the ... in summarization with something indicating the

I remember seeing this discussed as an issue or mailing list post at some point (i.e. someone felt strong about it). Personally, I tend to think it adds much visual clutter (maybe in an html repr where you can reduce the clutter).

@mattip
Copy link
Member

mattip commented Oct 30, 2024

Summarizing the more difficult problem: changing the repr breaks doc tests. It seems there are about a dozen in scipy, but only a handful in NumPy. If we go ahead with this, we could recommend projects use legacy='2.1' if they update to 2.x and their doc tests are broken.

@mhvk
Copy link
Contributor Author

mhvk commented Oct 30, 2024

What was the final verdict on where to have the shape information? As a "fake" argument (perhaps to be added later as a real one), or as the comment? If the former, I'll need to remove the last commit.

@stefanv
Copy link
Contributor

stefanv commented Oct 31, 2024

That # shape=(...) in a list is pretty confusing, since the closing bracket no longer parses :/

I've warmed to the non-existent keyword argument, because of the above.

@mattip
Copy link
Member

mattip commented Oct 31, 2024

Agreed. That was also the sentiment at the triage meeting: a fake argument not a comment.

To help set oneself up for using shape for different reasons.
@mhvk mhvk force-pushed the arrayprint-shape-for-summarized branch from 4eb54b2 to c5f2516 Compare October 31, 2024 11:31
@mhvk
Copy link
Contributor Author

mhvk commented Oct 31, 2024

OK, removed the last commit (and rebased), so it is back to a fake shape argument.

Copy link
Member

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A short release note that mentions the legacy=2.1 fix for projects with many doctest failures should be added.

So that it is always there when the shape cannot be inferred from list
@mhvk mhvk force-pushed the arrayprint-shape-for-summarized branch from c5f2516 to 3c6d763 Compare November 5, 2024 18:30
@mhvk
Copy link
Contributor Author

mhvk commented Nov 5, 2024

OK, thanks, I added a changelog snippet.

@mattip mattip merged commit fbffb8c into numpy:main Nov 7, 2024
67 checks passed
@mattip
Copy link
Member

mattip commented Nov 7, 2024

Thanks @mhvk

@ev-br
Copy link
Contributor

ev-br commented Dec 10, 2024

PSA note for projects using scipy-doctest: upgrade to scipy-doctest==1.5.1, update your doctests to use the new repr, and your doctesting will work on both numpy 2.2 and older versions.

@TomNicholas
Copy link

TomNicholas commented Dec 10, 2024

This change broke xarray's printed representation (xref pydata/xarray#9873) - thank you for providing an escape hatch, but is there a way for me to enable that as a global option?

@seberg
Copy link
Member

seberg commented Dec 10, 2024

You can also use legacy="..." printoptions to avoid the new repr (that defers the problem a bit maybe).

Although, in this case since it seems like a single test, maybe it is just as easy to adapt the test to allow both?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Add shape control parameter to set_printoptions
7 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.