Show shape any time it cannot be inferred in repr #27482

mhvk · Sep 30, 2024

As noted in #27461, when an array is summarized, the repr no longer makes it clear what the shape is. This is a bit odd, since for the case of a zero-sized array, the shape is shown when it is not clear what it would be (i.e., when the size is zero but the shape multi-dimensional). Similarly, if the dtype is not obvious, it is shown. So, this PR also shows the shape when summaries are made:

In [2]: np.arange(1001)
Out[2]: array([   0,    1,    2, ...,  998,  999, 1000], shape=(1001,))
In [3]: np.arange(1001, dtype="i2")
Out[3]: 
array([   0,    1,    2, ...,  998,  999, 1000],
      shape=(1001,), dtype=int16)

One can get the old behaviour by using legacy='2.1' - I'm not sure this is really required; happy to remove it if desired.

Fixes #27461

p.s. The first commit reorganizes the routine to make it easier to add the shape, while the second has the actual enhancement and tests.

mhvk · Sep 30, 2024

numpy/_core/tests/test_arrayprint.py

-                    ..., 
-                    [[ 0.]]]])""")
-        )
+        try:


This change does two things: correctly put the printoptions setting inside a try/finally, and ensure that there are never any spaces at the end of a line (my editor removed those, thus causing the tests to fail).

rkern · Sep 30, 2024

We're not going to change the default representation without a wider discussion on the mailing list and a NEP. This will cause a ton of churn for docstring examples in all downstream projects, so we won't pull the trigger on this without a significant consensus from a lot of stakeholders. I have my doubts that we'll get that kind of strong agreement at this time. NEP 51 was the last change of this kind, and it was significantly smaller in scope.

In any case, since there is no shape= argument to np.array(), this particular choice for __repr__ is probably not going to happen.

mhvk · Sep 30, 2024

In any case, since there is no shape= argument to np.array(), this particular choice for __repr__ is probably not going to happen.

Hmm, we do give the shape already for shapes that include zeros, and the repr cannot be used anyway to round-trip the array if it is being summarized... Also, I don't actually think this is going to be very disruptive, since the shape is only shown in the rare case that summarization is done (needs a size > 1000). Indeed, in numpy, there is all of one docstring test that now fails. (EDIT: I also checked astropy: also one test failure for a case where it explicitly checks what happens for an array that exceeds the threshold).

Anyway, point taken that this should go by the mailing list (though if the consensus is that this requires a NEP, I will just close this...).

stefanv · Sep 30, 2024

since there is no shape= argument to np.array(), this particular choice for __repr__

This part does strike me as contrary to what a repr does, so would prefer the version with of shape=(...) hinted at in the mailing list post (given that our repr's cannot be eval'd anyway).

seberg · Oct 1, 2024

I don't mind this change. It seems right that this is far less impactful than the scalar repr change if it only kicks in when eval round-tripping can't possibly work (if we need a NEP, then just a tiny one for the process of agreeing to it, I think).
Using shape= as if it was a call is a bit awkward, but not sure that trailing of shape=(...) will read better, so I honestly don't have an opinion on which part.

mhvk · Oct 2, 2024

Given the feedback on the mailing list, I tried implementing adding # shape=... for the cases where the shape is not obvious, but at least locally spin check-docs now hangs where that happens (in np.set_printoptions docstring): it looks like one may be able to have a comment that does not get checked by the doctest, but not have output that has a comment... I pushed it up so people can see it, but one may have to kill the relevant test run... Any pointers to where these doctests are run?

Current sample output

In [4]: np.arange(1001)
Out[4]: array([   0,    1,    2, ...,  998,  999, 1000])  # shape=(1001,)

In [5]: np.arange(1001, dtype="i2")
Out[5]: 
array([   0,    1,    2, ...,  998,  999, 1000],
      dtype=int16)  # shape=(1001,)

In [6]: np.empty((10,2,0), dtype="i2")
Out[6]: array([], dtype=int16)  # shape=(10, 2, 0)

mhvk · Oct 2, 2024

OK, I figured out the doctests - it is just very slow if something fails. Must admit that the comment after the array works less well when the array is part of a list, as is the case in the examples for the various split functions. E.g.,

x = np.arange(16.0).reshape(2, 2, 4)
np.dsplit(x, np.array([3, 6]))
[array([[[ 0.,  1.,  2.],
         [ 4.,  5.,  6.]],
 
        [[ 8.,  9., 10.],
         [12., 13., 14.]]]),
 array([[[ 3.],
         [ 7.]],
 
        [[11.],
         [15.]]]),
 array([], dtype=float64)  # shape=(2, 2, 0)]

But arguably rare enough that it doesn't matter much. I do like not having the shape argument inside the array(...) part.

stefanv · Oct 3, 2024

I guess the whole point here is not to be disruptive, otherwise <ndarray data=[...] shape=(...) dtype=float> may have been the better choice.

That # shape=(...) in a list is pretty confusing, since the closing bracket no longer parses :/

There is no constructor for which it makes sense to derive shape from an input object AND specify the shape (ndarray comes closest, but takes a buffer instead of obj), so I suppose it's futile.

Slightly radical, perhaps, but we could add a shape argument to array(), ignoring it if it matches the object shape, or raise if it doesn't. Kind of useless, but at least then we have some valid excuse to have shape inside the keyword list. (We'll tell marketing to sell it as a Shape Verifier™.)

seberg · Oct 3, 2024

Yeah, while an ndim or ndmax would be useful information, a shape tuple is probably really just useful for verification (even if you allow (-1, 3)); unless you allow reshaping, but that seems too strange to me.
FWIW, my "I don't care", is actually being in favor of using , shape=; I suppose I don't care about the "broken" syntax all that much.

mhvk · Oct 3, 2024

Slightly radical, perhaps, but we could add a shape argument to array(), ignoring it if it matches the object shape, or raise if it doesn't. Kind of useless,

Actually, not totally useless for dtype=object - it could tell what the desired depth is to which lists of lists should be parsed, i.e., which level the object is meant to refer to.

Back to this PR, I find the comment inside the list rather bothersome too, and agree with @seberg that the "broken" syntax is not that bad - after all, it is only shown for cases where the data string itself could not be parsed anyway. Shall I go ahead and remove the last commit that tried it?

p.s. I did have one further alternative, which would be to replace the ... in summarization with something indicating the number of elements omitted. I felt it was getting too complex, but could revisit...

seberg · Oct 3, 2024

p.s. I did have one further alternative, which would be to replace the ... in summarization with something indicating the

I remember seeing this discussed as an issue or mailing list post at some point (i.e. someone felt strong about it). Personally, I tend to think it adds much visual clutter (maybe in an html repr where you can reduce the clutter).

mattip · Oct 30, 2024

Summarizing the more difficult problem: changing the repr breaks doc tests. It seems there are about a dozen in scipy, but only a handful in NumPy. If we go ahead with this, we could recommend projects use legacy='2.1' if they update to 2.x and their doc tests are broken.

mhvk · Oct 30, 2024

What was the final verdict on where to have the shape information? As a "fake" argument (perhaps to be added later as a real one), or as the comment? If the former, I'll need to remove the last commit.

stefanv · Oct 31, 2024

That # shape=(...) in a list is pretty confusing, since the closing bracket no longer parses :/

I've warmed to the non-existent keyword argument, because of the above.

mattip · Oct 31, 2024

Agreed. That was also the sentiment at the triage meeting: a fake argument not a comment.

To help set oneself up for using shape for different reasons.

mhvk · Oct 31, 2024

OK, removed the last commit (and rebased), so it is back to a fake shape argument.

mattip

LGTM. A short release note that mentions the legacy=2.1 fix for projects with many doctest failures should be added.

So that it is always there when the shape cannot be inferred from list

mhvk · Nov 5, 2024

OK, thanks, I added a changelog snippet.

mattip · Nov 7, 2024

Thanks @mhvk

ev-br · Dec 10, 2024

PSA note for projects using scipy-doctest: upgrade to scipy-doctest==1.5.1, update your doctests to use the new repr, and your doctesting will work on both numpy 2.2 and older versions.

TomNicholas · Dec 10, 2024

This change broke xarray's printed representation (xref pydata/xarray#9873) - thank you for providing an escape hatch, but is there a way for me to enable that as a global option?

seberg · Dec 10, 2024

You can also use legacy="..." printoptions to avoid the new repr (that defers the problem a bit maybe).

Although, in this case since it seems like a single test, maybe it is just as easy to adapt the test to allow both?

mhvk added 01 - Enhancement component: numpy._core labels Sep 30, 2024

mhvk added this to the 2.2.0 release milestone Sep 30, 2024

mhvk commented Sep 30, 2024

View reviewed changes

mhvk mentioned this pull request Sep 30, 2024

ENH: Add shape control parameter to set_printoptions #27461

Closed

mhvk force-pushed the arrayprint-shape-for-summarized branch from 2530d2d to 20b2bab Compare September 30, 2024 16:44

mhvk force-pushed the arrayprint-shape-for-summarized branch from 98e5bbf to 4eb54b2 Compare October 2, 2024 19:47

MAINT: refactor array_repr to treat shape and dtype similary

4faf84e

To help set oneself up for using shape for different reasons.

mhvk force-pushed the arrayprint-shape-for-summarized branch from 4eb54b2 to c5f2516 Compare October 31, 2024 11:31

mattip approved these changes Nov 4, 2024

View reviewed changes

ENH: include shape in repr also for summarized arrays

3c6d763

So that it is always there when the shape cannot be inferred from list

mhvk force-pushed the arrayprint-shape-for-summarized branch from c5f2516 to 3c6d763 Compare November 5, 2024 18:30

mattip merged commit fbffb8c into numpy:main Nov 7, 2024
67 checks passed

mhvk deleted the arrayprint-shape-for-summarized branch November 7, 2024 12:07

neutrinoceros mentioned this pull request Nov 10, 2024

TST: fix running array repr tests with an ellipsis on numpy 2.2 (dev) astropy/astropy#17364

Merged

1 task

ev-br mentioned this pull request Dec 9, 2024

Adapt to numpy==2.2 repr changes scipy/scipy_doctest#175

Closed

TomNicholas mentioned this pull request Dec 10, 2024

numpy formatting change breaks tests pydata/xarray#9873

Closed

d-v-b mentioned this pull request Dec 10, 2024

Cache pre-existing Zarr arrays in Zarr backend pydata/xarray#9861

Merged

peterjc mentioned this pull request Dec 16, 2024

setup.py: don't error on setuptools 75.3.0.post0 biopython/biopython#4897

Merged

3 tasks

Search code, repositories, users, issues, pull requests...

Uh oh!

Show shape any time it cannot be inferred in repr #27482

Show shape any time it cannot be inferred in repr #27482

Uh oh!

Conversation

mhvk commented Sep 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mhvk Sep 30, 2024

Choose a reason for hiding this comment

Uh oh!

rkern commented Sep 30, 2024

Uh oh!

mhvk commented Sep 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stefanv commented Sep 30, 2024

Uh oh!

seberg commented Oct 1, 2024

Uh oh!

mhvk commented Oct 2, 2024

Uh oh!

mhvk commented Oct 2, 2024

Uh oh!

stefanv commented Oct 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

seberg commented Oct 3, 2024

Uh oh!

mhvk commented Oct 3, 2024

Uh oh!

seberg commented Oct 3, 2024

Uh oh!

mattip commented Oct 30, 2024

Uh oh!

mhvk commented Oct 30, 2024

Uh oh!

stefanv commented Oct 31, 2024

Uh oh!

mattip commented Oct 31, 2024

Uh oh!

mhvk commented Oct 31, 2024

Uh oh!

mattip left a comment

Choose a reason for hiding this comment

Uh oh!

mhvk commented Nov 5, 2024

Uh oh!

Uh oh!

mattip commented Nov 7, 2024

Uh oh!

ev-br commented Dec 10, 2024

Uh oh!

TomNicholas commented Dec 10, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

seberg commented Dec 10, 2024

Uh oh!

Uh oh!

mhvk commented Sep 30, 2024 •

edited

Loading

mhvk commented Sep 30, 2024 •

edited

Loading

stefanv commented Oct 3, 2024 •

edited

Loading

TomNicholas commented Dec 10, 2024 •

edited

Loading