-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH: ndarray.__format__ implementation for numeric dtypes #19550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@mattip do you mind having a look at this? |
I don't remember if this idea hit the mailing list. This comment in the issue goes through expected behaviour for python objects. How does your solution handle this? Will you raise an error if a formatting string for int, float, string is passed to a non-compliant array? Can the formatting of int128 or longdouble handle format strings without losing precision? How will this interact with the options of |
At first, since we are only considering formatting the numeric type, floating numbers specifically, we are only interested in being able to change the precision, the sign, and possibly the rounding or truncation. Since the I propose a mini format specification inspired in the Format Specification Mini-Language.
We are going to consider only 3 arguments of the As @brandon-rhodes said in #5543, the behavior when you try to format an array containing Python objects, the behavior should be the same as Python has implemented by default in the What remains to be defined is the behavior when trying to format an array with a non-numeric data type ( If you consider it necessary, I can send the previous explanation to the mailing list. |
Just a couple points I thought were relevant to the discussion: As mentioned here, Python doesn't support this type of format string for sequences; for example: >>> l = [1, 2, 3, 4]
>>> print(f"{l:.2f}")
Traceback (most recent call last)
...
TypeError: unsupported format string passed to list.__format__ It'd be nice to do some research into related discussions to provide context for the proposed change. Also, it's worth noting that numpy already has a (quite Pythonic) way to customize array printing --- using >>> a = np.array([-np.pi, np.pi])
>>> with np.printoptions(precision=2, sign="+"):
... print(a)
[-3.14 +3.14] |
I suspect that's because Python sequences are heterogeneous — and it would make sense for an ndarray-of-objects to similarly refuse to perform string formatting if we wanted strict symmetry. But the case of an ndarray-of-floats doesn't have a strict analogy among the builtin Python data types of list, dict, and set.
I would suggest that the
|
Things get messy when you consider arrays with
Both of these concerns are addressed by changing our context managers to use |
I think the discussion about the formatter is not related to my implementation but to the already used -- in str and repr-- >>> a = np.array([-np.pi, np.pi], dtype=np.object_)
>>> np.array2string(a, precision=2)
'[-3.141592653589793 3.141592653589793]'
I still think the discussion is not over that direction because it is still more convenient to use a formatting spec than using |
I think this is ready to review. |
How would |
>>> import numpy as np
>>> np.__version__
'1.22.0.dev0+660.g8ad01d7ed'
>>> f"{np.float32(101.1)}"
'101.0999984741211' The "conversion" for scalar types (0-dim arrays) is done at |
I think it's related to #10645 as the float32 is converted to double precision for native python. Are scalars not covered in this PR? Update: no. But see #10645 (comment) for specific advice to implement. |
numpy/core/arrayprint.py
Outdated
@array_function_dispatch(_array_format_dispatcher, module='numpy') | ||
def array_format(a, format_spec): | ||
print("[DEBUG] array_format") | ||
|
||
return _array_format_implementation(a, format_spec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to expose a new public function here?
Won't builtins.format
suffice for public usage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree — "there should only be one way to do it", and leaving this name private will encourage folks to encounter this through the ecosystem of f'', .format(), and builtins.format(), rather than through a bespoke mechanism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, you have a very good point. I really didn't think about this and I tried to do it the same way as it's done with array2string
. Do you know if there is any other use for the array_function_dispatch
decorator apart from exposing the function with another name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why we have need array_function_dispatch
here in the first place. The relevant format
output can already be fully customized via custom implementations of __format__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the code, there are several examples in which they use it and I can't seem to understand them: array_str
and array_repr
instead of using plain str()
and repr()
. Is there any use for that? What I researched is that is related to NEP-18
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the code, there are several examples in which they use it and I can't seem to understand them:
array_str
andarray_repr
instead of using plainstr()
andrepr()
. Is there any use for that?
Considering all these functions are at least 17 years old, I suspect it is mostly due to "historical reasons". NEP 18 came quite a bit latter and by that time support was added to array_str
and array_repr
as they were already well established.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is still exposed in the namespace. Either remove it or start a process to expose it: add a proper docstring and post the justification to the mailing list.
Thanks for the reference @mwtoews. I will try to implement it the same way as I did with arrays: using >>> np.array2string(np.float32(101.1))
'101.1' The only question I have is if I should use the same function for scalar types and numpy arrays?, i.e., use the |
Speaking merely as a user to provide some data for the real NumPy folks here to work off of: I think I would expect scalar formatting to work like normal Python int and float formatting. I often, in fact, can't predict ahead of time when I'm going to wind up with a plain float and when I'm going to wind up with a NumPy scalar, so it would be a bit disruptive for them to behave differently when given something like On the other hand, folks with more NumPy experience might think differently? |
In [1]: import numpy as np
In [2]: f4 = np.float32(1)
In [3]: format(f4, ".2f")
Out[3]: '1.00' |
Scalar formatting is still broken, as it's not currently handled in this PR. Internally, the float32 gets cast to a double precision Python float. v = np.float32(101.1)
format(v, "") # 101.0999984741211
format(v, ".6f") # 101.099998
# these are good (via dragon4)
np.format_float_positional(v, 6) # 101.1
np.format_float_positional(v, 16) # 101.1 |
fb7ba29
to
5f3dadd
Compare
My new commits solve the issue partially: >>> np.__version__
'1.22.0.dev0+729.g5f3dadd94'
>>> v = np.float32(101.1)
>>> format(v, "") # 101.0999984741211
'101.1'
>>> format(v, ".6f") # 101.099998
'101.099998' The thing is that there is a lot of code that relies on scalar types behaving like Python built-ins. For example, the 'g' specifier is used a lot and would require substantial work to make it behave the same as Python do: >>> np.__version__
'1.19.2'
>>> a = 1.0
>>> f"{a:.3g}"
'1'
>>> a = np.float_(1.0)
>>> np.array2string(a, precision=3)
'1.' We would need to use the I think what Python does is fine because it reminds you that the number you are working with is not exact if you specify 'g': >>> np.__version__
'1.19.2'
>>> a = np.float32(101.1)
>>> f"{a:.5}"
'101.1'
>>> f"{a:.5g}"
'101.1'
>>> f"{a:.6}"
'101.1'
>>> f"{a:.6g}"
'101.1'
>>> f"{a:.6f}"
'101.099998' but it is fine to give some "syntactic sugar" when using bare >>> np.__version__
'1.22.0.dev0+729.g5f3dadd94'
>>> a = np.float32(101.1)
>>> str(a)
'101.1' that is what I only binded
|
a2bbce1
to
13181ea
Compare
Last comment from #5543 was:
Did something along those lines ever materialize? One thing that surprised me in the example is that the formatted string representation of arrays had no commas between elements. I'm assuming that the discussion of this sort of choices with default formatting was one of the reasons @charris suggested a NEP? |
I discussed with @seberg about maybe also trying to make the default behavior of |
Also to float it: There was an older proposal here: https://gist.github.com/gustavla/2783543be1204d2b5d368f6a1fb4d069 and a brief discussion about it: https://mail.python.org/archives/list/numpy-discussion@python.org/message/4RYDHI3Y7D5OYKJODZQ57FYVXF3LBCTQ/ |
@scratchmex before floating this to the list, this code uses just the normal I am asking, because I think the scalar formatting for Assuming this is the case (and I think this is), the next thing is indeed to just make a call on whether we want to allow Assuming the scalars are really broken, it would be nice to fix that up also. |
If the dim is 0 (NumPy scalar) then it converts the type to Python scalar and the calls |
@seberg ping |
Close/reopen |
What still needs to be done here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two small comments. I like the start here, although unfortunately, I think there is a bigger overhaul necessary for formatting (not just this path though!).
That is also related to my push to change the representation of scalars probably...
For now, there is the slight oddity that the scalar may behave differently from the array code for float16, float32, or longdouble. (and the complex codes I guess).
I am fine with the current oddities though and I don't think adding this here makes revising the machinery much harder.
numpy/core/arrayprint.py
Outdated
# TODO: implement code in `FloatingFormat` such that | ||
# the tests in test_multiarray.py:test_general_format pass | ||
# note: remember to add the tests to `test_arrayprint.py:test_type` | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we either implement this or just make it an error here? Ignoring the format code in this path seems not helpful?
Co-authored-by: Matti Picus <matti.picus@gmail.com>
): | ||
format(v, "+.2f") | ||
|
||
# TODO: this behaviour should not occur |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note: This is not behavior that was changed and the array code path rejects the format spec for non 0-D arrays (we drop through for 0-D).
(Also auto-reformatted some whitespace...)
6bfc36c
to
9fdb396
Compare
OK, I have now added a commit to simply reject |
One thing I noticed is that Python rejects the OTOH, the approach here simply ignore it for integers. That seems OK, or should this be discussed? |
I think it should be an error |
@scratchmex We have discussed your PR at today's triage meeting. Please proceed with adding an error. |
@InessaPawson @seberg done. Last ping to hope this can be merged with this. Thanks for the feedback |
How happy are others with merging something that may still be a bit in flux? I am probing things a bit more, and a few things I noticed which I had not noticed previously:
I guess the question is largely how close we want to stick to the Python formatting here. And if we want to stick closely to it, maybe we really need something like an hypothesis test to ensure we actually match up. (I would like to push the formatting further down, but that is not important for the PR as such.) |
The original use case did refer to float, so maybe restricting the scope to float formatting is OK. Other than that, I think we should just go ahead, call this "experimental" and wait for people to try it out and suggest improvements. |
Might fix the two long lines. |
I would much rather reduce scope (e.g., by issuing errors for some dtypes) than introduce behavior that we will almost certainly want to change later. Users have strong expectations of API stability for NumPy. If we want to merge this in an experimental state, we could keep it off by default and only enable it when users explicitly set an experimental flag or context manager. |
__all__ = ["array2string", "array_str", "array_repr", "array_format", | ||
"set_string_function", "set_printoptions", "get_printoptions", | ||
"printoptions", "format_float_positional", | ||
"format_float_scientific"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it necessary to expose array_format
here? You can still import it directly without exposing it in __all__
.
@@ -57,13 +59,18 @@ | ||
'infstr': 'inf', | ||
'sign': '-', | ||
'formatter': None, | ||
'exp_format': False, # force exp formatting as Python do when ".2e" | ||
'trim': '.', # Controls post-processing trimming of trailing digits | ||
# see Dragon4 arguments at `format_float_scientific` below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is trim
necessary? What was wrong with the previous way it was set?
@@ -198,6 +212,11 @@ def set_printoptions(precision=None, threshold=None, edgeitems=None, | ||
but if every element in the array can be uniquely | ||
represented with an equal number of fewer digits, use that | ||
many digits for all elements. | ||
exp_format : bool, optional | ||
Prints in scientific notation (1.1e+01). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prints in scientific notation (1.1e+01). | |
-- versionadded:: 1.24.0 | |
Prints in scientific notation (1.1e+01). |
exp_format : bool, optional | ||
Prints in scientific notation (1.1e+01). | ||
trim : str, optional | ||
Controls post-processing trimming of trailing digits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Controls post-processing trimming of trailing digits. | |
-- versionadded:: 1.24.0 | |
Controls post-processing trimming of trailing digits. |
@@ -198,6 +212,11 @@ def set_printoptions(precision=None, threshold=None, edgeitems=None, | ||
but if every element in the array can be uniquely | ||
represented with an equal number of fewer digits, use that | ||
many digits for all elements. | ||
exp_format : bool, optional | ||
Prints in scientific notation (1.1e+01). | ||
trim : str, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trim : str, optional | |
trim : one of 'k.0-', optional |
Prints in scientific notation (1.1e+01). | ||
trim : str, optional | ||
Controls post-processing trimming of trailing digits. | ||
See Dragon4 arguments at ``format_float_scientific``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See Dragon4 arguments at ``format_float_scientific``. | |
See ``trim`` argument to `numpy.format_float_scientific`. |
Ping @scratchmex |
The reviewed things is the only thing missing? I will complete them next week |
This is an initial proof of concept for #5543. Suggestions are welcome.
Basically what we want is: