Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

ENH: Use Dragon4 algorithm to print floating values #9941

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Nov 5, 2017

Conversation

ahaldane
Copy link
Member

@ahaldane ahaldane commented Oct 29, 2017

This is a C port of Ryan Juckett's C++ implementation (link) of the Dragon4 algorithm for printing floating-point values to numpy. See earlier discussion in #9919 and #9932, which are both replaced by this PR. Please check the license at the top of the dragon4.c file to confirm it is suitable.

This is a WIP and I am still deciding some of the behavior, but it is in "working" state. I am putting it up now to run tests, and if anyone wants to try it out. Here's a quick summary:

  • Adds a python interface to the Dragon4 algorithms in np.core.multiarray.dragon4_positional and np.core.multiarray.dragon4_scientific. Documentation of arguments is in the C code, but you can call it with a single python floating scalar argument to try it out.
  • Numpy scalars use the dragon4 algorithm in "unique" mode (see below) for str/repr, in a way that tries to match python float output. Note: one possible bugfix is that the scientific mode cutoff for all floats is now 1e16, instead of 1e10 for float32 and 1e17 for float64 like before. CPython's cutoff is 1e16. This also fixes complex reprs like in ENH: fix 0d array printing using str or formatter. #9332.
  • updates arrayprint.py code to use dragon4. I also added a new option, that you can set np.printoptions(precision='unique'), which will print the minimum number of digits to uniquely specify the value. I left the default at precision=8 for now.
  • In arrayprint.py I've also stolen/reworked the FormatFloat generalizations from WIP: MAINT: Rewrite LongFloatFormat, trim zeros in scientific notation output #9919. Now all floating types just use a single FloatingFormat formatter, which calls the dragon4 routines.

Some examples that you can compare to the old behavior:

print(complex(0,np.inf), np.complex128(complex(0,np.inf)), str(np.array([np.complex128(complex(0,np.inf))])))
np.set_printoptions(precision='unique')
print(np.arange(10, dtype='f4')/10.)
a = np.random.rand(4)
print(repr(a))
print(repr(a.astype('f4')))
print(repr(a.astype('f2')))
print(repr(10.**np.arange(20)))

I still need to do some cleanups and tests. Also, there may be a bug in the original implementation because np.float64(1e23) isn't nicely rounded, unlike the other powers of 10.

@ahaldane
Copy link
Member Author

Output of the code in my last comment, with this PR:

(infj, infj, '[ 0.+infj]')
[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
array([0.41337804458542926, 0.9634011951702381 , 0.38924457984371086,
       0.6889281918519422 ])
array([0.41337803, 0.9634012 , 0.3892446 , 0.6889282 ], dtype=float32)
array([0.4133, 0.9634, 0.3892, 0.689 ], dtype=float16)
array([1.e+00, 1.e+01, 1.e+02, 1.e+03, 1.e+04, 1.e+05, 1.e+06, 1.e+07,
       1.e+08, 1.e+09, 1.e+10, 1.e+11, 1.e+12, 1.e+13, 1.e+14, 1.e+15,
       1.e+16, 1.e+17, 1.e+18, 1.e+19])

Output with master:

(infj, inf*j, '[ 0.+infj]')
[ 0.          0.1         0.2         0.30000001  0.40000001  0.5
  0.60000002  0.69999999  0.80000001  0.89999998]
array([ 0.5864939 ,  0.15714977,  0.26403041,  0.7634838 ])
array([ 0.58649391,  0.15714976,  0.2640304 ,  0.76348382], dtype=float32)
array([ 0.58642578,  0.15710449,  0.26391602,  0.76367188], dtype=float16)
array([  1.00000000e+00,   1.00000000e+01,   1.00000000e+02,
         1.00000000e+03,   1.00000000e+04,   1.00000000e+05,
         1.00000000e+06,   1.00000000e+07,   1.00000000e+08,
         1.00000000e+09,   1.00000000e+10,   1.00000000e+11,
         1.00000000e+12,   1.00000000e+13,   1.00000000e+14,
         1.00000000e+15,   1.00000000e+16,   1.00000000e+17,
         1.00000000e+18,   1.00000000e+19])

@ahaldane ahaldane force-pushed the dragon4 branch 2 times, most recently from dd3b272 to a96bc04 Compare October 30, 2017 01:17
@eric-wieser
Copy link
Member

Out of curiosity - what keeps us from directly using the C++ version?

@eric-wieser
Copy link
Member

eric-wieser commented Oct 30, 2017

The spaces in

array([0.41337803, 0.9634012 , 0.3892446 , 0.6889282 ], dtype=float32)

look pretty bad to me, but I understand the rationale. I'd perhaps argue for zero padding to make them all the same length. Otherwise it causes confusion for users comparing the last digits.

What does this give?

a = 10.**np.arange(20)
a[0] = 123456

@ahaldane
Copy link
Member Author

Re: the spacing in the float32 array: That is behavior is still to be decided, now that we have more choices due to the existence of a dragon4 interface. Note that example only gets output for precision='unique', but we don't need to make that the default. (In fact, it is not the default in this PR... note I manually set it in my example).

Also, here is what this PR currently outputs by default for your code:

[1]: a = 10.**np.arange(20)
[2]: a[0] = 123456
[3]: a
array([1.23456e+05, 1.00000e+01, 1.00000e+02, 1.00000e+03, 1.00000e+04,
       1.00000e+05, 1.00000e+06, 1.00000e+07, 1.00000e+08, 1.00000e+09,
       1.00000e+10, 1.00000e+11, 1.00000e+12, 1.00000e+13, 1.00000e+14,
       1.00000e+15, 1.00000e+16, 1.00000e+17, 1.00000e+18, 1.00000e+19])

@ahaldane
Copy link
Member Author

ahaldane commented Oct 30, 2017

Also, another comment on the spacing in unique mode:

We already get funny spacing like that in current numpy master:

[31]: array([0.32254803, 0.8584671, 0.07836795, 0.2622], dtype=float32)
array([ 0.32254803,  0.8584671 ,  0.07836795,  0.2622    ], dtype=float32)

(in fact you can also see this in the "master" output I pasted above)

@ahaldane
Copy link
Member Author

ahaldane commented Oct 30, 2017

As for leaving it in C++, I honestly didn't consider that carefully since I assumed it would be harder/more confusing to try to interface to it than to just port to C. Also, while I didn't modify the dragon4 code itself much, I did pretty heavily rewrite the "wrapper" part of the code that converts dragon4 output to a string.

@ahaldane
Copy link
Member Author

ahaldane commented Oct 30, 2017

More thoughts on spacing: I don't think padding with 0s is correct, since the "unnecessary" digits which were dropped are not necessarily 0.

To avoid the trailing spaces, probably we could do as follows: In the first pass in FloatingFormat, figure out the maximum fractional length of the data, when output with all the trimming. Then, set the precision to that value in the second pass, and don't do any trimming in the second pass (trim-mode 'k'), thus keeping the extra digits in the values that were a bit shorter in the first pass.

@ahaldane ahaldane force-pushed the dragon4 branch 3 times, most recently from ef7cf07 to b133177 Compare October 30, 2017 04:10
@mdickinson
Copy link
Contributor

Presumably this would fix #9360? (And also close #2643 and #6136.)

@charris
Copy link
Member

charris commented Oct 30, 2017

The test failures are due to ieee extended precision on 32 linux being aligned on 4 byte boundaries, hence float96 instead of float128.

@ahaldane
Copy link
Member Author

Yes, I thought I accounted for that with the #ifdef NPY_FLOAT96, but apparently that macro-constant isn't defined...

@ahaldane
Copy link
Member Author

ahaldane commented Oct 30, 2017

Also I figured out the 1e23 strangeness: It is because the implementation currently doesn't use IEEE unbiased rounding. It should be easy to modify it to do so.

Also, in principle it is possible to change the rounding mode, see GNU libc's discussion (link). However, I think CPython has forced/assumed the rounding mode to be IEEE unbiased, so I think it is safe to assume the same for numpy.


typedef enum TrimMode
{
TrimMode_None, // don't trim zeros, always leave a decimal point
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should convert these to C style comments.

@charris
Copy link
Member

charris commented Oct 30, 2017

Yes, I thought I accounted for that with the #ifdef NPY_FLOAT96, but apparently that macro-constant isn't defined...

Missing include statement. Need #include <numpy/npy_common.h>, maybe something in setup also.

@mdickinson
Copy link
Contributor

CPython has forced/assumed the rounding mode to be IEEE unbiased

Assumed, rather than forced. The dtoa.c code is a bit of a mix: some parts (mainly the integer arithmetic parts) explicitly construct the round-ties-to-even results, without help from the FPU. Others (the fast paths that use floating-point), assume that the FPU control word remains at its default of round-ties-to-even. If the FPU rounding mode were set to something else, it's probable that Python's float repr would give wrong results in some corner cases, and conceivable that it would do something even more horrible, like entering an infinite loop. On the one hand, that's probably bad, and CPython should (but doesn't) make an effort at least to check that the FPU control word is doing the right thing. On the other hand, it doesn't seem to have caused any error reports to date, and the code has been in use for a good number of years now.

So yes, while in principle assuming that the x87/SSE2 rounding mode settings will be the right ones makes me a bit nervous, in practice it appears to be a fairly safe assumption.

Also, the dtoa.c code is written in such a way that infinite loops could happen (at one point I started refactoring to eliminate the potential for infinite loops, but that project never got completed); I'd hope that the Dragon4 code is a bit better behaved in that respect.

[Full disclosure: I was responsible, with Eric Smith, for implementing the CPython shortest repr implementation, though of course the original core dtoa.c code comes from David Gay.]

@ahaldane
Copy link
Member Author

@mdickinson thanks for the links to the issues.

Assuming IEEE unbiased rounding in this PR is "safe" in the sense that it won't crash. I think the worst case scenario is that we spit out floating point strings that don't covert back to their original values, eg in the case of 1e23. (By the way, I am discussing the rounding mode issues with Ryan Juckett over on his website, linked above).

I don't really want to go through the effort of accounting for all the rounding modes here. But at least maybe I could add a check to fegetround during numpy startup that gives a serious warning if the rounding mode isn't unbiased.

@ahaldane ahaldane force-pushed the dragon4 branch 2 times, most recently from 0901003 to eae2a80 Compare October 31, 2017 04:12
@ahaldane
Copy link
Member Author

Updated

  • Implemented IEEE unbiased rounding
  • fixed a possible bug in the Dragon4 implementation
  • did a pass at reformatting everything to match numpy style (including C comments)

When I come back to this I'm going to work on an arrayprint.py interface and on choosing the right defaults. I'm going to try adding afloatmode printing option, which controls controls the interpretation of the precision printing option. The doc would be:

    floatmode : str, optional
        Controls the interpretation of the `precision` option for
        floating-point types. Can take the following values:
            - 'fixed' : Always print exactly `precision` fractional digits,
                    even if this would print more or fewer digits than
                    necessary to specify the value uniquely.
            - 'unique : Print the minimum number of fractional digits necessary
                    to represent each value uniquely. Different elements may
                    have a different number of digits.  The value of the
                    `precision` option is ignored.
            - 'maxprec' : Print at most `precision` fractional digits, but if
                    an element can be uniquely represented with fewer digits
                    only print it with that many.
            - 'maxprec_equal' : Print at most `precision` fractional digits,
                    but if every element in the array can be uniquely
                    represented with an equal number of fewer digits (or less),
                    use that many digits for all elements.
        Default is 'maxprec'

@charris
Copy link
Member

charris commented Nov 5, 2017

I think you need a separate closes for each issue.

@charris
Copy link
Member

charris commented Nov 5, 2017

The rounding still worries me and might be a problem. I suppose we need to rely on the tests to check if that is a problem. Also, out of curiosity, what were the python (hence compiler) versions that had trouble with 1e23?

@@ -463,6 +483,23 @@ def array2string(a, max_line_width=None, precision=None,
(whitespace character) in the sign position of positive values. If
'-', omit the sign character of positive values. If 'legacy', print a
space for positive values except in 0d arrays.
floatmode : str, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we don't want float_mode for consistency with suppress_small?

return format_longfloat(x, self.precision)


class LongComplexFormat(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need to keep this for compatibility, as a subclass that does nothing but throw a warning in it's constructor.

@@ -597,8 +634,8 @@ def _formatArray(a, format_function, rank, max_line_len,
summary_insert).rstrip()+']\n'
return s

class FloatFormat(object):
def __init__(self, data, precision, suppress_small, sign=False):
class FloatingFormat(object):
Copy link
Member

@eric-wieser eric-wieser Nov 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about leaving a public FloatFormat object visible, assuming we're considering this public API.

Docstring saying For np.floating objects or something would be nice too

# XXX Have to add something to determine the width to use a la FloatFormat
# Right now, things won't line up properly
def __init__(self, precision, sign=False):
class ComplexFormat(object):
Copy link
Member

@eric-wieser eric-wieser Nov 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe ComplexFloatingFormat to match np.complexfloating? (#9962)

Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really pleased to see all the arrayprint formatters merge together, but a couple nits about names and compatibility in arrayprint, which we should probably look at before 1.14

*/

#define PREC @NAME@PREC_@KIND@
static PyObject *
@name@type_@kind@_either(npy_@name@ val, TrimMode trim_pos, TrimMode trim_sci,
Copy link
Member

@eric-wieser eric-wieser Nov 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do with a comment explaining what this function is for - "either" isn't super clear

static int
c@name@type_print(PyObject *v, FILE *fp, int flags)
static PyObject *
halftype_@kind@(PyObject *self)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this special compared to the other ones? Why?

elif issubclass(dtypeobj, _nt.complexfloating):
if issubclass(dtypeobj, _nt.clongfloat):
return formatdict['longcomplexfloat']()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that users setting a longcomplexfloat formatter will now see it being ignored silently

eric-wieser added a commit to eric-wieser/numpy that referenced this pull request Nov 5, 2017
@eric-wieser
Copy link
Member

Deleted my comments about whitespace, and just added a STY commit to #9962, to reduce noise here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.