Closed
Description
The smallest positive subnormal (denormal) value that a np.float16 can hold is 2**-24. The value 2**-25 is exactly halfway between 0 and 2**-24, and it rounds down to 0 when converted to a np.float16 because ties round to the value with an even least-significant bit. However, any value even slightly above 2**-25 is closer to 2**-24 than to 0 and should round up to 2**-24. The following interpreter session shows such a value, 2**-25 + 2**-38, that the float64->float16 conversion handles correctly but the float32->float16 conversion does not (see the last line below).
$ python3
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.__version__
'1.14.3'
>>> '%.40f' % np.array([2**-24]).astype(np.float64)[0]
'0.0000000596046447753906250000000000000000'
>>> '%.40f' % np.array([2**-24]).astype(np.float32)[0]
'0.0000000596046447753906250000000000000000'
>>> '%.40f' % np.array([2**-24]).astype(np.float16)[0]
'0.0000000596046447753906250000000000000000'
>>> '%.40f' % np.array([2**-25]).astype(np.float64)[0]
'0.0000000298023223876953125000000000000000'
>>> '%.40f' % np.array([2**-25]).astype(np.float32)[0]
'0.0000000298023223876953125000000000000000'
>>> '%.40f' % np.array([2**-25]).astype(np.float16)[0] # OK: exact tie rounds to even, giving 0
'0.0000000000000000000000000000000000000000'
>>> '%.40f' % np.array([2**-25 + 2**-38]).astype(np.float64)[0]
'0.0000000298059603665024042129516601562500'
>>> '%.40f' % np.array([2**-25 + 2**-38]).astype(np.float32)[0]
'0.0000000298059603665024042129516601562500'
>>> '%.40f' % np.array([2**-25 + 2**-38]).astype(np.float64).astype(np.float16)[0] # OK: 2**-24
'0.0000000596046447753906250000000000000000'
>>> '%.40f' % np.array([2**-25 + 2**-38]).astype(np.float32).astype(np.float16)[0] # not OK: should be 2**-24
'0.0000000000000000000000000000000000000000'
>>>
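For reference, the expected results can be checked without relying on any float16 conversion routine. The following is a minimal sketch using exact rational arithmetic from the standard library. The helper name `round_to_half_subnormal` is hypothetical (it is not part of NumPy), and it only covers non-negative inputs below 2**-14, the float16 subnormal range.

```python
from fractions import Fraction

# Spacing between adjacent float16 subnormals (hypothetical constant name).
HALF_SUBNORMAL_STEP = Fraction(1, 2**24)


def round_to_half_subnormal(x):
    """Round x (0 <= x < 2**-14) to the nearest multiple of 2**-24, ties to even.

    Hypothetical reference helper, not NumPy's conversion code.
    """
    exact = Fraction(x)  # exact rational value of the Python float
    if not 0 <= exact < Fraction(1, 2**14):
        raise ValueError("x is outside the float16 subnormal range")
    # round() on a Fraction rounds half to even, matching the IEEE 754 default.
    steps = round(exact / HALF_SUBNORMAL_STEP)
    return float(steps * HALF_SUBNORMAL_STEP)


tie = 2.0**-25                   # exactly halfway between 0 and 2**-24
above_tie = 2.0**-25 + 2.0**-38  # slightly above the halfway point

print(round_to_half_subnormal(tie) == 0.0)
print(round_to_half_subnormal(above_tie) == 2.0**-24)
```

Both comparisons should print True. Because 2**-25 + 2**-38 is exactly representable in float32 (as the session above shows), the result for that value should be the same whether the source type is float32 or float64.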