Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Commit 7e3d558

Browse filesBrowse files
sebergcharris
authored andcommitted
BUG: Fix rounding of denormals in double and float to half casts … (#12722)
* BUG: Fix rounding of denormals in double and float to half casts Previously the significand was shifted right to align denormals of different magnitude. This loses some bits that can make a difference for rounding. This is fixed: 1. For floats, by inspecting the original last bits when this may make a difference (should happen rarely) 2. For doubles by shifting the bits left to align the denromals and thus not lose the lowest orginal bits. * TST: Test half denormal rounding The test assumes that half to even is active, if this is ever changed or allowed to change, the test will fail. * Fixup: Fixup for halffloat.c The one code path cannot be used. The other must have been a typo, but is a valid bug, a new test for it in the next commit. * TST: half casting lower bits are not lost for denormal results The first test only tested the off by one, this one specifically tests that all bits are used to decide if "round to nearest even" should be used, in the example of rounding towards 0. * TST: Test not just denormals but all positive finite float16s * DOC: Add lots of comments and add a short release note.
1 parent a5cace4 commit 7e3d558
Copy full SHA for 7e3d558

File tree

3 files changed

+110
-7
lines changed
Filter options

3 files changed

+110
-7
lines changed

‎doc/release/1.17.0-notes.rst

Copy file name to clipboardExpand all lines: doc/release/1.17.0-notes.rst
+7Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,13 @@ Expired deprecations
3030
Compatibility notes
3131
===================
3232

33+
float16 subnormal rounding
34+
--------------------------
35+
Casting from a different floating point precision to float16 used incorrect
36+
rounding in some edge cases. This means in rare cases, subnormal results will
37+
now be rounded up instead of down, changing the last bit (ULP) of the result.
38+
39+
3340

3441
C API changes
3542
=============

‎numpy/core/src/npymath/halffloat.c

Copy file name to clipboardExpand all lines: numpy/core/src/npymath/halffloat.c
+24-7Lines changed: 24 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -301,15 +301,23 @@ npy_uint16 npy_floatbits_to_halfbits(npy_uint32 f)
301301
npy_set_floatstatus_underflow();
302302
}
303303
#endif
304+
/*
305+
* Usually the significand is shifted by 13. For subnormals an
306+
* additional shift needs to occur. This shift is one for the largest
307+
* exponent giving a subnormal `f_exp = 0x38000000 >> 23 = 112`, which
308+
* offsets the new first bit. At most the shift can be 1+10 bits.
309+
*/
304310
f_sig >>= (113 - f_exp);
305311
/* Handle rounding by adding 1 to the bit beyond half precision */
306312
#if NPY_HALF_ROUND_TIES_TO_EVEN
307313
/*
308314
* If the last bit in the half significand is 0 (already even), and
309315
* the remaining bit pattern is 1000...0, then we do not add one
310-
* to the bit after the half significand. In all other cases, we do.
316+
* to the bit after the half significand. However, the (113 - f_exp)
317+
* shift can lose up to 11 bits, so the || checks them in the original.
318+
* In all other cases, we can just add one.
311319
*/
312-
if ((f_sig&0x00003fffu) != 0x00001000u) {
320+
if (((f_sig&0x00003fffu) != 0x00001000u) || (f&0x000007ffu)) {
313321
f_sig += 0x00001000u;
314322
}
315323
#else
@@ -416,21 +424,30 @@ npy_uint16 npy_doublebits_to_halfbits(npy_uint64 d)
416424
npy_set_floatstatus_underflow();
417425
}
418426
#endif
419-
d_sig >>= (1009 - d_exp);
427+
/*
428+
* Unlike floats, doubles have enough room to shift left to align
429+
* the subnormal significand leading to no loss of the last bits.
430+
* The smallest possible exponent giving a subnormal is:
431+
* `d_exp = 0x3e60000000000000 >> 52 = 998`. All larger subnormals are
432+
* shifted with respect to it. This adds a shift of 10+1 bits the final
433+
* right shift when comparing it to the one in the normal branch.
434+
*/
435+
assert(d_exp - 998 >= 0);
436+
d_sig <<= (d_exp - 998);
420437
/* Handle rounding by adding 1 to the bit beyond half precision */
421438
#if NPY_HALF_ROUND_TIES_TO_EVEN
422439
/*
423440
* If the last bit in the half significand is 0 (already even), and
424441
* the remaining bit pattern is 1000...0, then we do not add one
425442
* to the bit after the half significand. In all other cases, we do.
426443
*/
427-
if ((d_sig&0x000007ffffffffffULL) != 0x0000020000000000ULL) {
428-
d_sig += 0x0000020000000000ULL;
444+
if ((d_sig&0x003fffffffffffffULL) != 0x0010000000000000ULL) {
445+
d_sig += 0x0010000000000000ULL;
429446
}
430447
#else
431-
d_sig += 0x0000020000000000ULL;
448+
d_sig += 0x0010000000000000ULL;
432449
#endif
433-
h_sig = (npy_uint16) (d_sig >> 42);
450+
h_sig = (npy_uint16) (d_sig >> 53);
434451
/*
435452
* If the rounding causes a bit to spill into h_exp, it will
436453
* increment h_exp from zero to one and h_sig will be zero.

‎numpy/core/tests/test_half.py

Copy file name to clipboardExpand all lines: numpy/core/tests/test_half.py
+79Lines changed: 79 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -69,6 +69,85 @@ def test_half_conversions(self):
6969
j = np.array(i_f16, dtype=int)
7070
assert_equal(i_int, j)
7171

72+
@pytest.mark.parametrize("offset", [None, "up", "down"])
73+
@pytest.mark.parametrize("shift", [None, "up", "down"])
74+
@pytest.mark.parametrize("float_t", [np.float32, np.float64])
75+
def test_half_conversion_rounding(self, float_t, shift, offset):
76+
# Assumes that round to even is used during casting.
77+
max_pattern = np.float16(np.finfo(np.float16).max).view(np.uint16)
78+
79+
# Test all (positive) finite numbers, denormals are most interesting
80+
# however:
81+
f16s_patterns = np.arange(0, max_pattern+1, dtype=np.uint16)
82+
f16s_float = f16s_patterns.view(np.float16).astype(float_t)
83+
84+
# Shift the values by half a bit up or a down (or do not shift),
85+
if shift == "up":
86+
f16s_float = 0.5 * (f16s_float[:-1] + f16s_float[1:])[1:]
87+
elif shift == "down":
88+
f16s_float = 0.5 * (f16s_float[:-1] + f16s_float[1:])[:-1]
89+
else:
90+
f16s_float = f16s_float[1:-1]
91+
92+
# Increase the float by a minimal value:
93+
if offset == "up":
94+
f16s_float = np.nextafter(f16s_float, float_t(1e50))
95+
elif offset == "down":
96+
f16s_float = np.nextafter(f16s_float, float_t(-1e50))
97+
98+
# Convert back to float16 and its bit pattern:
99+
res_patterns = f16s_float.astype(np.float16).view(np.uint16)
100+
101+
# The above calculations tries the original values, or the exact
102+
# mid points between the float16 values. It then further offsets them
103+
# by as little as possible. If no offset occurs, "round to even"
104+
# logic will be necessary, an arbitrarily small offset should cause
105+
# normal up/down rounding always.
106+
107+
# Calculate the expecte pattern:
108+
cmp_patterns = f16s_patterns[1:-1].copy()
109+
110+
if shift == "down" and offset != "up":
111+
shift_pattern = -1
112+
elif shift == "up" and offset != "down":
113+
shift_pattern = 1
114+
else:
115+
# There cannot be a shift, either shift is None, so all rounding
116+
# will go back to original, or shift is reduced by offset too much.
117+
shift_pattern = 0
118+
119+
# If rounding occurs, is it normal rounding or round to even?
120+
if offset is None:
121+
# Round to even occurs, modify only non-even, cast to allow + (-1)
122+
cmp_patterns[0::2].view(np.int16)[...] += shift_pattern
123+
else:
124+
cmp_patterns.view(np.int16)[...] += shift_pattern
125+
126+
assert_equal(res_patterns, cmp_patterns)
127+
128+
@pytest.mark.parametrize(["float_t", "uint_t", "bits"],
129+
[(np.float32, np.uint32, 23),
130+
(np.float64, np.uint64, 52)])
131+
def test_half_conversion_denormal_round_even(self, float_t, uint_t, bits):
132+
# Test specifically that all bits are considered when deciding
133+
# whether round to even should occur (i.e. no bits are lost at the
134+
# end. Compare also gh-12721. The most bits can get lost for the
135+
# smallest denormal:
136+
smallest_value = np.uint16(1).view(np.float16).astype(float_t)
137+
assert smallest_value == 2**-24
138+
139+
# Will be rounded to zero based on round to even rule:
140+
rounded_to_zero = smallest_value / float_t(2)
141+
assert rounded_to_zero.astype(np.float16) == 0
142+
143+
# The significand will be all 0 for the float_t, test that we do not
144+
# lose the lower ones of these:
145+
for i in range(bits):
146+
# slightly increasing the value should make it round up:
147+
larger_pattern = rounded_to_zero.view(uint_t) | uint_t(1 << i)
148+
larger_value = larger_pattern.view(float_t)
149+
assert larger_value.astype(np.float16) == smallest_value
150+
72151
def test_nans_infs(self):
73152
with np.errstate(all='ignore'):
74153
# Check some of the ufuncs

0 commit comments

Comments
0 (0)
Morty Proxy This is a proxified and sanitized view of the page, visit original site.