Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Commit dc787d2

Browse filesBrowse files
committed
Issue #8188: Introduce a new scheme for computing hashes of numbers
(instances of int, float, complex, decimal.Decimal and fractions.Fraction) that makes it easy to maintain the invariant that hash(x) == hash(y) whenever x and y have equal value.
1 parent 0372113 commit dc787d2
Copy full SHA for dc787d2

File tree

Expand file treeCollapse file tree

14 files changed

+569
-140
lines changed
Filter options
Expand file treeCollapse file tree

14 files changed

+569
-140
lines changed

‎Doc/library/stdtypes.rst

Copy file name to clipboardExpand all lines: Doc/library/stdtypes.rst
+103Lines changed: 103 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -595,6 +595,109 @@ hexadecimal string representing the same number::
595595
'0x1.d380000000000p+11'
596596

597597

598+
.. _numeric-hash:
599+
600+
Hashing of numeric types
601+
------------------------
602+
603+
For numbers ``x`` and ``y``, possibly of different types, it's a requirement
604+
that ``hash(x) == hash(y)`` whenever ``x == y`` (see the :meth:`__hash__`
605+
method documentation for more details). For ease of implementation and
606+
efficiency across a variety of numeric types (including :class:`int`,
607+
:class:`float`, :class:`decimal.Decimal` and :class:`fractions.Fraction`)
608+
Python's hash for numeric types is based on a single mathematical function
609+
that's defined for any rational number, and hence applies to all instances of
610+
:class:`int` and :class:`fraction.Fraction`, and all finite instances of
611+
:class:`float` and :class:`decimal.Decimal`. Essentially, this function is
612+
given by reduction modulo ``P`` for a fixed prime ``P``. The value of ``P`` is
613+
made available to Python as the :attr:`modulus` attribute of
614+
:data:`sys.hash_info`.
615+
616+
.. impl-detail::
617+
618+
Currently, the prime used is ``P = 2**31 - 1`` on machines with 32-bit C
619+
longs and ``P = 2**61 - 1`` on machines with 64-bit C longs.
620+
621+
Here are the rules in detail:
622+
623+
- If ``x = m / n`` is a nonnegative rational number and ``n`` is not divisible
624+
by ``P``, define ``hash(x)`` as ``m * invmod(n, P) % P``, where ``invmod(n,
625+
P)`` gives the inverse of ``n`` modulo ``P``.
626+
627+
- If ``x = m / n`` is a nonnegative rational number and ``n`` is
628+
divisible by ``P`` (but ``m`` is not) then ``n`` has no inverse
629+
modulo ``P`` and the rule above doesn't apply; in this case define
630+
``hash(x)`` to be the constant value ``sys.hash_info.inf``.
631+
632+
- If ``x = m / n`` is a negative rational number define ``hash(x)``
633+
as ``-hash(-x)``. If the resulting hash is ``-1``, replace it with
634+
``-2``.
635+
636+
- The particular values ``sys.hash_info.inf``, ``-sys.hash_info.inf``
637+
and ``sys.hash_info.nan`` are used as hash values for positive
638+
infinity, negative infinity, or nans (respectively). (All hashable
639+
nans have the same hash value.)
640+
641+
- For a :class:`complex` number ``z``, the hash values of the real
642+
and imaginary parts are combined by computing ``hash(z.real) +
643+
sys.hash_info.imag * hash(z.imag)``, reduced modulo
644+
``2**sys.hash_info.width`` so that it lies in
645+
``range(-2**(sys.hash_info.width - 1), 2**(sys.hash_info.width -
646+
1))``. Again, if the result is ``-1``, it's replaced with ``-2``.
647+
648+
649+
To clarify the above rules, here's some example Python code,
650+
equivalent to the builtin hash, for computing the hash of a rational
651+
number, :class:`float`, or :class:`complex`::
652+
653+
654+
import sys, math
655+
656+
def hash_fraction(m, n):
657+
"""Compute the hash of a rational number m / n.
658+
659+
Assumes m and n are integers, with n positive.
660+
Equivalent to hash(fractions.Fraction(m, n)).
661+
662+
"""
663+
P = sys.hash_info.modulus
664+
# Remove common factors of P. (Unnecessary if m and n already coprime.)
665+
while m % P == n % P == 0:
666+
m, n = m // P, n // P
667+
668+
if n % P == 0:
669+
hash_ = sys.hash_info.inf
670+
else:
671+
# Fermat's Little Theorem: pow(n, P-1, P) is 1, so
672+
# pow(n, P-2, P) gives the inverse of n modulo P.
673+
hash_ = (abs(m) % P) * pow(n, P - 2, P) % P
674+
if m < 0:
675+
hash_ = -hash_
676+
if hash_ == -1:
677+
hash_ = -2
678+
return hash_
679+
680+
def hash_float(x):
681+
"""Compute the hash of a float x."""
682+
683+
if math.isnan(x):
684+
return sys.hash_info.nan
685+
elif math.isinf(x):
686+
return sys.hash_info.inf if x > 0 else -sys.hash_info.inf
687+
else:
688+
return hash_fraction(*x.as_integer_ratio())
689+
690+
def hash_complex(z):
691+
"""Compute the hash of a complex number z."""
692+
693+
hash_ = hash_float(z.real) + sys.hash_info.imag * hash_float(z.imag)
694+
# do a signed reduction modulo 2**sys.hash_info.width
695+
M = 2**(sys.hash_info.width - 1)
696+
hash_ = (hash_ & (M - 1)) - (hash & M)
697+
if hash_ == -1:
698+
hash_ == -2
699+
return hash_
700+
598701
.. _typeiter:
599702

600703
Iterator Types

‎Doc/library/sys.rst

Copy file name to clipboardExpand all lines: Doc/library/sys.rst
+24Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -446,6 +446,30 @@ always available.
446446
Changed to a named tuple and added *service_pack_minor*,
447447
*service_pack_major*, *suite_mask*, and *product_type*.
448448

449+
450+
.. data:: hash_info
451+
452+
A structseq giving parameters of the numeric hash implementation. For
453+
more details about hashing of numeric types, see :ref:`numeric-hash`.
454+
455+
+---------------------+--------------------------------------------------+
456+
| attribute | explanation |
457+
+=====================+==================================================+
458+
| :const:`width` | width in bits used for hash values |
459+
+---------------------+--------------------------------------------------+
460+
| :const:`modulus` | prime modulus P used for numeric hash scheme |
461+
+---------------------+--------------------------------------------------+
462+
| :const:`inf` | hash value returned for a positive infinity |
463+
+---------------------+--------------------------------------------------+
464+
| :const:`nan` | hash value returned for a nan |
465+
+---------------------+--------------------------------------------------+
466+
| :const:`imag` | multiplier used for the imaginary part of a |
467+
| | complex number |
468+
+---------------------+--------------------------------------------------+
469+
470+
.. versionadded:: 3.2
471+
472+
449473
.. data:: hexversion
450474

451475
The version number encoded as a single integer. This is guaranteed to increase

‎Include/pyport.h

Copy file name to clipboardExpand all lines: Include/pyport.h
+14Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -126,6 +126,20 @@ Used in: PY_LONG_LONG
126126
#endif
127127
#endif
128128

129+
/* Parameters used for the numeric hash implementation. See notes for
130+
_PyHash_Double in Objects/object.c. Numeric hashes are based on
131+
reduction modulo the prime 2**_PyHASH_BITS - 1. */
132+
133+
#if SIZEOF_LONG >= 8
134+
#define _PyHASH_BITS 61
135+
#else
136+
#define _PyHASH_BITS 31
137+
#endif
138+
#define _PyHASH_MODULUS ((1UL << _PyHASH_BITS) - 1)
139+
#define _PyHASH_INF 314159
140+
#define _PyHASH_NAN 0
141+
#define _PyHASH_IMAG 1000003UL
142+
129143
/* uintptr_t is the C9X name for an unsigned integral type such that a
130144
* legitimate void* can be cast to uintptr_t and then back to void* again
131145
* without loss of information. Similarly for intptr_t, wrt a signed

‎Lib/decimal.py

Copy file name to clipboardExpand all lines: Lib/decimal.py
+32-48Lines changed: 32 additions & 48 deletions
Original file line numberDiff line numberDiff line change
@@ -862,15 +862,15 @@ def _cmp(self, other):
862862
# that specified by IEEE 754.
863863

864864
def __eq__(self, other, context=None):
865-
other = _convert_other(other, allow_float=True)
865+
other = _convert_other(other, allow_float = True)
866866
if other is NotImplemented:
867867
return other
868868
if self._check_nans(other, context):
869869
return False
870870
return self._cmp(other) == 0
871871

872872
def __ne__(self, other, context=None):
873-
other = _convert_other(other, allow_float=True)
873+
other = _convert_other(other, allow_float = True)
874874
if other is NotImplemented:
875875
return other
876876
if self._check_nans(other, context):
@@ -879,7 +879,7 @@ def __ne__(self, other, context=None):
879879

880880

881881
def __lt__(self, other, context=None):
882-
other = _convert_other(other, allow_float=True)
882+
other = _convert_other(other, allow_float = True)
883883
if other is NotImplemented:
884884
return other
885885
ans = self._compare_check_nans(other, context)
@@ -888,7 +888,7 @@ def __lt__(self, other, context=None):
888888
return self._cmp(other) < 0
889889

890890
def __le__(self, other, context=None):
891-
other = _convert_other(other, allow_float=True)
891+
other = _convert_other(other, allow_float = True)
892892
if other is NotImplemented:
893893
return other
894894
ans = self._compare_check_nans(other, context)
@@ -897,7 +897,7 @@ def __le__(self, other, context=None):
897897
return self._cmp(other) <= 0
898898

899899
def __gt__(self, other, context=None):
900-
other = _convert_other(other, allow_float=True)
900+
other = _convert_other(other, allow_float = True)
901901
if other is NotImplemented:
902902
return other
903903
ans = self._compare_check_nans(other, context)
@@ -906,7 +906,7 @@ def __gt__(self, other, context=None):
906906
return self._cmp(other) > 0
907907

908908
def __ge__(self, other, context=None):
909-
other = _convert_other(other, allow_float=True)
909+
other = _convert_other(other, allow_float = True)
910910
if other is NotImplemented:
911911
return other
912912
ans = self._compare_check_nans(other, context)
@@ -935,55 +935,28 @@ def compare(self, other, context=None):
935935

936936
def __hash__(self):
937937
"""x.__hash__() <==> hash(x)"""
938-
# Decimal integers must hash the same as the ints
939-
#
940-
# The hash of a nonspecial noninteger Decimal must depend only
941-
# on the value of that Decimal, and not on its representation.
942-
# For example: hash(Decimal('100E-1')) == hash(Decimal('10')).
943-
944-
# Equality comparisons involving signaling nans can raise an
945-
# exception; since equality checks are implicitly and
946-
# unpredictably used when checking set and dict membership, we
947-
# prevent signaling nans from being used as set elements or
948-
# dict keys by making __hash__ raise an exception.
938+
939+
# In order to make sure that the hash of a Decimal instance
940+
# agrees with the hash of a numerically equal integer, float
941+
# or Fraction, we follow the rules for numeric hashes outlined
942+
# in the documentation. (See library docs, 'Built-in Types').
949943
if self._is_special:
950944
if self.is_snan():
951945
raise TypeError('Cannot hash a signaling NaN value.')
952946
elif self.is_nan():
953-
# 0 to match hash(float('nan'))
954-
return 0
947+
return _PyHASH_NAN
955948
else:
956-
# values chosen to match hash(float('inf')) and
957-
# hash(float('-inf')).
958949
if self._sign:
959-
return -271828
950+
return -_PyHASH_INF
960951
else:
961-
return 314159
962-
963-
# In Python 2.7, we're allowing comparisons (but not
964-
# arithmetic operations) between floats and Decimals; so if
965-
# a Decimal instance is exactly representable as a float then
966-
# its hash should match that of the float.
967-
self_as_float = float(self)
968-
if Decimal.from_float(self_as_float) == self:
969-
return hash(self_as_float)
970-
971-
if self._isinteger():
972-
op = _WorkRep(self.to_integral_value())
973-
# to make computation feasible for Decimals with large
974-
# exponent, we use the fact that hash(n) == hash(m) for
975-
# any two nonzero integers n and m such that (i) n and m
976-
# have the same sign, and (ii) n is congruent to m modulo
977-
# 2**64-1. So we can replace hash((-1)**s*c*10**e) with
978-
# hash((-1)**s*c*pow(10, e, 2**64-1).
979-
return hash((-1)**op.sign*op.int*pow(10, op.exp, 2**64-1))
980-
# The value of a nonzero nonspecial Decimal instance is
981-
# faithfully represented by the triple consisting of its sign,
982-
# its adjusted exponent, and its coefficient with trailing
983-
# zeros removed.
984-
return hash((self._sign,
985-
self._exp+len(self._int),
986-
self._int.rstrip('0')))
952+
return _PyHASH_INF
953+
954+
if self._exp >= 0:
955+
exp_hash = pow(10, self._exp, _PyHASH_MODULUS)
956+
else:
957+
exp_hash = pow(_PyHASH_10INV, -self._exp, _PyHASH_MODULUS)
958+
hash_ = int(self._int) * exp_hash % _PyHASH_MODULUS
959+
return hash_ if self >= 0 else -hash_
987960

988961
def as_tuple(self):
989962
"""Represents the number as a triple tuple.
@@ -6218,6 +6191,17 @@ def _format_number(is_negative, intpart, fracpart, exp, spec):
62186191
# _SignedInfinity[sign] is infinity w/ that sign
62196192
_SignedInfinity = (_Infinity, _NegativeInfinity)
62206193

6194+
# Constants related to the hash implementation; hash(x) is based
6195+
# on the reduction of x modulo _PyHASH_MODULUS
6196+
import sys
6197+
_PyHASH_MODULUS = sys.hash_info.modulus
6198+
# hash values to use for positive and negative infinities, and nans
6199+
_PyHASH_INF = sys.hash_info.inf
6200+
_PyHASH_NAN = sys.hash_info.nan
6201+
del sys
6202+
6203+
# _PyHASH_10INV is the inverse of 10 modulo the prime _PyHASH_MODULUS
6204+
_PyHASH_10INV = pow(10, _PyHASH_MODULUS - 2, _PyHASH_MODULUS)
62216205

62226206

62236207
if __name__ == '__main__':

‎Lib/fractions.py

Copy file name to clipboardExpand all lines: Lib/fractions.py
+22-9Lines changed: 22 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@
88
import numbers
99
import operator
1010
import re
11+
import sys
1112

1213
__all__ = ['Fraction', 'gcd']
1314

@@ -23,6 +24,12 @@ def gcd(a, b):
2324
a, b = b, a%b
2425
return a
2526

27+
# Constants related to the hash implementation; hash(x) is based
28+
# on the reduction of x modulo the prime _PyHASH_MODULUS.
29+
_PyHASH_MODULUS = sys.hash_info.modulus
30+
# Value to be used for rationals that reduce to infinity modulo
31+
# _PyHASH_MODULUS.
32+
_PyHASH_INF = sys.hash_info.inf
2633

2734
_RATIONAL_FORMAT = re.compile(r"""
2835
\A\s* # optional whitespace at the start, then
@@ -528,16 +535,22 @@ def __hash__(self):
528535
529536
"""
530537
# XXX since this method is expensive, consider caching the result
531-
if self._denominator == 1:
532-
# Get integers right.
533-
return hash(self._numerator)
534-
# Expensive check, but definitely correct.
535-
if self == float(self):
536-
return hash(float(self))
538+
539+
# In order to make sure that the hash of a Fraction agrees
540+
# with the hash of a numerically equal integer, float or
541+
# Decimal instance, we follow the rules for numeric hashes
542+
# outlined in the documentation. (See library docs, 'Built-in
543+
# Types').
544+
545+
# dinv is the inverse of self._denominator modulo the prime
546+
# _PyHASH_MODULUS, or 0 if self._denominator is divisible by
547+
# _PyHASH_MODULUS.
548+
dinv = pow(self._denominator, _PyHASH_MODULUS - 2, _PyHASH_MODULUS)
549+
if not dinv:
550+
hash_ = _PyHASH_INF
537551
else:
538-
# Use tuple's hash to avoid a high collision rate on
539-
# simple fractions.
540-
return hash((self._numerator, self._denominator))
552+
hash_ = abs(self._numerator) * dinv % _PyHASH_MODULUS
553+
return hash_ if self >= 0 else -hash_
541554

542555
def __eq__(a, b):
543556
"""a == b"""

‎Lib/test/test_float.py

Copy file name to clipboardExpand all lines: Lib/test/test_float.py
-9Lines changed: 0 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -914,15 +914,6 @@ def notest_float_inf(self):
914914
self.assertFalse(NAN.is_inf())
915915
self.assertFalse((0.).is_inf())
916916

917-
def test_hash_inf(self):
918-
# the actual values here should be regarded as an
919-
# implementation detail, but they need to be
920-
# identical to those used in the Decimal module.
921-
self.assertEqual(hash(float('inf')), 314159)
922-
self.assertEqual(hash(float('-inf')), -271828)
923-
self.assertEqual(hash(float('nan')), 0)
924-
925-
926917
fromHex = float.fromhex
927918
toHex = float.hex
928919
class HexFloatTestCase(unittest.TestCase):

0 commit comments

Comments
0 (0)
Morty Proxy This is a proxified and sanitized view of the page, visit original site.