KBinsDiscretizer uniform strategy bin assignment wrong due to floating point multiplication #30924

Open
@bnomis

Describe the bug

KBinsDiscretizer's uniform strategy uses numpy.linspace to compute the bin edges.

numpy.linspace works out a step like: delta = (max - min) / num_bins

Then each bin edge is computed as: min + delta * n

The issue is that the floating point multiplication introduces rounding error in the low bits.

For example, consider the case of floating point sample values from zero to one and five bins. Then:

delta = 1/5 = 0.2

The right edge of bin 2 (zero indexed) should be 0.6 = 0.2 * 3 but (in my tests) it's 0.6000000000000001

Example python calculation:

>>> 1/5 * 3
0.6000000000000001

This means a sample value of 0.6 gets assigned to bin 2, but it should be in bin 3.
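
The effect can be reproduced without scikit-learn. The sketch below builds the five-bin edges with numpy.linspace and then looks up 0.6 with a right-sided numpy.searchsorted over the inner edges. That lookup rule is an assumption about how the discretizer assigns samples, not something stated in this report, and the names edges and bin_index are illustrative only.

```python
import numpy as np

# Six edges for five uniform bins on [0, 1], as linspace produces them.
edges = np.linspace(0.0, 1.0, 6)

# The edge that should be 0.6 carries the rounding error from 3 * (1 / 5).
print(repr(float(edges[3])))   # 0.6000000000000001

# Assumed lookup rule: a value belongs to the bin whose left edge is <= value.
# Searching only the inner edges keeps the maximum in the last bin.
bin_index = int(np.searchsorted(edges[1:-1], 0.6, side="right"))
print(bin_index)               # 2, although 0.6 is the intended left edge of bin 3
```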

One workaround is to use the fractions module or, better still, the decimal module. The code below demonstrates the issue.

#!/usr/bin/env python
import decimal
import fractions
import sys
from typing import NoReturn


def test_float_fractions():
    # check floating point multiplication
    step = 1 / 5
    f_step = fractions.Fraction(1, 5)
    d_step = decimal.Decimal(1) / decimal.Decimal(5)

    print('float vs fractions')
    for n in range(101):
        float_value = step * n
        fraction_value = f_step * n
        fraction_float = float(fraction_value)
        if float_value != fraction_float:
            fraction_str = str(fraction_value)
            print(f'{n:2} float {float_value:20.16f} fraction {fraction_float:20.16f} {fraction_str:>5}')

    print('')
    print('float vs decimals')
    for n in range(101):
        float_value = step * n
        decimal_value = d_step * n
        if float_value != decimal_value:
            print(f'{n:2} float {float_value:23.20f}  decimal {decimal_value:23.20f}')


def main(argv) -> NoReturn:
    m = 0
    try:
        test_float_fractions()
    except Exception as e:
        print(f'Exception: {e}')
    sys.exit(m)


if __name__ == '__main__':
    main(sys.argv[1:])

Running the above yields the output below:

float vs fractions
 3 float   0.6000000000000001 fraction   0.6000000000000000   3/5
 6 float   1.2000000000000002 fraction   1.2000000000000000   6/5
 7 float   1.4000000000000001 fraction   1.3999999999999999   7/5
12 float   2.4000000000000004 fraction   2.3999999999999999  12/5
14 float   2.8000000000000003 fraction   2.7999999999999998  14/5
17 float   3.4000000000000004 fraction   3.3999999999999999  17/5
19 float   3.8000000000000003 fraction   3.7999999999999998  19/5
23 float   4.6000000000000005 fraction   4.5999999999999996  23/5
24 float   4.8000000000000007 fraction   4.7999999999999998  24/5
28 float   5.6000000000000005 fraction   5.5999999999999996  28/5
29 float   5.8000000000000007 fraction   5.7999999999999998  29/5
33 float   6.6000000000000005 fraction   6.5999999999999996  33/5
34 float   6.8000000000000007 fraction   6.7999999999999998  34/5
38 float   7.6000000000000005 fraction   7.5999999999999996  38/5
39 float   7.8000000000000007 fraction   7.7999999999999998  39/5
41 float   8.2000000000000011 fraction   8.1999999999999993  41/5
46 float   9.2000000000000011 fraction   9.1999999999999993  46/5
48 float   9.6000000000000014 fraction   9.5999999999999996  48/5
51 float  10.2000000000000011 fraction  10.1999999999999993  51/5
53 float  10.6000000000000014 fraction  10.5999999999999996  53/5
56 float  11.2000000000000011 fraction  11.1999999999999993  56/5
58 float  11.6000000000000014 fraction  11.5999999999999996  58/5
61 float  12.2000000000000011 fraction  12.1999999999999993  61/5
63 float  12.6000000000000014 fraction  12.5999999999999996  63/5
66 float  13.2000000000000011 fraction  13.1999999999999993  66/5
68 float  13.6000000000000014 fraction  13.5999999999999996  68/5
71 float  14.2000000000000011 fraction  14.1999999999999993  71/5
73 float  14.6000000000000014 fraction  14.5999999999999996  73/5
76 float  15.2000000000000011 fraction  15.1999999999999993  76/5
78 float  15.6000000000000014 fraction  15.5999999999999996  78/5
82 float  16.4000000000000021 fraction  16.3999999999999986  82/5
87 float  17.4000000000000021 fraction  17.3999999999999986  87/5
92 float  18.4000000000000021 fraction  18.3999999999999986  92/5
96 float  19.2000000000000028 fraction  19.1999999999999993  96/5
97 float  19.4000000000000021 fraction  19.3999999999999986  97/5

float vs decimals
 1 float  0.20000000000000001110  decimal  0.20000000000000000000
 2 float  0.40000000000000002220  decimal  0.40000000000000000000
 3 float  0.60000000000000008882  decimal  0.60000000000000000000
 4 float  0.80000000000000004441  decimal  0.80000000000000000000
 6 float  1.20000000000000017764  decimal  1.20000000000000000000
 7 float  1.40000000000000013323  decimal  1.40000000000000000000
 8 float  1.60000000000000008882  decimal  1.60000000000000000000
 9 float  1.80000000000000004441  decimal  1.80000000000000000000
11 float  2.20000000000000017764  decimal  2.20000000000000000000
12 float  2.40000000000000035527  decimal  2.40000000000000000000
13 float  2.60000000000000008882  decimal  2.60000000000000000000
14 float  2.80000000000000026645  decimal  2.80000000000000000000
16 float  3.20000000000000017764  decimal  3.20000000000000000000
17 float  3.40000000000000035527  decimal  3.40000000000000000000
18 float  3.60000000000000008882  decimal  3.60000000000000000000
19 float  3.80000000000000026645  decimal  3.80000000000000000000
21 float  4.20000000000000017764  decimal  4.20000000000000000000
22 float  4.40000000000000035527  decimal  4.40000000000000000000
23 float  4.60000000000000053291  decimal  4.60000000000000000000
24 float  4.80000000000000071054  decimal  4.80000000000000000000
26 float  5.20000000000000017764  decimal  5.20000000000000000000
27 float  5.40000000000000035527  decimal  5.40000000000000000000
28 float  5.60000000000000053291  decimal  5.60000000000000000000
29 float  5.80000000000000071054  decimal  5.80000000000000000000
31 float  6.20000000000000017764  decimal  6.20000000000000000000
32 float  6.40000000000000035527  decimal  6.40000000000000000000
33 float  6.60000000000000053291  decimal  6.60000000000000000000
34 float  6.80000000000000071054  decimal  6.80000000000000000000
36 float  7.20000000000000017764  decimal  7.20000000000000000000
37 float  7.40000000000000035527  decimal  7.40000000000000000000
38 float  7.60000000000000053291  decimal  7.60000000000000000000
39 float  7.80000000000000071054  decimal  7.80000000000000000000
41 float  8.20000000000000106581  decimal  8.20000000000000000000
42 float  8.40000000000000035527  decimal  8.40000000000000000000
43 float  8.59999999999999964473  decimal  8.60000000000000000000
44 float  8.80000000000000071054  decimal  8.80000000000000000000
46 float  9.20000000000000106581  decimal  9.20000000000000000000
47 float  9.40000000000000035527  decimal  9.40000000000000000000
48 float  9.60000000000000142109  decimal  9.60000000000000000000
49 float  9.80000000000000071054  decimal  9.80000000000000000000
51 float 10.20000000000000106581  decimal 10.20000000000000000000
52 float 10.40000000000000035527  decimal 10.40000000000000000000
53 float 10.60000000000000142109  decimal 10.60000000000000000000
54 float 10.80000000000000071054  decimal 10.80000000000000000000
56 float 11.20000000000000106581  decimal 11.20000000000000000000
57 float 11.40000000000000035527  decimal 11.40000000000000000000
58 float 11.60000000000000142109  decimal 11.60000000000000000000
59 float 11.80000000000000071054  decimal 11.80000000000000000000
61 float 12.20000000000000106581  decimal 12.20000000000000000000
62 float 12.40000000000000035527  decimal 12.40000000000000000000
63 float 12.60000000000000142109  decimal 12.60000000000000000000
64 float 12.80000000000000071054  decimal 12.80000000000000000000
66 float 13.20000000000000106581  decimal 13.20000000000000000000
67 float 13.40000000000000035527  decimal 13.40000000000000000000
68 float 13.60000000000000142109  decimal 13.60000000000000000000
69 float 13.80000000000000071054  decimal 13.80000000000000000000
71 float 14.20000000000000106581  decimal 14.20000000000000000000
72 float 14.40000000000000035527  decimal 14.40000000000000000000
73 float 14.60000000000000142109  decimal 14.60000000000000000000
74 float 14.80000000000000071054  decimal 14.80000000000000000000
76 float 15.20000000000000106581  decimal 15.20000000000000000000
77 float 15.40000000000000035527  decimal 15.40000000000000000000
78 float 15.60000000000000142109  decimal 15.60000000000000000000
79 float 15.80000000000000071054  decimal 15.80000000000000000000
81 float 16.19999999999999928946  decimal 16.20000000000000000000
82 float 16.40000000000000213163  decimal 16.40000000000000000000
83 float 16.60000000000000142109  decimal 16.60000000000000000000
84 float 16.80000000000000071054  decimal 16.80000000000000000000
86 float 17.19999999999999928946  decimal 17.20000000000000000000
87 float 17.40000000000000213163  decimal 17.40000000000000000000
88 float 17.60000000000000142109  decimal 17.60000000000000000000
89 float 17.80000000000000071054  decimal 17.80000000000000000000
91 float 18.19999999999999928946  decimal 18.20000000000000000000
92 float 18.40000000000000213163  decimal 18.40000000000000000000
93 float 18.60000000000000142109  decimal 18.60000000000000000000
94 float 18.80000000000000071054  decimal 18.80000000000000000000
96 float 19.20000000000000284217  decimal 19.20000000000000000000
97 float 19.40000000000000213163  decimal 19.40000000000000000000
98 float 19.60000000000000142109  decimal 19.60000000000000000000
99 float 19.80000000000000071054  decimal 19.80000000000000000000
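
Nearly every row of the second table differs because the comparison against Decimal is exact, and the float 0.2 is itself slightly larger than one fifth. The first table only shows the rows where that error survives rounding back to a float. A minimal sketch of this check, using only the standard library (variable names are illustrative):

```python
from decimal import Decimal
from fractions import Fraction

step = 1 / 5

# Converting a float to Decimal or Fraction is exact, so this exposes the stored value.
print(Decimal(step))                      # slightly above 0.2
print(Fraction(step) > Fraction(1, 5))    # True: the stored step is a little too large

# Exact arithmetic rounds only once, at the final conversion back to float.
exact_edge = float(Fraction(1, 5) * 3)
float_edge = step * 3
print(exact_edge == 0.6, float_edge == 0.6)   # True False
```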

My suggestion is to use the decimal module for uniform discretisation.
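
As a rough illustration of that suggestion, the sketch below computes the edges with decimal, converts each one to float only at the end, and compares the result with the numpy.linspace edges. decimal_edges is a hypothetical helper written for this example and is not part of scikit-learn or NumPy. Note that decimal rounds to its context precision (28 significant digits by default), so quotients such as 1/3 are still approximated, only much more finely than with a float.

```python
from decimal import Decimal

import numpy as np


def decimal_edges(lo: float, hi: float, n_bins: int) -> list[float]:
    """Hypothetical helper: uniform bin edges computed in decimal arithmetic."""
    d_lo, d_hi = Decimal(lo), Decimal(hi)   # exact conversion of the float inputs
    # Multiply before dividing so each edge is rounded as late as possible.
    return [float(d_lo + (d_hi - d_lo) * i / n_bins) for i in range(n_bins + 1)]


dec = decimal_edges(0.0, 1.0, 5)
lin = np.linspace(0.0, 1.0, 6)

# Indices where the two sets of edges disagree.
mismatched = [i for i in range(6) if dec[i] != lin[i]]
print(dec)          # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
print(mismatched)   # [3]
```

For this five-bin case only the 0.6 edge differs, which matches the single bin-edge mismatch reported under Actual Results.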

Steps/Code to Reproduce

#!/usr/bin/env python
import decimal
import fractions
import sys
from typing import NoReturn

import numpy
import sklearn.preprocessing


def test_sklearn_uniform_bug():
    # sample values
    values = numpy.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
    # expected quantised bin
    # right side of bin   <.2  <.2  <.4  <.4  <.6  <.6  <.8  <.8  <1   <1   <1
    expected =           [0,   0,   1,   1,   2,   2,   3,   3,   4,   4,   4]

    # reshape to a 2-D column of shape (n_samples, 1), as fit/transform expect
    reshaped = values.reshape(-1, 1)
    qnt = sklearn.preprocessing.KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='uniform')
    qnt.fit(reshaped)
    fitted = qnt.transform(reshaped)
    # flatten back to a 1-D array of bin indices
    quantised = fitted.reshape(-1)

    # check bin assignment
    for i in range(len(expected)):
        if quantised[i] != expected[i]:
            print(f'bin for {values[i]} {quantised[i]} != {expected[i]}')

    # check bin edges
    expected_bins = [0, 0.2, 0.4, 0.6, 0.8, 1.0]
    bin_edges = qnt.bin_edges_[0]
    for i in range(len(expected_bins)):
        if expected_bins[i] != bin_edges[i]:
            print(f'bin edge {expected_bins[i]} != {bin_edges[i]}')

    # check floating point multiplication
    step = 1/5
    bin_3 = step * 3
    if bin_3 != 0.6:
        print(f'floating point multiplication {bin_3} != 0.6')
    else:
        print('floating point multiplication ok')

    # check fractions multiplication
    f_step = fractions.Fraction(1, 5)
    f_bin_3 = 3 * f_step
    if float(f_bin_3) != 0.6:
        print(f'fractions multiplication {f_bin_3} != 0.6')
    else:
        print('fractions multiplication ok')

    # check decimal multiplication
    d_step = decimal.Decimal(1) / decimal.Decimal(5)
    d_bin_3 = 3 * d_step
    if float(d_bin_3) != 0.6:
        print(f'decimal multiplication {d_bin_3} != 0.6')
    else:
        print('decimal multiplication ok')


def main(argv) -> NoReturn:
    m = 0
    try:
        test_sklearn_uniform_bug()
    except Exception as e:
        print(f'Exception: {e}')
    sys.exit(m)


if __name__ == '__main__':
    main(sys.argv[1:])

Expected Results

floating point multiplication ok
fractions multiplication ok
decimal multiplication ok

Actual Results

bin for 0.6 2.0 != 3
bin edge 0.6 != 0.6000000000000001
floating point multiplication 0.6000000000000001 != 0.6
fractions multiplication ok
decimal multiplication ok

Versions

System:
    python: 3.12.8 (main, Dec 30 2024, 15:10:22) [Clang 16.0.0 (clang-1600.0.26.6)]
executable: /Users/simonb/.pyenv/versions/nmc/bin/python
   machine: macOS-15.3.1-arm64-arm-64bit

Python dependencies:
      sklearn: 1.6.1
          pip: 25.0
   setuptools: 75.8.0
        numpy: 2.2.2
        scipy: 1.15.1
       Cython: None
       pandas: 2.2.3
   matplotlib: 3.10.0
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 11
         prefix: libomp
       filepath: /Users/simonb/.pyenv/versions/3.12.8/envs/nmc/lib/python3.12/site-packages/sklearn/.dylibs/libomp.dylib
        version: None
