Create a similar class to KMeans that uses medians instead of means (KMedians) #22709

Open
@raymondj-pace

Description


Describe the workflow you want to enable

I would like a new class, sklearn.cluster.KMedians (or an option on sklearn.cluster.KMeans), that computes cluster centers using medians instead of means.

K-medians clustering can give much better results on data that contains outliers, because the median is far less sensitive to extreme values than the mean.
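To make the outlier argument concrete, here is a minimal sketch using Python's standard `statistics` module. The sample values are made up for illustration and are not taken from this issue.

```python
from statistics import mean, median

# Hypothetical one-dimensional cluster with a single extreme value.
values = [1.0, 2.0, 3.0, 4.0, 100.0]

center_mean = mean(values)      # pulled toward the outlier
center_median = median(values)  # stays with the bulk of the data

print(center_mean)    # 22.0
print(center_median)  # 3.0
```

A single outlier moves the mean far away from the other four points, while the median does not move at all.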

Describe your proposed solution

Create a new class sklearn.cluster.KMedians that works the same way as sklearn.cluster.KMeans but uses the median, instead of the mean, to compute the new centroids.
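As a rough illustration of the proposal, the update step could take the median of each coordinate separately. The sketch below is not scikit-learn code: the function name `k_medians` and the `max_iter` parameter are hypothetical, it keeps the Euclidean assignment step of k-means as the proposal describes, and it uses only the standard library (`math.dist` needs Python 3.8 or later).

```python
import math
from statistics import median


def k_medians(points, centers, max_iter=100):
    """Lloyd-style loop: Euclidean assignment, per-coordinate median update."""
    centers = [list(c) for c in centers]  # work on a copy of the caller's list
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for each point.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: median of each coordinate over the cluster members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([median(col) for col in zip(*members)])
            else:
                new_centers.append(old)  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, labels


# Small sample: seven 2-D points and three starting centers.
points = [[1, 2], [2, 1], [1, 3], [5, 4], [6, 3], [7, 2], [6, 1]]
centers, labels = k_medians(points, [[2, 2], [3, 4], [6, 2]])
print(centers)  # [[1, 2], [5, 4], [6, 2]]
print(labels)   # [0, 0, 0, 1, 2, 2, 2]
```

Many descriptions of k-medians pair the median update with Manhattan distance in the assignment step; Euclidean distance is kept here only to stay close to what the proposal asks for.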

Describe alternatives you've considered, if relevant

Using a version I wrote:

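import enum
import math
# median is assumed to come from the statistics module; this matches the output shown further down
from statistics import median


class mu_type(enum.IntEnum):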
    mean = 1
    median = 2


#
# Euclidean distance between two vectors
#
def euclidean_distance(row1, row2):
    distance = 0.0
    for i in range(len(row1)):
        distance += (row1[i] - row2[i])**2
    return math.sqrt(distance)


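#
# k-means and k-medians clustering
# x is a list of 2-D points, c is the list of initial center points (updated in place)
# mu defines if the algorithm runs as k-means or k-medians
# Returns the final center points and the cluster index of each point
#
def k_m_clustering_2(x, c, mu=mu_type.mean):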
    
    d = [[0]*len(c) for i1 in range(len(x))]
    
    l = [0]*len(x)
    c_last = [0]*len(c)
    
    while c_last != c:
        
        # Save the last list of center points to compare for updates later
        c_last = c.copy()
        
        for i in range(len(x)):
            for j in range(len(c)):
                if DEBUG:
                    print('distance between: ', end='')
                    print(x[i], end='')
                    print(' and ', end='')
                    print(c[j], end='')
                    
                d[i][j] = euclidean_distance(x[i], c[j])
                
                if DEBUG:
                    print(' = ', end='')
                    print(d[i][j])
                    
            l[i] = d[i].index(min(d[i]))

        if DEBUG:
            print("L = ", end='')
            print(l)
            print()
            print()
        
        if DEBUG:
            print("Old center points: ", end='')
            print(c)
            print()

        #
        # Update center points
        #
        for i in range(len(c)):
    
            #
            # Compute mean or median for new center points
            #       
            if mu == mu_type.mean:
                count = 0
                _x = []
                _y = []
                for j in range(len(l)):
                    if l[j] == i:
                        count += 1
                        _x.append(x[j][0])
                        _y.append(x[j][1])
                c[i] = [(sum(_x)/count), (sum(_y)/count)]
                
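            elif mu == mu_type.median:
                # NOTE: median() here is assumed to be statistics.median.
                # On a list of [x, y] pairs it sorts the pairs and returns the
                # middle one, so it is not a per-coordinate median, and it
                # raises TypeError when the cluster has an even number of points.
                x_y = []
                for j in range(len(l)):
                    if l[j] == i:
                        x_y.append([x[j][0], x[j][1]])
                c[i] = median(x_y)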
            
        if DEBUG:
            print("New center points: ", end='')
            print(c)
            print()
    
    if DEBUG:
        print("Center points have not changed\n")
        print("Final: ", end='')
        print(l)
        print()
    
    return c, l




# Call it:

DEBUG=True

c = [[2, 2], [3, 4], [6, 2]]
x = [[1, 2], [2, 1], [1, 3], [5, 4], [6, 3], [7, 2], [6, 1]]

c_arr, l_arr = k_m_clustering_2(x, c, mu_type.median)

for i in range(len(x)):
    print('x' + str(i+1) + ' = ' + str(l_arr[i]))
print('\n')

Output:

distance between: [1, 2] and [2, 2] = 1.0
distance between: [1, 2] and [3, 4] = 2.8284271247461903
distance between: [1, 2] and [6, 2] = 5.0
distance between: [2, 1] and [2, 2] = 1.0
distance between: [2, 1] and [3, 4] = 3.1622776601683795
distance between: [2, 1] and [6, 2] = 4.123105625617661
distance between: [1, 3] and [2, 2] = 1.4142135623730951
distance between: [1, 3] and [3, 4] = 2.23606797749979
distance between: [1, 3] and [6, 2] = 5.0990195135927845
distance between: [5, 4] and [2, 2] = 3.605551275463989
distance between: [5, 4] and [3, 4] = 2.0
distance between: [5, 4] and [6, 2] = 2.23606797749979
distance between: [6, 3] and [2, 2] = 4.123105625617661
distance between: [6, 3] and [3, 4] = 3.1622776601683795
distance between: [6, 3] and [6, 2] = 1.0
distance between: [7, 2] and [2, 2] = 5.0
distance between: [7, 2] and [3, 4] = 4.47213595499958
distance between: [7, 2] and [6, 2] = 1.0
distance between: [6, 1] and [2, 2] = 4.123105625617661
distance between: [6, 1] and [3, 4] = 4.242640687119285
distance between: [6, 1] and [6, 2] = 1.0
L = [0, 0, 0, 1, 2, 2, 2]

Old center points: [[2, 2], [3, 4], [6, 2]]

New center points: [[1, 3], [5, 4], [6, 3]]

distance between: [1, 2] and [1, 3] = 1.0
distance between: [1, 2] and [5, 4] = 4.47213595499958
distance between: [1, 2] and [6, 3] = 5.0990195135927845
distance between: [2, 1] and [1, 3] = 2.23606797749979
distance between: [2, 1] and [5, 4] = 4.242640687119285
distance between: [2, 1] and [6, 3] = 4.47213595499958
distance between: [1, 3] and [1, 3] = 0.0
distance between: [1, 3] and [5, 4] = 4.123105625617661
distance between: [1, 3] and [6, 3] = 5.0
distance between: [5, 4] and [1, 3] = 4.123105625617661
distance between: [5, 4] and [5, 4] = 0.0
distance between: [5, 4] and [6, 3] = 1.4142135623730951
distance between: [6, 3] and [1, 3] = 5.0
distance between: [6, 3] and [5, 4] = 1.4142135623730951
distance between: [6, 3] and [6, 3] = 0.0
distance between: [7, 2] and [1, 3] = 6.082762530298219
distance between: [7, 2] and [5, 4] = 2.8284271247461903
distance between: [7, 2] and [6, 3] = 1.4142135623730951
distance between: [6, 1] and [1, 3] = 5.385164807134504
distance between: [6, 1] and [5, 4] = 3.1622776601683795
distance between: [6, 1] and [6, 3] = 2.0
L = [0, 0, 0, 1, 2, 2, 2]

Old center points: [[1, 3], [5, 4], [6, 3]]

New center points: [[1, 3], [5, 4], [6, 3]]

Center points have not changed

Final: [0, 0, 0, 1, 2, 2, 2]

x1 = 0
x2 = 0
x3 = 0
x4 = 1
x5 = 2
x6 = 2
x7 = 2
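
The new center points in this output ([1, 3] and [6, 3]) are consistent with a `median` that sorts the points as whole pairs and returns the middle one, which is how `statistics.median` behaves on a list of lists. That is an inference from the printed output, not something the issue states. The sketch below compares that behaviour with a per-coordinate median for the three points assigned to cluster 0.

```python
from statistics import median

cluster = [[1, 2], [2, 1], [1, 3]]  # points labelled 0 in the run above

# Lists compare lexicographically, so this returns the middle point.
lexicographic = median(cluster)

# Median taken independently on each coordinate.
component_wise = [median(col) for col in zip(*cluster)]

print(lexicographic)   # [1, 3]
print(component_wise)  # [1, 2]

# With an even number of points, median() has to average the two middle
# elements, which is not defined for lists.
try:
    median([[1, 2], [2, 1]])
    even_ok = True
except TypeError:
    even_ok = False
print(even_ok)  # False
```

A per-coordinate median avoids both the dependence on sort order and the failure on even-sized clusters, so it is probably the safer choice for a KMedians update step.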

Additional context

No response

Metadata


Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
