Change how KNeighborsClassifier handles tiebreakers #22818

Open
@raymondj-pace

Description


Describe the workflow you want to enable

I see that KNeighborsClassifier breaks ties by taking the mode of the classes involved in the tie. What about using the sum of the distances as the tiebreaker instead of the mode?

I feel that choosing the class with the smallest sum of distances is a better-educated guess, and it needs only minimal changes to the code.
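
To make the tie case concrete, here is a small sketch that contrasts the two rules on one query. The labels and distances are hypothetical toy values chosen only for illustration, and "smallest label among the tied classes" is an emulation of the behavior this issue attributes to scipy.stats.mode, not a call into scikit-learn itself.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: class labels of the k = 4 nearest neighbors and
# their distances to the query point (values chosen only for illustration).
labels = [0, 1, 1, 0]
distances = [1.0, 0.25, 0.5, 1.5]

# Count votes per class and collect every class that shares the top count.
counts = Counter(labels)
top = max(counts.values())
tied = [c for c, n in counts.items() if n == top]

# Rule 1: pick the smallest label among the tied classes (the behavior the
# issue attributes to scipy.stats.mode; emulated here, not called).
by_smallest_label = min(tied)

# Rule 2: pick the tied class with the smallest sum of distances.
dist_sum = defaultdict(float)
for c, d in zip(labels, distances):
    dist_sum[c] += d
by_distance = min(tied, key=lambda c: dist_sum[c])

print("tied classes:", tied)
print("smallest-label rule picks:", by_smallest_label)
print("distance-sum rule picks:", by_distance)
```

Both classes get two votes here, but the two neighbors of class 1 are much closer to the query (total distance 0.75 versus 2.5), so the distance-sum rule picks class 1 while the smallest-label rule picks class 0.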

Describe your proposed solution

Create two dictionaries: one holding the vote count per class and one holding the sum of distances per class.
If there is no tie, return the class that won the vote (as normal).
If there is a tie, return the tied class with the minimum sum of distances (instead of what scipy.stats.mode yields, which I believe will always be the smallest class value among the tied classes).

    def predict_class(neighbors, distances, num_neighbors):
        dict_count = {}
        dict_dist = {}

        for i in range(num_neighbors):
            # Get the class from the i-th neighbor - here my Class value is at offset 13
            c = neighbors[i][13]

            # Keep a dictionary of Class counts
            dict_count[c] = dict_count.get(c, 0) + 1

            # Keep a dictionary of total Class distances
            dict_dist[c] = dict_dist.get(c, 0) + distances[i]

        # Collect every key (Class) whose count equals the largest count
        max_count = max(dict_count.values())
        best_keys = [key for key, value in dict_count.items() if value == max_count]

        if len(best_keys) == 1:
            prediction = best_keys[0]
        else:   # There was a tie
            # Pick the tied Class with the smallest total distance
            prediction = min(best_keys, key=dict_dist.get)

        return prediction
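
As a quick sanity check of the rule above, the following self-contained sketch applies it to rows shaped like the ones in the snippet, with the class stored at offset 13. The function name `vote_with_distance_tiebreak`, the helper `make_row`, and all data values are hypothetical and not part of scikit-learn. It is only a sketch under the assumption that the neighbors and their distances are already available as parallel sequences.

```python
from collections import Counter, defaultdict


def vote_with_distance_tiebreak(labels, distances):
    """Majority vote; a tie goes to the tied class with the smallest distance sum.

    Hypothetical helper for illustration, not a scikit-learn API.
    """
    if len(labels) == 0 or len(labels) != len(distances):
        raise ValueError("labels and distances must be non-empty and equal in length")

    counts = Counter(labels)
    dist_sum = defaultdict(float)
    for c, d in zip(labels, distances):
        dist_sum[c] += d

    top = max(counts.values())
    tied = [c for c, n in counts.items() if n == top]
    # With a single winner this returns it unchanged; if the distance sums
    # are also equal, min() keeps the first tied class encountered.
    return min(tied, key=lambda c: dist_sum[c])


def make_row(label):
    # Hypothetical row layout: 13 feature values followed by the class at offset 13.
    return [0.0] * 13 + [label]


# Tie: "a" and "b" both have two votes, but the "b" neighbors are closer.
tie_rows = [make_row("a"), make_row("b"), make_row("b"), make_row("a")]
tie_result = vote_with_distance_tiebreak(
    [row[13] for row in tie_rows], [1.0, 0.25, 0.5, 1.5]
)

# No tie: "a" wins the vote outright even though "b" is the nearest neighbor.
clear_rows = [make_row("a"), make_row("b"), make_row("a")]
clear_result = vote_with_distance_tiebreak(
    [row[13] for row in clear_rows], [3.0, 0.5, 2.0]
)

print(tie_result, clear_result)
```

The distance sums are consulted only when the vote counts tie, so predictions for queries with a clear majority are unchanged.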

Describe alternatives you've considered, if relevant

Writing my own k-NN algorithm 🙁

Additional context

No response
